Automating Adobe Acrobat Pro with python
Posted November 23, 2013 at 10:34 AM | categories: pdf, automation
I have a need to automate Adobe Pro for a couple of applications:
- I could use Adobe Pro to automatically add rubric pages to assignments before grading them. The rubric has embedded javascript that stores the grade inside the pdf file.
- I could use Adobe Pro to extract information, e.g. grades, stored in a set of PDF files for analysis.
I came across this script to automate Adobe Pro using python and OLE automation. Two other useful references are:
- http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/iac_api_reference.pdf
- http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/iac_developer_guide.pdf
In this post, we look at some simple code to get data out of a pdf. We start with just opening a PDF file.
import os
from win32com.client.dynamic import Dispatch

src = os.path.abspath('writing-exams-in-orgmode.pdf')

app = Dispatch("AcroExch.AVDoc")
app.Open(src, src)
app.Close(-1)  # do not save on close
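If something fails between Open and Close, the document can be left open in Acrobat. A minimal sketch of one way to guard against that is a context manager that always closes the document. The names open_avdoc and _acrobat_dispatch are hypothetical helpers, not part of pywin32 or Acrobat, and the sketch assumes that AVDoc.Open returns a false value (0) when the file cannot be opened, as the IAC API reference describes.

```python
import os
from contextlib import contextmanager


def _acrobat_dispatch(prog_id):
    """Create a COM object through pywin32 (imported lazily; Windows only)."""
    from win32com.client.dynamic import Dispatch
    return Dispatch(prog_id)


@contextmanager
def open_avdoc(path, dispatch=_acrobat_dispatch):
    """Open `path` in Acrobat and close it without saving when done.

    `dispatch` is a hypothetical injection point so the helper can be
    exercised without Acrobat installed.
    """
    src = os.path.abspath(path)
    avdoc = dispatch("AcroExch.AVDoc")
    # Assumption: Open returns 0 (false) when Acrobat cannot open the file.
    if not avdoc.Open(src, src):
        raise IOError("Acrobat could not open %s" % src)
    try:
        yield avdoc
    finally:
        avdoc.Close(-1)  # do not save on close
```

With this helper, `with open_avdoc('writing-exams-in-orgmode.pdf') as avdoc:` wraps whatever work needs the open document, and the close happens even if that work raises an exception.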
Opening and closing a file is not that useful by itself. Next, we get some information out of the file. The PDF we opened above has a custom property, PTEX.Fullbanner, set by pdflatex. We can extract it, along with the number of pages, like this.
import os
from win32com.client.dynamic import Dispatch

src = os.path.abspath('writing-exams-in-orgmode.pdf')

app = Dispatch("AcroExch.AVDoc")
app.Open(src, src)

# the PDDoc gives access to the document contents and properties
pddoc = app.GetPDDoc()
print(pddoc.GetInfo('PTEX.Fullbanner'))
print(pddoc.GetNumPages())

app.Close(-1)  # do not save on close
This is MiKTeX-pdfTeX 2.9.4535 (1.40.13)
5
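The same GetInfo call could be repeated over a set of files, which is the shape of the grade-extraction use case mentioned at the top. The following is only a sketch under assumptions: read_info and read_info_from_files are hypothetical helper names, the dispatch argument exists only so the code can be tried without Acrobat, and the key names are whatever your files actually contain (where the rubric stores the grade is not covered in this post).

```python
import os


def read_info(path, keys, dispatch=None):
    """Return {key: value} for selected document properties of one PDF."""
    if dispatch is None:
        # Imported lazily so this module loads on machines without pywin32.
        from win32com.client.dynamic import Dispatch as dispatch
    src = os.path.abspath(path)
    avdoc = dispatch("AcroExch.AVDoc")
    avdoc.Open(src, src)
    try:
        pddoc = avdoc.GetPDDoc()
        return {key: pddoc.GetInfo(key) for key in keys}
    finally:
        avdoc.Close(-1)  # do not save on close


def read_info_from_files(paths, keys, dispatch=None):
    """Map each path to the properties read from that file."""
    return {path: read_info(path, keys, dispatch) for path in paths}
```

For example, `read_info_from_files(['a.pdf', 'b.pdf'], ['Title', 'PTEX.Fullbanner'])` would return one dictionary per file, ready to be written to a table for analysis.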
Finally, let us try inserting pages. I have a rubric file that I want to insert at the end of the writing-exams-in-orgmode.pdf file above. We will open both documents, insert the rubric, and save the result as a new file.
import os
from win32com.client.dynamic import Dispatch

src = os.path.abspath('../../CMU/classes/06-625/rubric/rubric.pdf')
src2 = os.path.abspath('writing-exams-in-orgmode.pdf')

# It seems I need two of these
avdoc1 = Dispatch("AcroExch.AVDoc")
avdoc2 = Dispatch("AcroExch.AVDoc")

# this is the rubric
avdoc1.Open(src, src)
pddoc1 = avdoc1.GetPDDoc()
N1 = pddoc1.GetNumPages()

# this is the other doc
avdoc2.Open(src2, src2)
pddoc2 = avdoc2.GetPDDoc()
N2 = pddoc2.GetNumPages()

# Insert rubric after last page of the other doc. pages start at 0
pddoc2.InsertPages(N2 - 1, pddoc1, 0, N1, 0)

# save as a new file. 1 means full save at absolute path provided.
pddoc2.Save(1, os.path.abspath('./woohoo.pdf'))

# close files.
avdoc1.Close(-1)
avdoc2.Close(-1)
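The page arithmetic in InsertPages is easy to get wrong by one, and the script above ignores the return values. A small sketch of how the insert and save steps might be wrapped follows. The helper names append_pages and save_as are hypothetical, and the checks assume that InsertPages and Save return a false value (0) on failure, which is how the IAC API reference describes them.

```python
import os

PD_SAVE_FULL = 1  # the "full save" flag value used in the script above


def append_pages(target_pddoc, source_pddoc, copy_bookmarks=False):
    """Append every page of `source_pddoc` to the end of `target_pddoc`.

    Returns the page count expected in the target after the insert.
    """
    n_target = target_pddoc.GetNumPages()
    n_source = source_pddoc.GetNumPages()
    # Pages are numbered from 0, so the last target page is n_target - 1.
    ok = target_pddoc.InsertPages(n_target - 1, source_pddoc, 0, n_source,
                                  1 if copy_bookmarks else 0)
    if not ok:  # assumption: 0 (false) signals failure
        raise RuntimeError("InsertPages reported failure")
    return n_target + n_source


def save_as(pddoc, path):
    """Write `pddoc` to a new file using a full save at an absolute path."""
    if not pddoc.Save(PD_SAVE_FULL, os.path.abspath(path)):
        raise RuntimeError("Save reported failure for %s" % path)
```

With the two PDDoc objects from the script, `append_pages(pddoc2, pddoc1)` followed by `save_as(pddoc2, './woohoo.pdf')` would perform the same steps, but stop with an error instead of silently producing nothing.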
Here is our result: woohoo.pdf. I went ahead and gave myself an A ;).
1 Summary
It looks like I can replace the dependence of my box-course code on the python-based PDF libraries (which are not fully functional, and do not work on all PDFs) and on pdftk with this approach of automating Adobe Pro. It is unfortunate that it is not a free program, but I would expect it to work on all PDF files, and it can combine PDFs while keeping their embedded javascript. I have tried other PDF programs to combine the rubric and assignment pages, but they all lose the javascript. With this method, I could keep a set of enriched rubric files for different types of assignments, and add them to assignments as part of the assessment process.
Copyright (C) 2013 by John Kitchin. See the License for information about copying.