I need to automate Adobe Acrobat Pro for a couple of applications:
- I could use Acrobat Pro to extract information, e.g. grades, stored in a set of PDF files for analysis.
I came across this script to automate Acrobat Pro using Python and OLE automation. Two other useful references are:
In this post, we look at some simple code to get data out of a PDF. We start by just opening a PDF file.
    import os
    from win32com.client.dynamic import Dispatch

    src = os.path.abspath('writing-exams-in-orgmode.pdf')

    app = Dispatch("AcroExch.AVDoc")
    app.Open(src, src)

    app.Close(-1)  # do not save on close
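If anything goes wrong between Open and Close, the document stays open in Acrobat. Here is a minimal sketch of a context manager that always closes the document. The names open_avdoc and _acrobat_dispatch are my own, not part of pywin32 or Acrobat, and I am assuming that Open returns a false value when the file cannot be opened. The dispatch argument is only there so the wrapper can be tried without Acrobat installed.

```python
import os
from contextlib import contextmanager


def _acrobat_dispatch(prog_id):
    """Create a COM object. Imported lazily so this loads without pywin32."""
    from win32com.client.dynamic import Dispatch
    return Dispatch(prog_id)


@contextmanager
def open_avdoc(path, dispatch=_acrobat_dispatch):
    """Open path in Acrobat and always close it again without saving.

    open_avdoc and the dispatch parameter are hypothetical helpers,
    not part of the Acrobat API.
    """
    src = os.path.abspath(path)
    avdoc = dispatch("AcroExch.AVDoc")
    # Assumption: Open returns a false value on failure.
    if not avdoc.Open(src, src):
        raise IOError("Acrobat could not open %s" % src)
    try:
        yield avdoc
    finally:
        avdoc.Close(-1)  # do not save on close


# Usage (needs Windows, pywin32, and Acrobat Pro):
# with open_avdoc('writing-exams-in-orgmode.pdf') as avdoc:
#     pddoc = avdoc.GetPDDoc()
```

The try/finally means Close(-1) runs even if the code inside the with block raises an exception.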
Opening and closing a file is not that useful. Next, we get some information out of the file. The PDF we opened above has a custom property, PTEX.Fullbanner, written by pdflatex. We can extract it, along with the number of pages, like this.
    import os
    from win32com.client.dynamic import Dispatch

    src = os.path.abspath('writing-exams-in-orgmode.pdf')

    app = Dispatch("AcroExch.AVDoc")
    app.Open(src, src)

    # The PDDoc gives access to the document contents and metadata
    pddoc = app.GetPDDoc()
    print(pddoc.GetInfo('PTEX.Fullbanner'))
    print(pddoc.GetNumPages())

    app.Close(-1)  # do not save on close
    This is MiKTeX-pdfTeX 2.9.4535 (1.40.13)
    5
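To analyze a set of files, it would be handy to gather several properties at once. Below is a rough sketch that collects the standard PDF Info keys, plus any custom ones, into a dictionary. The function collect_info and the constant STANDARD_INFO_KEYS are names I made up for illustration. The only calls made on the document are GetInfo and GetNumPages, as used above. I am assuming GetInfo returns an empty string for a key that is not set.

```python
# Standard entries of the PDF document information dictionary.
STANDARD_INFO_KEYS = ("Title", "Author", "Subject", "Keywords",
                      "Creator", "Producer", "CreationDate", "ModDate")


def collect_info(pddoc, extra_keys=("PTEX.Fullbanner",)):
    """Return a dict of Info entries and the page count.

    pddoc is anything with GetInfo(key) and GetNumPages(), e.g. the
    object returned by AVDoc.GetPDDoc(). collect_info itself is a
    hypothetical helper, not an Acrobat function.
    """
    info = {}
    for key in STANDARD_INFO_KEYS + tuple(extra_keys):
        # Assumption: an unset key comes back as an empty string.
        info[key] = pddoc.GetInfo(key)
    info["NumPages"] = pddoc.GetNumPages()
    return info


# Usage (needs Acrobat Pro), with pddoc obtained as in the code above:
# for key, value in sorted(collect_info(pddoc).items()):
#     print("%s: %s" % (key, value))
```

Since the function only relies on two methods, it can be tried out on a stand-in object before pointing it at real files.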
Finally, let us try inserting pages. I have a rubric file that I want to insert at the end of the writing-exams-in-orgmode.pdf file used above. We will open both documents, insert the rubric, and save the result as a new file.
    import os
    from win32com.client.dynamic import Dispatch

    src = os.path.abspath('../../CMU/classes/06-625/rubric/rubric.pdf')
    src2 = os.path.abspath('writing-exams-in-orgmode.pdf')

    # It seems I need two of these
    avdoc1 = Dispatch("AcroExch.AVDoc")
    avdoc2 = Dispatch("AcroExch.AVDoc")

    # this is the rubric
    avdoc1.Open(src, src)
    pddoc1 = avdoc1.GetPDDoc()
    N1 = pddoc1.GetNumPages()

    # this is the other doc
    avdoc2.Open(src2, src2)
    pddoc2 = avdoc2.GetPDDoc()
    N2 = pddoc2.GetNumPages()

    # Insert rubric after last page of the other doc. pages start at 0
    pddoc2.InsertPages(N2 - 1, pddoc1, 0, N1, 0)

    # save as a new file. 1 means full save at absolute path provided.
    pddoc2.Save(1, os.path.abspath('./woohoo.pdf'))

    # close files.
    avdoc1.Close(-1)
    avdoc2.Close(-1)
Here is our result: woohoo.pdf. I went ahead and gave myself an A ;).
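The page arithmetic in InsertPages is easy to get wrong, since pages are numbered from 0 and the first argument is the page to insert after. Here is a small sketch that wraps the append step. The function append_pages is my own name, not an Acrobat call, and I am assuming InsertPages returns a false value on failure.

```python
def append_pages(target_pddoc, source_pddoc, copy_bookmarks=0):
    """Append every page of source_pddoc to the end of target_pddoc.

    Both arguments are PDDoc-like objects. Returns the expected page
    count of the target afterwards. append_pages is a hypothetical
    helper for illustration.
    """
    n_target = target_pddoc.GetNumPages()
    n_source = source_pddoc.GetNumPages()
    # Pages start at 0, so the last page of the target is n_target - 1.
    # Arguments: page to insert after, source doc, first source page,
    # number of pages to insert, whether to copy bookmarks.
    ok = target_pddoc.InsertPages(n_target - 1, source_pddoc,
                                  0, n_source, copy_bookmarks)
    # Assumption: a false return value signals failure.
    if not ok:
        raise RuntimeError("InsertPages failed")
    return n_target + n_source


# Usage (needs Acrobat Pro), with pddoc1 and pddoc2 as in the code above:
# total = append_pages(pddoc2, pddoc1)
# pddoc2.Save(1, os.path.abspath('./woohoo.pdf'))
```

With the 5-page exam from earlier, the first argument works out to 4, which is the last page of that document.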
Copyright (C) 2013 by John Kitchin. See the License for information about copying.