Automating Adobe Acrobat Pro with python

| categories: pdf, automation | tags:

Table of Contents

I have a need to automate Adobe Pro for a couple of applications:

  1. I could use Adobe Pro to automatically add rubric pages to assignments before grading them. The rubric has embedded javascript that stores the grade inside the pdf file.
  2. I could use Adobe Pro to extract information, e.g. grades, stored in a set of PDF files for analysis.

I came across this script to automate Adobe Pro using python and OLE automation. Two other useful references are:

  1. http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/iac_api_reference.pdf
  2. http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/iac_developer_guide.pdf

In this post, we look at some simple code to get data out of a pdf. We start with just opening a PDF file.

import os
from win32com.client.dynamic import Dispatch
src = os.path.abspath('writing-exams-in-orgmode.pdf')

app = Dispatch("AcroExch.AVDoc")

app.Open(src, src)

app.Close(-1)  # do not save on close

Opening and closing a file is not that useful. Here, we can get some information out of the file. The pdf we looked at above has a custom property PTEX.Fullbanner from pdflatex. We can extract it like this.

import os
from win32com.client.dynamic import Dispatch
src = os.path.abspath('writing-exams-in-orgmode.pdf')

app = Dispatch("AcroExch.AVDoc")

app.Open(src, src)
pddoc = app.GetPDDoc()
print pddoc.GetInfo('PTEX.Fullbanner')

print pddoc.GetNumPages()
app.Close(-1)  # do not save on close
This is MiKTeX-pdfTeX 2.9.4535 (1.40.13)
5

Finally, let us try inserting pages. I have a rubric file that I want to insert at the end of the

writing-exams-in-orgmode.pdf
above. We will open both documents, insert the rubric, and save the result as a new file.

import os
from win32com.client.dynamic import Dispatch
src = os.path.abspath('../../CMU/classes/06-625/rubric/rubric.pdf')
src2 = os.path.abspath('writing-exams-in-orgmode.pdf')

# It seems I need two of these
avdoc1 = Dispatch("AcroExch.AVDoc")
avdoc2 = Dispatch("AcroExch.AVDoc")

# this is the rubric
avdoc1.Open(src, src)
pddoc1 = avdoc1.GetPDDoc()
N1 = pddoc1.GetNumPages()

# this is the other doc
avdoc2.Open(src2, src2)
pddoc2 = avdoc2.GetPDDoc()
N2 = pddoc2.GetNumPages()

# Insert rubric after last page of the other doc. pages start at 0
pddoc2.InsertPages(N2 - 1, pddoc1, 0, N1, 0)

# save as a new file. 1 means full save at absolute path provided.
pddoc2.Save(1, os.path.abspath('./woohoo.pdf'))

# close files.
avdoc1.Close(-1)
avdoc2.Close(-1)

Here is our result: woohoo.pdf . I went ahead and gave myself an A ;).

1 Summary

It looks like I can replace the dependence of my box-course code on all the python-based pdf libraries (which are not fully functional, and do not work on all pdfs), and on pdftk, with this automation approach of Adobe Pro. It is unfortunate that it is not a free program, but i would expect it to work on all PDF files, and it provides features like combining PDFs with their javascript, that no other PDF package has. I have tried other PDF programs to combine the rubric and assignment page, but they all lose the javascript. With this method, I could keep a set of enriched rubric files for different types of assignments, and add them to assignments as part of the assessment process.

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter