<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <atom:link href="http://kitchingroup.cheme.cmu.edu/blog/feed/index.xml" rel="self" type="application/rss+xml" />
    <title>The Kitchin Research Group</title>
    <link>https://kitchingroup.cheme.cmu.edu/blog</link>
    <description>Chemical Engineering at Carnegie Mellon University</description>
    <pubDate>Sat, 01 Nov 2025 13:47:46 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    
    <item>
      <title>Automating Adobe Acrobat Pro with python</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/11/23/Automating-Adobe-Acrobat-Pro-with-python</link>
      <pubDate>Sat, 23 Nov 2013 10:34:47 EST</pubDate>
      <category><![CDATA[pdf]]></category>
      <category><![CDATA[automation]]></category>
      <guid isPermaLink="false">Fj9H1bNGBBznCSDxvIDlwY2PcgE=</guid>
      <description>Automating Adobe Acrobat Pro with python</description>
      <content:encoded><![CDATA[


&lt;div id="table-of-contents"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;div id="text-table-of-contents"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec-1"&gt;1. Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;
I have a need to automate Adobe Pro for a couple of applications:
&lt;/p&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;I could use Adobe Pro to automatically add rubric pages to assignments before grading them. The rubric has embedded javascript that stores the grade inside the pdf file.
&lt;/li&gt;
&lt;li&gt;I could use Adobe Pro to extract information, e.g. grades, stored in a set of PDF files for analysis.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
I came across this &lt;a href="http://win32com.goermezer.de/content/view/232/288/"&gt;script&lt;/a&gt; to automate Adobe Pro using python and OLE automation.  Two other useful references are:
&lt;/p&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;&lt;a href="http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/iac_api_reference.pdf"&gt;http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/iac_api_reference.pdf&lt;/a&gt; 
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/iac_developer_guide.pdf"&gt;http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/iac_developer_guide.pdf&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
In this post, we look at some simple code to get data out of a pdf.  We start with just opening a PDF file.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; os
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; win32com.client.dynamic &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; Dispatch
&lt;span style="color: #8b008b;"&gt;src&lt;/span&gt; = os.path.abspath(&lt;span style="color: #228b22;"&gt;'writing-exams-in-orgmode.pdf'&lt;/span&gt;)

&lt;span style="color: #8b008b;"&gt;app&lt;/span&gt; = Dispatch(&lt;span style="color: #228b22;"&gt;"AcroExch.AVDoc"&lt;/span&gt;)

app.Open(src, src)

app.Close(-1)  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;do not save on close&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Opening and closing a file is not that useful.  Here, we can get some information out of the file. The pdf we looked at above has a custom property &lt;code&gt;PTEX.Fullbanner&lt;/code&gt; from pdflatex. We can extract it like this.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; os
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; win32com.client.dynamic &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; Dispatch
&lt;span style="color: #8b008b;"&gt;src&lt;/span&gt; = os.path.abspath(&lt;span style="color: #228b22;"&gt;'writing-exams-in-orgmode.pdf'&lt;/span&gt;)

&lt;span style="color: #8b008b;"&gt;app&lt;/span&gt; = Dispatch(&lt;span style="color: #228b22;"&gt;"AcroExch.AVDoc"&lt;/span&gt;)

app.Open(src, src)
&lt;span style="color: #8b008b;"&gt;pddoc&lt;/span&gt; = app.GetPDDoc()
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; pddoc.GetInfo(&lt;span style="color: #228b22;"&gt;'PTEX.Fullbanner'&lt;/span&gt;)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; pddoc.GetNumPages()
app.Close(-1)  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;do not save on close&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
This is MiKTeX-pdfTeX 2.9.4535 (1.40.13)
5
&lt;/pre&gt;


&lt;p&gt;
Finally, let us try inserting pages. I have a &lt;a href="/media/2013-11-23-Automating-Adobe-Acrobat-Pro-with-python/rubric.pdf"&gt;rubric file &lt;/a&gt; that I want to insert at the end of the &lt;pre&gt;writing-exams-in-orgmode.pdf&lt;/pre&gt; above. We will open both documents, insert the rubric, and save the result as a new file.
&lt;/p&gt;


&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; os
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; win32com.client.dynamic &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; Dispatch
&lt;span style="color: #8b008b;"&gt;src&lt;/span&gt; = os.path.abspath(&lt;span style="color: #228b22;"&gt;'../../CMU/classes/06-625/rubric/rubric.pdf'&lt;/span&gt;)
&lt;span style="color: #8b008b;"&gt;src2&lt;/span&gt; = os.path.abspath(&lt;span style="color: #228b22;"&gt;'writing-exams-in-orgmode.pdf'&lt;/span&gt;)

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;It seems I need two of these&lt;/span&gt;
&lt;span style="color: #8b008b;"&gt;avdoc1&lt;/span&gt; = Dispatch(&lt;span style="color: #228b22;"&gt;"AcroExch.AVDoc"&lt;/span&gt;)
&lt;span style="color: #8b008b;"&gt;avdoc2&lt;/span&gt; = Dispatch(&lt;span style="color: #228b22;"&gt;"AcroExch.AVDoc"&lt;/span&gt;)

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;this is the rubric&lt;/span&gt;
avdoc1.Open(src, src)
&lt;span style="color: #8b008b;"&gt;pddoc1&lt;/span&gt; = avdoc1.GetPDDoc()
&lt;span style="color: #8b008b;"&gt;N1&lt;/span&gt; = pddoc1.GetNumPages()

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;this is the other doc&lt;/span&gt;
avdoc2.Open(src2, src2)
&lt;span style="color: #8b008b;"&gt;pddoc2&lt;/span&gt; = avdoc2.GetPDDoc()
&lt;span style="color: #8b008b;"&gt;N2&lt;/span&gt; = pddoc2.GetNumPages()

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Insert rubric after last page of the other doc. pages start at 0&lt;/span&gt;
pddoc2.InsertPages(N2 - 1, pddoc1, 0, N1, 0)

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;save as a new file. 1 means full save at absolute path provided.&lt;/span&gt;
pddoc2.Save(1, os.path.abspath(&lt;span style="color: #228b22;"&gt;'./woohoo.pdf'&lt;/span&gt;))

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;close files.&lt;/span&gt;
avdoc1.Close(-1)
avdoc2.Close(-1)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Here is our result: &lt;a href="/media/2013-11-23-Automating-Adobe-Acrobat-Pro-with-python/woohoo.pdf"&gt;woohoo.pdf&lt;/a&gt; . I went ahead and gave myself an A ;). 
&lt;/p&gt;

&lt;div id="outline-container-sec-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; Summary&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
It looks like I can replace the dependence of my box-course code on all the python-based pdf libraries (which are not fully functional, and do not work on all pdfs), and on pdftk, with this automation approach of Adobe Pro. It is unfortunate that it is not a free program, but i would expect it to work on all PDF files, and it provides features like combining PDFs with their javascript, that &lt;i&gt;no&lt;/i&gt; other PDF package has. I have tried other PDF programs to combine the rubric and assignment page, but they all lose the javascript. With this method, I could keep a set of enriched rubric files for different types of assignments, and add them to assignments as part of the assessment process. 
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;p&gt;&lt;p&gt;&lt;a href="/org/2013/11/23/Automating-Adobe-Acrobat-Pro-with-python.org"&gt;org-mode source&lt;/a&gt;&lt;p&gt;]]></content:encoded>
    </item>
  </channel>
</rss>
