Automatic downloading of a pdf from a journal site

Posted May 23, 2014 at 11:44 AM | categories: emacs, bibtex | tags:

Many bibliography software packages can automatically download a pdf for you. In this post, we explore how that can be done from emacs. The principle idea is that the pdf is obtained from a url, and that you can calculate the url by some method. Then you can download the file.

For example, consider this article in Phys. Rev. Lett. http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.99.016105 . There is a link to get the pdf for this article at http://journals.aps.org/prl/pdf/10.1103/PhysRevLett.99.016105 . It is not difficult to construct that url; you just replace /abstract/ with /pdf/.

The trick is how to get the first url. We have previously seen that we can construct a bibtex entry from a doi. In fact, we can use the doi to get the url above. If you visit https://doi.org/10.1103/PhysRevLett.99.016105 , you will be redirected to the url. It so happens that you can use code to get the redirected url. In emacs-lisp it is a little convoluted; you have to use url-retrieve, and provide a callback that sets the redirect. Here is an example. It appears you need to run this block twice to get the right variable setting. That seems like some kind of error in what I have set up, but I cannot figure out why.

(defvar *doi-utils-redirect*)

(defun callback (&optional status)
 (when status ;  is nil if there none
   (setq *doi-utils-redirect* (plist-get status :redirect))))

(url-retrieve
  "https://doi.org/10.1103/PhysRevLett.99.016105"
  'callback)

(print *doi-utils-redirect*)

"http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.99.016105"

From there, creating the pdf url is as simple as

(replace-regexp-in-string "prl/abstract" "prl/pdf" "http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.99.016105")

http://journals.aps.org/prl/pdf/10.1103/PhysRevLett.99.016105

And finally we download the file with

(url-copy-file "http://journals.aps.org/prl/pdf/10.1103/PhysRevLett.99.016105" "PhysRevLett.99.016105.pdf" nil)

So that is the gist of automating pdf downloads. You do these steps:

Get the DOI
Get the url that the DOI redirects to
Calculate the link to the pdf
Download the pdf

Each publisher does something a little bit different, so you have to work this out for each one. I have worked alot of them out at https://github.com/jkitchin/jmax/blob/master/user/doi-utils.el . That file is a work in progress, but it is a project I intend to use on a regular basis.

org-mode source

Org-mode version = 8.2.6

The Kitchin Research Group

Chemical Engineering at Carnegie Mellon University

Automatic downloading of a pdf from a journal site