Automatic downloading of a pdf from a journal site

| categories: emacs, bibtex | tags:

Many bibliography software packages can automatically download a pdf for you. In this post, we explore how that can be done from emacs. The principal idea is that the pdf is obtained from a url, and that you can calculate the url by some method. Then you can download the file.

For example, consider this article in Phys. Rev. Lett. http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.99.016105 . There is a link to get the pdf for this article at http://journals.aps.org/prl/pdf/10.1103/PhysRevLett.99.016105 . It is not difficult to construct that url; you just replace /abstract/ with /pdf/.

The trick is how to get the first url. We have previously seen that we can construct a bibtex entry from a doi. In fact, we can use the doi to get the url above. If you visit https://doi.org/10.1103/PhysRevLett.99.016105 , you will be redirected to that url. You can also get the redirected url with code. In emacs-lisp it is a little convoluted; you have to use url-retrieve, and provide a callback that stores the redirect. Here is an example. It appears you need to run this block twice to get the right variable setting. The likely reason is that url-retrieve is asynchronous: it returns immediately, so the print at the end runs before the callback has set the variable.

(defvar *doi-utils-redirect* nil
  "The url that the most recently retrieved doi redirected to.")

(defun callback (&optional status)
  "Store the :redirect entry of STATUS, if there is one."
  (when status ; status is nil if there is nothing to report
    (setq *doi-utils-redirect* (plist-get status :redirect))))

;; url-retrieve returns right away and runs the callback later,
;; when the response arrives.
(url-retrieve
  "https://doi.org/10.1103/PhysRevLett.99.016105"
  'callback)

(print *doi-utils-redirect*)
"http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.99.016105"

From there, creating the pdf url is as simple as

(replace-regexp-in-string "prl/abstract" "prl/pdf" "http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.99.016105")
http://journals.aps.org/prl/pdf/10.1103/PhysRevLett.99.016105

And finally we download the file with

(url-copy-file "http://journals.aps.org/prl/pdf/10.1103/PhysRevLett.99.016105" "PhysRevLett.99.016105.pdf" nil)
t

So that is the gist of automating pdf downloads. You do these steps:

  1. Get the DOI
  2. Get the url that the DOI redirects to
  3. Calculate the link to the pdf
  4. Download the pdf

Each publisher does something a little bit different, so you have to work this out for each one. I have worked a lot of them out at https://github.com/jkitchin/jmax/blob/master/user/doi-utils.el . That file is a work in progress, but it is a project I intend to use on a regular basis.
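
The per-publisher step could be organized as a small table of rewrite rules. What follows is only a sketch and not the code in doi-utils.el: the names my/pdf-url-rules and my/pdf-url are made up for illustration, and the only rule included is the Phys. Rev. Lett. one worked out above.

```elisp
;; Sketch only: `my/pdf-url-rules' and `my/pdf-url' are hypothetical names,
;; not functions from doi-utils.el.
(defvar my/pdf-url-rules
  '(("^https?://journals\\.aps\\.org/" "/abstract/" "/pdf/"))
  "List of (SITE-REGEXP FROM TO) rules for computing a pdf url.")

(defun my/pdf-url (url)
  "Return the pdf url for the article URL, or nil if no rule matches."
  (catch 'found
    (dolist (rule my/pdf-url-rules)
      ;; The first rule whose site regexp matches URL wins.
      (when (string-match (nth 0 rule) url)
        (throw 'found
               (replace-regexp-in-string
                (regexp-quote (nth 1 rule)) (nth 2 rule) url t t))))
    nil))
```

Calling my/pdf-url on the abstract url above gives the /pdf/ url, and a url from a site with no rule gives nil, which is the signal that a new rule needs to be worked out.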

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

Org-mode version = 8.2.6

Converting a doi to a bibtex entry

| categories: bibtex | tags:

Many citation management packages allow you to download a bibliography entry from a doi. I want to be able to do that in emacs. I found this page that shows it is possible to get metadata about a doi with an http request, and from that data, we can create a bibtex entry. So, here is the basic code for getting metadata about a doi. We specify that we want json code, and then use json.el to view the results.

We temporarily set a few url-* variables which affect the url-retrieve results. We also rely on url-http-end-of-headers, which marks the end of the headers that get returned, so we can use the remaining text as the data.

(require 'json)

(let ((url-request-method "GET")
      (url-mime-accept-string "application/citeproc+json")
      (json-object-type 'plist)
      (results))
  (setq results
	(with-current-buffer (url-retrieve-synchronously "https://doi.org/10.1016/S0022-0248(97)00279-0")
	  (json-read-from-string (buffer-substring url-http-end-of-headers (point-max))))))

(:volume 181 :indexed (:timestamp 1389218884442 :date-parts 2014 1 8) :publisher Elsevier BV :source CrossRef :URL https://doi.org/10.1016/S0022-0248(97)00279-0 :ISSN [0022-0248] :DOI 10.1016/s0022-0248(97)00279-0 :type journal-article :title Effect of growth conditions on formation of TiO2-II thin films in atomic layer deposition process :issue 3 :deposited (:timestamp 1386028800000 :date-parts 2013 12 3) :page 259-264 :reference-count nil :container-title Journal of Crystal Growth :author [(:given Jaan :family Aarik) (:given Aleks :family Aidla) (:given Väino :family Sammelselg) (:given Teet :family Uustare)] :prefix http://id.crossref.org/prefix/10.1016 :score 1.0 :issued (:date-parts 1997 11) :subject [Condensed Matter Physics Inorganic Chemistry Materials Chemistry] :subtitle [])

That data is now sufficient for us to consider constructing a bibtex entry. For an article, a prototypical entry looks like:

@Article{,
  author = 	 {},
  title = 	 {},
  journal = 	 {},
  year = 	 {},
  OPTkey = 	 {},
  OPTvolume = 	 {},
  OPTnumber = 	 {},
  OPTpages = 	 {},
  OPTmonth = 	 {},
  OPTnote = 	 {},
  OPTannote = 	 {}
}

Let us create a function that takes a doi and constructs a bibtex entry. I do not use all the metadata, so I just store the json data in the annote field. Maybe I should use another field for that, but annote seems ok since I do not use it for anything. I am going to use a template expansion function I developed earlier to make the bibtex entry template easier to write and read. Here is the code.

(require 'json)

(defun expand-template (s)
  "expand a template containing %{} with the eval of its contents"
  (replace-regexp-in-string "%{\\([^}]+\\)}"
                            (lambda (arg)
                              (let ((sexp (substring arg 2 -1)))
                                (format "%s" (eval (read sexp))))) s))

(defun doi-to-bibtex-article (doi)
 "Return a bibtex entry for DOI as a string."
 (interactive "sDOI: ")
 (let ((url-request-method "GET")
       (url-mime-accept-string "application/citeproc+json")
       (json-object-type 'plist)
       type
       results
       author
       title
       journal
       year
       volume
       issue
       pages
       month
       url json-data)

   (setq results
	 (with-current-buffer
	     (url-retrieve-synchronously
	      (concat "https://doi.org/" doi))
	 (json-read-from-string (buffer-substring url-http-end-of-headers (point-max))))
         type (plist-get results :type)
	 author (mapconcat (lambda (x) (concat (plist-get x :given) " " (plist-get x :family)))
		     (plist-get results :author) " and ")
	 title (plist-get results :title)
	 journal (plist-get results :container-title)
	 volume (plist-get results :volume)
	 issue (plist-get results :issue)
	 year (elt (elt (plist-get (plist-get results :issued) :date-parts) 0) 0)
	 month (elt (elt (plist-get (plist-get results :issued) :date-parts) 0) 1)
	 pages (plist-get results :page)
	 doi (plist-get results :DOI)
	 url (plist-get results :URL)
	 json-data (format "%s" results))

   (when (string= type "journal-article")

     (expand-template "@article{,
  author = 	 {%{author}},
  title = 	 {%{title}},
  journal = 	 {%{journal}},
  year = 	 {%{year}},
  volume = 	 {%{volume}},
  number = 	 {%{issue}},
  pages = 	 {%{pages}},
  doi =          {%{doi}},
  url =          {%{url}},
  month = 	 {%{month}},
  json = 	 {%{json-data}}
}"))))

(doi-to-bibtex-article "10.1016/s0022-0248(97)00279-0")
@article{,
  author = 	 {Jaan Aarik and Aleks Aidla and Väino Sammelselg and Teet Uustare},
  title = 	 {Effect of growth conditions on formation of TiO2-II thin films in atomic layer deposition process},
  journal = 	 {Journal of Crystal Growth},
  year = 	 {1997},
  volume = 	 {181},
  number = 	 {3},
  pages = 	 {259-264},
  doi =          {10.1016/s0022-0248(97)00279-0},
  url =          {https://doi.org/10.1016/s0022-0248(97)00279-0},
  month = 	 {11},
  json = 	 {(:volume 181 :indexed (:timestamp 1389218884442 :date-parts [[2014 1 8]]) :publisher Elsevier BV :source CrossRef :URL https://doi.org/10.1016/s0022-0248(97)00279-0 :ISSN [0022-0248] :DOI 10.1016/s0022-0248(97)00279-0 :type journal-article :title Effect of growth conditions on formation of TiO2-II thin films in atomic layer deposition process :issue 3 :deposited (:timestamp 1386028800000 :date-parts [[2013 12 3]]) :page 259-264 :reference-count nil :container-title Journal of Crystal Growth :author [(:given Jaan :family Aarik) (:given Aleks :family Aidla) (:given Väino :family Sammelselg) (:given Teet :family Uustare)] :prefix http://id.crossref.org/prefix/10.1016 :score 1.0 :issued (:date-parts [[1997 11]]) :subject [Condensed Matter Physics Inorganic Chemistry Materials Chemistry] :subtitle [])}
}

That looks excellent. Note there are some non-ascii characters in it, which would have to be fixed. Let us try it on an ASAP article.

(doi-to-bibtex-article "10.1021/ie403744u")
@article{,
  author = 	 {José A. Delgado and V. I. Águeda and M. A. Uguina and J. L. Sotelo and P. Brea and Carlos A. Grande},
  title = 	 { Adsorption and Diffusion of H 2 , CO, CH 4 , and CO 2 in BPL Activated Carbon and 13X Zeolite: Evaluation of Performance in Pressure Swing Adsorption Hydrogen Purification by Simulation },
  journal = 	 {Industrial & Engineering Chemistry Research},
  year = 	 {2014},
  volume = 	 {nil},
  number = 	 {nil},
  pages = 	 {140117091024005},
  doi =          {10.1021/ie403744u},
  url =          {https://doi.org/10.1021/ie403744u},
  month = 	 {1},
  json = 	 {(:indexed (:timestamp 1392935578089 :date-parts [[2014 2 20]]) :publisher American Chemical Society (ACS) :source CrossRef :URL https://doi.org/10.1021/ie403744u :ISSN [0888-5885 1520-5045] :DOI 10.1021/ie403744u :type journal-article :title  Adsorption and Diffusion of H 2 , CO, CH 4 , and CO 2 in BPL Activated Carbon and 13X Zeolite: Evaluation of Performance in Pressure Swing Adsorption Hydrogen Purification by Simulation  :deposited (:timestamp 1389916800000 :date-parts [[2014 1 17]]) :page 140117091024005 :reference-count nil :container-title Industrial & Engineering Chemistry Research :author [(:given José A. :family Delgado) (:given V. I. :family Águeda) (:given M. A. :family Uguina) (:given J. L. :family Sotelo) (:given P. :family Brea) (:given Carlos A. :family Grande)] :prefix http://id.crossref.org/prefix/10.1021 :score 1.0 :issued (:date-parts [[2014 1 17]]) :subject [Chemistry(all) Industrial and Manufacturing Engineering Chemical Engineering(all)] :subtitle [])}
}

You see that nil is put in for missing entries. That is probably ok. There is an & in the journal that needs to be cleaned up, but that is easily done with org-ref-clean-bibtex-entry. In summary, this looks like a very convenient way to get bibtex entries inside emacs. I should probably have the function insert that string into a buffer at point, but that is not difficult to do.
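
If the nil values are not wanted, the missing fields could be left out when the entry is built. Here is a minimal sketch of that idea. The helper my/bibtex-fields is hypothetical (it is not part of org-ref or of the function above), and it assumes the fields are collected in an alist first.

```elisp
;; Sketch only: `my/bibtex-fields' is a hypothetical helper, not part of org-ref.
(defun my/bibtex-fields (fields)
  "Format FIELDS, an alist of (NAME . VALUE), as bibtex field lines.
Fields whose VALUE is nil are left out."
  (mapconcat
   (lambda (field)
     (format "  %s = {%s}" (car field) (cdr field)))
   ;; Keep only the fields that actually have a value.
   (delq nil
         (mapcar (lambda (field) (and (cdr field) field)) fields))
   ",\n"))
```

For example, (my/bibtex-fields '(("year" . 2014) ("volume" . nil) ("pages" . "140117091024005"))) gives only the year and pages lines, so an ASAP article would simply have no volume or number field.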

org-shift hooks for ordering citations

| categories: bibtex, org-mode | tags:

I wrote a function that sorts citations by year, but there might be a reason to order them some other way. Here we develop a method to use shift-arrow keys to do the ordering. We will need to write a function that gets the citations in a link, gets the key under point, and then swaps it with a neighboring key depending on the arrow pressed. It is trivial to get the key under point (org-ref-get-bibtex-key-under-cursor), and we saw before it is easy to get the keys in a link. Let us examine swapping elements of a list here. This is an old algorithm: we store the first value, replace it with the second value, and then set the second value to the stored one.

(defun org-ref-swap-keys (i j keys)
 "swap the keys in a list with index i and j"
 (let ((tempi (nth i keys)))
   (setf (nth i keys) (nth j keys))
   (setf (nth j keys) tempi))
  keys)

(org-ref-swap-keys 2 3 (list 1 2 3 4)) ; a fresh list, since the swap modifies it in place
1 2 4 3
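
The swap above does not check whether the indices are inside the list, which matters when the key at point is the first or last one in a link. A possible variant, sketched here under the assumption that leaving the list unchanged at the ends is the desired behavior, looks like this. The name my/swap-neighbor is hypothetical and not part of org-ref.

```elisp
;; Sketch only: `my/swap-neighbor' is a hypothetical helper, not part of org-ref.
(defun my/swap-neighbor (i direction keys)
  "Return a copy of KEYS with element I swapped with its neighbor.
DIRECTION is +1 for the right neighbor and -1 for the left one.
If the neighbor would fall outside the list, return the copy unchanged."
  (let* ((keys (copy-sequence keys))
         (j (+ i direction)))
    ;; Only swap when both positions exist in the list.
    (when (and (>= i 0) (>= j 0) (< i (length keys)) (< j (length keys)))
      (let ((tmp (nth i keys)))
        (setcar (nthcdr i keys) (nth j keys))
        (setcar (nthcdr j keys) tmp)))
    keys))
```

Because it works on a copy, the original list is never modified, and trying to move the last key to the right just returns the keys in the same order.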

So, we need to get the keys in the link at point, the key at point, the index of the key at point, and then we can swap them, and reconstruct the link. Here is the function that does this, and that adds the hooks.

(defun org-ref-swap-citation-link (direction)
 "Move the citation at point in DIRECTION: +1 is to the right, -1 to the left."
 (interactive)
 (let* ((object (org-element-context))	 
        (type (org-element-property :type object))
	(begin (org-element-property :begin object))
	(end (org-element-property :end object))
	(link-string (org-element-property :path object))
        (key (org-ref-get-bibtex-key-under-cursor))
	(keys (org-ref-split-and-strip-string link-string))
        (i (index key keys)) point) ;; defined in org-ref
   (if (> direction 0) ;; shift right
     (org-ref-swap-keys i (+ i 1) keys)
     (org-ref-swap-keys i (- i 1) keys))	
  (setq keys (mapconcat 'identity keys ","))
  ;; and replace the link with the sorted keys
  (cl--set-buffer-substring begin end (concat type ":" keys))
  ;; now go forward to key so we can move with the key
  (re-search-forward key) 
  (goto-char (match-beginning 0))))

(add-hook 'org-shiftright-hook (lambda () (org-ref-swap-citation-link 1)))
(add-hook 'org-shiftleft-hook (lambda () (org-ref-swap-citation-link -1)))
lambda nil (org-ref-swap-citation-link -1)

kanan-2008-in-situ,kanan-2009-cobal,lutterman-2009-self-healin,mcalpin-2010-epr-eviden,liu-2014-spect-studies!

That is it! Wow, not hard at all. Check out this video of the code in action: http://screencast.com/t/YmgA0fnZ1Ogl

Sorting citation links by year

| categories: bibtex | tags:

When there are several citations grouped together, I like them sorted by year. For example, I do not like this liu-2014-spect-studies,mcalpin-2010-epr-eviden,kanan-2009-cobal,lutterman-2009-self-healin,kanan-2008-in-situ. I prefer kanan-2008-in-situ,kanan-2009-cobal,lutterman-2009-self-healin,mcalpin-2010-epr-eviden,liu-2014-spect-studies. It is just a preference, but it seems appropriate to cite things in chronological order.

It is actually a little tedious to sort this by hand though. Hence, today we examine some tools to automate the sorting. The idea is to make a function that will get the keys, sort them by year, and then replace the link with the sorted text.

Let us try some sorting. We will construct a set of cons cells with a year and key, sort that list by year, and then concatenate the keys. Here is an example of the sorting. The years will come as strings from the bibtex file.

(setq data '(("2014" . "key1") ("2012" . "key2")("2016" . "key3")))
(setq data 
	(cl-sort data (lambda (x y) (< (string-to-number (car x)) (string-to-number (car y))))))
(mapconcat (lambda (x) (cdr x)) data ",")
key2,key1,key3
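
One detail the example does not show is what happens when two citations have the same year. The sketch below is an assumption about what might be wanted there, not what org-ref does: it uses cl-stable-sort with a :key function so that same-year entries keep their original order. The name my/sort-keys-by-year is hypothetical.

```elisp
(require 'cl-lib)

;; Sketch only: `my/sort-keys-by-year' is a hypothetical helper.
(defun my/sort-keys-by-year (data)
  "Return the keys in DATA, a list of (YEAR . KEY) cells, sorted by year.
YEAR is a string.  Entries with the same year keep their original order."
  (mapconcat #'cdr
             ;; Sort a copy so the caller's list is left alone.
             (cl-stable-sort (copy-sequence data) #'<
                             :key (lambda (x) (string-to-number (car x))))
             ","))
```

With the data above this gives the same key2,key1,key3 result, and two entries from 2009 would stay in the order they were cited.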

That is easy enough. Now, a function to get the year, and then the function to sort a link.

(defun org-ref-get-citation-year (key)
  "get the year of an entry with key"
  (interactive)
  (let* ((results (org-ref-get-bibtex-key-and-file key))
	 (bibfile (cdr results))
	 (cb (current-buffer)))
    (message "---------%s %s" key bibfile)
    (set-buffer (find-file-noselect bibfile))
    (bibtex-search-entry key nil 0)
    (prog1 (reftex-get-bib-field "year" (bibtex-parse-entry t))
      (set-buffer cb))))

(defun org-ref-sort-citation-link ()
 "replace link at point with sorted link by year"
 (interactive)
 (let* ((object (org-element-context))	 
        (type (org-element-property :type object))
	(begin (org-element-property :begin object))
	(end (org-element-property :end object))
	(link-string (org-element-property :path object))
	keys years data)
  (setq keys (org-ref-split-and-strip-string link-string))
  (setq years (mapcar 'org-ref-get-citation-year keys)) 
  (setq data (cl-mapcar (lambda (a b) `(,a . ,b)) years keys))
  (setq data (cl-sort data (lambda (x y) (< (string-to-number (car x)) (string-to-number (car y))))))
  ;; now get the keys separated by commas
  (setq keys (mapconcat (lambda (x) (cdr x)) data ","))
  ;; and replace the link with the sorted keys
  (cl--set-buffer-substring begin end (concat type ":" keys))
))

Now, you put your cursor on a link, run M-x org-ref-sort-citation-link, and the magic happens kanan-2008-in-situ,kanan-2009-cobal,lutterman-2009-self-healin,mcalpin-2010-epr-eviden,liu-2014-spect-studies! It would also be nice to have some arrow commands so you could do something like manually reorder them with S-arrow or something like in the calendar, but that will be another day.

Creating bibliographies in other formats with org-ref

| categories: bibtex, org-mode | tags:

org-ref automatically generates bibliographies in LaTeX export, and it does a reasonable job automatically generating HTML bibliographies (ox-bibtex probably does this better, but it relies on an external program, whereas this approach is all elisp). Here we illustrate how to generate other formats, e.g. plain text, or org-mode formatted.

org-ref provides a convenient function that generates a bibliography entry for a key formatted according to the variable org-ref-bibliography-entry-format. This variable is a string that uses the reftex percent escapes to create an entry. The default is set up for an HTML entry like this:

  "%a, %t, <i>%j</i>, <b>%v(%n)</b>, %p (%y). <a href=\"%U\">link</a>. <a href=\"https://doi.org/%D\">doi</a>."
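
To get a feel for how a percent-escape template like this turns into an entry, here is a toy expansion. It is only a sketch: it uses format-spec, which ships with Emacs, rather than the reftex machinery the post refers to, and my/format-entry and the field values are made up for illustration.

```elisp
(require 'format-spec)

;; Sketch only: `my/format-entry' is a hypothetical helper.  It uses
;; `format-spec' and is not how org-ref or reftex expand the escapes.
(defun my/format-entry (template fields)
  "Expand the percent escapes in TEMPLATE using FIELDS.
FIELDS is an alist mapping an escape character to its string value."
  (format-spec template fields))
```

For example, (my/format-entry "%a, %t, /%j/ (%y)." '((?a . "Alesi & Kitchin") (?t . "Title") (?j . "IECR") (?y . "2012"))) fills in the author, title, journal and year. Every escape used in the template needs a value in the alist.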

We can redefine it temporarily to get other formats. Here is an example of getting an org-formatted entry with some italics and bold text.

(let ((org-ref-bibliography-entry-format "%a, %t, /%j/, *%v(%n)*, %p (%y). [[%U][link]]. [[https://doi.org/%D][doi]]."))
(org-ref-get-bibtex-entry-citation "andriotis-2014-infor"))

"Andriotis, Mpourmpakis, , Broderick, Rajan, Datta, Somnath, Sunkara \& Menon, Informatics guided discovery of surface structure-chemistry relationships in catalytic nanoparticles, The Journal of Chemical Physics, 140(9), 094705 (2014). link . doi .

Now, we put some citations of various types in for water splitting mccrory-2013-bench-heter, CO2 capture alesi-2012-evaluat-primar, and microfluidic devices voicu-2014-microf-studies. We will convert these links to a bibliography shortly.

Next, we generate an org-formatted bibliography. We will create a bracketed label at the beginning of the entry, and the org-format after that. This is a functional enough bibliography to be useful I think, and it illustrates the ideas. We will do some light transforming to replace escaped & with regular & in the bibliography.

;; temporarily redefine the format
(let ((org-ref-bibliography-entry-format "%a, %t, /%j/, *%v(%n)*, %p (%y). [[%U][link]]. [[https://doi.org/%D][doi]]."))

  (mapconcat
   (lambda (key)
     (format "[%s] %s" key
	     (replace-regexp-in-string
	      "\\\\&"
	      "&" (org-ref-get-bibtex-entry-citation key))))
   (org-ref-get-bibtex-keys) "\n\n"))

[alesi-2012-evaluat-primar] Alesi & Kitchin, Evaluation of a Primary Amine-Functionalized Ion-Exchange Resin for \ce{CO_2} Capture, Industrial & Engineering Chemistry Research, 51(19), 6907-6915 (2012). link . doi .

[mccrory-2013-bench-heter] McCrory, Jung, Peters, Jonas & Jaramillo, Benchmarking Heterogeneous Electrocatalysts for the Oxygen Evolution Reaction, J. Am. Chem. Soc., 135(45), 16977–16987 (2013). link . doi .

[voicu-2014-microf-studies] Voicu, Abolhasani, Choueiri, Rachelle, Lestari, Seiler, , Menard, Greener, Guenther, Axel, Stephan & Kumacheva, Microfluidic Studies of \ce{CO_2} Sequestration by Frustrated {L}ewis Pairs, Journal of the American Chemical Society, 0(0), null (2014). [[][link]]. doi .

You can see some minor issues with the formatting, e.g. sometimes the link is empty, if there is no url in the bibtex entry. There is no easy way to fix that. The 0 and null values in the last entry are because that is an ASAP article, and that is what is in the bibtex entry. I do not try to expand the latex code, and do not plan to do that. I do not know why there appears to be a blank author in the last entry, or why the author full names are not used. Those are reftex issues and low priority to fix for me. They do not exist in the LaTeX export. The main point here is to get a reasonably useful bibliography that you can adapt as you want.
