Using org-ref to keep your bibtex files in order

| categories: emacs, bibtex | tags:

Maintaining an accurate, useful bibliography of references is critical for scientific writing. It is also not trivial. While it is easy to download and copy bibliographic entries to your database, these entries are often incomplete, not consistently formatted, and can contain invalid characters. org-ref provides several utility functions to help with this.

1 "Cleaning" a bibtex entry

Consider this bibtex entry from http://pubs.acs.org/action/showCitFormats?doi=10.1021%2Fie500588j .

@article{doi:10.1021/ie500588j,
author = {Okada, Tomohiko and Ozono, Shoya and Okamoto, Masami and Takeda, Yohei and Minamisawa, Hikari M. and Haeiwa, Tetsuji and Sakai, Toshio and Mishima, Shozi},
title = {Magnetic Rattle-Type Core–Shell Particles Containing Iron Compounds with Acid Tolerance by Dense Silica},
journal = {Industrial & Engineering Chemistry Research},
volume = {0},
number = {0},
pages = {null},
year = {0},
doi = {10.1021/ie500588j},

URL = {http://pubs.acs.org/doi/abs/10.1021/ie500588j},
eprint = {http://pubs.acs.org/doi/pdf/10.1021/ie500588j}
}

On the surface it looks fine, but there are the following issues with it:

  1. The bibtex key is hard to remember. I like systematically named keys.
  2. There is a bare & in the journal title, which is not legal in LaTeX.
  3. There is no valid year: the entry has year = {0}, even though it is a 2014 article. The pages, volume, and number fields are also problematic, but this is an ASAP article, and the reference does not have those values yet.
  4. It is hard to see, but the dash between core and shell is a non-ASCII character (an en dash), which can cause problems in LaTeX.
  5. The entry is not very nicely aligned or indented.

You can fix these problems by putting your cursor on the bibtex entry and typing M-x org-ref-clean-bibtex-entry. This replaces the key with one in a standard form, aligns and indents the entry, escapes the & so the syntax is legal, prompts you for a year, and shows you the non-ASCII characters so you can replace them. The resulting, nicely formatted entry is shown below.

@article{okada-2014-magnet-rattl,
  author =	 {Okada, Tomohiko and Ozono, Shoya and Okamoto, Masami
                  and Takeda, Yohei and Minamisawa, Hikari M. and
                  Haeiwa, Tetsuji and Sakai, Toshio and Mishima,
                  Shozi},
  title =	 {Magnetic Rattle-Type Core-Shell Particles Containing
                  Iron Compounds with Acid Tolerance by Dense Silica},
  journal =	 {Industrial \& Engineering Chemistry Research},
  volume =	 0,
  pages =	 {null},
  year =	 2014,
  doi =		 {10.1021/ie500588j},
  number =	 0,
  url =		 {http://pubs.acs.org/doi/abs/10.1021/ie500588j},
  eprint =	 {http://pubs.acs.org/doi/pdf/10.1021/ie500588j},
}

The key formatting comes from these definitions:

;; variables that control bibtex key format for auto-generation
;; I want firstauthor-year-title-words
;; this usually makes a legitimate filename to store pdfs under.
(setq bibtex-autokey-year-length 4
      bibtex-autokey-name-year-separator "-"
      bibtex-autokey-year-title-separator "-"
      bibtex-autokey-titleword-separator "-"
      bibtex-autokey-titlewords 2
      bibtex-autokey-titlewords-stretch 1
      bibtex-autokey-titleword-length 5)
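To see how these settings turn an entry into a key, here is a minimal sketch that applies them to the Okada entry in a temporary buffer and calls bibtex-generate-autokey from the bibtex.el that ships with Emacs. It assumes every other bibtex-autokey-* option is left at its default; my-autokey-for-entry and my-okada-entry are hypothetical names used only for illustration.

```elisp
(require 'bibtex)

(defconst my-okada-entry
  "@article{doi:10.1021/ie500588j,
  author = {Okada, Tomohiko and Ozono, Shoya and Okamoto, Masami},
  title = {Magnetic Rattle-Type Core-Shell Particles Containing Iron
           Compounds with Acid Tolerance by Dense Silica},
  year = 2014,
}"
  "Hypothetical sample: a shortened Okada entry with the year filled in.")

(defun my-autokey-for-entry (entry)
  "Return the key bibtex.el generates for the BibTeX string ENTRY.
Hypothetical helper that let-binds the key-format settings shown above."
  (let ((bibtex-autokey-year-length 4)
        (bibtex-autokey-name-year-separator "-")
        (bibtex-autokey-year-title-separator "-")
        (bibtex-autokey-titleword-separator "-")
        (bibtex-autokey-titlewords 2)
        (bibtex-autokey-titlewords-stretch 1)
        (bibtex-autokey-titleword-length 5))
    (with-temp-buffer
      (bibtex-mode)
      (insert entry)
      ;; move point inside the entry so bibtex.el reads its fields
      (goto-char (point-min))
      (forward-line 1)
      (bibtex-generate-autokey))))

(my-autokey-for-entry my-okada-entry) ; => "okada-2014-magnet-rattl"
```

If I read bibtex.el correctly, the hyphen splits "Rattle-Type" into separate words, so the two title words used are Magnetic and Rattle, and a positive titleword-length abbreviates each word but continues past a vowel until it reaches a consonant, which is why the key ends in magnet-rattl.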

You should develop the discipline to clean each entry as you add it, and before you cite it. It is a pain to change a key later and then find and change every place you already used it. Once you have a systematic key, download the pdf for the article and save it in your pdf directory under that key name. Set the variable org-ref-pdf-directory to this directory, and later, when you click on citations, you will be able to open the pdf easily.
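As a sketch of how the systematic key ties an entry to its PDF, the snippet below sets org-ref-pdf-directory (the path is a placeholder modeled on the bibliography directory used elsewhere on this page) and builds the expected file name KEY.pdf inside it. my-pdf-file-for-key is a hypothetical helper, not part of org-ref.

```elisp
;; Placeholder path: point this at the directory where you keep your PDFs.
(setq org-ref-pdf-directory "~/Dropbox/bibliography/bibtex-pdfs/")

(defun my-pdf-file-for-key (key)
  "Return the absolute path where the PDF for bibtex KEY is expected.
Hypothetical helper: assumes PDFs are saved as KEY.pdf in
`org-ref-pdf-directory'."
  (expand-file-name (concat key ".pdf") org-ref-pdf-directory))

(my-pdf-file-for-key "okada-2014-magnet-rattl")
;; returns an absolute path ending in bibtex-pdfs/okada-2014-magnet-rattl.pdf
```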

2 Validating your bibliography

Running elisp:bibtex-validate will check your bibliography for valid syntax. This is a standard bibtex-mode command.

Try it out: org-bib.bib

3 Sorting your bibtex file

It is a good idea to keep your bibtex file sorted. This makes duplicate entries easier to spot and things easier to find. I usually add entries to the top of the file and then clean them. Then I run elisp:bibtex-sort-buffer, which sorts the entries for you. This is also a bibtex-mode command.

Try it out: org-bib.bib
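To see what bibtex-sort-buffer does without touching a real file, here is a small sketch that sorts two made-up entries in a temporary buffer and then lists their keys with bibtex-map-entries. It let-binds bibtex-maintain-sorted-entries to nil, which asks bibtex.el for plain sorting by key; the sample entries and the helper name my-sorted-keys are hypothetical.

```elisp
(require 'bibtex)

(defun my-sorted-keys (bibtex-string)
  "Sort the entries in BIBTEX-STRING and return their keys in buffer order.
Hypothetical helper for illustration."
  (with-temp-buffer
    (bibtex-mode)
    (insert bibtex-string)
    ;; nil means plain sorting by key
    (let ((bibtex-maintain-sorted-entries nil))
      (bibtex-sort-buffer))
    (let ((keys '()))
      ;; bibtex-map-entries calls the function with KEY, BEG and END
      (bibtex-map-entries (lambda (key _beg _end) (push key keys)))
      (nreverse keys))))

(my-sorted-keys
 "@article{zeta-2000-last,
  author = {Zeta, Z.},
  title = {Last},
  year = 2000,
}

@article{alpha-1999-first,
  author = {Alpha, A.},
  title = {First},
  year = 1999,
}
")
;; => ("alpha-1999-first" "zeta-2000-last")
```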

4 Make a full bibliography pdf

A good way to check your bibliography for duplicates, spelling errors, and invalid formats is to make a pdf containing all the entries. Open your bibtex file and run elisp:org-ref-build-full-bibliography. If all goes well, you will get a pdf of your bibliography that you can check for accuracy. If there are errors, you will have to fix them until the pdf is generated.

Try it out: org-bib.bib

5 Finding bad citation links

Sometimes you will get bad citation links in your document. Maybe there is no corresponding entry, maybe you typed the wrong key, or maybe you changed the key. Either way, you need to find and fix them. Run the command elisp:org-ref-find-bad-citations to find them. For example, this citation points to a key that is not in the bibliography:

cite:test

6 Extracting citation entries

For your own work, you will usually cite from your default bibliography. Eventually, you will need to extract the cited entries so you can send them to someone. The command elisp:org-ref-extract-bibtex-entries will do that for you. For example, this post cites cite:calle-vallejo-2010-trend-stabil, and its extracted entry appears in the Bibtex entries section at the end.
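The extraction step itself can be sketched in a few lines: given a list of cited keys, walk the entries of a bibtex buffer with bibtex-map-entries and copy out each matching entry. This only illustrates the idea and is not org-ref's implementation of org-ref-extract-bibtex-entries; my-extract-bibtex-entries is a hypothetical helper, and collecting the keys from the org file is left out.

```elisp
(require 'bibtex)

(defun my-extract-bibtex-entries (keys)
  "Return the text of each entry in the current bibtex buffer whose key is in KEYS.
KEYS is a list of strings compared exactly.  Entries are returned in buffer
order, separated by blank lines.  Hypothetical helper for illustration."
  (let ((entries '()))
    (bibtex-map-entries
     (lambda (key beg end)
       (when (member key keys)
         (push (buffer-substring-no-properties beg end) entries))))
    (mapconcat #'identity (nreverse entries) "\n\n")))

;; Usage, with your bibliography file as the current buffer:
;; (my-extract-bibtex-entries '("calle-vallejo-2010-trend-stabil"))
```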

7 Summary

You can see a screencast of this post here: http://screencast.com/t/yZCOdO6kJ

8 References

9 Bibtex entries

#+BEGIN_SRC text :tangle extract-bib7108tYg.bib
@article{calle-vallejo-2010-trend-stabil,
  author =   {Calle-Vallejo, F. and Martinez, J. I. and Garcia-Lastra, J. M.
              and Mogensen, M. and Rossmeisl, J.},
  title =    {Trends in Stability of Perovskite Oxides},
  journal =  "Angewandte Chemie-International Edition",
  volume =   49,
  number =   42,
  pages =    {7699-7701},
  year =     2010,
  doi =      {10.1002/anie.201002301},
  keyword =  {density functional calculations heats of formation perovskites
              thermochemistry transition-metals catalysts ferroelectricity},
}
#+END_SRC

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

Org-mode version = 8.2.6

Finding missing citation entries in an org-file

| categories: org-mode, bibtex | tags:

Today we consider how to find citations in a document that have no corresponding entries in a bibtex file. There are a couple of pieces to this which we work out in stages below. First, we specify the bibtex file using a bibliography link defined in jorg-bib.el.

jorg-bib provides a function that gives us the relevant bibliography files found in this file.

(cite-find-bibliography)
bib1.bib bib2.bib

We can get a list of keys in these files

(let ((bibtex-files (cite-find-bibliography)))
  (bibtex-global-key-alist))
(adams-1993-orien-imagin . t) (aarik-1997-effec-tio2 . t) (aruga-1985-struc-iron . t)

Now, here are some citations that we want to include in this document.

cite:aruga-1985-struc-iron,aarik-1997-effec-tio2

Here is a citation that is not in the bibtex file

cite:kitchin-2016-nobel-lecture

To find out if any of these are missing, we need a list of the citation keys in this document. We first get all the content from the cite links. We parse the buffer, and for each cite link, we get the path of the link, which contains our keys.

(let ((parsetree (org-element-parse-buffer)))
  (org-element-map parsetree 'link
    (lambda (link)       
      (let ((type (nth 0 link))
            (plist (nth 1 link))
            (content (nth 2 link)))
	(when (equal (plist-get plist ':type) "cite")
	  (plist-get plist ':path))))))
aruga-1985-struc-iron,aarik-1997-effec-tio2 kitchin-2016-nobel-lecture

That is almost what we need, but we need to separate the keys that are joined by commas. That function already exists in jorg-bib as cite-split-keys. We need a slight variation to get a single list of all the keys, since cite-split-keys returns a separate list of keys for each link. Here is one approach.

(let ((parsetree (org-element-parse-buffer))
      (results '()))
  (org-element-map parsetree 'link
    (lambda (link)
      (let ((plist (nth 1 link)))
        (when (equal (plist-get plist ':type) "cite")
          (setq results (append results (cite-split-keys (plist-get plist ':path))))))))
  results)
aruga-1985-struc-iron aarik-1997-effec-tio2 kitchin-2016-nobel-lecture
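cite-split-keys comes from jorg-bib. To try these blocks without jorg-bib loaded, a minimal stand-in might look like the following. my-cite-split-keys is hypothetical; it simply splits on commas and trims whitespace, which may not match every detail of jorg-bib's version.

```elisp
(defun my-cite-split-keys (path)
  "Split the PATH of a cite link, e.g. \"key1,key2\", into a list of keys.
Hypothetical stand-in for jorg-bib's `cite-split-keys'."
  ;; OMIT-NULLS (t) drops empty strings left by stray commas;
  ;; TRIM removes whitespace around each key
  (split-string path "," t "[ \t\n]+"))

(my-cite-split-keys "aruga-1985-struc-iron,aarik-1997-effec-tio2")
;; => ("aruga-1985-struc-iron" "aarik-1997-effec-tio2")
```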

Ok, now we just need to check each key in that list against the keys in the bibtex files and report any that are missing. We use an index function below to tell us whether a string is in a list of strings. We then use the strange remove-if-not function, which requires something like triple negative logic, to get the list of keys that are not in the bibtex files.

(require 'cl-lib)

(defun index (key list)
  "Return the index of string KEY in LIST of strings, or nil if it is absent."
  ;; `string=' gives an exact match; `string-match' would treat KEY as a
  ;; regexp and also match longer keys that merely contain KEY.
  (cl-position key list :test #'string=))

;; generate the list of bibtex-keys and cited keys
(let* ((bibtex-files (cite-find-bibliography))
       (bibtex-keys (mapcar #'car (bibtex-global-key-alist)))
       (parsetree (org-element-parse-buffer))
       (cited-keys '()))
  (org-element-map parsetree 'link
    (lambda (link)
      (when (equal (org-element-property :type link) "cite")
        (setq cited-keys
              (append cited-keys
                      (cite-split-keys (org-element-property :path link)))))))
  ;; keep the cited keys that have no index in bibtex-keys
  (princ (cl-remove-if-not (lambda (arg) (not (index arg bibtex-keys)))
                           cited-keys)))
(kitchin-2016-nobel-lecture)
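The filtering step is easier to follow without the triple negative. As a sketch, assuming both the cited keys and the bibtex keys are already plain lists of strings (as in the block above), cl-remove-if with member keeps exactly the cited keys that are missing from the bibliography. my-missing-keys is a hypothetical name, and the sample keys are the ones from this post.

```elisp
(require 'cl-lib)

(defun my-missing-keys (cited-keys bibtex-keys)
  "Return the elements of CITED-KEYS that do not appear in BIBTEX-KEYS.
Both arguments are lists of strings; the order of CITED-KEYS is kept."
  ;; `member' compares with `equal', so only exact key matches count
  (cl-remove-if (lambda (key) (member key bibtex-keys)) cited-keys))

(my-missing-keys '("aruga-1985-struc-iron" "aarik-1997-effec-tio2"
                   "kitchin-2016-nobel-lecture")
                 '("adams-1993-orien-imagin" "aarik-1997-effec-tio2"
                   "aruga-1985-struc-iron"))
;; => ("kitchin-2016-nobel-lecture")
```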

The only improvement from here would be to get clickable links that take you to each bad citation. Let us take a different approach and build one clickable org link per missing key.

(require 'cl-lib)

(defun index (key list)
  "Return the index of string KEY in LIST of strings, or nil if it is absent."
  ;; exact comparison, so KEY is not treated as a regexp or a substring
  (cl-position key list :test #'string=))

;; generate the list of bibtex-keys, then an org link for each bad citation
(let* ((bibtex-files (cite-find-bibliography))
       (bibtex-keys (mapcar #'car (bibtex-global-key-alist)))
       (bad-citations '()))
  (org-element-map (org-element-parse-buffer) 'link
    (lambda (link)
      (when (equal (org-element-property :type link) "cite")
        (dolist (key (cite-split-keys (org-element-property :path link)))
          (unless (index key bibtex-keys)
            ;; an elisp: link that jumps to the start of the offending cite link
            (setq bad-citations
                  (append bad-citations
                          (list (format "%s [[elisp:(progn (find-file \"%s\")(goto-char %s))][not found here]]\n"
                                        key (buffer-file-name)
                                        (org-element-property :begin link))))))))))
  (mapconcat #'identity bad-citations ""))

kitchin-2016-nobel-lecture

elisp:(progn (find-file "/home-research/jkitchin/Dropbox/blogofile-jkitchin.github.com/_blog/blog.org")(goto-char 1052))

That is likely to come in handy. I have put a variation of this code in jorg-bib, in the function called jorg-bib-find-bad-citations.

A popup menu for citation links in org-mode

| categories: org-mode, bibtex | tags:

I have been exploring ways to get more information out of links in org-mode. I have considered popups and right-clicking. Here I show how to get a popup menu on a citation link. The idea is that clicking or opening the citation link should give you a menu. The menu should give you some context, e.g. whether the bibtex key even exists. If it does, you should be able to get a quick view of the citation in the minibuffer. You should be able to open the entry in the bibtex file from the menu. If you have a pdf of the reference, you should have an option to open it. You should also be able to open the URL associated with the entry from the menu.

Here is the function. We use https://github.com/auto-complete/popup-el and some code from https://github.com/jkitchin/jmax/blob/master/jorg-bib.el.

(org-add-link-type
 "cite"
 ;; this function is run when you click on the link
 (lambda (link-string) 
   (let* ((menu-choice)
         ;; this is in jorg-bib.el
         (results (get-bibtex-key-and-file))
	 (key (car results))
	 (cb (current-buffer))
         (pdf-file (format (concat jorg-bib-pdf-directory "%s.pdf") key))
         (bibfile (cdr results)))
     (setq menu-choice
	   (popup-menu* 
	    (list (popup-make-item (if 
				       (progn
					 (let ((cb (current-buffer)) result)					
					   (find-file bibfile)
					   (setq result (bibtex-search-entry key))
					   (switch-to-buffer cb)
					   result))
				       "Simple citation"
				     "No key found")  :value "cite")
		  (popup-make-item (if
				       (progn
					 (let ((cb (current-buffer)) result)					  
					   (find-file bibfile)
					   (setq result (bibtex-search-entry key))
					   (switch-to-buffer cb)
					   result))
				       (format "Open %s in %s" key bibfile)
				     "No key found") :value "bib")
		  (popup-make-item 
                   ;; check if the pdf exists. jorg-bib-pdf-directory is a user-defined directory;
                   ;; pdfs are stored by bibtex key in that directory
		   (if (file-exists-p pdf-file)
		       (format "Open PDF for %s" key)
		     "No pdf found") :value "pdf")
		  (popup-make-item "Open URL" :value "web")
		  (popup-make-item "Open Notes" :value "notes")
		  )))

     (cond
      ;; goto entry in bibfile
      ((string= menu-choice "bib")       
       (find-file bibfile)
       (bibtex-search-entry key))

      ;; goto entry and try opening the url
      ((string= menu-choice "web")   
       (let ((cb (current-buffer)))
	 (save-excursion
	   (find-file bibfile)
	   (bibtex-search-entry key)
	   (bibtex-url))
	 (switch-to-buffer cb)))
       
      ;; goto entry and open notes, create notes entry if there is none
      ((string= menu-choice "notes")   
       (find-file bibfile)
       (bibtex-search-entry key)       
       (jorg-bib-open-bibtex-notes))

     ;; open the pdf file if it exists
     ((string= menu-choice "pdf")
      (when (file-exists-p pdf-file)
	  (org-open-file pdf-file)))

     ;; print citation to minibuffer
     ((string= menu-choice "cite")
      (let ((cb (current-buffer)))	
	(message "%s" (save-excursion (find-file bibfile)
				      (bibtex-search-entry key)  
				      (jorg-bib-citation)))
	(switch-to-buffer cb))))))
 ;; formatting
 (lambda (keyword desc format)
   (cond
    ;; the link path arrives as KEYWORD
    ((eq format 'html) (format "(<cite>%s</cite>)" keyword))
    ((eq format 'latex)
     (concat "\\cite{"
             (mapconcat #'identity (cite-split-keys keyword) ",")
             "}")))))

cite:daza-2014-carbon-dioxid,mehta-2014-ident-poten,test,ahuja-2001-high-ruo2

Here you can see an example of a menu where I have the PDF:

Here is an example menu of a key with no entry:

And an entry with no PDF:

Here is the simple citation:

And a reference from the other bibliography:

Not bad! I will probably replace the cite link in jorg-bib with something like this.

A better insert citation function for org-mode

| categories: org-mode, bibtex | tags:

I have set up a reftex citation format that inserts a cite link like this.

(eval-after-load 'reftex-vars
  '(progn
      (add-to-list 'reftex-cite-format-builtin
                   '(org "Org-mode citation"
                         ((?\C-m . "cite:%l"))))))

I mostly like this, but it does not let me add citations to an existing citation; doing that inserts an additional cite: inside the citation, which is an error. One simple fix is to add another cite format that just returns the selected keys, prefixed by a comma. You use it with the cursor at the end of the link, and it appends the results.

(add-to-list 'reftex-cite-format-builtin
                   '(org "Org-mode citation"
                         ((?\C-m . "cite:%l")
			  (?a . ",%l"))))

That actually works nicely. I would like an approach that involves fewer keystrokes, though. Ideally, a single function would do what I want: when the cursor is on a cite link, append to it; otherwise, insert a new citation link. Today I will develop a function that does that.

(defun insert-cite-link ()
  (interactive)
  (let* ((object (org-element-context))
	 (link-string-beginning (org-element-property :begin object))
	 (link-string-end (org-element-property :end object))
	 (path (org-element-property :path object)))    
    (if (and (equal (org-element-type object) 'link) 
               (equal (org-element-property :type object) "cite"))
	(progn
	  (goto-char link-string-end)
	  (insert (concat "," (mapconcat 'identity (reftex-citation t ?a) ","))))
      (insert (concat "cite:" (mapconcat 'identity (reftex-citation t) ",")))
      )))

That function is it! Org-mode just got a lot better. That function only puts a cite link in, but since that is all I use 99.99+% of the time, it works fine for me!
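The decision at the heart of insert-cite-link is plain string assembly: append ",key" text when point is on a cite link, otherwise start a new cite: link. As a sketch, that logic can be pulled out into a pure function so it can be checked without reftex or a buffer; my-cite-text is a hypothetical helper, not part of reftex or org-mode.

```elisp
(defun my-cite-text (keys on-cite-link)
  "Return the text to insert for the list of citation KEYS.
If ON-CITE-LINK is non-nil, return text that extends an existing cite
link; otherwise return a new cite link.  Hypothetical helper."
  (let ((joined (mapconcat #'identity keys ",")))
    (if on-cite-link
        (concat "," joined)          ; append to the link at point
      (concat "cite:" joined))))     ; start a new link

(my-cite-text '("a-2000" "b-2001") nil) ; => "cite:a-2000,b-2001"
(my-cite-text '("c-2002") t)            ; => ",c-2002"
```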

Multiple search criteria to find bibtex entries

| categories: bibtex | tags:

I have been thinking about ways to search my bibtex file with multiple criteria. Eventually, I want a decent natural language search like "au=kitchin and alloy" to find papers authored by me about alloys. For now, I will settle for a simpler way to find these entries. This strategy creates a search function that prints the entries it finds. Here is the prototype idea:

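(defun my-search (key start end)
  "Print the entry between START and END if it contains kitchin and alloy.
KEY is passed in by `bibtex-map-entries' and is not used here."
  ;; search from the entry start each time, so word order does not matter
  (when (and (save-excursion (re-search-forward "kitchin" end t))
             (save-excursion (re-search-forward "alloy" end t)))
    (princ (format "%s\n" (buffer-substring start end)))))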

(with-temp-buffer
  (insert-file-contents "../../bibliography/references.bib")
  (bibtex-map-entries  'my-search))
@ARTICLE{inoglu-2011-ident-sulfur,
  pdf =		 {[[file:bibtex-pdfs/inoglu-2011-ident-sulfur.pdf]]},
  org-notes =
                  {[[file:~/Dropbox/bibliography/notes.org::inoglu-2011-ident-sulfur]]},
  author =	 {Inoglu, Nilay and Kitchin, John R.},
  title =	 {Identification of Sulfur-Tolerant Bimetallic
                  Surfaces Using {DFT} Parametrized Models and
                  Atomistic Thermodynamics},
  journal =	 {ACS Catalysis},
  year =	 2011,
  pages =	 {399--407},
  abstract =	 {The identification of sulfur-tolerant alloys for
                  catalytic applications is difficult due to the
                  combinatorially large number of alloy compositions
                  and surface structures that may be
                  considered. Density functional theory calculations
                  (DFT) are not fast enough to enumerate all the
                  possible structures and their sulfur tolerance. In
                  this work, a DFT parametrized algebraic model that
                  accounts for structure and composition was used to
                  estimate the d-band properties and sulfur adsorption
                  energies of 370 transition metal-based bimetallic
                  alloy surfaces.  The estimated properties were
                  validated by DFT calculations for 110 of the surface
                  structures. We then utilized an atomistic
                  thermodynamic framework that includes surface
                  segregation, the presence of adsorbates, and effects
                  of environmental conditions to identify alloy
                  compositions and structures with enhanced sulfur
                  tolerance that are likely to be stable under the
                  environmental conditions. As a case study, we show
                  how this database can be used to identify
                  sulfur-tolerant Cu-based catalysts and compare the
                  results with what is known about these catalysts
                  experimentally.},
  doi =		 {10.1021/cs200039t},
  issn =	 {null},
  type =	 {Journal Article}
}
@ARTICLE{kitchin-2008-alloy,
  pdf =		 {[[file:bibtex-pdfs/kitchin-2008-alloy.pdf]]},
  org-notes =
                  {[[file:~/Dropbox/bibliography/notes.org::kitchin-2008-alloy]]},
  author =	 {Kitchin, J. R. and Reuter, K. and Scheffler, M.},
  title =	 {Alloy surface segregation in reactive environments:
                  First-principles atomistic thermodynamics study of
                  \ce{Ag_3Pd}(111) in oxygen atmospheres},
  journal =	 {Physical Review B},
  year =	 2008,
  volume =	 77,
  number =	 7,
  abstract =	 {We present a first-principles atomistic
                  thermodynamics framework to describe the structure,
                  composition, and segregation profile of an alloy
                  surface in contact with a (reactive)
                  environment. The method is illustrated with the
                  application to a Ag3Pd(111) surface in an oxygen
                  atmosphere, and we analyze trends in segregation,
                  adsorption, and surface free energies. We observe a
                  wide range of oxygen adsorption energies on the
                  various alloy surface configurations, including
                  binding that is stronger than on a Pd(111) surface
                  and weaker than that on a Ag(111) surface. This and
                  the consideration of even small amounts of
                  nonstoichiometries in the ordered bulk alloy are
                  found to be crucial to accurately model the Pd
                  surface segregation occurring in increasingly O-rich
                  gas phases.},
  doi =		 {https://doi.org/10.1103/PhysRevB.77.075437},
  pages =	 075437,
  issn =	 {1098-0121},
  type =	 {Journal Article}
}
@ARTICLE{tierney-2009-hydrog-dissoc,
  pdf =		 {[[file:bibtex-pdfs/tierney-2009-hydrog-dissoc.pdf]]},
  org-notes =
                  {[[file:~/Dropbox/bibliography/notes.org::tierney-2009-hydrog-dissoc]]},
  author =	 {Tierney, H. L. and Baber, A. E. and Kitchin,
                  J. R. and Sykes, E.  C. H.},
  title =	 {Hydrogen Dissociation and Spillover on Individual
                  Isolated Palladium Atoms},
  journal =	 {Physical Review Letters},
  year =	 2009,
  volume =	 103,
  number =	 24,
  abstract =	 {Using a combination of low-temperature scanning
                  tunneling microscopy and density functional theory
                  it is demonstrated how the nature of an inert host
                  metal of an alloy can affect the thermodynamics and
                  kinetics of a reaction pathway in a much more
                  profound way than simply a dilution, electronic, or
                  geometric effect. This study reveals that
                  individual, isolated Pd atoms can promote H-2
                  dissociation and spillover onto a Cu(111) surface,
                  but that the same mechanism is not observed for an
                  identical array of Pd atoms in Au(111).},
  pages =	 246102,
  doi =		 {10.1103/PhysRevLett.103.246102},
  issn =	 {0031-9007},
  url =		 {http://prl.aps.org/abstract/PRL/v103/i24/e246102},
  type =	 {Journal Article}
}

That is not too bad. If I had a parser like this one, I could do some reasonable searches. I could try integrating it with reftex or something similar for selecting citations. I would like that a lot.

What if I wanted to find articles with Kitchin as an author, and alloy in the title? This is my best effort at doing that, where I explicitly match the fields in the bibtex entries.

(find-file "~/Dropbox/bibliography/references.bib")
(bibtex-map-entries (lambda (bibtex-key start end)
                      (let* ((entry (bibtex-parse-entry))
                             (title (cdr (assoc "title" entry)))
                             (authors (cdr (assoc "author" entry))))
                        (when (and title (string-match "alloy" title)
                                   authors (string-match "kitchin" authors))
                          (princ (buffer-substring start end))))))
@ARTICLE{kitchin-2008-alloy,
  pdf =		 {[[file:bibtex-pdfs/kitchin-2008-alloy.pdf]]},
  org-notes =
                  {[[file:~/Dropbox/bibliography/notes.org::kitchin-2008-alloy]]},
  author =	 {Kitchin, J. R. and Reuter, K. and Scheffler, M.},
  title =	 {Alloy surface segregation in reactive environments:
                  First-principles atomistic thermodynamics study of
                  \ce{Ag_3Pd}(111) in oxygen atmospheres},
  journal =	 {Physical Review B},
  year =	 2008,
  volume =	 77,
  number =	 7,
  abstract =	 {We present a first-principles atomistic
                  thermodynamics framework to describe the structure,
                  composition, and segregation profile of an alloy
                  surface in contact with a (reactive)
                  environment. The method is illustrated with the
                  application to a Ag3Pd(111) surface in an oxygen
                  atmosphere, and we analyze trends in segregation,
                  adsorption, and surface free energies. We observe a
                  wide range of oxygen adsorption energies on the
                  various alloy surface configurations, including
                  binding that is stronger than on a Pd(111) surface
                  and weaker than that on a Ag(111) surface. This and
                  the consideration of even small amounts of
                  nonstoichiometries in the ordered bulk alloy are
                  found to be crucial to accurately model the Pd
                  surface segregation occurring in increasingly O-rich
                  gas phases.},
  doi =		 {https://doi.org/10.1103/PhysRevB.77.075437},
  pages =	 075437,
  issn =	 {1098-0121},
  type =	 {Journal Article}
}

This is a more precise search, which yields only one entry. That is not exactly nimble searching, but it does provide precision. I need to think about this some more.
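One way to generalize the field-specific search above is to pass the criteria in as data. The sketch below takes an alist of (FIELD . REGEXP) pairs and returns the keys of the entries in the current buffer where every listed field matches, using bibtex-parse-entry inside bibtex-map-entries as above. The function name and the alist format are assumptions for illustration, not an existing bibtex.el or jorg-bib API.

```elisp
(require 'bibtex)
(require 'cl-lib)

(defun my-bibtex-search (criteria)
  "Return keys of entries in the current buffer that match all CRITERIA.
CRITERIA is an alist of (FIELD . REGEXP), e.g. ((\"author\" . \"kitchin\")).
Field names and regexps are matched ignoring case.  Hypothetical helper."
  (let ((case-fold-search t)
        (found '()))
    (bibtex-map-entries
     (lambda (key _start _end)
       ;; point is at the start of the entry, so we can parse it here
       (let ((entry (bibtex-parse-entry)))
         (when (cl-every
                (lambda (criterion)
                  (let ((value (cdr (assoc-string (car criterion) entry t))))
                    (and value (string-match-p (cdr criterion) value))))
                criteria)
           (push key found)))))
    (nreverse found)))

;; Usage, with references.bib as the current buffer:
;; (my-bibtex-search '(("author" . "kitchin") ("title" . "alloy")))
```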

Org-mode version = 8.2.5h
