Using org-ref to keep your bibtex files in order

| categories: emacs, bibtex | tags:

Maintaining an accurate, useful bibliography of references is critical for scientific writing. It is also not trivial. While it is easy to download and copy bibliographic entries to your database, these entries are often incomplete, not consistently formatted, and can contain invalid characters. org-ref provides several utility functions to help with this.

1 "cleaning" a bibtex entry

Consider this bibtex entry from http://pubs.acs.org/action/showCitFormats?doi=10.1021%2Fie500588j .

@article{doi:10.1021/ie500588j,
author = {Okada, Tomohiko and Ozono, Shoya and Okamoto, Masami and Takeda, Yohei and Minamisawa, Hikari M. and Haeiwa, Tetsuji and Sakai, Toshio and Mishima, Shozi},
title = {Magnetic Rattle-Type Core–Shell Particles Containing Iron Compounds with Acid Tolerance by Dense Silica},
journal = {Industrial & Engineering Chemistry Research},
volume = {0},
number = {0},
pages = {null},
year = {0},
doi = {10.1021/ie500588j},

URL = {http://pubs.acs.org/doi/abs/10.1021/ie500588j},
eprint = {http://pubs.acs.org/doi/pdf/10.1021/ie500588j}
}

On the surface it looks fine, but there are the following issues with it:

  1. The bibtex key is hard to remember. I like systematically named keys.
  2. There is a bare & in the journal title, which is not legal in LaTeX.
  3. There is no year entry, even though it is a 2014 entry. The pages, volume, and number are also problematic, but this is an ASAP article and the reference does not have those yet.
  4. It is hard to see, but the dash between core and shell is a non-ascii character, which can cause problems in LaTeX.
  5. The entry is not very nicely aligned or indented.

You can fix these problems by putting your cursor on the bibtex entry, and typing M-x org-ref-clean-bibtex-entry. This will fix the bibtex key to a standard form, align and indent the entry, escape the & so it is legal syntax, prompt you for a year, and show you the non-ascii characters so you can replace them. The resulting, nicely formatted entry is shown below.

@article{okada-2014-magnet-rattl,
  author =	 {Okada, Tomohiko and Ozono, Shoya and Okamoto, Masami
                  and Takeda, Yohei and Minamisawa, Hikari M. and
                  Haeiwa, Tetsuji and Sakai, Toshio and Mishima,
                  Shozi},
  title =	 {Magnetic Rattle-Type Core-Shell Particles Containing
                  Iron Compounds with Acid Tolerance by Dense Silica},
  journal =	 {Industrial \& Engineering Chemistry Research},
  volume =	 0,
  pages =	 {null},
  year =	 2014,
  doi =		 {10.1021/ie500588j},
  number =	 0,
  url =		 {http://pubs.acs.org/doi/abs/10.1021/ie500588j},
  eprint =	 {http://pubs.acs.org/doi/pdf/10.1021/ie500588j},
}

The key formatting comes from these definitions:

;; variables that control bibtex key format for auto-generation
;; I want firstauthor-year-title-words
;; this usually makes a legitimate filename to store pdfs under.
(setq bibtex-autokey-year-length 4
      bibtex-autokey-name-year-separator "-"
      bibtex-autokey-year-title-separator "-"
      bibtex-autokey-titleword-separator "-"
      bibtex-autokey-titlewords 2
      bibtex-autokey-titlewords-stretch 1
      bibtex-autokey-titleword-length 5)

You should develop a discipline to clean each entry as you add them, and before you cite them. It is a pain to change the key, and then find and change all the places you used that key before. Now that you have a systematic key, go ahead and download the pdf for the article, and save it in your pdf directory by that key name. Set the variable org-ref-pdf-directory to this directory, and later when you click on citations you will be able to open the pdf easily.

2 Validating your bibliography

elisp:bibtex-validate
will check your bibliography for valid syntax. This is a bibtex command.

org-bib.bib

3 Sorting your bibtex file

It is a good idea to keep your bibtex file sorted. This will facilitate finding duplicate entries, and will make it easier to find things. I usually add entries to the top of the file, and then clean them. Then run the command

elisp:bibtex-sort-buffer
. This will sort the entries for you. This is also a bibtex command.

org-bib.bib

4 Make a full bibliography pdf

A good way to check your bibliography for duplicates, spelling errors, and invalid formats is to make a pdf containing all the entries. Open your bibtex file, and run

elisp:org-ref-build-full-bibliography
. If all goes well, you will get a pdf of your bibliography that you can check for accuracy. If there are errors, you will have to fix them until the pdf is generated.

Try it out: org-bib.bib

5 Finding bad citation links

Sometimes you will get bad citation links in your document. Maybe there is no corresponding entry, maybe you typed in the wrong key, maybe you changed the key. Either way, you need to find them and fix them. Run the command

elisp:org-ref-find-bad-citations
to find them.
cite:test

6 Extracting citations entries

You will often work from your default bibliography for your own work. Eventually you will need to extract the entries cited so you can send them to someone. The command

elisp:org-ref-extract-bibtex-entries
will do that for you. If I have cited something
cite:calle-vallejo-2010-trend-stabil
.

7 Summary

You can see a screen cast of this post here: http://screencast.com/t/yZCOdO6kJ

8 References

9 Bibtex entries

#+BEGINSRC: text :tangle extract-bib7108tYg.bib @article{calle-vallejo-2010-trend-stabil, author = {Calle-Vallejo, F. and Martinez, J. I. and Garcia- Lastra, J. M. and Mogensen, M. and Rossmeisl, J.}, title = {Trends in Stability of Perovskite Oxides}, journal = "Angewandte Chemie-International Edition", volume = 49, number = 42, pages = {7699-7701}, year = 2010, doi = {10.1002/anie.201002301}, keyword = {density functional calculations heats of formation perovskites thermochemistry transition-metals catalysts ferroelectricity}, } #+ENDSRC

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.6

Discuss on Twitter