Finding bibtex entries with no downloaded pdf

| categories: bibtex | tags:

We use bibtex for bibiliography management in our group. Almost every journal provides a utility to download bibtex entries, and you can pretty easily download bibtex entries from citeulike. It doesn't take too long though before you have a few hundred entries. You need some tools to interact with that database.

Bibtex-mode in Emacs provides some tools for working with your bibtex files. For example, you can (bibtex-validate) to check if entries are correct, and (bibtex-sort-buffer) to sort them by key.

I have a specific workflow to entering new entries. This is what I prefer to do:

  1. Go to journal, get bibtex entry, paste into bibtex file.
  2. delete the key that is used, if any
  3. type C-c C-c to autogenerate a key of my style
  4. Copy the key, download the pdf, and save the pdf as (format "%s.pdf" key) in my pdfs directory.
  5. Make an entry in a notes file for that reference. These entries are initially tagged as TODO to remind me to organize them.

Doing this has some payoffs; my org-mode cite links can open either the bibtex entry, or the pdf file directly from the org-file! The notes file is also an org-file, which I can organize as I see fit.

Sometimes I am lazy, and do not get all these steps done, especially the pdf download step. I like to have local copies of the pdf files so I can read them even if I am offline, and because I often annotate them using a tablet PC. It also makes it easy to send them to my students if I need to. Periodically, I like to go through my bibtex database to do some maintenance, download missing files, and notes entries etc… The problem is how do I know which entries have downloaded pdfs or note entries? It is not that difficult with a bit of elisp.

(find-file "~/Dropbox/bibliography/references.bib")
(bibtex-map-entries (lambda (bibtex-key start end) 
                      (let ((type  (cdr (car (bibtex-parse-entry)))))                        
                        (unless (file-exists-p 
                                 (format "~/Dropbox/bibliography/bibtex-pdfs/%s.pdf" bibtex-key))
                          (princ (format "%10s:  cite:%s has no pdf\n" type bibtex-key))))))
      Book:  cite:ambrose-2010-how-learn-works has no pdf
   article:  cite:gerken-2010-fluor-modul has no pdf
      Book:  cite:gray-1973-chemic-bonds has no pdf
   ARTICLE:  cite:kitchin-2003-tio2 has no pdf
   ARTICLE:  cite:kitchin-2012-prefac has no pdf
      Book:  cite:kittel-2005-introd-solid has no pdf
   ARTICLE:  cite:mccormick-2003-tio2-pd has no pdf
   ARTICLE:  cite:mhadeshwar-2004-nh3-ru has no pdf
      Misc:  cite:ni-website has no pdf
   ARTICLE:  cite:norskov-2006-respon has no pdf
      Book:  cite:reif-1965-fundam-statis has no pdf
   article:  cite:risch-2012-water-oxidat has no pdf
   ARTICLE:  cite:shultz-1995-prepar-and has no pdf
   ARTICLE:  cite:shultz-1997-prepar has no pdf
   ARTICLE:  cite:song-2002-h3pw1 has no pdf

Using that list, I can click on those links, which takes me to the entry in file. That entry probably has a url or doi that makes it easy to navigate to the journal page where I can download the pdf file. You could improve on the code above by filtering out only articles, for example.

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter