Finding missing citation entries in an org-file

| categories: bibtex, org-mode | tags:

Today we consider how to find citations in a document that have no corresponding entries in a bibtex file. There are a couple of pieces to this which we work out in stages below. First, we specify the bibtex file using a bibliography link defined in jorg-bib.el.

jorg-bib provides a function that gives us the relevant bibliography files found in this file.

(cite-find-bibliography)
bib1.bib bib2.bib

We can get a list of keys in these files

(let ((bibtex-files (cite-find-bibliography)))
(bibtex-global-key-alist))
(adams-1993-orien-imagin . t) (aarik-1997-effec-tio2 . t) (aruga-1985-struc-iron . t)

Now, here are some citations that we want to include in this document.

cite:aruga-1985-struc-iron,aarik-1997-effec-tio2

Here is a citation that is not in the bibtex file

cite:kitchin-2016-nobel-lecture

To find out if any of these are missing, we need a list of the citation keys in this document. We first get all the content from the cite links. We parse the buffer, and for each cite link, we get the path of the link, which contains our keys.

(let ((parsetree (org-element-parse-buffer)))
  (org-element-map parsetree 'link
    (lambda (link)       
      (let ((type (nth 0 link))
            (plist (nth 1 link))
            (content (nth 2 link)))
	(when (equal (plist-get plist ':type) "cite")
	  (plist-get plist ':path))))))
aruga-1985-struc-iron,aarik-1997-effec-tio2 kitchin-2016-nobel-lecture

That is almost what we need, but we need to separate the keys that are joined by commas. That function already exists in jorg-bib as cite-split-keys. We need to make a slight variation to get a list of all the entries, since the cite-split-keys returns a list of entries for each link. Here is on approach to that.

(let ((parsetree (org-element-parse-buffer))
      (results '()))
  (org-element-map parsetree 'link
    (lambda (link)       
      (let ((plist (nth 1 link)))
	(when (equal (plist-get plist ':type) "cite")
	  (setq results (append results (cite-split-keys (plist-get plist ':path))))))))
results)
aruga-1985-struc-iron aarik-1997-effec-tio2 kitchin-2016-nobel-lecture

Ok, now we just need to check each entry of that list against the list of entries in the bibtex files, and highlight any that are not good. We use an index function below to tell us if an element is in a list. This index function works for strings. We use the strange remove-if-not function, which requires something like triple negative logic to get the list of keys that are not in the bibtex files.

(require 'cl)

(defun index (substring list)
  "return the index of string in a list of strings"
  (let ((i 0)
	(found nil))
    (dolist (arg list i)
      (if (string-match substring arg)
	  (progn 
	    (setq found t)
	    (return i)))
      (setq i (+ i 1)))
    ;; return counter if found, otherwise return nil
    (if found i nil)))

;; generate the list of bibtex-keys and cited keys
(let* ((bibtex-files (cite-find-bibliography))
       (bibtex-keys (mapcar (lambda (x) (car x)) (bibtex-global-key-alist)))
       (parsetree (org-element-parse-buffer))
       (cited-keys))
  (org-element-map parsetree 'link
    (lambda (link)       
      (let ((plist (nth 1 link)))			     
	(when (equal (plist-get plist ':type) "cite")
	  (setq cited-keys (append cited-keys (cite-split-keys (plist-get plist ':path))))))))

(princ (remove-if-not (lambda (arg) (not (index arg bibtex-keys))) cited-keys))
)
(kitchin-2016-nobel-lecture)

The only improvement from here would be if this generated a temporary buffer with clickable links to find that bad entry! Let us take a different approach here, and print this to a temporary buffer of clickable links.

(require 'cl)

(defun index (substring list)
  "return the index of string in a list of strings"
  (let ((i 0)
	(found nil))
    (dolist (arg list i)
      (if (string-match substring arg)
	  (progn 
	    (setq found t)
	    (return i)))
      (setq i (+ i 1)))
    ;; return counter if found, otherwise return nil
    (if found i nil)))

;; generate the list of bibtex-keys and cited keys
(let* ((bibtex-files (cite-find-bibliography))
       (bibtex-keys (mapcar (lambda (x) (car x)) (bibtex-global-key-alist)))
       (bad-citations '()))

  (org-element-map (org-element-parse-buffer) 'link
    (lambda (link)       
      (let ((plist (nth 1 link)))			     
	(when (equal (plist-get plist ':type) "cite")
	  (dolist (key (cite-split-keys (plist-get plist ':path)) )
	    (when (not (index key bibtex-keys))
	      (setq bad-citations (append bad-citations
			    `(,(format "%s [[elisp:(progn (find-file \"%s\")(goto-char %s))][not found here]]\n"
		      key (buffer-file-name)(plist-get plist ':begin)))))
			    ))))))

(mapconcat 'identity bad-citations ""))

kitchin-2016-nobel-lecture

elisp:(progn (find-file "/home-research/jkitchin/Dropbox/blogofile-jkitchin.github.com/_blog/blog.org")(goto-char 1052))

That is likely to come in handy. I have put a variation of this code in jorb-bib, in the function called jorg-bib-find-bad-citations.

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.6

Discuss on Twitter