Finding bibtex entries with non-ascii characters

| categories: bibtex | tags:

I have found that some journals cannot handle bibtex entries with non-ascii characters in them. Unfortunately, when you paste bibtex entries into your reference file from the web, there are often non-ascii characters in them. Emacs usually shows those characters just fine, so it is difficult to find them. Here is a little recipe to go through each entry to find entries with non-ascii characters. These range from accented characters, greek letters, degree symbols, dashes, fancy quotes, etc… Since they are hard to see by eye, we can let Emacs find them for us, and then replace them with the corresponding ascii LaTeX commands.

I found a function to find non-ascii characters here: . Now, we use a modified version of this on each entry in a bibtex file. If we find a character, we will print an org-mode link to make it easy to get right to the entry.

(defun contains-non-ascii-char-p ()
  "tests if buffer contains non-ascii character"
  (let (point)
      (setq point
            (catch 'non-ascii
              (while (not (eobp))
                (or (eq (char-charset (following-char))
                    (throw 'non-ascii (point)))
                (forward-char 1)))))
    (if point
        (goto-char point)

(find-file "~/Dropbox/bibliography/references.bib")
(bibtex-map-entries (lambda (bibtex-key start end)                        
                        ;; narrow so we only look at this entry. save-restriction will rewiden
                        (when (contains-non-ascii-char-p) (princ (format "cite:%s" bibtex-key)))))))

You can see I only had one reference in that file with a non-ascii character. I think it is best practice to replace these with pure LaTeX commands. See for a good reference on what commands are used for the accented characters.

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter