Extracting bibtex file from an org-buffer

| categories: bibtex, org-mode | tags:

We use citation links a lot in our org-files, like this: cite:thompson-2014-co2-react. Sometimes there are multiple citations, like this: cite:mehta-2014-ident-poten,hallenbeck-2013-effec-o2. It would be convenient at times to extract a bibtex file from these citations, so that we could easily share files. RefTeX can do this from a LaTeX file, and Org makes it easy to export to LaTeX, so this seems like it should be easy. It would be, if I always put the bibliography link in the file. I usually do not, so we check whether the link is there, and if it is not, we add the bibliography to the end before we export. Then, with the LaTeX file in hand, we open it and call the RefTeX functions to get the bibliography. Finally, we insert the extracted entries at the end of the original file as an example block.
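To make "the citations in a buffer" concrete, here is a minimal Python sketch that collects the cited keys from a piece of org text. It is only an illustration, not what the elisp function does (that relies on the LaTeX export and RefTeX). The regular expression and the function name `cited_keys` are my own assumptions, and the pattern assumes keys contain only letters, digits, underscores, and hyphens.

```python
import re

# Assumed link syntax: cite:key or cite:key1,key2 with keys made of
# letters, digits, '_' and '-'. A trailing period is not part of the key.
CITE_RE = re.compile(r"cite:([\w-]+(?:,[\w-]+)*)")


def cited_keys(org_text):
    """Return the cited keys in order of first appearance, without repeats."""
    keys = []
    for match in CITE_RE.finditer(org_text):
        for key in match.group(1).split(","):
            if key not in keys:
                keys.append(key)
    return keys


text = ("like this: cite:thompson-2014-co2-react. Sometimes there are multiple "
        "citations: cite:mehta-2014-ident-poten,hallenbeck-2013-effec-o2.")
print(cited_keys(text))
```

A list like this is enough to look up the entries in a master bibtex file by key.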

Here is a function that does the extraction and some house cleaning. We actually take the contents of the buffer and save it in a temporary file, so that we do not accidentally clobber a tex or bibtex file here.

(defun kg-extract-bibtex ()
  "create bibtex file of entries cited in this buffer"

  (let* ((tempname (make-temp-file "extract-bib"))
         (contents (buffer-string))
         (cb (current-buffer))
         basename texfile bibfile results)

    (find-file tempname)
    (insert contents)
    (setq basename (file-name-sans-extension
                    (file-name-nondirectory buffer-file-name))
          texfile (concat basename ".tex")
          bibfile (concat basename ".bib"))

  (save-excursion
    (goto-char (point-min))
    (unless (re-search-forward "^bibliography:" (point-max) 'end)
      (insert (format "\nbibliography:%s" (mapconcat 'identity reftex-default-bibliography ",")))))

    (org-latex-export-to-latex)
    (find-file texfile)
    (reftex-parse-all)
    (reftex-create-bibtex-file bibfile)
    (setq results (buffer-string))
    (kill-buffer bibfile)
    (kill-buffer texfile)
    (delete-file texfile)
    (delete-file tempname)

    (switch-to-buffer cb)
    (save-excursion
      (goto-char (point-max))
      (insert (format "

** Bibtex entries

#+BEGIN_EXAMPLE: 
%s
#+END_EXAMPLE" results)))))

(kg-extract-bibtex)

There it is! The example block does not render very well in HTML; it appears as plain text. It looks fine in the org file though.

It might be a good idea to replace the bibliography line with the new file, but I will leave that as an exercise for later.

1 Bibtex entries

#+BEGIN_EXAMPLE:
@article{hallenbeck-2013-effec-o2,
  author = "Hallenbeck, Alexander P. and Kitchin, John R.",
  title = "Effects of \ce{O_2} and \ce{SO_2} on the Capture Capacity of a Primary-Amine Based Polymeric \ce{CO_2} Sorbent",
  year = 2013,
  doi = "10.1021/ie400582a",
  eprint = "http://pubs.acs.org/doi/pdf/10.1021/ie400582a",
  journal = "Industrial \& Engineering Chemistry Research",
  pages = "10788-10794",
  url = "http://pubs.acs.org/doi/abs/10.1021/ie400582a",
}

@article{mehta-2014-ident-poten,
  author = {Mehta, Prateek and Salvador, Paul A. and Kitchin, John R.},
  title = {Identifying Potential BO2 Oxide Polymorphs for Epitaxial Growth Candidates},
  journal = {ACS Applied Materials \& Interfaces},
  volume = 0,
  number = 0,
  pages = {null},
  year = 2014,
  doi = {10.1021/am4059149},
  URL = {http://pubs.acs.org/doi/abs/10.1021/am4059149},
  eprint = {http://pubs.acs.org/doi/pdf/10.1021/am4059149}
}

@Article{thompson-2014-co2-react,
  author = {Thompson, Robert L. and Albenze, Erik and Shi, Wei and Hopkinson, David and Damodaran, Krishnan and Lee, Anita and Kitchin, John and Luebke, David Richard and Nulwala, Hunaid},
  title = {\ce{CO_2} Reactive Ionic Liquids: Effects of functional groups on the anion and its influence on the physical properties},
  journal = {RSC Adv.},
  year = 2014,
  pages = "-",
  publisher = {The Royal Society of Chemistry},
  doi = {10.1039/C3RA47097K},
  url = {https://doi.org/10.1039/C3RA47097K},
  abstract = "Next generation of gas separation materials are needed to alleviate issues faced in energy and environmental area. Ionic liquids (ILs) are promising class of material for CO2 separations. In this work{,} CO2 reactive triazolides ILs were synthesized and characterized with the aim of developing deeper understanding on how structural changes affect the overall properties for CO2 separation. Important insights were gained illustrating the effects of substituents on the anion. It was found that substituents play a crucial role in dictating the overall physical properties of reactive ionic liquids. Depending upon the electronic and steric nature of the substituent{,} CO2 capacities between 0.07-0.4 mol CO2/mol IL were observed. Detailed spectroscopic{,} CO2 absorption{,} rheological{,} and simulation studies were carried out to understand the nature and influence of these substituents. The effect of water content was also evaluated{,} and it was found that water had an unexpected impact on the properties of these materials{,} resulting in an increased viscosity{,} but little change in the CO2 reactivity."
}
#+END_EXAMPLE

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.5h

Merging bibtex files and avoiding duplicates

| categories: bibtex | tags:

I usually advocate having a master bibtex file with all entries in it. Emacs helps you avoid duplicate entries as you enter them. Sometimes though, you have more than one bibtex file. Maybe you started one for a new project, or someone sent you one. In any case, you want to merge the files into one file. Bibtex requires each entry to have a unique key.

Let us begin. I have two bibtex files I exported from Endnote. I have already removed all the non-ascii characters and cleaned them up pretty well. We start with some analysis.

from bibtexparser.bparser import BibTexParser
with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries1 = bp.get_entry_list()

print '{0} entries in file 1'.format(len(entries1))

with open('../../CMU/proposals/link-to-2014/perovskite-strain/perovskite-strain.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries2 = bp.get_entry_list()

print '{0} entries in file 2'.format(len(entries2))
100 entries in file 1
129 entries in file 2

Now, let us see how many duplicates there are. It is easy to use sets for this.

# store keys to check for duplicates
from bibtexparser.bparser import BibTexParser
with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries1 = bp.get_entry_list()

with open('../../CMU/proposals/link-to-2014/perovskite-strain/perovskite-strain.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries2 = bp.get_entry_list()

entry1_keys = set([entry['id'] for entry in entries1])
entry2_keys = set([entry['id'] for entry in entries2])

duplicates = entry1_keys & entry2_keys
print 'There are {0} duplicates'.format(len(duplicates))
print duplicates
There are 20 duplicates
set(['nolan-2008-vacan-co', 'giocondi-2001-spatial', 'giocondi-2001-spatial-batio3', 'wang-2006-oxidat-gga', 'piskunov-2008-elect-lamno3', 'pala-2007-modif-oxidat', 'chretien-2006-densit-funct', 'giocondi-2008-sr2nb-batio3', 'kushima-2010-compet-lacoo3', 'pala-2009-co-ti', 'giocondi-2007-srtio3', 'lee-2009-ab-labo3', 'balasubramanian-2005-epitax-phase', 'mastrikov-2010-pathw-oxygen', 'shapovalov-2007-catal', 'evarestov-2005-compar-lcao', 'choi-2007-comput-study', 'havelia-2009-nucleat-growt', 'lee-2009-ab-defec', 'lee-2009-predic-surfac'])

Ok, now we make a function to format each entry. We take the code from the post "Sorting fields in bibtex entries" and turn it into a function. Then we add all the entries from the first file. Finally, we add entries from the second file as long as the key is not in the list from the first file.

from bibtexparser.bparser import BibTexParser
import os, textwrap

def format_bibtex_entry(entry):
    # field, format, wrap or not
    field_order = [(u'author', '{{{0}}},\n', True),
                   (u'title', '{{{0}}},\n', True),
                   (u'journal','"{0}",\n', True),
                   (u'volume','{{{0}}},\n', True),
                   (u'number', '{{{0}}},\n', True),
                   (u'pages', '{{{0}}},\n', True),
                   (u'year', '{0},\n', True),
                   (u'doi','{{{0}}},\n', False),
                   (u'url','{{\url{{{0}}}}},\n', False),
                   (u'link','{{\url{{{0}}}}},\n', False)]
    
    keys = set(entry.keys())

    extra_fields = keys.difference([f[0] for f in field_order])
    # we do not want these in our entry
    extra_fields.remove('type')
    extra_fields.remove('id')

    # Now build up our entry string
    s = '@{type}{{{id},\n'.format(type=entry['type'].upper(),
                                  id=entry['id'])

    for field, fmt, wrap in field_order:
        if field in entry:
            s1 = '  {0} ='.format(field.upper())
            s2 = fmt.format(entry[field])
            s3 = '{0:17s}{1}'.format(s1, s2)
            if wrap:
                # fill seems to remove trailing '\n'
                s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
            s += s3  

    for field in extra_fields:
        if field in entry:
            s1 = '  {0} ='.format(field.upper())
            s2 = entry[field]
            s3 = '{0:17s}{{{1}}}'.format(s1, s2)
            s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
            s += s3  

    s += '}\n\n'
    return s

if os.path.exists('merged.bib'): os.unlink('merged.bib')    

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries1 = bp.get_entry_list()

for entry in entries1:
    with open('merged.bib', 'a') as f:
        f.write(format_bibtex_entry(entry))

# store keys to check for duplicates
entry1_keys = [entry['id'] for entry in entries1]

with open('../../CMU/proposals/link-to-2014/perovskite-strain/perovskite-strain.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries2 = bp.get_entry_list()

for entry in entries2:
    if not entry['id'] in entry1_keys:
        with open('merged.bib', 'a') as f:
            f.write(format_bibtex_entry(entry))

Here is the merged file: merged.bib, and the corresponding bibliography: merged.pdf. There are 209 entries in it, which is what we expected (100 + 129 - 20) given that there were 20 duplicates. There are no doubt other programs that merge bibtex files, but I like this approach for the following reasons:

  1. I learned a new python module that parses bibtex files.
  2. I got my entries formatted exactly the way I wanted them.
  3. I defined what constituted a duplicate.

Of course, here we only eliminate entries with duplicate keys. If the same reference appears under two different keys, both copies end up in the merged file. Detecting that is a very hard problem to get right, since there are many possible ways to abbreviate author names and journal names, and multiple ways to write the title. That problem is best solved by generating the keys systematically, so that the same reference always gets the same key.
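The merge rule can be sketched without any bibtex parsing at all. In the minimal Python 3 illustration below, entries are plain dictionaries with an `id` field, as bibtexparser produced above. The function name `merge_entries` and the titles are hypothetical, made up only to show the behavior.

```python
def merge_entries(first, second):
    """Return all of first, plus the entries of second whose 'id' is not in first."""
    first_keys = {entry["id"] for entry in first}
    merged = list(first)
    for entry in second:
        if entry["id"] not in first_keys:
            merged.append(entry)
    return merged


# Hypothetical data: the titles are placeholders, not real entries.
file1 = [{"id": "lee-2009-ab-labo3", "title": "Paper A"},
         {"id": "pala-2009-co-ti", "title": "Paper B"}]
file2 = [{"id": "pala-2009-co-ti", "title": "Paper B"},         # same key: skipped
         {"id": "pala2009", "title": "Paper B"},                # same paper, other key: kept
         {"id": "choi-2007-comput-study", "title": "Paper C"}]

merged = merge_entries(file1, file2)
print(len(merged))                     # 2 + 3 - 1 = 4
print([entry["id"] for entry in merged])
```

Only the entry with the repeated key is skipped. The second copy of "Paper B" survives because its key differs, which is exactly the limitation of a key-based merge.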

Sorting fields in bibtex entries

| categories: bibtex | tags:

I like consistency. In particular, for bibtex entries, I would like all the fields to be in the same order, and in all caps. Why? Because then I know where to look, and incorrect entries stand out more easily. My current bibtex file does not look like this! That is a result of adding bibtex entries from various journals, which all have different conventions. Today, I am going to look at a way to achieve what I want.

The principal idea is that we will parse the bibtex file into a list of entries represented by a convenient data structure. Then, we will format each entry the way we want, and print the result back out to a new file. I will use bibtexparser and python to do this.

Let us examine what bibtexparser does for us. Here we read in a file and get the entries. Each entry is represented as a dictionary.

from bibtexparser.bparser import BibTexParser

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

# look at the first entry
print entries[0]
{u'title': u'Effect of growth conditions on formation of TiO2-II\nthin films in atomic layer deposition process', u'journal': u'Journal of Crystal Growth', u'author': u'Aarik, J. and Aidla, A. and Sammelselg, V. and\nUustare, T.', u'number': u'3', 'id': 'aarik-1997-effec-tio2', u'volume': u'181', u'link': u'<Go to ISI>://A1997YD52700011', u'year': u'1997', 'type': u'article', u'pages': u'259-264'}

Let us take a moment to analyze our bibtex file. Let us see how many types of entries we have. That gives us a chance to practice counting.

from bibtexparser.bparser import BibTexParser

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

types = [entry['type'] for entry in entries]
print dict((typ, types.count(typ)) for typ in types)
{u'inbook': 2, u'article': 90, u'book': 4, u'misc': 3, u'phdthesis': 1}
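As an aside, the standard library has a class made for this kind of counting. The sketch below is Python 3 and uses a stand-in list built from the counts printed above instead of reading the bib file, so the list `types` here is an assumption for illustration only.

```python
from collections import Counter

# Stand-in for [entry['type'] for entry in entries], built from the counts above.
types = (["article"] * 90 + ["book"] * 4 + ["misc"] * 3
         + ["inbook"] * 2 + ["phdthesis"])

counts = Counter(types)
print(counts.most_common())   # most frequent type first
print(sum(counts.values()))   # total number of entries
```

The total comes to 100, matching the number of entries reported for this file.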

Indeed, there are a lot of entries that we do not want to do by hand. Here is the order I would like the fields to be for articles. A similar order for the other types would be fine too.

AUTHOR
TITLE
JOURNAL
VOLUME
ISSUE
PAGES
YEAR
DOI
URL or link
other fields

Bibtex lets you define arbitrary fields, and we do not want to lose these in the entries. I have for example defined fields for the path to a pdf, or to a notes file in some files. We will use python sets to handle this for us. With sets, we can conveniently compute the difference in fields between our ordered list, and the entry. Here is an example. We have a master list of keys, and an entry with extra keys. We use the difference function to get the list of extra keys.

entry = set(['author', 'title', 'journal', 'field1'])
master = set(['author', 'title'])

print entry.difference(master)
set(['journal', 'field1'])

So, we will write out the fields in our preferred order first, and then add the rest of the keys.

from bibtexparser.bparser import BibTexParser

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

field_order = ['author', 'title', 'journal', 'volume', 'issue' 'pages', 'year', 'doi', 'url', 'link']

entry_keys = set(entries[0].keys())
print entry_keys.difference(field_order)
set([u'number', 'id', 'type', u'pages'])

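You can see two subtleties here. First, pages is reported as an extra field even though we listed it. That is because the list is missing a comma between 'issue' and 'pages', so Python joins the two adjacent string literals into the single string 'issuepages'. Second, number is reported because bibtexparser stores that field as number, while our field_order calls it issue. The u prefix is not the cause of either: it only marks the keys as unicode strings (all of them appear to be unicode except type and id), and in Python 2 u'pages' and 'pages' compare equal. In the next block we will address both problems.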
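A quick way to see what happened to pages is to look at the list itself. The list `field_order` in the block above has no comma between 'issue' and 'pages', and Python silently concatenates adjacent string literals. This small Python 3 check uses a shortened copy of that list and a made-up set of entry keys.

```python
# Note the missing comma between 'issue' and 'pages'.
field_order = ['author', 'title', 'journal', 'volume', 'issue' 'pages', 'year']

print(len(field_order))               # 6 items, not 7
print('issuepages' in field_order)    # True
print('pages' in field_order)         # False

# Hypothetical entry keys, just to show the effect on the set difference.
entry_keys = {'author', 'title', 'pages', 'number'}
extra = entry_keys.difference(field_order)
print(sorted(extra))                  # ['number', 'pages']
```

With the comma restored, only number would be left over in this example.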

You should probably go ahead and remove non-ascii characters from your bib-file. We got lucky with this entry, but some entries have non-ascii characters and these cause errors.

So we need to specify the order of fields, how they should be formatted, and whether we should wrap the field contents into a nice block. We do that in the next block. Note that in the formats we use a double {{ to get a literal { when we use string formatting. We use the formats to wrap the fields in brackets or quotes as needed. We use the textwrap module to neatly wrap multiline fields, indenting the second line and beyond. After some iteration, I have made this print an entry that emacs-bibtex likes and that needs no further reformatting.
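Here is a minimal sketch of those two ingredients on a single field, written for Python 3. The author string is copied from the example entry in this post, and the URL is a placeholder. Note one difference from the Python 2 code in this post: in Python 3 the backslash in \url must be doubled inside a normal string literal.

```python
import textwrap

author = ("Yang, H. G. and Sun, C. H. and Qiao, S. Z. and Zou, J. and "
          "Liu, G. and Smith, S. C. and Cheng, H. M. and Lu, G. Q.")

# '{{' and '}}' become literal braces, '{0}' is replaced by the value.
value = '{{{0}}},\n'.format(author)

# Pad the field label to 17 characters so every value starts in the same column.
label = '  {0} ='.format('AUTHOR')
line = '{0:17s}{1}'.format(label, value)

# fill() drops the trailing newline, so it has to be added back when writing.
wrapped = textwrap.fill(line, subsequent_indent=' ' * 18, width=70)
print(wrapped)

# Placeholder URL; the doubled backslash yields a single backslash in the output.
print('{{\\url{{{0}}}}},'.format('http://example.org'))
```

The continuation lines are indented by 18 spaces so that they line up with the text just after the opening brace on the first line.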

WARNING: The code below creates new files and deletes files. Make sure you pay attention to this to avoid losing your own files. You do keep your bib-file under version control, right? ;)

from bibtexparser.bparser import BibTexParser
import textwrap

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

# field, format, wrap or not
field_order = [(u'author', '{{{0}}},\n', True),
               (u'title', '{{{0}}},\n', True),
               (u'journal','"{0}",\n', True),
               (u'volume','{0},\n', True),
               (u'number', '{0},\n', True),
               (u'pages', '{{{0}}},\n', True),
               (u'year', '{0},\n', True),
               (u'doi','{{{0}}},\n', False),
               (u'url','{{\url{{{0}}}}},\n', False),
               (u'link','{{\url{{{0}}}}},\n', False)]

# pick an entry, this time second to last one
entry = entries[-2]
keys = set(entry.keys())

extra_fields = keys.difference([f[0] for f in field_order])

# we do not want these in our entry, they go in the "header"
extra_fields.remove('type')
extra_fields.remove('id')

# Now build up our entry string
s = '@{type}{{{id},\n'.format(type=entry['type'].upper(),
                              id=entry['id'])

# Now handle the ordered fields, then the extra fields
for field, fmt, wrap in field_order:
    if field in entry:
        s1 = '  {0} ='.format(field.upper())
        s2 = fmt.format(entry[field])
        s3 = '{0:17s}{1}'.format(s1, s2)
        if wrap:
            # fill seems to remove trailing '\n'
            s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
        s += s3  

for field in extra_fields:
    if field in entry:
        s1 = '  {0} ='.format(field.upper())
        s2 = entry[field]
        s3 = '{0:17s}{{{1}}}'.format(s1, s2)
        s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
        s += s3  

s += '}\n\n'

print s
@ARTICLE{yang-2008-anatas-tio2,
  AUTHOR =       {Yang, H. G. and Sun, C. H. and Qiao, S. Z. and Zou,
                  J. and Liu, G. and Smith, S. C. and Cheng, H. M. and
                  Lu, G. Q.},
  TITLE =        {Anatase \ce{TiO_2} single crystals with a large
                  percentage of reactive facets},
  JOURNAL =      "Nature",
  VOLUME =       453,
  NUMBER =       7195,
  PAGES =        {638-U4},
  YEAR =         2008,
  DOI =          {10.1038/nature06964},
  LINK =         {\url{http://www.nature.com/nature/journal/v453/n7195/pdf/nature06964.pdf}},
  KEYWORD =      {TOTAL-ENERGY CALCULATIONS WAVE BASIS-SET
                  HYDROTHERMAL CONDITIONS TITANIUM-DIOXIDE SURFACE
                  OXIDE NANOSTRUCTURES NANOPARTICLES NANOCRYSTALS
                  EFFICIENCY}
}

That looks pretty good. Now, we are ready to try the whole file. We simply loop through all the entries, and append the string to a file for each entry.

from bibtexparser.bparser import BibTexParser
import os, textwrap

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

# field, format, wrap or not
field_order = [(u'author', '{{{0}}},\n', True),
               (u'title', '{{{0}}},\n', True),
               (u'journal','"{0}",\n', True),
               (u'volume','{{{0}}},\n', True),
               (u'number', '{{{0}}},\n', True),
               (u'pages', '{{{0}}},\n', True),
               (u'year', '{0},\n', True),
               (u'doi','{{{0}}},\n', False),
               (u'url','{{\url{{{0}}}}},\n', False),
               (u'link','{{\url{{{0}}}}},\n', False)]

# rm file if it exists. this is a new file, not our bibliography!
if os.path.exists('bib.bib'): os.unlink('bib.bib')

for entry in entries:
    
    keys = set(entry.keys())

    extra_fields = keys.difference([f[0] for f in field_order])
    # we do not want these in our entry
    extra_fields.remove('type')
    extra_fields.remove('id')

    # Now build up our entry string
    s = '@{type}{{{id},\n'.format(type=entry['type'].upper(),
                                  id=entry['id'])

    for field, fmt, wrap in field_order:
        if field in entry:
            s1 = '  {0} ='.format(field.upper())
            s2 = fmt.format(entry[field])
            s3 = '{0:17s}{1}'.format(s1, s2)
            if wrap:
                # fill seems to remove trailing '\n'
                s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
            s += s3  

    for field in extra_fields:
        if field in entry:
            s1 = '  {0} ='.format(field.upper())
            s2 = entry[field]
            s3 = '{0:17s}{{{1}}}'.format(s1, s2)
            s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
            s += s3  

    s += '}\n\n'

    with open('bib.bib', 'a') as f:
        f.write(s)

This results in bib.bib with 100 entries, which according to emacs is a syntactically correct bibtex file, and which builds this bibliography, bib.pdf, which also has 100 entries. That usually means everything is in order (minor intention of pun there). More importantly, the fields are ordered the way I want them!

Getting to this point was an iterative process. You will want to make sure the original bib file is under version control or backed up some way, in case something happens during this transformation!

Finding bibtex entries with no downloaded pdf

| categories: bibtex | tags:

We use bibtex for bibliography management in our group. Almost every journal provides a utility to download bibtex entries, and you can pretty easily download bibtex entries from citeulike. It doesn't take too long though before you have a few hundred entries. You need some tools to interact with that database.

Bibtex-mode in Emacs provides some tools for working with your bibtex files. For example, you can run (bibtex-validate) to check if entries are correct, and (bibtex-sort-buffer) to sort them by key.

I have a specific workflow for entering new entries. This is what I prefer to do:

  1. Go to journal, get bibtex entry, paste into bibtex file.
  2. Delete the key that is used, if any.
  3. Type C-c C-c to autogenerate a key in my style.
  4. Copy the key, download the pdf, and save the pdf as (format "%s.pdf" key) in my pdfs directory.
  5. Make an entry in a notes file for that reference. These entries are initially tagged as TODO to remind me to organize them.

Doing this has some payoffs; my org-mode cite links can open either the bibtex entry, or the pdf file directly from the org-file! The notes file is also an org-file, which I can organize as I see fit.

Sometimes I am lazy, and do not get all these steps done, especially the pdf download step. I like to have local copies of the pdf files so I can read them even if I am offline, and because I often annotate them using a tablet PC. It also makes it easy to send them to my students if I need to. Periodically, I like to go through my bibtex database to do some maintenance: download missing files, add missing notes entries, and so on. The problem is, how do I know which entries are missing a downloaded pdf or a notes entry? It is not that difficult with a bit of elisp.

(find-file "~/Dropbox/bibliography/references.bib")
(bibtex-map-entries (lambda (bibtex-key start end) 
                      (let ((type  (cdr (car (bibtex-parse-entry)))))                        
                        (unless (file-exists-p 
                                 (format "~/Dropbox/bibliography/bibtex-pdfs/%s.pdf" bibtex-key))
                          (princ (format "%10s:  cite:%s has no pdf\n" type bibtex-key))))))
      Book:  cite:ambrose-2010-how-learn-works has no pdf
   article:  cite:gerken-2010-fluor-modul has no pdf
      Book:  cite:gray-1973-chemic-bonds has no pdf
   ARTICLE:  cite:kitchin-2003-tio2 has no pdf
   ARTICLE:  cite:kitchin-2012-prefac has no pdf
      Book:  cite:kittel-2005-introd-solid has no pdf
   ARTICLE:  cite:mccormick-2003-tio2-pd has no pdf
   ARTICLE:  cite:mhadeshwar-2004-nh3-ru has no pdf
      Misc:  cite:ni-website has no pdf
   ARTICLE:  cite:norskov-2006-respon has no pdf
      Book:  cite:reif-1965-fundam-statis has no pdf
   article:  cite:risch-2012-water-oxidat has no pdf
   ARTICLE:  cite:shultz-1995-prepar-and has no pdf
   ARTICLE:  cite:shultz-1997-prepar has no pdf
   ARTICLE:  cite:song-2002-h3pw1 has no pdf

Using that list, I can click on those links, which takes me to the entry in the file. That entry probably has a url or doi that makes it easy to navigate to the journal page where I can download the pdf file. You could improve on the code above by reporting only articles, for example.
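The same check is easy to sketch outside of Emacs. The Python 3 function below is only an illustration under the naming convention described above (each pdf is saved as the key plus ".pdf"). The function name `keys_without_pdf`, the keys, and the temporary directory are all made up for the demo, which runs in a throwaway directory so that no real files are touched.

```python
import os
import tempfile


def keys_without_pdf(keys, pdf_dir):
    """Return the keys for which pdf_dir/<key>.pdf does not exist."""
    return [key for key in keys
            if not os.path.exists(os.path.join(pdf_dir, key + ".pdf"))]


# Demo with hypothetical keys in a temporary directory.
with tempfile.TemporaryDirectory() as pdf_dir:
    # Create an empty file standing in for a downloaded pdf.
    open(os.path.join(pdf_dir, "key-with-pdf.pdf"), "w").close()
    missing = keys_without_pdf(["key-with-pdf", "key-without-pdf"], pdf_dir)

print(missing)
```

The list of keys would come from whatever parses the bibtex file. Here it is simply typed in.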

Finding bibtex entries with non-ascii characters

| categories: bibtex | tags:

I have found that some journals cannot handle bibtex entries with non-ascii characters in them. Unfortunately, when you paste bibtex entries into your reference file from the web, there are often non-ascii characters in them. Emacs usually shows those characters just fine, so it is difficult to find them. Here is a little recipe to go through each entry to find entries with non-ascii characters. These include accented characters, Greek letters, degree symbols, dashes, fancy quotes, and so on. Since they are hard to see by eye, we can let Emacs find them for us, and then replace them with the corresponding ascii LaTeX commands.

I found a function to find non-ascii characters here: http://www.emacswiki.org/emacs/FindingNonAsciiCharacters . Now, we use a modified version of this on each entry in a bibtex file. If we find a character, we will print an org-mode link to make it easy to get right to the entry.

(defun contains-non-ascii-char-p ()
  "tests if buffer contains non-ascii character"
  (interactive)
  (let (point)
    (save-excursion
      (setq point
            (catch 'non-ascii
              (while (not (eobp))
                (or (eq (char-charset (following-char))
                        'ascii)
                    (throw 'non-ascii (point)))
                (forward-char 1)))))
    (if point
        (goto-char point)
      nil)))


(find-file "~/Dropbox/bibliography/references.bib")
(bibtex-map-entries (lambda (bibtex-key start end)                        
                      (save-restriction
                        ;; narrow so we only look at this entry. save-restriction will rewiden
                        (bibtex-narrow-to-entry)
                        (when (contains-non-ascii-char-p) (princ (format "cite:%s" bibtex-key))))))
cite:suntivich-2011-perov-oxide

You can see I only had one reference in that file with a non-ascii character. I think it is best practice to replace these with pure LaTeX commands. See http://en.wikibooks.org/wiki/LaTeX/Special_Characters for a good reference on what commands are used for the accented characters.
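For comparison, here is a minimal Python 3 sketch of the same test on a string instead of an Emacs buffer. The function name `non_ascii_chars` and the sample text are made up for illustration. The sample is written with escape codes so that the two offending characters (a degree sign and an angstrom sign) are easy to spot in the source.

```python
def non_ascii_chars(text):
    """Return (offset, character) pairs for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# Hypothetical field content containing a degree sign and an angstrom sign.
sample = "T = 25 \u00b0C, d = 2 \u00c5"

for offset, ch in non_ascii_chars(sample):
    print(offset, repr(ch), hex(ord(ch)))
```

An empty result means the text is pure ascii. The offsets tell you where to put the LaTeX replacement.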
