Extracting bibtex file from an org-buffer

| categories: org-mode, bibtex | tags:


We use citation links a lot in our org-files, like this: cite:thompson-2014-co2-react. Sometimes there are multiple citations, like this: cite:mehta-2014-ident-poten,hallenbeck-2013-effec-o2. It would be convenient at times to extract a bibtex file from these citations, so that we could easily share files. RefTeX can do this from a LaTeX file, and org makes it easy to export to LaTeX, so this seems like it should be easy. It would be, if I always put the bibliography link in the file. I usually do not, so we check whether the link is there, and if it is not, we add the bibliography to the end before we export. Then, with the LaTeX file in hand, we open it and call the RefTeX functions to create the bibtex file. Finally, we insert the resulting bibtex entries in a block at the end of the org file.

Here is a function that does the extraction and some housekeeping. We copy the contents of the buffer into a temporary file, so that we do not accidentally clobber a tex or bibtex file in the current directory.

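(require 'ox-latex)
(require 'reftex)

(defun kg-extract-bibtex ()
  "Create a bibtex file of the entries cited in this buffer.
The entries are also inserted in an example block at the end of the buffer."
  (interactive)
  ;; The .org suffix makes the temporary buffer open in org-mode.
  (let* ((tempname (make-temp-file "extract-bib" nil ".org"))
         (contents (buffer-string))
         (cb (current-buffer))
         ;; absolute paths, so later steps do not depend on the current directory
         (basename (file-name-sans-extension tempname))
         (texfile (concat basename ".tex"))
         (bibfile (concat basename ".bib"))
         tempbuf results)

    ;; Work on a copy so no existing tex or bib file gets clobbered.
    (find-file tempname)
    (setq tempbuf (current-buffer))
    (insert contents)

    ;; Add a bibliography link at the end if the buffer has none.
    (save-excursion
      (goto-char (point-min))
      (unless (re-search-forward "^bibliography:" (point-max) 'end)
        (insert (format "\nbibliography:%s"
                        (mapconcat 'identity reftex-default-bibliography ",")))))

    ;; Export to LaTeX, then let RefTeX collect the cited entries.
    (org-latex-export-to-latex)
    (find-file texfile)
    (reftex-parse-all)
    ;; This leaves the newly written bib file as the current buffer.
    (reftex-create-bibtex-file bibfile)
    (setq results (buffer-string))

    ;; Clean up the temporary buffers and files.
    (kill-buffer (get-file-buffer bibfile))
    (kill-buffer (get-file-buffer texfile))
    (with-current-buffer tempbuf (set-buffer-modified-p nil))
    (kill-buffer tempbuf)
    (delete-file texfile)
    (delete-file tempname)

    (switch-to-buffer cb)
    (save-excursion
      (goto-char (point-max))
      (insert (format "

** Bibtex entries

#+BEGIN_EXAMPLE
%s
#+END_EXAMPLE" results)))))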

(kg-extract-bibtex)

There it is! The example block does not render in HTML very well; it shows up as plain text. It looks fine in the org file though.

It might be a good idea to replace the bibliography line with the new file, but I will leave that as an exercise for later.
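The first step of that process, finding which keys a buffer cites and making sure a bibliography: link is present, can also be sketched outside Emacs. The Python below is a minimal sketch, not what RefTeX does internally. It assumes cite: links are written as in this post (comma-separated keys made of letters, digits, `_`, `-` and `:`), and the names `cited_keys` and `ensure_bibliography`, as well as the file `references.bib`, are hypothetical.

```python
import re

# Assumed link syntax: "cite:" followed by comma-separated keys made of
# word characters, hyphens and colons, e.g. cite:key1,key2
CITE_RE = re.compile(r'cite:([\w:-]+(?:,[\w:-]+)*)')


def cited_keys(org_text):
    """Return the cited keys in order of first appearance, without repeats."""
    keys = []
    for match in CITE_RE.finditer(org_text):
        for key in match.group(1).split(','):
            if key not in keys:
                keys.append(key)
    return keys


def ensure_bibliography(org_text, bibfiles):
    """Append a bibliography: link, like the elisp does, if none exists."""
    if re.search(r'^bibliography:', org_text, flags=re.MULTILINE):
        return org_text
    return org_text + '\nbibliography:' + ','.join(bibfiles)


text = ('We use cite:thompson-2014-co2-react. Sometimes there are several, '
        'like cite:mehta-2014-ident-poten,hallenbeck-2013-effec-o2.')
print(cited_keys(text))
print(ensure_bibliography(text, ['references.bib']).splitlines()[-1])
```

The regular expression stops at the first character that cannot be part of a key, so the period after a link is not swallowed into the key.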

1 Bibtex entries

#+BEGIN_EXAMPLE:
@article{hallenbeck-2013-effec-o2,
  author = "Hallenbeck, Alexander P. and Kitchin, John R.",
  title = "Effects of \ce{O_2} and \ce{SO_2} on the Capture Capacity of a Primary-Amine Based Polymeric \ce{CO_2} Sorbent",
  year = 2013,
  doi = "10.1021/ie400582a",
  eprint = "http://pubs.acs.org/doi/pdf/10.1021/ie400582a",
  journal = "Industrial \& Engineering Chemistry Research",
  pages = "10788-10794",
  url = "http://pubs.acs.org/doi/abs/10.1021/ie400582a",
}

@article{mehta-2014-ident-poten,
  author = {Mehta, Prateek and Salvador, Paul A. and Kitchin, John R.},
  title = {Identifying Potential BO2 Oxide Polymorphs for Epitaxial Growth Candidates},
  journal = {ACS Applied Materials \& Interfaces},
  volume = 0,
  number = 0,
  pages = {null},
  year = 2014,
  doi = {10.1021/am4059149},
  URL = {http://pubs.acs.org/doi/abs/10.1021/am4059149},
  eprint = {http://pubs.acs.org/doi/pdf/10.1021/am4059149}
}

@Article{thompson-2014-co2-react,
  author = {Thompson, Robert L. and Albenze, Erik and Shi, Wei and Hopkinson, David and Damodaran, Krishnan and Lee, Anita and Kitchin, John and Luebke, David Richard and Nulwala, Hunaid},
  title = {\ce{CO_2} Reactive Ionic Liquids: Effects of functional groups on the anion and its influence on the physical properties},
  journal = {RSC Adv.},
  year = 2014,
  pages = "-",
  publisher = {The Royal Society of Chemistry},
  doi = {10.1039/C3RA47097K},
  url = {https://doi.org/10.1039/C3RA47097K},
  abstract = "Next generation of gas separation materials are needed to alleviate issues faced in energy and environmental area. Ionic liquids (ILs) are promising class of material for CO2 separations. In this work{,} CO2 reactive triazolides ILs were synthesized and characterized with the aim of developing deeper understanding on how structural changes affect the overall properties for CO2 separation. Important insights were gained illustrating the effects of substituents on the anion. It was found that substituents play a crucial role in dictating the overall physical properties of reactive ionic liquids. Depending upon the electronic and steric nature of the substituent{,} CO2 capacities between 0.07-0.4 mol CO2/mol IL were observed. Detailed spectroscopic{,} CO2 absorption{,} rheological{,} and simulation studies were carried out to understand the nature and influence of these substituents. The effect of water content was also evaluated{,} and it was found that water had an unexpected impact on the properties of these materials{,} resulting in an increased viscosity{,} but little change in the CO2 reactivity."
}
#+END_EXAMPLE

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

Org-mode version = 8.2.5h

yasnippets for jasp, ase and python

| categories: jasp, emacs, ase | tags:

In using jasp (http://github.com/jkitchin/jasp) for calculations, I find there are lots of small python phrases I use over and over. Today I will examine using yasnippet to save time and keystrokes. yasnippet is a template expansion module: you type a small set of characters, press Tab, and the characters "expand" to the full text. It is pretty sophisticated, and allows you to define "tab-stops" that you fill in interactively, tabbing between them like filling in a form.
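To make the idea of tab-stops concrete, here is a toy Python sketch of what filling them in amounts to. It is not how yasnippet is implemented: it only handles bare `$N` fields and `${N:default}` fields, with no nesting or embedded elisp, and the `expand` function is a hypothetical name.

```python
import re

# Matches ${N:default} (groups 1 and 2) or a bare $N (group 3).
FIELD_RE = re.compile(r'\$\{(\d+):([^}]*)\}|\$(\d+)')


def expand(template, values):
    """Fill numbered fields from `values`; unfilled fields use their default.

    A bare $N with no value becomes an empty string ($0 is where yasnippet
    leaves the cursor when you are done).
    """
    def fill(match):
        if match.group(1) is not None:
            return values.get(int(match.group(1)), match.group(2))
        return values.get(int(match.group(3)), '')
    return FIELD_RE.sub(fill, template)


snippet = "write('$1.png', ${2:atoms}, show_unit_cell=${3:2})"
print(expand(snippet, {1: 'slab'}))  # write('slab.png', atoms, show_unit_cell=2)
```

In yasnippet you type each value at its tab-stop instead of passing a dictionary, but the end result for this template is the same.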

All the snippets are defined in the Appendix.

1 Tangle the snippets, and add them to yasnippet

Each snippet definition belongs in its own file in a directory. The main directory is called "snippets". Since I anticipate using these snippets in org-mode, each snippet is defined in a directory within snippets called "org-mode". I also want to use the snippets in python mode, so we create a python-mode directory too. We do not have to duplicate the snippets: we create a file called .yas-parents in the python-mode directory, with one line in it containing "org-mode", so python-mode inherits the org-mode snippets. First, we make the directories.

mkdir -p snippets/org-mode
mkdir -p snippets/python-mode
echo "org-mode" > snippets/python-mode/.yas-parents

Each snippet is defined in a src block with a :tangle header, so we can extract them all with one command.

(org-babel-tangle)
snippets/org-mode/iase snippets/org-mode/imp snippets/org-mode/inp snippets/org-mode/ij snippets/org-mode/pl snippets/org-mode/pyl snippets/org-mode/pxl snippets/org-mode/pp snippets/org-mode/npa snippets/org-mode/awt snippets/org-mode/avw snippets/org-mode/agf snippets/org-mode/ape snippets/org-mode/atms snippets/org-mode/atm snippets/org-mode/cga snippets/org-mode/cc snippets/org-mode/wjn snippets/org-mode/wjl
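If you would rather not tangle, the same directory layout can be produced from a script. This is a sketch of an alternative, not the workflow used in this post: it writes each snippet with the `# -*- mode: snippet -*-` / `# --` header used in the Appendix, adds an explicit `# key:` directive (an assumption, so the trigger does not depend on the file name), and writes the `.yas-parents` file. The function `write_snippets` and its arguments are hypothetical.

```python
import os
import tempfile

# Header lines mirror the Appendix; "# key:" is added explicitly (assumption).
HEADER = '# -*- mode: snippet -*-\n# key: {key}\n# --\n'


def write_snippets(root, snippets, mode='org-mode', children=('python-mode',)):
    """Write one file per snippet under root/mode, plus .yas-parents files.

    `snippets` maps a key such as 'cc' to its template body.
    Each mode listed in `children` inherits the snippets of `mode`.
    """
    mode_dir = os.path.join(root, mode)
    os.makedirs(mode_dir, exist_ok=True)
    for key, body in snippets.items():
        with open(os.path.join(mode_dir, key), 'w') as f:
            f.write(HEADER.format(key=key) + body)
    for child in children:
        child_dir = os.path.join(root, child)
        os.makedirs(child_dir, exist_ok=True)
        with open(os.path.join(child_dir, '.yas-parents'), 'w') as f:
            f.write(mode + '\n')


# Write into a throwaway directory rather than a real snippets folder.
root = tempfile.mkdtemp()
write_snippets(root, {'cc': 'calc.calculate(atoms)',
                      'cga': 'atoms = calc.get_atoms()'})
print(sorted(os.listdir(os.path.join(root, 'org-mode'))))  # ['cc', 'cga']
```

Pointing `root` at your own snippets directory would overwrite files with the same names there, so try it in a scratch directory first.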

We also need to add our new directory to yasnippet. This is done by adding the directory to the yas-snippet-dirs variable. You could add this to your init.el file to make these snippets permanently available.

(add-to-list 'yas-snippet-dirs "c:/Users/jkitchin/Dropbox/blogofile-jkitchin.github.com/_blog/snippets")
c:/Users/jkitchin/Dropbox/blogofile-jkitchin.github.com/_blog/snippets ~/.emacs.d/snippets c:/users/jkitchin/Dropbox/kitchingroup/jmax/elpa/yasnippet-20140106.1009/snippets

Finally, we reload all the snippet definitions, so our new definitions are ready to use.

(yas-reload-all)
[yas] Reloaded everything (snippets will load just-in-time)... (some errors, check *Messages*).

Alternatively, you might just load this directory.

(yas-load-directory "./snippets")

2 Using the snippets

Each of these snippets is for a python phrase, but I usually write my python blocks in org-mode. You would use these by typing the shortcut name, and then pressing tab. Below I show what each shortcut expands to.

wjl → with jasp('') as calc:

wjn → with jasp('',) as calc:
          calc.calculate(atoms)

cc → calc.calculate(atoms)

cga → atoms = calc.get_atoms()

atm → Atom('', )

atms → atoms = Atoms([], cell=)

ape → atoms.get_potential_energy()

agf → atoms.get_forces()

avw → from ase.visualize import view
      view(atoms)

awt → from ase.io import write
      write('.png', atoms, show_unit_cell=2)

npa → np.array()

pp → plt.plot(, )

pxl → plt.xlabel()

pyl → plt.ylabel()

pl → plt.legend()

ij → from jasp import *

inp → import numpy as np

imp → import matplotlib.pyplot as plt

iase → from ase import Atom, Atoms

What other snippets would be handy?

3 Appendix

3.1 jasp snippets

# -*- mode: snippet -*-
# --
with jasp('$1') as calc:
    $0

# -*- mode: snippet -*-
# --
with jasp('$1',$0) as calc:
    calc.calculate(atoms)

# -*- mode: snippet -*-
# --
calc.calculate(atoms)

# -*- mode: snippet -*-
# --
atoms = calc.get_atoms()

3.2 ase snippets

Template for an ase.Atom

# -*- mode: snippet -*-
# --
Atom('$1', $2)

# -*- mode: snippet -*-
# --
atoms = Atoms([$1], cell=$2)

# -*- mode: snippet -*-
# --
atoms.get_potential_energy()

# -*- mode: snippet -*-
# --
atoms.get_forces()

# -*- mode: snippet -*-
# --
from ase.visualize import view
view(${1:atoms})

# -*- mode: snippet -*-
# --
from ase.io import write
write('$1.png', ${2:atoms}, show_unit_cell=${3:2})

3.3 python snippets

# -*- mode: snippet -*-
# --
import numpy as np

# -*- mode: snippet -*-
# --
import matplotlib.pyplot as plt

# -*- mode: snippet -*-
# --
from ase import Atom, Atoms

# -*- mode: snippet -*-
# --
np.array($0)

# -*- mode: snippet -*-
# --
plt.plot($1, $2)

# -*- mode: snippet -*-
# --
plt.xlabel($1)

# -*- mode: snippet -*-
# --
plt.ylabel($1)

# -*- mode: snippet -*-
# --
plt.legend($1)

# -*- mode: snippet -*-
# --
from jasp import *


A dynamic snippet for a task due 7 days from now

| categories: org-mode, emacs | tags:

I have been playing with yasnippets. A pretty cool feature is that you can run elisp code in the template to generate text. Below, I define a snippet that will create a todo item due 7 days from the time you define it. This is an unconventional way to define a snippet, but I did not want to save it to a file just to try it out. So, I put it in a temporary buffer, and load it from there. When you run this block, it will note it is a new snippet, and ask if you want to save it. You can say no.

We reuse code from an earlier post to create a timestamp from the current time plus seven days.
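The elisp in the template adds seven days' worth of seconds to current-time and formats the result as an org date stamp. For comparison, here is a minimal Python sketch of the same date arithmetic; `deadline_stamp` is a hypothetical name, and adding whole days to a date (rather than seconds to a time) sidesteps daylight-saving edge cases.

```python
import datetime


def deadline_stamp(days, start=None):
    """Return an org date stamp such as <2014-02-23 Sun>, `days` after `start`."""
    if start is None:
        start = datetime.date.today()
    due = start + datetime.timedelta(days=days)
    # %a is the abbreviated weekday name in the current locale
    return due.strftime('<%Y-%m-%d %a>')


print(deadline_stamp(7))
```

For example, `deadline_stamp(7, datetime.date(2014, 2, 16))` gives `<2014-02-23 Sun>` in the default C locale, and passing 10 instead of 7 gives `<2014-02-26 Wed>`.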

(yas-global-mode)
(with-temp-buffer
  (insert "# name : todo-followup
# --

*************** TODO $1
${2:             DEADLINE: `(let ((seven-days (seconds-to-time (* 7 24 60 60))))
  (format-time-string \"<%Y-%m-%d %a>\" (time-add (current-time) seven-days)))`}$0
*************** END 
")
  (yas-load-snippet-buffer-and-close 'org-mode))

Now you will have a new entry in the YASnippet menu called todo-followup. If you put the cursor on a blank line and select that entry, you get the text below (after you fill in the text for the headline, of course!):

*************** TODO see how many times this was viewed
		DEADLINE: <2014-02-23 Sun>
*************** END

That is pretty nice, as it saves a lot of keystrokes for that particular kind of task. Let us up the ante, and see if we can make it interactive so you can enter the number of days from now the task is due.

(yas-global-mode)
(with-temp-buffer
  (insert "# name : todo-followup
# --

*************** TODO $1
${2:             DEADLINE: `(let ((ndays (seconds-to-time (* (string-to-number (read-from-minibuffer \"Days until due: \")) 24 60 60))))
  (format-time-string \"<%Y-%m-%d %a>\" (time-add (current-time) ndays)))`}$0
*************** END 
")
  (yas-load-snippet-buffer-and-close 'org-mode))
*************** TODO sweet!
		DEADLINE: <2014-02-26 Wed>
*************** END

Well, that made it just a bit sweeter! I was prompted for the "Days until due:", entered 10 days, and a date 10 days from now was automatically entered!


Merging bibtex files and avoiding duplicates

| categories: bibtex | tags:

I usually advocate having a master bibtex file with all entries in it. Emacs helps you avoid duplicate entries as you add them. Sometimes, though, you have more than one bibtex file: maybe you started one for a new project, or someone sent you one. Either way, you want to merge the files into one file, and bibtex requires each entry to have a unique key.

Let us begin. I have two bibtex files I exported from EndNote. I have already removed all the non-ascii characters and cleaned them up pretty well. We start with some analysis.

from bibtexparser.bparser import BibTexParser
with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries1 = bp.get_entry_list()

print '{0} entries in file 1'.format(len(entries1))

with open('../../CMU/proposals/link-to-2014/perovskite-strain/perovskite-strain.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries2 = bp.get_entry_list()

print '{0} entries in file 2'.format(len(entries2))
100 entries in file 1
129 entries in file 2

Now, let us see how many duplicates there are. Sets make this easy.

# store keys to check for duplicates
from bibtexparser.bparser import BibTexParser
with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries1 = bp.get_entry_list()

with open('../../CMU/proposals/link-to-2014/perovskite-strain/perovskite-strain.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries2 = bp.get_entry_list()

entry1_keys = set([entry['id'] for entry in entries1])
entry2_keys = set([entry['id'] for entry in entries2])

duplicates = entry1_keys & entry2_keys
print 'There are {0} duplicates'.format(len(duplicates))
print duplicates
There are 20 duplicates
set(['nolan-2008-vacan-co', 'giocondi-2001-spatial', 'giocondi-2001-spatial-batio3', 'wang-2006-oxidat-gga', 'piskunov-2008-elect-lamno3', 'pala-2007-modif-oxidat', 'chretien-2006-densit-funct', 'giocondi-2008-sr2nb-batio3', 'kushima-2010-compet-lacoo3', 'pala-2009-co-ti', 'giocondi-2007-srtio3', 'lee-2009-ab-labo3', 'balasubramanian-2005-epitax-phase', 'mastrikov-2010-pathw-oxygen', 'shapovalov-2007-catal', 'evarestov-2005-compar-lcao', 'choi-2007-comput-study', 'havelia-2009-nucleat-growt', 'lee-2009-ab-defec', 'lee-2009-predic-surfac'])

Ok, now we make a function to format each entry. We take the formatting code from the post on sorting fields in bibtex entries and turn it into a function. Then we write all the entries from the first file, followed by the entries from the second file whose keys are not among the keys of the first file.

from bibtexparser.bparser import BibTexParser
import os, textwrap

def format_bibtex_entry(entry):
    # field, format, wrap or not
    field_order = [(u'author', '{{{0}}},\n', True),
                   (u'title', '{{{0}}},\n', True),
                   (u'journal','"{0}",\n', True),
                   (u'volume','{{{0}}},\n', True),
                   (u'number', '{{{0}}},\n', True),
                   (u'pages', '{{{0}}},\n', True),
                   (u'year', '{0},\n', True),
                   (u'doi','{{{0}}},\n', False),
                   (u'url','{{\\url{{{0}}}}},\n', False),
                   (u'link','{{\\url{{{0}}}}},\n', False)]
    
    keys = set(entry.keys())

    extra_fields = keys.difference([f[0] for f in field_order])
    # we do not want these in our entry
    extra_fields.remove('type')
    extra_fields.remove('id')

    # Now build up our entry string
    s = '@{type}{{{id},\n'.format(type=entry['type'].upper(),
                                  id=entry['id'])

    for field, fmt, wrap in field_order:
        if field in entry:
            s1 = '  {0} ='.format(field.upper())
            s2 = fmt.format(entry[field])
            s3 = '{0:17s}{1}'.format(s1, s2)
            if wrap:
                # fill seems to remove trailing '\n'
                s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
            s += s3  

    for field in extra_fields:
        if field in entry:
            s1 = '  {0} ='.format(field.upper())
            s2 = entry[field]
            s3 = '{0:17s}{{{1}}}'.format(s1, s2)
            s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
            s += s3  

    s += '}\n\n'
    return s

if os.path.exists('merged.bib'): os.unlink('merged.bib')    

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries1 = bp.get_entry_list()

for entry in entries1:
    with open('merged.bib', 'a') as f:
        f.write(format_bibtex_entry(entry))

# store keys to check for duplicates
entry1_keys = [entry['id'] for entry in entries1]

with open('../../CMU/proposals/link-to-2014/perovskite-strain/perovskite-strain.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries2 = bp.get_entry_list()

for entry in entries2:
    if not entry['id'] in entry1_keys:
        with open('merged.bib', 'a') as f:
            f.write(format_bibtex_entry(entry))

Here is the merged file, merged.bib, and the corresponding bibliography, merged.pdf. There are 209 entries in it, which is what we expected: 100 + 129 - 20 duplicates = 209. There are no doubt other programs that merge bibtex files, but I like this approach for the following reasons:

  1. I learned a new python module that parses bibtex files.
  2. I got my entries formatted exactly the way I wanted them.
  3. I defined what constituted a duplicate.

Of course, here we only eliminate entries with duplicate keys. If the same reference appears under different keys, both copies end up in the merged file. This is a very hard problem to get right, since there are many possible ways to abbreviate author names and journal names, and multiple ways to write the title. It is best solved by generating keys systematically, which minimizes the chance that the same reference gets two different keys in the first place.
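The merge above writes straight to a file. The same keep-the-first-copy rule can be sketched as a pure function that also reports what it skipped. This is a sketch under assumptions: entries are dicts with an 'id' key, as returned by the bibtexparser version used in this post (newer releases call it 'ID'); `merge_entries` is a hypothetical name; the keys and DOIs in the usage are made up; and the DOI comparison at the end is an extra heuristic, not part of the method in this post, that only catches one paper under two keys when both copies carry the same DOI.

```python
def merge_entries(first, second):
    """Merge two lists of entry dicts, keeping the first copy of each key.

    Returns (merged, skipped_keys, shared_dois), where shared_dois lists
    pairs of different keys whose entries carry the same DOI.
    """
    merged = list(first)
    seen = set(entry['id'] for entry in first)
    skipped = []
    for entry in second:
        if entry['id'] in seen:
            skipped.append(entry['id'])
        else:
            seen.add(entry['id'])
            merged.append(entry)

    # Heuristic (assumption): DOIs are case-insensitive, so compare them
    # lower-cased to spot the same paper stored under two different keys.
    by_doi = {}
    shared_dois = []
    for entry in merged:
        doi = entry.get('doi', '').strip().lower()
        if not doi:
            continue
        if doi in by_doi:
            shared_dois.append((by_doi[doi], entry['id']))
        else:
            by_doi[doi] = entry['id']
    return merged, skipped, shared_dois


# Hypothetical entries: key1 is duplicated by key, key3 duplicates key1 by DOI.
a = [{'id': 'key1', 'doi': '10.1000/ABC'}, {'id': 'key2'}]
b = [{'id': 'key1'}, {'id': 'key3', 'doi': '10.1000/abc'}]
merged, skipped, shared = merge_entries(a, b)
print(len(merged), skipped, shared)
```

Flagged pairs are only candidates; deciding which copy to keep still needs a human look.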


Sorting fields in bibtex entries

| categories: bibtex | tags:

I like consistency. In particular, for bibtex entries, I would like all the fields to be in the same order, and in all caps. Why? Because then I know where to look, and incorrect entries stand out more easily. My current bibtex file does not look like this! That is a result of adding bibtex entries from various journals, which all have different conventions. Today, I am going to look at a way to achieve what I want.

The principal idea is that we parse the bibtex file into a list of entries represented by a convenient data structure. Then, we format each entry the way we want, and print the result to a new file. I will use bibtexparser and python to do this.

Let us examine what bibtexparser does for us. Here we read in a file and get the entries. Each entry is represented as a dictionary.

from bibtexparser.bparser import BibTexParser

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

# look at the first entry
print entries[0]
{u'title': u'Effect of growth conditions on formation of TiO2-II\nthin films in atomic layer deposition process', u'journal': u'Journal of Crystal Growth', u'author': u'Aarik, J. and Aidla, A. and Sammelselg, V. and\nUustare, T.', u'number': u'3', 'id': 'aarik-1997-effec-tio2', u'volume': u'181', u'link': u'<Go to ISI>://A1997YD52700011', u'year': u'1997', 'type': u'article', u'pages': u'259-264'}

Let us take a moment to analyze our bibtex file, and see how many types of entries we have. That gives us a chance to practice counting.

from bibtexparser.bparser import BibTexParser

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

types = [entry['type'] for entry in entries]
print dict((typ, types.count(typ)) for typ in types)
{u'inbook': 2, u'article': 90, u'book': 4, u'misc': 3, u'phdthesis': 1}
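Calling `types.count(typ)` rescans the whole list once for every entry. `collections.Counter` from the standard library does the same tally in a single pass. A sketch with a small hypothetical list standing in for `bp.get_entry_list()`; only the 'type' key matters here.

```python
from collections import Counter

# Hypothetical stand-in for the parsed entries.
entries = [{'type': 'article'}, {'type': 'book'}, {'type': 'article'},
           {'type': 'misc'}, {'type': 'article'}]

type_counts = Counter(entry['type'] for entry in entries)
print(type_counts.most_common())
```

A `Counter` behaves like a dictionary, so `type_counts['article']` gives the count for one type, and `most_common()` lists the types sorted by frequency.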

That is a lot of entries to reformat by hand. Here is the order I would like the fields to be in for articles. A similar order for the other types would be fine too.

AUTHOR
TITLE
JOURNAL
VOLUME
ISSUE
PAGES
YEAR
DOI
URL or link
other fields

Bibtex lets you define arbitrary fields, and we do not want to lose these. In some files, for example, I have defined fields for the path to a pdf or to a notes file. We will use python sets to handle this. With sets, we can conveniently compute the difference between the fields in our ordered list and the fields in an entry. Here is an example: we have a master list of keys and an entry with extra keys, and the difference method gives us the set of extra keys.

entry = set(['author', 'title', 'journal', 'field1'])
master = set(['author', 'title'])

print entry.difference(master)
set(['journal', 'field1'])

So, we will write the fields in our preferred order first, and then add the rest of the keys.
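The ordering rule itself, preferred fields first and everything else after, can be sketched without any parsing. In this minimal sketch `ordered_fields` and the example entry are hypothetical, and the extra fields are sorted alphabetically so the output is reproducible, whereas iterating over a set, as the code in this post does, gives no fixed order.

```python
def ordered_fields(entry, field_order, header_keys=('id', 'type')):
    """Return the entry's bibtex fields: preferred ones first, then the rest.

    header_keys are parser-added keys that belong in the @type{id, line,
    not in the body of the entry.
    """
    preferred = [f for f in field_order if f in entry]
    extras = sorted(set(entry) - set(field_order) - set(header_keys))
    return preferred + extras


entry = {'id': 'x', 'type': 'article', 'title': 'T', 'author': 'A',
         'keyword': 'K', 'journal': 'J', 'file': 'x.pdf'}
print(ordered_fields(entry, ['author', 'title', 'journal', 'year']))
# ['author', 'title', 'journal', 'file', 'keyword']
```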

from bibtexparser.bparser import BibTexParser

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

field_order = ['author', 'title', 'journal', 'volume', 'issue' 'pages', 'year', 'doi', 'url', 'link']

entry_keys = set(entries[0].keys())
print entry_keys.difference(field_order)
set([u'number', 'id', 'type', u'pages'])

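You can see a subtlety here: pages shows up as an extra key, even though it is in our list. That is because of a missing comma in field_order: Python joins the adjacent string literals 'issue' 'pages' into the single string 'issuepages'. The entry also uses number rather than issue, so number is extra as well, and id and type are keys that bibtexparser adds for the entry header. It appears that all the keys are unicode strings except type and id, but that is not what causes the mismatch, since in Python 2 u'pages' == 'pages'. In the next block we list the fields we actually want, including number.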
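The effect of a missing comma in a list of strings like the field_order list above is easy to check in isolation. A minimal sketch with a shortened, hypothetical field list:

```python
# Adjacent string literals are joined at compile time, so a missing comma
# silently merges two list items into one.
field_order = ['volume', 'issue' 'pages', 'year']
print(field_order)             # ['volume', 'issuepages', 'year']
print('pages' in field_order)  # False

# With the comma restored, membership works as intended.
fixed_order = ['volume', 'issue', 'pages', 'year']
print('pages' in fixed_order)  # True
```

Because the joined list is still valid Python, no error is raised; the only symptom is a field that never matches.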

You should probably go ahead and remove non-ascii characters from your bib-file. We got lucky with this entry, but some entries have non-ascii characters and these cause errors.

So we need to specify the order of the fields, how they should be formatted, and whether the field contents should be wrapped into a neat block. We do that in the next block. Note that in the formats we use a double {{ to get a literal { with string formatting. The formats wrap the fields in braces or quotes as needed, and the textwrap module neatly wraps multiline fields, indenting the second line and beyond. After some iteration, this prints an entry that emacs-bibtex likes and does not need to reformat further.
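The brace escaping and the wrapping can be checked on a single field. This sketch reuses the format strings from the field_order tuples in this post; `format_field`, `BRACED` and `URL` are hypothetical names. The backslash in `\url` is doubled so that Python 3 does not read `\u` as the start of a unicode escape.

```python
import textwrap

# '{{' and '}}' produce literal braces; '{0}' is replaced by the value.
BRACED = '{{{0}}},\n'
URL = '{{\\url{{{0}}}}},\n'


def format_field(name, value, fmt=BRACED, wrap=True, width=70):
    """Format one 'NAME = {value},' line in the style used in this post."""
    line = '{0:17s}{1}'.format('  {0} ='.format(name.upper()), fmt.format(value))
    if wrap:
        # fill() turns the trailing newline into whitespace, so add it back
        line = textwrap.fill(line, subsequent_indent=' ' * 18, width=width) + '\n'
    return line


print(format_field('doi', '10.1038/nature06964', wrap=False), end='')
print(format_field('url', 'http://example.com/paper.pdf', fmt=URL, wrap=False), end='')
```

The first line prints `  DOI =          {10.1038/nature06964},`, matching the DOI line in the formatted entry shown in this post.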

WARNING: The code below creates and deletes files. Pay attention to the file names to avoid losing your own files. You do keep your bib-file under version control, right? ;)

from bibtexparser.bparser import BibTexParser
import textwrap

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

# field, format, wrap or not
field_order = [(u'author', '{{{0}}},\n', True),
               (u'title', '{{{0}}},\n', True),
               (u'journal','"{0}",\n', True),
               (u'volume','{0},\n', True),
               (u'number', '{0},\n', True),
               (u'pages', '{{{0}}},\n', True),
               (u'year', '{0},\n', True),
               (u'doi','{{{0}}},\n', False),
               (u'url','{{\\url{{{0}}}}},\n', False),
               (u'link','{{\\url{{{0}}}}},\n', False)]

# pick an entry, this time second to last one
entry = entries[-2]
keys = set(entry.keys())

extra_fields = keys.difference([f[0] for f in field_order])

# we do not want these in our entry, they go in the "header"
extra_fields.remove('type')
extra_fields.remove('id')

# Now build up our entry string
s = '@{type}{{{id},\n'.format(type=entry['type'].upper(),
                              id=entry['id'])

# Now handle the ordered fields, then the extra fields
for field, fmt, wrap in field_order:
    if field in entry:
        s1 = '  {0} ='.format(field.upper())
        s2 = fmt.format(entry[field])
        s3 = '{0:17s}{1}'.format(s1, s2)
        if wrap:
            # fill seems to remove trailing '\n'
            s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
        s += s3  

for field in extra_fields:
    if field in entry:
        s1 = '  {0} ='.format(field.upper())
        s2 = entry[field]
        s3 = '{0:17s}{{{1}}}'.format(s1, s2)
        s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
        s += s3  

s += '}\n\n'

print s
@ARTICLE{yang-2008-anatas-tio2,
  AUTHOR =       {Yang, H. G. and Sun, C. H. and Qiao, S. Z. and Zou,
                  J. and Liu, G. and Smith, S. C. and Cheng, H. M. and
                  Lu, G. Q.},
  TITLE =        {Anatase \ce{TiO_2} single crystals with a large
                  percentage of reactive facets},
  JOURNAL =      "Nature",
  VOLUME =       453,
  NUMBER =       7195,
  PAGES =        {638-U4},
  YEAR =         2008,
  DOI =          {10.1038/nature06964},
  LINK =         {\url{http://www.nature.com/nature/journal/v453/n7195/pdf/nature06964.pdf}},
  KEYWORD =      {TOTAL-ENERGY CALCULATIONS WAVE BASIS-SET
                  HYDROTHERMAL CONDITIONS TITANIUM-DIOXIDE SURFACE
                  OXIDE NANOSTRUCTURES NANOPARTICLES NANOCRYSTALS
                  EFFICIENCY}
}

That looks pretty good. Now, we are ready to try the whole file. We simply loop through all the entries, and append the string to a file for each entry.

from bibtexparser.bparser import BibTexParser
import os, textwrap

with open('../../CMU/proposals/link-to-2014/bo2-polymorphs/bo2-polymorphs.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)
    entries = bp.get_entry_list()

# field, format, wrap or not
field_order = [(u'author', '{{{0}}},\n', True),
               (u'title', '{{{0}}},\n', True),
               (u'journal','"{0}",\n', True),
               (u'volume','{{{0}}},\n', True),
               (u'number', '{{{0}}},\n', True),
               (u'pages', '{{{0}}},\n', True),
               (u'year', '{0},\n', True),
               (u'doi','{{{0}}},\n', False),
               (u'url','{{\\url{{{0}}}}},\n', False),
               (u'link','{{\\url{{{0}}}}},\n', False)]

# rm file if it exists. this is a new file, not our bibliography!
if os.path.exists('bib.bib'): os.unlink('bib.bib')

for entry in entries:
    
    keys = set(entry.keys())

    extra_fields = keys.difference([f[0] for f in field_order])
    # we do not want these in our entry
    extra_fields.remove('type')
    extra_fields.remove('id')

    # Now build up our entry string
    s = '@{type}{{{id},\n'.format(type=entry['type'].upper(),
                                  id=entry['id'])

    for field, fmt, wrap in field_order:
        if field in entry:
            s1 = '  {0} ='.format(field.upper())
            s2 = fmt.format(entry[field])
            s3 = '{0:17s}{1}'.format(s1, s2)
            if wrap:
                # fill seems to remove trailing '\n'
                s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
            s += s3  

    for field in extra_fields:
        if field in entry:
            s1 = '  {0} ='.format(field.upper())
            s2 = entry[field]
            s3 = '{0:17s}{{{1}}}'.format(s1, s2)
            s3 = textwrap.fill(s3, subsequent_indent=' '*18, width=70) + '\n'
            s += s3  

    s += '}\n\n'

    with open('bib.bib', 'a') as f:
        f.write(s)

This results in bib.bib with 100 entries. According to emacs it is a syntactically correct bibtex file, and it builds this bibliography, bib.pdf, which also has 100 entries. That usually means everything is in order (pun intended). More importantly, the fields are ordered the way I want them!

Getting to this point was an iterative process. Make sure the original bib file is under version control or backed up somehow, in case something goes wrong during this transformation!
