A sexp version of a bibtex entry

| categories: bibtex, lisp | tags:

Below you see a typical bibtex entry. Today we explore an alternate approach to represent the information (data) in that entry as s-expressions, i.e. as a lisp data structure. Why? because it seems like an interesting exploration!

@article{hallenbeck-2013-effec-o2,
  author =       "Hallenbeck, Alexander P. and Kitchin, John R.",
  title =        {Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a
                  primary-amine based polymeric \ce{CO_2} sorbent},
  keywords =     {RUA, orgmode},
  journal =      "Industrial \& Engineering Chemistry Research",
  pages =        "10788-10794",
  year =         2013,
  volume =       {52},
  number =       {31},
  doi =          "10.1021/ie400582a",
  url =          "http://pubs.acs.org/doi/abs/10.1021/ie400582a",
  eprint =       "http://pubs.acs.org/doi/pdf/10.1021/ie400582a",
}

Here is what that same data structure might look like as a sexp-based lisp data structure.

(article "hallenbeck-2013-effec-o2"
         (author "Hallenbeck, Alexander P. and Kitchin, John R.")
         (title "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent")
         (journal "Industrial \& Engineering Chemistry Research")
         (pages "10788-10794")
         (year 2013)
         (number 31)
         (doi "10.1021/ie400582a")
         (url "http://pubs.acs.org/doi/abs/10.1021/ie400582a")
         (eprint "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))

We can retrieve data from the sexp form pretty easily. Here we get the authors.

(let* ((art '(article "hallenbeck-2013-effec-o2"
                      (author "Hallenbeck, Alexander P. and Kitchin, John R.")
                      (title "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent")
                      (journal "Industrial \& Engineering Chemistry Research")
                      (pages "10788-10794")
                      (year 2013)
                      (number 31)
                      (doi "10.1021/ie400582a")
                      (url "http://pubs.acs.org/doi/abs/10.1021/ie400582a")
                      (eprint "http://pubs.acs.org/doi/pdf/10.1021/ie400582a")))
       (fields (cddr art)))
  (cadr (assoc 'author fields)))
Hallenbeck, Alexander P. and Kitchin, John R.

That is simple enough you might just write a little function to streamline it like this, and return a formatted string.

(defun get-article-field (article field)
  "Return value of FIELD in ARTICLE."
  (cadr (assoc field (cddr article))))

(let ((art '(article "hallenbeck-2013-effec-o2"
                     (author "Hallenbeck, Alexander P. and Kitchin, John R.")
                     (title "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent")
                     (journal "Industrial \& Engineering Chemistry Research")
                     (pages "10788-10794")
                     (year 2013)
                     (number 31)
                     (doi "10.1021/ie400582a")
                     (url "http://pubs.acs.org/doi/abs/10.1021/ie400582a")
                     (eprint "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))))
  (format "%s, doi:%s (%s)"
          (get-article-field art 'author)
          (get-article-field art 'doi)
          (get-article-field art 'year)))
Hallenbeck, Alexander P. and Kitchin, John R., doi:10.1021/ie400582a (2013)

You might be wondering, why is that even a little bit interesting? One reason is that it looks a little like what lisp returns after parsing an xml file. Another is, the data structure looks kind of like data, but it is also some code, if article was defined as a function! Let us consider what this might look like. I use a macro to define the field functions since in this case they all do the same thing, and these simply return a string with the field-name and value in curly brackets. We eval the macro to make sure it defines the function. I define an article function that wraps the fields in @bibtex-key{fields}, which defines a bibtex entry.

(defmacro make-field (field-name)
  "define a field that returns a string"
  `(defun ,(intern field-name) (content)
     (format "  %s = {%s}" ,field-name content)))

(loop for field in '("author" "title" "journal" "pages" "number" "doi" "url" "eprint" "year")
  do (eval `(make-field ,field)))

(defun article (bibtex-key &rest fields)
  (concat
   (format "@article{%s,\n" bibtex-key)
   (mapconcat (lambda (field) (eval field)) fields ",\n")
   "\n}\n"))

(article "hallenbeck-2013-effec-o2"
         (author "Hallenbeck, Alexander P. and Kitchin, John R.")
         (title "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent")
         (journal "Industrial \& Engineering Chemistry Research")
         (pages "10788-10794")
         (number 31)
         (year 2013)
         (doi "10.1021/ie400582a")
         (url "http://pubs.acs.org/doi/abs/10.1021/ie400582a")
         (eprint "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))
@article{hallenbeck-2013-effec-o2,
  author = {Hallenbeck, Alexander P. and Kitchin, John R.},
  title = {Effects of ce{O_2} and ce{SO_2} on the capture capacity of a primary-amine based polymeric ce{CO_2} sorbent},
  journal = {Industrial & Engineering Chemistry Research},
  pages = {10788-10794},
  number = {31},
  year = {2013},
  doi = {10.1021/ie400582a},
  url = {http://pubs.acs.org/doi/abs/10.1021/ie400582a},
  eprint = {http://pubs.acs.org/doi/pdf/10.1021/ie400582a}
}

Wow. We executed our data structure, and got a bibtex entry! That seems moderately interesting to me. Next is an example of taking the same data structure and rendering it as xml. This is some lispy wizardry, rather than use a macro to define functions, I temporarily define functions within a cl-flet macro, which I have to collect as a list of code. Then, I eval the list. This feels pretty odd, but seems like a lispy kind of thing to do.

(eval
 (list 'cl-flet
       (append (loop for field in '("author" "title" "journal" "pages"
                                      "number" "doi" "url" "eprint" "year")
                       collect (list (intern field)
                                     '(content)
                                     `(format "  <%s>%s</%s>" ,field content ,field)))
               '((article (bibtex-key &rest fields)
                          (concat
                           (format
                            "<article bibtex-key=\"%s\">\n" bibtex-key)
                           (mapconcat (lambda (field) (eval field)) fields "\n")
                           "\n</article>")))
               )
       ;; body of cl-flet
       '(article "hallenbeck-2013-effec-o2"
                (author "Hallenbeck, Alexander P. and Kitchin, John R.")
                (title "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent")
                (journal "Industrial \& Engineering Chemistry Research")
                (pages "10788-10794")
                (number 31)
                (year 2013)
                (doi "10.1021/ie400582a")
                (url "http://pubs.acs.org/doi/abs/10.1021/ie400582a")
                (eprint "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))))
<article bibtex-key="hallenbeck-2013-effec-o2">
  <author>Hallenbeck, Alexander P. and Kitchin, John R.</author>
  <title>Effects of ce{O_2} and ce{SO_2} on the capture capacity of a primary-amine based polymeric ce{CO_2} sorbent</title>
  <journal>Industrial & Engineering Chemistry Research</journal>
  <pages>10788-10794</pages>
  <number>31</number>
  <year>2013</year>
  <doi>10.1021/ie400582a</doi>
  <url>http://pubs.acs.org/doi/abs/10.1021/ie400582a</url>
  <eprint>http://pubs.acs.org/doi/pdf/10.1021/ie400582a</eprint>
</article>

Prefer json? No problem, just reformat the functions!

(eval
 (list 'cl-flet
       (append (loop for field in '("author" "title" "journal" "pages"
                                      "number" "doi" "url" "eprint" "year")
                       collect (list (intern field)
                                     '(content)
                                     `(format "   \"%s\": \"%s\"" ,field content)))
               '((article (bibtex-key &rest fields)
                          (concat
                           (format
                            "{\"article\":\n  {\"bibtex-key\": \"%s\",\n" bibtex-key)
                           (mapconcat (lambda (field) (eval field)) fields ",\n")
                           "}\n}"))))
       ;; body of cl-flet
       '(article "hallenbeck-2013-effec-o2"
                (author "Hallenbeck, Alexander P. and Kitchin, John R.")
                (title "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent")
                (journal "Industrial \& Engineering Chemistry Research")
                (pages "10788-10794")
                (number 31)
                (year 2013)
                (doi "10.1021/ie400582a")
                (url "http://pubs.acs.org/doi/abs/10.1021/ie400582a")
                (eprint "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))))
{"article":
  {"bibtex-key": "hallenbeck-2013-effec-o2",
   "author": "Hallenbeck, Alexander P. and Kitchin, John R.",
   "title": "Effects of ce{O_2} and ce{SO_2} on the capture capacity of a primary-amine based polymeric ce{CO_2} sorbent",
   "journal": "Industrial & Engineering Chemistry Research",
   "pages": "10788-10794",
   "number": "31",
   "year": "2013",
   "doi": "10.1021/ie400582a",
   "url": "http://pubs.acs.org/doi/abs/10.1021/ie400582a",
   "eprint": "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"}
}

Is this useful? Great question. I don't plan to convert by bibtex files to sexp format anytime soon ;) The format I used above is just a simple one. It might be desirable to include individual authors instead of an author string, and maybe support attributes to establish an author order. An author structure might be more complex to include scientific ids like an orcid, alternative names, etc… Finally, the s-exp data structure is super easy to use in lisp, but other languages would have parse it into some native structure the way they parse json or xml. There is limited support for s-expressions in most other non-lispy languages.

I like the idea of data representation as code, and its conversion to some other kind of format. It is subtle here, but notice we never had to write a parser for the sexp notation. That already exists as the lisp interpreter. We did write code to use the data, and convert the data. The sexp notation is pretty easy to write, in contrast to the xml or json representations. Some interesting issues might be what to do with fields that are not defined, perhaps a macro would be used on the fly, or in the cl-flet definition. It is hard to imagine doing these things in another language than lisp!

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter