The Kitchin Research Group: ref

A python version of the s-exp bibtex entry

Posted June 11, 2015 at 10:02 AM | categories: ref, python, bibtex | tags:

In this post we explored representing a bibtex entry in lisp s-exp notation, and showed interesting things that enables. Here, I explore something similar in Python. The s-exp notation in Python is really more like tuples. It looks almost identical, except we need a lot of commas for the Python syntax. One significant difference in Python is we need to define the functions in advance because otherwise the function symbols are undefined. Similar to lisp, we can define the field functions at run-time in a loop. We have to use an eval statement, which some Pythonistas find distasteful, but it is not that different to me than what we did in lisp.

The syntax for "executing" the data structure is quite different than in lisp, because this data is not code in Python. Instead, we have to deconstruct the data, knowing that the function is the first object, and it takes the remaining arguments in the tuple.

Here is the proof of concept:

def article(bibtex_key, *args):
    "Return the bibtex formatted entry"
    return ',\n'.join(['@article{{{0}}}'.format(bibtex_key)] +[arg[0](arg[1]) for arg in args[0]] + ['}'])

fields = ("author", "title", "journal", "pages", "number", "doi", "url", "eprint", "year")

for f in fields:
    locals()[f] = eval ('lambda x: "  {0} = {{{1}}}".format("' + f + '", x)')

entry = (article, "hallenbeck-2013-effec-o2",
         (author, "Hallenbeck, Alexander P. and Kitchin, John R."),
         (title, "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent"),
         (journal, "Industrial \& Engineering Chemistry Research"),
         (pages, "10788-10794"),
         (year, 2013),
         (number, 31),
         (doi, "10.1021/ie400582a"),
         (url, "http://pubs.acs.org/doi/abs/10.1021/ie400582a"),
         (eprint, "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))


print entry[0](entry[1], entry[2:])

@article{hallenbeck-2013-effec-o2},
  author = {Hallenbeck, Alexander P. and Kitchin, John R.},
  title = {Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent},
  journal = {Industrial \& Engineering Chemistry Research},
  pages = {10788-10794},
  year = {2013},
  number = {31},
  doi = {10.1021/ie400582a},
  url = {http://pubs.acs.org/doi/abs/10.1021/ie400582a},
  eprint = {http://pubs.acs.org/doi/pdf/10.1021/ie400582a},
}

We can still get specific fields out. Since we used a tuple here, it is not quite as nice as using a dictionary, but it is neither too bad, and it can be wrapped in a reasonably convenient function.

def article(bibtex_key, *args):
    "Return the bibtex formatted entry"
    return ',\n'.join(['@article{{{0}}}'.format(bibtex_key)] +[arg[0](arg[1]) for arg in args[0]] + ['}'])

fields = ("author", "title", "journal", "pages", "number", "doi", "url", "eprint", "year")

for f in fields:
    locals()[f] = eval ('lambda x: "  {0} = {{{1}}}".format("' + f + '", x)')

entry = (article, "hallenbeck-2013-effec-o2",
         (author, "Hallenbeck, Alexander P. and Kitchin, John R."),
         (title, "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent"),
         (journal, "Industrial \& Engineering Chemistry Research"),
         (pages, "10788-10794"),
         (year, 2013),
         (number, 31),
         (doi, "10.1021/ie400582a"),
         (url, "http://pubs.acs.org/doi/abs/10.1021/ie400582a"),
         (eprint, "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))


for field in entry[2:]:
    if field[0] == author:
        print field

def get_field(entry, field):
    for element in entry[2:]:
        if element[0] == field:
            return element[1]
    else:
        return None

print get_field(entry, title)
print get_field(entry, "bad")

(<function <lambda> at 0x1005975f0>, 'Hallenbeck, Alexander P. and Kitchin, John R.')
Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent
None

So, it seems Python can do some things like lisp in treating functions like first-class objects that can be used as functions, or keys. I still like the lisp s-exp better, but this is an interesting idea for Python too.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Converting a DOI to other scientific identifiers in Pubmed

Posted June 09, 2015 at 07:29 AM | categories: ref, orgmode | tags:

Sometimes it is useful to convert a DOI to another type of identifier. For example, in this post we converted a DOI to a Scopus EID, and in this one we got the WOS accession number from a DOI. Today, we consider how to get Pubmed identifiers. Pubmed provides an API for this purpose:

http://www.ncbi.nlm.nih.gov/pmc/tools/id-converter-api/

We will use the DOI tool. According to the documentation, we need to form a URL like this:

DOI: http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=my_tool&email=my_email@example.com&ids=10.1093/nar/gks1195

We will call our tool "org-ref" and use the value of user-mail-address. The URL above returns XML, so we can parse it, and then extract the identifiers. This is a simple http GET request, which we can construct using url-retrieve-synchronously. Here is what we get.

(let* ((url-request-method "GET")
       (doi"10.1093/nar/gks1195")
       (my-tool "org-ref")
       (url (format "http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=%s&email=%s&ids=%s"
                    my-tool
                    user-mail-address
                    doi))
       (xml (with-current-buffer  (url-retrieve-synchronously url)
                (xml-parse-region url-http-end-of-headers (point-max)))))
xml)

((pmcids
  ((status . "ok"))
  "\n"
  (request
   ((idtype . "doi")
    (dois . "")
    (versions . "yes")
    (showaiid . "no"))
   "\n"
   (echo nil "tool=org-ref;email=jkitchin%40andrew.cmu.edu;ids=10.1093%2Fnar%2Fgks1195")
   "\n")
  "\n"
  (record
   ((requested-id . "10.1093/NAR/GKS1195")
    (pmcid . "PMC3531190")
    (pmid . "23193287")
    (doi . "10.1093/nar/gks1195"))
   (versions nil
             (version
              ((pmcid . "PMC3531190.1")
               (current . "true")))))
  "\n"))

The parsed xml is now just an emacs-lisp data structure. We need to get the record, and then get the attributes of it to extract the identifiers. Next, we create a plist of the identifiers. For fun, we add the Scopus EID and WOS accession number from the previous posts too.

(let* ((url-request-method "GET")
       (doi"10.1093/nar/gks1195")
       (my-tool "org-ref")
       (url (format "http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=%s&email=%s&ids=%s"
                    my-tool
                    user-mail-address
                    doi))
       (xml (car (with-current-buffer  (url-retrieve-synchronously url)
                   (xml-parse-region url-http-end-of-headers (point-max)))))
       (record (first  (xml-get-children xml 'record)))
       (doi (xml-get-attribute record 'doi))
       (pmcid (xml-get-attribute record 'pmcid))
       (pmid (xml-get-attribute record 'pmid)))
  (list :doi doi :pmid pmid :pmcid pmcid :eid (scopus-doi-to-eid doi) :wos (wos-doi-to-accession-number doi)))

(:doi "10.1093/nar/gks1195" :pmid "23193287" :pmcid "PMC3531190" :eid "2-s2.0-80053651587" :wos "000312893300006")

Well, there you have it, four new scientific document ids from one DOI. Of course we have defined org-mode links for each one of these:

doi:10.1093/nar/gks1195

pmid:23193287

pmcid:PMC3531190

eid:2-s2.0-80053651587

wos:000312893300006

I have not tested this on too many DOIs yet. Not all of them are indexed by Pubmed.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Getting a WOS Accession number from a DOI

Posted June 08, 2015 at 11:23 AM | categories: ref, orgmode | tags:

Updated June 09, 2015 at 07:25 AM

I have been slowly working on getting alternative identifiers to the DOI for scientific literature. The DOI is great for getting a bibtex entry, and getting to the article page, but other identifiers, e.g. from Pubmed, Scopus or Web of Science provide links to additional information. Here, I examine an approach to get a Web of Science identifier from a DOI.

In a previous post we showed how to use the Web of Science OpenURL services to derive links to articles from the DOI. It turns out that if you follow that link, you get redirected to a URL that has the WOS Accession number in it. For example, this link: http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:doi/10.1021/jp047349j is redirected to http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000225079300029&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8703b88d69db6b417a9c0dc510538f44 . You can see the wos:000225079300029 in that URL, so all we need to do is extract it. We use some url functions in emacs lisp to to that. They are a little convoluted, but they work. Previously I used a regular expression to do this.

(cdr (assoc "KeyUT" (url-parse-query-string (url-filename (url-generic-parse-url  "http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000225079300029&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8703b88d69db6b417a9c0dc510538f44")))))

000225079300029

It is a tad tricky to get the redirected URL. We have to use the most basic url-retrieve, which works asynchronously, and we need a callback function to handle the response. I use a trick with global variables to note that the function is waiting, and to sleep briefly until it is ready. We want the last redirect (this seems to get redirected twice).

(defvar *wos-redirect* nil)
(defvar *wos-waiting* nil)

(defun wos-get-wos-redirect (url)
  "Return final redirect url for open-url"
  (setq *wos-waiting* t)
  (url-retrieve
   url
   (lambda (status)
     (setq *wos-redirect* (car (last status)))
     (setq *wos-waiting* nil)))
  (while *wos-waiting* (sleep-for 0.1))
  (url-unhex-string *wos-redirect*))


(defun wos-doi-to-accession-number (doi)
  "Return a WOS Accession number for a DOI."
  (let* ((open-url (concat "http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:doi/" doi))
         (redirect (wos-get-wos-redirect open-url)))
    (substring  (cadr
                 (assoc
                  "KeyUT"
                  (url-parse-query-string
                   (url-filename
                    (url-generic-parse-url redirect)))))
    4)))

(concat "wos:" (wos-doi-to-accession-number "10.1021/jp047349j"))

wos:000225079300029

I am not super crazy about this approach, but until I figure out the WOK API, this is surprisingly simple! And, now you can use the Accession number in a url like these examples:

http://onlinelibrary.wiley.com/resolve/reference/ISI?id=000225079300029

http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:ut/000225079300029

That might turn out to be handy at some point.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Getting a Scopus EID from a DOI

Posted June 07, 2015 at 04:29 PM | categories: ref, orgmode | tags:

Updated June 07, 2015 at 04:54 PM

Scopus is a scientific literature indexing and search engine service run by Elsevier. I have been integrating Scopus workflows into Emacs and org-ref. Scopus seems to work with their own digital identifiers, known as an EID. I usually have a DOI to work with. Here, we develop a way to get an EID from a DOI using the Scopus API. You need to get your own Scopus API key here: http://dev.elsevier.com/myapikey.html and set scopus-api-key in Emacs to use this code.

Once we have an EID, here are a few interesting things we can do with them. This is an EID: 2-s2.0-84881394200, for this reference:

Hallenbeck, Alexander P. and Kitchin, John R., "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent", Industrial & Engineering Chemistry Research, 52:10788-10794 (2013)

With the EID, we can construct a URL to the Scopus document page:

(let ((eid "2-s2.0-84881394200"))
  (format "http://www.scopus.com/record/display.url?eid=%s&origin=resultslist" eid))

http://www.scopus.com/record/display.url?eid=2-s2.0-84881394200&origin=resultslist

We can construct a URL to citing documents:

(let ((eid "2-s2.0-84881394200"))
  (format "http://www.scopus.com/results/citedbyresults.url?sort=plf-f&cite=%s&src=s&imp=t&sot=cite&sdt=a&sl=0&origin=recordpage" eid))

http://www.scopus.com/results/citedbyresults.url?sort=plf-f&cite=2-s2.0-84881394200&src=s&imp=t&sot=cite&sdt=a&sl=0&origin=recordpage

And there are three types of related document urls we can create: by author, keyword or references.

By authors:

(let ((eid "2-s2.0-84881394200"))
  (format (concat "http://www.scopus.com/search/submit/mlt.url"
                  "?eid=%s&src=s&all=true&origin=recordpage"
                  "&method=aut&zone=relatedDocuments")
            eid))

http://www.scopus.com/search/submit/mlt.url?eid=2-s2.0-84881394200&src=s&all=true&origin=recordpage&method=aut&zone=relatedDocuments

By keywords:

(let ((eid "2-s2.0-84881394200"))
  (format (concat "http://www.scopus.com/search/submit/mlt.url"
                  "?eid=%s&src=s&all=true&origin=recordpage"
                  "&method=key&zone=relatedDocuments")
          eid))

http://www.scopus.com/search/submit/mlt.url?eid=2-s2.0-84881394200&src=s&all=true&origin=recordpage&method=key&zone=relatedDocuments

And by references:

(let ((eid "2-s2.0-84881394200"))
  (format (concat  "http://www.scopus.com/search/submit/mlt.url?"
                   "eid=%s&src=s&all=true&origin=recordpage"
                   "&method=ref&zone=relatedDocuments")
           eid))

http://www.scopus.com/search/submit/mlt.url?eid=2-s2.0-84881394200&src=s&all=true&origin=recordpage&method=ref&zone=relatedDocuments

We can generate all those on the fly if we have an EID. The problem is that we usually have the DOI, not the EID. So, here we use the Scopus API to retrieve that. Basically, we just do a search on the DOI, assume one and only one is found, and get the EID from the results. The DOI we have for the reference considered here is doi:10.1021/ie400582a.

The gist of what we will do is send an http request to Scopus with our API key, and data specifying what to get. Scopus will return data to us in either json or xml, depending on what we ask for.

I find json easiest to deal with, so we first work it out in json. We use the Scopus search API and query on the doi here. We get back json data which we read as an emacs-lisp plist, and extract the eid from it.

(let* ((doi "10.1021/ie400582a")
       (url-request-method "GET")
       (url-mime-accept-string "application/json")
       (url-request-extra-headers  (list (cons "X-ELS-APIKey" *scopus-api-key*)
                                         '("field" . "eid")))
       (url (format  "http://api.elsevier.com/content/search/scopus?query=doi(%s)" doi))
       (json-object-type 'plist)
       (json-data (with-current-buffer  (url-retrieve-synchronously url)
                    (json-read-from-string
                     (buffer-substring url-http-end-of-headers (point-max))))))
 (plist-get (elt (plist-get (plist-get json-data :search-results) :entry) 0) :eid))

2-s2.0-84881394200

That is the EID we were looking for. Here, we just wrap that code in a function so it is easier to reuse.

(defun scopus-doi-to-eid-json (doi)
  "Return a parsed xml from the Scopus article retrieval api for DOI.
This does not always seem to work for the most recent DOIs."
  (let* ((url-request-method "GET")
         (url-mime-accept-string "application/json")
         (url-request-extra-headers  (list (cons "X-ELS-APIKey" *scopus-api-key*)
                                           '("field" . "eid")))
         (url (format  "http://api.elsevier.com/content/search/scopus?query=doi(%s)" doi))
         (json-object-type 'plist)
         (json-data (with-current-buffer  (url-retrieve-synchronously url)
                      (json-read-from-string
                       (buffer-substring url-http-end-of-headers (point-max))))))
    (plist-get (elt (plist-get (plist-get json-data :search-results) :entry) 0) :eid)))

(scopus-doi-to-eid "10.1021/ie400582a")

XML is the native format in the Scopus API. They say that json works most of the time, but some XML cannot be rendered as json. Here we use the XML returned to get the EID. It is less intuitive to me, but mostly because I have used it less. I don't think you can specify and XPATH like you can in Python.

(let* ((doi "10.1021/ie400582a")
       (url-request-method "GET")
       (url-mime-accept-string "application/xml")
       (url-request-extra-headers  (list (cons "X-ELS-APIKey" *scopus-api-key*)
                                         '("field" . "eid")))
       (url (format  "http://api.elsevier.com/content/search/scopus?query=doi(%s)" doi))
       (xml (with-current-buffer  (url-retrieve-synchronously url)
              (xml-parse-region url-http-end-of-headers (point-max))))
       (results (car xml))
       (entry (car (xml-get-children results 'entry))))
  (car (xml-node-children (car (xml-get-children entry 'eid)))))

2-s2.0-84881394200

Now we wrap this in a function for reusability.

(defun scopus-doi-to-eid (doi)
  "Get a Scopus eid from a DOI."
  (let* ((url-request-method "GET")
         (url-mime-accept-string "application/xml")
         (url-request-extra-headers  (list (cons "X-ELS-APIKey" *scopus-api-key*)
                                           '("field" . "eid")))
         (url (format  "http://api.elsevier.com/content/search/scopus?query=doi(%s)" doi))
         (xml (with-current-buffer  (url-retrieve-synchronously url)
                (xml-parse-region url-http-end-of-headers (point-max))))
         (results (car xml))
         (entry (car (xml-get-children results 'entry))))
    (car (xml-node-children (car (xml-get-children entry 'eid))))))

(scopus-doi-to-eid "10.1021/ie400582a")

2-s2.0-84881394200

This code is wrapped up in org-ref/scopus.el . It provides a new org-mode eid link, e.g. eid:2-s2.0-84881394200 which is functional and provides access to the citing and related article Scopus pages for that eid.

There are also new links and functions for a alloy Au segregation and auth(kitchin) and title(segregation).

Let's not forget the scopusid:7004212771 link to Scopus Author pages.

Now you can use org-mode for reproducible scientific literature searching in Scopus!

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter