ox-pandoc - org-mode + org-ref to docx with bibliographies

| categories: pandoc, orgmode, docx | tags:

There is a new org-mode exporter: ox-pandoc. It appears to make it easy to convert org-mode to other formats, including docx, with the references collected in a bibliography. Let us try it out.

1 The setup

org-ref modifies helm-bibtex to insert citation links. We have to undo that here so that the org-ref key binding for inserting references inserts LaTeX-style citations instead. This is necessary for pandoc to convert the citations into a bibliography in the docx output. If you do not use org-ref, this is probably not necessary.

(setq helm-bibtex-format-citation-functions
      '((org-mode . (lambda (x) (insert (concat
                                         "\\cite{"
                                         (mapconcat 'identity x ",")
                                         "}")) ""))))
org-mode lambda (x) (insert (concat \cite{ (mapconcat (quote identity) x ,) }))
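
As a side-effect-free variant of the formatter above, here is a minimal sketch that only builds the citation string and leaves the insertion to the caller. The name my/latex-cite-string is hypothetical; it is not part of org-ref or helm-bibtex.

```elisp
;; Hypothetical helper (not part of org-ref or helm-bibtex):
;; build a LaTeX \cite command from a list of BibTeX keys.
(defun my/latex-cite-string (keys)
  "Return a LaTeX citation such as \\cite{a,b} for the list of strings KEYS."
  (concat "\\cite{" (mapconcat #'identity keys ",") "}"))

(my/latex-cite-string '("guldner-1961" "guerrini-2008-effec-feo"))
```

Keeping the string construction separate makes it easy to check the output before wiring it into a key binding.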

We have to add ox-pandoc and require it.

(add-to-list 'load-path (expand-file-name "ox-pandoc" starter-kit-dir))
(require 'ox-pandoc)

2 The document

Now, for some text. Grindy wrote this nice paper on approaching chemical accuracy with density functional calculations \cite{grindy-2013-approac}. Two other interesting papers include these ones \cite{guldner-1961,guerrini-2008-effec-feo}.

An equation: \(e^x = 4\).

And a figure with a caption:

Figure 1: Make sure this is in your org-file.

3 Summary

This is better than what I have seen in the past. ox-pandoc has some options that might tailor the bibliography to specific formats. You lose some functionality of org-ref cite links by using raw LaTeX, but if that is not a deal breaker this might be a good way to go for some purposes.

Here is the word document that results from this file: test-doc.docx

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Converting a DOI to other scientific identifiers in Pubmed

| categories: orgmode, ref | tags:

Sometimes it is useful to convert a DOI to another type of identifier. For example, in one earlier post we converted a DOI to a Scopus EID, and in another we got the WOS accession number from a DOI. Today, we consider how to get Pubmed identifiers. Pubmed provides an API for this purpose:

http://www.ncbi.nlm.nih.gov/pmc/tools/id-converter-api/

We will use the DOI tool. According to the documentation, we need to form a URL like this:

DOI: http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=my_tool&email=my_email@example.com&ids=10.1093/nar/gks1195
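
A URL of this form can be assembled with format. The sketch below percent-encodes each value with url-hexify-string, on the assumption that the service accepts encoded values. The helper name my/pmc-idconv-url is hypothetical.

```elisp
(require 'url-util)

;; Hypothetical helper: build the ID converter URL with each value percent-encoded.
(defun my/pmc-idconv-url (tool email doi)
  "Return the PMC ID converter URL for DOI, identifying as TOOL and EMAIL."
  (format "http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=%s&email=%s&ids=%s"
          (url-hexify-string tool)
          (url-hexify-string email)
          (url-hexify-string doi)))

(my/pmc-idconv-url "my_tool" "my_email@example.com" "10.1093/nar/gks1195")
```

Encoding matters mostly for the email address and for any DOI containing characters that are special in a query string.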

We will call our tool "org-ref" and use the value of user-mail-address. The URL above returns XML, so we can parse it, and then extract the identifiers. This is a simple http GET request, which we can construct using url-retrieve-synchronously. Here is what we get.

(let* ((url-request-method "GET")
       (doi "10.1093/nar/gks1195")
       (my-tool "org-ref")
       (url (format "http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=%s&email=%s&ids=%s"
                    my-tool
                    user-mail-address
                    doi))
       (xml (with-current-buffer  (url-retrieve-synchronously url)
                (xml-parse-region url-http-end-of-headers (point-max)))))
  xml)
((pmcids
  ((status . "ok"))
  "\n"
  (request
   ((idtype . "doi")
    (dois . "")
    (versions . "yes")
    (showaiid . "no"))
   "\n"
   (echo nil "tool=org-ref;email=jkitchin%40andrew.cmu.edu;ids=10.1093%2Fnar%2Fgks1195")
   "\n")
  "\n"
  (record
   ((requested-id . "10.1093/NAR/GKS1195")
    (pmcid . "PMC3531190")
    (pmid . "23193287")
    (doi . "10.1093/nar/gks1195"))
   (versions nil
             (version
              ((pmcid . "PMC3531190.1")
               (current . "true")))))
  "\n"))

The parsed xml is now just an emacs-lisp data structure. We need to get the record, and then get the attributes of it to extract the identifiers. Next, we create a plist of the identifiers. For fun, we add the Scopus EID and WOS accession number from the previous posts too.

(let* ((url-request-method "GET")
       (doi "10.1093/nar/gks1195")
       (my-tool "org-ref")
       (url (format "http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=%s&email=%s&ids=%s"
                    my-tool
                    user-mail-address
                    doi))
       (xml (car (with-current-buffer  (url-retrieve-synchronously url)
                   (xml-parse-region url-http-end-of-headers (point-max)))))
       (record (car (xml-get-children xml 'record)))
       (doi (xml-get-attribute record 'doi))
       (pmcid (xml-get-attribute record 'pmcid))
       (pmid (xml-get-attribute record 'pmid)))
  (list :doi doi :pmid pmid :pmcid pmcid :eid (scopus-doi-to-eid doi) :wos (wos-doi-to-accession-number doi)))
(:doi "10.1093/nar/gks1195" :pmid "23193287" :pmcid "PMC3531190" :eid "2-s2.0-80053651587" :wos "000312893300006")
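
To experiment with the extraction step without hitting the network, the same xml.el calls can be run on a literal string. This is only a sketch: the function name my/pmc-record-ids is hypothetical, and the sample string is a trimmed copy of the response shown earlier.

```elisp
(require 'xml)

;; Hypothetical helper: parse XML-STRING and read the ids off the first <record>.
(defun my/pmc-record-ids (xml-string)
  "Return a plist of the identifiers on the first record element in XML-STRING."
  (let* ((root (with-temp-buffer
                 (insert xml-string)
                 (car (xml-parse-region (point-min) (point-max)))))
         (record (car (xml-get-children root 'record))))
    (when record
      ;; `xml-get-attribute-or-nil' returns nil rather than "" for a missing attribute.
      (list :doi (xml-get-attribute-or-nil record 'doi)
            :pmid (xml-get-attribute-or-nil record 'pmid)
            :pmcid (xml-get-attribute-or-nil record 'pmcid)))))

(my/pmc-record-ids
 "<pmcids status=\"ok\"><record pmcid=\"PMC3531190\" pmid=\"23193287\" doi=\"10.1093/nar/gks1195\"/></pmcids>")
```

Returning nil for a missing attribute makes it easier to spot DOIs that Pubmed does not index.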

Well, there you have it, four new scientific document ids from one DOI. Of course we have defined org-mode links for each one of these:

doi:10.1093/nar/gks1195

pmid:23193287

pmcid:PMC3531190

eid:2-s2.0-80053651587

wos:000312893300006

I have not tested this on many DOIs yet. Not all of them are indexed by Pubmed.

Getting a WOS Accession number from a DOI

| categories: orgmode, ref | tags:

I have been slowly working on getting alternative identifiers to the DOI for scientific literature. The DOI is great for getting a bibtex entry, and getting to the article page, but other identifiers, e.g. from Pubmed, Scopus or Web of Science provide links to additional information. Here, I examine an approach to get a Web of Science identifier from a DOI.

In a previous post we showed how to use the Web of Science OpenURL services to derive links to articles from the DOI. It turns out that if you follow that link, you get redirected to a URL that has the WOS Accession number in it. For example, this link: http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:doi/10.1021/jp047349j is redirected to http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000225079300029&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8703b88d69db6b417a9c0dc510538f44 . You can see WOS:000225079300029 in that URL, so all we need to do is extract it. We use some url functions in emacs lisp to do that. They are a little convoluted, but they work. Previously I used a regular expression to do this.

(cdr (assoc "KeyUT" (url-parse-query-string (url-filename (url-generic-parse-url  "http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000225079300029&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8703b88d69db6b417a9c0dc510538f44")))))
000225079300029
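
One detail worth knowing: url-filename returns the path and the query together, so the first key in the parsed alist carries the path in front of it. A slightly tidier sketch splits the two with url-path-and-query before parsing. The helper name my/url-query-value is hypothetical, and the URL in the example is a shortened version of the one above.

```elisp
(require 'url-parse)
(require 'url-util)

;; Hypothetical helper: look up one query parameter in a URL string.
(defun my/url-query-value (url key)
  "Return the value of query parameter KEY in URL, or nil if it is absent."
  (let* ((path-and-query (url-path-and-query (url-generic-parse-url url)))
         (query (cdr path-and-query)))
    (when query
      ;; Each alist entry has the form (KEY VALUE ...), hence `cadr'.
      (cadr (assoc key (url-parse-query-string query))))))

(my/url-query-value
 "http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&KeyUT=WOS:000225079300029&DestApp=ALL_WOS"
 "KeyUT")
```

The value still carries the WOS: prefix, so taking the substring from position 4 gives the bare accession number.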

It is a tad tricky to get the redirected URL. We have to use the most basic url-retrieve, which works asynchronously, and we need a callback function to handle the response. I use a trick with global variables to note that the function is waiting, and to sleep briefly until it is ready. We want the last redirect (this seems to get redirected twice).

(defvar *wos-redirect* nil)
(defvar *wos-waiting* nil)

(defun wos-get-wos-redirect (url)
  "Return the final redirect url for the OpenURL link URL."
  (setq *wos-waiting* t)
  (url-retrieve
   url
   (lambda (status)
     (setq *wos-redirect* (car (last status)))
     (setq *wos-waiting* nil)))
  (while *wos-waiting* (sleep-for 0.1))
  (url-unhex-string *wos-redirect*))


(defun wos-doi-to-accession-number (doi)
  "Return a WOS Accession number for a DOI."
  (let* ((open-url (concat "http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:doi/" doi))
         (redirect (wos-get-wos-redirect open-url)))
    (substring (cadr
                (assoc
                 "KeyUT"
                 (url-parse-query-string
                  (url-filename
                   (url-generic-parse-url redirect)))))
               4)))

(concat "wos:" (wos-doi-to-accession-number "10.1021/jp047349j"))
wos:000225079300029

I am not super crazy about this approach, but until I figure out the WOK API, this is surprisingly simple! And, now you can use the Accession number in a url like these examples:

http://onlinelibrary.wiley.com/resolve/reference/ISI?id=000225079300029

http://ws.isiknowledge.com/cps/openurl/service?url_ver=Z39.88-2004&rft_id=info:ut/000225079300029

That might turn out to be handy at some point.

Getting a Scopus EID from a DOI

| categories: orgmode, ref | tags:

Scopus is a scientific literature indexing and search service run by Elsevier. I have been integrating Scopus workflows into Emacs and org-ref. Scopus works with its own digital identifier, known as an EID. I usually have a DOI to work with. Here, we develop a way to get an EID from a DOI using the Scopus API. You need to get your own Scopus API key here: http://dev.elsevier.com/myapikey.html and set *scopus-api-key* in Emacs to use this code.

Once we have an EID, here are a few interesting things we can do with it. This is an EID: 2-s2.0-84881394200, for this reference:

Hallenbeck, Alexander P. and Kitchin, John R., "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent", Industrial & Engineering Chemistry Research, 52:10788-10794 (2013)

With the EID, we can construct a URL to the Scopus document page:

(let ((eid "2-s2.0-84881394200"))
  (format "http://www.scopus.com/record/display.url?eid=%s&origin=resultslist" eid))
http://www.scopus.com/record/display.url?eid=2-s2.0-84881394200&origin=resultslist

We can construct a URL to citing documents:

(let ((eid "2-s2.0-84881394200"))
  (format "http://www.scopus.com/results/citedbyresults.url?sort=plf-f&cite=%s&src=s&imp=t&sot=cite&sdt=a&sl=0&origin=recordpage" eid))
http://www.scopus.com/results/citedbyresults.url?sort=plf-f&cite=2-s2.0-84881394200&src=s&imp=t&sot=cite&sdt=a&sl=0&origin=recordpage

And there are three types of related document urls we can create: by author, keyword or references.

By authors:

(let ((eid "2-s2.0-84881394200"))
  (format (concat "http://www.scopus.com/search/submit/mlt.url"
                  "?eid=%s&src=s&all=true&origin=recordpage"
                  "&method=aut&zone=relatedDocuments")
          eid))

http://www.scopus.com/search/submit/mlt.url?eid=2-s2.0-84881394200&src=s&all=true&origin=recordpage&method=aut&zone=relatedDocuments

By keywords:

(let ((eid "2-s2.0-84881394200"))
  (format (concat "http://www.scopus.com/search/submit/mlt.url"
                  "?eid=%s&src=s&all=true&origin=recordpage"
                  "&method=key&zone=relatedDocuments")
          eid))

http://www.scopus.com/search/submit/mlt.url?eid=2-s2.0-84881394200&src=s&all=true&origin=recordpage&method=key&zone=relatedDocuments

And by references:

(let ((eid "2-s2.0-84881394200"))
  (format (concat "http://www.scopus.com/search/submit/mlt.url?"
                  "eid=%s&src=s&all=true&origin=recordpage"
                  "&method=ref&zone=relatedDocuments")
          eid))

http://www.scopus.com/search/submit/mlt.url?eid=2-s2.0-84881394200&src=s&all=true&origin=recordpage&method=ref&zone=relatedDocuments
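
The three related-document URLs differ only in the method parameter, so they could be folded into one function. This is a sketch; the name my/scopus-related-url is hypothetical and is not necessarily what org-ref uses.

```elisp
;; Hypothetical helper: one function for the three related-document URLs.
(defun my/scopus-related-url (eid method)
  "Return the Scopus related-documents URL for EID using METHOD.
METHOD is one of the symbols `aut', `key' or `ref'."
  (unless (memq method '(aut key ref))
    (error "Unknown related-documents method: %s" method))
  (format (concat "http://www.scopus.com/search/submit/mlt.url"
                  "?eid=%s&src=s&all=true&origin=recordpage"
                  "&method=%s&zone=relatedDocuments")
          eid method))

(my/scopus-related-url "2-s2.0-84881394200" 'key)
```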

We can generate all those on the fly if we have an EID. The problem is that we usually have the DOI, not the EID. So, here we use the Scopus API to retrieve that. Basically, we just do a search on the DOI, assume one and only one is found, and get the EID from the results. The DOI we have for the reference considered here is doi:10.1021/ie400582a.

The gist of what we will do is send an http request to Scopus with our API key, and data specifying what to get. Scopus will return data to us in either json or xml, depending on what we ask for.

I find json easiest to deal with, so we first work it out in json. We use the Scopus search API and query on the doi here. We get back json data which we read as an emacs-lisp plist, and extract the eid from it.

(let* ((doi "10.1021/ie400582a")
       (url-request-method "GET")
       (url-mime-accept-string "application/json")
       (url-request-extra-headers  (list (cons "X-ELS-APIKey" *scopus-api-key*)
                                         '("field" . "eid")))
       (url (format  "http://api.elsevier.com/content/search/scopus?query=doi(%s)" doi))
       (json-object-type 'plist)
       (json-data (with-current-buffer  (url-retrieve-synchronously url)
                    (json-read-from-string
                     (buffer-substring url-http-end-of-headers (point-max))))))
 (plist-get (elt (plist-get (plist-get json-data :search-results) :entry) 0) :eid))
2-s2.0-84881394200
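
The extraction step can be tried without an API key by feeding json.el a literal string. In this sketch the payload is a hypothetical, heavily trimmed stand-in that only mirrors the access path used above (search-results, then entry, then eid); a real response contains much more. The name my/eid-from-search-json is also hypothetical.

```elisp
(require 'json)

;; Hypothetical helper: pull the first eid out of a Scopus-style JSON string.
(defun my/eid-from-search-json (json-string)
  "Return the eid of the first entry in JSON-STRING, or nil if there is none."
  (let* ((json-object-type 'plist)      ; objects become plists with keyword keys
         (data (json-read-from-string json-string))
         ;; JSON arrays are read as vectors by default, so `elt' works below.
         (entries (plist-get (plist-get data :search-results) :entry)))
    (when (> (length entries) 0)
      (plist-get (elt entries 0) :eid))))

(my/eid-from-search-json
 "{\"search-results\": {\"entry\": [{\"eid\": \"2-s2.0-84881394200\"}]}}")
```

Checking the length first avoids an args-out-of-range error when the entry array is empty.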

That is the EID we were looking for. Here, we just wrap that code in a function so it is easier to reuse.

(defun scopus-doi-to-eid-json (doi)
  "Return the Scopus EID for DOI using the JSON form of the Scopus search API.
This does not always seem to work for the most recent DOIs."
  (let* ((url-request-method "GET")
         (url-mime-accept-string "application/json")
         (url-request-extra-headers  (list (cons "X-ELS-APIKey" *scopus-api-key*)
                                           '("field" . "eid")))
         (url (format  "http://api.elsevier.com/content/search/scopus?query=doi(%s)" doi))
         (json-object-type 'plist)
         (json-data (with-current-buffer  (url-retrieve-synchronously url)
                      (json-read-from-string
                       (buffer-substring url-http-end-of-headers (point-max))))))
    (plist-get (elt (plist-get (plist-get json-data :search-results) :entry) 0) :eid)))

(scopus-doi-to-eid-json "10.1021/ie400582a")

XML is the native format in the Scopus API. They say that json works most of the time, but some XML cannot be rendered as json. Here we use the XML returned to get the EID. It is less intuitive to me, but mostly because I have used it less. I don't think you can specify an XPath like you can in Python.

(let* ((doi "10.1021/ie400582a")
       (url-request-method "GET")
       (url-mime-accept-string "application/xml")
       (url-request-extra-headers  (list (cons "X-ELS-APIKey" *scopus-api-key*)
                                         '("field" . "eid")))
       (url (format  "http://api.elsevier.com/content/search/scopus?query=doi(%s)" doi))
       (xml (with-current-buffer  (url-retrieve-synchronously url)
              (xml-parse-region url-http-end-of-headers (point-max))))
       (results (car xml))
       (entry (car (xml-get-children results 'entry))))
  (car (xml-node-children (car (xml-get-children entry 'eid)))))
2-s2.0-84881394200
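
In place of an XPath, a small helper can walk a list of tag names and take the first match at each level. This is a sketch under that simplifying assumption; the name my/xml-first-text is hypothetical, and the sample XML in the usage line is a trimmed stand-in for a real response.

```elisp
(require 'xml)

;; Hypothetical helper: a poor man's path lookup over xml.el parse trees.
(defun my/xml-first-text (node path)
  "Return the text of the first descendant of NODE reached by following PATH.
PATH is a list of tag symbols.  Return nil if any step finds no child."
  (dolist (tag path)
    (setq node (and node (car (xml-get-children node tag)))))
  (and node (car (xml-node-children node))))

(let ((root (with-temp-buffer
              (insert "<search-results><entry><eid>2-s2.0-84881394200</eid></entry></search-results>")
              (car (xml-parse-region (point-min) (point-max))))))
  (my/xml-first-text root '(entry eid)))
```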

Now we wrap this in a function for reusability.

(defun scopus-doi-to-eid (doi)
  "Get a Scopus eid from a DOI."
  (let* ((url-request-method "GET")
         (url-mime-accept-string "application/xml")
         (url-request-extra-headers  (list (cons "X-ELS-APIKey" *scopus-api-key*)
                                           '("field" . "eid")))
         (url (format  "http://api.elsevier.com/content/search/scopus?query=doi(%s)" doi))
         (xml (with-current-buffer  (url-retrieve-synchronously url)
                (xml-parse-region url-http-end-of-headers (point-max))))
         (results (car xml))
         (entry (car (xml-get-children results 'entry))))
    (car (xml-node-children (car (xml-get-children entry 'eid))))))

(scopus-doi-to-eid "10.1021/ie400582a")
2-s2.0-84881394200

This code is wrapped up in org-ref/scopus.el. It provides a new org-mode eid link, e.g. eid:2-s2.0-84881394200, which is functional and provides access to the citing and related article Scopus pages for that eid.

There are also new links and functions for Scopus searches, e.g. a basic query such as alloy Au segregation and an advanced query such as auth(kitchin) and title(segregation).

Let's not forget the scopusid:7004212771 link to Scopus Author pages.

Now you can use org-mode for reproducible scientific literature searching in Scopus!

Another approach to embedding org-source in html

| categories: data, orgmode | tags:

In a previous post I examined a way to embed the org-source in a comment in the html of the post, and developed a reasonably convenient way to extract the source in emacs. One downside of the approach was the need to escape at least the dashes, and then unescape them on extraction. I came across another idea, which is to put the org-source in base64 encoded form in a data uri.

First let us see what the encoding means:

(base64-encode-string "<!-- test-->")
PCEtLSB0ZXN0LS0+

And decoding:

(base64-decode-string "PCEtLSB0ZXN0LS0+")
<!-- test-->

The encoding looks random, but it is reversible. More importantly, it probably will not have any html-like characters in it that need to be escaped. The idea of a data uri is that the data it serves is embedded in the URL in the href attribute. This is basically how to make a data uri. We give the link here a class so we can find it later.

<a class="some-org-source" href="data:text/plain;charset=US-ASCII;base64,PCEtLSB0ZXN0LS0+">source</a>

Here is the actual html for the browser. If you click on it, your browser automatically decodes it for you!

source

So, during the blog publish step, we just need to add this little step to the html generation, and the source will be included as a data uri. Here is the function that generates the data uri for us, and an example of using it. The encoded source is not at all attractive to look at, but you almost never need to look at it, since it is invisible in the browser. Interestingly, if you click on the link, you will see the org source right in your browser!

(defun source-data-uri (source)
  "Encode the string in SOURCE to a data uri."
  (format
   "<a class=\"org-source\" href=\"data:text/plain;charset=US-ASCII;base64,%s\">source</a>"
   (base64-encode-string source)))

(source-data-uri (buffer-string))
source
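
One caveat: base64-encode-string signals an error if the string contains multibyte (non-ASCII) characters, and by default it inserts line breaks into long output. A hedged variant is sketched below. It encodes the text as UTF-8 bytes first and suppresses the line breaks. Both function names are hypothetical, and the charset in the link is changed to utf-8, so a regexp that expects US-ASCII would need to be adjusted to match.

```elisp
;; Hypothetical variant of `source-data-uri' that tolerates non-ASCII text.
(defun my/source-data-uri-utf8 (source)
  "Encode the string SOURCE as a UTF-8, base64 data uri link."
  (format
   "<a class=\"org-source\" href=\"data:text/plain;charset=utf-8;base64,%s\">source</a>"
   ;; Encode to bytes first; the second argument suppresses line breaks.
   (base64-encode-string (encode-coding-string source 'utf-8) t)))

;; Hypothetical counterpart for the extraction side.
(defun my/decode-data-uri-payload (payload)
  "Decode the base64 string PAYLOAD from such a link back into text."
  (decode-coding-string (base64-decode-string payload) 'utf-8))

(my/source-data-uri-utf8 "* A heading with \u03b1 in it")
```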

Now, we integrate it into the blogofile function:

(defun bf-get-post-html ()
  "Return a string containing the YAML header, the post html, my
copyright line, and a link to the org-source code."
  (interactive)
  (let ((org-source (buffer-string))
        (url-to-org (bf-get-url-to-org-source))
        (yaml (bf-get-YAML-heading))
        (body (bf-get-HTML)))

    (with-temp-buffer
      (insert yaml)
      (insert body)
      (insert
       (format "<p>Copyright (C) %s by John Kitchin. See the <a href=\"/copying.html\">License</a> for information about copying.</p>"
               (format-time-string "%Y")))
      (insert (format "<p><a href=\"%s\">org-mode source</a></p>"
                      url-to-org))
      (insert (format "<p>Org-mode version = %s</p>" (org-version)))
      ;; this is the only new code we need to add.
      (insert (source-data-uri org-source))
      ;; return value
      (buffer-string))))

Now we need a new adaptation of the grab-org-source function. We still need a regexp search to get the source, and we still need to decode it.

(defun grab-org-source (url)
  "Extract org-source from URL to a buffer named *grab-org-source*."
  (interactive "sURL: ")
  (switch-to-buffer (get-buffer-create "*grab-org-source*"))
  (erase-buffer)
  (org-mode)
  (insert
   (with-current-buffer
       (url-retrieve-synchronously url)
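     ;; Search from the top of the response and fail loudly if no link is found.
     (goto-char (point-min))
     (if (re-search-forward
          "<a class=\"org-source\" href=\"data:text/plain;charset=US-ASCII;base64,\\([^\"]*\\)\">" nil t)
         (base64-decode-string (match-string 1))
       (error "No org-source data uri found at %s" url)))))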

What else could we do with this? One idea would be to generate data uris for each code block that you could open in your browser. For example, here we generate a list of data uris for each code block in the buffer. We don't take care to label them or make it easy to see what they are, but if you click on one, you should see a plain text version of the block. If this is done a lot, it might even make sense to change the mime type to download the code in some native app.

(org-element-map (org-element-parse-buffer) 'src-block
  (lambda (src-block)
    (source-data-uri (org-element-property :value src-block))))
(source source source source source source)

I am not sure if this is better or worse than the other approach. I have not tested it very thoroughly, but it seems like it should work pretty generally. I imagine you could also embed other kinds of files in the html, if for some reason you did not want to put the files on your server. Overall this seems to lack some elegance in searching for data, e.g. like RDF or RDFa is supposed to enable, but it might be a step in that direction, using org-mode and Emacs as the editor.
