Making highly linked bibliographies from the Scopus API

A given article entry in a bibliography might contain several kinds of links: to each author's page, to the document itself, to the journal, to the DOI, and to a citation count. I think we can generate these from a Scopus query.

We are going to look at a single document, with eid=2-s2.0-84901638552. This is another long post, so here is a teaser of what we are doing. For this eid, we want to generate an html entry where each part of the entry is clickable. Here is what we will be able to do by the end of this post:

from scopus import *

print '<ol>', get_html('2-s2.0-84901638552'), '</ol>'
  1. Xu Z.,Kitchin J.R., Relating the electronic structure and reactivity of the 3d transition metal monoxide surfaces, Catalysis Communications, 52, p. 60-64, (2014-07-05), doi:10.1016/j.catcom.2013.10.028, .

In this post, we work out code that works for this document. The code in the form shown here might not work on all entries, e.g. ones that are in press and are missing data, or APS journals that have no page range. Later, I will fix those cases so this is more robust. To minimize repeating the code below, I create a Python module here called scopus.py. Tangle it out with org-babel-tangle. As in the last post, I am not sharing my API key here, since it is not clear if that key is private or not.

I like json, so we use that data format here. XML would be more robust, as the Scopus site admits not all of the data can be turned into the json format, but for now we stick to json for its simplicity.

import requests
import json, os
from my_scopus import MY_API_KEY

def get_abstract_info(EID, refresh=False):
    'Get and save the json data for EID.'
    base = 'scopus-data/get_abstract_info'
    if not os.path.exists(base):
        os.makedirs(base)

    fname = '{0}/{1}'.format(base, EID)
    if os.path.exists(fname) and not refresh:
        with open(fname) as f:
            return json.loads(f.read())

    # Otherwise retrieve and save results
    url = ("http://api.elsevier.com/content/abstract/eid/" + EID)
    resp = requests.get(url,
                    headers={'Accept':'application/json',
                             'X-ELS-APIKey': MY_API_KEY})
    # Fail here on an HTTP error so that an error body is never cached.
    resp.raise_for_status()
    results = json.loads(resp.text.encode('utf-8'))
    with open(fname, 'w') as f:
        f.write(json.dumps(results))

    return results
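
The caching logic in get_abstract_info can be tried without an API key. Here is a minimal sketch that factors it into a helper that accepts any fetch function. The names cached_json and fake_fetch, and the temporary directory, are illustrative assumptions and not part of scopus.py.

```python
import json
import os
import tempfile


def cached_json(key, fetch, base, refresh=False):
    """Return fetch(key), caching the result as JSON in the directory base.

    Hypothetical helper that mirrors the cache logic of get_abstract_info.
    """
    if not os.path.exists(base):
        os.makedirs(base)

    fname = os.path.join(base, key)
    if os.path.exists(fname) and not refresh:
        with open(fname) as f:
            return json.load(f)

    # Otherwise fetch and save the results
    results = fetch(key)
    with open(fname, 'w') as f:
        json.dump(results, f)
    return results


# Stand-in for the network call; it records every key it is asked for.
calls = []


def fake_fetch(key):
    calls.append(key)
    return {'eid': key}


base = tempfile.mkdtemp()
first = cached_json('2-s2.0-84901638552', fake_fetch, base)
second = cached_json('2-s2.0-84901638552', fake_fetch, base)  # read from disk
third = cached_json('2-s2.0-84901638552', fake_fetch, base, refresh=True)
print(len(calls))  # 2: only the first and the refreshed call hit fake_fetch
```

Passing refresh=True is the only way to replace a cached file, which matters for data that changes over time, such as anything derived from citation counts.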

1 Author pages

Here, we generate the html that will make each author a clickable link that goes to their Scopus ID author page.

def get_author_link(EID):
    data = get_abstract_info(EID)
    result = data['abstracts-retrieval-response']
    html = '<a href="http://www.scopus.com/authid/detail.url?origin=AuthorProfile&authorId={0}">{1}</a>'
    authors = [html.format(auid, name) for auid, name in
               zip([x['@auid'] for x in result['authors']['author']],
                   [x['ce:indexed-name'] for x in result['authors']['author']])]

    return ','.join(authors)

from scopus import *
print get_author_link('2-s2.0-84901638552')

Xu Z.,Kitchin J.R.
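
The same links can be built in a single pass over the author list, which avoids the two parallel list comprehensions. This is only a sketch on a hand-written dictionary that imitates the structure used above. The author ids in it are placeholders, not real Scopus ids, and author_links is a hypothetical name.

```python
LINK = ('<a href="http://www.scopus.com/authid/detail.url'
        '?origin=AuthorProfile&authorId={0}">{1}</a>')


def author_links(result):
    """Return comma-separated author links from one pass over the authors."""
    return ','.join(LINK.format(a['@auid'], a['ce:indexed-name'])
                    for a in result['authors']['author'])


# Hand-written stand-in for data['abstracts-retrieval-response'];
# the '@auid' values are placeholders.
sample = {'authors': {'author': [
    {'@auid': '0000000001', 'ce:indexed-name': 'Xu Z.'},
    {'@auid': '0000000002', 'ce:indexed-name': 'Kitchin J.R.'},
]}}

links = author_links(sample)
print(links)
```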

2 Journal link

The most important pieces of information we need are the journal name and the source-id from the coredata.

from scopus import *
EID = '2-s2.0-84901638552'
data = get_abstract_info(EID)
result = data['abstracts-retrieval-response']
print result['coredata']['source-id']
print result['coredata']['prism:publicationName']

22746
Catalysis Communications

def get_journal_link(EID):
    data = get_abstract_info(EID)
    result = data['abstracts-retrieval-response']
    sid = result['coredata']['source-id']
    journal = result['coredata']['prism:publicationName']
    s = '<a href="http://www.scopus.com/source/sourceInfo.url?sourceId={sid}">{journal}</a>'

    return s.format(sid=sid, journal=journal)

from scopus import *
print get_journal_link('2-s2.0-84901638552')

Catalysis Communications

3 DOI link

It would be helpful to have a DOI link, which is independent of Scopus, so people without Scopus access can still reach the article.

from scopus import *
EID = '2-s2.0-84901638552'
data = get_abstract_info(EID)
result = data['abstracts-retrieval-response']
print result['coredata']['prism:doi']

10.1016/j.catcom.2013.10.028

def get_doi_link(EID):
    data = get_abstract_info(EID)
    result = data['abstracts-retrieval-response']
    s = '<a href="https://doi.org/{doi}">doi:{doi}</a>'
    return s.format(doi=result['coredata']['prism:doi'])

from scopus import *
print get_doi_link('2-s2.0-84901638552')

doi:10.1016/j.catcom.2013.10.028
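
Some DOIs contain characters that are reserved in URLs, so it may be worth percent-encoding the DOI before putting it in the href. Here is a minimal sketch of that idea. The function name doi_link and the second DOI are made up for illustration, and the displayed text is left unencoded.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def doi_link(doi):
    """Return an html link to doi.org with the DOI percent-encoded in the URL."""
    doi = doi.strip()
    url = 'https://doi.org/' + quote(doi, safe='/')
    return '<a href="{0}">doi:{1}</a>'.format(url, doi)


print(doi_link('10.1016/j.catcom.2013.10.028'))
# A made-up DOI with a '#', which would otherwise start a URL fragment.
print(doi_link('10.1000/abc#1'))
```

For the DOI used in this post nothing changes, since it contains only characters that are safe in a URL.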

4 Citation count image

It is nice to show the impact of a paper by showing its citation count. The count changes with time, so a static number is not ideal. Scopus provides an image that it generates on request, so it should update whenever the page is viewed. We need the DOI to get that image.

def get_cite_img_link(EID):
    data = get_abstract_info(EID)
    result = data['abstracts-retrieval-response']
    s = '<img src="http://api.elsevier.com/content/abstract/citation-count?doi={doi}&httpAccept=image/jpeg&apiKey={apikey}"></img>'
    return s.format(doi=result['coredata']['prism:doi'].strip(), apikey=MY_API_KEY)

from scopus import *
print get_cite_img_link('2-s2.0-84901638552')
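
The query string for the image can also be assembled with urlencode instead of string formatting, which takes care of escaping the parameter values. This is a sketch only: cite_img_url is a hypothetical name, the key is a placeholder, and it assumes the server decodes percent-encoded values such as %2F in the usual way. Whatever key is used ends up in the page source, where any reader can see it.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

BASE = 'http://api.elsevier.com/content/abstract/citation-count'


def cite_img_url(doi, apikey):
    """Return the citation-count image URL with an encoded query string."""
    params = [('doi', doi.strip()),
              ('httpAccept', 'image/jpeg'),
              ('apiKey', apikey)]
    return BASE + '?' + urlencode(params)


# 'PLACEHOLDER-KEY' stands in for a real API key.
url = cite_img_url('10.1016/j.catcom.2013.10.028', 'PLACEHOLDER-KEY')
print(url)
```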

5 The document link

The document link is sort of buried in the coredata. It seems like & has been replaced by &amp; in the json data so we have to do a clunky fix here.

from scopus import *
EID = '2-s2.0-84901638552'
data = get_abstract_info(EID)
result = data['abstracts-retrieval-response']

print result['coredata']['dc:title']
for ref in result['coredata']['link']:
    if ref['@rel'] == 'scopus':
        print ref['@href'].replace('&amp;', '&')
        break

Relating the electronic structure and reactivity of the 3d transition metal monoxide surfaces
http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=84901638552&origin=inward

def get_abstract_link(EID):
    data = get_abstract_info(EID)
    result = data['abstracts-retrieval-response']
    title = result['coredata']['dc:title']
    link = ''  # fall back to an empty href if no scopus link is present
    for ref in result['coredata']['link']:
        if ref['@rel'] == 'scopus':
            link = ref['@href'].replace('&amp;', '&')
            break

    s = '<a href="{link}">{title}</a>'
    return s.format(link=link, title=title)

from scopus import *
print get_abstract_link('2-s2.0-84901638552')

Relating the electronic structure and reactivity of the 3d transition metal monoxide surfaces
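
The search through the link list can be written as a small helper that returns a default when no entry has @rel equal to scopus. Here is a sketch on hand-written data. The helper name scopus_href and the example.org entry are made up. The scopus entry is the href shown above with the &amp; put back in.

```python
def scopus_href(coredata, default=None):
    """Return the Scopus document URL from coredata, or default if absent."""
    for ref in coredata.get('link', []):
        if ref.get('@rel') == 'scopus':
            # The json data carries '&amp;' where the URL needs '&'.
            return ref['@href'].replace('&amp;', '&')
    return default


# Hand-written stand-in for result['coredata'].
sample = {'link': [
    {'@rel': 'self', '@href': 'http://example.org/placeholder'},
    {'@rel': 'scopus',
     '@href': ('http://www.scopus.com/inward/record.url'
               '?partnerID=HzOxMe3b&amp;scp=84901638552&amp;origin=inward')},
]}

print(scopus_href(sample))
print(scopus_href({'link': []}, default='#'))
```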

6 Putting it all together

Our goal is ultimately an html formatted citation where nearly every piece is a hyperlink to additional information: each author is linked to their Scopus page, the title to the Scopus document page, and the journal to the Scopus journal page, followed by a DOI link and an image showing the number of citations. Here it is.

def get_html(EID):
    data = get_abstract_info(EID)
    result = data['abstracts-retrieval-response']

    s = '<li>{authors}, <i>{title}</i>, {journal}, <b>{volume}{issue}</b>, p. {pages}, ({year}), {doi}, {cites}.</li>'

    issue = ''
    if result['coredata'].get('prism:issue'):
        issue = '({})'.format(result['coredata'].get('prism:issue'))
    return s.format(authors=get_author_link(EID),
                    title=get_abstract_link(EID),
                    journal=get_journal_link(EID),
                    volume=result['coredata'].get('prism:volume'),
                    issue=issue,
                    pages=result['coredata'].get('prism:pageRange'),
                    year=result['coredata'].get('prism:coverDate'),
                    doi=get_doi_link(EID),
                    cites=get_cite_img_link(EID))

from scopus import *
print get_html('2-s2.0-84901638552')

  • Xu Z.,Kitchin J.R., Relating the electronic structure and reactivity of the 3d transition metal monoxide surfaces, Catalysis Communications, 52, p. 60-64, (2014-07-05), doi:10.1016/j.catcom.2013.10.028, .
Well, that is the end for now. We have a reusable function that generates a nice HTML formatted citation that links out to many different resources. Why aren't all citations on the web this helpful?
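
The introduction noted that entries in press, or from journals without a page range, may break this code. One possible way to handle that, sketched here under my own assumptions, is to build the entry from a list of parts and skip any field that is missing. The function format_entry is hypothetical, it takes already formatted strings for the linked pieces, and the in-press record is made up.

```python
def format_entry(coredata, authors, title, journal, doi=None):
    """Join only the pieces that are present into one <li> entry."""
    parts = [authors, '<i>{0}</i>'.format(title), journal]

    volume = coredata.get('prism:volume')
    issue = coredata.get('prism:issue')
    if volume:
        issue_text = '({0})'.format(issue) if issue else ''
        parts.append('<b>{0}{1}</b>'.format(volume, issue_text))

    pages = coredata.get('prism:pageRange')
    if pages:
        parts.append('p. {0}'.format(pages))

    date = coredata.get('prism:coverDate')
    if date:
        parts.append('({0})'.format(date))

    if doi:
        parts.append(doi)

    return '<li>{0}.</li>'.format(', '.join(parts))


complete = {'prism:volume': '52',
            'prism:pageRange': '60-64',
            'prism:coverDate': '2014-07-05'}
in_press = {'prism:coverDate': '2014-07-05'}  # made up: no volume or pages

args = ('Xu Z.,Kitchin J.R.', 'Title', 'Catalysis Communications',
        'doi:10.1016/j.catcom.2013.10.028')
full = format_entry(complete, *args)
short = format_entry(in_press, *args)
print(full)
print(short)
```

With this approach a missing volume or page range simply drops out of the entry instead of printing None.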

    Copyright (C) 2015 by John Kitchin. See the License for information about copying.
