New publication in SoftwareX

| categories: news, publication | tags: | View Comments

Bibliometrics are increasingly used to quantify scholarly productivity. In this paper, we introduce a Python package called pybliometrics that provides a scriptable interface to Scopus to aggregate publication data for analysis. The package provides pretty comprehensive coverage of the APIs for author, abstract, affiliation and citation queries. The manuscript shows examples for downloading abstracts in bulk, building collaboration network graphs, and analyzing citation trends. You have to get a key from Scopus to access their databases, and the package provides some guidance on how to get it and configure the package. If you are interested in bibliometrics, this package may be useful to you!

@article{rose-2019-pybliom,
  author =       {Michael E. Rose and John R. Kitchin},
  title =        {Pybliometrics: Scriptable Bibliometrics Using a Python
                  Interface To Scopus},
  journal =      {SoftwareX},
  volume =       10,
  number =       {nil},
  pages =        100263,
  year =         2019,
  doi =          {10.1016/j.softx.2019.100263},
  url =          {https://doi.org/10.1016/j.softx.2019.100263},
  DATE_ADDED =   {Mon Jul 8 07:06:58 2019},
}

Copyright (C) 2019 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.2.3

Read and Post Comments

An improvement for figures in ipython + scimax

| categories: ipython | tags: | View Comments

One of the best features of ipython in scimax is automatic inline images that you do not have to name. This has had a downside though, and that is it is not easy to use this and put attributes like names (so you can reference them later) or captions, or if you want a specific filename to get that. No more. Now you can use the :ipyfile header argument to control these. For example, if you use this in the header of the next block, it will save the images into the filenames you specified (in the order they are defined), and add attributes to the output. The syntax is just a list of plists (in elispese).

:ipyfile '((:name "clockwise" :filename "obipy-resources/clockwise.png" :caption "A clockwise spiral.") (:name "counterclockwise" :filename "obipy-resources/counterclockwise.png" :caption "A counterclockwise spiral."))

That allows you to refer to the clockwise one in Figure clockwise and the counterclockwise in Fig. counterclockwise. That may be helpful when using Ipython to write papers or for presentations where you might prefer named figures that are easy to find. Enjoy!

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 20 * np.pi, 350)
x = np.exp(-0.1 * t) * np.sin(t)
y = np.exp(-0.1 * t) * np.cos(t)

plt.plot(x, y)
plt.axis('equal')

plt.figure()
plt.plot(y, x)

plt.axis('equal')

print('Length of t = {}'.format(len(t)))
print('x .dot. y = {}'.format(x @ y))

Length of t = 350 x .dot. y = 1.3598389888491538

A clockwise spiral.

A counterclockwise spiral.

Copyright (C) 2019 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.2.1

Read and Post Comments

Using results from one code block in another org-mode

| categories: elisp, emacs, orgmode | tags: | View Comments

One really great feature in org-mode is you have many options to pass data between code-blocks. In this post we look at some of these options using emacs-lisp as the language. This runs in a session where you can keep variables in memory between blocks, and use them in subsequent blocks.

Here we set a variable to a value.

(setq some-variable 42)
42

Then later in another block we can use that variable:

(+ some-variable 1)
43

While you are in the session, some-variable can be used. If you want some mind-bending trouble, the emacs-lisp session is global, and you can access some-variable even in another buffer! Don't do that. When you close emacs this variable will disappear, and all that is left are the results from above.

There is another way to pass information from one block to another using named src blocks and variables in the block header. This allows you to pass data between blocks by name, and you will see later you can even access the results by name from other files.

1 :var

First, we give our src block a name like this:

#+name: block-1
#+BEGIN_SRC emacs-lisp
(current-time-string)
#+END_SRC

When we run this, the results will have a name too.

(current-time-string)
Tue Feb 12 08:19:23 2019

Now, we can use the named result as input to a new block using the :var header.

#+BEGIN_SRC emacs-lisp :var input=block-1
(format "We got %S in block-1" input)
#+END_SRC

When we run this block, emacs will run block-1 and put the output in to the variable input which we use inside the code block.

(format "We got %S in block-1" input)
We got "Tue Feb 12 08:20:44 2019" in block-1

Some things to note:

  1. Every time you run this, block-1 gets rerun.
  2. The results in this block are not the same as in block-1
  3. The results in block-1 are not changed when you run the second block.

You may not want to rerun block-1 each time; maybe it is an expensive calculation, or maybe it should not be changed. You can prevent this behavior by using the :cache header.

2 :cache

If you specify :cache yes then org-mode should store a hash of the code block with the results, and if the code block hasn't changed then it should not run again.

#+name: block-2
#+BEGIN_SRC emacs-lisp :cache yes
(current-time-string)
#+END_SRC
(current-time-string)
Tue Feb 12 08:06:22 2019

Now, we use block-2 as input to a block, we see the output is the same as the output from block-2.

(format "We got %S in block-2" input)
We got "Tue Feb 12 08:06:22 2019" in block-2

Ok, but what if my results are too large to put in the buffer, or too complex for text? You still have some options.

3 :wrap

Suppose we generate some json in one block, and we want to use it in another block. We still want to see the json in the buffer as an intermediate result. We can wrap the output in a json block like this.

(require 'json)
(json-encode `(("date" . ,(current-time-string))))

{"date":"Tue Feb 12 08:30:20 2019"}

Then, we can simply input that output into a new block.

(format "We got %S in json" input)
We got "{\"date\":\"Tue Feb 12 08:30:20 2019\"}
" in json

This admittedly still pretty simple, text-based data. It is probably not a good idea to do this with binary data.

Note you can refer to this result even in another org-file:

#+BEGIN_SRC emacs-lisp :var input=./2019-02-12.org:json
input
#+END_SRC

#+RESULTS:
: {"date":"Tue Feb 12 08:30:20 2019"}

4 :file

It may be that your data is too large to conveniently put into your org-file, or maybe it is binary data. No problem, just put it into an external file using the :file header. It looks like this:

#+name: block-3
#+BEGIN_SRC emacs-lisp :cache yes :file block-3
(require 'json)
(json-encode `(("date" . ,(current-time-string))))
#+END_SRC

#+RESULTS[a14d376653bd8c40a0961ca95f21d8837dddec66]: block-3
[[file:block-3]]

Note that you have to provide a file name for this. Sometimes that is nice if you want a human recognizable file to send to someone, but it would also be nice if there was an automatic naming scheme, e.g. based on an sha-1 hash of the src block.

(require 'json)
(json-encode `(("date" . ,(current-time-string))))

block-3

Now you can use other tools to check out the file. Here we can still use simple shell tools.

cat block-3

The output of block-3 is a file name:

input
/Users/jkitchin/Box Sync/kitchingroup/jkitchin/journal/2019/02/12/block-3

So you can use it in a new block to read the data in, and then do something new with it.

(with-temp-buffer
  (insert-file-contents input)
  (format "We got %S in block-3" (json-read-from-string (buffer-string))))
We got ((date . "Tue Feb 12 08:46:55 2019")) in block-3

5 "remote" data

The blocks do not have to be in order. If you want, you can put your blocks in an appendix, and then just have analysis blocks here that use them. That way, you can have short blocks here that are more readable, but longer, more complex blocks elsewhere that do not clutter your document.

(with-temp-buffer
  (insert-file-contents input)
  (format "We got %S in the appendix data" (json-read-from-string (buffer-string))))
We got "{\"date\":\"Tue Feb 12 09:11:12 2019\"}" in the appendix data

6 Manually saving data in files

Note you can also manually save data in a file, for example:

(require 'json)
(let ((f "block-4.json"))
  (with-temp-file f
    (prin1
     (json-encode `(("date" . ,(current-time-string))))
     (current-buffer)))
  f)
block-4.json

We put the filename as the last variable which is returned by the block, so that we don't have to manually type it later in the next block. You know, try not to repeat yourself…

This just shows we did write out to our file:

cat block-4.json

And we read the file in here, using the filename from block-4 as an input variable.

(with-temp-buffer
  (insert-file-contents input)
  (format "We got %S in block-4" (json-read-from-string (buffer-string))))
We got "{\"date\":\"Tue Feb 12 08:51:25 2019\"}" in block-4

7 An appendix for data

(require 'json)
(let ((f "appendix.json"))
  (with-temp-file f
    (prin1
     (json-encode `(("date" . ,(current-time-string))))
     (current-buffer)))
  f)
appendix.json

8 Caveats

Using org-mode like this is almost always finding the right tradeoffs in what is persistent, and where is it stored. Not all of the intermediate data/calculations are stored; if they are really cheap you can just run the code blocks again. If they are really small, i.e. easy for your to read in a few lines, you can store them in the document. If they are really large, you can store them in a file.

The beauty of having everything in an org-file is you have a single file that is easy to transport. When the files get too large though, it can become impractical, e.g. emacs may slow down if you try to put thousands of lines of xml data into the buffer. Then, you have to make some decisions about what to keep, where to keep it, and in what form to keep it.

For short projects where you only need a single compute session, having everything in memory may be fine. For longer projects, say one that is long enough you will close all the buffers, and possibly restart emacs in between working on it, then you have to make some decisions about what to save from each block so you can continue the work in the next session. Again, you have to decide what to save, where to save, and in what form.

Once you start saving data outside the org-file, it becomes less portable, or more tricky to move the file because you need to also move all the data files to keep it intact. I have explored a concept of making an org-archive in the past, where you get a list of all files linked in the org-file, but this so far has just been worked out for some small proof of concept ideas.

Not all languages are the same in org-mode. They do not all support sessions for example, and they may not all work like the examples here. The scimax iPython modifications do not behave like the examples above. That is probably due to bugs I have inadvertently introduced, and in the future I will try to make it work like emacs-lisp does above.

Overall, org-mode has one of the most flexible and powerful systems for passing and reusing data in documents I have ever seen. It is not perfect, and in such a powerful system there are many unexplored or lightly traveled corners that may have hazards in them. It still seems pretty promising though.

Copyright (C) 2019 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.2.1

Read and Post Comments

2018 in a nutshell for the Kitchin Research Group

| categories: news | tags: | View Comments

The majority of this year was spent finishing my sabbatical in the Google Accelerated Science Research group. I finished that up in August, and have returned to Pittsburgh now. I spent that time learning about differentiable programming and machine learning applications in science and engineering. It was a great learning year for me, and the beginning of some new research directions.

It wasn't all work, I was able to bike over 3000 miles while we lived in California, spent lots of weekends at beaches, state parks, San Francisco, and many other beautiful places. There is a lot I miss from my sabbatical, but I am mostly glad to be back home.

1 An all new research group

In 2017, the last of my students finished, and my group shrunk temporarily to zero for a semester. That luckily coincided with the beginning of my sabbatical, which allowed me to focus exclusively on starting new research directions. Towards the end of last year, three new PhD students joined my group. I did not take any new PhD students this year, but several new MS students joined the group at the end of 2018:

  1. Siddhant Lambor (co-advised with Prof. Lynn Walker)
  2. Gautham Swaminathan (co-advised with Prof. Lynn Walker)
  3. Siddarth Achar
  4. Senhong Liu
  5. Dingqi Nai
  6. Qiong Wang

These students will all work on some aspect of machine learning in formulation research, design of experiments, and molecular simulation. It should be an exciting year!

2 Publications

2018 was a light year on publications, largely due to my group shrinking to zero, and being on sabbatical. I wrote a nice perspective article on machine learning in catalysis:

kitchin-2018-machin-learn-catal
This perspective article describes how machine learning is being used in catalysis research and opportunities for further research.

And several publications from past M.S. students got finished and published:

thirumalai-2018-inves-react
This paper shows that many single atom alloys have unique electronic structure features that are responsible for their special catalytic properties.
gao-2018-model-pallad
We used DFT calculations to build a neural network potential to model adatom diffusion on a metal surface.
wang-2018-densit-funct
We used DFT calculations to build a neural network potential to model zirconia polymorphs, oxygen vacancy formation and diffusion.

This was technically published in 2017, but it was the most cited article in J. Phys.: Cond. Matt. in 2018!

larsen-2017-atomic-simul
This is a modern update on the Atomic Simulation Environment Python software. We have been using and contributing to this software for about 15 years now!

Citations on our past work continue to grow.

Bibliography

  • [kitchin-2018-machin-learn-catal] John Kitchin, Machine Learning in Catalysis, Nature Catalysis, 1(4), 230-232 (2018). link. doi.
  • [thirumalai-2018-inves-react] Hari Thirumalai & John Kitchin, Investigating the Reactivity of Single Atom Alloys Using Density Functional Theory, Topics in Catalysis, 61(5-6), 462-474 (2018). link. doi.
  • [gao-2018-model-pallad] Tianyu Gao & John Kitchin, Modeling Palladium Surfaces With Density Functional Theory, Neural Networks and Molecular Dynamics, Catalysis Today, 312, 132-140 (2018). link. doi.
  • [wang-2018-densit-funct] Chen Wang, Akshay Tharval & John Kitchin, A Density Functional Theory Parameterised Neural Network Model of Zirconia, Molecular Simulation, 44(8), 623-630 (2018). link. doi.
  • [larsen-2017-atomic-simul] Ask Hjorth Larsen, Jens J\orgen Mortensen, Jakob, Blomqvist, Ivano E Castelli, Rune Christensen, , Marcin Dułak, Jesper Friis, Michael N Groves, , Bj\ork Hammer, Cory Hargus, Eric D Hermes, Paul C, Jennings, Peter Bjerre Jensen, James Kermode, John, R Kitchin, Esben Leonhard Kolsbjerg, Joseph Kubal, , Kristen Kaasbjerg, Steen Lysgaard, J\'on Bergmann, Maronsson, Tristan Maxson, Thomas Olsen, Lars, Pastewka, Andrew Peterson, Carsten Rostgaard, Jakob, Schi\otz, Ole Sch\"utt, Mikkel Strange, Kristian, S Thygesen, Tejs Vegge, Lasse Vilhelmsen, Michael, Walter, Zhenhua Zeng & Karsten W Jacobsen, The Atomic Simulation Environment-A Python Library for Working With Atoms, Journal of Physics: Condensed Matter, 29(27), 273002 (2017). link.

3 New courses

On my return from my sabbatical, I taught a new course for me, 06-623 Mathematical modeling of chemical engineering processes. I taught this course in Python, and it was a tour of mathematical and scientific programming that started with "Hello world" 'Hello world' and ended with machine learning. We traveled through differential equations, nonlinear algebra, optimization, and regression along the way using numpy and scipy. It was a fun class, and I look forward to teaching it again next Fall. You can see the course notes at https://github.com/jkitchin/f18-06623. I ran this course using Jupyter notebooks (of course I wrote the notes in org-mode and used the jupyter notebook exporter I wrote to make these!) and Box.com. It worked, but wasn't my favorite. I will try to go back to Emacs+org-mode for this next year.

This coming spring will be another new course: our junior Transport Lab course.

4 Emacs, org-mode and scimax

org-ref has been downloaded close to 40K times! I thought it passed 40K early in December, but MELPA shows it with just under 40K today. They switched servers recently, so maybe some statistics were lost. If you count the Melpa-stable downloads, it is still over 40K. There are now over 50 contributors to org-ref besides me! That is pretty awesome, and hopefully speaks to the number of people interested in using Emacs for scientific publishing.

This fall I picked up scimax development again after my sabbatical break, and have a few notable improvements I will launch in 2019. These include:

  • Many improvements to ipython in scimax:
    • Code completion
    • Inspection
    • More jupyter-like features (?, ??) and key-bindings
    • export to Jupyter notebooks
    • src-block keymaps
    • and more.
  • A new editmarks package that will allow persistent comments, highlights, and track-change mode.
  • Improved support for literate programming in org-mode including jump to the definition in org-mode.

Scimax is an increasingly important project to me, and in 2019 I am going to work on some ways that will make it easier for me to spend more time on it in the future.

5 Online activity

5.1 kitchingroup.cheme.cmu.edu

Since I was on sabbatical, it was a low volume blogging year with only 22 posts. Traffic to the blog was up nonetheless from the last year. I suspect I will blog more this year.

5.2 Github

I was even less active in 2018 than in 2017 on GitHub activity. You can see it picked back up this past fall as I returned to my day job as a professor. I expect 2019 will pick back up as usual.

5.3 YouTube

Our YouTube traffic is down this year compared to last year. It is still always interesting to see the spikes in traffic on the org-mode is awesome video. Maybe it got mentioned on Hacker News or something. I only made one video last year; I took a break while on sabbatical, and was busy this fall with a new course. Maybe 2019 will be a better year for that. I have some plans for new videos in the new year on ipython, and some updates in scimax.

We did cross 1000 subscribers this year. That doesn't qualify my channel for monetization yet, you also need 4000 watch hours in the past year. Last year we only had 1466 hours, so not that close yet. Why am I interested in this? I am actively looking for ways to support scimax development, and this could be one way to do that.

5.4 Twitter

It wasn't super easy to get all the Twitter data, I had to manually download the information from each month. Now that I have it, I did some analysis, so here it is. First we look at how many tweets, likes, retweets, etc. there were last year:

import csv
import glob

tweets = 0
impressions = 0
texts = []
times = [] # times of the tweets
likes = 0
retweets = 0
replies = 0

for csvfile in glob.glob('*.csv'):
    with open(csvfile) as f:
        reader = csv.DictReader(f)
        for row in reader:
            tweets += 1
            impressions += float(row['impressions'])
            texts += [row['Tweet text']]
            times += [row['time']]
            likes += float(row['likes'])
            replies += float(row['replies'])
            retweets += float(row['retweets'])

print(f'''{tweets} Tweets with {int(impressions)} impressions.
There were {int(likes)} likes, {int(retweets)} retweets, and {int(replies)} replies.''')

471 Tweets with 282655 impressions. There were 1089 likes, 220 retweets, and 341 replies.

Next, we look at the time distribution of these tweets. It seems like this should be easier to do (it probably is in Pandas).

import datetime
import numpy as np

x = np.array([datetime.datetime.strptime(time[0:-6], "%Y-%m-%d %H:%M")
              for time in times])

months = np.zeros(12)
for time in x:
    months[time.month - 1] += 1

plt.bar(np.arange(12), months)
plt.xticks(np.arange(12), ['January', 'February', 'March',
                           'April', 'May', 'June',
                           'July', 'August', 'September',
                           'October', 'November', 'December'],
           rotation=45);
plt.ylabel('Tweet count')

6 Summary and outlook

2018 was a pretty good year. I took a break from several things I have spent a lot of time in the past and learned some new things I spend my time on now. I am looking forward to getting back to some of the old things, especially scimax development. I still think it is a key component of modern scientific research and publishing, and that it has an important role in education. Stay tuned!

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.14

Read and Post Comments

Line integrals in Python with autograd

| categories: integration, autograd, python | tags: | View Comments

Table of Contents

A line integral is an integral of a function along a curve in space. We usually represent the curve by a parametric equation, e.g. \(\mathbf{r}(t) = [x(t), y(t), z(t)] = x(t)\mathbf{i} + y(t)\mathbf{j} + z(t)\mathbf{k}\). So, in general the curve will be a vector function, and the function we want to integrate will also be a vector function.

Then, we can write the line integral definition as:

\(\int_C \mathbf{F(r)}\cdot d\mathbf{r} = \int_a^b \mathbf{F}({\mathbf{r}(t)) \cdot \mathbf{r'}(t) dt\) where \(\mathbf{r'}(t) = \frac{d\mathbf{r}}{dt}\). This integrand is a scalar function, because of the dot product.

The following examples are adapted from Chapter 10 in Advanced Engineering Mathematics by Kreysig.

The first example is the evaluation of a line integral in the plane. We want to evaluate the integral of \(\mathbf{F(r)}=[-y, -xy]\) on the curve \(\mathbf{r(t)}=[-sin(t), cos(t)]\) from t=0 to t = π/2. The answer in the book is given as 0.4521. Here we evaluate this numerically, using autograd for the relevant derivative. Since the curve has multiple outputs, we have to use the jacobian function to get the derivatives. After that, it is a simple bit of matrix multiplication, and a call to the quad function.

import autograd.numpy as np
from autograd import jacobian
from scipy.integrate import quad

def F(X):
    x, y = X
    return -y, -x * y

def r(t):
    return np.array([-np.sin(t), np.cos(t)])

drdt = jacobian(r)

def integrand(t):
    return F(r(t)) @ drdt(t)

I, e = quad(integrand, 0.0, np.pi / 2)
print(f'The integral is {I:1.4f}.')
The integral is 0.4521.


We get the same result as the analytical solution.

The next example is in three dimensions. Find the line integral along \(\mathbf{r}(t)=[cos(t), sin(t), 3t]\) of the function \(\mathbf{F(r)}=[z, x, y]\) from t=0 to t=2 π. The solution is given as 21.99.

import autograd.numpy as np
from autograd import elementwise_grad, grad, jacobian

def F(X):
    x, y, z = X
    return [z, x, y]

def C(t):
    return np.array([np.cos(t), np.sin(t), 3 * t])

dCdt = jacobian(C, 0)

def integrand(t):
    return F(C(t)) @ dCdt(t)

I, e = quad(integrand, 0, 2 * np.pi)
print(f'The integral is {I:1.2f}.')
The integral is 21.99.


That is also the same as the analytical solution. Note the real analytical solution was 7 π, which is nearly equivalent to our answer.

7 * np.pi - I
3.552713678800501e-15

As a final example, we consider an alternate form of the line integral. In this form we do not use a dot product, so the integral results in a vector. This doesn't require anything from autograd, but does require us to be somewhat clever in how to do the integrals since quad can only integrate scalar functions. We need to integrate each component of the integrand independently. Here is one approach where we use lambda functions for each component. You could also manually separate the components.

def F(r):
    x, y, z = r
    return x * y, y * z, z

def r(t):
    return np.array([np.cos(t), np.sin(t), 3 * t])

def integrand(t):
    return F(r(t))

[quad(lambda t: integrand(t)[i], 0, 2 * np.pi)[0] for i in [0, 1, 2]]
[-6.9054847581172525e-18, -18.849555921538755, 59.21762640653615]

The analytical solution in this case was given as:

[0, -6 * np.pi, 6 * np.pi**2]
[0, -18.84955592153876, 59.21762640653615]

which is evidently the same as our numerical solution.

Maybe an alternative, but more verbose is this vectorized integrate function. We still make temporary functions for integrating, and the vectorization is essentially like the list comprehension above, but we avoid the lambda functions.

@np.vectorize
def integrate(i):
    def integrand(t):
        return F(r(t))[i]
    I, e = quad(integrand, 0, 2 * np.pi)
    return I

integrate([0, 1, 2])
array([ -6.90548476e-18,  -1.88495559e+01,   5.92176264e+01])

1 Summary

Once again, autograd provides a convenient way to compute function jacobians which make it easy to evaluate line integrals in Python.

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.14

Read and Post Comments

Next Page »