New publication in Topics in Catalysis

| categories: publication, news | tags: | View Comments

Single atom alloys are alloys in the extreme dilute limit, where single atoms of a reactive metal are surrounded by comparatively unreactive metals. This makes the single reactive atoms like single atom sites where reactions can occur. These sites are interesting because they are metallic, but their electronic structure is different than the atoms in more concentrated alloys. This means there is the opportunity for different, perhaps better catalytic performance for the single atom alloys. In this paper, we studied the electronic structure and some representative reaction pathways on a series of single atom alloy surfaces.

  author =       "Thirumalai, Hari and Kitchin, John R.",
  title =        "Investigating the Reactivity of Single Atom Alloys Using
                  Density Functional Theory",
  journal =      "Topics in Catalysis",
  year =         "2018",
  month =        "Jan",
  day =          "25",
  abstract =     "Single atom alloys are gaining importance as atom-efficient
                  catalysts which can be extremely selective and active towards
                  the formation of desired products. They possess such desirable
                  characteristics because of the presence of a highly reactive
                  single atom in a less reactive host surface. In this work, we
                  calculated the electronic structure of several representative
                  single atom alloys. We examined single atom alloys of gold,
                  silver and copper doped with single atoms of platinum,
                  palladium, iridium, rhodium and nickel in the context of the
                  d-band model of Hammer and N{\o}rskov. The reactivity of these
                  alloys was probed through the dissociation of water and nitric
                  oxide and the hydrogenation of acetylene to ethylene. We
                  observed that these alloys exhibit a sharp peak in their atom
                  projected d-band density of states, which we hypothesize could
                  be the cause of high surface reactivity. We found that the
                  d-band centers and d-band widths of these systems correlated
                  linearly as with other alloys, but that the energy of
                  adsorption of a hydrogen atom on these surfaces could not be
                  correlated with the d-band center, or the average reactivity
                  of the surface. Finally, the single atom alloys, with the
                  exception of copper--palladium showed good catalytic behavior
                  by activating the reactant molecules more strongly than the
                  bulk atom behavior and showing favorable reaction pathways on
                  the free energy diagrams for the reactions investigated.",
  issn =         "1572-9028",
  doi =          "10.1007/s11244-018-0899-0",
  url =          ""

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.6

Read and Post Comments

New publication in Molecular Simulation

| categories: publication, news | tags: | View Comments

This paper is our latest work using neural networks in molecular simulation. In this work, we build a Behler-Parinello neural network potential of bulk zirconia. The potential can describe several polymorphs of zirconia, as well as oxygen vacancy defect formation energies and diffusion barriers. We show that we can use the potential to model oxygen vacancy diffusion using molecular dynamics at different temperatures, and to use that data to estimate the effective diffusion activation energy. This is further evidence of the general utility of the neural network-based potential for molecular simulations with DFT accuracy.

  author =       {Chen Wang and Akshay Tharval and John R. Kitchin},
  title =        {A Density Functional Theory Parameterised Neural Network Model
                  of Zirconia},
  journal =      {Molecular Simulation},
  volume =       0,
  number =       0,
  pages =        {1-8},
  year =         2018,
  doi =          {10.1080/08927022.2017.1420185},
  url =          {},
  eprint =       { },
  publisher =    {Taylor \& Francis},

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.5

Read and Post Comments

2017 in a nutshell for the Kitchin Research group

| categories: news | tags: | View Comments

Since the last update a lot of new things have happened in the Kitchin Research group. Below are some summaries of the group accomplishments, publications and activities for the past year.

1 Student accomplishments

Jacob Boes completed his PhD and began postdoctoral work with Thomas Bligaard at SLAC/Suncat at Stanford. Congratulations Jake!

Four new PhD students joined the group:

  1. Jenny Zhan will work on simulation of molten superalloys
  2. Mingjie Liu will work on the design of single atom alloy catalysts
  3. Yilin Yang will work on segregation in multicomponent alloys under reaction conditions
  4. Zhitao Guo is also joining the group and will be co-advised by Prof. Gellman. He will work on multicomponent alloy catalysts.

Welcome to the group!

2 Publications

Our publications and citation counts have continued to grow this year. Here is our current metrics according to Researcher ID.

We have eight new papers that are online, and two that are accepted, but not online yet. There are brief descriptions below.

2.1 Collaborative papers

This is a modern update on the Atomic Simulation Environment Python software. We have been using and contributing to this software for about 15 years now!
This collaborative effort with the Keith group at UPitt and Anatole von Lilienfeld explored a novel approach to estimating adsorption energies on alloy surfaces.
We used DFT calculations to understand epitaxial stabilization of titania films on strontium titanate surfaces.
We previously predicted that tin oxide should be able to form in the columbite phase as an epitaxial film. In this paper our collaborators show that it can be done!
This paper finally came out in print. It shows an automated approach to sharing data. Also, it may be the only paper with data hidden inside a picture of a library in the literature.

2.2 Papers on neural networks in molecular simulation

We used neural networks in conjunction with molecular dynamics and Monte Carlo simulations to model the coverage dependent adsorption of oxygen and initial oxidation of a Pd(111) surface.
We used neural networks in conjunction with Monte Carlo simulations to model segregation across composition space for a Au-Pd alloy.
We used a cluster expansion with Monte Carlo simulations to resolve some inconsistencies in simulated Cu-Pd phase diagrams. There is an interesting transition from an fcc to bcc to fcc structure across the composition space that is subtle and difficult to compute.

2.3 Papers accepted in 2017 but not yet in press

  1. Chen Wang, Akshay Tharval, John R. Kitchin, A density functional theory parameterized neural network model of zirconia, Accepted in Molecular Simulation, July 2017.
  2. Hari Thirumalai, John R. Kitchin, Investigating the Reactivity of Single Atom Alloys using Density Functional Theory, Topics in Catalysis, Accepted November 2017.

3 New courses

After a five year stint of teaching Master's and PhD courses, I taught the undergraduate chemical engineering course again. This was the first time I taught the course using Python. All the lectures and assignments were in Jupyter notebooks. You can find the course here: The whole class basically ran from a browser using a Python Flask app to serve the syllabus, lectures and assignments. Assignments were submitted and returned by email through the Flask app. It was pretty interesting. I did not like it as much as using Emacs/org-mode like I have in the past, but it was easier to get 70 undergraduates up and running.

I did not teach in the Fall, because I was on Sabbatical!

4 Sabbatical at Google

In August 2017 I started my first sabbatical! I am spending a year in the Accelerated Science group at Google in Mountain View, California. I am learning about machine learning applications in engineering and science. This is a pivotal year in my research program, so stay tuned for our new work!

It has been great for my family, who moved out here with me. We have been seeing a lot of California. I have been biking to work almost every day, usually 15-20 miles. I have logged over 1200 commuting miles already since August.

5 Emacs and org-mode

org-ref remains in the top 15% of downloaded MELPA packages, with more than 24,000 downloads since it was released. It has been pretty stable lately. It remains a cornerstone of my technical writing toolbox.

I have spent some time improving org-mode/ipython interactions including inline images, asynchronous execution and export to jupyter notebooks. It is still a work in progress.

I spent a fair bit of time learning about dynamic modules for writing compiled extensions to Emacs to bring features like linear algebra, numerical methods and database access to it. I wish I had more time to work on this. I think it will be useful to make org-mode even better for scientific research and documentation.

6 Social media

I have continued exploring the use of social media to share my work. It still seems like a worthwhile use of time, but we need continued efforts to make this really useful for science.


I use my blog to share technical knowledge and news about the group. We had 48 blog posts in 2017. A lot of them were on some use of org-mode and Emacs. I also introduced a new exporter for org-mode to make jupyter notebooks. I spent November exploring automatic differentiation and applications of it to engineering problems. Visits to the site continue to grow. Here is the growth over the past two years. The big spike in Oct 2017 is from this article on Hacker News about one of my posts!

I continue to think that technical blogging is a valuable way to communicate technical knowledge. It provides an easy way to practice writing, and with comments enabled to get feedback on your ideas. It has taken several years to develop a style for doing this effectively that is useful to me, and to others. I have integrated my blog into Twitter so that new posts are automatically tweeted, which helps publicize the new posts.

It has some limitations, e.g. it is not obvious how to cite them in ways that are compatible with the current bibliometric driven assessment tools used in promotion and tenure. Overall, I find it very complementary to formal publications though, and I wish more people did it.

6.2 Github

I was a little less active on Github this year than last year, especially this fall as I started my sabbatical. Github remains my goto version control service though, and we continue using it for everything from code development and paper writing to course serving.

scimax finally has more Github stars than jmax does!

6.3 Youtube

Another year with over 100,000 minutes of Youtube watch time on our videos. org-mode is awesome was most popular, with almost 50,000 views. We have six videos with over 2500 views for the past year!

I have not made too many new videos this year. Hopefully there will be some new ones on the new features in scimax in the next year.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.3

Read and Post Comments

Solving an eigenvalue differential equation with a neural network

| categories: bvp, eigenvalue, autograd | tags: | View Comments

The 1D harmonic oscillator is described here. It is a boundary value differential equation with eigenvalues. If we let let ω=1, m=1, and units where ℏ=1. then, the governing differential equation becomes:

\(-0.5 \frac{d^2\psi(x)}{dx^2} + (0.5 x^2 - E) \psi(x) = 0\)

with boundary conditions: \(\psi(-\infty) = \psi(\infty) = 0\)

We can further stipulate that the probability of finding the particle over this domain is equal to one: \(\int_{-\infty}^{\infty} \psi^2(x) dx = 1\). In this set of equations, \(E\) is an eigenvalue, which means there are only non-trivial solutions for certain values of \(E\).

Our goal is to solve this equation using a neural network to represent the wave function. This is a different problem than the one here or here because of the eigenvalue. This is an additional adjustable parameter we have to find. Also, we have the normalization constraint to consider, which we did not consider before.

1 The neural network setup

Here we setup the neural network and its derivatives. This is the same as we did before.

import autograd.numpy as np
from autograd import grad, elementwise_grad
import autograd.numpy.random as npr
from autograd.misc.optimizers import adam

def init_random_params(scale, layer_sizes, rs=npr.RandomState(42)):
    """Build a list of (weights, biases) tuples, one for each layer."""
    return [(rs.randn(insize, outsize) * scale,   # weight matrix
             rs.randn(outsize) * scale)           # bias vector
            for insize, outsize in zip(layer_sizes[:-1], layer_sizes[1:])]

def swish(x):
    return x / (1.0 + np.exp(-x))

def psi(nnparams, inputs):
    "Neural network wavefunction"
    for W, b in nnparams:
        outputs =, W) + b
        inputs = swish(outputs)    
    return outputs

psip = elementwise_grad(psi, 1) # dpsi/dx 
psipp = elementwise_grad(psip, 1) # d^2psi/dx^2

2 The objective function

The important function we need is the objective function. This function codes the Schrödinger equation, the boundary conditions, and the normalization as a cost function that we will later seek to minimize. Ideally, at the solution the objective function will be zero. We can't put infinity into our objective function, but it turns out that x = ± 6 is practically infinity in this case, so we approximate the boundary conditions there.

Another note is the numerical integration by the trapezoid rule. I use a vectorized version of this because autograd doesn't have a trapz derivative and I didn't feel like figuring one out.

We define the params to vary here as a dictionary containing neural network weights and biases, and the value of the eigenvalue.

# Here is our initial guess of params:
nnparams = init_random_params(0.1, layer_sizes=[1, 8, 1])

params = {'nn': nnparams, 'E': 0.4}

x = np.linspace(-6, 6, 200)[:, None]

def objective(params, step):
    nnparams = params['nn']
    E = params['E']        
    # This is Schrodinger's eqn
    zeq = -0.5 * psipp(nnparams, x) + (0.5 * x**2 - E) * psi(nnparams, x) 
    bc0 = psi(nnparams, -6.0) # This approximates -infinity
    bc1 = psi(nnparams, 6.0)  # This approximates +infinity
    y2 = psi(nnparams, x)**2
    # This is a numerical trapezoid integration
    prob = np.sum((y2[1:] + y2[0:-1]) / 2 * (x[1:] - x[0:-1]))
    return np.mean(zeq**2) + bc0**2 + bc1**2 + (1.0 - prob)**2

# This gives us feedback from the optimizer
def callback(params, step, g):
    if step % 1000 == 0:
        print("Iteration {0:3d} objective {1}".format(step,
                                                      objective(params, step)))

3 The minimization

Now, we just let an optimizer minimize the objective function for us. Note, I ran this next block more than once, as the objective continued to decrease. I ran this one at least two times, and the loss was still decreasing slowly.

params = adam(grad(objective), params,
              step_size=0.001, num_iters=5001, callback=callback) 

Iteration   0 objective [[ 0.00330204]]
Iteration 1000 objective [[ 0.00246459]]
Iteration 2000 objective [[ 0.00169862]]
Iteration 3000 objective [[ 0.00131453]]
Iteration 4000 objective [[ 0.00113132]]
Iteration 5000 objective [[ 0.00104405]]

Good news, the lowest energy eigenvalue is known to be 0.5 for our choice of parameters, and that is approximately what we got. Now let's see our solution and compare it to the known solution. Interestingly we got the negative of the solution, which is still a solution. The NN solution is not indistinguishable from the analytical solution, and has some spurious curvature in the tails, but it is approximately correct, and more training might get it closer. A different activation function might also work better.

%matplotlib inline
import matplotlib.pyplot as plt

x = np.linspace(-6, 6)[:, None]
y = psi(params['nn'], x)

plt.plot(x, -y, label='NN')
plt.plot(x, (1/np.pi)**0.25 * np.exp(-x**2 / 2), 'r--', label='analytical')

4 The first excited state

Now, what about the first excited state? This has an eigenvalue of 1.5, and the solution has odd parity. We can naively change the eigenvalue, and hope that the optimizer will find the right new solution. We do that here, and use the old NN params.

params['E'] = 1.6

Now, we run a round of optimization:

params = adam(grad(objective), params,
              step_size=0.003, num_iters=5001, callback=callback) 

Iteration   0 objective [[ 0.09918192]]
Iteration 1000 objective [[ 0.00102333]]
Iteration 2000 objective [[ 0.00100269]]
Iteration 3000 objective [[ 0.00098684]]
Iteration 4000 objective [[ 0.00097425]]
Iteration 5000 objective [[ 0.00096347]]

That doesn't work though. The optimizer just pushes the solution back to the known one. Next, we try starting from scratch with the eigenvalue guess.

nnparams = init_random_params(0.1, layer_sizes=[1, 8, 1])

params = {'nn': nnparams, 'E': 1.6}

params = adam(grad(objective), params,
              step_size=0.003, num_iters=5001, callback=callback) 

Iteration   0 objective [[ 2.08318762]]
Iteration 1000 objective [[ 0.02358685]]
Iteration 2000 objective [[ 0.00726497]]
Iteration 3000 objective [[ 0.00336433]]
Iteration 4000 objective [[ 0.00229851]]
Iteration 5000 objective [[ 0.00190942]]

That also doesn't work. We are going to have to steer this. The idea is pre-train the neural network to have the basic shape and symmetry we want, and then use that as the input for the objective function. The first excited state has odd parity, and here is a guess of that shape. This is a pretty ugly hacked up version that only roughly has the right shape. I am counting on the NN smoothing out the discontinuities.

xm = np.linspace(-6, 6)[:, None]
ym = -0.5 * ((-1 * (xm + 1.5)**2) + 1.5) * (xm < 0) * (xm > -3)
yp = -0.5 * ((1 * (xm - 1.5)**2 ) - 1.5) * (xm > 0) * (xm < 3)

plt.plot(xm, (ym + yp))
plt.plot(x, (1/np.pi)**0.25 * np.sqrt(2) * x * np.exp(-x**2 / 2), 'r--', label='analytical')

Now we pretrain a bit.

def pretrain(params, step):
    nnparams = params['nn']
    errs = psi(nnparams, xm) - (ym + yp)
    return np.mean(errs**2)

params = adam(grad(pretrain), params,
              step_size=0.003, num_iters=501, callback=callback) 
Iteration   0 objective [[ 1.09283695]]

Here is the new initial guess we are going to use. You can see that indeed a lot of smoothing has occurred.

plt.plot(xm, ym + yp, xm, psi(params['nn'], xm))

That has the right shape now. So we go back to the original objective function.

params = adam(grad(objective), params,
              step_size=0.001, num_iters=5001, callback=callback) 

Iteration   0 objective [[ 0.00370029]]
Iteration 1000 objective [[ 0.00358193]]
Iteration 2000 objective [[ 0.00345137]]
Iteration 3000 objective [[ 0.00333]]
Iteration 4000 objective [[ 0.0032198]]
Iteration 5000 objective [[ 0.00311844]]

I ran that optimization block many times. The loss is still decreasing, but slowly. More importantly, the eigenvalue is converging to 1.5, which is the known analytical value, and the solution is converging to the known solution.

x = np.linspace(-6, 6)[:, None]
y = psi(params['nn'], x)

plt.plot(x, y, label='NN')
plt.plot(x, (1/np.pi)**0.25 * np.sqrt(2) * x * np.exp(-x**2 / 2), 'r--', label='analytical')

We can confirm the normalization is reasonable:

# check the normalization
print(np.trapz(y.T * y.T, x.T))
[ 0.99781886]

5 Summary

This is another example of using autograd to solve an eigenvalue differential equation. Some of these solutions required tens of thousands of iterations of training. The groundstate wavefunction was very easy to get. The first excited state, on the other hand, took some active steering. This is very much like how an initial guess can change which solution a nonlinear optimization (which this is) finds.

There are other ways to solve this particular problem. What I think is interesting about this is the possibility to solve harder problems, e.g. not just a harmonic potential, but a more complex one. You could pretrain a network on the harmonic solution, and then use it as the initial guess for the harder problem (which has no analytical solution).

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments

Solving ODEs with a neural network and autograd

| categories: ode, autograd | tags: | View Comments

In the last post I explored using a neural network to solve a BVP. Here, I expand the idea to solving an initial value ordinary differential equation. The idea is basically the same, we just have a slightly different objective function.

\(dCa/dt = -k Ca(t)\) where \(Ca(t=0) = 2.0\).

Here is the code that solves this equation, along with a comparison to the analytical solution: \(Ca(t) = Ca0 \exp -kt\).

import autograd.numpy as np
from autograd import grad, elementwise_grad
import autograd.numpy.random as npr
from autograd.misc.optimizers import adam

def init_random_params(scale, layer_sizes, rs=npr.RandomState(0)):
    """Build a list of (weights, biases) tuples, one for each layer."""
    return [(rs.randn(insize, outsize) * scale,   # weight matrix
             rs.randn(outsize) * scale)           # bias vector
            for insize, outsize in zip(layer_sizes[:-1], layer_sizes[1:])]

def swish(x):
    return x / (1.0 + np.exp(-x))

def Ca(params, inputs):
    "Neural network functions"
    for W, b in params:
        outputs =, W) + b
        inputs = swish(outputs)    
    return outputs

# Here is our initial guess of params:
params = init_random_params(0.1, layer_sizes=[1, 8, 1])

# Derivatives
dCadt = elementwise_grad(Ca, 1)

k = 0.23
Ca0 = 2.0
t = np.linspace(0, 10).reshape((-1, 1))

# This is the function we seek to minimize
def objective(params, step):
    # These should all be zero at the solution
    # dCadt = -k * Ca(t)
    zeq = dCadt(params, t) - (-k * Ca(params, t))
    ic = Ca(params, 0) - Ca0
    return np.mean(zeq**2) + ic**2

def callback(params, step, g):
    if step % 1000 == 0:
        print("Iteration {0:3d} objective {1}".format(step,
                                                      objective(params, step)))

params = adam(grad(objective), params,
              step_size=0.001, num_iters=5001, callback=callback) 

tfit = np.linspace(0, 20).reshape(-1, 1)
import matplotlib.pyplot as plt
plt.plot(tfit, Ca(params, tfit), label='soln')
plt.plot(tfit, Ca0 * np.exp(-k * tfit), 'r--', label='analytical soln')
plt.xlim([0, 20])
Iteration   0 objective [[ 3.20374053]]
Iteration 1000 objective [[  3.13906829e-05]]
Iteration 2000 objective [[  1.95894699e-05]]
Iteration 3000 objective [[  1.60381564e-05]]
Iteration 4000 objective [[  1.39930673e-05]]
Iteration 5000 objective [[  1.03554970e-05]]

Huh. Those two solutions are nearly indistinguishable. Since we used a neural network, let's hype it up and say we learned the solution to a differential equation! But seriously, note that although we got an "analytical" solution, we should only rely on it in the region we trained the solution on. You can see the solution above is not that good past t=10, even perhaps going negative (which is not even physically correct). That is a reminder that the function we have for the solution is not the same as the analytical solution, it just approximates it really well over the region we solved over. Of course, you can expand that region to the region you care about, but the main point is don't rely on the solution outside where you know it is good.

This idea isn't new. There are several papers in the literature on using neural networks to solve differential equations, e.g. and, and other blog posts that are similar (, even using autograd). That means to me that there is some merit to continuing to investigate this approach to solving differential equations.

There are some interesting challenges for engineers to consider with this approach though. When is the solution accurate enough? How reliable are derivatives of the solution? What network architecture is appropriate or best? How do you know how good the solution is? Is it possible to build in solution features, e.g. asymptotes, or constraints on derivatives, or that the solution should be monotonic, etc. These would help us trust the solutions not to do weird things, and to extrapolate more reliably.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments

Next Page »