Using autograd in nonlinear regression

| categories: python, regression, autograd | tags: | View Comments

Table 1 contains the energy as a function of volume for a solid material, from a set of density functional theory calculations. Our goal is to fit the Murnaghan equation of state to this data. The model is moderately nonlinear. I have previously done this with the standard nonlinear regression functions in scipy, so today we will use autograd along with a built-in optimizer to minimize an objective function to achieve the same thing.
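
For reference, the Murnaghan form used below (from PRB 28, 5480 (1983)) is

\(E(V) = E_0 + \frac{B_0 V}{B'}\left(\frac{(V_0/V)^{B'}}{B' - 1} + 1\right) - \frac{V_0 B_0}{B' - 1}\)

where the fitted parameters are the minimum energy \(E_0\), the bulk modulus \(B_0\), its pressure derivative \(B'\) (BP in the code), and the equilibrium volume \(V_0\).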

The basic idea is that we define an objective function, in this case the summed squared errors between the values predicted by the model and the known values from our data. The objective function takes two arguments: the model parameters and the "step". This signature is a consequence of the built-in optimizer we use; it expects that signature (it is useful for batch training, but we will not use that here). We use autograd to create a gradient of the objective function, which the adam optimizer will use to vary the parameters with the goal of minimizing the objective function.

The adam optimizer takes as one argument a callback function, here named summary, which we use to print intermediate results during convergence. We run the optimizer in a loop because it performs a fixed number of steps on each call. After each call we check whether the objective function is sufficiently small, and if it is we break out of the loop.

import autograd.numpy as np
from autograd import grad
from autograd.misc.optimizers import adam

np.set_printoptions(precision=3, suppress=True)

# input data: the (volume, energy) rows of Table 1 below
Vinput = np.array([row[0] for row in data])
Eknown = np.array([row[1] for row in data])

def Murnaghan(pars, vol):
    """Given a vector of parameters and volumes, return a vector of energies.
    Equation from PRB 28, 5480 (1983)."""
    E0, B0, BP, V0 = pars
    E = E0 + B0 * vol / BP * (((V0 / vol)**BP) / (BP - 1.0) + 1.0) - V0 * B0 / (BP - 1.)
    return E

def objective(pars, step):
    "This is what we want to minimize by varying the pars."
    predicted = Murnaghan(pars, Vinput)
    # Note Eknown is not defined in this function scope
    errors = Eknown - predicted
    return np.sum(errors**2)

objective_grad = grad(objective)

def summary(pars, step, gradient):
    # Note i, N are not defined in this function scope
    if step % N == 0: 
        print('step {0:5d}: {1:1.3e}'.format(i * N + step, 
                                             objective(pars, step)))

pars = np.array([-400, 0.5, 2, 210]) # The initial guess
N = 200 # num of steps to take on each optimization
learning_rate = 0.001
for i in range(100):
    pars = adam(objective_grad, pars, step_size=learning_rate,
                num_iters=N, callback=summary)
    SSE = objective(pars, None)
    if SSE < 0.00002:
        print('Tolerance met.', SSE)
        break

print(pars)
step     0: 3.127e+02
step   200: 1.138e+02
step   400: 2.011e+01
step   600: 1.384e+00
step   800: 1.753e-01
step  1000: 2.044e-03
step  1200: 1.640e-03
step  1400: 1.311e-03
step  1600: 1.024e-03
step  1800: 7.765e-04
step  2000: 5.698e-04
step  2200: 4.025e-04
step  2400: 2.724e-04
step  2600: 1.762e-04
step  2800: 1.095e-04
step  3000: 6.656e-05
step  3200: 3.871e-05
step  3400: 2.359e-05
('Tolerance met.', 1.5768901008364176e-05)
[-400.029    0.004    4.032  211.847]

There are some subtleties in the code above. One is that several variables (data, Eknown, i, N) are used across function scopes, as noted in the comments; those can get tricky to keep track of. Another is the variable I called learning_rate. I borrowed that terminology from the machine learning community; it is the step_size in this implementation of the optimizer. If you make it too large, the objective function does not converge, and if you make it too small, it takes a long time to converge. Note that it took about 3400 steps of "training". This is a lot more than is typically required by something like pycse.nlinfit. This isn't the typical application for this approach to regression. More on that another day.
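
For comparison, here is a minimal sketch (not in the original) of the same fit with scipy.optimize.curve_fit; murnaghan_cf is just a thin wrapper I introduce so the parameters are passed as separate arguments.

from scipy.optimize import curve_fit

def murnaghan_cf(vol, E0, B0, BP, V0):
    # curve_fit expects f(xdata, *params), so collect the parameters for Murnaghan
    return Murnaghan(np.array([E0, B0, BP, V0]), vol)

popt, pcov = curve_fit(murnaghan_cf, Vinput, Eknown, p0=[-400, 0.5, 2, 210])
print(popt)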

As with any fit, it is wise to check it out at least graphically. Here is the fit and data.

%matplotlib inline
import matplotlib
matplotlib.rc('axes.formatter', useoffset=False)
import matplotlib.pyplot as plt

plt.plot(Vinput, Eknown, 'ko', label='known')

vinterp = np.linspace(Vinput.min(), Vinput.max(), 200)

plt.plot(vinterp, Murnaghan(pars, vinterp), 'r-', label='predicted')
plt.legend()

The fit looks pretty good.
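
A quick numerical check (my addition) complements the graphical one:

residuals = Eknown - Murnaghan(pars, Vinput)
print('Maximum absolute residual: {0:1.3e}'.format(np.max(np.abs(residuals))))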

Table 1: Volume-Energy data for a solid state system.
volume energy
324.85990899 -399.9731688470
253.43999457 -400.0172393178
234.03826687 -400.0256270548
231.12159387 -400.0265690700
228.40609504 -400.0273551120
225.86490337 -400.0280030862
223.47556626 -400.0285313450
221.21992353 -400.0289534593
219.08319566 -400.0292800709
217.05369547 -400.0295224970
215.12089909 -400.0296863867
213.27525144 -400.0297809256
211.51060823 -400.0298110000
203.66743321 -400.0291665573
197.07888649 -400.0275017142
191.39717952 -400.0250998136
186.40163591 -400.0221371852
181.94435510 -400.0187369863
177.92077043 -400.0149820198
174.25380090 -400.0109367042
170.88582166 -400.0066495100
167.76711189 -400.0021478258
164.87096104 -399.9974753449
159.62553397 -399.9876885136
154.97005460 -399.9774175487
150.78475335 -399.9667603369
146.97722201 -399.9557686286
143.49380641 -399.9445262604

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

Sensitivity analysis using automatic differentiation in Python

| categories: python, autograd, sensitivity | tags: | View Comments

This paper describes how sensitivity analysis requires access to the derivatives of a function. Say, for example we have a function describing the time evolution of the concentration of species A:

\([A] = \frac{[A]_0}{k_1 + k_{-1}} \left(k_1 e^{-(k_1 + k_{-1})t} + k_{-1}\right)\)

The local sensitivities of the concentration of A to the parameters \(k_1\) and \(k_{-1}\) are defined as \(\frac{\partial [A]}{\partial k_1}\) and \(\frac{\partial [A]}{\partial k_{-1}}\). Our goal is to plot these sensitivities as a function of time. We could derive the derivatives by hand, but we will use automatic differentiation instead, through the autograd package. Here we import numpy from the autograd package and plot the function above.

import autograd.numpy as np

A0 = 1.0

def A(t, k1, k_1):
    "Concentration of A at time t, for forward and reverse rate constants k1 and k_1."
    return A0 / (k1 + k_1) * (k1 * np.exp(-(k1 + k_1) * t) + k_1)

%matplotlib inline
import matplotlib.pyplot as plt

t = np.linspace(0, 0.5)

k1 = 3.0
k_1 = 3.0
plt.plot(t, A(t, k1, k_1))
plt.xlim([0, 0.5])
plt.ylim([0, 1])

The figure above reproduces Fig. 1 from the paper referenced above. Next, we use autograd to get the derivatives. This is subtly different from our previous post. First, we need the derivative of the function with respect to the second and third arguments; the default is the first argument. Second, we want to evaluate this derivative at each time value. We use the jacobian function in autograd to get these. This is different from elementwise_grad, which would sum the derivative contributions over the time points (grad itself applies only to scalar-output functions). That summed quantity might be useful for regression, but not for sensitivity analysis. Finally, to reproduce Figure 2a, we plot the absolute value of the sensitivities.

from autograd import jacobian

dAdk1 = jacobian(A, 1)
dAdk_1 = jacobian(A, 2)

plt.plot(t, np.abs(dAdk1(t, k1, k_1)))
plt.plot(t, np.abs(dAdk_1(t, k1, k_1)))
plt.xlim([0, 0.5])
plt.ylim([0, 0.1])
plt.legend(['$S_{k_1}$', '$S_{k_{-1}}$'])

That looks like the figure in the paper. To summarize the main takeaway, autograd enabled us to readily compute derivatives without having to derive them manually. There was a little subtlety in choosing jacobian over grad or elementwise_grad but once you know what these do, it seems reasonable. It is important to import the wrapped numpy first, to enable autograd to do its work. All the functions here are pretty standard, so everything worked out of the box. We should probably be using autograd, or something like it for more things in science!
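
For concreteness, here is a small sketch (not in the original) of that difference for this function; the shapes assume the 50-point t array defined above.

from autograd import jacobian, elementwise_grad

# jacobian gives dA/dk1 at every time point: an array the same length as t
print(np.shape(jacobian(A, 1)(t, k1, k_1)))           # e.g. (50,)

# elementwise_grad sums those contributions into a single number,
# which is not useful for a time-resolved sensitivity
print(np.shape(elementwise_grad(A, 1)(t, k1, k_1)))   # ()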

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

Forces by automatic differentiation in molecular simulation

| categories: simulation, autograd | tags: | View Comments

In molecular simulation we often use a potential to compute the total energy of a system. For example, we might use something simple like a Lennard-Jones potential. If we have a potential function, e.g. \(E = V(R)\) where \(R\) are the positions of the atoms, then we know the forces on the atoms are defined by \(f = -\frac{dV}{dR}\). For simple functions you can derive the derivative pretty easily, but these functions quickly get complicated. In this post, we consider automatic differentiation as implemented by autograd. This is neither symbolic nor numerical differentiation. The gist is that the program uses the chain rule to evaluate derivatives. Here we do not delve into how it is done; we just see how it might help us in molecular simulation.
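
For reference, the pairwise Lennard-Jones form used in the energy function below is \(V(r) = 4\epsilon\left[(\sigma/r)^{12} - (\sigma/r)^{6}\right]\), truncated at a cutoff \(r_c\) and shifted by \(V(r_c)\) so that each pair contribution goes to zero at the cutoff.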

For reference, here is a result from the LennardJones calculator in ASE.

from ase.calculators.lj import LennardJones
from ase.cluster.icosahedron import Icosahedron

atoms = Icosahedron('Ar', noshells=2, latticeconstant=3)

print('LJ: ', atoms.get_potential_energy())
('LJ: ', -3.3553466825679812)

First, we define a function for the Lennard-Jones potential. I adapted the code here to implement a function that calculates the Lennard-Jones energy for a cluster of atoms with no periodic boundary conditions. Instead of using numpy directly, we import it from the autograd package which puts thin wrappers around the functions to enable the derivative calculations.

import autograd.numpy as np

def energy(positions):
    "Compute the energy of a Lennard-Jones system."
    natoms = len(positions)

    sigma = 1.0
    epsilon = 1.0
    rc = 3 * sigma

    e0 = 4 * epsilon * ((sigma / rc)**12 - (sigma / rc)**6)
    energy = 0.0
    for a1 in range(natoms):
        for j in range(a1 + 1, natoms):
            r2 = np.sum((positions[a1] - positions[j])**2)
            if r2 <= rc**2:
                c6 = (sigma**2 / r2)**3
                energy -= e0 
                c12 = c6**2
                energy += 4 * epsilon * (c12 - c6)

    return energy

Here is our function in action, and it produces the same result as the ASE calculator. So far there is nothing new.

print('our func: ', energy(atoms.positions))
('our func: ', -3.3553466825679803)

Now, we look at the forces from the ASE calculator. If you look at the ASE code you will see that the formula for forces was analytically derived and accumulated in the loop.

np.set_printoptions(precision=3, suppress=True)
print(atoms.get_forces())
[[ 0.545  1.667  0.721]
 [-0.068  0.002  0.121]
 [-0.18   0.018 -0.121]
 [ 0.902 -0.874 -0.083]
 [ 0.901 -0.937 -1.815]
 [ 0.243 -0.19   0.063]
 [-0.952 -1.776 -0.404]
 [-0.562  1.822  1.178]
 [-0.235  0.231  0.081]
 [-0.023  0.204 -0.294]
 [ 0.221 -0.342 -0.425]
 [-5.385 -6.017  1.236]
 [ 4.593  6.193 -0.258]]

Now we look at how to use autograd for this purpose. We want an element-wise gradient of the total energy with respect to the positions. autograd returns functions, so we wrap it in another function so we can take the negative of that function.

from autograd import elementwise_grad

def forces(pos):
    dEdR = elementwise_grad(energy)
    return -dEdR(pos)

print(forces(atoms.positions))
[[ 0.545  1.667  0.721]
 [-0.068  0.002  0.121]
 [-0.18   0.018 -0.121]
 [ 0.902 -0.874 -0.083]
 [ 0.901 -0.937 -1.815]
 [ 0.243 -0.19   0.063]
 [-0.952 -1.776 -0.404]
 [-0.562  1.822  1.178]
 [-0.235  0.231  0.081]
 [-0.023  0.204 -0.294]
 [ 0.221 -0.342 -0.425]
 [-5.385 -6.017  1.236]
 [ 4.593  6.193 -0.258]]

Here we show the results are the same from both approaches.

print(np.allclose(atoms.get_forces(), forces(atoms.positions)))

Wow. We got forces without deriving a derivative by hand or using numerical finite differences, even though the energy function contains loops and conditionals. That is pretty awesome. You can easily modify the potential function now, without the need to rederive the force expressions! This is an idea worth exploring further. In principle, it should be possible to include periodic boundary conditions and use autograd to compute stresses too. Maybe that will be a future post.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

New publication in J. Phys. Chem. Lett.

| categories: publication, news | tags: | View Comments

DFT calculations are extensively used to predict the chemical properties of metal alloy surfaces, but they are expensive, which limits the number of calculations that can practically be performed. In this paper, we explore a perturbation approach known as alchemy to take previously calculated results and extend them to new compositions. We use oxygen reduction as a prototype reaction, and show that alchemy is often much faster than DFT, with an accuracy within 0.1 eV of the DFT results. There are cases where the accuracy is not as good, suggesting that further improvements to the perturbation model could be beneficial. Overall, alchemy appears to be a useful tool in high-throughput screening research.

@article{...,
  author =       {Saravanan, Karthikeyan and Kitchin, John R. and von
                  Lilienfeld, O. Anatole and Keith, John A.},
  title =        {Alchemical Predictions for Computational Catalysis: Potential
                  and Limitations},
  journal =      {The Journal of Physical Chemistry Letters},
  year =         2017,
  volume =       0,
  number =       {ja},
  pages =        {null},
  doi =          {10.1021/acs.jpclett.7b01974},
  note =         {PMID: 28938798},
}

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

Finding similar bibtex entries

| categories: similarity, bibtex | tags: | View Comments

A common task while writing scientific papers is citing previous research. I use org-ref extensively for that, and it makes it pretty easy to find similar references, e.g. ones that have common authors or common keywords. It also lets me find similar articles in Web of Science or Scopus. Suppose that I have cited a particular paper, e.g. boes-2016-neural-networ, and I want to find references that are already in my bibtex file and are similar to it, by my own definition of similar. With org-ref I can easily search by keyword or author to find similar entries, but those searches are limited by what I search for, and the results are not sorted. Today, I will explore the first step in a recommender system that calculates similarity and provides a sorted list of candidates with the most relevant ones first.

The idea is to calculate some measure of similarity between the title of that reference, and the titles of other references in my bibtex file, and then sort them by similarity. This is the reference I want to find similar entries for:

Boes, J. R., Groenenboom, M. C., Keith, J. A., & Kitchin, J. R., Neural network and Reaxff comparison for Au properties, Int. J. Quantum Chem., 116(13), 979–987 (2016).

The first thing we do is read in our bibtex file, and print a representative entry.

import bibtexparser
from bibtexparser.bparser import BibTexParser

with open('/Users/jkitchin/Dropbox/bibliography/references.bib') as bibtex_file:
    parser = BibTexParser()
    bib_database = bibtexparser.load(bibtex_file, parser=parser)
    entries = bib_database.entries

print(entries[0])
{'author': 'Jaan Aarik and Aleks Aidla and V{\\"a}ino Sammelselg and Teet\nUustare', 'title': 'Effect of Growth Conditions on Formation of \\ce{TiO_2}-{II}\nThin Films in Atomic Layer Deposition Process', 'journal': 'Journal of Crystal Growth', 'volume': '181', 'number': '3', 'pages': '259 - 264', 'year': '1997', 'doi': '10.1016/S0022-0248(97)00279-0', 'link': '', 'issn': '0022-0248', 'ENTRYTYPE': 'article', 'ID': 'aarik-1997-effec-growt'}

Each entry is a dictionary containing the fields and their values. For this exploration, I will only consider similarities between titles. The next step is we find which entry corresponds to the reference we want to find similarities to.

ids = [e['ID'] for e in entries]
i = ids.index('boes-2016-neural-networ')

print(entries[i])
{'author': 'Jacob R. Boes and Mitchell C. Groenenboom and John A. Keith\nand John R. Kitchin', 'title': 'Neural Network and {Reaxff} Comparison for {Au} Properties', 'journal': 'Int. J. Quantum Chem.', 'volume': '116', 'number': '13', 'pages': '979-987', 'year': '2016', 'doi': '10.1002/qua.25115', 'link': '', 'issn': '1097-461X', 'keyword': 'Kohn-Sham density functional theory, neural networks, reactive\nforce fields, potential energy surfaces, machine learning', 'ENTRYTYPE': 'article', 'ID': 'boes-2016-neural-networ'}

It is best if we make the entry we want to find similarities to the first one, so here we swap the first and ith entries.

entries[0], entries[i] = entries[i], entries[0]

Now, we prepare the list of strings to get similarities for.

titles = [e.get('title', '') for e in entries]

We will use term frequency–inverse document frequency to get a vector that represents each title, and then use cosine similarity as a measure of similarity. These are choices; other representations and similarity measures could be used instead. It is also worth noting that with this measure of similarity I did not have to choose which keywords to measure similarity on.

The functionality for this is provided by sklearn. It has implemented functions for the algorithms above, and in just a few lines of code you get an array of tf-idf features to analyze. The array we get from our vectorizer contains normalized vectors, so we can get the cosine similarity just from a dot product of the vectors. The first row corresponds to the similarity of the first string to all the others. I want them sorted in descending order. The argsort function returns ascending order, so we use a trick to sort the negative of the similarity score which achieves that. There are certainly more advanced treatments of the text we could use by customizing the vectorizer, e.g. word stemming, but for now we neglect that.

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(titles)

cosine_similarities = (X * X.T).A[0]

related_docs_indices = (-cosine_similarities).argsort()

print('The top 10 recommendations for {} are:\n'.format(titles[0]))
for i, j in enumerate(related_docs_indices[1:11]):
    print('{i}. {ID}: {title}, {author}\n'.format(i=i + 1, **entries[j]))

The top 10 recommendations for Neural Network and {Reaxff} Comparison for {Au} Properties are:

1. behler-2010-neural: Neural network potential-energy surfaces for atomistic simulations, J{\"o}rg Behler

2. boes-2017-neural-networ: Neural Network Predictions of Oxygen Interactions on a Dynamic {Pd} Surface, Jacob R. Boes and John R. Kitchin

3. eshet-2010-ab: Ab Initio Quality Neural-Network Potential for Sodium, Hagai Eshet and Rustam Z. Khaliullin and Thomas D. K{\"u}hne and J{\"o}rg Behler and Michele Parrinello

4. behler-2014-repres-poten: Representing Potential Energy Surfaces By High-Dimensional Neural Network Potentials, J Behler

5. behler-2007-gener-neural: Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces, J{\"o}rg Behler and Michele Parrinello

6. artrith-2012-high: High-Dimensional Neural Network Potentials for Metal Surfaces: A Prototype Study for Copper, Nongnuch Artrith and J{\"o}rg Behler

7. behler-2015-const: Constructing High-Dimensional Neural Network Potentials: A Tutorial Review, J{\"o}rg Behler

8. artrith-2011-high: High-Dimensional Neural-Network Potentials for Multicomponent Systems: Applications To Zinc Oxide, Nongnuch Artrith and Tobias Morawietz and J{\"o}rg Behler

9. sosso-2012-neural-gete: Neural Network Interatomic Potential for the Phase Change Material \ce{GeTe}, Gabriele C. Sosso and Giacomo Miceli and Sebastiano Caravati and J{\"o}rg Behler and Marco Bernasconi

10. lorenz-2006-descr: Descriptions of Surface Chemical Reactions Using a Neural Network Representation of the Potential-Energy Surface, S{\"o}nke Lorenz and Matthias Scheffler and Axel Gross

It is evident that this is showing other references containing the words "neural network"! I guess that is a little disappointing, since these would just as easily have been narrowed down in org-ref. On the other hand, they are sorted and grouped, which would not happen in org-ref. This is a comparison of pretty short strings (just the titles), so maybe this would be much more interesting if abstracts were also included. Including authors would give a different set as well (I tried it, and got a bunch of my own references!).
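
For example, a hypothetical variation (not in the original) that vectorizes the title together with the abstract and author fields, when they are present, might look like this:

docs = [' '.join([e.get('title', ''), e.get('abstract', ''), e.get('author', '')])
        for e in entries]
Xd = vectorizer.fit_transform(docs)
doc_similarities = (Xd * Xd.T).A[0]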

I don't think it would be very difficult to get this into an Emacs selection tool, e.g. helm/ivy. Check this out:

import pycse.lisp

# The expression that produced the output below is omitted here; pycse.lisp
# provides Lisp-style rendering of Python data, so a list of candidate indices
# can be printed as a Lisp form.
'(1592 1650 299 1751 103)'

That is a result that can be read directly by lisp, so we could simply write the code above as a shell script that takes an argument, and returns a list of indices to sort the candidates on. The alternative is to implement this in elisp, perhaps via a dynamic module if there is already a good C library for this. My sense is the Python libraries are more advanced in functionality.
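
As a small illustration (my addition; the original relies on pycse.lisp for this), formatting a list of indices as a Lisp-readable string is straightforward:

def to_lisp(indices):
    "Format a sequence of integers as a Lisp-readable list."
    return '(' + ' '.join(str(int(i)) for i in indices) + ')'

print(to_lisp(related_docs_indices[1:6]))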

This could have a number of other applications. Given some reference content, you could imagine finding emails that are similar to it, finding RSS entries that are similar to it, finding org headlines that are related, similar files, or similarity with any other set of strings that can be gathered, e.g. from Crossref or some other search, etc. I predict there will be more on these topics in the future!

Copyright (C) 2017 by John Kitchin. See the License for information about copying.
