Using autograd in nonlinear regression

| categories: python, regression, autograd | tags: | View Comments

Table raw-data contains the energy as a function of volume for some solid material from a set of density functional theory calculations. Our goal is to fit the Murnaghan equation of state to this data. The model is moderately nonlinear. I have previously done this with the standard nonlinear regression functions in scipy, so today we will use autograd along with a builtin optimizer to minimize an objective function to achieve the same thing.

The basic idea is we define an objective function, in this case the summed squared errors between predicted values from the model and known values from our data. The objective function takes two arguments: the model parameters, and the "step". This function signature is a consequence of the built in optimizer we use; it expects that signature (it is useful for batch training, but we will not use that here). We use autograd to create a gradient of the objective function which the adam optimizer will use to vary the parameters with the goal of minimizing the objective function.

The adam optimizer function takes as one argument a callback function, which we call summary to print out intermediate results during the convergence. We run the optimizer in a loop because the optimizer runs a fixed number of steps on each call. We check if the objective function is sufficiently small, and if it is we break out.

import autograd.numpy as np
from autograd import grad
from autograd.misc.optimizers import adam

np.set_printoptions(precision=3, suppress=True)

# input data
Vinput = np.array([row[0] for row in data]) 
Eknown = np.array([row[1] for row in data])

def Murnaghan(pars, vol):
    '''
    given a vector of parameters and volumes, return a vector of energies.
    equation From PRB 28,5480 (1983)
    '''
    E0, B0, BP, V0 = pars
    E = E0 + B0 * vol / BP * (((V0 / vol)**BP) / (BP - 1.0) + 1.0) - V0 * B0 / (BP - 1.)
    return E

def objective(pars, step):
    "This is what we want to minimize by varying the pars."
    predicted = Murnaghan(pars, Vinput)
    # Note Eknown is not defined in this function scope
    errors = Eknown - predicted
    return np.sum(errors**2)

objective_grad = grad(objective)

def summary(pars, step, gradient):
    # Note i, N are not defined in this function scope
    if step % N == 0: 
        print('step {0:5d}: {1:1.3e}'.format(i * N + step, 
                                             objective(pars, step)))

pars = np.array([-400, 0.5, 2, 210]) # The initial guess
N = 200 # num of steps to take on each optimization
learning_rate = 0.001
for i in range(100):
    pars = adam(objective_grad, pars, step_size=learning_rate, 
                num_iters=N, callback=summary)
    SSE = objective(pars, None)
    if SSE < 0.00002:
        print('Tolerance met.', SSE)
        break
print(pars)
step     0: 3.127e+02
step   200: 1.138e+02
step   400: 2.011e+01
step   600: 1.384e+00
step   800: 1.753e-01
step  1000: 2.044e-03
step  1200: 1.640e-03
step  1400: 1.311e-03
step  1600: 1.024e-03
step  1800: 7.765e-04
step  2000: 5.698e-04
step  2200: 4.025e-04
step  2400: 2.724e-04
step  2600: 1.762e-04
step  2800: 1.095e-04
step  3000: 6.656e-05
step  3200: 3.871e-05
step  3400: 2.359e-05
('Tolerance met.', 1.5768901008364176e-05)
[-400.029    0.004    4.032  211.847]

There are some subtleties in the code above. One is the variables that are used kind of all over the place, which is noted in a few places. Those could get tricky to keep track of. Another is the variable I called learning_rate. I borrowed that terminology from the machine learning community. It is the step_size in this implementation of the optimizer. If you make it too large, the objective function doesn't converge, but if you set it too small, it will take a long time to converge. Note that it took at about 3400 steps of "training". This is a lot more than is typically required by something like pycse.nlinfit. This isn't the typical application for this approach to regression. More on that another day.

As with any fit, it is wise to check it out at least graphically. Here is the fit and data.

%matplotlib inline
import matplotlib
matplotlib.rc('axes.formatter', useoffset=False)
import matplotlib.pyplot as plt

plt.plot(Vinput, Eknown, 'ko', label='known')

vinterp = np.linspace(Vinput.min(), Vinput.max(), 200)

plt.plot(vinterp, Murnaghan(pars, vinterp), 'r-', label='predicted')
plt.xlabel('Vol')
plt.ylabel('E')

The fit looks pretty good.

Table 1: Volume-Energy data for a solid state system.
volume energy
324.85990899 -399.9731688470
253.43999457 -400.0172393178
234.03826687 -400.0256270548
231.12159387 -400.0265690700
228.40609504 -400.0273551120
225.86490337 -400.0280030862
223.47556626 -400.0285313450
221.21992353 -400.0289534593
219.08319566 -400.0292800709
217.05369547 -400.0295224970
215.12089909 -400.0296863867
213.27525144 -400.0297809256
211.51060823 -400.0298110000
203.66743321 -400.0291665573
197.07888649 -400.0275017142
191.39717952 -400.0250998136
186.40163591 -400.0221371852
181.94435510 -400.0187369863
177.92077043 -400.0149820198
174.25380090 -400.0109367042
170.88582166 -400.0066495100
167.76711189 -400.0021478258
164.87096104 -399.9974753449
159.62553397 -399.9876885136
154.97005460 -399.9774175487
150.78475335 -399.9667603369
146.97722201 -399.9557686286
143.49380641 -399.9445262604

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments

Sensitivity analysis using automatic differentiation in Python

| categories: python, autograd, sensitivity | tags: | View Comments

This paper describes how sensitivity analysis requires access to the derivatives of a function. Say, for example we have a function describing the time evolution of the concentration of species A:

\([A] = \frac{[A]_0}{k_1 + k_{-1}} (k_1 e^{(-(k_1 _ k_{-1})t)} + k_{-1})\)

The local sensitivity of the concentration of A to the parameters \(k1\) and \(k_1\) are defined as \(\frac{\partial A}{\partial k1}\) and \(\frac{\partial A}{\partial k_1}\). Our goal is to plot the sensitivity as a function of time. We could derive those derivatives, but we will use auto-differentiation instead through the autograd package. Here we import numpy from the autograd package and plot the function above.

import autograd.numpy as np

A0 = 1.0

def A(t, k1, k_1):
    return A0 / (k1 + k_1) * (k1 * np.exp(-(k1 + k_1) * t) + k_1)

%matplotlib inline
import matplotlib.pyplot as plt

t = np.linspace(0, 0.5)

k1 = 3.0
k_1 = 3.0
plt.plot(t, A(t, k1, k_1))
plt.xlim([0, 0.5])
plt.ylim([0, 1])
plt.xlabel('t')
plt.ylabel('A')

The figure above reproduces Fig. 1 from the paper referenced above. Next, we use autograd to get the derivatives. This is subtly different than our previous post. First, we need the derivative of the function with respect to the second and third arguments; the default is the first argument. Second, we want to evaluate this derivative at each time value. We use the jacobian function in autograd to get these. This is different than grad, which will sum up the derivatives at each time. That might be useful for regression, but not for sensitivity analysis. Finally, to reproduce Figure 2a, we plot the absolute value of the sensitivities.

from autograd import jacobian

dAdk1 = jacobian(A, 1)
dAdk_1 = jacobian(A, 2)

plt.plot(t, np.abs(dAdk1(t, k1, k_1)))
plt.plot(t, np.abs(dAdk_1(t, k1, k_1)))
plt.xlim([0, 0.5])
plt.ylim([0, 0.1])
plt.xlabel('t')
plt.legend(['$S_{k1}$', '$S_{k\_1}$'])

That looks like the figure in the paper. To summarize the main takeaway, autograd enabled us to readily compute derivatives without having to derive them manually. There was a little subtlety in choosing jacobian over grad or elementwise_grad but once you know what these do, it seems reasonable. It is important to import the wrapped numpy first, to enable autograd to do its work. All the functions here are pretty standard, so everything worked out of the box. We should probably be using autograd, or something like it for more things in science!

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments

A Hy macro for defining functions with docstrings on each argument

| categories: python, hylang | tags: | View Comments

For functions with a lot of arguments, python style docstrings leave something to be desired. For one, they are not that close to the arguments, so if you have a function with say 20 arguments, the docstring might take up a whole page! That means they are hard to keep synchronized too. Let's not argue now over the merits of a function with 20+ arguments, it is enough that they exist, and are a problem.

So what are typical documentation standards? Here is a Numpy style doc string:

def func(arg1, arg2):
    """multiply arg1 and arg2

    Parameters
    ----------
    arg1 : a number
    arg2 : a number

    """
    return arg1 * arg2

It works well for a small number of arguments with limited descriptions. This is a proper docstring that is accessible by introspection and pydoc. With much longer argument lists, this falls apart. I will not pick on any code in particular here, but suffice it to say I was inspired today to think of a better way. There are some other documentation solutions at http://stackoverflow.com/questions/9195455/how-to-document-a-method-with-parameters, but None of them are better in my opinion. I want accessible docstrings by instrospection, and only if that is unavailable do I want to read the code! Finally, if I have to read the code, I want it to be easy to figure out, which means the documentation is close to the arguments.

There is bad news, I do not have one for vanilla python. Python does not even give you a way to deal with this. But, if we had a lisp, we could make a macro to help us out. In fact, we have a lisp with hy! And we can use a macro to make a syntax that lets us keep the docstring close to the argument, and that constructs a real docstring so we get help later!

Here it is:

(defmacro mydef [func args &optional docstring &rest body]
  `(defn ~func [~@(map (lambda [x] (nth x 0)) args)]
     ~(+ (if docstring (+ docstring "\n\n") "")
         "Parameters\n----------\n"
         (.join "\n" (map (lambda [x]
                            (.format "{} : {}"
                                     (nth x 0)
                                     (nth x 1))) args)))
     ~@body))

We can checkout how it expands like this:

(print (macroexpand '(mydef f [(a "an int")
                               (b "an int")]
                            "some doc"
                            (* a b))))
('setv' 'f' ('fn' ['a' 'b'] 'some doc\n\nParameters\n----------\na : an int\nb : an int' ('*' 'a' 'b')))

That looks ok. Now, for an example of using that. Here is the same function we defined before, but I put the documentation for each argument with the argument.

(mydef func ((arg1 "a number")
             (arg2 "a number"))
  "Multiply arg1 by arg2"
  (* arg1 arg2))

We can use the function now like a regular function.

(print (func 24 3))
72

And now for the help.

(help func)
Help on function func in module __main__:

func(arg1, arg2)
    Multiply arg1 by arg2

    Parameters
    ----------
    arg1 : a number
    arg2 : a number

Now, that should amaze and astonish you if you are a vanilla Pythonista! We have our cake, and we eat it too. You just can not make up your own syntax that way in Python. Imagine, we could add type information, validation code, etc… into that macro. Maybe it could even be possible to store argument dependent documentation on the function, say in the function dictionary. That would require some conventions I guess, but they could become introspectable then. For example, in this vanilla Python:

def f(x): return x*x
f.__dict__['args'] = {'x': 'A number'}
print(f.__dict__)

{'args': {'x': 'A number'}}

In the end, this does not really solve all the problems I have with current docstrings in Python. It does solve a problem with writing and reading the code by keeping documentation close to the arguments, but ultimately the docstring from Python's point of view will basically look the same. It is pretty awesome that it is even possible. Hy lisp for the win here (again!).

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.5

Read and Post Comments

Modeling a Cu dimer by EMT, nonlinear regression and neural networks

| categories: molecular-simulation, neural-network, python, machine-learning | tags: | View Comments

In this post we consider a Cu2 dimer and how its energy varies with the separation of the atoms. We assume we have a way to calculate this, but that it is expensive, and that we want to create a simpler model that is as accurate, but cheaper to run. A simple way to do that is to regress a physical model, but we will illustrate some challenges with that. We then show a neural network can be used as an accurate regression function without needing to know more about the physics.

We will use an effective medium theory calculator to demonstrate this. The calculations are not expected to be very accurate or relevant to any experimental data, but they are fast, and will illustrate several useful points that are independent of that. We will take as our energy zero the energy of two atoms at a large separation, in this case about 10 angstroms. Here we plot the energy as a function of the distance between the two atoms, which is the only degree of freedom that matters in this example.

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

from ase.calculators.emt import EMT
from ase import Atoms

atoms = Atoms('Cu2',[[0, 0, 0], [10, 0, 0]], pbc=[False, False, False])
atoms.set_calculator(EMT())

e0 = atoms.get_potential_energy()

# Array of bond lengths to get the energy for
d = np.linspace(1.7, 3, 30)

def get_e(distance):
    a = atoms.copy()
    a[1].x = distance
    a.set_calculator(EMT())
    e = a.get_potential_energy()
    return e

e = np.array([get_e(dist) for dist in d])
e -=  e0  # set the energy zero

plt.plot(d, e, 'bo ')
plt.xlabel('d (Å)')
plt.ylabel('energy (eV)')

We see there is a minimum, and the energy is asymmetric about the minimum. We have no functional form for the energy here, just the data in the plot. So to get another energy, we have to run another calculation. If that was expensive, we might prefer an analytical equation to evaluate instead. We will get an analytical form by fitting a function to the data. A classic one is the Buckingham potential: \(E = A \exp(-B r) - \frac{C}{r^6}\). Here we perform the regression.

def model(r, A, B, C):
    return A * np.exp(-B * r) - C / r**6

from pycse import nlinfit
import pprint

p0 = [-80, 1, 1]
p, pint, se = nlinfit(model, d, e, p0, 0.05)
print('Parameters = ', p)
print('Confidence intervals = ')
pprint.pprint(pint)
plt.plot(d, e, 'bo ', label='calculations')

x = np.linspace(min(d), max(d))
plt.plot(x, model(x, *p), label='fit')
plt.legend(loc='best')
plt.xlabel('d (Å)')
plt.ylabel('energy (eV)')

Parameters = [ -83.21072545 1.18663393 -266.15259507] Confidence intervals = array([[ -93.47624687, -72.94520404], [ 1.14158438, 1.23168348], [-280.92915682, -251.37603331]])

That fit is ok, but not great. We would be better off with a spline for this simple system! The trouble is how do we get anything better? If we had a better equation to fit to we might get better results. While one might come up with one for this dimer, how would you extend it to more complex systems, even just a trimer? There have been decades of research dedicated to that, and we are not smarter than those researchers so, it is time for a new approach.

We will use a Neural Network regressor. The input will be \(d\) and we want to regress a function to predict the energy.

There are a couple of important points to make here.

  1. This is just another kind of regression.
  2. We need a lot more data to do the regression. Here we use 300 data points.
  3. We need to specify a network architecture. Here we use one hidden layer with 10 neurons, and the tanh activation function on each neuron. The last layer is just the output layer. I do not claim this is any kind of optimal architecture. It is just one that works to illustrate the idea.

Here is the code that uses a neural network regressor, which is lightly adapted from http://scikit-neuralnetwork.readthedocs.io/en/latest/guide_model.html.

from sknn.mlp import Regressor, Layer

D = np.linspace(1.7, 3, 300)

def get_e(distance):
    a = atoms.copy()
    a[1].x = distance
    a.set_calculator(EMT())
    e = a.get_potential_energy()
    return e

E = np.array([get_e(dist) for dist in D])
E -=  e0  # set the energy zero

X_train = np.row_stack(np.array(D))

N = 10
nn = Regressor(layers=[Layer("Tanh", units=N),
                       Layer('Linear')])
nn.fit(X_train, E)

dfit = np.linspace(min(d), max(d))

efit = nn.predict(np.row_stack(dfit))

plt.plot(d, e, 'bo ')
plt.plot(dfit, efit)
plt.legend(['calculations', 'neural network'])
plt.xlabel('d (Å)')
plt.ylabel('energy (eV)')

This fit looks pretty good, better than we got for the Buckingham potential. Well, it probably should look better, we have many more parameters that were fitted! It is not perfect, but it could be systematically improved by increasing the number of hidden layers, and neurons in each layer. I am being a little loose here by relying on a visual assessment of the fit. To systematically improve it you would need a quantitative analysis of the errors. I also note though, that if I run the block above several times in succession, I get different fits each time. I suppose that is due to some random numbers used to initialize the fit, but sometimes the fit is about as good as the result you see above, and sometimes it is terrible.

Ok, what is the point after all? We developed a neural network that pretty accurately captures the energy of a Cu dimer with no knowledge of the physics involved. Now, EMT is not that expensive, but suppose this required 300 DFT calculations at 1 minute or more a piece? That is five hours just to get the data! With this neural network, we can quickly compute energies. For example, this shows we get about 10000 energy calculations in just 287 ms.

%%timeit

dfit = np.linspace(min(d), max(d), 10000)
efit = nn.predict(np.row_stack(dfit))

1 loop, best of 3: 287 ms per loop

Compare that to the time it took to compute the 300 energies with EMT

%%timeit
E = np.array([get_e(dist) for dist in D])

1 loop, best of 3: 230 ms per loop

The neural network is a lot faster than the way we get the EMT energies!

It is true in this case we could have used a spline, or interpolating function and it would likely be even better than this Neural Network. We are aiming to get more complicated soon though. For a trimer, we will have three dimensions to worry about, and that can still be worked out in a similar fashion I think. Past that, it becomes too hard to reduce the dimensions, and this approach breaks down. Then we have to try something else. We will get to that in another post.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.5

Read and Post Comments

ob-ipython and inline figures in org-mode

| categories: ipython, python | tags: | View Comments

ob-ipython provides some nice support for inline images, but it is a little limited. You can only have one inline plot, and you cannot capture the printed output. I often want both, and use more than one figure in a code block. So, here I look at a way to get that.

When ob-ipython executes a cell, it gets two things internally: the output and a list of result elements. The output is all the stuff that is printed, and the result contains result cells. So, we just have to check these for images, and append them to the output in an appropriate way. I will do that using file links so that org automatically renders them. We will save the images as temp files, since they are regenerated each time you run the cell.

I want output and inline figures. This ipython block should output some text and two figures. Note we do not define file names anywhere! See this section for details on how to get ob-ipython to do this.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 20 * np.pi, 350)
x = np.exp(-0.1 * t) * np.sin(t)
y = np.exp(-0.1 * t) * np.cos(t)

plt.plot(x, y)
plt.axis('equal')

plt.figure()
plt.plot(y, x)
plt.axis('equal')

print('Length of t = {}'.format(len(t)))
print('x .dot. y = {}'.format(x @ y))

Length of t = 350 x .dot. y = 1.3598389888491538

Nice, success! Now my code blocks export more cleanly to jupyter notebooks. Speaking of which, if you liked the post on that, there is a new library for it in scimax: https://github.com/jkitchin/scimax/blob/master/ox-ipynb.el. Yes, one day I will put it in its own repo, and probably put it up on MELPA. If it turns out to be useful over the next semester.

1 code for getting output and inline figures

I wrote one new function that writes the base64 data out to a temporary file and returns a link to it. Then, I modified the org-babel-execute:ipython function to append these links onto the output. It seems like you need to use a header like this in your ob-ipython block, notably the results need to be in a drawer like this if you want org-mode to render the images. They do not show up in the results that have colons starting them.

#+BEGIN_SRC ipython :session :results output drawer

Here is the code.

(defun ob-ipython-inline-image (b64-string)
  "Write the b64-string to a temporary file.
Returns an org-link to the file."
  (let* ((tfile (make-temp-file "ob-ipython-" nil ".png"))
         (link (format "[[file:%s]]" tfile)))
    (ob-ipython--write-base64-string tfile b64-string)
    link))


(defun org-babel-execute:ipython (body params)
  "Execute a block of IPython code with Babel.
This function is called by `org-babel-execute-src-block'."
  (let* ((file (cdr (assoc :file params)))
         (session (cdr (assoc :session params)))
         (result-type (cdr (assoc :result-type params))))
    (org-babel-ipython-initiate-session session params)
    (-when-let (ret (ob-ipython--eval
                     (ob-ipython--execute-request
                      (org-babel-expand-body:generic (encode-coding-string body 'utf-8)
                                                     params (org-babel-variable-assignments:python params))
                      (ob-ipython--normalize-session session))))
      (let ((result (cdr (assoc :result ret)))
            (output (cdr (assoc :output ret))))
        (if (eq result-type 'output)
            (concat
             output 
             (format "%s"
                     (mapconcat 'identity
                                (loop for res in result
                                      if (eq 'image/png (car res))
                                      collect (ob-ipython-inline-image (cdr res)))
                                "\n")))
          (ob-ipython--create-stdout-buffer output)
          (cond ((and file (string= (f-ext file) "png"))
                 (->> result (assoc 'image/png) cdr (ob-ipython--write-base64-string file)))
                ((and file (string= (f-ext file) "svg"))
                 (->> result (assoc 'image/svg+xml) cdr (ob-ipython--write-string-to-file file)))
                (file (error "%s is currently an unsupported file extension." (f-ext file)))
                (t (->> result (assoc 'text/plain) cdr))))))))
org-babel-execute:ipython

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.3

Read and Post Comments

« Previous Page -- Next Page »