Solving an eigenvalue differential equation with a neural network

| categories: eigenvalue, bvp, autograd | tags: | View Comments

The 1D harmonic oscillator is described here. It is a boundary value differential equation with eigenvalues. If we let ω=1, m=1, and use units where ℏ=1, then the governing differential equation becomes:

\(-0.5 \frac{d^2\psi(x)}{dx^2} + (0.5 x^2 - E) \psi(x) = 0\)

with boundary conditions: \(\psi(-\infty) = \psi(\infty) = 0\)

We can further stipulate that the probability of finding the particle over this domain is equal to one: \(\int_{-\infty}^{\infty} \psi^2(x) dx = 1\). In this set of equations, \(E\) is an eigenvalue, which means there are only non-trivial solutions for certain values of \(E\).

Our goal is to solve this equation using a neural network to represent the wave function. This is a different problem than the one here or here because of the eigenvalue, which is an additional adjustable parameter we have to find. We also have the normalization constraint to satisfy, which we did not have before.

1 The neural network setup

Here we set up the neural network and its derivatives. This is the same as we did before.

import autograd.numpy as np
from autograd import grad, elementwise_grad
import autograd.numpy.random as npr
from autograd.misc.optimizers import adam

def init_random_params(scale, layer_sizes, rs=npr.RandomState(42)):
    """Build a list of (weights, biases) tuples, one for each layer."""
    return [(rs.randn(insize, outsize) * scale,   # weight matrix
             rs.randn(outsize) * scale)           # bias vector
            for insize, outsize in zip(layer_sizes[:-1], layer_sizes[1:])]

def swish(x):
    "see https://arxiv.org/pdf/1710.05941.pdf"
    return x / (1.0 + np.exp(-x))

def psi(nnparams, inputs):
    "Neural network wavefunction"
    for W, b in nnparams:
        outputs = np.dot(inputs, W) + b
        inputs = swish(outputs)    
    return outputs

psip = elementwise_grad(psi, 1) # dpsi/dx 
psipp = elementwise_grad(psip, 1) # d^2psi/dx^2

2 The objective function

The key function we need is the objective function. It encodes the Schrödinger equation, the boundary conditions, and the normalization as a cost function that we will later minimize. Ideally, at the solution the objective function will be zero. We can't put infinity into our objective function, but it turns out that x = ±6 is practically infinity in this case, so we apply the boundary conditions there.

One other note: the normalization requires a numerical integration, done here by the trapezoid rule. I use a vectorized version of it because autograd doesn't provide a derivative for trapz and I didn't feel like figuring one out.
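Just as a sanity check (my addition, not in the original post), the vectorized expression below is the standard trapezoid rule and agrees with numpy's trapz on a plain integrand; xt and yt here are throwaway test variables.

xt = np.linspace(-6, 6, 200)[:, None]
yt = np.exp(-xt**2)
# Vectorized trapezoid rule, same form as used in the objective below
vec = np.sum((yt[1:] + yt[0:-1]) / 2 * (xt[1:] - xt[0:-1]))
print(vec, np.trapz(yt[:, 0], xt[:, 0]))  # these two numbers should agree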

We define the params to vary here as a dictionary containing neural network weights and biases, and the value of the eigenvalue.

# Here is our initial guess of params:
nnparams = init_random_params(0.1, layer_sizes=[1, 8, 1])

params = {'nn': nnparams, 'E': 0.4}

x = np.linspace(-6, 6, 200)[:, None]

def objective(params, step):
    nnparams = params['nn']
    E = params['E']        
    # This is Schrodinger's eqn
    zeq = -0.5 * psipp(nnparams, x) + (0.5 * x**2 - E) * psi(nnparams, x) 
    bc0 = psi(nnparams, -6.0) # This approximates -infinity
    bc1 = psi(nnparams, 6.0)  # This approximates +infinity
    y2 = psi(nnparams, x)**2
    # This is a numerical trapezoid integration
    prob = np.sum((y2[1:] + y2[0:-1]) / 2 * (x[1:] - x[0:-1]))
    return np.mean(zeq**2) + bc0**2 + bc1**2 + (1.0 - prob)**2

# This gives us feedback from the optimizer
def callback(params, step, g):
    if step % 1000 == 0:
        print("Iteration {0:3d} objective {1}".format(step,
                                                      objective(params, step)))

3 The minimization

Now, we just let an optimizer minimize the objective function for us. Note that I ran this next block more than once (at least twice), because the objective continued to decrease slowly with more training.

params = adam(grad(objective), params,
              step_size=0.001, num_iters=5001, callback=callback) 

print(params['E'])
Iteration   0 objective [[ 0.00330204]]
Iteration 1000 objective [[ 0.00246459]]
Iteration 2000 objective [[ 0.00169862]]
Iteration 3000 objective [[ 0.00131453]]
Iteration 4000 objective [[ 0.00113132]]
Iteration 5000 objective [[ 0.00104405]]
0.5029457355415167

Good news: the lowest energy eigenvalue is known to be 0.5 for our choice of parameters, and that is approximately what we got. Now let's plot our solution and compare it to the known one. Interestingly, we got the negative of the solution, which is still a solution (the equation is linear in ψ and the normalization uses ψ², so −ψ satisfies everything ψ does). The NN solution is distinguishable from the analytical solution, with some spurious curvature in the tails, but it is approximately correct, and more training might get it closer. A different activation function might also work better.

%matplotlib inline
import matplotlib.pyplot as plt

x = np.linspace(-6, 6)[:, None]
y = psi(params['nn'], x)

plt.plot(x, -y, label='NN')
plt.plot(x, (1/np.pi)**0.25 * np.exp(-x**2 / 2), 'r--', label='analytical')
plt.legend()

4 The first excited state

Now, what about the first excited state? This has an eigenvalue of 1.5, and the solution has odd parity. We can naively change the eigenvalue, and hope that the optimizer will find the right new solution. We do that here, and use the old NN params.

params['E'] = 1.6

Now, we run a round of optimization:

params = adam(grad(objective), params,
              step_size=0.003, num_iters=5001, callback=callback) 

print(params['E'])
Iteration   0 objective [[ 0.09918192]]
Iteration 1000 objective [[ 0.00102333]]
Iteration 2000 objective [[ 0.00100269]]
Iteration 3000 objective [[ 0.00098684]]
Iteration 4000 objective [[ 0.00097425]]
Iteration 5000 objective [[ 0.00096347]]
0.502326347406645

That doesn't work though; the optimizer just pushes the solution back to the known one. Next, we try starting the NN from scratch, keeping the new eigenvalue guess.

nnparams = init_random_params(0.1, layer_sizes=[1, 8, 1])

params = {'nn': nnparams, 'E': 1.6}

params = adam(grad(objective), params,
              step_size=0.003, num_iters=5001, callback=callback) 

print(params['E'])
Iteration   0 objective [[ 2.08318762]]
Iteration 1000 objective [[ 0.02358685]]
Iteration 2000 objective [[ 0.00726497]]
Iteration 3000 objective [[ 0.00336433]]
Iteration 4000 objective [[ 0.00229851]]
Iteration 5000 objective [[ 0.00190942]]
0.5066213334684926

That also doesn't work. We are going to have to steer this. The idea is to pre-train the neural network to have the basic shape and symmetry we want, and then use those parameters as the starting point for the objective function. The first excited state has odd parity, and here is a guess of that shape. It is a pretty ugly, hacked-up version that only roughly has the right shape; I am counting on the NN to smooth out the discontinuities.

xm = np.linspace(-6, 6)[:, None]
ym = -0.5 * ((-1 * (xm + 1.5)**2) + 1.5) * (xm < 0) * (xm > -3)
yp = -0.5 * ((1 * (xm - 1.5)**2 ) - 1.5) * (xm > 0) * (xm < 3)

plt.plot(xm, (ym + yp))
plt.plot(x, (1/np.pi)**0.25 * np.sqrt(2) * x * np.exp(-x**2 / 2), 'r--', label='analytical')

Now we pretrain a bit.

def pretrain(params, step):
    nnparams = params['nn']
    errs = psi(nnparams, xm) - (ym + yp)
    return np.mean(errs**2)

params = adam(grad(pretrain), params,
              step_size=0.003, num_iters=501, callback=callback) 
Iteration   0 objective [[ 1.09283695]]

Here is the new initial guess we are going to use. You can see that indeed a lot of smoothing has occurred.

plt.plot(xm, ym + yp, xm, psi(params['nn'], xm))

That has the right shape now. So we go back to the original objective function.

params = adam(grad(objective), params,
              step_size=0.001, num_iters=5001, callback=callback) 

print(params['E'])
Iteration   0 objective [[ 0.00370029]]
Iteration 1000 objective [[ 0.00358193]]
Iteration 2000 objective [[ 0.00345137]]
Iteration 3000 objective [[ 0.00333]]
Iteration 4000 objective [[ 0.0032198]]
Iteration 5000 objective [[ 0.00311844]]
1.5065724128094344

I ran that optimization block many times. The loss is still decreasing, but slowly. More importantly, the eigenvalue is converging to 1.5, which is the known analytical value, and the solution is converging to the known solution.

x = np.linspace(-6, 6)[:, None]
y = psi(params['nn'], x)

plt.plot(x, y, label='NN')
plt.plot(x, (1/np.pi)**0.25 * np.sqrt(2) * x * np.exp(-x**2 / 2), 'r--', label='analytical')
plt.legend()

We can confirm the normalization is reasonable:

# check the normalization
print(np.trapz(y.T * y.T, x.T))
[ 0.99781886]

5 Summary

This is another example of using autograd to solve an eigenvalue differential equation. Some of these solutions required tens of thousands of iterations of training. The ground state wavefunction was easy to get. The first excited state, on the other hand, took some active steering. This is very much like how the initial guess can change which solution a nonlinear optimization (which this is) finds.

There are other ways to solve this particular problem. What I find interesting here is the possibility of solving harder problems, e.g. not just a harmonic potential but a more complex one. You could pretrain a network on the harmonic solution, and then use it as the initial guess for the harder problem (which has no analytical solution).
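To make that concrete, here is a sketch (untested, and using a made-up quartic perturbation as the "harder" potential) of how little the objective function would have to change; only the potential term in the residual is different.

def anharmonic_objective(params, step):
    "Same as objective above, but with an assumed potential V(x) = 0.5 x^2 + 0.1 x^4."
    nnparams = params['nn']
    E = params['E']
    V = 0.5 * x**2 + 0.1 * x**4  # hypothetical anharmonic potential, for illustration only
    zeq = -0.5 * psipp(nnparams, x) + (V - E) * psi(nnparams, x)
    bc0 = psi(nnparams, -6.0)
    bc1 = psi(nnparams, 6.0)
    y2 = psi(nnparams, x)**2
    prob = np.sum((y2[1:] + y2[0:-1]) / 2 * (x[1:] - x[0:-1]))
    return np.mean(zeq**2) + bc0**2 + bc1**2 + (1.0 - prob)**2

You would pretrain on the harmonic solution as above, and then minimize this objective instead.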

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments

Solving ODEs with a neural network and autograd

| categories: ode, autograd | tags: | View Comments

In the last post I explored using a neural network to solve a BVP. Here, I expand the idea to solving an initial value ordinary differential equation. The idea is basically the same, we just have a slightly different objective function.

\(\frac{dCa}{dt} = -k Ca(t)\) where \(Ca(t=0) = 2.0\).

Here is the code that solves this equation, along with a comparison to the analytical solution: \(Ca(t) = Ca_0 \exp(-k t)\).

import autograd.numpy as np
from autograd import grad, elementwise_grad
import autograd.numpy.random as npr
from autograd.misc.optimizers import adam

def init_random_params(scale, layer_sizes, rs=npr.RandomState(0)):
    """Build a list of (weights, biases) tuples, one for each layer."""
    return [(rs.randn(insize, outsize) * scale,   # weight matrix
             rs.randn(outsize) * scale)           # bias vector
            for insize, outsize in zip(layer_sizes[:-1], layer_sizes[1:])]


def swish(x):
    "see https://arxiv.org/pdf/1710.05941.pdf"
    return x / (1.0 + np.exp(-x))


def Ca(params, inputs):
    "Neural network functions"
    for W, b in params:
        outputs = np.dot(inputs, W) + b
        inputs = swish(outputs)    
    return outputs

    
# Here is our initial guess of params:
params = init_random_params(0.1, layer_sizes=[1, 8, 1])

# Derivatives
dCadt = elementwise_grad(Ca, 1)

k = 0.23
Ca0 = 2.0
t = np.linspace(0, 10).reshape((-1, 1))

# This is the function we seek to minimize
def objective(params, step):
    # These should all be zero at the solution
    # dCadt = -k * Ca(t)
    zeq = dCadt(params, t) - (-k * Ca(params, t))
    ic = Ca(params, 0) - Ca0
    return np.mean(zeq**2) + ic**2

def callback(params, step, g):
    if step % 1000 == 0:
        print("Iteration {0:3d} objective {1}".format(step,
                                                      objective(params, step)))

params = adam(grad(objective), params,
              step_size=0.001, num_iters=5001, callback=callback) 


tfit = np.linspace(0, 20).reshape(-1, 1)
import matplotlib.pyplot as plt
plt.plot(tfit, Ca(params, tfit), label='soln')
plt.plot(tfit, Ca0 * np.exp(-k * tfit), 'r--', label='analytical soln')
plt.legend()
plt.xlabel('time')
plt.ylabel('$C_A$')
plt.xlim([0, 20])
plt.savefig('nn-ode.png')
Iteration   0 objective [[ 3.20374053]]
Iteration 1000 objective [[  3.13906829e-05]]
Iteration 2000 objective [[  1.95894699e-05]]
Iteration 3000 objective [[  1.60381564e-05]]
Iteration 4000 objective [[  1.39930673e-05]]
Iteration 5000 objective [[  1.03554970e-05]]

Huh. Those two solutions are nearly indistinguishable. Since we used a neural network, let's hype it up and say we learned the solution to a differential equation! But seriously, note that although we got an "analytical" solution, we should only rely on it in the region we trained it on. You can see the solution above is not that good past t=10, perhaps even going negative (which is not physically correct). That is a reminder that the function we have for the solution is not the same as the analytical solution; it just approximates it really well over the region we solved over. Of course, you can expand that region to the region you care about, but the main point is: don't rely on the solution outside where you know it is good.
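To put some numbers on that warning, here is a quick comparison (illustrative; the exact values depend on the trained parameters above) at a few times inside and outside the training window:

for tv in [5.0, 10.0, 15.0, 20.0]:
    nn = Ca(params, np.array([[tv]]))[0, 0]   # NN solution at time tv
    exact = Ca0 * np.exp(-k * tv)             # analytical solution
    print('t = {0:4.1f}  NN: {1:8.4f}  analytical: {2:8.4f}'.format(tv, nn, exact))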

This idea isn't new. There are several papers in the literature on using neural networks to solve differential equations, e.g. http://www.sciencedirect.com/science/article/pii/S0255270102002076 and https://arxiv.org/pdf/physics/9705023.pdf, and other blog posts that are similar (https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c, even using autograd). That means to me that there is some merit to continuing to investigate this approach to solving differential equations.

There are some interesting challenges for engineers to consider with this approach though. When is the solution accurate enough? How reliable are derivatives of the solution? What network architecture is appropriate or best? How do you know how good the solution is? Is it possible to build in solution features, e.g. asymptotes, constraints on derivatives, or monotonicity? These would help us trust the solutions not to do weird things, and to extrapolate more reliably.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments

Solving BVPs with a neural network and autograd

| categories: bvp, autograd | tags: | View Comments

In this post we solved a boundary value problem by discretizing it and approximating the derivatives with finite differences. Here I explore using a neural network to approximate the unknown function, autograd to get the required derivatives, and autograd again to train the network to satisfy the differential equation and boundary conditions. We will look at the Blasius equation again.

\begin{eqnarray} f''' + \frac{1}{2} f f'' &=& 0 \\ f(0) &=& 0 \\ f'(0) &=& 0 \\ f'(\infty) &=& 1 \end{eqnarray}

Here I set up a simple neural network

import autograd.numpy as np
from autograd import grad, elementwise_grad
import autograd.numpy.random as npr
from autograd.misc.optimizers import adam

def init_random_params(scale, layer_sizes, rs=npr.RandomState(0)):
    """Build a list of (weights, biases) tuples, one for each layer."""
    return [(rs.randn(insize, outsize) * scale,   # weight matrix
             rs.randn(outsize) * scale)           # bias vector
            for insize, outsize in zip(layer_sizes[:-1], layer_sizes[1:])]


def swish(x):
    "see https://arxiv.org/pdf/1710.05941.pdf"
    return x / (1.0 + np.exp(-x))


def f(params, inputs):
    "Neural network functions"
    for W, b in params:
        outputs = np.dot(inputs, W) + b
        inputs = swish(outputs)    
    return outputs

    
# Here is our initial guess of params:
params = init_random_params(0.1, layer_sizes=[1, 8, 1])

# Derivatives
fp = elementwise_grad(f, 1)
fpp = elementwise_grad(fp, 1)
fppp = elementwise_grad(fpp, 1)

eta = np.linspace(0, 6).reshape((-1, 1))

# This is the function we seek to minimize
def objective(params, step):
    # These should all be zero at the solution
    # f''' + 0.5 f'' f = 0
    zeq = fppp(params, eta) + 0.5 * f(params, eta) * fpp(params, eta) 
    bc0 = f(params, 0.0)  # equal to zero at solution
    bc1 = fp(params, 0.0)  # equal to zero at solution
    bc2 = fp(params, 6.0) - 1.0 # this is the one at "infinity"
    return np.mean(zeq**2) + bc0**2 + bc1**2 + bc2**2

def callback(params, step, g):
    if step % 1000 == 0:
        print("Iteration {0:3d} objective {1}".format(step,
                                                      objective(params, step)))

params = adam(grad(objective), params,
              step_size=0.001, num_iters=10000, callback=callback) 

print('f(0) = {}'.format(f(params, 0.0)))
print('fp(0) = {}'.format(fp(params, 0.0)))
print('fp(6) = {}'.format(fp(params, 6.0)))
print('fpp(0) = {}'.format(fpp(params, 0.0)))

import matplotlib.pyplot as plt
plt.plot(eta, f(params, eta))
plt.xlabel('$\eta$')
plt.ylabel('$f(\eta)$')
plt.xlim([0, 6])
plt.ylim([0, 4.5])
plt.savefig('nn-blasius.png')

Iteration 0 objective 1.11472535
Iteration 1000 objective 0.00049768
Iteration 2000 objective 0.0004579
Iteration 3000 objective 0.00041697
Iteration 4000 objective 0.00037408
Iteration 5000 objective 0.00033705
Iteration 6000 objective 0.00031016
Iteration 7000 objective 0.00029197
Iteration 8000 objective 0.00027585
Iteration 9000 objective 0.00024616
f(0) = -0.00014613
fp(0) = 0.0003518041251639459
fp(6) = 0.999518061473252
fpp(0) = 0.3263370503702663

I think it is worth discussing what we accomplished here. You can see we have arrived at an approximate solution to our differential equation and its boundary conditions. The boundary conditions appear to be closely met, and the figure is approximately the same as in the previous post. Even better, our solution is an actual function, not a numeric solution that has to be interpolated, so we can evaluate it anywhere we want, including its derivatives!
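For example (a quick illustration, not part of the original output), we can evaluate the solution and a couple of its derivatives at a point that was never in the training grid:

# Evaluate the trained network and its autograd derivatives off the training grid
print(f(params, 3.5), fp(params, 3.5), fpp(params, 3.5))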

We did not, however, have to convert the ODE into a system of first-order differential equations, and we did not have to approximate the derivatives with finite differences.

Note, however, that with finite differences we got f''(0)=0.3325. This site reports f''(0)=0.3321. We get close to that here with f''(0) = 0.3263. We could probably get closer with more training to reduce the objective function further, or with a finer grid; it is evident that even after 9000 steps, the objective is still decreasing. Getting accurate derivatives is a stringent test, since they are measures of the curvature of the function.

It is hard to tell how broadly useful this is; I have not tried it beyond this example. In the past, I have found solving BVPs to be pretty sensitive to the initial guesses of the solution. Here we made almost no guess at all, and still got a solution. I find that pretty remarkable.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments

More auto-differentiation goodness for science and engineering

| categories: python, autograd | tags: | View Comments

In this post I continue my investigations into the use of auto-differentiation via autograd in scientific and mathematical programming. The main focus today is using autograd to get derivatives that have either mathematical value, e.g. accelerating root finding or demonstrating mathematical rules, or scientific value, e.g. a derivative that is related to a property or illustrates some constraint.

All the code in this post relies on these imports:

import autograd.numpy as np
from autograd import grad, jacobian

In the following sections I explore some applications in calculus, root-finding, materials and thermodynamics.

1 Showing mixed partial derivatives are equal

In calculus, we know that if we have a well-behaved function \(f(x, y)\), then it should be true that \(\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}\).

Here we use autograd to compute the mixed partial derivatives and show for 10 random points that this statement is true. This doesn't prove it for all points, of course, but it is easy to check at any point of interest.

def f(x, y):
    return x * y**2

dfdx = grad(f)
d2fdxdy = grad(dfdx, 1)

dfdy = grad(f, 1)
d2fdydx = grad(dfdy)

x = np.random.rand(10)
y = np.random.rand(10)

T = [d2fdxdy(x1, y1) == d2fdydx(x1, y1) for x1, y1 in zip(x, y)]

print(np.all(T))

True

2 Root finding with jacobians

fsolve often works fine without access to derivatives. In this example from here, we solve a set of two equations in two variables, and it takes 21 function evaluations to reach the solution.

from scipy.optimize import fsolve

def f(x):
    return np.array([x[1] - 3*x[0]*(x[0]+1)*(x[0]-1),
                     .25*x[0]**2 + x[1]**2 - 1])


ans, info, flag, msg = fsolve(f, (0.5, 0.5), full_output=1)
print(ans)
print(info['nfev'])

[ 1.117  0.83 ]
21

If we add the jacobian, we get the same result with only 15 function evaluations, about a third fewer. If the function evaluations are expensive, this might save a lot of time.

df = jacobian(f)
x0 = np.array([0.5, 0.5])

ans, info, flag, msg  = fsolve(f, x0, fprime=df, full_output=1)
print(ans)
print(info['nfev'])

[ 1.117  0.83 ]
15

There is a similar example provided by autograd.

3 Getting the pressure from a solid equation of state

In this post we described how to fit a solid equation of state to describe the energy of a solid under isotropic strain. Now, we can readily compute the pressure at a particular volume from the equation:

\(P = -\frac{dE}{dV}\)

We just need the derivative of this equation:

\(E = E_0+\frac{B_0 V}{B'_0}\left[\frac{(V_0/V)^{B'_0}}{B'_0-1}+1\right]-\frac{V_0 B_0}{B'_0-1}\)

or we use autograd to get it for us.

E0, B0, BP, V0 = -56.466,   0.49,    4.753,  16.573

def Murnaghan(vol):
    E = E0 + B0 * vol / BP * (((V0 / vol)**BP) / (BP - 1.0) + 1.0) - V0 * B0 / (BP - 1.)
    return E

def P(vol):
    dEdV = grad(Murnaghan)
    return -dEdV(vol) * 160.21773  # convert to GPa

print(P(V0)) # Pressure at the minimum
print(P(0.99 * V0))  # Compressed
4.44693531998e-15
0.808167684691

So it takes positive pressure to compress the system, as expected, and at the minimum the pressure is equal to zero. It seems pretty clear that using autograd is easier than deriving the required pressure expression by hand.
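As a small aside (my addition), the same approach gives the bulk modulus, \(B = V \frac{d^2E}{dV^2}\), by composing two grads; at the minimum volume this should recover the fitted \(B_0\):

# Bulk modulus from the second derivative of the energy (illustrative check)
d2EdV2 = grad(grad(Murnaghan))
print(V0 * d2EdV2(V0))  # should be close to B0 = 0.49 in the fitted units;
                        # multiplying by 160.21773 would give GPa, as in P(vol) above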

4 Deriving activity coefficients and demonstration of the Gibbs-Duhem relation

Thermodynamics tells us that in a binary mixture the following is true:

\(0 = x_1 \frac{d \ln \gamma_1}{dx_1} + (1 - x_1) \frac{d \ln \gamma_2}{dx_1}\)

In other words, the activity coefficients of the two species can't be independent.

Suppose we have the Margules model for the excess free energy:

\(G^{ex}/RT = n x_1 (1 - x_1) (A_{21} x_1 + A_{12} (1 - x_1))\)

where \(n = n_1 + n_2\), \(x_1 = n_1 / n\), and \(x_2 = n_2 / n\).

From this expression, we know:

\(\ln \gamma_1 = \frac{\partial (G^{ex}/RT)}{\partial n_1}\)

and

\(\ln \gamma_2 = \frac{\partial (G^{ex}/RT)}{\partial n_2}\)

It is also true that (the Gibbs-Duhem equation):

\(0 = x_1 \frac{d \ln \gamma_1}{d n_1} + x_2 \frac{d \ln \gamma_2}{d n_1}\)

Here we will use autograd to get these derivatives, and demonstrate the Gibbs-Duhem eqn holds for this excess Gibbs energy model.

A12, A21 = 2.04, 1.5461  # Acetone/water https://en.wikipedia.org/wiki/Margules_activity_model

def GexRT(n1, n2):
    n = n1 + n2
    x1 = n1 / n
    x2 = n2 / n
    return n * x1 * x2 * (A21 * x1 + A12 * x2)

lngamma1 = grad(GexRT)     # dGex/dn1
lngamma2 = grad(GexRT, 1)  # dGex/dn2

n1, n2 = 1.0, 2.0
n = n1 + n2
x1 = n1 / n
x2 = n2 / n

# Evaluate the activity coefficients
print('AD:         ',lngamma1(n1, n2), lngamma2(n1, n2))

# Compare that to these analytically derived activity coefficients
print('Analytical: ', (A12 + 2 * (A21 - A12) * x1) * x2**2, (A21 + 2 * (A12 - A21) * x2) * x1**2)

# Demonstration of the Gibbs-Duhem rule
dg1 = grad(lngamma1)
dg2 = grad(lngamma2)

n = 1.0 # Choose a basis number of moles
x1 = np.linspace(0, 1)
x2 = 1 - x1
n1 = n * x1
n2 = n - n1

GD = [_x1 * dg1(_n1, _n2) + _x2 * dg2(_n1, _n2)
      for _x1, _x2, _n1, _n2 in zip(x1, x2, n1, n2)]

print(np.allclose(GD, np.zeros(len(GD))))
('AD:         ', 0.76032592592592585, 0.24495925925925932)
('Analytical: ', 0.760325925925926, 0.24495925925925924)
True

That is pretty compelling. The autograd derivatives are much easier to code than the analytical solutions (which also had to be derived). You can also see that the Gibbs-Duhem equation is satisfied for this model, at least with a reasonable tolerance for the points we evaluated it at.

5 Summary

Today we examined four ways to use autograd in scientific or mathematical programs to replace the need to derive derivatives by hand. The main requirement for this to work is that you use the autograd.numpy module, and only the functions in it that are supported. It is possible to add your own functions (described in the tutorial) if needed. It seems like there are a lot of opportunities for autograd in scientific programming.
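For reference, here is a minimal sketch of what adding your own function looks like, following the pattern in the autograd tutorial (this toy cube function is just for illustration and is not used anywhere above):

from autograd.extend import primitive, defvjp
from autograd import grad

@primitive
def cube(x):
    "A function autograd treats as a black box until we register its gradient."
    return x * x * x

# Register the vector-Jacobian product: d(cube)/dx = 3 x**2
defvjp(cube, lambda ans, x: lambda g: g * 3 * x**2)

print(grad(cube)(2.0))  # should print 12.0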

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments

Timing Lennard-Jones implementations - ASE vs autograd

| categories: lennardjones, python, autograd | tags: | View Comments

In a comment on this post, Konrad Hinsen asked if the autograd forces on a Lennard-Jones potential would be usable in production. I wasn't sure, and I suspected that analytical force functions would be faster. It turns out not to be so simple. In this post, I attempt some timing experiments for comparison. These are tricky to do right, and in a meaningful way, so I will start by explaining what is tricky and why I think the results are meaningful.

The ASE calculators cache their results, and return the cached results after the first run. We will do these tests on a 13-atom icosahedral cluster.

from ase.calculators.lj import LennardJones
from ase.cluster.icosahedron import Icosahedron

atoms = Icosahedron('Ar', noshells=2, latticeconstant=3)
atoms.set_calculator(LennardJones())

import time
t0 = time.time()
print('energy: ', atoms.get_potential_energy())
print(' time: ', time.time() - t0)
print()

t0 = time.time()
print('energy: ', atoms.get_potential_energy())
print(' time: ', time.time() - t0)
print()

atoms.calc.results = {}
t0 = time.time()
print('energy: ', atoms.get_potential_energy())
print('time: ', time.time() - t0)
energy:  -1.25741774649
 time:  0.0028629302978515625

energy:  -1.25741774649
 time:  0.00078582763671875

energy:  -1.25741774649
time:  0.0031850337982177734

Note the approximate order of magnitude reduction in elapsed time for the second call. After that, we reset the calculator results, and the time goes back up. So, we have to incorporate that reset into our timing. Also, in the ASE calculator the energy and forces are calculated simultaneously; there isn't a way to separate those calls. I am going to use the timeit feature in IPython for the timing. I don't have a lot of control over what else is running on my machine, so I have observed some variability in the timing results. Finally, I am running these on a MacBook Air.

%%timeit
atoms.get_potential_energy()
atoms.calc.results = {} # this resets it so it runs each time. Otherwise, we get cached results

1.46 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

That seems like a surprisingly long time. If you neglect the calculator reset, it looks about 10 times faster because the cache lookup is fast!

Let's compare that to an implementation of the Lennard-Jones potential similar to the one from last time. This implementation differs from the first one I blogged about: it is fully vectorized, although it still does not support periodic boundary conditions. This version may be up to 10 times faster than the previous one. I haven't tested it very thoroughly; I only verified that it gives the same energy as ASE for the example in this post.

import autograd.numpy as np

def energy(positions):
    "Compute the energy of a Lennard-Jones system."
    
    sigma = 1.0
    epsilon = 1.0
    rc = 3 * sigma

    e0 = 4 * epsilon * ((sigma / rc)**12 - (sigma / rc)**6)
    
    natoms = len(positions)
    # These are the pairs of indices we need to compute distances for.
    a, b = np.triu_indices(natoms, 1)

    d = positions[a] - positions[b]
    r2 = np.sum(d**2, axis=1)
    c6 = np.where(r2 <= rc**2, (sigma**2 / r2)**3, np.zeros_like(r2))
    
    energy = -e0 * (c6 != 0.0).sum()
    c12 = c6**2
    energy += np.sum(4 * epsilon * (c12 - c6))
    
    return energy

# Just to check we get the same answer
print(energy(atoms.positions))

-1.25741774649

The energy looks good. For timing, we store the positions in a variable, in case there is any lookup time, since this function only needs an array.

pos = atoms.positions

There is no caching to worry about here, we can just get the timing.

%%timeit
energy(pos)

82.2 µs ± 2.85 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Wow, that is a lot faster than the ASE implementation. Score one for vectorization.

print('Vectorized is {0:1.1f} times faster than ASE at energy.'.format(1.46e-3 / 82.5e-6))
Vectorized is 17.7 times faster than ASE at energy.

Yep, a fully vectorized implementation is a lot faster than the ASE version, which uses loops. So far the difference has nothing to do with autograd.

1 Timing on the forces

The forces are where derivatives matter, and it is a reasonable question whether hand-coded derivatives are faster or slower than autograd derivatives. We first look at the forces from ASE. The analytical forces take about the same time as the energy, which is not surprising since the same work is done for both.

np.set_printoptions(precision=3, suppress=True)
print(atoms.get_forces())
[[-0.    -0.    -0.   ]
 [-0.296 -0.     0.183]
 [-0.296 -0.    -0.183]
 [ 0.296 -0.     0.183]
 [ 0.296 -0.    -0.183]
 [ 0.183 -0.296 -0.   ]
 [-0.183 -0.296  0.   ]
 [ 0.183  0.296 -0.   ]
 [-0.183  0.296  0.   ]
 [-0.     0.183 -0.296]
 [ 0.    -0.183 -0.296]
 [-0.     0.183  0.296]
 [ 0.    -0.183  0.296]]

%%timeit
atoms.get_forces()
atoms.calc.results = {}
1.22 ms ± 38.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Here is our auto-differentiated force function.

from autograd import elementwise_grad

def forces(pos):
    dEdR = elementwise_grad(energy)
    return -dEdR(pos)

Let's just check the forces for consistency.

print(forces(atoms.positions))

print(np.allclose(forces(atoms.positions), atoms.get_forces()))
[[-0.    -0.    -0.   ]
 [-0.296 -0.     0.183]
 [-0.296 -0.    -0.183]
 [ 0.296 -0.     0.183]
 [ 0.296 -0.    -0.183]
 [ 0.183 -0.296 -0.   ]
 [-0.183 -0.296  0.   ]
 [ 0.183  0.296 -0.   ]
 [-0.183  0.296  0.   ]
 [-0.     0.183 -0.296]
 [ 0.    -0.183 -0.296]
 [-0.     0.183  0.296]
 [ 0.    -0.183  0.296]]
True

Those all look the same, so now performance for that:

%%timeit 

forces(pos)
727 µs ± 47.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This is faster than the ASE version. I suspect that it is largely because of the faster, vectorized algorithm overall.

print('autograd is {0:1.1f} times faster than ASE on forces.'.format(1.22e-3 / 727e-6))
autograd is 1.7 times faster than ASE on forces.

The autograd forces are consistently 2-6 times faster than the ASE implementation. It might be possible to hand-code a faster force function if it were fully vectorized; I spent a while seeing what would be required for that, and it is not obvious how to do it. Any solution that uses loops will be slower, I think.

This doesn't directly answer the question of whether this can work in production. Everything is still written in Python here, which might limit the size and length of calculations you can practically do. With the right implementation though, it looks promising.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.2

Read and Post Comments
