## 2018 in a nutshell for the Kitchin Research Group

| categories: news | tags: | View Comments

The majority of this year was spent finishing my sabbatical in the Google Accelerated Science Research group. I finished that up in August, and have returned to Pittsburgh now. I spent that time learning about differentiable programming and machine learning applications in science and engineering. It was a great learning year for me, and the beginning of some new research directions.

It wasn't all work, I was able to bike over 3000 miles while we lived in California, spent lots of weekends at beaches, state parks, San Francisco, and many other beautiful places. There is a lot I miss from my sabbatical, but I am mostly glad to be back home.

## 1 An all new research group

In 2017, the last of my students finished, and my group shrunk temporarily to zero for a semester. That luckily coincided with the beginning of my sabbatical, which allowed me to focus exclusively on starting new research directions. Towards the end of last year, three new PhD students joined my group. I did not take any new PhD students this year, but several new MS students joined the group at the end of 2018:

1. Siddhant Lambor (co-advised with Prof. Lynn Walker)
2. Gautham Swaminathan (co-advised with Prof. Lynn Walker)
3. Siddarth Achar
4. Senhong Liu
5. Dingqi Nai
6. Qiong Wang

These students will all work on some aspect of machine learning in formulation research, design of experiments, and molecular simulation. It should be an exciting year!

## 2 Publications

2018 was a light year on publications, largely due to my group shrinking to zero, and being on sabbatical. I wrote a nice perspective article on machine learning in catalysis:

kitchin-2018-machin-learn-catal
This perspective article describes how machine learning is being used in catalysis research and opportunities for further research.

And several publications from past M.S. students got finished and published:

thirumalai-2018-inves-react
This paper shows that many single atom alloys have unique electronic structure features that are responsible for their special catalytic properties.
We used DFT calculations to build a neural network potential to model adatom diffusion on a metal surface.
wang-2018-densit-funct
We used DFT calculations to build a neural network potential to model zirconia polymorphs, oxygen vacancy formation and diffusion.

This was technically published in 2017, but it was the most cited article in J. Phys.: Cond. Matt. in 2018!

larsen-2017-atomic-simul
This is a modern update on the Atomic Simulation Environment Python software. We have been using and contributing to this software for about 15 years now!

Citations on our past work continue to grow.

# Bibliography

• [kitchin-2018-machin-learn-catal] John Kitchin, Machine Learning in Catalysis, Nature Catalysis, 1(4), 230-232 (2018). link. doi.
• [thirumalai-2018-inves-react] Hari Thirumalai & John Kitchin, Investigating the Reactivity of Single Atom Alloys Using Density Functional Theory, Topics in Catalysis, 61(5-6), 462-474 (2018). link. doi.
• [gao-2018-model-pallad] Tianyu Gao & John Kitchin, Modeling Palladium Surfaces With Density Functional Theory, Neural Networks and Molecular Dynamics, Catalysis Today, 312, 132-140 (2018). link. doi.
• [wang-2018-densit-funct] Chen Wang, Akshay Tharval & John Kitchin, A Density Functional Theory Parameterised Neural Network Model of Zirconia, Molecular Simulation, 44(8), 623-630 (2018). link. doi.
• [larsen-2017-atomic-simul] Ask Hjorth Larsen, Jens J\orgen Mortensen, Jakob, Blomqvist, Ivano E Castelli, Rune Christensen, , Marcin Dułak, Jesper Friis, Michael N Groves, , Bj\ork Hammer, Cory Hargus, Eric D Hermes, Paul C, Jennings, Peter Bjerre Jensen, James Kermode, John, R Kitchin, Esben Leonhard Kolsbjerg, Joseph Kubal, , Kristen Kaasbjerg, Steen Lysgaard, J\'on Bergmann, Maronsson, Tristan Maxson, Thomas Olsen, Lars, Pastewka, Andrew Peterson, Carsten Rostgaard, Jakob, Schi\otz, Ole Sch\"utt, Mikkel Strange, Kristian, S Thygesen, Tejs Vegge, Lasse Vilhelmsen, Michael, Walter, Zhenhua Zeng & Karsten W Jacobsen, The Atomic Simulation Environment-A Python Library for Working With Atoms, Journal of Physics: Condensed Matter, 29(27), 273002 (2017). link.

## 3 New courses

On my return from my sabbatical, I taught a new course for me, 06-623 Mathematical modeling of chemical engineering processes. I taught this course in Python, and it was a tour of mathematical and scientific programming that started with "Hello world" 'Hello world' and ended with machine learning. We traveled through differential equations, nonlinear algebra, optimization, and regression along the way using numpy and scipy. It was a fun class, and I look forward to teaching it again next Fall. You can see the course notes at https://github.com/jkitchin/f18-06623. I ran this course using Jupyter notebooks (of course I wrote the notes in org-mode and used the jupyter notebook exporter I wrote to make these!) and Box.com. It worked, but wasn't my favorite. I will try to go back to Emacs+org-mode for this next year.

This coming spring will be another new course: our junior Transport Lab course.

## 4 Emacs, org-mode and scimax

org-ref has been downloaded close to 40K times! I thought it passed 40K early in December, but MELPA shows it with just under 40K today. They switched servers recently, so maybe some statistics were lost. If you count the Melpa-stable downloads, it is still over 40K. There are now over 50 contributors to org-ref besides me! That is pretty awesome, and hopefully speaks to the number of people interested in using Emacs for scientific publishing.

This fall I picked up scimax development again after my sabbatical break, and have a few notable improvements I will launch in 2019. These include:

• Many improvements to ipython in scimax:
• Code completion
• Inspection
• More jupyter-like features (?, ??) and key-bindings
• export to Jupyter notebooks
• src-block keymaps
• and more.
• A new editmarks package that will allow persistent comments, highlights, and track-change mode.
• Improved support for literate programming in org-mode including jump to the definition in org-mode.

Scimax is an increasingly important project to me, and in 2019 I am going to work on some ways that will make it easier for me to spend more time on it in the future.

## 5 Online activity

### 5.1 kitchingroup.cheme.cmu.edu

Since I was on sabbatical, it was a low volume blogging year with only 22 posts. Traffic to the blog was up nonetheless from the last year. I suspect I will blog more this year.

### 5.2 Github

I was even less active in 2018 than in 2017 on GitHub activity. You can see it picked back up this past fall as I returned to my day job as a professor. I expect 2019 will pick back up as usual.

Our YouTube traffic is down this year compared to last year. It is still always interesting to see the spikes in traffic on the org-mode is awesome video. Maybe it got mentioned on Hacker News or something. I only made one video last year; I took a break while on sabbatical, and was busy this fall with a new course. Maybe 2019 will be a better year for that. I have some plans for new videos in the new year on ipython, and some updates in scimax.

We did cross 1000 subscribers this year. That doesn't qualify my channel for monetization yet, you also need 4000 watch hours in the past year. Last year we only had 1466 hours, so not that close yet. Why am I interested in this? I am actively looking for ways to support scimax development, and this could be one way to do that.

It wasn't super easy to get all the Twitter data, I had to manually download the information from each month. Now that I have it, I did some analysis, so here it is. First we look at how many tweets, likes, retweets, etc. there were last year:

import csv
import glob

tweets = 0
impressions = 0
texts = []
times = [] # times of the tweets
likes = 0
retweets = 0
replies = 0

for csvfile in glob.glob('*.csv'):
with open(csvfile) as f:
tweets += 1
impressions += float(row['impressions'])
texts += [row['Tweet text']]
times += [row['time']]
likes += float(row['likes'])
replies += float(row['replies'])
retweets += float(row['retweets'])

print(f'''{tweets} Tweets with {int(impressions)} impressions.
There were {int(likes)} likes, {int(retweets)} retweets, and {int(replies)} replies.''')


471 Tweets with 282655 impressions. There were 1089 likes, 220 retweets, and 341 replies.

Next, we look at the time distribution of these tweets. It seems like this should be easier to do (it probably is in Pandas).

import datetime
import numpy as np

x = np.array([datetime.datetime.strptime(time[0:-6], "%Y-%m-%d %H:%M")
for time in times])

months = np.zeros(12)
for time in x:
months[time.month - 1] += 1

plt.bar(np.arange(12), months)
plt.xticks(np.arange(12), ['January', 'February', 'March',
'April', 'May', 'June',
'July', 'August', 'September',
'October', 'November', 'December'],
rotation=45);
plt.ylabel('Tweet count')


## 6 Summary and outlook

2018 was a pretty good year. I took a break from several things I have spent a lot of time in the past and learned some new things I spend my time on now. I am looking forward to getting back to some of the old things, especially scimax development. I still think it is a key component of modern scientific research and publishing, and that it has an important role in education. Stay tuned!

org-mode source

Org-mode version = 9.1.14

## Line integrals in Python with autograd

| categories: | tags: | View Comments

A line integral is an integral of a function along a curve in space. We usually represent the curve by a parametric equation, e.g. $$\mathbf{r}(t) = [x(t), y(t), z(t)] = x(t)\mathbf{i} + y(t)\mathbf{j} + z(t)\mathbf{k}$$. So, in general the curve will be a vector function, and the function we want to integrate will also be a vector function.

Then, we can write the line integral definition as:

$$\int_C \mathbf{F(r)}\cdot d\mathbf{r} = \int_a^b \mathbf{F}({\mathbf{r}(t)) \cdot \mathbf{r'}(t) dt$$ where $$\mathbf{r'}(t) = \frac{d\mathbf{r}}{dt}$$. This integrand is a scalar function, because of the dot product.

The following examples are adapted from Chapter 10 in Advanced Engineering Mathematics by Kreysig.

The first example is the evaluation of a line integral in the plane. We want to evaluate the integral of $$\mathbf{F(r)}=[-y, -xy]$$ on the curve $$\mathbf{r(t)}=[-sin(t), cos(t)]$$ from t=0 to t = π/2. The answer in the book is given as 0.4521. Here we evaluate this numerically, using autograd for the relevant derivative. Since the curve has multiple outputs, we have to use the jacobian function to get the derivatives. After that, it is a simple bit of matrix multiplication, and a call to the quad function.

import autograd.numpy as np

def F(X):
x, y = X
return -y, -x * y

def r(t):
return np.array([-np.sin(t), np.cos(t)])

drdt = jacobian(r)

def integrand(t):
return F(r(t)) @ drdt(t)

I, e = quad(integrand, 0.0, np.pi / 2)
print(f'The integral is {I:1.4f}.')

The integral is 0.4521.



We get the same result as the analytical solution.

The next example is in three dimensions. Find the line integral along $$\mathbf{r}(t)=[cos(t), sin(t), 3t]$$ of the function $$\mathbf{F(r)}=[z, x, y]$$ from t=0 to t=2 π. The solution is given as 21.99.

import autograd.numpy as np

def F(X):
x, y, z = X
return [z, x, y]

def C(t):
return np.array([np.cos(t), np.sin(t), 3 * t])

dCdt = jacobian(C, 0)

def integrand(t):
return F(C(t)) @ dCdt(t)

I, e = quad(integrand, 0, 2 * np.pi)
print(f'The integral is {I:1.2f}.')

The integral is 21.99.



That is also the same as the analytical solution. Note the real analytical solution was 7 π, which is nearly equivalent to our answer.

7 * np.pi - I

3.552713678800501e-15



As a final example, we consider an alternate form of the line integral. In this form we do not use a dot product, so the integral results in a vector. This doesn't require anything from autograd, but does require us to be somewhat clever in how to do the integrals since quad can only integrate scalar functions. We need to integrate each component of the integrand independently. Here is one approach where we use lambda functions for each component. You could also manually separate the components.

def F(r):
x, y, z = r
return x * y, y * z, z

def r(t):
return np.array([np.cos(t), np.sin(t), 3 * t])

def integrand(t):
return F(r(t))

[quad(lambda t: integrand(t)[i], 0, 2 * np.pi)[0] for i in [0, 1, 2]]

[-6.9054847581172525e-18, -18.849555921538755, 59.21762640653615]



The analytical solution in this case was given as:

[0, -6 * np.pi, 6 * np.pi**2]

[0, -18.84955592153876, 59.21762640653615]



which is evidently the same as our numerical solution.

Maybe an alternative, but more verbose is this vectorized integrate function. We still make temporary functions for integrating, and the vectorization is essentially like the list comprehension above, but we avoid the lambda functions.

@np.vectorize
def integrate(i):
def integrand(t):
return F(r(t))[i]
I, e = quad(integrand, 0, 2 * np.pi)
return I

integrate([0, 1, 2])

array([ -6.90548476e-18,  -1.88495559e+01,   5.92176264e+01])



## 1 Summary

Once again, autograd provides a convenient way to compute function jacobians which make it easy to evaluate line integrals in Python.

org-mode source

Org-mode version = 9.1.14

## Using autograd for error propagation

| categories: | tags: | View Comments

Back in 2013 I wrote about using the uncertainties package to propagate uncertainties. The problem setup was for finding the uncertainty in the exit concentration from a CSTR when there are uncertainties in the other parameters. In this problem we were given this information about the parameters and their uncertainties.

Parameter value σ
Fa0 5 0.05
v0 10 0.1
V 66000 100
k 3 0.2

The exit concentration is found by solving this equation:

$$0 = Fa0 - v0 * Ca - k * Ca**2 * V$$

So the question was what is Ca, and what is the uncertainty on it? Finding Ca is easy with fsolve.

from scipy.optimize import fsolve

Fa0 = 5.0
v0 = 10.0

V = 66000.0
k = 3.0

def func(Ca, v0, k, Fa0, V):
"Mole balance for a CSTR. Solve this equation for func(Ca)=0"
Fa = v0 * Ca     # exit molar flow of A
ra = -k * Ca**2  # rate of reaction of A L/mol/h
return Fa0 - Fa + V * ra

Ca, = fsolve(func, 0.1 * Fa0 / v0, args=(v0, k, Fa0, V))
Ca

0.0050000000000000001



The uncertainty on Ca is a little trickier. A simplified way to estimate it is:

$$\sigma_{Ca} = \sqrt{(dCa/dv0)^2 \sigma_{v0}^2 + (dCa/dv0)^2 \sigma_{v0}^2 + (dCa/dFa0)^2 \sigma_{Fa0}^2 + (dCa/dV)^2 \sigma_{V}^2}$$

We know the σ_i for each input, we just need those partial derivatives. However, we only have the implicit function we used to solve for Ca, and I do not want to do the algebra to solve for Ca. Luckily, we previously worked out how to get these derivatives from an implicit function using autograd. We just need to loop through the arguments, get the relevant derivatives, and accumulate the product of the squared derivatives and errors. Finally, take the square root of that sum.

import autograd.numpy as np

# these are the uncertainties on the inputs
s = [None, 0.1, 0.2, 0.05, 100]

S2 = 0.0

for i in range(1, 5):
dCadarg2 = -dfdarg2(Ca, v0, k, Fa0, V) / dfdCa(Ca, v0, k, Fa0, V)

Ca_s = np.sqrt(S2)
print(f'Ca = {Ca:1.5f} +\- {Ca_s}')

Ca = 0.00500 +\- 0.00016776432898276802



That is the same uncertainty estimate the quantities package provided. One benefit here is I did not have to do the somewhat complicated wrapping procedure around fsolve that was required with uncertainties to get this. On the other hand, I did have to derive the formula and implement them. It worked fine here, since we have an implicit function and a way to get the required derivatives. It could take some work to do this with the exit concentration of a PFR, which requires an integrator. Maybe that differentiable integrator will come in handy again!

org-mode source

Org-mode version = 9.1.14

## Constrained optimization with Lagrange multipliers and autograd

| categories: | tags: | View Comments

Constrained optimization is common in engineering problems solving. A prototypical example (from Greenberg, Advanced Engineering Mathematics, Ch 13.7) is to find the point on a plane that is closest to the origin. The plane is defined by the equation $$2x - y + z = 3$$, and we seek to minimize $$x^2 + y^2 + z^2$$ subject to the equality constraint defined by the plane. scipy.optimize.minimize provides a pretty convenient interface to solve a problem like this, ans shown here.

import numpy as np
from scipy.optimize import minimize

def objective(X):
x, y, z = X
return x**2 + y**2 + z**2

def eq(X):
x, y, z = X
return 2 * x - y + z - 3

sol = minimize(objective, [1, -0.5, 0.5], constraints={'type': 'eq', 'fun': eq})
sol

    fun: 1.5
jac: array([ 2.00000001, -0.99999999,  1.00000001])
message: 'Optimization terminated successfully.'
nfev: 5
nit: 1
njev: 1
status: 0
success: True
x: array([ 1. , -0.5,  0.5])



I like the minimize function a lot, although I am not crazy for how the constraints are provided. The alternative used to be that there was an argument for equality constraints and another for inequality constraints. Analogous to scipy.integrate.solve_ivp event functions, they could have also used function attributes.

Sometimes, it might be desirable to go back to basics though, especially if you are unaware of the minimize function or perhaps suspect it is not working right and want an independent answer. Next we look at how to construct this constrained optimization problem using Lagrange multipliers. This converts the problem into an augmented unconstrained optimization problem we can use fsolve on. The gist of this method is we formulate a new problem:

$$F(X) = f(X) - \lambda g(X)$$

and then solve the simultaneous resulting equations:

$$F_x(X) = F_y(X) = F_z(X) = g(X) = 0$$ where $$F_x$$ is the derivative of $$f*$$ with respect to $$x$$, and $$g(X)$$ is the equality constraint written so it is equal to zero. Since we end up with four equations that equal zero, we can simply use fsolve to get the solution. Many years ago I used a finite difference approximation to the derivatives. Today we use autograd to get the desired derivatives. Here it is.

import autograd.numpy as np

def F(L):
'Augmented Lagrange function'
x, y, z, _lambda = L
return objective([x, y, z]) - _lambda * eq([x, y, z])

# Gradients of the Lagrange function

# Find L that returns all zeros in this function.
def obj(L):
x, y, z, _lambda = L
dFdx, dFdy, dFdz, dFdlam = dfdL(L)
return [dFdx, dFdy, dFdz, eq([x, y, z])]

from scipy.optimize import fsolve
x, y, z, _lam = fsolve(obj, [0.0, 0.0, 0.0, 1.0])
print(f'The answer is at {x, y, z}')

The answer is at (1.0, -0.5, 0.5)



That is the same answer as before. Note we have still relied on some black box solver inside of fsolve (instead of inside minimize), but it might be more clear what problem we are solving (e.g. finding zeros). It takes a bit more work to set this up, since we have to construct the augmented function, but autograd makes it pretty convenient to set up the final objective function we want to solve.

How do we know we are at a minimum? We can check that the Hessian is positive definite in the original function we wanted to minimize. You can see here the array is positive definite, e.g. all the eigenvalues are positive. autograd makes this easy too.

from autograd import hessian
h = hessian(objective, 0)
h(np.array([x, y, z]))

array([[ 2.,  0.,  0.],
[ 0.,  2.,  0.],
[ 0.,  0.,  2.]])



In case it isn't evident from that structure that the eigenvalues are all positive, here we compute them:

np.linalg.eig(h(np.array([x, y, z])))[0]

array([ 2.,  2.,  2.])



In summary, autograd continues to enable advanced engineering problems to be solved.

org-mode source

Org-mode version = 9.1.14

## Solving coupled ODEs with a neural network and autograd

| categories: | tags: | View Comments

In a previous post I wrote about using ideas from machine learning to solve an ordinary differential equation using a neural network for the solution. A friend recently tried to apply that idea to coupled ordinary differential equations, without success. It seems like that should work, so here we diagnose the issue and figure it out. This is a long post, but it works in the end.

In the classic series reaction $$A \rightarrow B \rightarrow C$$ in a batch reactor, we get the set of coupled mole balances:

$$dC_A/dt = -k_1 C_A$$

$$dC_B/dt = k_1 C_A - k_2 C_B$$

$$dC_C/dt = k2 C_B$$

## 1 The standard numerical solution

Here is the standard numerical solution to this problem. This will give us a reference for what the solution should look like.

from scipy.integrate import solve_ivp

def ode(t, C):
Ca, Cb, Cc = C
dCbdt = k1 * Ca - k2 * Cb
dCcdt = k2 * Cb

C0 = [1.0, 0.0, 0.0]
k1 = 1
k2 = 1

sol = solve_ivp(ode, (0, 10), C0)

%matplotlib inline
import matplotlib.pyplot as plt

plt.plot(sol.t, sol.y.T)
plt.legend(['A', 'B', 'C'])
plt.xlabel('Time')
plt.ylabel('C')


## 2 Can a neural network learn the solution?

The first thing I want to show is that you can train a neural network to reproduce this solution. That is certainly a prerequisite to the idea working. We use the same code I used before, but this time our neural network will output three values, one for each concentration.

import autograd.numpy as np

def init_random_params(scale, layer_sizes, rs=npr.RandomState(0)):
"""Build a list of (weights, biases) tuples, one for each layer."""
return [(rs.randn(insize, outsize) * scale,   # weight matrix
rs.randn(outsize) * scale)           # bias vector
for insize, outsize in zip(layer_sizes[:-1], layer_sizes[1:])]

def swish(x):
"see https://arxiv.org/pdf/1710.05941.pdf"
return x / (1.0 + np.exp(-x))

def C(params, inputs):
"Neural network functions"
for W, b in params:
outputs = np.dot(inputs, W) + b
inputs = swish(outputs)
return outputs

# initial guess for the weights and biases
params = init_random_params(0.1, layer_sizes=[1, 8, 3])


Now, we train our network to reproduce the solution. I ran this block manually a bunch of times, but eventually you see that we can train a one layer network with 8 nodes to output all three concentrations pretty accurately. So, there is no issue there, a neural network can represent the solution.

def objective_soln(params, step):
return np.sum((sol.y.T - C(params, sol.t.reshape([-1, 1])))**2)

step_size=0.001, num_iters=500)

plt.plot(sol.t.reshape([-1, 1]), C(params, sol.t.reshape([-1, 1])),
sol.t, sol.y.T, 'o')
plt.legend(['A', 'B', 'C', 'Ann', 'Bnn', 'Cnn'])
plt.xlabel('Time')
plt.ylabel('C')


## 3 Given a neural network function how do we get the right derivatives?

The next issue is how do we get the relevant derivatives. The solution method I developed here relies on using optimization to find a set of weights that produces a neural network whose derivatives are consistent with the ODE equations. So, we need to be able to get the derivatives that are relevant in the equations.

The neural network outputs three concentrations, and we need the time derivatives of them. Autograd provides three options: grad, elementwise_grad and jacobian. We cannot use grad because our function is not scalar. We cannot use elementwise_grad because that will give the wrong shape (I think it may be the sum of the gradients). That leaves us with the jacobian. This, however, gives an initially unintuitive (i.e. it isn't what we need out of the box) result. The output is 4-dimensional in this case, consistent with the documentation of that function.

jacC = jacobian(C, 1)
jacC(params, sol.t.reshape([-1, 1])).shape

(17, 3, 17, 1)



Why does it have this shape? Our time input vector we used has 17 time values, in a column vector. That leads to an output from the NN with a shape of (17, 3), i.e. the concentrations of each species at each time. The jacobian will output an array of shape (17, 3, 17, 1), and we have to extract the pieces we want from that. The first and third dimensions are related to the time steps. The second dimension is the species, and the last dimension is nothing here, but is there because the input is in a column. I use some fancy indexing on the array to get the desired arrays of the derivatives. This is not obvious out of the box. I only figured this out by direct comparison of the data from a numerical solution and the output of the jacobian. Here I show how to do that, and make sure that the derivatives we pull out are comparable to the derivatives defined by the ODEs above. Parity here means they are comparable.

i = np.arange(len(sol.t))
plt.plot(jacC(params, sol.t.reshape([-1, 1]))[i, 0, i, 0],   -k1 * sol.y[0], 'ro')
plt.plot(jacC(params, sol.t.reshape([-1, 1]))[i, 1, i, 0],   -k2 * sol.y[1] + k1 * sol.y[0], 'bo')
plt.plot(jacC(params, sol.t.reshape([-1, 1]))[i, 2, i, 0],   k2 * sol.y[1], 'go')

[<matplotlib.lines.Line2D at 0x118a2e860>]



Note this is pretty inefficient. It requires a lot of calculations (the jacobian here has print(17*3*17) 867 elements) to create the jacobian, and we don't need most of them. You could avoid this by creating separate neural networks for each species, and then just use elementwise_grad on each one. Alternatively, one might be able to more efficiently compute some vector-jacobian product. Nevertheless, it looks like we can get the correct derivatives out of the neural network, we just need a convenient function to return them. Here is one such function for this problem, using a fancier slicing and reshaping to get the derivative array.

# Derivatives
jac = jacobian(C, 1)

def dCdt(params, t):
i = np.arange(len(t))
return jac(params, t)[i, :, i].reshape((len(t), 3))


## 4 Solving the system of ODEs with a neural network

Finally, we are ready to try solving the ODEs solely by the neural network approach. We reinitialize the neural network first, and define a time grid to solve it on.

t = np.linspace(0, 10, 25).reshape((-1, 1))
params = init_random_params(0.1, layer_sizes=[1, 8, 3])
i = 0    # number of training steps
N = 501  # epochs for training
et = 0.0 # total elapsed time


We define our objective function. This function will be zero at the perfect solution, and has contributions for each mole balance and the initial conditions. It could make sense to put additional penalties for things like negative concentrations, or the sum of concentrations is a constant, but we do not do that here, and it does not seem to be necessary.

def objective(params, step):
Ca, Cb, Cc = C(params, t).T
dCadt, dCbdt, dCcdt = dCdt(params, t).T

z1 = np.sum((dCadt + k1 * Ca)**2)
z2 = np.sum((dCbdt - k1 * Ca + k2 * Cb)**2)
z3 = np.sum((dCcdt - k2 * Cb)**2)
ic = np.sum((np.array([Ca[0], Cb[0], Cc[0]]) - C0)**2)  # initial conditions
return z1 + z2 + z3 + ic

def callback(params, step, g):
if step % 100 == 0:
print("Iteration {0:3d} objective {1}".format(step,
objective(params, step)))

objective(params, 0)  # make sure the objective is scalar

5.2502237371050295



Finally, we run the optimization. I also manually ran this block several times.

import time
t0 = time.time()

step_size=0.001, num_iters=N, callback=callback)

i += N
t1 = (time.time() - t0) / 60
et += t1

plt.plot(t, C(params, t), sol.t, sol.y.T, 'o')
plt.legend(['Ann', 'Bnn', 'Cnn', 'A', 'B', 'C'])
plt.xlabel('Time')
plt.ylabel('C')
print(f'{t1:1.1f} minutes elapsed this time. Total time = {et:1.2f} min. Total epochs = {i}.')

Iteration   0 objective 0.00047651643957525214
Iteration 100 objective 0.0004473301532609342
Iteration 200 objective 0.00041218410058863227
Iteration 300 objective 0.00037161526137030344
Iteration 400 objective 0.000327567400443358
Iteration 500 objective 0.0002836975879675981
0.6 minutes elapsed this time. Total time = 4.05 min. Total epochs = 3006.



The effort seems to have been worth it though, we get a pretty good solution from our neural network.

We can check the accuracy of the derivatives by noting the sum of the derivatives in this case should be zero. Here you can see that the sum is pretty small. It would take additional optimization to a lower error to get this to be smaller.

plt.plot(t, np.sum(dCdt(params, t), axis=1))
plt.xlabel('Time')
plt.ylabel(r'$\Sigma dC/dt$')


## 5 Summary

In the end, this method is illustrated to work for systems of ODEs also. There is some subtlety in how to get the relevant derivatives from the jacobian, but after that, it is essentially the same. I think it would be much faster to do this with separate neural networks for each function in the solution because then you do not need the jacobian, you can use elementwise_grad.

This is not faster than direct numerical integration. One benefit to this solution over a numerical solution is we get an actual continuous function as the solution, rather than an array of data. This solution is not reliable at longer times, but then again neither is extrapolation of numeric data. It could be interesting to explore if this has any benefits for stiff equations. Maybe another day. For now, I am declaring victory for autograd on this problem.