<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <atom:link href="http://kitchingroup.cheme.cmu.edu/blog/feed/index.xml" rel="self" type="application/rss+xml" />
    <title>The Kitchin Research Group</title>
    <link>https://kitchingroup.cheme.cmu.edu/blog</link>
    <description>Chemical Engineering at Carnegie Mellon University</description>
    <pubDate>Sat, 01 Nov 2025 13:47:46 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    
    <item>
      <title>Timing Lennard-Jones implementations - ASE vs autograd</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2017/11/20/Timing-Lennard-Jones-implementations-ASE-vs-autograd</link>
      <pubDate>Mon, 20 Nov 2017 21:19:17 EST</pubDate>
      <category><![CDATA[lennardjones]]></category>
      <category><![CDATA[autograd]]></category>
      <category><![CDATA[python]]></category>
      <guid isPermaLink="false">nSG47QhTDNSZivajfbKAXC-gvIU=</guid>
      <description>Timing Lennard-Jones implementations - ASE vs autograd</description>
      <content:encoded><![CDATA[


&lt;div id="table-of-contents"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;div id="text-table-of-contents"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#orgb9c2faf"&gt;1. Timing on the forces&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;
In a comment on this &lt;a href="http://kitchingroup.cheme.cmu.edu/blog/2017/11/14/Forces-by-automatic-differentiation-in-molecular-simulation/"&gt;post&lt;/a&gt; Konrad Hinsen asked if the autograd forces on a Lennard-Jones potential would be useable in production. I wasn't sure, and was suspicious that analytical force functions would be faster. It turns out to not be so simple. In this post, I attempt to do some timing experiments for comparison. These are tricky to do right, and in a meaningful way, so I will start by explaining what is tricky and why I think the results are meaningful. 
&lt;/p&gt;

&lt;p&gt;
The ASE calculators cache their results, and return the cached results after the first run. We will do these on a 13-atom icosahedron cluster.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="orgb9b1952"&gt;&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; ase.calculators.lj &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; LennardJones
&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; ase.cluster.icosahedron &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; Icosahedron

&lt;span style="color: #BA36A5;"&gt;atoms&lt;/span&gt; = Icosahedron(&lt;span style="color: #008000;"&gt;'Ar'&lt;/span&gt;, noshells=2, latticeconstant=3)
atoms.set_calculator(LennardJones())

&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; time
&lt;span style="color: #BA36A5;"&gt;t0&lt;/span&gt; = time.time()
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'energy: '&lt;/span&gt;, atoms.get_potential_energy())
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;' time: '&lt;/span&gt;, time.time() - t0)
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;()

&lt;span style="color: #BA36A5;"&gt;t0&lt;/span&gt; = time.time()
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'energy: '&lt;/span&gt;, atoms.get_potential_energy())
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;' time: '&lt;/span&gt;, time.time() - t0)
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;()

&lt;span style="color: #BA36A5;"&gt;atoms.calc.results&lt;/span&gt; = {}
&lt;span style="color: #BA36A5;"&gt;t0&lt;/span&gt; = time.time()
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'energy: '&lt;/span&gt;, atoms.get_potential_energy())
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'time: '&lt;/span&gt;, time.time() - t0)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
energy:  -1.25741774649
 time:  0.0028629302978515625

energy:  -1.25741774649
 time:  0.00078582763671875

energy:  -1.25741774649
time:  0.0031850337982177734

&lt;/pre&gt;

&lt;p&gt;
Note the approximate order of magnitude reduction in elapsed time for the second call. After that, we reset the calculator results, and the time goes back up. So, we have to incorporate that into our timing. Also, in the ASE calculator, the forces are simultaneously calculated. There isn't a way to separate the calls. I am going to use the timeit feature in Ipython for the timing. I don't have a lot of control over what else is running on my machine, so I have observed some variability in the timing results. Finally, I am running these on a MacBook Air.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org23d2e9b"&gt;%%timeit
atoms.get_potential_energy()
&lt;span style="color: #BA36A5;"&gt;atoms.calc.results&lt;/span&gt; = {} &lt;span style="color: #8D8D84;"&gt;# &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;this resets it so it runs each time. Otherwise, we get cached results&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
1.46 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
&lt;/p&gt;

&lt;p&gt;
That seems like a surprisingly long time. If you neglect the calculator reset, it looks about 10 times faster because the cache lookup is fast!
&lt;/p&gt;

&lt;p&gt;
Let's compare that to an implementation of the Lennard-Jones potential similar to the last time. This implementation differs from &lt;a href="http://kitchingroup.cheme.cmu.edu/blog/2017/11/14/Forces-by-automatic-differentiation-in-molecular-simulation/"&gt;the first one I blogged about&lt;/a&gt;. This one is fully vectorized. It still does not support periodic boundary conditions though. This version may be up to 10 times faster than the previous version. I haven't tested this very well, I only assured it gives the same energy as ASE for the example in this post.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org2d00a75"&gt;&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; autograd.numpy &lt;span style="color: #0000FF;"&gt;as&lt;/span&gt; np

&lt;span style="color: #0000FF;"&gt;def&lt;/span&gt; &lt;span style="color: #006699;"&gt;energy&lt;/span&gt;(positions):
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #036A07;"&gt;"Compute the energy of a Lennard-Jones system."&lt;/span&gt;
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;sigma&lt;/span&gt; = 1.0
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;epsilon&lt;/span&gt; = 1.0
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;rc&lt;/span&gt; = 3 * sigma

&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;e0&lt;/span&gt; = 4 * epsilon * ((sigma / rc)**12 - (sigma / rc)**6)
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;natoms&lt;/span&gt; = &lt;span style="color: #006FE0;"&gt;len&lt;/span&gt;(positions)
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #8D8D84;"&gt;# &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;These are the pairs of indices we need to compute distances for.&lt;/span&gt;
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;a&lt;/span&gt;, &lt;span style="color: #BA36A5;"&gt;b&lt;/span&gt; = np.triu_indices(natoms, 1)

&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;d&lt;/span&gt; = positions[a] - positions[b]
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;r2&lt;/span&gt; = np.&lt;span style="color: #006FE0;"&gt;sum&lt;/span&gt;(d**2, axis=1)
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;c6&lt;/span&gt; = np.where(r2 &amp;lt;= rc**2, (sigma**2 / r2)**3, np.zeros_like(r2))
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;energy&lt;/span&gt; = -e0 * (c6 != 0.0).&lt;span style="color: #006FE0;"&gt;sum&lt;/span&gt;()
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;c12&lt;/span&gt; = c6**2
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;energy&lt;/span&gt; += np.&lt;span style="color: #006FE0;"&gt;sum&lt;/span&gt;(4 * epsilon * (c12 - c6))
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #0000FF;"&gt;return&lt;/span&gt; energy

&lt;span style="color: #8D8D84;"&gt;# &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;Just to check we get the same answer&lt;/span&gt;
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(energy(atoms.positions))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
-1.25741774649
&lt;/p&gt;

&lt;p&gt;
The energy looks good. For timing, we store the positions in a variable, in case there is any lookup time, since this function only needs an array.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org7312f34"&gt;&lt;span style="color: #BA36A5;"&gt;pos&lt;/span&gt; = atoms.positions
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;

&lt;/p&gt;

&lt;p&gt;
There is no caching to worry about here, we can just get the timing.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org38e167b"&gt;%%timeit
energy(pos)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
82.2 µs ± 2.85 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
&lt;/p&gt;

&lt;p&gt;
Wow, that is a lot faster than the ASE implementation. Score one for vectorization.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org9087076"&gt;&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'Vectorized is {0:1.1f} times faster than ASE at energy.'&lt;/span&gt;.&lt;span style="color: #006FE0;"&gt;format&lt;/span&gt;(1.46e-3 / 82.5e-6))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
Vectorized is 17.7 times faster than ASE at energy.

&lt;/pre&gt;

&lt;p&gt;
Yep, a fully vectorized implementation is a lot faster than the ASE version which uses loops. So far the difference has nothing to do with autograd.
&lt;/p&gt;

&lt;div id="outline-container-orgb9c2faf" class="outline-2"&gt;
&lt;h2 id="orgb9c2faf"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; Timing on the forces&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
The forces are where derivatives are important, and it is a reasonable question of whether hand-coded derivatives are faster or slower than autograd derivatives. We first look at the forces from ASE. The analytical forces take about the same time as the energy, which is not surprising. The same work is done for both of them.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org689833d"&gt;np.set_printoptions(precision=3, suppress=&lt;span style="color: #D0372D;"&gt;True&lt;/span&gt;)
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(atoms.get_forces())
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[[-0.    -0.    -0.   ]
 [-0.296 -0.     0.183]
 [-0.296 -0.    -0.183]
 [ 0.296 -0.     0.183]
 [ 0.296 -0.    -0.183]
 [ 0.183 -0.296 -0.   ]
 [-0.183 -0.296  0.   ]
 [ 0.183  0.296 -0.   ]
 [-0.183  0.296  0.   ]
 [-0.     0.183 -0.296]
 [ 0.    -0.183 -0.296]
 [-0.     0.183  0.296]
 [ 0.    -0.183  0.296]]

&lt;/pre&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="orgcaaa884"&gt;%%timeit
atoms.get_forces()
&lt;span style="color: #BA36A5;"&gt;atoms.calc.results&lt;/span&gt; = {}
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.22 ms ± 38.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

&lt;/pre&gt;

&lt;p&gt;
Here is our auto-differentiated force function.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org767b1e9"&gt;&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; autograd &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; elementwise_grad

&lt;span style="color: #0000FF;"&gt;def&lt;/span&gt; &lt;span style="color: #006699;"&gt;forces&lt;/span&gt;(pos):
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;dEdR&lt;/span&gt; = elementwise_grad(energy)
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #0000FF;"&gt;return&lt;/span&gt; -dEdR(pos)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Let's just check the forces for consistency.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org777df8d"&gt;&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(forces(atoms.positions))

&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(np.allclose(forces(atoms.positions), atoms.get_forces()))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[[-0.    -0.    -0.   ]
 [-0.296 -0.     0.183]
 [-0.296 -0.    -0.183]
 [ 0.296 -0.     0.183]
 [ 0.296 -0.    -0.183]
 [ 0.183 -0.296 -0.   ]
 [-0.183 -0.296  0.   ]
 [ 0.183  0.296 -0.   ]
 [-0.183  0.296  0.   ]
 [-0.     0.183 -0.296]
 [ 0.    -0.183 -0.296]
 [-0.     0.183  0.296]
 [ 0.    -0.183  0.296]]
True

&lt;/pre&gt;

&lt;p&gt;
Those all look the same, so now performance for that:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org90b4b07"&gt;%%timeit 

forces(pos)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
727 µs ± 47.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

&lt;/pre&gt;

&lt;p&gt;
This is faster than the ASE version. I suspect that it is largely because of the faster, vectorized algorithm overall. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ipython" id="org7de1cf3"&gt;&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'autograd is {0:1.1f} times faster than ASE on forces.'&lt;/span&gt;.&lt;span style="color: #006FE0;"&gt;format&lt;/span&gt;(1.22e-3 / 727e-6))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
autograd is 1.7 times faster than ASE on forces.

&lt;/pre&gt;

&lt;p&gt;
autograd forces are consistently 2-6 times faster than the ASE implementation. It could be possible to hand-code a faster function for the forces, if it was fully vectorized. I spent a while seeing what would be required for that, and it is not obvious how to do that. Any solution that uses loops will be slower I think.
&lt;/p&gt;

&lt;p&gt;
This doesn't directly answer the question of whether this can work in production. Everything is still written in Python here, which might limit the size and length of calculations you can practically do. With the right implementation though, it looks promising.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2017 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;p&gt;
&lt;p&gt;&lt;a href="/org/2017/11/20/Timing-Lennard-Jones-implementations---ASE-vs-autograd.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Org-mode version = 9.1.2&lt;/p&gt;]]></content:encoded>
    </item>
  </channel>
</rss>
