<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <atom:link href="http://kitchingroup.cheme.cmu.edu/blog/feed/index.xml" rel="self" type="application/rss+xml" />
    <title>The Kitchin Research Group</title>
    <link>https://kitchingroup.cheme.cmu.edu/blog</link>
    <description>Chemical Engineering at Carnegie Mellon University</description>
    <pubDate>Sat, 01 Nov 2025 13:47:46 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    
    <item>
      <title>Generating an atomic stoichiometric matrix</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2014/09/23/Generating-an-atomic-stoichiometric-matrix</link>
      <pubDate>Tue, 23 Sep 2014 14:25:36 EDT</pubDate>
      <category><![CDATA[thermodynamics]]></category>
      <category><![CDATA[python]]></category>
      <guid isPermaLink="false">YHj8SHJ4ch0GpUH3gZR1J-nqq2g=</guid>
      <description>Generating an atomic stoichiometric matrix</description>
      <content:encoded><![CDATA[



&lt;p&gt;
When computing thermodynamic properties with species, we sometimes need a matrix that specifies the number of each type of atom in each species. For example, we can create it by hand as follows:
&lt;/p&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="left" /&gt;

&lt;col  class="right" /&gt;

&lt;col  class="right" /&gt;

&lt;col  class="right" /&gt;

&lt;col  class="right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="left"&gt;&amp;#xa0;&lt;/td&gt;
&lt;td class="right"&gt;H2O&lt;/td&gt;
&lt;td class="right"&gt;CO2&lt;/td&gt;
&lt;td class="right"&gt;H2&lt;/td&gt;
&lt;td class="right"&gt;CO&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="left"&gt;H&lt;/td&gt;
&lt;td class="right"&gt;2&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;td class="right"&gt;2&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="left"&gt;C&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;td class="right"&gt;1&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;td class="right"&gt;1&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="left"&gt;O&lt;/td&gt;
&lt;td class="right"&gt;1&lt;/td&gt;
&lt;td class="right"&gt;2&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;td class="right"&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
Here we aim to generate this table from code. Why? 1. We can readily add species to it if we do it right. 2. We are less likely to make mistakes in generating the table, and if we do, it will be faster to regenerate it.
&lt;/p&gt;

&lt;p&gt;
We will start with a list of strings that represent the chemical formula of each species. We need to parse each string to find the elements and how many of each there are. We will use a fairly naive regular expression for this: it matches a capital letter plus an optional lowercase letter, followed by an optional number. Here is a fictitious example to illustrate. Note that this will not work with formulas that have parentheses or charges.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; re
# &lt;span style="color: #ff0000; font-weight: bold;"&gt;\d* captures the whole count, including counts with more than one digit&lt;/span&gt;
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;m&lt;/span&gt; = re.findall(&lt;span style="color: #228b22;"&gt;r'([A-Z][a-z]?)(\d*)'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'ArC2H6Cu56Pd47Co'&lt;/span&gt;)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(m)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[('Ar', ''), ('C', '2'), ('H', '6'), ('Cu', '56'), ('Pd', '47'), ('Co', '')]
&lt;/pre&gt;
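
&lt;p&gt;
The tuples returned by findall still hold strings, and an empty string stands for a count of one. As a minimal sketch, the conversion to integer counts could look like the following. The names FORMULA_RE and parse_formula are hypothetical and not part of the original post; the pattern uses \d* so that counts with more than one digit are captured whole, and repeated elements are summed.
&lt;/p&gt;

```python
import re

# Hypothetical helper names; the pattern follows the naive approach above.
FORMULA_RE = re.compile(r'([A-Z][a-z]?)(\d*)')


def parse_formula(formula):
    """Return a dict mapping each element symbol to its integer atom count."""
    counts = {}
    for element, number in FORMULA_RE.findall(formula):
        # An empty count string means a single atom.
        n = int(number) if number else 1
        # Sum repeated elements, e.g. the two C atoms in CH3COOH.
        counts[element] = counts.get(element, 0) + n
    return counts


print(parse_formula('ArC2H6Cu56Pd47Co'))
# {'Ar': 1, 'C': 2, 'H': 6, 'Cu': 56, 'Pd': 47, 'Co': 1}
```

&lt;p&gt;
Like the regular expression itself, this sketch silently ignores parentheses and charges.
&lt;/p&gt;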

&lt;p&gt;
Now we need to loop over the species and collect all the elements in them. We will just make a list of all of the elements, and then get the set.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; re

# &lt;span style="color: #ff0000; font-weight: bold;"&gt;save for future use&lt;/span&gt;
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;cf&lt;/span&gt; = re.compile(&lt;span style="color: #228b22;"&gt;r'([A-Z][a-z]?)(\d*)'&lt;/span&gt;)

&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;species&lt;/span&gt; = [&lt;span style="color: #228b22;"&gt;'H2O'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'CO2'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'H2'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'CO'&lt;/span&gt;]

&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;all_elements&lt;/span&gt; = []

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; s &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; species:
    &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; el, count &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; re.findall(cf, s):
        &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;all_elements&lt;/span&gt; += [el]

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(&lt;span style="color: #cd0000;"&gt;set&lt;/span&gt;(all_elements))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
{'H', 'C', 'O'}
&lt;/pre&gt;

&lt;p&gt;
Finally, we can create the table. We need to loop through each element, and then through each species.
&lt;/p&gt;


&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; re

# &lt;span style="color: #ff0000; font-weight: bold;"&gt;save for future use&lt;/span&gt;
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;cf&lt;/span&gt; = re.compile(&lt;span style="color: #228b22;"&gt;r'([A-Z][a-z]?)(\d*)'&lt;/span&gt;)

&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;species&lt;/span&gt; = [&lt;span style="color: #228b22;"&gt;'H2O'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'CO2'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'H2'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'CO'&lt;/span&gt;]

&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;all_elements&lt;/span&gt; = []

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; s &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; species:
    &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; el, count &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; re.findall(cf, s):
        &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;all_elements&lt;/span&gt; += [el]

&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;atoms&lt;/span&gt; =&lt;span style="color: #cd0000;"&gt; set&lt;/span&gt;(all_elements)

# &lt;span style="color: #ff0000; font-weight: bold;"&gt;we put a placeholder in the first row&lt;/span&gt;
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;counts&lt;/span&gt; = [[&lt;span style="color: #228b22;"&gt;""&lt;/span&gt;] + species]
&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; e &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; atoms:
    # &lt;span style="color: #ff0000; font-weight: bold;"&gt;store the element in the first column&lt;/span&gt;
    &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;count&lt;/span&gt; = [e]
    &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; s &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; species:
        &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;d&lt;/span&gt; =&lt;span style="color: #cd0000;"&gt; dict&lt;/span&gt;(re.findall(cf, s))
        &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;n&lt;/span&gt; = d.get(e, &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;0&lt;/span&gt;)
        &lt;span style="color: #8b0000;"&gt;if&lt;/span&gt; &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;n&lt;/span&gt; == &lt;span style="color: #228b22;"&gt;''&lt;/span&gt;: &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;n&lt;/span&gt; = &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;1&lt;/span&gt;
        &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;count&lt;/span&gt; += &lt;span style="color: #cd0000;"&gt;[int&lt;/span&gt;(n)]
    &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;counts&lt;/span&gt; += [count]

# &lt;span style="color: #ff0000; font-weight: bold;"&gt;org-babel wraps this block in a function, so this returns the array directly to org-mode&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; counts
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="left" /&gt;

&lt;col  class="right" /&gt;

&lt;col  class="right" /&gt;

&lt;col  class="right" /&gt;

&lt;col  class="right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="left"&gt;&amp;#xa0;&lt;/td&gt;
&lt;td class="right"&gt;H2O&lt;/td&gt;
&lt;td class="right"&gt;CO2&lt;/td&gt;
&lt;td class="right"&gt;H2&lt;/td&gt;
&lt;td class="right"&gt;CO&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="left"&gt;H&lt;/td&gt;
&lt;td class="right"&gt;2&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;td class="right"&gt;2&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="left"&gt;C&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;td class="right"&gt;1&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;td class="right"&gt;1&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="left"&gt;O&lt;/td&gt;
&lt;td class="right"&gt;1&lt;/td&gt;
&lt;td class="right"&gt;2&lt;/td&gt;
&lt;td class="right"&gt;0&lt;/td&gt;
&lt;td class="right"&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
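
&lt;p&gt;
Outside of org-mode, the same steps can be wrapped in a function. The following is a minimal sketch under a few assumptions: the name atomic_matrix is hypothetical, each formula is parsed only once, and the rows follow the order in which the elements are first seen, so the result is reproducible from run to run. It uses the species from the hand-built table at the top of the post.
&lt;/p&gt;

```python
import re

# Hypothetical helper names, not from the original post.
FORMULA_RE = re.compile(r'([A-Z][a-z]?)(\d*)')


def atomic_matrix(species):
    """Return (elements, rows); rows[i][j] counts elements[i] in species[j]."""
    # Parse each formula once into a dict of element counts.
    parsed = []
    for s in species:
        counts = {}
        for el, n in FORMULA_RE.findall(s):
            counts[el] = counts.get(el, 0) + (int(n) if n else 1)
        parsed.append(counts)
    # dict.fromkeys keeps first-seen order, so the row order is deterministic.
    elements = list(dict.fromkeys(el for counts in parsed for el in counts))
    rows = [[counts.get(el, 0) for counts in parsed] for el in elements]
    return elements, rows


elements, rows = atomic_matrix(['H2O', 'CO2', 'H2', 'CO'])
for el, row in zip(elements, rows):
    print(el, row)
# H [2, 0, 2, 0]
# O [1, 2, 0, 1]
# C [0, 1, 0, 1]
```

&lt;p&gt;
The rows come out in the order H, O, C because that is the order in which the elements first appear in the species list.
&lt;/p&gt;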

&lt;p&gt;
For this simple example it seems like a lot of code. If there were 200 species though, it would be the same code! Only the list of species would be longer. It might be possible to avoid the two nested loops if you represented the stoichiometric matrix as a sparse matrix, i.e. stored only the non-zero elements. My final comment concerns the parsing of the chemical formulas. Here we can only parse simple formulas. To do better than this would require a pretty sophisticated parser, probably built on the grammar of chemical formulas. The example &lt;a href="http://www.onlamp.com/pub/a/python/2006/01/26/pyparsing.html?page=3"&gt;here&lt;/a&gt; implements the code above using pyparsing, and could probably be extended to include more complex formulas such as (CH3)3CH.
&lt;/p&gt;
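
&lt;p&gt;
For parenthesized formulas such as (CH3)3CH, a full grammar is not strictly needed. The following is a rough sketch of one alternative, a small stack-based scanner written with the standard re module. It is not the pyparsing approach from the linked article, the name parse_nested is hypothetical, and it still does not handle charges.
&lt;/p&gt;

```python
import re

# Hypothetical names; a stack-based sketch, not the pyparsing approach.
ELEMENT_RE = re.compile(r'([A-Z][a-z]?)(\d*)')
CLOSE_RE = re.compile(r'\)(\d*)')


def parse_nested(formula):
    """Count atoms in a formula that may contain nested parentheses."""
    stack = [{}]  # one dict of counts per currently open group
    pos = 0
    while pos != len(formula):
        if formula[pos] == '(':
            # Start collecting counts for a new group.
            stack.append({})
            pos += 1
            continue
        m = CLOSE_RE.match(formula, pos)
        if m:
            if len(stack) == 1:
                raise ValueError('unbalanced ")" in ' + formula)
            # Close the group and scale it by the multiplier that follows.
            group = stack.pop()
            factor = int(m.group(1)) if m.group(1) else 1
            for el, n in group.items():
                stack[-1][el] = stack[-1].get(el, 0) + n * factor
            pos = m.end()
            continue
        m = ELEMENT_RE.match(formula, pos)
        if m is None:
            raise ValueError('cannot parse ' + formula + ' at position ' + str(pos))
        el, n = m.groups()
        stack[-1][el] = stack[-1].get(el, 0) + (int(n) if n else 1)
        pos = m.end()
    if len(stack) != 1:
        raise ValueError('unbalanced "(" in ' + formula)
    return stack[0]


print(parse_nested('(CH3)3CH'))
# {'C': 4, 'H': 10}
```

&lt;p&gt;
Anything the scanner does not recognize, such as a charge, raises a ValueError rather than being skipped silently.
&lt;/p&gt;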
&lt;p&gt;Copyright (C) 2014 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2014/09/23/Generating-an-atomic-stoichiometric-matrix.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Org-mode version = 8.2.7c&lt;/p&gt;]]></content:encoded>
    </item>
  </channel>
</rss>
