Generating an atomic stoichiometric matrix

Tue, 23 Sep 2014 14:25:36 EDT

In computing thermodynamic properties with species, it is sometimes required to get a matrix that specifies number of each type of atom in each species. For example, we can create this by hand as follows:

	H2O	CO2	H2	CO
H	2	0	2	0
C	0	1	0	1
O	1	2	0	1

Here we aim to generate this table from code. Why? 1. We can readily add species to it if we do it right. 2. We are less likely to make mistakes in generation of the table, and if we do, it will be faster to regenerate the table.

We will start with a list of strings that represent the chemical formula of each species. We will need to parse the strings to find the elements, and number of them. We will use a fairly naive regular expression to parse a chemical formula. Basically, we match a capital letter + an optional lowercase letter, followed by an optional number. Here is a fictitous example to illustrate. Note, this will not work with formulas that have parentheses, or charges.

import re
m = re.findall('([A-Z][a-z]?)(\d?)' , 'ArC2H6Cu56Pd47Co')
print m

[('Ar', ''), ('C', '2'), ('H', '6'), ('Cu', '5'), ('Pd', '4'), ('Co', '')]

Now, we need to loop over the species, and collect all the elements in them. We will just make a list of all of the elments, and then get the set.

import re

# save for future use
cf = re.compile('([A-Z][a-z]?)(\d?)')

species = ['H2O', 'CO2', 'H2', 'CO2']

all_elements = []

for s in species:
    for el, count in re.findall(cf, s):
        all_elements += [el]

print set(all_elements)

set(['H', 'C', 'O'])

Finally, we can create the table. We need to loop through each element, and then through each species

import re

# save for future use
cf = re.compile('([A-Z][a-z]?)(\d?)')

species = ['H2O', 'CO2', 'H2', 'CO2']

all_elements = []

for s in species:
    for el, count in re.findall(cf, s):
        all_elements += [el]

atoms = set(all_elements)

# we put a placeholder in the first row
counts = [[""] + species]
for e in atoms:
    # store the element in the first column
    count = [e]
    for s in species:    
        d = dict(re.findall(cf, s))
        n = d.get(e, 0)
        if n == '': n = 1
        count += [int(n)]
    counts += [count]

# this directly returns the array to org-mode
return counts

	H2O	CO2	H2	CO2
H	2	0	2	0
C	0	1	0	1
O	1	2	0	2

For this simple example it seems like a lot of code. If there were 200 species though, it would be the same code! Only the list of species would be longer. It might be possible to avoid the two sets of looping, if you could represent the stoichiometric matrix as a sparse matrix, i.e. only store non-zero elements. The final comment I have is related to the parsing of the chemical formulas. Here we can only parse simple formulas. To do better than this would require a pretty sophisticated parser, probably built on the grammar of chemical formulas. The example here implements the code above using pyparsing, and could probably be extended to include more complex formulas such as (CH3)3CH.

org-mode source

Org-mode version = 8.2.7c

The Kitchin Research Group

Generating an atomic stoichiometric matrix