Generating an atomic stoichiometric matrix
Posted September 23, 2014 at 02:25 PM | categories: thermodynamics, python | tags:
In computing thermodynamic properties with species, it is sometimes required to get a matrix that specifies number of each type of atom in each species. For example, we can create this by hand as follows:
H2O | CO2 | H2 | CO | |
H | 2 | 0 | 2 | 0 |
C | 0 | 1 | 0 | 1 |
O | 1 | 2 | 0 | 1 |
Here we aim to generate this table from code. Why? 1. We can readily add species to it if we do it right. 2. We are less likely to make mistakes in generation of the table, and if we do, it will be faster to regenerate the table.
We will start with a list of strings that represent the chemical formula of each species. We will need to parse the strings to find the elements, and number of them. We will use a fairly naive regular expression to parse a chemical formula. Basically, we match a capital letter + an optional lowercase letter, followed by an optional number. Here is a fictitous example to illustrate. Note, this will not work with formulas that have parentheses, or charges.
import re m = re.findall('([A-Z][a-z]?)(\d?)' , 'ArC2H6Cu56Pd47Co') print m
[('Ar', ''), ('C', '2'), ('H', '6'), ('Cu', '5'), ('Pd', '4'), ('Co', '')]
Now, we need to loop over the species, and collect all the elements in them. We will just make a list of all of the elments, and then get the set.
import re # save for future use cf = re.compile('([A-Z][a-z]?)(\d?)') species = ['H2O', 'CO2', 'H2', 'CO2'] all_elements = [] for s in species: for el, count in re.findall(cf, s): all_elements += [el] print set(all_elements)
set(['H', 'C', 'O'])
Finally, we can create the table. We need to loop through each element, and then through each species
import re # save for future use cf = re.compile('([A-Z][a-z]?)(\d?)') species = ['H2O', 'CO2', 'H2', 'CO2'] all_elements = [] for s in species: for el, count in re.findall(cf, s): all_elements += [el] atoms = set(all_elements) # we put a placeholder in the first row counts = [[""] + species] for e in atoms: # store the element in the first column count = [e] for s in species: d = dict(re.findall(cf, s)) n = d.get(e, 0) if n == '': n = 1 count += [int(n)] counts += [count] # this directly returns the array to org-mode return counts
H2O | CO2 | H2 | CO2 | |
H | 2 | 0 | 2 | 0 |
C | 0 | 1 | 0 | 1 |
O | 1 | 2 | 0 | 2 |
For this simple example it seems like a lot of code. If there were 200 species though, it would be the same code! Only the list of species would be longer. It might be possible to avoid the two sets of looping, if you could represent the stoichiometric matrix as a sparse matrix, i.e. only store non-zero elements. The final comment I have is related to the parsing of the chemical formulas. Here we can only parse simple formulas. To do better than this would require a pretty sophisticated parser, probably built on the grammar of chemical formulas. The example here implements the code above using pyparsing, and could probably be extended to include more complex formulas such as (CH3)3CH.
Copyright (C) 2014 by John Kitchin. See the License for information about copying.
Org-mode version = 8.2.7c