The Kitchin Research Group: python

Finding VASP calculations in a directory tree

Posted March 20, 2014 at 08:09 PM | categories: python, vasp | tags:

The goal in this post is to work out a way to find all the directories in some root directory that contain VASP calculations. This is a precursor to doing something with those directories, e.g. creating a summary file, adding entries to a database, doing some analysis, etc… For fun, we will just calculate the total elapsed time in the calculations.

What is challenging about this problem is that the calculations are often nested in a variety of different subdirectories, and we do not always know the structure. We need to recursively descend into those directories to check if they contain VASP calculations.

We will use a function that returns True or False to tell us if a particular directory is a VASP calculation or not. We can tell that because a completed VASP calculation has specific files in it, and specific content in those files. Notably, there is an OUTCAR file that contains the text "General timing and accounting informations for this job:" near the end of the file.

We will also use os.walk as the way to recursively descend into the root directory.

import os
from jasp import *

def vasp_p(directory):
    'returns True if a finished OUTCAR file exists in the current directory, else False'
    outcar = os.path.join(directory, 'OUTCAR')
    if os.path.exists(outcar):
        with open(outcar, 'r') as f:
            contents = f.read()
            if 'General timing and accounting informations for this job:' in contents:
                return True
    return False
            
        
total_time = 0

for root, dirs, files in os.walk('/home-research/jkitchin/research/rutile-atat'):
    for d in dirs:
        # compute absolute path to each directory in the current root
        absd = os.path.join(root, d)
        if vasp_p(absd):
            # we found a vasp directory, so we can do something in it. 
            # here we get the elapsed time from the calculation
            with jasp(absd) as calc:
                total_time += calc.get_elapsed_time()

print 'Total computational time on this project is {0:1.0f} minutes.'.format(total_time / 60)

Total computational time on this project is 231 minutes.

org-mode source

Org-mode version = 8.2.5h

Discuss on Twitter

Using YAML in python for structured data

Posted February 03, 2014 at 09:36 AM | categories: python, template, yaml | tags:

YAML is a data format that is most text, with some indentation. It is like JSON, but without the braces. What is important here is that you can read a yaml document into a python dictionary. Here is an example of reading a yaml string so you can see the format.

import yaml
document = """
a: 1
b:
  c: 3
  d: 4
"""
print yaml.load(document)

{'a': 1, 'b': {'c': 3, 'd': 4}}

Everything indented by the same level is grouped in its own dictionary. If we put that string into a file (test.yaml ), we can read that in to python like this.

import yaml
document = open('test.yaml').read()
print yaml.load(document)

{'a': 1, 'b': {'c': 3, 'd': 4}}

That example is pretty trivial. What I want to do is have yaml file that represents a course syllabus. Then, if I had a set of these files, I could write code to analyze the collection of syllabi. For example, to figure out how many units of particular category there are. Alternatively, I could create different representations of the document, e.g. a pdf or html file for students or accreditation boards. Below is a YAML representtion of an ABET syllabus. It is pretty readable for a person.

import yaml
document = """
course:
  course-number: 06-364
  title: Chemical Reaction Engineering
  units: 9
  description: Fundamental concepts in the kinetic modeling of chemical reactions, the treatment and analysis of rate data. Multiple reactions and reaction mechanisms. Analysis and design of ideal and non-ideal reactor systems. Energy effects and mass transfer in reactor systems. Introductory principles in heterogeneous catalysis. 

  textbook: H. S. Fogler, Elements of Chemical Reaction Engineering, 4th edition, Prentice Hall, New York, 2006.
  prerequisites: [06-321, 06-323, 09-347]
  required: Yes

  goals:
    goal1: 
      description: To analyze kinetic data and obtain rate laws 
      outcomes: [a, k]
      criteria: [A, F]
    goal2:
      description: To develop a mechanism that is consistent with an experimental rate law 
    goal3:
      description: To understand the behavior of different reactor types when they are used either individually or in combination 
    goal4: 
      description: To choose a reactor and determine its size for a given application
    goal5:
      description: To work with mass and energy balances in the design of non-isothermal reactors 
    goal6:
      description: To understand the importance of selectivity and know the strategies that are commonly used in maximizing yields
    goal7:
      description: To effectively use mathematical software in the design of reactors and analysis of data 

  topics:
    - Conversion and reactor sizing
    - Rate laws and stoichiometry
    - Isothermal reactor design
    - Collection and analysis of rate data
    - Multiple reactions and selectivity
    - Non-elementary reaction kinetics
    - Non-isothermal reactor design
    - Unsteady operation of reactors
    - Catalysis and catalytic reactors
"""
with open('06-364.yaml', 'w') as f:
    f.write(document)

print yaml.load(document)

{'course': {'description': 'Fundamental concepts in the kinetic modeling of chemical reactions, the treatment and analysis of rate data. Multiple reactions and reaction mechanisms. Analysis and design of ideal and non-ideal reactor systems. Energy effects and mass transfer in reactor systems. Introductory principles in heterogeneous catalysis.', 'title': 'Chemical Reaction Engineering', 'prerequisites': ['06-321', '06-323', '09-347'], 'topics': ['Conversion and reactor sizing', 'Rate laws and stoichiometry', 'Isothermal reactor design', 'Collection and analysis of rate data', 'Multiple reactions and selectivity', 'Non-elementary reaction kinetics', 'Non-isothermal reactor design', 'Unsteady operation of reactors', 'Catalysis and catalytic reactors'], 'required': True, 'textbook': 'H. S. Fogler, Elements of Chemical Reaction Engineering, 4th edition, Prentice Hall, New York, 2006.', 'goals': {'goal6': {'description': 'To understand the importance of selectivity and know the strategies that are commonly used in maximizing yields'}, 'goal7': {'description': 'To effectively use mathematical software in the design of reactors and analysis of data'}, 'goal4': {'description': 'To choose a reactor and determine its size for a given application'}, 'goal5': {'description': 'To work with mass and energy balances in the design of non-isothermal reactors'}, 'goal2': {'description': 'To develop a mechanism that is consistent with an experimental rate law'}, 'goal3': {'description': 'To understand the behavior of different reactor types when they are used either individually or in combination'}, 'goal1': {'outcomes': ['a', 'k'], 'description': 'To analyze kinetic data and obtain rate laws', 'criteria': ['A', 'F']}}, 'units': 9, 'course-number': '06-364'}}

You can see here the whole document is now stored as a dictionary. You might ask why? I have the following interests:

If I have a set of these files, I could loop through them and generate some kind of summary, e.g. total units of some category.
I could generate a consistent format using a template.

Let us explore the template. We will generate a LaTeX document using the Cheetah template engine (http://www.cheetahtemplate.org/ ). I have also used Mako , and jinja . A template is a fancy string that has code in that can be evaluated and substituted at generation time. We use this to replace elements of the template with data from our yaml document. Below I created a template that generates a LaTeX document.

import yaml
from Cheetah.Template import Template

with open('06-364.yaml', 'r') as f:
    document = yaml.load(f.read())

data = document['course']

template = r'''\documentclass{article}
\renewcommand{\abstractname}{Course Description}

\begin{document}
\title{$data['course-number'] $data['title']}
\maketitle
\begin{abstract}
$data['description']
\end{abstract}

\textbf{Required:} $data['required']

\textbf{Prerequisites:} #echo ', '.join($data['prerequisites'])

{\textbf{Textbook:} $data['textbook']

\section{Course goals}
\begin{enumerate}
#for $goal in $data['goals']
\item $data['goals'][$goal]['description'] \label{$goal}
#end for
\end{enumerate}

\section{Topics}
\begin{itemize}
#for $topic in $data['topics']
\item $topic
#end for
\end{itemize}
\end{document}'''

t = Template(template, searchList=locals())

#import sys; sys.exit()
with open('06-364.tex', 'w') as f:
    f.write(t.respond())

None

You can see the results of the tex file here: 06-364.tex , and the corresponding pdf here: 06-364.pdf . It is not spectacular by any means, but if I had 16 of these to create, this sure would be convenient! And if we need some other format, we just make a new template!

Some notes about this:

The course goals are not in the order defined in the yaml file. That is not too surprising, since dictionaries do not preserve order.
Yes in yaml apparently is read in as a boolean, so in the pdf, it is printed as True.
I have not thought about how to prepare a table that maps student outcomes (a-k in ABET) to the course goals
It would be nice if there were links in the pdf to other syllabi, e.g. the prerequisites. See http://ctan.mirrorcatalogs.com/macros/latex/required/tools/xr.pdf

org-mode source

Org-mode version = 8.2.5g

Discuss on Twitter

Printing unicode characters in Python strings

Posted February 02, 2014 at 12:18 PM | categories: python, unicode | tags:

Are you tired of printing strings like this:

print 'The volume is {0} Angstrom^3'.format(125)

The volume is 125 Angstrom^3

Wish you could get Å in your string? That is the unicode character U+212B. We can get that to print in Python, but we have to create it in a unicode string, and print the string properly encoded. Let us try it out.

print u'\u212B'.encode('utf-8')

Å

We use u'' to indicate a unicode string. Note we have to encode the string to print it, or will get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u212b' in position 0: ordinal not in range(128)

Do more, do more, we wish we could! Unicode also supports some superscripted and subscripted numbers (http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts ). Let us see that in action.

print u'\u212B\u00B3'.encode('utf-8')

Å³

Pretty sweet. The code is not all that readable if you aren't fluent in unicode, but if it was buried in some library it would just print something nice looking. We can use this to print chemical formulas too.

print u'''The chemical formula of water is H\u2082O.
Water dissociates into H\u207A and OH\u207B'''.encode('utf-8')

=The chemical formula of water is H₂O. Water dissociates into H⁺ and OH⁻

There are other encodings too. See the symbols here: http://en.wikipedia.org/wiki/Number_Forms

print u'1/4 or \u00BC'.encode('latin-1')

1/4 or ¼

That seems like:

print u'A good idea\u00AE'.encode('latin-1')

A good idea®

I can not tell how you know exactly what encoding to use. If you use utf-8 in the example above, you get a stray character in front of the desired trademark symbol. Still, it is interesting you can get prettier symbols!

org-mode source

Org-mode version = 8.2.5g

Discuss on Twitter

Using tags to filter lists in Python

Posted January 29, 2014 at 12:52 PM | categories: python | tags:

Updated January 29, 2014 at 12:52 PM

Suppose you have a collection of items in a list, and you want to filter the list based on some properties of the items, and then accumulate some other property on the filtered items. We will look at some strategies for this here.

The particular application is that I have a list of courses that make up a curriculum, and I want to summarize the curriculum in a variety of ways. For example, I might want to know how many Gen Ed courses there are, or how many math, chemistry, biology and physics courses there are. I may want to know how may units overall are required.

A course will be represented by a class, which simply holds the data about the course. Here we consider the course number (which is really a string), the number of units of the course, and what category the course fits into. There will be 7 categories here: chemistry, biology, physics, math, engineering, general education, and free elective.

We will use some binary math to represent the categories. Essentially we define tags as if they are binary numbers, and then we can use binary operators to tell if an item is tagged a particular way. We use & to do a logical AND between a variable and a TAG. If it comes out True, the variable has that tag.

This works basically by defining a TAG like a binary number, e.g. TAG1 = 100, TAG2 = 010, TAG3 = 001. Then, if you have a number like 110, you know it is tagged with TAG1 and TAG2, but not TAG3. We can figure that out with code too.

100 & 110 = 100 = 1
010 & 110 = 010 = 2

print 1 & 3
print 2 & 3

1
2

Let us try out an example. The easiest way to define the tags, is as powers of two.

# define some tags
TAG1 = 2**0  # 100
TAG2 = 2**1  # 010

# Now define a variable that is "tagged"
a = TAG1
print a & TAG1 # remember that 0 = False, everything else is true
print a & TAG2

1
0

We can use multiple tags by adding them together.

# define some tags
TAG1 = 2**0  # 100
TAG2 = 2**1  # 010
TAG3 = 2**2  # 001

# Now define a variable that is "tagged"
a = TAG1 + TAG2  # 1 + 2 = 3 = 110 in binary
print a & TAG1 
print a & TAG2
print a & TAG3

1
2
0

You can see that the variable is not tagged by TAG3, but is tagged with TAG1 and TAG2. We might want to tag an item with more than one tag. We create groups of tags by simply adding them together. We can still check if a variable has a particular tag like we did before.

# define some tags
TAG1 = 2**0  # 100
TAG2 = 2**1  # 010
TAG3 = 2**2  # 001

# Now define a variable that is "tagged"
a = TAG1 + TAG2  # 1 + 2 = 3 = 110 in binary
print a & TAG1
print a & TAG2
print a & TAG3

1
2
0

It is trickier to say if a variable is tagged with a particular set of tags. Let us consider why. The binary representation of TAG1 + TAG2 is 110. The binary representation of TAG2 + TAG3 is 011. If we simply consider (TAG1 + TAG2) & (TAG2 & TAG3) we get 010. That actually tells us that we do not have a match, because 010 is not equal to (TAG2 & TAG3 = 011). In other words, the logical AND of the tag with some sum of tags is equal to the sum of tags when there is a match. So, we can check if that is the case with an equality comparison.

# define some tags
TAG1 = 2**0  # 100
TAG2 = 2**1  # 010
TAG3 = 2**2  # 001

# Now define a variable that is "tagged"
a = TAG1 + TAG2  # 1 + 2 = 3 = 110 in binary
print (a & (TAG1 + TAG2)) == TAG1 + TAG2
print (a & (TAG1 + TAG3)) == TAG1 + TAG3
print (a & (TAG2 + TAG3)) == TAG2 + TAG3

True
False
False

Ok, enough binary math, let us see an application. Below we create a set of tags indicating the category a course falls into, a class definition to store course data in attributes of an object, and a list of courses. Then, we show some examples of list comprehension filtering based on the tags to summarize properties of the list. The logical comparisons are simple below, as the courses are not multiply tagged at this point.

CHEMISTRY = 2**0
BIOLOGY = 2**1
PHYSICS = 2**2
MATH = 2**3
ENGINEERING = 2**4
GENED = 2**5
FREE = 2**6

class Course:
    '''simple container for course information'''
    def __init__(self, number, units, category):
        self.number = number
        self.units = units
        self.category = category
    def __repr__(self):
        return self.number


courses = [Course('09-105', 9, CHEMISTRY),
           Course('09-106', 9, CHEMISTRY),
           Course('33-105', 12, PHYSICS),
           Course('33-106', 12, PHYSICS),
           Course('21-120', 10, MATH),
           Course('21-122', 10, MATH),
           Course('21-259', 10, MATH),
           Course('06-100', 12, ENGINEERING),
           Course('xx-xxx', 9, GENED),     
           Course('xx-xxx', 9, FREE), 
           Course('03-232', 9, BIOLOGY)]

# print the total units
print ' Total units = {0}'.format(sum([x.units for x in courses]))

# get units of math required
math_units = sum([x.units  for x in courses if x.category & MATH])

# get total units of math, chemistry, physics and biology a | b is a
# logical OR. This gives a prescription for tagged with MATH OR
# CHEMISTRY OR PHYSICS OR BIOLOGY
BASIC_MS = MATH | CHEMISTRY | PHYSICS | BIOLOGY

# total units in those categories
basic_math_science = sum([x.units for x in courses if x.category & BASIC_MS])

print 'We require {0} units of math out of {1} units of basic math and science courses.'.format(math_units, basic_math_science)

# We are required to have at least 96 units of Math and Sciences.
print 'We are compliant on number of Math and science: ',basic_math_science >= 96

 Total units = 111
We require 30 units of math out of 81 units of basic math and science courses.
We are compliant on number of Math and science:  False

That is all for this example. With more data for each course, you could see what courses are taken in what semesters, how many units are in each semester, maybe create a prerequisite map, and view the curriculum by categories of courses, etc…

org-mode source

Org-mode version = 8.2.5g

Discuss on Twitter

Python as alternative to Matlab for engineering calculations

Posted December 30, 2013 at 03:20 PM | categories: python | tags:

Updated December 30, 2013 at 09:15 PM

1. The significance of an open-source alternative
2. Easy to install Python distributions
3. Python + numpy/scipy/matplotlib does almost everything you need
4. Python from the educator perspective
5. Final thoughts

For the past year I have been seriously exploring whether Python could be used as a practical alternative to Matlab in engineering calculations, particularly in chemical engineering undergraduate and graduate courses. Matlab is very well suited for these calculations, and I have used it extensively in teaching in the past. For example, there is my Matlab blog (http://matlab.cheme.cmu.edu ), and my cmu Matlab package that contains a very nice units package (https://github.com/jkitchin/matlab-cmu ). Matlab is widely used and recognized as a standard software package in engineering. My university has a site license for Matlab, so it doesn't cost me or my students anything to use. Matlab is easy to install, and has almost everything we need out of the box. So why try using something else then? Here are the main reasons:

Not everyone has access to a "free" Matlab license, and Matlab may not be available to my students when they leave the University. Python offers a free, always available option to them.
There are several recent Python distributions that are easy to install, and contain almost everything we need for engineering calculations.
I have been increasingly integrating code into my lecture notes, and this is not easy with Matlab, but it is easy with Python.
I use Python exclusively in my research, and although Matlab and Python are similar, they are different enough that switching between them is bothersome to me. I do not like teaching students to use tools I do not regularly use, and I believe I can provide them with more value by teaching with tools I have a high level of proficiency in.

1 The significance of an open-source alternative

Many people will be able to use Matlab or some other proprietary software that someone has paid for the license to use. Some people, however, will not have that option for a variety of reasons. Maybe the company they work for will not pay for the license, maybe they are unemployed, or entrepeneurs in a small startup that cannot afford it, maybe they are students at a University without a site license,… For these people Python is a viable option that is always available. That makes me happy.

2 Easy to install Python distributions

An important development in using Python as an alternative to Matlab is the development of many "one-click" installers. Ten years ago it took me about 2 weeks to download and build a Python environment suitable for scientific and engineering calculations. That has kept me from trying to use Python in teaching in the past. Today, I can download a package and install one in about 10 minutes! More importantly, so can my students.

My favorite distribution is the Enthought Canopy distribution (https://www.enthought.com/products/canopy/ ). This distribution comes with all the essential python modules, and an integrated editor with IPython. It is available for Windows, Macs and Linux. They offer free academic licenses.

Another good alternative is the Anaconda distribution (https://store.continuum.io/cshop/anaconda/ ) by Continuum Analytics. It is also available for Windows, Macs and Linux. I have not used this one, but it looks like it would be very good. Anaconda comes with the Spyder editor. They offer free academic licenses.

Python(x,y) is available for Windows (http://code.google.com/p/pythonxy/ ) and comes with the Spyder editor.

WinPython (http://winpython.sourceforge.net/ ) is also available for Windows, and comes with the Spyder editor.

The point here is that there are many options available now that make installing a Python distribution as easy as installing packages like Matlab. Enthought Canopy also provides a "desktop environment" similar to Matlab with an editor, documentation browser, package manager and console that is pretty easy to use.

3 Python + numpy/scipy/matplotlib does almost everything you need

Python by itself is not suitable for typical engineering calculations. You need the numerical, scientific and plotting libraries that provide that functionality. These are provided in numpy, scipy and matplotlib, which are included in the distributions described above.

Typical chemical engineering calculations involve one or more of the following kinds of math problems:

Linear algebra
Root finding (nonlinear algebra)
linear regression
Nonlinear regression
Integration and ordinary differential equations
statistics
plotting

All of these are doable out of the box with the Python distributions discussed above. You can find many examples of using these, and more on my PYCSE blog (http://kitchingroup.cheme.cmu.edu ) and http://kitchingroup.cheme.cmu.edu/pycse/ . In short, almost every example I put in the Matlab blog has been done in Python. The only ones I did not do yet are some of the interactive graphics with the steam tables. I have not had time to work those out in detail.

I have found it convenient to augment theses with a package I wrote called pycse (https://github.com/jkitchin/pycse ).

an ode integrator with events similar to the one in Matlab
some numerical differentiation functions
linear and nonlinear regression with confidence intervals
some boundary value problem solvers
a publish function to convert python scripts to PDF via LaTeX

This package is still a work in progress. Notably, there is not a really good units package in Python that works as well as my Matlab units package does. Two that come close are quantities and pint . Both have some nuances that make them tricky for regular use, and both have some challenges in covering all the functions you might want to use them for.

4 Python from the educator perspective

Make that my perspective. I have developed an approach to using code in my lectures where I use the code to reinforce the structure of the problems, and to analyze the solutions that result. Doing that means I need to have code to show students, and the output, and sometimes to run the code to illustrate something. I also like these examples integrated into my lecture notes, so they have the right context around them.

I have found that Emacs+org-mode+python allows me to easily integrate notes, equations, images, code and output in one place, and then export it to a PDF which I can annotate in class. This ensures that the code and output stay synchronized, that the code is always right where it needs to be, in the right context, and that I can annotate actual code in class, and not pseudocode. This heavily influenced my decision to use Python because it leverages what I already know and want to do. In fact, using it makes me even better at what I already know and helps me learn more about it. That makes me happy!

Not everyone will be a content developer like this, but that is what I like to do. Python makes that process fun, and worth doing for me.

5 Final thoughts

In my opinion Python is and is becoming a more viable alternative to other packages like Matlab for scientific and engineering calculations. I have used it exclusively for about a year solving all kinds of engineering problems that I used to solve in Matlab.

Python is different, for sure. The main differences in my opinion are:

Python is less consistent in syntax than Matlab. For example, there are two ODE solvers in scipy with incompatible syntax. That is a result of the fact that you install a Python distribution made of packages written by many different people with different needs.
There is duplicated functionality between numpy and scipy.
Some functionality in scipy is provided by external "scikits" (http://scikits.appspot.com/ ).
Support for boundary value problems and partial differential equations is not as good in Python as it is in Matlab ¹. At the undergraduate level, this is not a big deal. It is not like the Matlab functions are that easy to use!
Data regression in Python is not as complete as in Matlab.
indexing in Python starts at 0, and uses [], whereas in Matlab it starts at 1 and uses ()
You have to import most functions into Python. In contrast, Matlab has them all in one big namespace.

It is certainly doable to use Python for many scientific and engineering calculations. This past Fall I took the plunge, and taught a whole core course in chemical reaction engineering using Python! It was a Master's level course with 59 graduate students in it. I have also taught a graduate elective course in Molecular Simulation using Python. I still have some polishing to do before I would teach this to undergraduates, but I think it is definitely worth trying!

Footnotes:

It is true there are packages like FiPy (nil) for PDEs, and scikits for BVPS, (nil, nil). But these are not easily installed on all platforms, and typically require some developer experience in compiling.

org-mode source

Discuss on Twitter

« Previous Page -- Next Page »

The Kitchin Research Group

Chemical Engineering at Carnegie Mellon University

Finding VASP calculations in a directory tree

Using YAML in python for structured data

Printing unicode characters in Python strings

Using tags to filter lists in Python

Python as alternative to Matlab for engineering calculations

Table of Contents

1 The significance of an open-source alternative

2 Easy to install Python distributions

3 Python + numpy/scipy/matplotlib does almost everything you need

4 Python from the educator perspective

5 Final thoughts

Footnotes: