Using YAML in python for structured data

Posted February 03, 2014 at 09:36 AM | categories: python, template, yaml | tags:

YAML is a data format that is most text, with some indentation. It is like JSON, but without the braces. What is important here is that you can read a yaml document into a python dictionary. Here is an example of reading a yaml string so you can see the format.

import yaml
document = """
a: 1
b:
  c: 3
  d: 4
"""
print yaml.load(document)

{'a': 1, 'b': {'c': 3, 'd': 4}}

Everything indented by the same level is grouped in its own dictionary. If we put that string into a file (test.yaml ), we can read that in to python like this.

import yaml
document = open('test.yaml').read()
print yaml.load(document)

{'a': 1, 'b': {'c': 3, 'd': 4}}

That example is pretty trivial. What I want to do is have yaml file that represents a course syllabus. Then, if I had a set of these files, I could write code to analyze the collection of syllabi. For example, to figure out how many units of particular category there are. Alternatively, I could create different representations of the document, e.g. a pdf or html file for students or accreditation boards. Below is a YAML representtion of an ABET syllabus. It is pretty readable for a person.

import yaml
document = """
course:
  course-number: 06-364
  title: Chemical Reaction Engineering
  units: 9
  description: Fundamental concepts in the kinetic modeling of chemical reactions, the treatment and analysis of rate data. Multiple reactions and reaction mechanisms. Analysis and design of ideal and non-ideal reactor systems. Energy effects and mass transfer in reactor systems. Introductory principles in heterogeneous catalysis. 

  textbook: H. S. Fogler, Elements of Chemical Reaction Engineering, 4th edition, Prentice Hall, New York, 2006.
  prerequisites: [06-321, 06-323, 09-347]
  required: Yes

  goals:
    goal1: 
      description: To analyze kinetic data and obtain rate laws 
      outcomes: [a, k]
      criteria: [A, F]
    goal2:
      description: To develop a mechanism that is consistent with an experimental rate law 
    goal3:
      description: To understand the behavior of different reactor types when they are used either individually or in combination 
    goal4: 
      description: To choose a reactor and determine its size for a given application
    goal5:
      description: To work with mass and energy balances in the design of non-isothermal reactors 
    goal6:
      description: To understand the importance of selectivity and know the strategies that are commonly used in maximizing yields
    goal7:
      description: To effectively use mathematical software in the design of reactors and analysis of data 

  topics:
    - Conversion and reactor sizing
    - Rate laws and stoichiometry
    - Isothermal reactor design
    - Collection and analysis of rate data
    - Multiple reactions and selectivity
    - Non-elementary reaction kinetics
    - Non-isothermal reactor design
    - Unsteady operation of reactors
    - Catalysis and catalytic reactors
"""
with open('06-364.yaml', 'w') as f:
    f.write(document)

print yaml.load(document)

{'course': {'description': 'Fundamental concepts in the kinetic modeling of chemical reactions, the treatment and analysis of rate data. Multiple reactions and reaction mechanisms. Analysis and design of ideal and non-ideal reactor systems. Energy effects and mass transfer in reactor systems. Introductory principles in heterogeneous catalysis.', 'title': 'Chemical Reaction Engineering', 'prerequisites': ['06-321', '06-323', '09-347'], 'topics': ['Conversion and reactor sizing', 'Rate laws and stoichiometry', 'Isothermal reactor design', 'Collection and analysis of rate data', 'Multiple reactions and selectivity', 'Non-elementary reaction kinetics', 'Non-isothermal reactor design', 'Unsteady operation of reactors', 'Catalysis and catalytic reactors'], 'required': True, 'textbook': 'H. S. Fogler, Elements of Chemical Reaction Engineering, 4th edition, Prentice Hall, New York, 2006.', 'goals': {'goal6': {'description': 'To understand the importance of selectivity and know the strategies that are commonly used in maximizing yields'}, 'goal7': {'description': 'To effectively use mathematical software in the design of reactors and analysis of data'}, 'goal4': {'description': 'To choose a reactor and determine its size for a given application'}, 'goal5': {'description': 'To work with mass and energy balances in the design of non-isothermal reactors'}, 'goal2': {'description': 'To develop a mechanism that is consistent with an experimental rate law'}, 'goal3': {'description': 'To understand the behavior of different reactor types when they are used either individually or in combination'}, 'goal1': {'outcomes': ['a', 'k'], 'description': 'To analyze kinetic data and obtain rate laws', 'criteria': ['A', 'F']}}, 'units': 9, 'course-number': '06-364'}}

You can see here the whole document is now stored as a dictionary. You might ask why? I have the following interests:

If I have a set of these files, I could loop through them and generate some kind of summary, e.g. total units of some category.
I could generate a consistent format using a template.

Let us explore the template. We will generate a LaTeX document using the Cheetah template engine (http://www.cheetahtemplate.org/ ). I have also used Mako , and jinja . A template is a fancy string that has code in that can be evaluated and substituted at generation time. We use this to replace elements of the template with data from our yaml document. Below I created a template that generates a LaTeX document.

import yaml
from Cheetah.Template import Template

with open('06-364.yaml', 'r') as f:
    document = yaml.load(f.read())

data = document['course']

template = r'''\documentclass{article}
\renewcommand{\abstractname}{Course Description}

\begin{document}
\title{$data['course-number'] $data['title']}
\maketitle
\begin{abstract}
$data['description']
\end{abstract}

\textbf{Required:} $data['required']

\textbf{Prerequisites:} #echo ', '.join($data['prerequisites'])

{\textbf{Textbook:} $data['textbook']

\section{Course goals}
\begin{enumerate}
#for $goal in $data['goals']
\item $data['goals'][$goal]['description'] \label{$goal}
#end for
\end{enumerate}

\section{Topics}
\begin{itemize}
#for $topic in $data['topics']
\item $topic
#end for
\end{itemize}
\end{document}'''

t = Template(template, searchList=locals())

#import sys; sys.exit()
with open('06-364.tex', 'w') as f:
    f.write(t.respond())

None

You can see the results of the tex file here: 06-364.tex , and the corresponding pdf here: 06-364.pdf . It is not spectacular by any means, but if I had 16 of these to create, this sure would be convenient! And if we need some other format, we just make a new template!

Some notes about this:

The course goals are not in the order defined in the yaml file. That is not too surprising, since dictionaries do not preserve order.
Yes in yaml apparently is read in as a boolean, so in the pdf, it is printed as True.
I have not thought about how to prepare a table that maps student outcomes (a-k in ABET) to the course goals
It would be nice if there were links in the pdf to other syllabi, e.g. the prerequisites. See http://ctan.mirrorcatalogs.com/macros/latex/required/tools/xr.pdf

org-mode source

Org-mode version = 8.2.5g

The Kitchin Research Group

Chemical Engineering at Carnegie Mellon University

Using YAML in python for structured data