## Using YAML in python for structured data

| categories: | tags: | View Comments

YAML is a data format that is most text, with some indentation. It is like JSON, but without the braces. What is important here is that you can read a yaml document into a python dictionary. Here is an example of reading a yaml string so you can see the format.

import yaml
document = """
a: 1
b:
c: 3
d: 4
"""

{'a': 1, 'b': {'c': 3, 'd': 4}}


Everything indented by the same level is grouped in its own dictionary. If we put that string into a file (test.yaml ), we can read that in to python like this.

import yaml

{'a': 1, 'b': {'c': 3, 'd': 4}}


That example is pretty trivial. What I want to do is have yaml file that represents a course syllabus. Then, if I had a set of these files, I could write code to analyze the collection of syllabi. For example, to figure out how many units of particular category there are. Alternatively, I could create different representations of the document, e.g. a pdf or html file for students or accreditation boards. Below is a YAML representtion of an ABET syllabus. It is pretty readable for a person.

import yaml
document = """
course:
course-number: 06-364
title: Chemical Reaction Engineering
units: 9
description: Fundamental concepts in the kinetic modeling of chemical reactions, the treatment and analysis of rate data. Multiple reactions and reaction mechanisms. Analysis and design of ideal and non-ideal reactor systems. Energy effects and mass transfer in reactor systems. Introductory principles in heterogeneous catalysis.

textbook: H. S. Fogler, Elements of Chemical Reaction Engineering, 4th edition, Prentice Hall, New York, 2006.
prerequisites: [06-321, 06-323, 09-347]
required: Yes

goals:
goal1:
description: To analyze kinetic data and obtain rate laws
outcomes: [a, k]
criteria: [A, F]
goal2:
description: To develop a mechanism that is consistent with an experimental rate law
goal3:
description: To understand the behavior of different reactor types when they are used either individually or in combination
goal4:
description: To choose a reactor and determine its size for a given application
goal5:
description: To work with mass and energy balances in the design of non-isothermal reactors
goal6:
description: To understand the importance of selectivity and know the strategies that are commonly used in maximizing yields
goal7:
description: To effectively use mathematical software in the design of reactors and analysis of data

topics:
- Conversion and reactor sizing
- Rate laws and stoichiometry
- Isothermal reactor design
- Collection and analysis of rate data
- Multiple reactions and selectivity
- Non-elementary reaction kinetics
- Non-isothermal reactor design
- Catalysis and catalytic reactors
"""
with open('06-364.yaml', 'w') as f:
f.write(document)


{'course': {'description': 'Fundamental concepts in the kinetic modeling of chemical reactions, the treatment and analysis of rate data. Multiple reactions and reaction mechanisms. Analysis and design of ideal and non-ideal reactor systems. Energy effects and mass transfer in reactor systems. Introductory principles in heterogeneous catalysis.', 'title': 'Chemical Reaction Engineering', 'prerequisites': ['06-321', '06-323', '09-347'], 'topics': ['Conversion and reactor sizing', 'Rate laws and stoichiometry', 'Isothermal reactor design', 'Collection and analysis of rate data', 'Multiple reactions and selectivity', 'Non-elementary reaction kinetics', 'Non-isothermal reactor design', 'Unsteady operation of reactors', 'Catalysis and catalytic reactors'], 'required': True, 'textbook': 'H. S. Fogler, Elements of Chemical Reaction Engineering, 4th edition, Prentice Hall, New York, 2006.', 'goals': {'goal6': {'description': 'To understand the importance of selectivity and know the strategies that are commonly used in maximizing yields'}, 'goal7': {'description': 'To effectively use mathematical software in the design of reactors and analysis of data'}, 'goal4': {'description': 'To choose a reactor and determine its size for a given application'}, 'goal5': {'description': 'To work with mass and energy balances in the design of non-isothermal reactors'}, 'goal2': {'description': 'To develop a mechanism that is consistent with an experimental rate law'}, 'goal3': {'description': 'To understand the behavior of different reactor types when they are used either individually or in combination'}, 'goal1': {'outcomes': ['a', 'k'], 'description': 'To analyze kinetic data and obtain rate laws', 'criteria': ['A', 'F']}}, 'units': 9, 'course-number': '06-364'}}


You can see here the whole document is now stored as a dictionary. You might ask why? I have the following interests:

1. If I have a set of these files, I could loop through them and generate some kind of summary, e.g. total units of some category.
2. I could generate a consistent format using a template.

Let us explore the template. We will generate a LaTeX document using the Cheetah template engine (http://www.cheetahtemplate.org/ ). I have also used Mako , and jinja . A template is a fancy string that has code in that can be evaluated and substituted at generation time. We use this to replace elements of the template with data from our yaml document. Below I created a template that generates a LaTeX document.

import yaml
from Cheetah.Template import Template

with open('06-364.yaml', 'r') as f:

data = document['course']

template = r'''\documentclass{article}
\renewcommand{\abstractname}{Course Description}

\begin{document}
\title{$data['course-number']$data['title']}
\maketitle
\begin{abstract}
$data['description'] \end{abstract} \textbf{Required:}$data['required']

\textbf{Prerequisites:} #echo ', '.join($data['prerequisites']) {\textbf{Textbook:}$data['textbook']

\section{Course goals}
\begin{enumerate}
#for $goal in$data['goals']
\item $data['goals'][$goal]['description'] \label{$goal} #end for \end{enumerate} \section{Topics} \begin{itemize} #for$topic in $data['topics'] \item$topic
#end for
\end{itemize}
\end{document}'''

t = Template(template, searchList=locals())

#import sys; sys.exit()
with open('06-364.tex', 'w') as f:
f.write(t.respond())

None


You can see the results of the tex file here: 06-364.tex , and the corresponding pdf here: 06-364.pdf . It is not spectacular by any means, but if I had 16 of these to create, this sure would be convenient! And if we need some other format, we just make a new template!