Serializing an Atoms object in xml

| categories: xml, ase, python | tags:

I have a future need to serialize an Atoms object from ase as XML. I would use json usually, but I want to use a program that will index xml. I have previously used pyxser for this, but I recall it being difficult to install, and it does not pip install on my Mac. So, here we look at xmlwitch which does pip install ;). This package does some serious magic with context managers.

One thing I am not sure about here is the best way to represent numbers and lists/arrays. I am using repr here, and assuming you would want to read this back in to Python where this could simply be eval'ed. Some alternatives would be to convert them to lists, or save them as arrays of xml elements.

from ase.data.g2 import data
from ase.structure import molecule
import xmlwitch

atoms = molecule('H2O')

def serialize_atoms(atoms):
    'Return an xml string of an ATOMS object.'
    xml = xmlwitch.Builder(version='1.0', encoding='utf-8')

    with xml.atoms():
        for atom in atoms:
            with xml.atom(index=repr(atom.index)):
                xml.symbol(atom.symbol)
                xml.position(repr(atom.position))
                xml.magmom(repr(atom.magmom))
                xml.mass(repr(atom.mass))
                xml.momentum(repr(atom.momentum))
                xml.number(repr(atom.number))
        xml.cell(repr(atoms.cell))
        xml.pbc(repr(atoms.pbc))
    return xml

atoms_xml = serialize_atoms(atoms)
print atoms_xml

with open('atoms.xml', 'w') as f:
    f.write(str(atoms_xml))
<?xml version="1.0" encoding="utf-8"?>
<atoms>
  <atom index="0">
    <symbol>O</symbol>
    <position>array([ 0.      ,  0.      ,  0.119262])</position>
    <magmom>0.0</magmom>
    <mass>15.9994</mass>
    <momentum>array([ 0.,  0.,  0.])</momentum>
    <number>8</number>
  </atom>
  <atom index="1">
    <symbol>H</symbol>
    <position>array([ 0.      ,  0.763239, -0.477047])</position>
    <magmom>0.0</magmom>
    <mass>1.0079400000000001</mass>
    <momentum>array([ 0.,  0.,  0.])</momentum>
    <number>1</number>
  </atom>
  <atom index="2">
    <symbol>H</symbol>
    <position>array([ 0.      , -0.763239, -0.477047])</position>
    <magmom>0.0</magmom>
    <mass>1.0079400000000001</mass>
    <momentum>array([ 0.,  0.,  0.])</momentum>
    <number>1</number>
  </atom>
  <cell>array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])</cell>
  <pbc>array([False, False, False], dtype=bool)</pbc>
</atoms>

Now, we can try reading that file. I am going to use emacs-lisp here for fun, and compute the formula.

(let* ((xml (car (xml-parse-file "atoms.xml")))
       (atoms (xml-get-children xml 'atom))
       (symbol-elements (mapcar (lambda (atom)
                                  (car (xml-get-children atom 'symbol)))
                                atoms))
       (symbols (mapcar (lambda (x)
                          (car (xml-node-children x)))
                        symbol-elements)))
  (mapconcat (lambda (c)
               (format "%s%s" (car c)
                       (if (= 1 (cdr c))
                           ""
                         (cdr c))))
             (loop for sym in (-uniq symbols)
                   collect (cons
                            sym
                            (-count (lambda (x) (string= x sym)) symbols)))
             ""))
OH2

Here is a (misleadingly) concise way to do this in Python. It is so short thanks to there being a Counter that does what we want, and some pretty nice list comprehension!

import xml.etree.ElementTree as ET
from collections import Counter
with open('atoms.xml') as f:
    xml = ET.fromstring(f.read())

counts = Counter([el.text for el in xml.findall('atom/symbol')])

print ''.join(['{0}{1}'.format(a,b) if b>1 else a for a,b in counts.iteritems()])
H2O

And finally a test on reading a unit cell.

import xml.etree.ElementTree as ET
from numpy import array

with open('atoms.xml') as f:
    xml = ET.fromstring(f.read())

print eval(xml.find('cell').text)
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

That seems to work but, yeah, you won't want to read untrusted xml with that! See http://stupidpythonideas.blogspot.com/2013/11/repr-eval-bad-idea.html . It might be better (although not necessarily more secure) to use pickle or some other serialization strategy for this.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter