Enough with the hyperbole - hy does things that are not as easy in Python

| categories: python, hylang | tags:

We run a lot of molecular simulations using Python. Here is a typical script we would use. It creates an instance of a calculator inside a context manager.

from ase import Atoms, Atom
from jasp import *

co = Atoms([Atom('C',[0,   0, 0]),
            Atom('O',[1.2, 0, 0])],
            cell=(6., 6., 6.))

with jasp('molecules/simple-co', #output dir
          xc='PBE',  # the exchange-correlation functional
          nbands=6,  # number of bands
          encut=350, # planewave cutoff
          ismear=1,  # Methfessel-Paxton smearing
          sigma=0.01,# very small smearing factor for a molecule
          atoms=co) as calc:
    print 'energy = {0} eV'.format(co.get_potential_energy())
    print co.get_forces()

This basic approach has served us for more than a decade! Still, there are things about it that bug me. Most significantly is the arbitrary keyword args. We keep a list of legitimate kwargs in the module, but there is no documentation or validation to go with them that is accessible to users (short of reading the code). There are well over 100 kwargs that are possible, so documenting them in the init docstring is not that useful (we did it once, see https://gitlab.com/ase/ase/blob/master/ase/calculators/jacapo/jacapo.py#L143 , and it made a really long docstring). Providing validation for these (some can only be integers, floats, specific strings, lists, or dictionaries) is not easy. I did this for another simulation code by providing validation functions that could be looked up dynamically by name. I never did come up with a way to provide kwarg specific documentation though.

The access to documentation while writing code is becoming increasingly important to me; I don't remember all the kwargs and what values are valid. More importantly, as I teach people how to use these tools, it is not practical to tell them to "read the code". I don't even want to do that while running simulations, I just want to setup the simulation and run it.

Today, I had an idea that a macro in hy would allow me to get documentation and validation of these kwargs.

The pseudocode would look like this. Each "kwarg" will actually be a function that has a docstring, performs validation, and evaluates to its argument. "vaspm" is a macro that will expand to the calculator with the desired kwargs. We will have to be careful that these function names don't conflict with other function names, but that could be addressed in a variety of ways with namespaces and function names.

;; pseudocode of the macro
(setv calc (vaspm "molecules/simple-co"
                  (xc "PBE")
                  (nbands 6)
                  (encut 350)
                  (ismear 1)
                  (sigma 0.01)
                  (atoms co)))

This would expand to the following block, which is equivalent to what we already do today. In the process of expansion though, we gain docstrings and validation!

(setv calc (Vasp "molecules/simple-co"
                 :xc "PBE"
                 :nbands 6
                 :encut 6
                 :ismear 1
                 :sigma 0.01
                 :atoms co))

Here is a toy implementation that illustrates what the functions are, and how we build up the code from the macro.

(defn encut [cutoff]
  "The planewave cutoff energy in eV."
  (assert (integer? cutoff))
  (assert (> cutoff 0))
  (print "encut validated")
  cutoff)

(defn xc [exc]
  "The exchange correlation functional. Should be a string of PBE or LDA."
  (assert (string? exc))
  (assert (in exc ["PBE" "LDA"]))
  (print "exc validated")
  exc)


(defclass Calculator []
  "Toy class representing a calculator."
  (defn __init__ [self wd &kwargs kwargs]
    (setattr self "wd" wd)
    (for [key kwargs]
      (setattr self key (get kwargs key)))))

We tangle that block to calculator.hy so we can reuse it. First we show the traditional syntax.

(import [calculator [*]])

(setv calc (Calculator "some-dir" :encut 400 :xc "PBE"))

(print calc.wd)
(print calc.encut)
(print calc.xc)
some-dir
400
PBE

Note, we can also do this, and get the validation too. It is verbose for my taste, but shows what we need the final code to look like, and incidentally how this would be done in Python too. We just need a macro that expands to this code.

(import [calculator [*]])

(setv calc (Calculator "some-dir" :encut (encut 400) :xc (xc "PBE")))

(print calc.wd)
(print calc.encut)
(print calc.xc)
encut validated
exc validated
some-dir
400
PBE

That is what this macro below does. We build up that code by making a keyword of the function name, and setting it to the value of the form the function is in.

(defmacro vaspm [wd &rest body]
  "Macro to build a Calculator with validation of arguments in BODY"
  (let [code `(Calculator ~wd)]
    (for [form body]
      (.append code (keyword (name (car form))))
      (.append code form))
    code))

Now, lets consider the macro version.

(import [calculator [*]])
(require calculator)

(setv calc (vaspm "some-dir" (encut 400) (xc "PBE")))
(print calc.wd)
(print calc.encut)
(print calc.xc)

;; proof we can get to the encut docstring!
(help encut)
encut validated
exc validated
some-dir
400
PBE
Help on function encut in module calculator:

encut(cutoff)
    The planewave cutoff energy in eV.

Sweet. The macro allows us to simplify our notation to be approximately the same as the original function, but with validation and docstring availability. Here is a variation of the macro that even uses keywords and builds the validation in from the keyword. It is not clear we can access the docstrings so easily here (ok, we can build an eldoc function that works either way, but the function method above is "more native").

(import [calculator [*]])
(require calculator)


(defmacro vasp2 [wd &rest kwargs]
  (let [code `(Calculator ~wd)]
    (for [x (range   0 (len kwargs) 2)]
      (let [kw (nth kwargs x)
            val (nth kwargs (+ 1 x))]
        (.append code kw)
        (.append code `(~(HySymbol (name kw)) ~val))))
    code))

(print (macroexpand '(vasp2 "/tmp" :encut 1 :xc "PBE")))

(setv calc (vasp2 "some-dir"
                  :encut 400
                  :xc "PBE"))
(print calc.wd)
(print calc.encut)
(print calc.xc)
(u'Calculator' u'/tmp' u'\ufdd0:encut' (u'encut' 1L) u'\ufdd0:xc' (u'xc' u'PBE'))
encut validated
exc validated
some-dir
400
PBE

To summarize here, we have looked at some ways to incorporate validation and documentation into kwargs. There are certainly ways to do this in Python, using these auxiliary functions. In fact we use them in hy too. We could build the validation into a Python init function too, using dynamic lookup of the function names, and evaluation of the functions. The macro features of hy give different opportunities for this, and different syntactical sugars to work with. The hy approach leads to less duplication (e.g. only a keyword, not a keyword and a function name that are the same), which will lead to fewer mistakes of the type xc=xd(something). Overall, interesting differences to contemplate.

From a developer point of view there is the burden of writing all the validation functions, but the payoff is access to documentation and optionally, validation. Also, no kwargs that are not allowed will work. Right now, with **kwargs, they might silently fail.

Copyright (C) 2016 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter