Another approach to docstrings and validation of args and kwargs in Python

Posted April 30, 2016 at 10:22 AM | categories: python | tags:

Updated April 30, 2016 at 10:28 AM

We have been exploring various ways to add documentation and validation to arbitrary arguments that our molecular simulation codes use. In our previous we derived a method where we created functions that provide docstrings, and validate the input. One issue we had was the duplication of keywords and function names. Here we consider an approach that allows this kind of syntax:

calc = Calculator('/tmp',
                  encut(400),
                  xc('PBE'),
                  generic('kpts', [1, 1, 1]))

Those are regular *args, not **kwargs.

Compare this to:

calc = Calculator('/tmp',
                  encut=encut(400),
                  xc=xc('PBE'),
                  kpts=generic('kpts', [1, 1, 1]))

where those are kwargs. The duplication of the keywords is what we aim to eliminate, because 1) they are redundant, 2) why type things twice?

Here we work out an approach with *args that avoids the duplication. We use the same kind of validation functions as before, but we will decorate each one so it returns a tuple of (key, value), where key is based on the function name. This is so we don't have to duplicate the function name ourselves; we let the decorator do it for us. Then, in our Calculator class init function, we use this tuple to assign the values to self.key as the prototypical way to handle the *args. Other setter functions could also be used.

Here is the template for this approach.

def input(func):
    """Input decorator to convert a validation function to input function."""
    def inner(*args, **kwargs):
        res = func.__name__, func(*args, **kwargs)
        print('{} validated to {}'.format(func.__name__, res))
        return res
    # magic incantations to make the decorated function look like the old one.
    inner.__name__ = func.__name__
    inner.__doc__ = func.__doc__
    return inner

@input
def encut(cutoff):
    "Planewave cutoff in eV."
    assert isinstance(cutoff, int) and (cutoff > 0)
    return cutoff

@input
def xc(s):
    """Exchange-correlation functional.

    Should be 'PBE' or 'LDA'

    """
    assert isinstance(s, str)
    assert s in ['PBE', 'LDA']
    return s

def generic(key, val):
    """Generic function with no validation.

    Use this for other key,val inputs not yet written.

    """
    return (key, val)

class Calculator(object):
    def __init__(self, wd, *args):
        """each arg should be of the form (attr, val)."""
        self.wd = wd
        self.args = args
        for attr, val in args:
            setattr(self, attr, val)

    def __str__(self):
        return '\n'.join(['{}'.format(x) for x in self.args])

##################################################################

calc = Calculator('/tmp',
                  encut(400),
                  xc('PBE'),
                  generic('kpts', [1, 1, 1]))

print(calc)

print(help(encut))

encut validated to ('encut', 400)
xc validated to ('xc', 'PBE')
('encut', 400)
('xc', 'PBE')
('kpts', [1, 1, 1])
Help on function encut in module __main__:

encut(*args, **kwargs)
    Planewave cutoff in eV.

None

This approach obviously works. I don't think I like the syntax as much, although in most python editors, it should directly give access to the docstrings of the functions. This is pretty explicit in what is happening, which is an advantage. Compare this to the following approach, which uses our traditional kwarg syntax, with dynamic, but hidden validation.

def encut(cutoff):
    "Planewave cutoff in eV."
    assert isinstance(cutoff, int) and (cutoff > 0)
    return cutoff

def xc(s):
    """Exchange-correlation functional.

    Should be 'PBE' or 'LDA'.

    """
    assert isinstance(s, str), "xc should be a string"
    assert s in ['PBE', 'LDA'], "xc should be 'PBE' or 'LDA'"
    return s


class Calculator(object):
    def __init__(self, wd, **kwargs):
        """each arg should be of the form (attr, val)."""
        self.wd = wd

        for kwarg, val in kwargs.iteritems():
            f = globals().get(kwarg, None)
            if f is not None:
                print('{} evaluated to {}'.format(kwarg, f(val)))
            else:
                print('No validation for {}'.format(kwarg))

            setattr(self, kwarg, val)

##################################################################

calc = Calculator('/tmp',
                  encut=400,
                  xc='PBE',
                  kpts=[1, 1, 1])

print(calc.encut)
help(xc)

xc evaluated to PBE
No validation for kpts
encut evaluated to 400
400
Help on function xc in module __main__:

xc(s)
    Exchange-correlation functional.

    Should be 'PBE' or 'LDA'.

The benefit of this approach is no change in the syntax we are used to. We still get access to docstrings via tools like pydoc. It should not be too hard to get helpful tooltips in Emacs for this, using pydoc to access the docstrings. This might be the winner.

It is up for debate if we should use assert or Exceptions. If anyone uses python with -O the assert statements are ignored. That might not be desirable though. Maybe it would be better to use Exceptions, with a user customizable variable that determines if validation is performed.

org-mode source

Org-mode version = 8.2.10

The Kitchin Research Group

Chemical Engineering at Carnegie Mellon University

Another approach to docstrings and validation of args and kwargs in Python