A framework for automated feedback with Python and org-mode

| categories: python, emacs | tags:

Autolab is an autograding service that automatically grades code assignments. It uses a program to evaluate a program on a secure virtual system. Using this requires you to run a server, and run code from students. I have never liked that because it is hard to sandbox code well enough to prevent malicious code from doing bad things. Autolab does it well, but it is a heavy solution. Here we explore a local version, one that is used to test for correctness, and not for grading. Here, if you are malicious, you reap what you sow…

The basic idea I am working towards is that Emacs will provide content to be learned (through org-mode) with active exercises. The exercises will involve a code block, and the user will run a command on their code (or an advised C-c C-c) that checks the solution for correctness. A user will be able to see the solution, and maybe get hints.

Suppose we have a problem to solve \(e^x = 3\). This is a simple problem to solve, and here is a solution.

from scipy.optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)

print solve()
[ 1.09861229]

We would like to test this for correctness. We code this in a function-based form because we will later use the function solve to test for correctness. Let's see how we could test it with a test function. We will use exec on a string representing our code to get it into our namespace. I don't see a security issue here. You are writing the code! Eventually, we will be passing code to the test framework this way from an org-mode source block.

import unittest
TOLERANCE = 1e-5

s = '''from scipy.optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)[0]

print solve()'''

def test_solve(s):
    exec s in globals()
    if (abs(np.log(3) - solve()) <= TOLERANCE):
        print('Correct')
    else:
        print('incorrect')

test_solve(s)
1.09861228867
Correct

Next, we need to think about how we could generate an import statement from a code block name, import in python, and run a test function. We can assume that the test code will be in a file called "test_%s.py" on your python path. Here are the contents of test_solve.py.

import numpy as np
TOLERANCE = 1e-5

def solve_solution():
    from scipy. optimize import fsolve
    import numpy as np

    def objective(x):
        return np.exp(x) - 3

    return fsolve(objective, 3)[0]


def test_solve(s):
    exec s in globals()
    if (abs(solve_solution() - solve()) <= TOLERANCE):
        print('Correct!')
    else:
        print('Incorrect')

Now, we can import that, and use the functions. Here is the Python script we need to run to test it.

import test_solve
test_solve.test_solve('''
from scipy. optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)[0]

print solve()''')
1.09861228867
Correct!

Now, an elisp block to do that. One way to do this is to just run a shell command passing the string to a python interpreter. This is a short way away from an Emacs command now.

(let* ((string "import test_solve
test_solve.test_solve('''
from scipy. optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)[0]

print solve()''')"))
  (shell-command-to-string (format "python -c \"%s\"" string)))
1.09861228867
Correct!

Ok, now to wrap it all up in a function we can run from Emacs in a code block to test it. With the cursor in a code block, we get the name, and build the python code, and run it. The function is more complex than I anticipated because I end up running the code block essentially twice, once to get a results block and once to get the test results. For short problems this is not an issue. I also add the test results in a way that is compatible with the current results.

(defun check ()
  (interactive)
  (let* ((src-block (org-element-context))
         (name (org-element-property :name src-block))
         (code (org-element-property :value src-block))
         (end (org-element-property :end src-block))
         (results)
         (template (format "import test_%s
test_%s.test_%s('''%s''')" name name name code))
         (output (format
                  "\n%s\n"
                  (s-join
                   "\n"
                   (mapcar
                    (lambda (s)
                      (if (s-starts-with? ":" s)
                          s
                        (concat ": " s)))
                    (s-split
                     "\n"
                     (shell-command-to-string
                      (format "python -c \"%s\"" template))))))))
    ;; execute block as normal
    (org-babel-execute-src-block)
    ;; and add some output to the Results block
    (if (org-babel-where-is-src-block-result)
        (progn
          (goto-char (org-babel-where-is-src-block-result))
          (setq results (org-element-context))
          ;; delete results line
          (kill-line)
          ;;  delete the results
          (setf (buffer-substring (org-element-property :begin results)
                                  (org-element-property :post-affiliated results))
                "")
          ;; paste results line back
          (yank)
          ;; and the output from your code
          (insert output))
      (message "%s" output))))
check

Now, we use a named src-block so we can call M-x check in it, and check the answer.

from scipy.optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)

print solve()
[ 1.09861229]
Correct!

I would like to be able to provide a solution function that would show a user my solution they were tested against. Python provides the inspect module that can do this. Here is how we get the code in Python.

import inspect
import test_solve

print inspect.getsource(test_solve.solve_solution)
def solve_solution():
    from scipy. optimize import fsolve
    import numpy as np

    def objective(x):
        return np.exp(x) - 3

    return fsolve(objective, 3)[0]

This makes it easy to wrap up a function in emacs that will show this from at src block. We just get the block name, and build the python code and execute it here.

(defun show-solution ()
  (interactive)
  (let* ((src-block (org-element-context))
         (name (org-element-property :name src-block))
         (template (format  "import inspect
import test_%s

print inspect.getsource(test_%s.%s_solution)" name name name)))
    (switch-to-buffer-other-window (get-buffer-create "solution"))
    (erase-buffer)
    (insert (shell-command-to-string
             (format "python -c \"%s\"" template)))
    (python-mode)))
show-solution

That summarizes the main features. It allows me to write a test module that has some name conventions to define a solution function, and a test function. Emacs can generate some boilerplate code for different problem names, and run the test to give the user some feedback. Most of the code in this post would not be directly visible to a user, it would be buried in a python module somewhere on the path, and in elisp files providing the glue. I am not sure how much obfuscation you can put in the python files, e.g. just providing byte-compiled code, so it is less easy to just read it. That is not as big a deal when it is just a study guide/feedback system.

From an authoring point of view, this seems pretty good to me. It is feasible I think to write an org-source document like this with tangling for the test modules, and an export to org that does not have the solutions in it. The only subtle point might be needing to alter Python paths to find the test modules if they aren't installed via something like pip.

I think this is pretty flexible, and could handle problems that take arguments, e.g. write a function that sorts a list. Here is a simple example of that. First we write the test_sort.py file with a solution, and some tests.

def sort_solution(LIST):
    return LIST.sort()

def test_sort(s):
    exec s in globals()
    if sort([3, 4, 2]) == [2, 3, 4]:
        print('passed test 1')
    if sort(['z', 'b']) == ['b', 'z']:
        print('passed test 2')
def sort(LIST):
    s = sorted(LIST)
    return s
passed test 1
passed test 2

Maybe it would make sense to use unittests, or nose or some other testing framework if it makes writing the tests easier. Another day.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Pyparsing meets Emacs to find chemical formulas

| categories: emacs, python | tags:

see the video: https://www.youtube.com/watch?v=sjxS9m8QCoo

Today we expand the concepts of clickable text and merge an idea from Python with Emacs. Here we will use Python to find chemical formulas in the buffer, and then highlight them with Emacs. We will use pyparsing to find the chemical formulas and then use them to create a pattern for button-lock. I chose this approach because regular expressions are hard to use on the most general kinds of chemical formulas, and a (possibly recursive) parser should be better equipped to handle this. I adapted an example grammar to match simple chemical formulas, i.e. ones that do not have any parentheses, or charges different than + or -. I think something like this could be done in Emacs, but I am not as familiar with this kind of parsing in Emacs.

Basically, we treat a formula as a group of one or more Elements that have an optional number following them. Spoiler alert: This mostly works, but in the end I conclude there is a clear benefit to a markup language for chemical formulas. Here is an example usage of a parser:

# adapted from [[https://pyparsing.wikispaces.com/file/view/chemicalFormulas.py/31041705/chemicalFormulas.py]]

from pyparsing import *

element = oneOf( """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No""" )

integer = Word(nums)
elementRef = Group(element + Optional(integer))
chemicalFormula = (WordStart(alphas.upper())
                   + OneOrMore(elementRef).leaveWhitespace()
                   + Optional(Or([Literal("-"),
                                  Literal("+")]))
                   + WordEnd(alphas + nums + "-+"))


s = '''Water is  H2O or OH2  not h2O, methane is CH4 and of course there is PtCl4.
What about H+ and OH-? and carbon or Carbon or H2SO4?

Is this C6H6? or C2H5OH?

and a lot of elements:
H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No'''

matches = []
for match, start, stop in chemicalFormula.scanString(s):
   matches.append(s[start:stop])

print sorted(matches, key=lambda x: len(x), reverse=True)
['C2H5OH', 'PtCl4', 'H2SO4', 'C6H6', 'H2O', 'OH2', 'CH4', 'OH-', 'Uub', 'Uut', 'Uuq', 'Uup', 'Uuh', 'Uus', 'Uuo', 'H+', 'He', 'Li', 'Be', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'Cl', 'Ar', 'Ca', 'Sc', 'Ti', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'Xe', 'Cs', 'Ba', 'Lu', 'Hf', 'Ta', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Lr', 'Rf', 'Db', 'Sg', 'Bh', 'Hs', 'Mt', 'Ds', 'Rg', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Ac', 'Th', 'Pa', 'Np', 'Pu', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'O', 'H', 'B', 'C', 'N', 'O', 'F', 'P', 'S', 'K', 'V', 'Y', 'I', 'W', 'U']

That is pretty good. If the string was actually our buffer, we could use those to create a regexp to put text-properties on them. The trick is how to get the buffer string to the Python function, and then get back usable information in lisp. We actually explored this before ! Rather than use that, we will just create the lisp output manually since this is a simple list of strings.

The first thing we should do is work out a Python script that will output the lisp results we want, which are the found formulas (I tried getting the start and stop positions, but I don't think they map onto the buffer positions very well). Here it is. We set it up as a command line tool that takes a string. We use set to get a unique list, then sort the list by length so we try matching the longest patterns first. There are a few subtle differences in this script and the example above because of some odd false hits I unsuccessfully tried to get rid of.

import sys
from pyparsing import *

element_string =  """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No"""
element = oneOf([x for x in element_string.split()])

integer = Word(nums)
elementRef = Group(element + Optional(integer))
chemicalFormula = (WordStart(alphas.upper()).leaveWhitespace()
                   + OneOrMore(elementRef).leaveWhitespace()
                   + Optional(Or([Literal("-"),
                                  Literal("+")])).leaveWhitespace()
                   + WordEnd(alphas + alphas.lower() + nums + "-+").leaveWhitespace())

s = sys.stdin.read().strip()

matches = []
for match, start, stop in chemicalFormula.scanString(s):
   matches.append(s[start:stop])
matches = list(set(matches))
matches.sort(key=lambda x: len(x), reverse=True)

print "'(" + ' '.join(["\"{}\"".format(m) for m in matches]) + ')'

Now we can test this:

echo "Water is H2O, methane is CH4 and of course PtCl4, what about H+ and OH-? and carbon or Carbon. Water is H2O not h2o or mH2o, methane is CH4 and of course PtCl4, what about H+ and OH-? carbon, Carbon and SRC, or H2SO4? Is this C6H6? Ethanol is C2H5OH in a sentence.

 C2H5OH firs con

This is CH3OH

H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No
" | ./parse_chemical_formulas.py
'("C2H5OH" "CH3OH" "PtCl4" "H2SO4" "C6H6" "CH4" "OH-" "Uub" "Uuq" "Uup" "Uus" "Uuo" "Uuh" "H2O" "Uut" "Ru" "Re" "Rf" "Rg" "Ra" "Rb" "Rn" "Rh" "Be" "Ba" "Bh" "Bi" "Bk" "Br" "Ho" "Os" "Es" "Hg" "Ge" "Gd" "Ga" "Pr" "Pt" "Pu" "Pb" "Pa" "Pd" "Cd" "Po" "Pm" "Hs" "Hf" "He" "Md" "Mg" "Mo" "Mn" "Mt" "Zn" "H+" "Eu" "Zr" "Er" "Ni" "No" "Na" "Nb" "Nd" "Ne" "Np" "Fr" "Fe" "Fm" "Sr" "Kr" "Si" "Sn" "Sm" "Sc" "Sb" "Sg" "Se" "Co" "Cm" "Cl" "Ca" "Cf" "Ce" "Xe" "Tm" "Cs" "Cr" "Cu" "La" "Li" "Tl" "Lu" "Lr" "Th" "Ti" "Te" "Tb" "Tc" "Ta" "Yb" "Db" "Dy" "Ds" "Ac" "Ag" "Ir" "Am" "Al" "As" "Ar" "Au" "At" "In" "H" "P" "C" "K" "O" "S" "W" "B" "F" "N" "V" "I" "U" "Y")

That seems to work great. Now, we have a list of chemical formulas. Now, the Emacs side to call that function. We do not use regexp-opt here because I found it optimizes too much, and doesn't always match the formulas. We want explicit matches on each formula.

(defun shell-command-on-region-to-string (start end command)
  (with-output-to-string
    (shell-command-on-region start end command standard-output)))

(read (shell-command-on-region-to-string
        (point-min) (point-max)
        "./parse_chemical_formulas.py"))
quote (C2H5OH ext; t CH3OH PtCl4 H2SO4 the fir C6H6 CH4 OH- OH2 Uub co Uuq Uup Uus Uuo Uuh ord H2O Uut Ru Re Rf Rg Ra Rb Rn Rh Be Ba Bh Bi Bk Br Ho Os Es Hg Ge Gd Ga Pr t Pt Pu Pb Pa Pd Cd Po Pm Hs Hf He Md Mg Mo Mn Mt Zn H+ Eu Zr Er Ni No Na Nb Nd Ne Np Fr Fe Fm Sr Kr Si Sn Sm Sc Sb Sg Se Co Cm Cl Ca Cf Ce Xe Tm Cs Cr Cu La Li Tl Lu Lr Th Ti Te Tb Tc as Ta Yb Db Dy Ds In Ac Ag Ir Am Al As Ar Au At n H P l t C r K O S W w B F N V I U Y e i)

That is certainly less than perfect, you can see a few false hits that are not too easy to understand, e.g. why is "fir" or "the " or "as" in the list? They don't even start with an uppercase letter. One day maybe I will figure it out. I assume it is a logic flaw in my parser. Until then, let's go ahead and make the text functional, so it looks up the formula in the NIST webbook. The regexp is a little funny, we have to add word-boundaries to each formula to avoid some funny, bad matches.

(defvar chemical-formula-button nil "store button for removal later.")

(require 'nist-webbook)
(setq chemical-formula-button
      (button-lock-set-button
       (mapconcat
        (lambda (formula)
          (concat "\\<" (regexp-quote formula) "\\>"))
        (eval (read (shell-command-on-region-to-string
                     (point-min) (point-max)
                     "./parse_chemical_formulas.py")))
        "\\|")
       (lambda () (interactive)
         (nist-webbook-formula
          (get-surrounding-text-with-property
           'chemical-formula)))
       :face '((:underline t) (:background "gray80"))
       :help-echo "A chemical formula"
       :additional-property 'chemical-formula))

Here are a few tests: CH4, C2H5OH, C6H6. C(CH3)4. C6H6 is benzene. As you can see our pattern lacks context; the first word of the sentence is "as" not the symbol for arsenic. Also, our parser does not consider formulas with parentheses in them. Whenever I refer to myself, I mean myself, and not the element iodine. There are a few weird matchs I just don't understand, like firs d t x rn lac? These do not seem to match anything, and I wonder how they are getting in the list. I think this really shows that it would be useful to use some light markup for chemical formulas which would a) provide context, and b) enhance parsing accuracy. In LaTeX you would use \ce{I} to indicate that is iodine, and not a reference to myself. That is more clear than saying I use I in chemical reactions ;) And it also clarifies sentences like the letter W is used to represent tungsten as the symbol \ce{W}.

Nevertheless, we can click on the formulas, and get something to happen that is potentially useful. Is this actually useful? Conceptually yes, I think it could be, but clearly the parsing is not recognizing formulas perfectly. Sending the buffer to a dedicated program that can return a list of matches to highlight in Emacs is a good idea, especially if it is not easy to build in Emacs, or if a proven solution already exists.

Finally, we can remove the highlighted text like this. That was the reason for saving the button earlier!

(when chemical-formula-button
  (button-lock-unset-button chemical-formula-button)
  (setq chemical-formula-button nil))

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Serializing an Atoms object in xml

| categories: xml, ase, python | tags:

I have a future need to serialize an Atoms object from ase as XML. I would use json usually, but I want to use a program that will index xml. I have previously used pyxser for this, but I recall it being difficult to install, and it does not pip install on my Mac. So, here we look at xmlwitch which does pip install ;). This package does some serious magic with context managers.

One thing I am not sure about here is the best way to represent numbers and lists/arrays. I am using repr here, and assuming you would want to read this back in to Python where this could simply be eval'ed. Some alternatives would be to convert them to lists, or save them as arrays of xml elements.

from ase.data.g2 import data
from ase.structure import molecule
import xmlwitch

atoms = molecule('H2O')

def serialize_atoms(atoms):
    'Return an xml string of an ATOMS object.'
    xml = xmlwitch.Builder(version='1.0', encoding='utf-8')

    with xml.atoms():
        for atom in atoms:
            with xml.atom(index=repr(atom.index)):
                xml.symbol(atom.symbol)
                xml.position(repr(atom.position))
                xml.magmom(repr(atom.magmom))
                xml.mass(repr(atom.mass))
                xml.momentum(repr(atom.momentum))
                xml.number(repr(atom.number))
        xml.cell(repr(atoms.cell))
        xml.pbc(repr(atoms.pbc))
    return xml

atoms_xml = serialize_atoms(atoms)
print atoms_xml

with open('atoms.xml', 'w') as f:
    f.write(str(atoms_xml))
<?xml version="1.0" encoding="utf-8"?>
<atoms>
  <atom index="0">
    <symbol>O</symbol>
    <position>array([ 0.      ,  0.      ,  0.119262])</position>
    <magmom>0.0</magmom>
    <mass>15.9994</mass>
    <momentum>array([ 0.,  0.,  0.])</momentum>
    <number>8</number>
  </atom>
  <atom index="1">
    <symbol>H</symbol>
    <position>array([ 0.      ,  0.763239, -0.477047])</position>
    <magmom>0.0</magmom>
    <mass>1.0079400000000001</mass>
    <momentum>array([ 0.,  0.,  0.])</momentum>
    <number>1</number>
  </atom>
  <atom index="2">
    <symbol>H</symbol>
    <position>array([ 0.      , -0.763239, -0.477047])</position>
    <magmom>0.0</magmom>
    <mass>1.0079400000000001</mass>
    <momentum>array([ 0.,  0.,  0.])</momentum>
    <number>1</number>
  </atom>
  <cell>array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])</cell>
  <pbc>array([False, False, False], dtype=bool)</pbc>
</atoms>

Now, we can try reading that file. I am going to use emacs-lisp here for fun, and compute the formula.

(let* ((xml (car (xml-parse-file "atoms.xml")))
       (atoms (xml-get-children xml 'atom))
       (symbol-elements (mapcar (lambda (atom)
                                  (car (xml-get-children atom 'symbol)))
                                atoms))
       (symbols (mapcar (lambda (x)
                          (car (xml-node-children x)))
                        symbol-elements)))
  (mapconcat (lambda (c)
               (format "%s%s" (car c)
                       (if (= 1 (cdr c))
                           ""
                         (cdr c))))
             (loop for sym in (-uniq symbols)
                   collect (cons
                            sym
                            (-count (lambda (x) (string= x sym)) symbols)))
             ""))
OH2

Here is a (misleadingly) concise way to do this in Python. It is so short thanks to there being a Counter that does what we want, and some pretty nice list comprehension!

import xml.etree.ElementTree as ET
from collections import Counter
with open('atoms.xml') as f:
    xml = ET.fromstring(f.read())

counts = Counter([el.text for el in xml.findall('atom/symbol')])

print ''.join(['{0}{1}'.format(a,b) if b>1 else a for a,b in counts.iteritems()])
H2O

And finally a test on reading a unit cell.

import xml.etree.ElementTree as ET
from numpy import array

with open('atoms.xml') as f:
    xml = ET.fromstring(f.read())

print eval(xml.find('cell').text)
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

That seems to work but, yeah, you won't want to read untrusted xml with that! See http://stupidpythonideas.blogspot.com/2013/11/repr-eval-bad-idea.html . It might be better (although not necessarily more secure) to use pickle or some other serialization strategy for this.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

A python version of the s-exp bibtex entry

| categories: bibtex, ref, python | tags:

In this post we explored representing a bibtex entry in lisp s-exp notation, and showed interesting things that enables. Here, I explore something similar in Python. The s-exp notation in Python is really more like tuples. It looks almost identical, except we need a lot of commas for the Python syntax. One significant difference in Python is we need to define the functions in advance because otherwise the function symbols are undefined. Similar to lisp, we can define the field functions at run-time in a loop. We have to use an eval statement, which some Pythonistas find distasteful, but it is not that different to me than what we did in lisp.

The syntax for "executing" the data structure is quite different than in lisp, because this data is not code in Python. Instead, we have to deconstruct the data, knowing that the function is the first object, and it takes the remaining arguments in the tuple.

Here is the proof of concept:

def article(bibtex_key, *args):
    "Return the bibtex formatted entry"
    return ',\n'.join(['@article{{{0}}}'.format(bibtex_key)] +[arg[0](arg[1]) for arg in args[0]] + ['}'])

fields = ("author", "title", "journal", "pages", "number", "doi", "url", "eprint", "year")

for f in fields:
    locals()[f] = eval ('lambda x: "  {0} = {{{1}}}".format("' + f + '", x)')

entry = (article, "hallenbeck-2013-effec-o2",
         (author, "Hallenbeck, Alexander P. and Kitchin, John R."),
         (title, "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent"),
         (journal, "Industrial \& Engineering Chemistry Research"),
         (pages, "10788-10794"),
         (year, 2013),
         (number, 31),
         (doi, "10.1021/ie400582a"),
         (url, "http://pubs.acs.org/doi/abs/10.1021/ie400582a"),
         (eprint, "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))


print entry[0](entry[1], entry[2:])
@article{hallenbeck-2013-effec-o2},
  author = {Hallenbeck, Alexander P. and Kitchin, John R.},
  title = {Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent},
  journal = {Industrial \& Engineering Chemistry Research},
  pages = {10788-10794},
  year = {2013},
  number = {31},
  doi = {10.1021/ie400582a},
  url = {http://pubs.acs.org/doi/abs/10.1021/ie400582a},
  eprint = {http://pubs.acs.org/doi/pdf/10.1021/ie400582a},
}

We can still get specific fields out. Since we used a tuple here, it is not quite as nice as using a dictionary, but it is neither too bad, and it can be wrapped in a reasonably convenient function.

def article(bibtex_key, *args):
    "Return the bibtex formatted entry"
    return ',\n'.join(['@article{{{0}}}'.format(bibtex_key)] +[arg[0](arg[1]) for arg in args[0]] + ['}'])

fields = ("author", "title", "journal", "pages", "number", "doi", "url", "eprint", "year")

for f in fields:
    locals()[f] = eval ('lambda x: "  {0} = {{{1}}}".format("' + f + '", x)')

entry = (article, "hallenbeck-2013-effec-o2",
         (author, "Hallenbeck, Alexander P. and Kitchin, John R."),
         (title, "Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent"),
         (journal, "Industrial \& Engineering Chemistry Research"),
         (pages, "10788-10794"),
         (year, 2013),
         (number, 31),
         (doi, "10.1021/ie400582a"),
         (url, "http://pubs.acs.org/doi/abs/10.1021/ie400582a"),
         (eprint, "http://pubs.acs.org/doi/pdf/10.1021/ie400582a"))


for field in entry[2:]:
    if field[0] == author:
        print field

def get_field(entry, field):
    for element in entry[2:]:
        if element[0] == field:
            return element[1]
    else:
        return None

print get_field(entry, title)
print get_field(entry, "bad")
(<function <lambda> at 0x1005975f0>, 'Hallenbeck, Alexander P. and Kitchin, John R.')
Effects of \ce{O_2} and \ce{SO_2} on the capture capacity of a primary-amine based polymeric \ce{CO_2} sorbent
None

So, it seems Python can do some things like lisp in treating functions like first-class objects that can be used as functions, or keys. I still like the lisp s-exp better, but this is an interesting idea for Python too.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Python data structures to lisp

| categories: lisp, emacs, python | tags:

I have an idea in mind that would use the output of python scripts in lisp functions. Xah Lee posted an idea for writing emacs commands in scripting languages . In this post I want to explore an extension of the idea, where a Python script will return output that can be read in Lisp, e.g. we can convert a Python list to a lisp list, or a dictionary to an a-list or p-list. I can already see that simple data structures will be "simple", and arbitrary data structures will offer a lot of challenges, e.g. nested lists or dictionaries…

If I could add some custom functions to the basic builtin types in Python, then I could use another approach to format python objects as lisp data types. This isn't recommended by Pythonistas, but I guess they don't want to use lisp as much as I do ;) I found this approach to modifying builtins:

http://stackoverflow.com/questions/2444680/how-do-i-add-my-own-custom-attributes-to-existing-built-in-python-types-like-a

We use that almost verbatim here to get what I want. This is a super low level way to add functions to the builtins. I add some simple formatting to floats, ints and strings. I add a more complex recursive formatting function to lists, tuples and dictionaries. A dictionary can be represented as an alist or plist. Both examples are shown, but I leave the alist version commented out. Finally, we add a lispify function to numpy arrays.

import ctypes as c

class PyObject_HEAD(c.Structure):
    _fields_ = [('HEAD', c.c_ubyte * (object.__basicsize__ -
                                      c.sizeof(c.c_void_p))),
                ('ob_type', c.c_void_p)]

_get_dict = c.pythonapi._PyObject_GetDictPtr
_get_dict.restype = c.POINTER(c.py_object)
_get_dict.argtypes = [c.py_object]

def get_dict(object):
    return _get_dict(object).contents.value

get_dict(str)['lisp'] = lambda s:'"{}"'.format(str(s))
get_dict(float)['lisp'] = lambda f:'{}'.format(str(f))
get_dict(int)['lisp'] = lambda f:'{}'.format(str(f))

import collections
import numpy as np

def lispify(L):
    "Convert a Python object L to a lisp representation."
    if (isinstance(L, str)
        or isinstance(L, float)
        or isinstance(L, int)):
        return L.lisp()
    elif (isinstance(L, list)
          or isinstance(L, tuple)
          or isinstance(L, np.ndarray)):
        s = []
        for element in L:
            s += [element.lisp()]
        return '(' + ' '.join(s) + ')'
    elif isinstance(L, dict):
        s = []
        for key in L:
            # alist format
            # s += ["({0} . {1})".format(key, L[key].lisp())]
            # plist
            s += [":{0} {1}".format(key, L[key].lisp())]
        return '(' + ' '.join(s) + ')'

get_dict(list)['lisp'] = lispify
get_dict(tuple)['lisp'] = lispify
get_dict(dict)['lisp'] = lispify
get_dict(np.ndarray)['lisp'] = lispify

Let us test these out.

from pylisp import *
a = 4.5
print int(a).lisp()
print a.lisp()
print "test".lisp()

print [1, 2, 3].lisp()
print (1, 2, 3).lisp()

print [[1, 3], (5, 6)].lisp()

print {"a": 5}.lisp()
print [[1, 3], (5, 6), {"a": 5, "b": "test"}].lisp()


A = np.array([1, 3, 4])
print A.lisp()
print ({"tree": [5, 6]}, ["a", 4, "list"], 5, 2.0 / 3.0).lisp()
4
4.5
"test"
(1 2 3)
(1 2 3)
((1 3) (5 6))
(:a 5)
((1 3) (5 6) (:a 5 :b "test"))
(1 3 4)
((:tree (5 6)) ("a" 4 "list") 5 0.666666666667)

Now, is that better than a single lisp function with a lot of conditionals to handle each type? I am not sure. This seems to work pretty well.

Here is how I imagine using this idea. We would have some emacs-lisp variables and use them to dynamically generate a python script. We run the python script, capturing the output, and read it back in as a lisp data structure. Here is a simple kind of example that generates a dictionary.

(let* ((elisp-var 6)
       (result)
      (script (format "
from pylisp import *
print {x: [2*y for y in range(x)] for x in range(1, %s)}.lisp()
" elisp-var)))

  ;; start a python process
  (run-python)
  (setq result (read (python-shell-send-string-no-output
   script)))
  (plist-get result :5))
(0 2 4 6 8)

That seems to work pretty well. One alternative idea to this is Pymacs , which I have written about before . This project isn't currently under active development, and I ran into some difficulties with it before.

Here we can solve the problem I previously posed and get the result back as an elisp float, and then reuse the result

(let* ((myvar 3)
       (script (format "from pylisp import *
from scipy.optimize import fsolve
def objective(x):
    return x - 5

ans, = fsolve(objective, %s)
print ans.lisp()" myvar)))
  (run-python)
  (setq result (read (python-shell-send-string-no-output
                       script)))
  (- 5 result))
0.0

Bottom line: we can write python code in lisp functions that are dynamically updated, execute them, and get lisp data structures back for simple data types. I think that could be useful in some applications, where it is easier to do parsing/analysis in Python, but you want to do something else that is easier in Lisp.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter
« Previous Page -- Next Page »