A framework for automated feedback with Python and org-mode

| categories: emacs, python | tags:

Autolab is an autograding service that automatically grades code assignments. It uses a program to evaluate a program on a secure virtual system. Using this requires you to run a server, and run code from students. I have never liked that because it is hard to sandbox code well enough to prevent malicious code from doing bad things. Autolab does it well, but it is a heavy solution. Here we explore a local version, one that is used to test for correctness, and not for grading. Here, if you are malicious, you reap what you sow…

The basic idea I am working towards is that Emacs will provide content to be learned (through org-mode) with active exercises. The exercises will involve a code block, and the user will run a command on their code (or an advised C-c C-c) that checks the solution for correctness. A user will be able to see the solution, and maybe get hints.

Suppose we have a problem to solve \(e^x = 3\). This is a simple problem to solve, and here is a solution.

from scipy.optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)

print solve()
[ 1.09861229]

We would like to test this for correctness. We code this in a function-based form because we will later use the function solve to test for correctness. Let's see how we could test it with a test function. We will use exec on a string representing our code to get it into our namespace. I don't see a security issue here. You are writing the code! Eventually, we will be passing code to the test framework this way from an org-mode source block.

import unittest
TOLERANCE = 1e-5

s = '''from scipy.optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)[0]

print solve()'''

def test_solve(s):
    exec s in globals()
    if (abs(np.log(3) - solve()) <= TOLERANCE):
        print('Correct')
    else:
        print('incorrect')

test_solve(s)
1.09861228867
Correct

Next, we need to think about how we could generate an import statement from a code block name, import in python, and run a test function. We can assume that the test code will be in a file called "test_%s.py" on your python path. Here are the contents of test_solve.py.

import numpy as np
TOLERANCE = 1e-5

def solve_solution():
    from scipy. optimize import fsolve
    import numpy as np

    def objective(x):
        return np.exp(x) - 3

    return fsolve(objective, 3)[0]


def test_solve(s):
    exec s in globals()
    if (abs(solve_solution() - solve()) <= TOLERANCE):
        print('Correct!')
    else:
        print('Incorrect')

Now, we can import that, and use the functions. Here is the Python script we need to run to test it.

import test_solve
test_solve.test_solve('''
from scipy. optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)[0]

print solve()''')
1.09861228867
Correct!

Now, an elisp block to do that. One way to do this is to just run a shell command passing the string to a python interpreter. This is a short way away from an Emacs command now.

(let* ((string "import test_solve
test_solve.test_solve('''
from scipy. optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)[0]

print solve()''')"))
  (shell-command-to-string (format "python -c \"%s\"" string)))
1.09861228867
Correct!

Ok, now to wrap it all up in a function we can run from Emacs in a code block to test it. With the cursor in a code block, we get the name, and build the python code, and run it. The function is more complex than I anticipated because I end up running the code block essentially twice, once to get a results block and once to get the test results. For short problems this is not an issue. I also add the test results in a way that is compatible with the current results.

(defun check ()
  (interactive)
  (let* ((src-block (org-element-context))
         (name (org-element-property :name src-block))
         (code (org-element-property :value src-block))
         (end (org-element-property :end src-block))
         (results)
         (template (format "import test_%s
test_%s.test_%s('''%s''')" name name name code))
         (output (format
                  "\n%s\n"
                  (s-join
                   "\n"
                   (mapcar
                    (lambda (s)
                      (if (s-starts-with? ":" s)
                          s
                        (concat ": " s)))
                    (s-split
                     "\n"
                     (shell-command-to-string
                      (format "python -c \"%s\"" template))))))))
    ;; execute block as normal
    (org-babel-execute-src-block)
    ;; and add some output to the Results block
    (if (org-babel-where-is-src-block-result)
        (progn
          (goto-char (org-babel-where-is-src-block-result))
          (setq results (org-element-context))
          ;; delete results line
          (kill-line)
          ;;  delete the results
          (setf (buffer-substring (org-element-property :begin results)
                                  (org-element-property :post-affiliated results))
                "")
          ;; paste results line back
          (yank)
          ;; and the output from your code
          (insert output))
      (message "%s" output))))
check

Now, we use a named src-block so we can call M-x check in it, and check the answer.

from scipy.optimize import fsolve
import numpy as np

def objective(x):
    return np.exp(x) - 3

def solve():
    return fsolve(objective, 3)

print solve()
[ 1.09861229]
Correct!

I would like to be able to provide a solution function that would show a user my solution they were tested against. Python provides the inspect module that can do this. Here is how we get the code in Python.

import inspect
import test_solve

print inspect.getsource(test_solve.solve_solution)
def solve_solution():
    from scipy. optimize import fsolve
    import numpy as np

    def objective(x):
        return np.exp(x) - 3

    return fsolve(objective, 3)[0]

This makes it easy to wrap up a function in emacs that will show this from at src block. We just get the block name, and build the python code and execute it here.

(defun show-solution ()
  (interactive)
  (let* ((src-block (org-element-context))
         (name (org-element-property :name src-block))
         (template (format  "import inspect
import test_%s

print inspect.getsource(test_%s.%s_solution)" name name name)))
    (switch-to-buffer-other-window (get-buffer-create "solution"))
    (erase-buffer)
    (insert (shell-command-to-string
             (format "python -c \"%s\"" template)))
    (python-mode)))
show-solution

That summarizes the main features. It allows me to write a test module that has some name conventions to define a solution function, and a test function. Emacs can generate some boilerplate code for different problem names, and run the test to give the user some feedback. Most of the code in this post would not be directly visible to a user, it would be buried in a python module somewhere on the path, and in elisp files providing the glue. I am not sure how much obfuscation you can put in the python files, e.g. just providing byte-compiled code, so it is less easy to just read it. That is not as big a deal when it is just a study guide/feedback system.

From an authoring point of view, this seems pretty good to me. It is feasible I think to write an org-source document like this with tangling for the test modules, and an export to org that does not have the solutions in it. The only subtle point might be needing to alter Python paths to find the test modules if they aren't installed via something like pip.

I think this is pretty flexible, and could handle problems that take arguments, e.g. write a function that sorts a list. Here is a simple example of that. First we write the test_sort.py file with a solution, and some tests.

def sort_solution(LIST):
    return LIST.sort()

def test_sort(s):
    exec s in globals()
    if sort([3, 4, 2]) == [2, 3, 4]:
        print('passed test 1')
    if sort(['z', 'b']) == ['b', 'z']:
        print('passed test 2')
def sort(LIST):
    s = sorted(LIST)
    return s
passed test 1
passed test 2

Maybe it would make sense to use unittests, or nose or some other testing framework if it makes writing the tests easier. Another day.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter