Org-mode and ipython enhancements in scimax

| categories: | tags: | View Comments

We have made some improvements to using Ipython in org-mode in the past including:

Today I will talk about a few new features and improvements I have introduced to scimax for using org-mode and Ipython together.

The video for this post might be more obvious than the post:

1 Some convenience functions

There are a few nice shortcuts in the Jupyter notebook. Now we have some convenient commands in scimax to mimic those. My favorites are adding cells above or below the current cell. You can insert a new src block above the current one with (M-x org-babel-insert-block). You can use a prefix arg to insert it below the current block.

# code

# below

# some code


I am particularly fond of splitting a large block into two smaller blocks. Use (M-x org-babel-split-src-block) to do that and leave the point in the upper block. Use a prefix arg to leave the point in the lower block.

# lots of code in large block

# Even more code

# The end of the long block


You can execute all the blocks up to the current point with (M-x org-babel-execute-to-point).

2 ob-ipython-inspect works

In the original ob-ipython I found that ob-ipython-inspect did not work unless you were in special edit mode. That is too inconvenient. I modified a few functions to work directly from the org-buffer. I bind this to M-. in org-mode.

%matplotlib inline
import numpy as np

import matplotlib.pyplot as plt

# Compute areas and colors
N = 150
r = 2 * np.random.rand(N)
theta = 2 * np.pi * np.random.rand(N)
area = 200 * r**2
colors = theta

ax = plt.subplot(111, projection='polar')
c = ax.scatter(theta, r, c=colors, s=area, cmap='hsv', alpha=0.75)


<matplotlib.figure.Figure at 0x114ded710>

3 Getting selective output from Ipython

Out of the box Ipython returns a lot of results. This block, for example returns a plain text, image and latex result as output.

from sympy import *
# commenting out init_printing() results in no output
init_printing()

var('x y')
x**2 + y


2 x + y

We can select which one we want with a new header argument :ob-ipython-results. For this block you can give it the value of text/plain, text/latex or image/png.

var('x y')
x**2 + y


2 x + y

Or to get the image:

var('x y')
x**2 + y


This shows up with pandas too. This block creates a table of data and then shows the first 5 rows. Ipython returns both plain text and html here.

import pandas as pd
import numpy as np
import datetime as dt

def makeSim(nHosps, nPatients):
df = pd.DataFrame()
df['patientid'] = range(nPatients)
df['hospid'] = np.random.randint(0, nHosps, nPatients)
df['sex'] = np.random.randint(0, 2, nPatients)
df['age'] = np.random.normal(65,18, nPatients)
df['race'] = np.random.randint(0, 4, nPatients)
df['cptCode'] = np.random.randint(1, 100, nPatients)
df['rdm30d'] = np.random.uniform(0, 1, nPatients) < 0.1
df['mort30d'] = np.random.uniform(0, 1, nPatients) < 0.2
df['los'] = np.random.normal(8, 2, nPatients)
return df

discharges = makeSim(50, 10000)


patientid hospid sex age race cptCode rdm30d mort30d los 0 0 10 1 64.311947 0 8 False False 8.036793 1 1 6 0 82.951484 1 73 True False 7.996024 2 2 27 1 53.064501 3 95 False False 9.015144 3 3 37 0 64.799128 0 93 False False 10.099032 4 4 46 0 99.111394 2 25 False False 11.711427

patientid hospid sex age race cptCode rdm30d mort30d los
0 0 10 1 64.311947 0 8 False False 8.036793
1 1 6 0 82.951484 1 73 True False 7.996024
2 2 27 1 53.064501 3 95 False False 9.015144
3 3 37 0 64.799128 0 93 False False 10.099032
4 4 46 0 99.111394 2 25 False False 11.711427

We can use the header to select only the plain text output!

import pandas as pd
import numpy as np
import datetime as dt

def makeSim(nHosps, nPatients):
df = pd.DataFrame()
df['patientid'] = range(nPatients)
df['hospid'] = np.random.randint(0, nHosps, nPatients)
df['sex'] = np.random.randint(0, 2, nPatients)
df['age'] = np.random.normal(65,18, nPatients)
df['race'] = np.random.randint(0, 4, nPatients)
df['cptCode'] = np.random.randint(1, 100, nPatients)
df['rdm30d'] = np.random.uniform(0, 1, nPatients) < 0.1
df['mort30d'] = np.random.uniform(0, 1, nPatients) < 0.2
df['los'] = np.random.normal(8, 2, nPatients)
return df

discharges = makeSim(50, 10000)


patientid hospid sex age race cptCode rdm30d mort30d los 0 0 21 0 73.633836 1 38 False False 7.144019 1 1 16 1 67.518804 3 23 False False 3.340534 2 2 15 0 44.139033 0 8 False False 9.258706 3 3 29 1 45.510276 2 5 False False 10.590245 4 4 7 0 52.974924 2 4 False True 5.811064

4 Where was that error?

A somewhat annoying feature of running cells in org-mode is when there is an exception there has not been a good way to jump to the line that caused the error to edit it. The lines in the src block are not numbered, so in a large block it can be tedious to find the line. In scimax, when you get an exception it will number the lines in the src block, and when you press q in the exception traceback buffer it will jump to the line in the block where the error occurred.

print(1)
#raise Exception('Here')
print(2)


1 2

If you don't like the numbers add this to your init file:

(setq ob-ipython-number-on-exception nil)


5 Asynchronous Ipython

I have made a few improvements to the asynchronous workflow in Ipython. We now have a calculation queue, so you can use C-c C-c to execute several blocks in a row, and they will run asynchronously in the order you ran them. While they are running you can continue using Emacs, e.g. writing that paper, reading email, checking RSS feeds, tetris, … This also lets you run all the blocks up to the current point (M-x org-babel-execute-ipython-buffer-to-point-async) or the whole buffer (of Ipython) blocks asynchronously (M-x org-babel-execute-ipython-buffer-async).

To turn this on by default put this in your init file:

(setq org-babel-async-ipython t)


This requires all src blocks to have a name, and running the block will give it a name if you have not named the block. By default we use human-readable names. While the block is running, there will be a link indicating it is running. You can click on the link to cancel it. Running subsequent blocks will queue them to be run when the first block is done.

Here is an example:

import time
time.sleep(5)
a = 5
print('done')

print(3 * a)


15

Occasionally you will run into an issue. You can clear the queue with org-babel-async-ipython-clear-queue.

org-mode source

Org-mode version = 9.0.5

A partial symbolic numeric solver in emacs-lisp

| categories: | tags: | View Comments

I have been exploring ways to use emacs-lisp to express scientific ideas. In this post, we explore a partial symbolic numeric solver in Emacs-lisp. This involves some syntactic developments to more clearly identify something we want to solve for and to then generate the code required to solve it.

In section The Newton solver you can find a simple implementation of a Newton solver in emacs-lisp. This function allows you to numerically solve equations that can be written in the form $$f(x) = 0$$ for $$x$$ given an initial guess. You write a function for $$f(x)$$ and pass the function to the solver. This is a standard approach used in Python with fsolve, for example. Here is an example of solving a trivial problem: $$x - 4 = 0$$ just to check that it works. We use a lambda function for $$f(x) = x - 4 = 0$$. The answer is $$x=4$$.

(newton-f (lambda (x) (- x 4)) 2)


That syntax is not too bad, but we have the whole lambda expression in there, and some repetition of what we want to solve for as an argument and in the function. It would be interesting if we could just have an expression that gets solved, e.g. (newton-f (- x? 4) 2) where x? indicates the thing to solve for.

We can do that! We can take an expression, flatten it and find the variable names that end with ?. We should check that there is only one, but for now we don't. Here is an example that does that. I use a nested expression here just to illustrate that the code finds the x? variable correctly.

(require 'dash)

(let ((body '((* (- x? 4) 1))))
(loop for item in (-flatten body)
if (and (symbolp item) (s-ends-with? "?" (symbol-name item)))
collect item))


So, given an expression we can identify the unknown that should be the argument to a lambda function. So, we create a macro that takes that expression and constructs a function to solve it, then calls newton-f on it. The macro is syntactically useful here because we do not have to quote the expression. Here is that macro.

(defmacro solve (expression guess)
(newton-f
(lambda ,(loop for item in (-flatten expression)
if (and (symbolp item) (s-ends-with? "?" (symbol-name item)))
collect item)
,expression)
,guess))


I call this a partial symbolic solver because we do some introspection symbolically to identify what to solve for, and then construct the code required to solve it. Here is that trivial example (x? - 4 = 0). It just shows we can have some nesting and it still works. I am not so thrilled with the initial guess, but this is an iterative solver, so you either need an initial guess, or a solution range.

(solve (* (- x? 4) 1) 3)


Here is what that expands into:

(macroexpand '(solve (* (- x? 4) 1) 3))


It expands into what we would have written in the first place. The benefit to us is less typing, and a simpler syntax. Both of those reduce the opportunity to make errors!

A more realistic problem might be: Reactant A flows into a continuously stirred tank reactor at a rate of $$F_{A0} = 1$$ mol/min with a volumetric flow of $$v_0 = 1$$ L/min.. The reactor achieves 50% conversion ($$X$$) of A to products. The reaction rate law is known to be $$-r_A = k C_A$$ with $$k = 0.1$$ 1/min. Estimate the volume of the reactor. If you have taken my class in reaction engineering, you know the following facts:

• The exit molar flow is defined by $$F_A = F_{A0} (1 - X)$$
• The exit concentration is $$C_A = F_A / v_0$$
• The mole balance is defined by $$0 = F_{A0} - F_A + r_A V$$

That is all we need; we can solve for $$V$$ from the last equation. This is simple enough you might do the algebra to get: $$V = \frac{F_{A0} - F_A}{-r_A}$$ which can be simply evaluated. We use our solver here and compare it to the evaluation.

Here is the solver:

(let* ((Fa0 1)
(X 0.5)
(Fa (* Fa0 (- 1 X)))
(k 0.1)
(v0 1)
(Ca (/ Fa v0))
(r (* k Ca))
(ra (* r -1)))
(solve (+ Fa0 (* Fa -1) (* ra V?)) 2))


It is pretty hard to imagine doing something like this in Python! It would probably involve parsing a string.

Here is the evaluation from our algebra:

(let* ((Fa0 1)
(X 0.5)
(Fa (* Fa0 (- 1 X)))
(k 0.1)
(v0 1)
(Ca (/ Fa v0))
(r (* k Ca))
(ra (* r -1)))
(/ (- Fa0 Fa) (* -1 ra)))


Within the tolerance specified in newton-f, these are the same.

This is just the tip of the iceberg though. You may have noticed that none of the variables in the let* had any descriptions. Sure, you could put some comments after them, but those are not really part of the code.

Also, we had to define the variables in advance of the expression. That is a limitation of how computers work, they cannot evaluate undefined variables. It constrains how we can express the idea. What if we could instead specify the equation first, then the data? That way we are clear what we are trying to do at a higher level, and fill in the details later. Suppose we wanted a syntax like the block below instead. Here we emphasize the equation we are solving first, and then define the variables and quantities used in the equation, and finally the guess that we use to find the solution.

(solve1
(eqn (+ Fa0 (* -1 Fa) (* ra V?)))
(data ((k 0.1 "rate constant 1/min")
(Ca0 1.0 "feed concentration")
(v0 1 "volumetric flow L/min")
(Fa0 (* v0 Ca0) "Inlet molar flow")
(X 0.5 "Desired conversion")
(Fa (* Fa0 (- 1 X)) "Exit molar flow")
(Ca (/ Fa v0) "exit concentration")
(ra (* -1 k Ca) "rate in the reactor")))
(guess 8))


That is achievable with the solve1 macro below! It too has some limitations, mostly the order of the data block still has to be correct, e.g. you cannot use a variable before it is defined. It would take some serious macro-fu to sort these so that everything is defined in the right order! Still, it allows you to express an executable idea in the order we defined. The strings in this syntax for documenting the variables are ignored, but they could be used in the macro to print useful information or something else you could imagine. You could also make them mandatory to encourage documentation.

(defmacro solve1 (&rest body)
(let ((expression (second (assoc 'eqn body)))
(data (loop for d in (second (assoc 'data body))
collect (list (first d) (second d))))
(guess (second (assoc 'guess body))))
(let* ,data
(newton-f
(lambda ,(loop for item in (-flatten expression)
if (and (symbolp item) (s-ends-with? "?" (symbol-name item)))
collect item)
,expression)
,guess))))


To summarize, lisp macros allow us to rewrite the syntax of code before it is evaluated. This gives us the opportunity to inspect it, and generate new code, e.g. functions with arguments based on the contents of expressions, to save us typing. It also allows us to define ideas in a near arbitrary order that make sense to us, and then rearrange them so they make sense to the computer. See, for example, this post for an example of changing how functions are defined.

This seems to be heading in the domain specific language direction. I think it would be very helpful in engineering problem solving to build up tools like this. They could start out simple for new students to use. They never need to see the macro parts of this, just to learn how to use them for problem solving. These beginner tools would be limited in what they could do to minimize how much lisp is required to be learned so students can focus on the problem solving. Eventually they might outgrow them, and in the process transition to having the full lisp language at their disposal for problem solving.

1 The Newton solver in emacs-lisp

This is an emacs-lisp implementation of Newton's method. It is a simple implementation for a single variable. The tolerance and step-size are hard-coded for this post because we focus on the partial symbolic solver, not the best solver methods.

;; See https://en.wikipedia.org/wiki/Newton%27s_method for the method

(defun newton-f (func x0)
"Solve the equation FUNC(x)=0 using Newton's method.
X0 is an initial guess."
(let* ((tolerance 1e-6)
(x x0)
(dx 1e-6)
fx fpx)
(while (> (abs (funcall func x)) tolerance)
(setq fx (funcall func x)
fpx (/ (- (funcall func (+ x dx)) (funcall func (- x dx))) (* 2 dx))
x (- x (/ fx fpx))))
x))


org-mode source

Org-mode version = 9.0.5

A new and improved Emacs gnuplot DSL

| categories: | tags: | View Comments

A significant limitation of the previous DSL I wrote is that all the plotting commands have to go in one macro. It would be nice to accumulate them in different forms, and when you want to run them all. A classic way to do that in Emacs lisp is to make a global variable, e.g. *gnuplot-cmds* and append commands to it. Then when you want to, run the commands.

A more modern approach is to use a closure to encapsulate the commands. Here is a "let over lambda" that defines a few functions that encapsulate an enclosed variable gnuplot-commands. We define one function to add commands to the list of commands, one to clear the commands, one to generate the gnuplot script as a string, and one to run the program. The enclosed variable gnuplot-commands is basically only accessible by these functions. It is encapsulated, similar to if we defined a class in Python then made an instance of it with an attribute that was accessible only be instance methods. On one hand, this "protects" the variable, and keeps it out of the global namespace. On the other hand, we lose the documentation that would have come with a defvar, and we have to define a function to access the contents of that variable.

(let ((gnuplot-commands '("set terminal qt")))

"Append the command S to gnuplot-cmds."
(setq gnuplot-commands (append gnuplot-commands (list s))))

(defun gnuplot-clear ()
(setq gnuplot-commands '("set terminal qt")))

(defun gnuplot-script ()
(s-join "\n" gnuplot-commands)))


To run the commands, we define this function. It does not need to be in the closure because it only accesses the commands through functions we defined in the closure.

(defun gnuplot-show ()
(let* ((temporary-file-directory ".")
(cmdfile (make-temp-file "gnuplot-cmds-" nil ".gpl"))
(shellcmd (format "gnuplot --persist -c \"%s\"" cmdfile))
(cmds (gnuplot-script)))
(with-temp-file cmdfile
(insert cmds))
(shell-command shellcmd)
(delete-file cmdfile)
cmds))


Last time I noted I had a new idea for the DSL syntax that would give us more flexibility to inject variables and code into the DSL. The idea is to use keywords, symbols that start with :, to indicate they should be replaced by the value of the non-keyword symbol in the environment, and for any form that starts with : to evaluate that form. So, (: - 5 4) would get replaced by 1. Here is the new macro for that.

(defmacro kargs (&rest args)
"Convert symbols to strings, quote strings, and (expr) to what they evaluate to."
(s-join " " (list ,@(cl-mapcan
(lambda (s)
(list
(cond
((keywordp s)
(format "%s"
(symbol-value (intern (substring (symbol-name s) 1)))))
((symbolp s)
(symbol-name s))
((stringp s)
(format "\"%s\"" s))
((and (listp s) (eq : (car s)))
(with-output-to-string
(princ ,(cdr s))))
(t
(format "%s" s)))))
args))))


Now, our gnuplot macro is simpler, since all it does is add commands to the list. If the form is a string, we add it as is, if the form starts with (: stuff) we evaluate the cdr of the form, and otherwise, we pass the form contents to the kargs macro for processing.

(defmacro gnuplot (&rest forms)
(loop for s in (list ,@(mapcar (lambda (x)
(cond
((stringp x)
x)
((and (listp x) (eq : (car x)))
,(cdr x))
(t
(kargs ,@x))))
forms))


What did that gain us? First, we can break up a script so we can talk about it, maybe do some calculations, etc… Let's look at the exmaple at http://gnuplot.sourceforge.net/demo/linkedaxes.html.

(gnuplot-clear)

(gnuplot
(set terminal png)
(set encoding utf8)
(set key outside Left)
(set bmargin 5)
(set tmargin 6)
(set style data lines)
(set tics in)
(set ticslevel 0.5)
(set xlabel  "X-ray energy in eV")

(set format y  \'%5.1fe\')
(set title " Anomalous scattering factors ")
(set xrange  [9000:14400])
(set offset 0\,0\,1.0\,0)
(set xtics nomirror)
(set link x via 12398./x inverse 12398./x)

(set x2label  "X-ray wavelength in Å")
(set x2tics 0.1  format "%.1f Å" nomirror))


We need to download some data files. We can do that, and add another line to the gnuplot script. The escaping on the quotes and commas is especially tedious in this one ;) but, we don't need those pesky line-continuations here.

(shell-command "wget http://www.bmsc.washington.edu/scatter/data/Br.dat")
(shell-command "wget http://www.bmsc.washington.edu/scatter/data/Ta.dat")

(gnuplot
(plot "Br.dat" volatile using 1:3 title \'Br f\"\'  lt 1 lw 3\, \'\' volatile using 1:2 title "Br f'"  lt 1 lw 1\,
"Ta.dat" volatile using 1:3 title \'Ta f\"\' lt 2 lw 3\, \'\' volatile using 1:2 title \"Ta f\'\"  lt 2 lw 1))

(gnuplot-script)


Finally, we can set the output to png, and run our program.

(gnuplot-show)


Looks good.

What about the fancy keyword formatting? Here is an example of that in action. :term gets replaced by the term variable, :png gets replaced by the filename, and :x is replaced by 4.

(gnuplot-clear)
(let ((x 4)
(term "png")
(png "\"polar.png\""))
(gnuplot
(set terminal :term)
(set output :png)
(set polar)
(set dummy t)
(plot sin$$:x *t$$ \,cos$$:x *t$$)
(set offset 0\,0\,0\,0)))

(gnuplot-show)


There are a few nuances I didn't expect. First, you have to escape the parentheses in this case because otherwise it looks like a form that will be ignored. Second, you have to quote the string to get quotes into the gnuplot script. Third, there has to be a space before and after the keywords for emacs to parse it correctly and do the substitution.

Let's look at one last example that uses the (: form). We reproduce a figure from http://gnuplot.sourceforge.net/demo/transparent_solids.html here.

(gnuplot-clear)
(gnuplot
(set terminal pngcairo  background "#ffffff" enhanced font "arial,9" fontscale 1.0 size 512\, 384 )
(set output "transparent-solids.png")
;; construct the title
(set title (: format "\"%s\"" (concat "Interlocking Tori - PM3D surface" "with depth sorting and transparency")))

;; use lisp code to create a gnuplot command
(: concat "unset" " " "border")

(unset key)
(set object 1 rect from screen 0\, 0\, 0 to screen 1\, 1\, 0 behind)
(set object 1 rect fc  rgb \"gray\"  fillstyle solid 1.0  border -1)
(set view 64\, 345\, 1.24375\, 0.995902)
(set isosamples 50\, 20)
(unset xtics)
(unset ytics)
(unset ztics)
(set dummy u\,v)
(set parametric)
(set urange [ -pi : pi ])
(set vrange [ -pi : pi ])

(set style fill  transparent solid 0.30 border)
(set pm3d depthorder)
(set palette rgbformulae 8\, 9\, 7)
(set pm3d interpolate 1\,1 flush begin noftriangles border lt black linewidth 0.500 dashtype solid corners2color mean)
(set colorbox vertical origin screen 0.9\, 0.2\, 0 size screen 0.05\, 0.6\, 0 front  noinvert bdefault)

(splot (: concat "cos(u)+.5*cos(u)*cos(v),sin(u)+.5*sin(u)*cos(v),.5*sin(v) with pm3d,"
"1+cos(u)+.5*cos(u)*cos(v),.5*sin(v),sin(u)+.5*sin(u)*cos(v) with pm3d")))
(gnuplot-show)


Overall this seems like an improvement to the DSL. I didn't invent the idea of reusing keywords this way out of the blue. In On Lisp, Paul graham uses "special" variable names in Chapter 18, where he shows how to use gensyms for special purposes, and also variables with special names like ?x. Even Emacs is using a variation of this idea. Check out this new let-alist macro:

(let-alist '((x . 5))
(+ 1 .x))


There is a special variable inside the body that is a dot-name. The macro expands to provide a value for that symbol. I wonder if I should have tried to use an approach like this instead. Maybe another day. After I read and study the four defuns and single defmacro that make this possible!

You can see here what happens:

(macroexpand '(let-alist '((x . 5))
(+ 1 .x)))


The macro builds up an internal alist for the dot-names.

org-mode source

Org-mode version = 9.0.5

An emacs-lisp dsl for gnuplot

| categories: | tags: | View Comments

Plotting is a pretty general feature we need in scientific work. In this post we examine a way we could get at least minimal plotting into Emacs-lisp with as lispy a syntax as reasonable.

1 Embedding Python or gnuplot

With org-mode we can fluidly integrate many languages in one document. That is not the goal here, where I want to integrate plotting into a program. You certainly could go this route to embed python programs in your lisp programs for plotting.

(defun python (code)
(let* ((temporary-file-directory ".")
(tmpfile (make-temp-file "py-" nil ".py")))
(with-temp-file tmpfile
(insert code))
(shell-command-to-string (format "python %s" tmpfile))))


Here is that function in action.

(python "import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 1)
y = np.exp(x)
plt.plot(x, y, label='data')
plt.title('A Title')
plt.xlim([0, 1])
plt.ylim([1, 2.75])
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.savefig('figpy.png')")


And the corresponding figure:

This is irritating for a few reasons. One is it is annoying to write python programs in string form; you don't get much editor support for indentation or syntax highlighting, and you have to be careful with quotes. It is not that easy to switch that snippet to Python mode either. You are pretty limited in writing programs that expand and modify the code too. Basically you have to do that all by string manipulation.

Along these lines, you could imagine a gnuplot function. It ends up not being much better.

(defun gnuplot (cmds)
(let* ((temporary-file-directory ".")
(cmdfile (make-temp-file "gnuplot-cmds-" nil ".gpl"))
(shellcmd (format "gnuplot --persist -c \"%s\"" cmdfile)))
(with-temp-file cmdfile
(insert cmds))
(shell-command shellcmd)
(delete-file cmdfile)))


You use this the same way.

(gnuplot "set title \"Simple Plots\" font \",20\"
set key left box
set samples 50
set style data points
set terminal png
set output \"gnuplot.png\"

plot [-10:10] sin(x),atan(x),cos(atan(x))")


It has the same limitations as our string-based Python solution. The benefit of them is the native command structure for Python or gnuplot is used, so anything they can do you can too.

2 An alternative approach using a DSL

As an alternative, we consider here a domain specific language (DSL) that maps onto gnuplot. Suppose we could do this instead.

(gnuplot
(set terminal png)
(set output "test.png")
(set title "Simple Plots" font ",20")
(set key left box)
(set samples 50)
(set style data points)

(plot [-10:10] sin$$x$$ \,atan$$x$$ \,cos$$atan\(x$$\)))


Here is the figure from that code. The most annoying part of this is in the plot function we have to escape all the parentheses and commas, but otherwise it looks pretty lispy. The output of that program is the gnuplot commands that were generated for making the plot.

This retains a lot of benefits of programming in lisp. gnuplot has to be a macro though because we do not want to evaluate the s-expressions inside as lisp. For starters they just look lispy, I don't actually use them as lisp at all. Instead we transform them to the gnuplot code.

In the following code, I will develop the gnuplot macro. It has some sticky and tricky points, and it is not obvious it will support all the features of gnuplot, but I learned a lot doing it that I will share here.

Starting with a simple form inside the macro, I wanted to convert (set output "test.png") to "set output \"test.png\"". For this DSL, I want to treat every symbol in the form as if it should be turned into a string, anything that is a string should be quoted, and anything that is in parentheses (i.e. it passes listp) should be evaluated and converted to a string. Then all those strings should be joined by spaces. Here is a macro that does that (adapted from a solution at https://emacs.stackexchange.com/questions/32558/eval-some-arguments-in-a-macro/32570?noredirect=1#comment50186_32570).

There are a couple of corner cases that are handled here. If the arg is a string, we quote it. If the arg is not a symbol or string, then it is evaluated and converted to a string. Importantly, this is done in the run environment though, so we can inject variables into the gnuplot code.

(defmacro gargs (&rest args)
"Convert symbols to strings, quote strings, and (expr) to what they evaluate to."
(s-join " " (list ,@(cl-mapcan
(lambda (s)
(list
(cond
((symbolp s)
(symbol-name s))
((stringp s)
(format "\"%s\"" s))
(t
(with-output-to-string
(princ ,s))))))
args))))


Here are a few examples of how it works. The loop is just to get a vertical table in org-mode for the blog post.

(loop for s in
(list (gargs set key title "before fit" size \, (+ 5 5))
(gargs set title "red")
(gargs set yrange [0:*])
(gargs "5")
(let ((x 6)) (gargs (identity x)))
(gargs 'x)
(gargs '(x))
(gargs set label 1 "plot for [n=2:10] sin(x*n)/n" at graph .95\, graph .92 right))
collect
(list s))


A limitation of this is that we either have quote things like parentheses, commas, semi-colons and sometimes square brackets:

(gargs plot for [n=2:10] sin$$x*n$$/n notitle lw $$13-n$$/2)


Or we have to use the string form instead; we can always fall back to that.

(gargs "plot for [n=2:10] sin(x*n)/n notitle lw (13-n)/2")


The macro above will do the grunt work on each form in the gnuplot macro. Finally, for the gnuplot macro, I want to take all the forms, convert them to gnuplot commands, write them to a temporary file, and then run gnuplot on the file, and finally delete the temp file. I assume we start with a gui terminal so graphs pop up unless you change it in your macro body. Here is that macro. It returns the generated code so it easy to see if you got the right program.

(defmacro gnuplot (&rest forms)
(let* ((temporary-file-directory ".")
(cmdfile (make-temp-file "gnuplot-cmds-" nil ".gpl"))
(shellcmd (format "gnuplot --persist -c \"%s\"" cmdfile))
(cmd-string))
(let ((cmd-string (s-join "\n" (list ,@(mapcar (lambda (x)
(if (stringp x)
x
(gargs ,@x)))
forms)))))
(with-temp-file ,cmdfile
(insert "set terminal qt\n")
(insert cmd-string)
(setq cmd-string (buffer-string)))
(shell-command ,shellcmd)
(delete-file ,cmdfile)
cmd-string)))


Here is a figure adapted from http://gnuplot.sourceforge.net/demo/iterate.html. I use the string form for the last line to avoid escaping all the special characters.

(gnuplot
(set terminal png)
(set output "iteration.png")
(set title "Iteration within plot command")
(set xrange [0:3])
(set label 1 "plot for [n=2:10] sin(x*n)/n" at graph .95\, graph .92 right)
"plot for [n=2:10] sin(x*n)/n notitle lw (13-n)/2")


Here is the resulting figure.

That is overall pretty sweet. There is a little dissonance between the strings, escaped comma, etc.., and it is not terribly ideal for integrating with regular lisp code inside the macro yet. That seems to be a feature of my choice to use (expr) as the syntax to evaluate a form. It means you have to do some gymnastics to get some s-expressions into the graphs. For example below I use a couple of variables to inject values. To get a string I have to use format to add the extra quotes, and to get the number I have to use the identity function. I also used escaped characters in the last line to see the difference.

(let ((ts "Iteration and substitution")
(x0 0)
(xf 3)
(g1 0.95)
(g2 0.92))
(gnuplot
(set terminal png)
(set output "iteration-2.png")
(set title (format "\"%s\"" ts))
;; Note the escaped square brackets
(set xrange $(identity x0) : (identity xf)$)
(set label 1 "plot for [n=2:10] sin(x*n)/n" at graph (identity g1) \, graph (identity g2) right)
;; note here I escaped the parentheses!
(plot for [n=2:10] sin$$x*n$$/n notitle lw $$13-n$$/2)))


3 Summary

For the simple plots here, my DSL worked ok. There is a tradeoff in the syntax I chose that has some consequences. We cannot use the values of symbols in this DSL without resorting to hackery like (identity sym). We also cannot use the infix notation for sin(x) without quoting it as "sin(x)" or escaping the parentheses, e.g. sin$$x$$, likewise square brackets which lisp will read as a vector. Commas have to be escaped, which is probably an emacs-lisp issue. To address that would require a reader macro which emacs-lisp does not have great support for. I am calling this experiment done for now. I have another syntax idea to try out another day.

Here is a preview of what it might look like. It is basically the same but I reuse keywords to indicate that :x0 should be replaced by whatever x0 evaluates to, and (: - 1 0.05) should be evaluated. The special character escaping is still there of course, since that is a limitation of the emacs lisp reader I think. I might try using x0? and (? - 1 0.05) instead. That might be less confusing. I like that the keywords are syntax highlighted for free though, and you can't use them for anything else.

(let ((ts "Iteration and substitution")
(x0 0)
(xf 3)
(g2 0.92))
(gnuplot
(set terminal png)
(set output "iteration-2.png")
(set title :ts)
;; Note the escaped square brackets
(set xrange $:x0 : :xf$)
(set label 1 "plot for [n=2:10] sin(x*n)/n" at graph (: - 1 0.05) \, graph :g2 right)
;; note here I escaped the parentheses!
(plot for [n=2:10] sin(x*n)/n notitle lw (13-n)/2)))


This has the benefit of a little cleaner injection of variables and selective execution of parenthetical expressions, we will just ignore any that don't pass (= (car expr) :). That May not work for sin((: + 1 1) x) though, unless I escape the outer parentheses too.

org-mode source

Org-mode version = 9.0.5

Emulating Sparql queries in emacs-lisp with pattern matching

| categories: | tags: | View Comments

Sqarql is a query language for RDF triples. A triple is a data structure that consists of a (subject predicate object). Sparql lets you query the triples to extract data from them. I have been interested in using these to augment the SQL databases I generate from my org-files to be able to infer relationships between subjects and objects. For example, I could encode relationships into the contact database I use, and then infer new information that is not encoded explicitly. So far though I haven't found a good Sparql database that I can easily integrate into Emacs (or even play around with). I am reading On Lisp these days and chapters 18 and 19 talk about destructuring and pattern matching, and I realized these can be used to implement something like Sparql queries on simple lisp data structures. In this post I explore what it looks like and how to do it.

Let's consider a small database of triples that codify relationships between two people. For example, we can codify that Ann is Bob's mother with (Bob mother Ann). Here is our little database.

(setq triples '((Bob mother Ann)
(Bill father Bob)
(Lucy mother Jane)
(Bob wife Jane)))


We can filter out facts from the database with a -filter. Here we filter out triples about Bob. Emacs has nice pattern matching in the pcase macro (see http://www.wilfred.me.uk/blog/2017/03/19/pattern-matching-in-emacs-lisp/ and http://newartisans.com/2016/01/pattern-matching-with-pcase/ for example). It turns out this is an amazing way to solve this problem. Here we look at triples with the pattern that they start with Bob.

(-filter (lambda (triple) (pcase triple ((Bob ,_ ,_) t))) triples)


And here we get all the mothers.

(-filter (lambda (triple) (pcase triple ((,_ mother ,_) t))) triples)


We can infer some facts about these people from the database by using some "rules". For example, there is not an entry that tells us directly who Bill's grandmother is. If we assume that the mother of a person's father is their grandmother, then we can infer Bill's grandmother is Ann. In this post, we examine how to write code that can find that answer. We will use pattern matching on pairs of triples to do it.

We can enumerate pairs of triples, and use pattern matching to find the pair of triples that meet the criteria we specify. The criteria we want to match is (in pseudo-sparql):

(Bill father ?dad) (?dad mother ?grandmother)


In other words, we want to find a triple that contains Bill as the subject, father as the predication, and then his father will be the object, and then find another triple that matches a predicate of mother with the subject equal to the father object we found in the first triple, and the object of the second triple will be Bill's grandmother. We enumerate pairs of triples for the comparison. Here is a way to do that. It is not a very efficient way to do it; it would be better to first filter out the triples that match (Bill father something) and then filter out the triples that match (anything mother any other thing) and then consider the pairs of those triples. I will save that for another day; efficiency is not the point today ;)

(loop for i below (length triples)
append
(loop
for j below (length triples)
if (not (= i j))
collect
(list (nth i triples) (nth j triples))))


You can see the pair that matches is the fourth one down (actually the first one matches too, but not exactly in the order of the pattern we specified). Next, we use pcase for the pattern matching. This amazing macro allows you to specify a pattern in terms of reusable variables so we can specify that the same value exists in multiple places. We will use this pattern (in pcase syntax):

((Bill father ,dad) (,dad mother ,grandmother))


That means match a list that has the first element of (Bill father something) and store the value of something in the variable dad. The second element of the list must match (something mother another thing) and store the value of another thing in the variable grandmother. The two variables dad and grandmother are then available in the body of the pcase statement. Here is the code to loop over the triples and return the result when we find a match.

(catch 'result
(loop for i below (length triples)
do
(loop
for j below (length triples)
if (not (= i j))
collect
(pcase (list (nth i triples) (nth j triples))
(throw 'result (format "Bill's dad is %s and his grandmother is %s" dad grandmother)))))))


Not bad. It would be worthwhile to encapsulate that into a macro perhaps, so you could just write something like this:

(select (dad grandmother) from triples where ((Bill father ,dad) (,dad mother ,grandmother)))


For fun I implemented a limited version of this below. It is fairly limited, and lightly tested. The biggest limitation is we hard-coded the search over pairs of triples. This version searches by brute force too, because I don't know how to build in filtering yet. It is another exercise for another day to remove these limitations. Here I just want to try out the macro with the syntactic sugar of "from" and "where" (which are totally ignored) as well at the backquoted query.

(defmacro select (&rest args)
(let ((values (first args))
(db (third args))
(query (fifth args)))
(catch 'result
(loop for i below (length ,db)
do
(loop
for j below (length ,db)
if (not (= i j))
do
(pcase (list (nth i triples) (nth j triples))
(,query
(throw 'result (list ,@values)))))))))


Here is a fun way to write the query that finds the grandmother of the person named Bill with variable capture.

(select (person dad grandmother) from triples
where ((,(and person (let Bill person)) father ,dad) (,dad mother ,grandmother)))


We can see the grandmother is Ann, as we found before.

Let's have a look at the macro expansion. Clearly our macro hides a lot of work from us!

(macroexpand '(select (person dad grandmother) from triples

 Bill Bob Ann

How about another example query. Who is Lucy's dad? The most direct query would be (Lucy father ,dad), but a) that fact is not in the database, and b) our select macro won't search a single query anyway. So, let's examine how to find the answer by inference.

Let's assume that Lucy's dad is also the husband of her mother. Let's also assume that we can infer that if we know Jane is the wife of Bob, then Bob is the husband of Jane, and so we can infer from our database that Bob is Lucy's dad. This results in a query on a pair of triples that matches a pattern like:

(Lucy mother ?mom) (?dad wife ?mom)


Here is that query in our select macro.

(select (person mom dad) from triples
where ((,(and person (let Lucy person)) mother ,mom) (,dad wife ,mom)))
`

Pretty cool! Clearly there is still a lot to do to make this practical. The implementation I used here wouldn't scale well with large numbers of triples, and its limited to a single kind of query. Chapters 18 and 19 in On Lisp address the query limitation (and they are not even limited to triples) and a different syntax style that is more Sparql like. When I get through them, I will probably add a new post on it. There are a lot of interesting problems to solve here including what to do if there are multiple matches, or inconsistent data? The Sparql select command allows you to group, order and limit the results which would be increasingly useful with larger triple stores. That would definitely add a lot of code to the macro!