Using results from one code block in another org-mode

| categories: orgmode, emacs, elisp | tags:

One really great feature in org-mode is you have many options to pass data between code-blocks. In this post we look at some of these options using emacs-lisp as the language. This runs in a session where you can keep variables in memory between blocks, and use them in subsequent blocks.

Here we set a variable to a value.

(setq some-variable 42)
42

Then later in another block we can use that variable:

(+ some-variable 1)
43

While you are in the session, some-variable can be used. If you want some mind-bending trouble, the emacs-lisp session is global, and you can access some-variable even in another buffer! Don't do that. When you close emacs this variable will disappear, and all that is left are the results from above.

There is another way to pass information from one block to another using named src blocks and variables in the block header. This allows you to pass data between blocks by name, and you will see later you can even access the results by name from other files.

1 :var

First, we give our src block a name like this:

#+name: block-1
#+BEGIN_SRC emacs-lisp
(current-time-string)
#+END_SRC

When we run this, the results will have a name too.

(current-time-string)
Tue Feb 12 08:19:23 2019

Now, we can use the named result as input to a new block using the :var header.

#+BEGIN_SRC emacs-lisp :var input=block-1
(format "We got %S in block-1" input)
#+END_SRC

When we run this block, emacs will run block-1 and put the output in to the variable input which we use inside the code block.

(format "We got %S in block-1" input)
We got "Tue Feb 12 08:20:44 2019" in block-1

Some things to note:

  1. Every time you run this, block-1 gets rerun.
  2. The results in this block are not the same as in block-1
  3. The results in block-1 are not changed when you run the second block.

You may not want to rerun block-1 each time; maybe it is an expensive calculation, or maybe it should not be changed. You can prevent this behavior by using the :cache header.

2 :cache

If you specify :cache yes then org-mode should store a hash of the code block with the results, and if the code block hasn't changed then it should not run again.

#+name: block-2
#+BEGIN_SRC emacs-lisp :cache yes
(current-time-string)
#+END_SRC
(current-time-string)
Tue Feb 12 08:06:22 2019

Now, we use block-2 as input to a block, we see the output is the same as the output from block-2.

(format "We got %S in block-2" input)
We got "Tue Feb 12 08:06:22 2019" in block-2

Ok, but what if my results are too large to put in the buffer, or too complex for text? You still have some options.

3 :wrap

Suppose we generate some json in one block, and we want to use it in another block. We still want to see the json in the buffer as an intermediate result. We can wrap the output in a json block like this.

(require 'json)
(json-encode `(("date" . ,(current-time-string))))

{"date":"Tue Feb 12 08:30:20 2019"}

Then, we can simply input that output into a new block.

(format "We got %S in json" input)
We got "{\"date\":\"Tue Feb 12 08:30:20 2019\"}
" in json

This admittedly still pretty simple, text-based data. It is probably not a good idea to do this with binary data.

Note you can refer to this result even in another org-file:

#+BEGIN_SRC emacs-lisp :var input=./2019-02-12.org:json
input
#+END_SRC

#+RESULTS:
: {"date":"Tue Feb 12 08:30:20 2019"}

4 :file

It may be that your data is too large to conveniently put into your org-file, or maybe it is binary data. No problem, just put it into an external file using the :file header. It looks like this:

#+name: block-3
#+BEGIN_SRC emacs-lisp :cache yes :file block-3
(require 'json)
(json-encode `(("date" . ,(current-time-string))))
#+END_SRC

#+RESULTS[a14d376653bd8c40a0961ca95f21d8837dddec66]: block-3
[[file:block-3]]

Note that you have to provide a file name for this. Sometimes that is nice if you want a human recognizable file to send to someone, but it would also be nice if there was an automatic naming scheme, e.g. based on an sha-1 hash of the src block.

(require 'json)
(json-encode `(("date" . ,(current-time-string))))

block-3

Now you can use other tools to check out the file. Here we can still use simple shell tools.

cat block-3

The output of block-3 is a file name:

input
/Users/jkitchin/Box Sync/kitchingroup/jkitchin/journal/2019/02/12/block-3

So you can use it in a new block to read the data in, and then do something new with it.

(with-temp-buffer
  (insert-file-contents input)
  (format "We got %S in block-3" (json-read-from-string (buffer-string))))
We got ((date . "Tue Feb 12 08:46:55 2019")) in block-3

5 "remote" data

The blocks do not have to be in order. If you want, you can put your blocks in an appendix, and then just have analysis blocks here that use them. That way, you can have short blocks here that are more readable, but longer, more complex blocks elsewhere that do not clutter your document.

(with-temp-buffer
  (insert-file-contents input)
  (format "We got %S in the appendix data" (json-read-from-string (buffer-string))))
We got "{\"date\":\"Tue Feb 12 09:11:12 2019\"}" in the appendix data

6 Manually saving data in files

Note you can also manually save data in a file, for example:

(require 'json)
(let ((f "block-4.json"))
  (with-temp-file f
    (prin1
     (json-encode `(("date" . ,(current-time-string))))
     (current-buffer)))
  f)
block-4.json

We put the filename as the last variable which is returned by the block, so that we don't have to manually type it later in the next block. You know, try not to repeat yourself…

This just shows we did write out to our file:

cat block-4.json

And we read the file in here, using the filename from block-4 as an input variable.

(with-temp-buffer
  (insert-file-contents input)
  (format "We got %S in block-4" (json-read-from-string (buffer-string))))
We got "{\"date\":\"Tue Feb 12 08:51:25 2019\"}" in block-4

7 An appendix for data

(require 'json)
(let ((f "appendix.json"))
  (with-temp-file f
    (prin1
     (json-encode `(("date" . ,(current-time-string))))
     (current-buffer)))
  f)
appendix.json

8 Caveats

Using org-mode like this is almost always finding the right tradeoffs in what is persistent, and where is it stored. Not all of the intermediate data/calculations are stored; if they are really cheap you can just run the code blocks again. If they are really small, i.e. easy for your to read in a few lines, you can store them in the document. If they are really large, you can store them in a file.

The beauty of having everything in an org-file is you have a single file that is easy to transport. When the files get too large though, it can become impractical, e.g. emacs may slow down if you try to put thousands of lines of xml data into the buffer. Then, you have to make some decisions about what to keep, where to keep it, and in what form to keep it.

For short projects where you only need a single compute session, having everything in memory may be fine. For longer projects, say one that is long enough you will close all the buffers, and possibly restart emacs in between working on it, then you have to make some decisions about what to save from each block so you can continue the work in the next session. Again, you have to decide what to save, where to save, and in what form.

Once you start saving data outside the org-file, it becomes less portable, or more tricky to move the file because you need to also move all the data files to keep it intact. I have explored a concept of making an org-archive in the past, where you get a list of all files linked in the org-file, but this so far has just been worked out for some small proof of concept ideas.

Not all languages are the same in org-mode. They do not all support sessions for example, and they may not all work like the examples here. The scimax iPython modifications do not behave like the examples above. That is probably due to bugs I have inadvertently introduced, and in the future I will try to make it work like emacs-lisp does above.

Overall, org-mode has one of the most flexible and powerful systems for passing and reusing data in documents I have ever seen. It is not perfect, and in such a powerful system there are many unexplored or lightly traveled corners that may have hazards in them. It still seems pretty promising though.

Copyright (C) 2019 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.2.1

Discuss on Twitter

f-strings in emacs-lisp

| categories: emacs, elisp | tags:

I am a big fan of f-strings in Python 3. They let you put variable names and expressions in a string template that get expanded to create new strings. Here is a simple example of using those:

username = 'John Kitchin'
somevar = 5**0.5
print(f'{username:30s}{somevar:1.2f}')
John Kitchin                  2.24


String formatting in emacs-lisp is by comparison not as fun and easy. Out of the box we have:

(let ((username "John Kitchin")
      (somevar (sqrt 5)))
  (format "%-30s%1.2f" username somevar))
John Kitchin                  2.24

That is still three lines of code, but it is ugly and hard to read like the old python code:

print('%-30s%1.2f' % (username, somevar))
John Kitchin                  2.24


My experience has shown that this gets harder to figure out as the strings get larger, and f-strings are easier to read.

The wonderful 's library provides some salvation for emacs-lisp, if you don't want the format fields. You can refer to variables in a lexical environment like this.

(let ((username "John Kitchin")
      (somevar (sqrt 5)))
  (s-lex-format "${username}${somevar}"))
John Kitchin2.23606797749979

Today, I decided to do something about this, and wrote this little macro. It is a variation on s-lex-format that introduces a slightly new syntax. You can now add an optional format field separated from the variable name by a space.

(defmacro f-string (fmt)
  "Like `s-format' but with format fields in it.
FMT is a string to be expanded against the current lexical
environment. It is like what is used in `s-lex-format', but has
an expanded syntax to allow format-strings. For example:
${user-full-name 20s} will be expanded to the current value of
the variable `user-full-name' in a field 20 characters wide.
  (let ((f (sqrt 5)))  (f-string \"${f 1.2f}\"))
  will render as: 2.24
This function is inspired by the f-strings in Python 3.6, which I
enjoy using a lot.
"
  (let* ((matches (s-match-strings-all"${\\(?3:\\(?1:[^} ]+\\) *\\(?2:[^}]*\\)\\)}" fmt))
         (agetter (cl-loop for (m0 m1 m2 m3) in matches
                        collect `(cons ,m3  (format (format "%%%s" (if (string= ,m2 "")
                                                                      (if s-lex-value-as-lisp "S" "s")
                                                                   ,m2))
                                                  (symbol-value (intern ,m1)))))))

    `(s-format ,fmt 'aget (list ,@agetter))))
f-string

Here it is in action.

(let ((username "John Kitchin")
      (somevar (sqrt 5)))
  (f-string "${username -30s}${somevar 1.2f}"))
John Kitchin                  2.24

It still lacks some of the capability of f-strings in python, e.g. in Python, arguments inside the template to be expanded get evaluated. The solution used above is too simple for that, since it just used a regexp and is limited to the value of variables in the lexical environment.

print(f'{5**0.5:1.3f}')
2.236


Nevertheless, this simple solution matches what I do most of the time anyway, so I still consider it an improvement!

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.13

Discuss on Twitter

Caching searches using biblio and only seeing new results

| categories: biblio, arxiv, elisp | tags:

In this issue in scimax, Robert asked if it was possible to save searches, and then to repeat them every so often and only see the new results. This needs some persistent caching of the records, and a comparison of the current search results with the previous search results.

biblio provides a nice interface to searching a range of resources for bibliographic references. In this post, I will focus on arxiv. Out of the box, biblio does not seem to support this use case, but as you will see, it has many of the pieces required to achieve it. Let's start picking those pieces apart.

(require 'biblio)
biblio

Here is the first piece we need: a way to run a query, and get results back as a data structure. Here we just look at the first result.

(let* ((query "alloy segregration")
       (backend 'biblio-arxiv-backend)
       (cb (url-retrieve-synchronously (funcall backend 'url query)))
       (results (with-current-buffer cb
                  (funcall backend 'parse-buffer))))
  (car results))
((doi . "10.1103/PhysRevB.76.014112")
 (identifier . "0704.2752v2")
 (year . "2007")
 (title . "Modelling Thickness-Dependence of Ferroelectric Thin Film Properties")
 (authors nil nil nil nil nil nil nil nil nil nil nil nil nil "L. Palova" nil "P. Chandra" nil "K. M. Rabe" nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil)
 (container . "PRB 76, 014112 (2007)")
 (category . "cond-mat.mtrl-sci")
 (references "10.1103/PhysRevB.76.014112" "0704.2752v2")
 (type . "eprint")
 (url . "https://doi.org/10.1103/PhysRevB.76.014112")
 (direct-url . "http://arxiv.org/pdf/0704.2752v2"))

Next, we need a database to store the results in. I will just use a flat file database with a file for each record. The filename will be the md5 hash of the doi or the record itself. Why is that a good idea? Well, the doi is a constant, so if it exists the md5 will also be a constant. The doi itself is not a good filename in general, but the md5 is. The md5 of the record itself will be fragile to any changes, so if it has a doi, we should use it. If it doesn't and later gets one, we should see it again since that could mean it has been published. Also, if it changes because of some new version we might want to see it again. In any case, the existence of that file will be evidence we have seen that record before, and will indicate we need to remove it from the current view.

The flat file database is not super inspired. It is modeled a little after elfeed, but other solutions might work better for large sets of records, but this approach will work fine for this post.

Here is a function that returns nil if the record has been seen, and if not, saves the record and returns it.

(defvar db-dir "~/.arxiv-db/")

(unless (f-dir? db-dir) (make-directory db-dir t))

(defun unseen-record-p (record)
  "Given a RECORD return it if it is unseen.
Also, save the record so next time it will be marked seen. A
record is seen if we have seen the DOI or the record as a string
before."
  (let* ((doi (cdr (assoc 'doi record)))
         (contents (with-temp-buffer
                     (prin1 record (current-buffer))
                     (buffer-string)))
         (hash (md5 (or doi contents)))
         (fname (expand-file-name hash db-dir)))

    (if (f-exists? fname)
        nil
      (with-temp-file fname
        (insert contents))
      record)))
unseen-record-p

Now we can use that as a filter that saves records by side effect.

(defun scimax-arxiv (query)
  (interactive "Query: ")

  (let* ((backend 'biblio-arxiv-backend)
         (cb (url-retrieve-synchronously (funcall backend 'url query)))
         (results (-filter 'unseen-record-p (with-current-buffer cb
                                              (funcall backend 'parse-buffer))))
         (results-buffer (biblio--make-results-buffer (current-buffer) query backend)))
    (with-current-buffer results-buffer
      (biblio-insert-results results ""))
    (pop-to-buffer results-buffer)))

(scimax-arxiv "alloy segregation")
#<buffer *arXiv search*>

Now, when I run that once I see something like this:

and if I run it again:

(scimax-arxiv "alloy segregation")
#<buffer *arXiv search*>

Then the buffer is empty, since we have seen all the entries before.

Here are the files in our database:

ls ~/.arxiv-db/

Here are the contents of one of those files:

(with-temp-buffer
 (insert-file-contents "~/.arxiv-db/18085fe2512e15d66addc7dfb71f7cd2")
 (read (buffer-string)))
((doi) (identifier . 1101.3464v3) (year . 2011) (title . Characterizing Solute Segregation and Grain Boundary Energy in a Binary
  Alloy Phase Field Crystal Model) (authors nil nil nil nil nil nil nil nil nil nil nil nil nil Jonathan Stolle nil Nikolas Provatas nil nil nil nil nil nil nil nil nil nil nil) (container) (category . cond-mat.mtrl-sci) (references nil 1101.3464v3) (type . eprint) (url . http://arxiv.org/abs/1101.3464v3) (direct-url . http://arxiv.org/pdf/1101.3464v3))

So, if you need to read this in again later, no problem.

Now, what could go wrong? I don't know much about how the search results from arxiv are returned. For example, this query returns 10 hits.

(let* ((query "alloy segregration")
       (backend 'biblio-arxiv-backend)
       (cb (url-retrieve-synchronously (funcall backend 'url query)))
       (results (with-current-buffer cb
                  (funcall backend 'parse-buffer))))
  (length results))
10

There is just no way there are only 10 hits for this query. So, there must be a bunch more that you get by either changing the requested number in some argument, or by using subsequent queries to get the rest of them. I don't know if there are more advanced query options with biblio, e.g. to find entries newer than the last time it was run. On the advanced search page for arxiv, it looks like there is only a by year option.

This is still a good idea, and a lot of the pieces are here,

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.6

Discuss on Twitter

Overloading mathematical operators in elisp

| categories: emacs, elisp | tags:

Table of Contents

In Python I am used to some simple idioms like this:

print([1, 2, 3] * 2)
print("ab" * 3)

[1, 2, 3, 1, 2, 3] ababab

There is even such fanciness as defining operators for objects, as long as they have the appropriate dunder methods defined:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point ({}, {})".format(self.x, self.y)

    def __mul__(self, a):
        return Point(self.x * a, self.y * a)

    def __rmul__(self, a):
        return Point(self.x * a, self.y * a)
    
p = Point(1, 1)
print(p * 2)
print(3 * p)

Point (2, 2) Point (3, 3)

Out of the box, these things are not possible in elisp. Operators like * in elisp only take numbers or markers. We have a few options to change this. The worst option is to simply redefine these functions. That is bad because it is not reversible. We could define new functions that have the behavior we want, but then we lose the semantic meaning of "*" that we were aiming for. A better option is to advise these functions. This is reversible, because you can later unadvise them. Today we look at some strategies to do this.

We will use "around" advise because it will let us bypass the original intent of the function when we want to, or use it when we do. First, we create a function that will be the advice and add it to the * function. This first draft won't actually change the behavior of *; if all the args are numbers or markers it will simply use the original function as before.

(require 'dash)

(defun *--*-around (orig-fun &rest args)
  "if every arg is a number do *, else do something else."
  (cond
   ((-every? (lambda (x) (or (numberp x) (markerp x))) args)
    (apply orig-fun args))))

(advice-add '* :around #'*--*-around)

Let's just confirm

(* 1 2 3)
6

Now, we can start modifying our function to handle some other cases. Let's do the list and string first. The * function is variadic, but in these cases it makes sense to limit to two arguments. We need two cases for each type since we can write (* 2 list) or (* list 2). We also should create a fall-through case that raises an error to alert us we can't multiply things.

(defun *--*-around (orig-fun &rest args)
  "if every arg is a number do *, else do something else."
  (cond
   ;; The original behavior
   ((-every? (lambda (x) (or (numberp x) (markerp x))) args)
    (apply orig-fun args))

   ;; create repeated copies of list
   ((and (listp (first args))
         (integerp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (second args) append (copy-list (first args))))

   ((and (integerp (first args))
         (listp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (first args) append (copy-list (second args))))

   ;; Make repeated string
   ((and (stringp (first args))
         (integerp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (second args) concat (first args)))

   ((and (integerp (first args))
         (stringp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (first args) concat (second args)))

   (t
    (error "You cannot * %s" args))))
*--*-around

Here is the new advice in action.

(list
 (* '(a b) 2)
 (* 2 '(c d))
 (* 2 "ab")
 (* "cd" 2))
(a b a b) (c d c d) abab cdcd

That captures the spirit of overloading * for lists and strings. What about that object example? We have to make some assumptions here. Python looks for an uses a dunder mul method. We will assume a double dash method (–mul–) in a similar spirit. We have to modify the advice one final time. We just add a condition to check if one of the arguments is an eieio-object, and then call the –mul– function on the arguments.

(defun *--*-around (orig-fun &rest args)
  "if every arg is a number do *, else do something else."
  (cond
   ;; The original behavior
   ((-every? (lambda (x) (or (numberp x) (markerp x))) args)
    (apply orig-fun args))

   ;; create repeated copies of list
   ((and (listp (first args))
         (integerp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (second args) append (copy-list (first args))))

   ((and (integerp (first args))
         (listp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (first args) append (copy-list (second args))))

   ;; Make repeated string
   ((and (stringp (first args))
         (integerp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (second args) concat (first args)))

   ((and (integerp (first args))
         (stringp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (first args) concat (second args)))

   ;; Handle object
   ((or (and (eieio-object-p (first args))
             (numberp (second args)))
        (and (numberp (first args))
             (eieio-object-p (second args))))
    (apply '--mul-- args))

   (t
    (error "You cannot * %s" args))))
*--*-around

Now, we can define a class and the –mul– function and show that our overloaded * function works. Note we can define two signatures of –mul– so it is not necessary to define an –rmul– in this case as it was with Python (although we still create two functions in the end).

(require 'eieio)

(defclass Point ()
  ((x :initarg :x)
   (y :initarg :y)))

(cl-defmethod --mul-- ((p Point) a)
  (Point :x (* (oref p :x) a) :y (* (oref p :y) a)))

(cl-defmethod --mul-- (a (p Point))
  (Point :x (* (oref p :x) a) :y (* (oref p :y) a)))

(cl-defmethod --str-- ((p Point))
  (format "Point (%s, %s)" (oref p :x) (oref p :y)))

(let ((P (Point :x 1 :y 1)))
  (list
   (--str-- (* P 2))
   (--str-- (* 3 P))))
Point (2, 2) Point (3, 3)

That is pretty awesome. Before going on, here is how you remove the advice:

(advice-remove '* '*--*-around)

This example has been pretty instructive. You have to handle overloading for all the intrinsic types. We did lists and strings here; you might also consider vectors. For objects, it looks like we can at least try using a generic method like –mul–. One detail I neglected to consider here is that * is natively variadic. For these special cases, we did not implement variadic versions. This isn't a feature of Python which uses infix notation, so every call is with two arguments. In some cases it might make sense to support variadic args, but that seems like a generally challenging thing to do. While (* "a" 2 3) might be expected to create a string of "aaaaaa", (* "a" 2 '(3)) doesn't make sense at all.

It would be straightforward to extend this to other operators like '+ to concatenate strings, lists and vectors, or '- to remove chars or elements, including extensions to objects using double-dash functions like –add–, –subtract–, etc. Another nice idea might be to advise print to use –str– on objects.

On the surface this looks useful so far. Python defines a lot of dunder methods that cover all kinds of scenarios including logical comparisons, bit shifting, mod, incrementing operators, casting, comparisons, right/left operations, indexing and assignment, length and others. That would be a lot of advices. This approach is moderately tedious to expand though; you have to keep adding conditional cases.

An alternative to the big conditional statement used in the advice might be the use of a generic function. With this approach we define a generic function that just does multiplication by default. Then we define specific cases with specific signatures that are used for lists, strings, objects, etc. That is basically all our conditional above was doing, matching signatures and executing a chunk of code accordingly.

Here is our default case that does the original behavior. We still use advice to apply the function.

(cl-defgeneric generic-multiply (orig-fun &rest args)
  "Generic multiply for when no specific case exists."
  (apply orig-fun args))

(defun *--*-around-generic (orig-fun &rest args)
  (apply 'generic-multiply orig-fun args))

(advice-add '* :around #'*--*-around-generic)

That should just work as usual for regular multiplication.

(* 1 2 3 4)
24

Sure enough it does. Now, we can define a specific method for a string. We need a specialized method for each signature, e.g. pre and post multiplication.

(cl-defmethod generic-multiply ((orig-fun subr) (s string) (n integer))
  (loop for i from 0 below n concat s))

(cl-defmethod generic-multiply ((orig-fun subr) (n integer) (s string))
  (loop for i from 0 below n concat s))

(list
 (* "Ac" 2)
 (* 2 "Ad"))
AcAc AdAd

That works fine, and we did not have to modify our original advice function at all! Next the list:

(cl-defmethod generic-multiply ((orig-fun subr) (L list) (n integer))
  (loop for i from 0 below n append (copy-list L)))

(cl-defmethod generic-multiply ((orig-fun subr) (n integer) (L list))
  (loop for i from 0 below n append (copy-list L)))

(list (* '(1 2) 2)
      (* 2 '(3 4)))
1 2 1 2
3 4 3 4

That also works fine. Last, our class example. This should work on all objects I think (unless there is some way to make classes that do not inherit the default superclass).

(cl-defmethod generic-multiply ((orig-fun subr) (n integer) (obj eieio-default-superclass))
  (--mul-- n obj))

(cl-defmethod generic-multiply ((orig-fun subr) (obj eieio-default-superclass) (n integer))
  (--mul-- n obj))

(let ((P (Point :x 1 :y 1)))
  (list
   (--str-- (* P 2))
   (--str-- (* 3 P))))
Point (2, 2) Point (3, 3)

This is a much better approach to extending the multiplication operator! If I continue this path in the future I would probably take this one. This could be useful to make elisp more like some more popular contemporary languages like Python, as well as to add linear algebra like notation or mathematical operations on objects in elisp. It kind of feels like these operations ought to be generic functions to start with to make this kind of overloading easier from the beginning. Functions like "*" are currently defined in the C source code though, maybe for performance reasons. It is not obvious what the consequences of making them generic might be.

1 Addendum

Christopher Wellons pointed out an important limitation of advice: they don't work on byte-compiled functions. Let's see what he means. Here is a simple function that will just multiply a Point object by an integer:

(defun to-be-bytten (p1 n)
  (* p1 n))
to-be-bytten

Here it is in action, and here it works fine.

(to-be-bytten (Point :x 1 :y 1) 2)
[eieio-class-tag--Point 2 2]

Now, let's byte-compile that function and try it again:

(byte-compile 'to-be-bytten)

(condition-case err
    (to-be-bytten (Point :x 1 :y 1) 2)
  ((error r)
   (message "Doh! Christopher was right. It did not work...\n%s" err)))
Doh! Christopher was right. It did not work...
(wrong-type-argument number-or-marker-p [eieio-class-tag--Point 1 1])

So the advice is pretty limited since most of the functions in Emacs core are likely to be byte-compiled, and it might mean you have to redefine * completely, or define some new function that looks like it. Too bad, the advice was pretty easy!

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.7

Discuss on Twitter

A callable plist data structure for Emacs

| categories: macro, emacs, elisp | tags:

Table of Contents

Emacs lisp has a few data structures that store key-value pairs. Here are some canonical examples of these data structures and the way to get data out of them.

  • a-lists
(let ((data '((key1 . 4)
              (key2 . "tree"))))
  (cdr (assoc 'key2 data)))
tree

  • p-lists
(let ((data '(:key1 4 :key2 "tree")))
  (plist-get data :key2))
tree

  • A hash table
(let ((data #s(hash-table data (key1 4 key2 "tree"))))
  (gethash 'key2 data))
tree

Each of these uses some function to get data out of them. I have been learning about closures today, and realized a way you can make a "callable" data structure using them. In a closure, the data is stored as part of a function. We will use a "let over lambda" with a defalias in a lexical environment to achieve this. I will wrap a p-list with this approach, but it could work with any of the examples above. We will make the function have a few behaviors that allow us to see the whole data structure with no args, to get a value with one arg that is a key, and to set a value if there are more than two args add them as key-val pairs to the data structure. This block binds the function to the symbol "d" which is then a callable function.

(let ((data '(:key1 4 :key2 "tree")))
  (defalias 'd
    (lambda (&rest key-vals)
      (cond
       ;; no args, return data
       ((= 0 (length key-vals))
        data)
       ;; just a key, get val
       ((= 1 (length key-vals))
        (plist-get data (car key-vals)))
       (t
        (loop for key in (-slice key-vals 0 nil 2)
              for val in (-slice key-vals 1 nil 2)
              do
              (plist-put data key val))
        data)))))
d

Now we can use it like to get some data out:

(d :key2)
tree

And add new values like:

(d :key3 "oak")
:key1 4 :key2 tree :key3 oak

You can update a value with this too:

(d :key3 "pine")
:key1 4 :key2 tree :key3 pine

or add multiple values like this:

(d :key4 0 :key5 9)
:key1 4 :key2 tree :key3 pine :key4 0 :key5 9

And see the whole plist with no args:

(d)
:key1 4 :key2 tree :key3 pine :key4 0 :key5 9

Pretty nice! It seems like there ought to be a macro to facilitate creating those. Here is one. This macro basically expands to the same code as above, but for fun I add a default value option.

(defmacro default-dict (var &optional default &rest key-vals)
  "Bind a callable plist to VAR that contains KEY-VALS."
  (let ()
    `(let ((data ',key-vals))
       (defalias ',var
         (lambda (&rest key-vals)
           (message "%s" key-vals)
           (cond
            ;; no args, return data
            ((= 0 (length key-vals))
             data)
            ;; just a key, get val
            ((= 1 (length key-vals))
             (or  (plist-get data (car key-vals)) ,default))
            (t
             (loop for key in (-slice key-vals 0 nil 2)
                   for val in (-slice key-vals 1 nil 2)
                   do
                   (plist-put data key val))
             data)))))))

Here is an instance of it.

(default-dict d2 "None" :key1 4 :key2 "tree")
d2

And here it is in use.

(d2 :key1)
4

(d2 :new-key)
None

Not bad. If you come from Python, you might find this style of data structure to be more similar to what you are used to seeing. It sure seems less verbose than the usual plist boilerplate I have used before.

1 An update <2017-04-21 Fri>

One (perhaps undesirable even) feature of the approach above is that it creates a function in the global namespace. This might have unintended consequences with name clashes or shadowing, and if you later use the same variable name for a plist, you would change the function behavior. Here we consider a way to limit the scope of where these functions exist and work. The labels macro provides one way to do this, we just create temporary functions that only exist within a scope. There is a lot of backticking and comma operators in this, and it took quite a few iterations to get it working!

This macro creates temporary functions for each keyword that return the value in the plist.

(defmacro with-dict (key-vals &rest body)
  "A context-manager for a plist where each key is a callable
function that returns the value."
  (declare (indent 1))
  (let* ((g (if (symbolp key-vals)
                (symbol-value key-vals)
              key-vals))
         (keys (-slice g 0 nil 2)))
    `(labels ,(loop for key in keys
                    collect
                    (list key '() `(plist-get ',g  ,key)))
       ,@body)))
with-dict

Here is how we use it:

(with-dict (:a 1 :b 'some-symbol :c 3)
  (:b))
quote some-symbol

We can also use it with variables that hold mappings like this.

(let ((d '(:key1 1 :key2 some-other-symbol :key3 3)))
  (with-dict d
    (format "We got %s" (:key2))))
We got some-other-symbol

That is pretty interesting! In case that looks similar to a context manager in Python, now you know where Python got that idea ;)

Another related idea is to let-bind the values to variables within a scope. We can't use the keywords directly here, so I use some hackery to strip off the colon so it is a regular symbol. That is not quite as nice I guess since you have to remember to remove the : from the symbols in the body of your code.

(defmacro with-plist-vals (plist &rest body)
  "Bind the values of a plist to variables with the name of the keys."
  (declare (indent 1))
  `(let ,(loop for key in (-slice plist 0 nil 2)
               for val in (-slice plist 1 nil 2)
               collect (list (intern
                              (substring (symbol-name key) 1))
                             val))
     ,@body))
with-plist-vals

Here is an example usage.

(with-plist-vals (:a 4 :b 6)
 (* 2 a))
8

Obviously that is just an alternate syntax for the let statement, but it lets you leverage the plist syntax for multiple purposes.

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.6

Discuss on Twitter
Next Page ยป