Emulating Sparql queries in emacs-lisp with pattern matching

| categories: emacs, lisp | tags: | View Comments

Sqarql is a query language for RDF triples. A triple is a data structure that consists of a (subject predicate object). Sparql lets you query the triples to extract data from them. I have been interested in using these to augment the SQL databases I generate from my org-files to be able to infer relationships between subjects and objects. For example, I could encode relationships into the contact database I use, and then infer new information that is not encoded explicitly. So far though I haven't found a good Sparql database that I can easily integrate into Emacs (or even play around with). I am reading On Lisp these days and chapters 18 and 19 talk about destructuring and pattern matching, and I realized these can be used to implement something like Sparql queries on simple lisp data structures. In this post I explore what it looks like and how to do it.

Let's consider a small database of triples that codify relationships between two people. For example, we can codify that Ann is Bob's mother with (Bob mother Ann). Here is our little database.

(setq triples '((Bob mother Ann)
                (Bill father Bob)
                (Lucy mother Jane)
                (Bob wife Jane)))

We can filter out facts from the database with a -filter. Here we filter out triples about Bob. Emacs has nice pattern matching in the pcase macro (see http://www.wilfred.me.uk/blog/2017/03/19/pattern-matching-in-emacs-lisp/ and http://newartisans.com/2016/01/pattern-matching-with-pcase/ for example). It turns out this is an amazing way to solve this problem. Here we look at triples with the pattern that they start with Bob.

(-filter (lambda (triple) (pcase triple (`(Bob ,_ ,_) t))) triples)

And here we get all the mothers.

(-filter (lambda (triple) (pcase triple (`(,_ mother ,_) t))) triples)

We can infer some facts about these people from the database by using some "rules". For example, there is not an entry that tells us directly who Bill's grandmother is. If we assume that the mother of a person's father is their grandmother, then we can infer Bill's grandmother is Ann. In this post, we examine how to write code that can find that answer. We will use pattern matching on pairs of triples to do it.

We can enumerate pairs of triples, and use pattern matching to find the pair of triples that meet the criteria we specify. The criteria we want to match is (in pseudo-sparql):

(Bill father ?dad) (?dad mother ?grandmother)

In other words, we want to find a triple that contains Bill as the subject, father as the predication, and then his father will be the object, and then find another triple that matches a predicate of mother with the subject equal to the father object we found in the first triple, and the object of the second triple will be Bill's grandmother. We enumerate pairs of triples for the comparison. Here is a way to do that. It is not a very efficient way to do it; it would be better to first filter out the triples that match (Bill father something) and then filter out the triples that match (anything mother any other thing) and then consider the pairs of those triples. I will save that for another day; efficiency is not the point today ;)

(loop for i below (length triples)
      append
      (loop
       for j below (length triples)
       if (not (= i j))
       collect
       (list (nth i triples) (nth j triples))))

You can see the pair that matches is the fourth one down (actually the first one matches too, but not exactly in the order of the pattern we specified). Next, we use pcase for the pattern matching. This amazing macro allows you to specify a pattern in terms of reusable variables so we can specify that the same value exists in multiple places. We will use this pattern (in pcase syntax):

`((Bill father ,dad) (,dad mother ,grandmother))

That means match a list that has the first element of (Bill father something) and store the value of something in the variable dad. The second element of the list must match (something mother another thing) and store the value of another thing in the variable grandmother. The two variables dad and grandmother are then available in the body of the pcase statement. Here is the code to loop over the triples and return the result when we find a match.

(catch 'result
  (loop for i below (length triples)
        do
        (loop
         for j below (length triples)
         if (not (= i j))
         collect
         (pcase (list (nth i triples) (nth j triples))
           (`((Bill father ,dad) (,dad mother ,grandmother))
            (throw 'result (format "Bill's dad is %s and his grandmother is %s" dad grandmother)))))))

Not bad. It would be worthwhile to encapsulate that into a macro perhaps, so you could just write something like this:

(select (dad grandmother) from triples where `((Bill father ,dad) (,dad mother ,grandmother)))

For fun I implemented a limited version of this below. It is fairly limited, and lightly tested. The biggest limitation is we hard-coded the search over pairs of triples. This version searches by brute force too, because I don't know how to build in filtering yet. It is another exercise for another day to remove these limitations. Here I just want to try out the macro with the syntactic sugar of "from" and "where" (which are totally ignored) as well at the backquoted query.

(defmacro select (&rest args)
  (let ((values (first args))
        (db (third args))
        (query (fifth args)))
    `(catch 'result
       (loop for i below (length ,db)
             do
             (loop
              for j below (length ,db)
              if (not (= i j))
              do
              (pcase (list (nth i triples) (nth j triples))
                (,query
                 (throw 'result (list ,@values)))))))))

Here is a fun way to write the query that finds the grandmother of the person named Bill with variable capture.

(select (person dad grandmother) from triples
        where `((,(and person (let Bill person)) father ,dad) (,dad mother ,grandmother)))

We can see the grandmother is Ann, as we found before.

Let's have a look at the macro expansion. Clearly our macro hides a lot of work from us!

(macroexpand '(select (person dad grandmother) from triples
        where `((,(and person (let Bill person)) father ,dad) (,dad mother ,grandmother))))
Bill Bob Ann

How about another example query. Who is Lucy's dad? The most direct query would be `(Lucy father ,dad), but a) that fact is not in the database, and b) our select macro won't search a single query anyway. So, let's examine how to find the answer by inference.

Let's assume that Lucy's dad is also the husband of her mother. Let's also assume that we can infer that if we know Jane is the wife of Bob, then Bob is the husband of Jane, and so we can infer from our database that Bob is Lucy's dad. This results in a query on a pair of triples that matches a pattern like:

(Lucy mother ?mom) (?dad wife ?mom)

Here is that query in our select macro.

(select (person mom dad) from triples
        where `((,(and person (let Lucy person)) mother ,mom) (,dad wife ,mom)))

Pretty cool! Clearly there is still a lot to do to make this practical. The implementation I used here wouldn't scale well with large numbers of triples, and its limited to a single kind of query. Chapters 18 and 19 in On Lisp address the query limitation (and they are not even limited to triples) and a different syntax style that is more Sparql like. When I get through them, I will probably add a new post on it. There are a lot of interesting problems to solve here including what to do if there are multiple matches, or inconsistent data? The Sparql select command allows you to group, order and limit the results which would be increasingly useful with larger triple stores. That would definitely add a lot of code to the macro!

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.5

Read and Post Comments

A callable plist data structure for Emacs

| categories: emacs, macro, elisp | tags: | View Comments

Table of Contents

Emacs lisp has a few data structures that store key-value pairs. Here are some canonical examples of these data structures and the way to get data out of them.

  • a-lists
(let ((data '((key1 . 4)
              (key2 . "tree"))))
  (cdr (assoc 'key2 data)))
  • p-lists
(let ((data '(:key1 4 :key2 "tree")))
  (plist-get data :key2))
  • A hash table
(let ((data #s(hash-table data (key1 4 key2 "tree"))))
  (gethash 'key2 data))

Each of these uses some function to get data out of them. I have been learning about closures today, and realized a way you can make a "callable" data structure using them. In a closure, the data is stored as part of a function. We will use a "let over lambda" with a defalias in a lexical environment to achieve this. I will wrap a p-list with this approach, but it could work with any of the examples above. We will make the function have a few behaviors that allow us to see the whole data structure with no args, to get a value with one arg that is a key, and to set a value if there are more than two args add them as key-val pairs to the data structure. This block binds the function to the symbol "d" which is then a callable function.

(let ((data '(:key1 4 :key2 "tree")))
  (defalias 'd
    (lambda (&rest key-vals)
      (cond
       ;; no args, return data
       ((= 0 (length key-vals))
        data)
       ;; just a key, get val
       ((= 1 (length key-vals))
        (plist-get data (car key-vals)))
       (t
        (loop for key in (-slice key-vals 0 nil 2)
              for val in (-slice key-vals 1 nil 2)
              do
              (plist-put data key val))
        data)))))

Now we can use it like to get some data out:

(d :key2)

And add new values like:

(d :key3 "oak")

You can update a value with this too:

(d :key3 "pine")

or add multiple values like this:

(d :key4 0 :key5 9)

And see the whole plist with no args:

(d)

Pretty nice! It seems like there ought to be a macro to facilitate creating those. Here is one. This macro basically expands to the same code as above, but for fun I add a default value option.

(defmacro default-dict (var &optional default &rest key-vals)
  "Bind a callable plist to VAR that contains KEY-VALS."
  (let ()
    `(let ((data ',key-vals))
       (defalias ',var
         (lambda (&rest key-vals)
           (message "%s" key-vals)
           (cond
            ;; no args, return data
            ((= 0 (length key-vals))
             data)
            ;; just a key, get val
            ((= 1 (length key-vals))
             (or  (plist-get data (car key-vals)) ,default))
            (t
             (loop for key in (-slice key-vals 0 nil 2)
                   for val in (-slice key-vals 1 nil 2)
                   do
                   (plist-put data key val))
             data)))))))

Here is an instance of it.

(default-dict d2 "None" :key1 4 :key2 "tree")

And here it is in use.

(d2 :key1)
(d2 :new-key)

Not bad. If you come from Python, you might find this style of data structure to be more similar to what you are used to seeing. It sure seems less verbose than the usual plist boilerplate I have used before.

1 An update <2017-04-21 Fri>

One (perhaps undesirable even) feature of the approach above is that it creates a function in the global namespace. This might have unintended consequences with name clashes or shadowing, and if you later use the same variable name for a plist, you would change the function behavior. Here we consider a way to limit the scope of where these functions exist and work. The labels macro provides one way to do this, we just create temporary functions that only exist within a scope. There is a lot of backticking and comma operators in this, and it took quite a few iterations to get it working!

This macro creates temporary functions for each keyword that return the value in the plist.

(defmacro with-dict (key-vals &rest body)
  "A context-manager for a plist where each key is a callable
function that returns the value."
  (declare (indent 1))
  (let* ((g (if (symbolp key-vals)
                (symbol-value key-vals)
              key-vals))
         (keys (-slice g 0 nil 2)))
    `(labels ,(loop for key in keys
                    collect
                    (list key '() `(plist-get ',g  ,key)))
       ,@body)))

Here is how we use it:

(with-dict (:a 1 :b 'some-symbol :c 3)
  (:b))

We can also use it with variables that hold mappings like this.

(let ((d '(:key1 1 :key2 some-other-symbol :key3 3)))
  (with-dict d
    (format "We got %s" (:key2))))

That is pretty interesting! In case that looks similar to a context manager in Python, now you know where Python got that idea ;)

Another related idea is to let-bind the values to variables withing a scope. We can't use the keywords directly here, so I use some hackery to strip off the colon so it is a regular symbol. That is not quite as nice I guess since you have to remember to remove the : from the symbols in the body of your code.

(defmacro with-plist-vals (plist &rest body)
  "Bind the values of a plist to variables with the name of the keys."
  (declare (indent 1))
  `(let ,(loop for key in (-slice plist 0 nil 2)
               for val in (-slice plist 1 nil 2)
               collect (list (intern
                              (substring (symbol-name key) 1))
                             val))
     ,@body))

Here is an example usage.

(with-plist-vals (:a 4 :b 6)
 (* 2 a))

Obviously that is just an alternate syntax for the let statement, but it lets you leverage the plist syntax for multiple purposes.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.5

Read and Post Comments

A new org-mode exporter to Word for scimax

| categories: emacs, export, orgmode | tags: | View Comments

I am continuing to chip away to getting a reasonable export behavior for org-mode to MS Word. I have previously made some progress with Pandoc here and here, but those solutions never stuck with me. So here is another go. Here I leverage Pandoc again, but use a path through LaTeX to get citations without modifying the org-ref cite link syntax. The code for this can be found here: https://github.com/jkitchin/scimax/blob/master/ox-word.el. The gist is you use org-ref like you always do, and you specify the bibliography style for Pandoc like this:

You can download other csl files at https://www.zotero.org/styles. Then you can simply export the org-doc to a Word document with the key-binding C-c C-e w p.

Here is an example document to illustrate the exporter. I have written about data sharing in catalysis kitchin-2015-examp and surface science kitchin-2015-data-surfac-scien.

Here is an example source block.

%matplotlib inline
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4, 5, 6])

See Ref. fig:line for example. These do not work. That might require additional pre-processing to replace them with numbers.

Here is the Word document that is generated: 2017-04-15.docx

As a penultimate result it might be ok. The references are reasonably formatted, but not compatible with Endnote, or other bibliography manager software. There are still some issues with Figure numbering and cross-references, but it is not too bad. The main benefit of this seems to be that one source generates HTML and the Word document.

Bibliography

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.5

Read and Post Comments

Autoformatting ordinal numbers and fractions in orgmode

| categories: emacs, orgmode | tags: | View Comments

MS Word has a few things I like. One of them is the ability to autoformat things to make an ordinal number string like 1st to the superscripted version 1st while you type or a 1/2 to ½. I thought it would be pretty easy to implement that for org-mode. It turns out it was not so easy!

There does not appear to be a way to specify a regexp pattern as an abbreviation, or an abbrev that starts with a number. What we need for ordinal numbers is to recognize a sequence of numbers followed by "st", "nd", "rd" or "th" followed by a space or punctuation, and then superscript the letters. In case you didn't want the replacement to occur, you should be able to undo it and get back the original string. This addition was a little hard won, so I am sharing the lessons here.

The logic I used is to put a function in the post-self-insert-hook. The function only works in org-mode, when not in a codeblock and when looking back at a regexp that matches a pattern to be replaced. Getting it to undo was trickier than expected. Eventually I worked out that you put an undo boundary in place before the change, and then it seems like you can undo the changes. I created a minor mode so it is easy to toggle this on and off.

Here is the implementation:

(defcustom scimax-autoformat-ordinals t
  "Determines if scimax autoformats ordinal numbers."
  :group 'scimax)

(defun scimax-org-autoformat-ordinals ()
  "Expand ordinal words to superscripted versions in org-mode.
1st to 1^{st}.
2nd to 2^{nd}
3rd to 3^{rd}
4th to 4^{th}"
  (interactive)
  (when (and scimax-autoformat-ordinals
             (eq major-mode 'org-mode)
             (not (org-in-src-block-p))
             (looking-back "\\(?3:\\<\\(?1:[0-9]+\\)\\(?2:st\\|nd\\|rd\\|th\\)\\>\\)\\(?:[[:punct:]]\\|[[:space:]]\\)"
                           (line-beginning-position)))
    (undo-boundary)
    (save-excursion
      (replace-match "\\1^{\\2}" nil nil nil 3))))


(defcustom scimax-autoformat-fractions t
  "Determines if scimax autoformats fractions."
  :group 'scimax)


(defun scimax-org-autoformat-fractions ()
  "Expand fractions to take up space."
  (interactive)
  (when (and scimax-autoformat-fractions
             (eq major-mode 'org-mode)
             (not (org-in-src-block-p))
             (looking-back "\\(?3:\\<\\(1/4\\|1/2\\|3/4\\)\\>\\)\\(?:[[:punct:]]\\|[[:space:]]\\)"
                           (line-beginning-position)))
    (undo-boundary)
    (save-excursion
      (replace-match (cdr (assoc (match-string 3) '(("1/4" . "¼")
                                                    ("1/2" . "½")
                                                    ("3/4" . "¾"))))
                     nil nil nil 3))))

(defun scimax-org-autoformat ()
  "Autoformat functions."
  (scimax-org-autoformat-ordinals)
  (scimax-org-autoformat-fractions))

(define-minor-mode scimax-autoformat-mode
  "Toggle `scimax-autoformat-mode'.  Converts 1st to 1^{st} as you type."
  :init-value nil
  :lighter (" om")
  (if scimax-ordinal-mode
      (add-hook 'post-self-insert-hook #'scimax-org-autoformat nil 'local)
    (remove-hook 'post-self-insert-hook #'scimax-org-autoformat 'local)))

This is now a feature in scimax. This marks the 500th blog post! That is ½ way to 1000. At the current rate of posting, it will be at least 5 years until I hit that!

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.5

Read and Post Comments

A better return in org-mode

| categories: emacs, orgmode | tags: | View Comments

Over on Stackoverflow someone wanted a better return in org-mode. They wanted return to add items in a list (instead of M-Ret). Someone posted a partial solution, and here I improve on it to add new items to lists, new headings after a heading, and new rows to tables. In each case, a double return on an empty item, headline or table row will delete that line, and terminate the list, headlines or table. You can still use M-Ret, and this function falls through to org-return like it did before. You can use a prefix arg to get a regular return if you want one (e.g. you want to press enter on a headline to push it down).

Here is the function. Give it a try. It is a small but helpful addition I think. I have not used it for long, so if you come across issues leave a comment!

(require 'org-inlinetask)

(defun scimax/org-return (&optional ignore)
  "Add new list item, heading or table row with RET.
A double return on an empty element deletes it.
Use a prefix arg to get regular RET. "
  (interactive "P")
  (if ignore
      (org-return)
    (cond

     ((eq 'line-break (car (org-element-context)))
      (org-return-indent))

     ;; Open links like usual, unless point is at the end of a line.
     ;; and if at beginning of line, just press enter.
     ((or (and (eq 'link (car (org-element-context))) (not (eolp)))
          (bolp))
      (org-return))

     ;; It doesn't make sense to add headings in inline tasks. Thanks Anders
     ;; Johansson!
     ((org-inlinetask-in-task-p)
      (org-return))

     ;; checkboxes too
     ((org-at-item-checkbox-p)
      (org-insert-todo-heading nil))

     ;; lists end with two blank lines, so we need to make sure we are also not
     ;; at the beginning of a line to avoid a loop where a new entry gets
     ;; created with only one blank line.
     ((org-in-item-p)
      (if (save-excursion (beginning-of-line) (org-element-property :contents-begin (org-element-context)))
          (org-insert-heading)
        (beginning-of-line)
        (delete-region (line-beginning-position) (line-end-position))
        (org-return)))

     ;; org-heading
     ((org-at-heading-p)
      (if (not (string= "" (org-element-property :title (org-element-context))))
          (progn (org-end-of-meta-data)
                 (org-insert-heading-respect-content)
                 (outline-show-entry))
        (beginning-of-line)
        (setf (buffer-substring
               (line-beginning-position) (line-end-position)) "")))

     ;; tables
     ((org-at-table-p)
      (if (-any?
           (lambda (x) (not (string= "" x)))
           (nth
            (- (org-table-current-dline) 1)
            (org-table-to-lisp)))
          (org-return)
        ;; empty row
        (beginning-of-line)
        (setf (buffer-substring
               (line-beginning-position) (line-end-position)) "")
        (org-return)))

     ;; fall-through case
     (t
      (org-return)))))


(define-key org-mode-map (kbd "RET")
  'scimax/org-return)

Here are a few tests:

  1. numbered item
  2. second item
    1. nested number
    2. second number
  • [ ] check 1
  • [ ] check 2
  • [ ] check 3
an inline task

With some content

1 a subheading

2 another Subheading

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.5

Read and Post Comments

« Previous Page -- Next Page »