Emulating Sparql queries in emacs-lisp with pattern matching

| categories: lisp, emacs | tags:

Sqarql is a query language for RDF triples. A triple is a data structure that consists of a (subject predicate object). Sparql lets you query the triples to extract data from them. I have been interested in using these to augment the SQL databases I generate from my org-files to be able to infer relationships between subjects and objects. For example, I could encode relationships into the contact database I use, and then infer new information that is not encoded explicitly. So far though I haven't found a good Sparql database that I can easily integrate into Emacs (or even play around with). I am reading On Lisp these days and chapters 18 and 19 talk about destructuring and pattern matching, and I realized these can be used to implement something like Sparql queries on simple lisp data structures. In this post I explore what it looks like and how to do it.

Let's consider a small database of triples that codify relationships between two people. For example, we can codify that Ann is Bob's mother with (Bob mother Ann). Here is our little database.

(setq triples '((Bob mother Ann)
                (Bill father Bob)
                (Lucy mother Jane)
                (Bob wife Jane)))

We can filter out facts from the database with a -filter. Here we filter out triples about Bob. Emacs has nice pattern matching in the pcase macro (see http://www.wilfred.me.uk/blog/2017/03/19/pattern-matching-in-emacs-lisp/ and http://newartisans.com/2016/01/pattern-matching-with-pcase/ for example). It turns out this is an amazing way to solve this problem. Here we look at triples with the pattern that they start with Bob.

(-filter (lambda (triple) (pcase triple (`(Bob ,_ ,_) t))) triples)

And here we get all the mothers.

(-filter (lambda (triple) (pcase triple (`(,_ mother ,_) t))) triples)

We can infer some facts about these people from the database by using some "rules". For example, there is not an entry that tells us directly who Bill's grandmother is. If we assume that the mother of a person's father is their grandmother, then we can infer Bill's grandmother is Ann. In this post, we examine how to write code that can find that answer. We will use pattern matching on pairs of triples to do it.

We can enumerate pairs of triples, and use pattern matching to find the pair of triples that meet the criteria we specify. The criteria we want to match is (in pseudo-sparql):

(Bill father ?dad) (?dad mother ?grandmother)

In other words, we want to find a triple that contains Bill as the subject, father as the predication, and then his father will be the object, and then find another triple that matches a predicate of mother with the subject equal to the father object we found in the first triple, and the object of the second triple will be Bill's grandmother. We enumerate pairs of triples for the comparison. Here is a way to do that. It is not a very efficient way to do it; it would be better to first filter out the triples that match (Bill father something) and then filter out the triples that match (anything mother any other thing) and then consider the pairs of those triples. I will save that for another day; efficiency is not the point today ;)

(loop for i below (length triples)
      append
      (loop
       for j below (length triples)
       if (not (= i j))
       collect
       (list (nth i triples) (nth j triples))))

You can see the pair that matches is the fourth one down (actually the first one matches too, but not exactly in the order of the pattern we specified). Next, we use pcase for the pattern matching. This amazing macro allows you to specify a pattern in terms of reusable variables so we can specify that the same value exists in multiple places. We will use this pattern (in pcase syntax):

`((Bill father ,dad) (,dad mother ,grandmother))

That means match a list that has the first element of (Bill father something) and store the value of something in the variable dad. The second element of the list must match (something mother another thing) and store the value of another thing in the variable grandmother. The two variables dad and grandmother are then available in the body of the pcase statement. Here is the code to loop over the triples and return the result when we find a match.

(catch 'result
  (loop for i below (length triples)
        do
        (loop
         for j below (length triples)
         if (not (= i j))
         collect
         (pcase (list (nth i triples) (nth j triples))
           (`((Bill father ,dad) (,dad mother ,grandmother))
            (throw 'result (format "Bill's dad is %s and his grandmother is %s" dad grandmother)))))))

Not bad. It would be worthwhile to encapsulate that into a macro perhaps, so you could just write something like this:

(select (dad grandmother) from triples where `((Bill father ,dad) (,dad mother ,grandmother)))

For fun I implemented a limited version of this below. It is fairly limited, and lightly tested. The biggest limitation is we hard-coded the search over pairs of triples. This version searches by brute force too, because I don't know how to build in filtering yet. It is another exercise for another day to remove these limitations. Here I just want to try out the macro with the syntactic sugar of "from" and "where" (which are totally ignored) as well at the backquoted query.

(defmacro select (&rest args)
  (let ((values (first args))
        (db (third args))
        (query (fifth args)))
    `(catch 'result
       (loop for i below (length ,db)
             do
             (loop
              for j below (length ,db)
              if (not (= i j))
              do
              (pcase (list (nth i triples) (nth j triples))
                (,query
                 (throw 'result (list ,@values)))))))))

Here is a fun way to write the query that finds the grandmother of the person named Bill with variable capture.

(select (person dad grandmother) from triples
        where `((,(and person (let Bill person)) father ,dad) (,dad mother ,grandmother)))

We can see the grandmother is Ann, as we found before.

Let's have a look at the macro expansion. Clearly our macro hides a lot of work from us!

(macroexpand '(select (person dad grandmother) from triples
        where `((,(and person (let Bill person)) father ,dad) (,dad mother ,grandmother))))
Bill Bob Ann

How about another example query. Who is Lucy's dad? The most direct query would be `(Lucy father ,dad), but a) that fact is not in the database, and b) our select macro won't search a single query anyway. So, let's examine how to find the answer by inference.

Let's assume that Lucy's dad is also the husband of her mother. Let's also assume that we can infer that if we know Jane is the wife of Bob, then Bob is the husband of Jane, and so we can infer from our database that Bob is Lucy's dad. This results in a query on a pair of triples that matches a pattern like:

(Lucy mother ?mom) (?dad wife ?mom)

Here is that query in our select macro.

(select (person mom dad) from triples
        where `((,(and person (let Lucy person)) mother ,mom) (,dad wife ,mom)))

Pretty cool! Clearly there is still a lot to do to make this practical. The implementation I used here wouldn't scale well with large numbers of triples, and its limited to a single kind of query. Chapters 18 and 19 in On Lisp address the query limitation (and they are not even limited to triples) and a different syntax style that is more Sparql like. When I get through them, I will probably add a new post on it. There are a lot of interesting problems to solve here including what to do if there are multiple matches, or inconsistent data? The Sparql select command allows you to group, order and limit the results which would be increasingly useful with larger triple stores. That would definitely add a lot of code to the macro!

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.5

Discuss on Twitter