<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <atom:link href="http://kitchingroup.cheme.cmu.edu/blog/feed/index.xml" rel="self" type="application/rss+xml" />
    <title>The Kitchin Research Group</title>
    <link>https://kitchingroup.cheme.cmu.edu/blog</link>
    <description>Chemical Engineering at Carnegie Mellon University</description>
    <pubDate>Sat, 01 Nov 2025 13:47:46 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    
    <item>
      <title>A simple emacs-lisp interface to CRUD operations in mongodb</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2017/01/16/A-simple-emacs-lisp-interface-to-CRUD-operations-in-mongodb</link>
      <pubDate>Mon, 16 Jan 2017 09:44:16 EST</pubDate>
      <category><![CDATA[mongodb]]></category>
      <category><![CDATA[emacs]]></category>
      <category><![CDATA[database]]></category>
      <category><![CDATA[emacslisp]]></category>
      <guid isPermaLink="false">RYjjiQpLtNAakQHa8WQBr3x1-_s=</guid>
      <description>A simple emacs-lisp interface to CRUD operations in mongodb</description>
      <content:encoded><![CDATA[


&lt;div id="table-of-contents"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;div id="text-table-of-contents"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#orgf43163b"&gt;1. Inserting entries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#orgbec2cf8"&gt;2. Finding a document&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org7279625"&gt;3. Updating an entry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org769cf9d"&gt;4. Deleting a document&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org0cb9437"&gt;5. Generic commands&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org512e4fe"&gt;6. A MongoDB contacts database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org97edcbc"&gt;7. Text searching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org04fa0df"&gt;8. Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;
In this &lt;a href="http://kitchingroup.cheme.cmu.edu/blog/2017/01/15/Querying-a-MongoDB-bibtex-database-with-Python-and-emacs-lisp/"&gt;post&lt;/a&gt; I showed that MongoDB is pretty easy to use for simple database applications. I showed a way to get data out of the database that is native to Emacs, but to use Mongo in emacs applications comfortably, it would be really helpful to be able to create, read, update and delete (CRUD) entries. There is a minimal interface to mongodb for emacs-lisp here: &lt;a href="https://github.com/m2ym/mongo-el"&gt;https://github.com/m2ym/mongo-el&lt;/a&gt;. From what I can see, it seems limited to simple, single queries, and it is written with advanced features of emacs-lisp that I do not understand well enough to extend. In the last post, I showed an easy way to use mongoexport to get data from a query out of a mongo database. Here I explore a similar approach to round out the CRUD operations for using emacs-lisp to work with mongodb. This will enable Emacs to easily use MongoDB in applications.
&lt;/p&gt;

&lt;p&gt;
We use the mongo cli with the --eval option, which allows you to run commands on the database. The basic idea is to generate the json we need from a lisp data structure, and use that json in mongo commands as needed. This sounds simple, but below you will see there are plenty of corner cases to take care of.
&lt;/p&gt;

&lt;p&gt;
The goal here is to get something that is pretty functional. It will not be able to support all the capabilities of MongoDB and the options available in the cli.
&lt;/p&gt;

&lt;div id="outline-container-orgf43163b" class="outline-2"&gt;
&lt;h2 id="orgf43163b"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; Inserting entries&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
Here we insert a document into the contacts collection of the contacts database. As in the Python example we considered earlier, this database is automatically created when we run this command. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;require&lt;/span&gt; '&lt;span style="color: #D0372D;"&gt;json&lt;/span&gt;)
(&lt;span style="color: #0000FF;"&gt;let*&lt;/span&gt; ((json (json-encode '((first-name . &lt;span style="color: #008000;"&gt;"John"&lt;/span&gt;)
                            (last-name . &lt;span style="color: #008000;"&gt;"Kitchin"&lt;/span&gt;)
                            (email . &lt;span style="color: #008000;"&gt;"jkitchin@cmu.edu"&lt;/span&gt;))))
       (cmd (format &lt;span style="color: #008000;"&gt;"mongo 127.0.0.1/contacts --quiet --eval 'db.contacts.insert(%s)'"&lt;/span&gt;
                    json)))
  (shell-command-to-string cmd))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
json
&lt;/pre&gt;

&lt;p&gt;
Here is a function we can use for inserting, and as you can see it works for multiple inserts too. There is a limit on how long the json string can be for this, so you cannot add too many entries at once with this. I do not know what the limit is, and suspect it is related to using a shell command. When this succeeds there is data returned about what happened, which we try to get in lisp form. Also, I noticed I had to do a little bit of escaping, especially for entries containing a single quote, which messes up the quoting on the shell command, and for non-ascii characters which the shell did not handle well. Maybe this could be avoided with a file-based approach, or if we used a pipe to a process.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;defun&lt;/span&gt; &lt;span style="color: #006699;"&gt;mongo-insert&lt;/span&gt; (db collection document)
  &lt;span style="color: #036A07;"&gt;"Insert into DB.COLLECTION the DOCUMENT.&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;DOCUMENT will be some lisp structure that is converted to json."&lt;/span&gt;
  &lt;span style="color: #8D8D84;"&gt;;; &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;we have to escape any single quotes. This came from&lt;/span&gt;
  &lt;span style="color: #8D8D84;"&gt;;; &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;http://stackoverflow.com/questions/1250079/how-to-escape-single-quotes-within-single-quoted-strings&lt;/span&gt;
  (&lt;span style="color: #0000FF;"&gt;let*&lt;/span&gt; ((json (replace-regexp-in-string &lt;span style="color: #008000;"&gt;"'"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"'\"'\"'"&lt;/span&gt; (json-encode document)))
         &lt;span style="color: #8D8D84;"&gt;;; &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;it seems non-ascii characters may cause issues. Let's just remove them.&lt;/span&gt;
         (json (replace-regexp-in-string &lt;span style="color: #008000;"&gt;"[&lt;/span&gt;&lt;span style="color: #008000;"&gt;^&lt;/span&gt;&lt;span style="color: #008000;"&gt;[:ascii:]]"&lt;/span&gt; &lt;span style="color: #008000;"&gt;""&lt;/span&gt; json))
         (cmd (format &lt;span style="color: #008000;"&gt;"mongo %s --quiet --eval 'db.%s.insert(%s)'"&lt;/span&gt;
                      db collection
                      json))
         (output (shell-command-to-string cmd)))
    (&lt;span style="color: #0000FF;"&gt;cond&lt;/span&gt;
     ((string-match &lt;span style="color: #008000;"&gt;"BulkWriteResult("&lt;/span&gt; output)
      (json-read-from-string (substring output 16 -2)))
     ((string-match &lt;span style="color: #008000;"&gt;"WriteResult("&lt;/span&gt; output)
      (json-read-from-string (substring output 12 -2)))
     (t
      output))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
mongo-insert
&lt;/pre&gt;

&lt;p&gt;
Here it is in action.
&lt;/p&gt;
&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-insert &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt;
              '(((first-name . &lt;span style="color: #008000;"&gt;"John"&lt;/span&gt;)
                 (last-name . &lt;span style="color: #008000;"&gt;"Kitchin"&lt;/span&gt;)
                 (email . &lt;span style="color: #008000;"&gt;"jkitchin@cmu.edu"&lt;/span&gt;))
                ((first-name . &lt;span style="color: #008000;"&gt;"Someone"&lt;/span&gt;)
                 (last-name . &lt;span style="color: #008000;"&gt;"Else"&lt;/span&gt;)
                 (&lt;span style="color: #008000;"&gt;"email"&lt;/span&gt; . &lt;span style="color: #008000;"&gt;"someone@out.there"&lt;/span&gt;))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
((writeErrors . []) (writeConcernErrors . []) (nInserted . 2) (nUpserted . 0) (nMatched . 0) (nModified . 0) (nRemoved . 0) (upserted . []))
&lt;/pre&gt;
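
&lt;p&gt;
The offsets 16 and 12 in mongo-insert come from the shape of the shell output: the result is wrapped in BulkWriteResult( or WriteResult(, and it ends with a closing parenthesis and a newline, so dropping the wrapper prefix and the last 2 characters leaves plain json. Here is a small sketch of just that step. The function name is hypothetical and the sample output string is typed by hand for illustration.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(require 'json)

(defun my-mongo-parse-write-result (output)
  "Strip the WriteResult(...) wrapper from OUTPUT and read the json inside.
Return OUTPUT unchanged when it does not start with a known wrapper."
  (cond
   ((string-prefix-p "BulkWriteResult(" output)
    (json-read-from-string
     (substring output (length "BulkWriteResult(") -2)))
   ((string-prefix-p "WriteResult(" output)
    (json-read-from-string
     (substring output (length "WriteResult(") -2)))
   (t output)))

;; a hand-written sample of what the shell prints for a single insert
(my-mongo-parse-write-result "WriteResult({ \"nInserted\" : 1 })\n")
;; returns ((nInserted . 1))
&lt;/pre&gt;
&lt;/div&gt;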

&lt;p&gt;
Seems like an ok way to get data from Emacs into a Mongo DB, and we get lisp data returned telling us what happened.
&lt;/p&gt;
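
&lt;p&gt;
One way to sidestep both the shell quoting and the command-length limit mentioned above might be the file-based approach: write the JavaScript to a temporary file and hand that file to the mongo shell. This is only a sketch. The function names are hypothetical, and it assumes that the mongo shell runs a .js file given on the command line and that printjson is needed because results are not echoed automatically when running a script.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(require 'json)

(defun my-mongo-insert-script (collection document)
  "Return JavaScript text that inserts DOCUMENT into COLLECTION."
  (format "printjson(db.%s.insert(%s));" collection (json-encode document)))

(defun my-mongo-insert-via-file (db collection document)
  "Insert DOCUMENT into DB.COLLECTION by running a temporary script file.
Return the raw output of the mongo shell as a string."
  (let ((script (make-temp-file "mongo-insert-" nil ".js")))
    (unwind-protect
        (progn
          ;; the json never passes through the shell, so quotes need no escaping
          (with-temp-file script
            (insert (my-mongo-insert-script collection document)))
          ;; assumption: the mongo shell executes a .js file passed as an argument
          (shell-command-to-string
           (format "mongo %s --quiet %s" db (shell-quote-argument script))))
      ;; always clean up the temporary file
      (delete-file script))))

;; the script text for a name containing a single quote
(my-mongo-insert-script "contacts" '((last-name . "O'Brien")))
;; returns "printjson(db.contacts.insert({\"last-name\":\"O'Brien\"}));"
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The output of my-mongo-insert-via-file is left as a raw string here, since its exact format under this approach is not something the examples above confirm.
&lt;/p&gt;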
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgbec2cf8" class="outline-2"&gt;
&lt;h2 id="orgbec2cf8"&gt;&lt;span class="section-number-2"&gt;2&lt;/span&gt; Finding a document&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
To update documents we need to find them. We would like to find a document by the _id, but we have a small dilemma. The json we need for that needs to look like: {"_id": ObjectId("587babfaef131d0d4603b3ad")}, where the ObjectId call is not quoted. The json library does not seem to be able to generate that, so we have to modify our find code to do it by manipulating the json string after it is generated, using a regular expression replacement. It feels hacky, and hopefully there are not many more examples of that. If there are, we will need another approach to generating the json data. Here is the modified find function, also with the projection option. The output is another place we have to tread somewhat lightly with the _id: in this case we have to requote it so that it can be read by emacs. It might make sense to just replace it with the quoted _id string, rather than the ObjectId call. Time will tell.
&lt;/p&gt;

&lt;p&gt;
Here we create two helper functions to unquote input, and requote output. We also need some code to make an array of all the results, and put commas between all the results so that we end up with valid json in the output.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;defun&lt;/span&gt; &lt;span style="color: #006699;"&gt;mongo-unquote-query&lt;/span&gt; (query)
  &lt;span style="color: #036A07;"&gt;"Json encodes QUERY, and unquotes any ObjectId calls.&lt;/span&gt;

&lt;span style="color: #036A07;"&gt;We don't have syntax for the ObjectId call that mongo wants in&lt;/span&gt;
&lt;span style="color: #036A07;"&gt; lisp, so a query has to look like this:&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;'((_id .  \"ObjectId(\"587babfaef131d0d4603b3ad\")\"))&lt;/span&gt;

&lt;span style="color: #036A07;"&gt;Mongo can't have the quotes around the call, so this function&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;removes them.&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;"&lt;/span&gt;
  (replace-regexp-in-string &lt;span style="color: #008000;"&gt;"\"&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;\\&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;(&lt;/span&gt;&lt;span style="color: #008000;"&gt;ObjectId(\\\\\"&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;\\&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;(&lt;/span&gt;&lt;span style="color: #008000;"&gt;.*?&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;\\&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;)&lt;/span&gt;&lt;span style="color: #008000;"&gt;\\\\\")&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;\\&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;)&lt;/span&gt;&lt;span style="color: #008000;"&gt;\""&lt;/span&gt;
                            &lt;span style="color: #008000;"&gt;"ObjectId(\"\\2\")"&lt;/span&gt;
                            (json-encode query)))

(&lt;span style="color: #0000FF;"&gt;defun&lt;/span&gt; &lt;span style="color: #006699;"&gt;mongo-requote-output&lt;/span&gt; (output)
  &lt;span style="color: #036A07;"&gt;"Adds quotes around ObjectId in OUTPUT.&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;When mongo outputs json, it has unquoted ObjectIds in it that&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;emacs cannot interpret as json. "&lt;/span&gt;
  (replace-regexp-in-string
   &lt;span style="color: #008000;"&gt;"ObjectId(\"&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;\\&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;(&lt;/span&gt;&lt;span style="color: #008000;"&gt;.*?&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;\\&lt;/span&gt;&lt;span style="color: #008000; font-weight: bold;"&gt;)&lt;/span&gt;&lt;span style="color: #008000;"&gt;\")"&lt;/span&gt;
   &lt;span style="color: #008000;"&gt;"\"ObjectId(\\\\\"\\1\\\\\")\""&lt;/span&gt;
   output))

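(&lt;span style="color: #0000FF;"&gt;defun&lt;/span&gt; &lt;span style="color: #006699;"&gt;mongo-find&lt;/span&gt; (db collection query &lt;span style="color: #6434A3;"&gt;&amp;amp;optional&lt;/span&gt; projection)
  &lt;span style="color: #036A07;"&gt;"Find documents in DB.COLLECTION that match QUERY.&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;Optional PROJECTION selects the fields to return.&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;Returns a vector of alists, one per matching document."&lt;/span&gt;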
  (&lt;span style="color: #0000FF;"&gt;let*&lt;/span&gt; ((query-json (mongo-unquote-query query))
         (projection-json
          (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; projection (json-encode projection)))
         (output (mongo-requote-output
                  &lt;span style="color: #8D8D84;"&gt;;; &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;add [] to make an array of output in json,&lt;/span&gt;
                  &lt;span style="color: #8D8D84;"&gt;;; &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;and separate results by a comma&lt;/span&gt;
                  (concat &lt;span style="color: #008000;"&gt;"["&lt;/span&gt;
                          (replace-regexp-in-string
                           &lt;span style="color: #008000;"&gt;"\n"&lt;/span&gt; &lt;span style="color: #008000;"&gt;""&lt;/span&gt;
                           (shell-command-to-string
                            (format &lt;span style="color: #008000;"&gt;"mongo %s --quiet --eval 'db.%s.find(%s).forEach(function(myDoc) { printjsononeline(myDoc); print( \",\"); })'"&lt;/span&gt;
                                    db collection
                                    (&lt;span style="color: #0000FF;"&gt;if&lt;/span&gt; projection
                                        (format &lt;span style="color: #008000;"&gt;"%s, %s"&lt;/span&gt; query-json projection-json)
                                      query-json))))
                          &lt;span style="color: #008000;"&gt;"]"&lt;/span&gt;)))) 
    (json-read-from-string output)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
mongo-find
&lt;/pre&gt;
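
&lt;p&gt;
The two regular expressions are dense, so it may help to look at the strings before and after each replacement. This sketch repeats the same kind of replacements inline on a sample _id rather than calling the functions above. Passing t for the FIXEDCASE argument is an addition here, to keep replace-regexp-in-string from adjusting the case of the replacement text.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(require 'json)

(let* ((encoded (json-encode
                 '((_id . "ObjectId(\"587babfaef131d0d4603b3ad\")"))))
       ;; input side: drop the quotes json-encode put around the ObjectId call
       (unquoted (replace-regexp-in-string
                  "\"ObjectId(\\\\\"\\(.*?\\)\\\\\")\""
                  "ObjectId(\"\\1\")"
                  encoded t))
       ;; output side: put the quotes back so json.el can read the string
       (requoted (replace-regexp-in-string
                  "ObjectId(\"\\(.*?\\)\")"
                  "\"ObjectId(\\\\\"\\1\\\\\")\""
                  unquoted t)))
  (list encoded unquoted requoted))
;; encoded:  {"_id":"ObjectId(\"587babfaef131d0d4603b3ad\")"}
;; unquoted: {"_id":ObjectId("587babfaef131d0d4603b3ad")}
;; requoted: the same string as encoded
&lt;/pre&gt;
&lt;/div&gt;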

&lt;p&gt;
So, finally we can run something like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; '((email . &lt;span style="color: #008000;"&gt;"someone@out.there"&lt;/span&gt;)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[((_id . "ObjectId(\"587c166cdfcd649d3acf99fd\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there")) ((_id . "ObjectId(\"587c16ad410565dd4c16c748\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there")) ((_id . "ObjectId(\"587c17550e586b4f8df21de0\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there")) ((_id . "ObjectId(\"587c1764d75279a55ffec483\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there")) ((_id . "ObjectId(\"587c17743281f1e9d5054396\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there")) ((_id . "ObjectId(\"587c178ad92706d2bd5a6e3c\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there")) ((_id . "ObjectId(\"587c1794756bb2bd0f0ac499\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there"))]
&lt;/pre&gt;

&lt;p&gt;
Here is an example usage with a projection that returns only the information you want, in this case, just the id.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; '((email . &lt;span style="color: #008000;"&gt;"someone@out.there"&lt;/span&gt;))
            '((_id . 1)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[((_id . "ObjectId(\"587c166cdfcd649d3acf99fd\")")) ((_id . "ObjectId(\"587c16ad410565dd4c16c748\")")) ((_id . "ObjectId(\"587c17550e586b4f8df21de0\")")) ((_id . "ObjectId(\"587c1764d75279a55ffec483\")")) ((_id . "ObjectId(\"587c17743281f1e9d5054396\")")) ((_id . "ObjectId(\"587c178ad92706d2bd5a6e3c\")")) ((_id . "ObjectId(\"587c1794756bb2bd0f0ac499\")"))]
&lt;/pre&gt;
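
&lt;p&gt;
Since the _id values come back as strings wrapping an ObjectId call, it can be handy to pull out the bare hex ids. Here is a minimal sketch, with a hypothetical function name and a hand-written sample in the shape mongo-find returns (a vector of alists).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(defun my-mongo-bare-id (id-string)
  "Return the hex id inside ID-STRING, a string like ObjectId(\"...\").
Return nil when ID-STRING does not have that form."
  (when (string-match "\\`ObjectId(\"\\([0-9a-f]+\\)\")\\'" id-string)
    (match-string 1 id-string)))

;; sample data copied from the projection result above
(let ((results [((_id . "ObjectId(\"587c166cdfcd649d3acf99fd\")"))
                ((_id . "ObjectId(\"587c16ad410565dd4c16c748\")"))]))
  ;; mapcar accepts vectors as well as lists
  (mapcar (lambda (doc) (my-mongo-bare-id (cdr (assq '_id doc))))
          results))
;; returns ("587c166cdfcd649d3acf99fd" "587c16ad410565dd4c16c748")
&lt;/pre&gt;
&lt;/div&gt;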
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org7279625" class="outline-2"&gt;
&lt;h2 id="org7279625"&gt;&lt;span class="section-number-2"&gt;3&lt;/span&gt; Updating an entry&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
Ok, back to the update. To make sure that we update exactly the document we want, we will use the document _id. First, we define an update command.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;defun&lt;/span&gt; &lt;span style="color: #006699;"&gt;mongo-update&lt;/span&gt; (db collection query $set)
  &lt;span style="color: #036A07;"&gt;"In DB.COLLECTION update records matching QUERY with the contents of $SET."&lt;/span&gt;
  (&lt;span style="color: #0000FF;"&gt;let*&lt;/span&gt; ((query-json (mongo-unquote-query query))
         ($set-json (mongo-unquote-query $set))
         (cmd (format &lt;span style="color: #008000;"&gt;"mongo %s --quiet --eval 'db.%s.update(%s, %s)'"&lt;/span&gt;
                      db collection
                      query-json $set-json))
         (output (shell-command-to-string cmd)))
    (&lt;span style="color: #0000FF;"&gt;if&lt;/span&gt; (string-match &lt;span style="color: #008000;"&gt;"WriteResult("&lt;/span&gt; output)
        (json-read-from-string
         (substring output 12 -2))
      output)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
mongo-update
&lt;/pre&gt;

&lt;p&gt;
First a reminder of what is in this record.
&lt;/p&gt;
&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; '((_id . &lt;span style="color: #008000;"&gt;"ObjectId(\"587c16ad410565dd4c16c748\")"&lt;/span&gt;)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[((_id . "ObjectId(\"587c16ad410565dd4c16c748\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there"))]
&lt;/pre&gt;

&lt;p&gt;
Here we set the email field to a new address. Without $set, the whole document gets replaced.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-update &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt;
              '((_id . &lt;span style="color: #008000;"&gt;"ObjectId(\"587c16ad410565dd4c16c748\")"&lt;/span&gt;))
              '(($set . ((email . &lt;span style="color: #008000;"&gt;"someone@out.there.com"&lt;/span&gt;)))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
((nMatched . 1) (nUpserted . 0) (nModified . 1))
&lt;/pre&gt;

&lt;p&gt;
Finally, let's see the document again to verify it is modified.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; '((_id . &lt;span style="color: #008000;"&gt;"ObjectId(\"587c16ad410565dd4c16c748\")"&lt;/span&gt;)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[((_id . "ObjectId(\"587c16ad410565dd4c16c748\")") (first-name . "Someone") (last-name . "Else") (email . "someone@out.there.com"))]
&lt;/pre&gt;

&lt;p&gt;
Looks good, you can see it got changed. There is a potential gotcha though. This next command looks like it should do the same thing, but it does not. The whole document gets replaced!
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-update &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt;
              '((_id . &lt;span style="color: #008000;"&gt;"ObjectId(\"587c16ad410565dd4c16c748\")"&lt;/span&gt;))
              '((email . &lt;span style="color: #008000;"&gt;"someone@out.there.com"&lt;/span&gt;)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
((nMatched . 1) (nUpserted . 0) (nModified . 1))
&lt;/pre&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; '((_id . &lt;span style="color: #008000;"&gt;"ObjectId(\"587c16ad410565dd4c16c748\")"&lt;/span&gt;)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[((_id . "ObjectId(\"587c16ad410565dd4c16c748\")") (email . "someone@out.there.com"))]
&lt;/pre&gt;

&lt;p&gt;
Do not forget the $set operator if you just want to update some fields!
&lt;/p&gt;
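
&lt;p&gt;
One way to make this gotcha harder to hit is a small wrapper that always adds the $set operator. This is a sketch with a hypothetical name, not part of the interface above. It only builds the update document, which you would then pass as the last argument to mongo-update.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(require 'json)

(defun my-mongo-set (fields)
  "Wrap the alist FIELDS in a $set operator so only those fields change."
  (list (cons '$set fields)))

;; the json that would be sent as the update document
(json-encode (my-mongo-set '((email . "someone@out.there.com"))))
;; returns "{\"$set\":{\"email\":\"someone@out.there.com\"}}"
&lt;/pre&gt;
&lt;/div&gt;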
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org769cf9d" class="outline-2"&gt;
&lt;h2 id="org769cf9d"&gt;&lt;span class="section-number-2"&gt;4&lt;/span&gt; Deleting a document&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
Next, let's get a delete function. I will only implement the deleteMany function here since you can give it a document id to delete only one, and usually I would want to delete all documents that meet some criteria anyway.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;defun&lt;/span&gt; &lt;span style="color: #006699;"&gt;mongo-deleteMany&lt;/span&gt; (db collection filter)
  &lt;span style="color: #036A07;"&gt;"Delete records in DB.COLLECTION matched by FILTER.&lt;/span&gt;
&lt;span style="color: #036A07;"&gt;TODO: add write concern."&lt;/span&gt;
  (&lt;span style="color: #0000FF;"&gt;let*&lt;/span&gt; ((filter-json (mongo-unquote-query filter))
         (cmd (format &lt;span style="color: #008000;"&gt;"mongo %s --quiet --eval 'db.%s.deleteMany(%s)'"&lt;/span&gt;
                      db collection
                      filter-json))
         (output (shell-command-to-string cmd)))
    (json-read-from-string output)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
mongo-deleteMany
&lt;/pre&gt;

&lt;p&gt;
Since we borked that last document, let's just delete it.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-deleteMany &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; '((_id . &lt;span style="color: #008000;"&gt;"ObjectId(\"587be3fa6009a569a277b680\")"&lt;/span&gt;)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
((acknowledged . t) (deletedCount . 0))
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org0cb9437" class="outline-2"&gt;
&lt;h2 id="org0cb9437"&gt;&lt;span class="section-number-2"&gt;5&lt;/span&gt; Generic commands&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
We may want some flexibility to run collection commands. The most generic command will simply be to write the shell-command completely. We can save a little typing by encapsulating most of the boilerplate though. Here is a function for that.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;defun&lt;/span&gt; &lt;span style="color: #006699;"&gt;mongo-cmd&lt;/span&gt; (db collection cmd &lt;span style="color: #6434A3;"&gt;&amp;amp;rest&lt;/span&gt; args)
  &lt;span style="color: #036A07;"&gt;"In DB.COLLECTION run CMD. &lt;/span&gt;
&lt;span style="color: #036A07;"&gt;ARGS if present will be used to format CMD."&lt;/span&gt;
  (shell-command-to-string
   (format &lt;span style="color: #008000;"&gt;"mongo %s --quiet --eval 'db.%s.%s'"&lt;/span&gt;
           db collection
           (apply #'format cmd args))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
mongo-cmd
&lt;/pre&gt;

&lt;p&gt;
We can get the number of documents with this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-cmd &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"count()"&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
4341
&lt;/pre&gt;

&lt;p&gt;
Or run a more sophisticated command with arguments like this.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-cmd &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"explain().remove(%s)"&lt;/span&gt; (json-encode '((&lt;span style="color: #008000;"&gt;"category"&lt;/span&gt; . &lt;span style="color: #008000;"&gt;"enemy"&lt;/span&gt;))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "contacts.contacts",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"category" : {
				"$eq" : "enemy"
			}
		},
		"winningPlan" : {
			"stage" : "DELETE",
			"inputStage" : {
				"stage" : "COLLSCAN",
				"filter" : {
					"category" : {
						"$eq" : "enemy"
					}
				},
				"direction" : "forward"
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "Johns-MacBook-Air.local",
		"port" : 27017,
		"version" : "3.4.1",
		"gitVersion" : "5e103c4f5583e2566a45d740225dc250baacfbd7"
	},
	"ok" : 1
}
&lt;/pre&gt;

&lt;p&gt;
Or, drop the collection with:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-cmd &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"drop()"&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
true
&lt;/pre&gt;

&lt;p&gt;
All gone! Note, we do not try to handle the output of any of those, and they are returned as strings.
&lt;/p&gt;
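
&lt;p&gt;
Several of those string outputs happen to be valid json (a number, true, or an object), so one option is to try json-read-from-string on them and fall back to the raw string when parsing fails. This is a sketch: the function name is hypothetical and the sample strings are typed by hand.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(require 'json)

(defun my-mongo-parse-output (output)
  "Try to read OUTPUT as json; return OUTPUT itself if that fails."
  (condition-case nil
      (json-read-from-string output)
    ;; json.el signals errors derived from `error' on malformed input
    (error output)))

(list (my-mongo-parse-output "4341\n")      ; like the count() output
      (my-mongo-parse-output "true\n")      ; like the drop() output
      (my-mongo-parse-output "not json"))   ; left alone
;; returns (4341 t "not json")
&lt;/pre&gt;
&lt;/div&gt;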
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org512e4fe" class="outline-2"&gt;
&lt;h2 id="org512e4fe"&gt;&lt;span class="section-number-2"&gt;6&lt;/span&gt; A MongoDB contacts database&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;
&lt;p&gt;
Now, let's re-populate it for real. I store my contacts in a variable called "contacts" as a list of a descriptive string and then cons cells. These are actually harvested from a set of org-files. It is way too slow to parse these files each time, so I keep the contacts cached in memory and only update them if a file changes.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(length contacts)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
6047
&lt;/pre&gt;

&lt;p&gt;
There are over 6000 contacts. Let's put them in a MongoDB.
&lt;/p&gt;

&lt;p&gt;
Here is a limitation of our approach. This will not work because the generated shell command ends up being too long for the shell.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-insert &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt;
              (&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for contact in contacts
                    collect
                    (append `((desc . ,(car contact))) (cdr contact))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
So, we do them one at a time here:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;let&lt;/span&gt; ((ct (current-time)))
  (&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for contact in contacts
        do
        (&lt;span style="color: #0000FF;"&gt;let&lt;/span&gt; ((output (mongo-insert &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt;
                                    (append `((desc . ,(car contact))) (cdr contact)))))
          (&lt;span style="color: #0000FF;"&gt;unless&lt;/span&gt; (= 1 (cdr (assoc 'nInserted output)))
            (&lt;span style="color: #ff0000; font-weight: bold;"&gt;warn&lt;/span&gt; &lt;span style="color: #008000;"&gt;"error: %S for %S"&lt;/span&gt; (cdr (assoc 'nInserted output)) contact))))
  (message &lt;span style="color: #008000;"&gt;"Elapsed time %.02f seconds"&lt;/span&gt; (float-time (time-since ct))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
Elapsed time 762.95 seconds
&lt;/pre&gt;

&lt;p&gt;
That took almost 13 minutes to add. That seems long to me. This next step confirms that they were added.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-cmd &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"count()"&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
6047
&lt;/pre&gt;
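
&lt;p&gt;
Much of the insert time is plausibly spent starting a mongo process for every contact, though that is not measured here. A middle ground between one giant command and thousands of single inserts would be to insert in batches. Here is a sketch, assuming seq.el is available (Emacs 25 or newer). The function name is hypothetical, and the insert function is passed in as an argument so the batching can be tried without a database.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(require 'seq)

(defun my-mongo-insert-in-batches (insert-fn documents batch-size)
  "Call INSERT-FN on successive groups of at most BATCH-SIZE DOCUMENTS.
INSERT-FN takes a list of documents. Return the list of its results."
  (mapcar insert-fn (seq-partition documents batch-size)))

;; dry run with a stand-in insert function that just counts each batch
(my-mongo-insert-in-batches #'length '(a b c d e) 2)
;; returns (2 2 1)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
With the real thing, INSERT-FN would be something like (lambda (batch) (mongo-insert "contacts" "contacts" batch)), and the batch size has to stay small enough that each command still fits within the shell limit.
&lt;/p&gt;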

&lt;p&gt;
Next we will compare some timing of finding data in the database vs looping through the cached contacts. Here is a timing macro to measure how long it takes to run a bit of code.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;&lt;span style="color: #8D8D84;"&gt;;; &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;http://stackoverflow.com/questions/23622296/emacs-timing-execution-of-function-calls-in-emacs-lisp&lt;/span&gt;
(&lt;span style="color: #0000FF;"&gt;defmacro&lt;/span&gt; &lt;span style="color: #006699;"&gt;measure-time&lt;/span&gt; (&lt;span style="color: #6434A3;"&gt;&amp;amp;rest&lt;/span&gt; body)
  &lt;span style="color: #036A07;"&gt;"Measure the time it takes to evaluate BODY."&lt;/span&gt;
  `(&lt;span style="color: #0000FF;"&gt;let&lt;/span&gt; ((time (current-time)))
     ,@body
     (message &lt;span style="color: #008000;"&gt;"%.06f seconds elapsed"&lt;/span&gt; (float-time (time-since time)))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
measure-time
&lt;/pre&gt;
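
&lt;p&gt;
Note that measure-time returns the message string rather than the value of BODY, and its time variable is visible to BODY. If you need the result as well as the timing, a variant along these lines might work. It is a sketch with a hypothetical name, and it uses uninterned symbols so that it cannot shadow variables used in BODY.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(defmacro my-measure-time (&amp;rest body)
  "Evaluate BODY and return a cons (VALUE . ELAPSED-SECONDS)."
  (let ((start (make-symbol "start"))
        (value (make-symbol "value")))
    `(let* ((,start (current-time))
            (,value (progn ,@body)))
       (cons ,value (float-time (time-since ,start))))))

;; the car is the value of the body, the cdr is the elapsed time in seconds
(car (my-measure-time (+ 1 2)))
;; returns 3
&lt;/pre&gt;
&lt;/div&gt;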

&lt;p&gt;
Here is the old way I would extract data. Many contacts I have are academics, and I have stored their academic ranks in each contact.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for contact in contacts
      if (string= &lt;span style="color: #008000;"&gt;"Professor"&lt;/span&gt; (cdr (assoc &lt;span style="color: #008000;"&gt;"RANK"&lt;/span&gt; (cdr contact))))
      collect contact into professors
      if (string= &lt;span style="color: #008000;"&gt;"Associate Professor"&lt;/span&gt; (cdr (assoc &lt;span style="color: #008000;"&gt;"RANK"&lt;/span&gt; (cdr contact))))
      collect contact into associate-professors
      if (string= &lt;span style="color: #008000;"&gt;"Assistant Professor"&lt;/span&gt; (cdr (assoc &lt;span style="color: #008000;"&gt;"RANK"&lt;/span&gt; (cdr contact))))
      collect contact into assistant-professors
      finally return `((&lt;span style="color: #008000;"&gt;"Assistant Professor"&lt;/span&gt; ,(length assistant-professors))
                       (&lt;span style="color: #008000;"&gt;"Associate Professor"&lt;/span&gt; ,(length associate-professors))
                       (&lt;span style="color: #008000;"&gt;"Professor"&lt;/span&gt; ,(length professors))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;Assistant Professor&lt;/td&gt;
&lt;td class="org-right"&gt;313&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Associate Professor&lt;/td&gt;
&lt;td class="org-right"&gt;283&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Professor&lt;/td&gt;
&lt;td class="org-right"&gt;879&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
How long did it take to do that?
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;measure-time&lt;/span&gt;
 (&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for contact in contacts
       if (string= &lt;span style="color: #008000;"&gt;"Professor"&lt;/span&gt; (cdr (assoc &lt;span style="color: #008000;"&gt;"RANK"&lt;/span&gt; (cdr contact))))
       collect contact into professors
       if (string= &lt;span style="color: #008000;"&gt;"Associate Professor"&lt;/span&gt; (cdr (assoc &lt;span style="color: #008000;"&gt;"RANK"&lt;/span&gt; (cdr contact))))
       collect contact into associate-professors
       if (string= &lt;span style="color: #008000;"&gt;"Assistant Professor"&lt;/span&gt; (cdr (assoc &lt;span style="color: #008000;"&gt;"RANK"&lt;/span&gt; (cdr contact))))
       collect contact into assistant-professors
       finally return (list (length assistant-professors)
                            (length associate-professors)
                            (length professors))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.008772 seconds elapsed
&lt;/pre&gt;

&lt;p&gt;
Not long at all! Comparatively, it is very slow to get this information out of the mongodb, although considerably less code is required. That might not be surprising, considering the json parsing that has to get done here.
&lt;/p&gt;

&lt;p&gt;
Here is the equivalent code to extract that data from the database.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for rank in '(&lt;span style="color: #008000;"&gt;"Assistant Professor"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"Associate Professor"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"Professor"&lt;/span&gt;)
       collect (list rank (length (mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt;
                                              `((RANK . ,rank))))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;Assistant Professor&lt;/td&gt;
&lt;td class="org-right"&gt;313&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Associate Professor&lt;/td&gt;
&lt;td class="org-right"&gt;283&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Professor&lt;/td&gt;
&lt;td class="org-right"&gt;879&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
It is comparatively slow to do this. This requires three json parses, and profiling indicates that a lot of the work is done in parsing the json.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;measure-time&lt;/span&gt;
 (&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for rank in '(&lt;span style="color: #008000;"&gt;"Assistant Professor"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"Associate Professor"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"Professor"&lt;/span&gt;)
       collect (list rank (length (mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt;
                                              `((RANK . ,rank)))))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.914817 seconds elapsed
&lt;/pre&gt;

&lt;p&gt;
Here is a smarter way to do it that avoids the json parsing.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for rank in '(&lt;span style="color: #008000;"&gt;"Assistant Professor"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"Associate Professor"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"Professor"&lt;/span&gt;)
      collect (list rank (mongo-cmd &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"count(%s)"&lt;/span&gt;
                                    (json-encode `((RANK . ,rank))))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;Assistant Professor&lt;/td&gt;
&lt;td class="org-right"&gt;313&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Associate Professor&lt;/td&gt;
&lt;td class="org-right"&gt;283&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Professor&lt;/td&gt;
&lt;td class="org-right"&gt;879&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
And you can see here it is about five times faster (roughly 0.35 seconds versus 1.91 seconds), but still not nearly as fast as running the lisp code on the cache.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;measure-time&lt;/span&gt;
 (&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for rank in '(&lt;span style="color: #008000;"&gt;"Assistant Professor"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"Associate Professor"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"Professor"&lt;/span&gt;)
       collect (list rank (mongo-cmd &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"count(%s)"&lt;/span&gt;
                                     (json-encode `((RANK . ,rank)))))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.349413 seconds elapsed
&lt;/pre&gt;

&lt;p&gt;
This is how you might integrate this into a completion command:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(ivy-read &lt;span style="color: #008000;"&gt;"choose: "&lt;/span&gt;
          (&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for c across (mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;""&lt;/span&gt;)
                collect
                (list (cdr (assoc 'desc c)) c)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
This is basically unusable though, because it takes so long to generate the candidates (over six seconds).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;measure-time&lt;/span&gt;
 (&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for c across (mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;""&lt;/span&gt;)
       collect
       (list (cdr (assoc 'desc c)) c)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
6.228225 seconds elapsed
&lt;/pre&gt;

&lt;p&gt;
We can get back to usable by making the database do more work for us. Here, we simply make the database print a list of cons cells that we can read into lisp. We have to use a javascript function, with some escaping and quoting. The escaping was necessary because there is some bad data in the email field that messed up the cons cells, e.g. entries like "name" &amp;lt;email&amp;gt; with nested single and double quotes. The quoting was necessary to get cons cells of the form ("desc" . "email"). Finally, we wrap the output in parentheses and read back the list of cons cells. At about a quarter of a second, this is very usable for getting a list of over 6000 candidates. It is still many times slower than working on the contacts list in memory though. I am not a big fan of the one-line javascript, and if it were much more complicated than this, another strategy would probably be desirable.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;measure-time&lt;/span&gt;
 (read
  (concat
   &lt;span style="color: #008000;"&gt;"("&lt;/span&gt;
   (shell-command-to-string &lt;span style="color: #008000;"&gt;"mongo contacts --quiet --eval 'db.contacts.find().forEach(function (doc) {print(\"(\\\"\" + doc.desc + \"\\\" . \\\"\" + escape(doc.EMAIL) +\"\\\")\");})'"&lt;/span&gt;)
   &lt;span style="color: #008000;"&gt;")"&lt;/span&gt;)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.284730 seconds elapsed
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org97edcbc" class="outline-2"&gt;
&lt;h2 id="org97edcbc"&gt;&lt;span class="section-number-2"&gt;7&lt;/span&gt; Text searching&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-7"&gt;
&lt;p&gt;
Finally, let us make a text index to make searching easy. This allows a very flexible search where we do not have to specify a field or use regular expressions. We set up the index on all the fields, so we can find entries that match even on fields that do not exist in all documents.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-cmd &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"createIndex(%s)"&lt;/span&gt; (json-encode '((&lt;span style="color: #008000;"&gt;"$**"&lt;/span&gt; . &lt;span style="color: #008000;"&gt;"text"&lt;/span&gt;))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
&lt;/pre&gt;

&lt;p&gt;
Now, let's use that to find the GOOGLE-SCHOLAR url of contacts matching the following query.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(mongo-find &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"contacts"&lt;/span&gt; '(($text . (($search . &lt;span style="color: #008000;"&gt;"\"Carnegie Mellon\""&lt;/span&gt;)))
                                    ($text . (($search . &lt;span style="color: #008000;"&gt;"\"John Kitchin\""&lt;/span&gt;))))
            '((GOOGLE-SCHOLAR . 1) (_id . 0)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;[((GOOGLE-SCHOLAR . &lt;span style="color: #008000;"&gt;"https://scholar.google.com/citations?hl=en&amp;amp;user=jD_4h7sAAAAJ"&lt;/span&gt;))
 nil nil]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
So, you can see there were three contacts, and one of them lists my google-scholar url.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;


&lt;div id="outline-container-org04fa0df" class="outline-2"&gt;
&lt;h2 id="org04fa0df"&gt;&lt;span class="section-number-2"&gt;8&lt;/span&gt; Summary&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-8"&gt;
&lt;p&gt;
This looks like the foundation of a mongo/emacs-lisp interface. This interface is not that fast though, and suffers from some limitations related to the use of the shell. Depending on the actual use, it is clear you can gain performance by passing some work on to the database, which requires some javascript coding. Even that revealed some subtlety, e.g. making sure the database output text that was compatible with the lisp reader. That mostly means taking care of quotes and other special characters, which I managed with a simple escape mechanism. It is probably worth investing a few more days in building an interface that uses a process and communicates with the mongo cli directly before moving forward with any significant application that uses Mongo in emacs. There are many good ideas for such applications:
&lt;/p&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;Index all your org files (e.g. &lt;a href="http://kitchingroup.cheme.cmu.edu/blog/2017/01/03/Find-stuff-in-org-mode-anywhere/"&gt;http://kitchingroup.cheme.cmu.edu/blog/2017/01/03/Find-stuff-in-org-mode-anywhere/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Index all your bibtex files (e.g. &lt;a href="http://kitchingroup.cheme.cmu.edu/blog/2017/01/15/Querying-a-MongoDB-bibtex-database-with-Python-and-emacs-lisp/"&gt;http://kitchingroup.cheme.cmu.edu/blog/2017/01/15/Querying-a-MongoDB-bibtex-database-with-Python-and-emacs-lisp/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Download RSS feeds into a searchable database&lt;/li&gt;
&lt;li&gt;Manage your contacts&lt;/li&gt;
&lt;li&gt;Index your email? mu and notmuch use xapian for this, but I have found they cannot search for things like hashtags. Maybe MongoDB would be better?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
The tradeoffs between this and sqlite are more clear now. With Mongo we do not have to create the normalized tables (although it is still a good idea to think about how to structure the documents, and if they should be a little normalized). It is &lt;i&gt;much&lt;/i&gt; easier to map lisp data structures to Mongo queries than it is to do that with SQL queries. On the other hand, it is necessary to do some javascript programming with Mongo to get some desired output. It still seems worth exploring further.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2017 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;
&lt;p&gt;&lt;a href="/org/2017/01/16/A-simple-emacs-lisp-interface-to-CRUD-operations-in-mongodb.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Org-mode version = 9.0.3&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Querying a MongoDB bibtex database with Python and emacs-lisp</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2017/01/15/Querying-a-MongoDB-bibtex-database-with-Python-and-emacs-lisp</link>
      <pubDate>Sun, 15 Jan 2017 10:36:22 EST</pubDate>
      <category><![CDATA[mongodb]]></category>
      <category><![CDATA[emacs]]></category>
      <category><![CDATA[database]]></category>
      <category><![CDATA[python]]></category>
      <guid isPermaLink="false">vHnWAM9JduXLE7-TL9gWquB7M2s=</guid>
      <description>Querying a MongoDB bibtex database with Python and emacs-lisp</description>
      <content:encoded><![CDATA[


&lt;div id="table-of-contents"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;div id="text-table-of-contents"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#org41df17a"&gt;1. text searching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org6dcd73e"&gt;2. Querying from emacs-lisp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org6d7544a"&gt;3. Summary thoughts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;
I have been exploring &lt;a href="http://kitchingroup.cheme.cmu.edu/blog/2017/01/03/Find-stuff-in-org-mode-anywhere/"&gt;using databases&lt;/a&gt; to help with searching my data. In this post we explore using MongoDB for bibtex entries. I am choosing bibtex entries because it is easy to parse bibtex files, I already have a lot of them, and I have several kinds of queries I regularly use. So, they are a good candidate to test out a new database on!
&lt;/p&gt;

&lt;p&gt;
MongoDB is a NoSQL database that is pretty easy to use. I installed it from homebrew, and then followed the directions to run the server.
&lt;/p&gt;

&lt;p&gt;
With pymongo you can make a database as easily as this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; bibtexparser

&lt;span style="color: #8D8D84;"&gt;# &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;Read the bibtex file to get entries&lt;/span&gt;
&lt;span style="color: #0000FF;"&gt;with&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;open&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'../../../Dropbox/bibliography/references.bib'&lt;/span&gt;, &lt;span style="color: #008000;"&gt;'r'&lt;/span&gt;) &lt;span style="color: #0000FF;"&gt;as&lt;/span&gt; bibfile:
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;bp&lt;/span&gt; = bibtexparser.load(bibfile)
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;entries&lt;/span&gt; = bp.entries

&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;"N = "&lt;/span&gt;, &lt;span style="color: #006FE0;"&gt;len&lt;/span&gt;(entries))

&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(entries[0])

&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; pymongo
&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; pymongo &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; MongoClient
&lt;span style="color: #BA36A5;"&gt;client&lt;/span&gt; = MongoClient(&lt;span style="color: #008000;"&gt;'localhost'&lt;/span&gt;, 27017)

&lt;span style="color: #8D8D84;"&gt;# &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;This creates the "entries" collection&lt;/span&gt;
&lt;span style="color: #BA36A5;"&gt;db&lt;/span&gt; = client[&lt;span style="color: #008000;"&gt;'bibtex'&lt;/span&gt;].entries

&lt;span style="color: #8D8D84;"&gt;# &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;add each entry&lt;/span&gt;
&lt;span style="color: #0000FF;"&gt;for&lt;/span&gt; entry &lt;span style="color: #0000FF;"&gt;in&lt;/span&gt; entries:
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   db.insert_one(entry)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
N =  1671
{'keyword': 'test, word', 'year': '2006', 'publisher': 'American Chemical Society (ACS)', 'title': 'The ACS Style Guide', 'ENTRYTYPE': 'book', 'editor': 'Janet S. Dodd', 'address': 'Washington, D.C.', 'ID': '2006-acs-style-guide', 'doi': '10.1021/bk-2006-styg', 'link': 'https://doi.org/10.1021/bk-2006-STYG', 'date_added': 'Wed Apr  1 10:17:54 2015', 'pages': 'nil'}
&lt;/p&gt;

&lt;p&gt;
That was easy. We have a database with 1671 documents in it, and each document is essentially a dictionary of key-value pairs. You might even argue it was too easy. I didn't specify any structure to the entries at all. No required fields, no validation that the keys are spelled correctly, no validation on the values, e.g. you can see the year looks like a string. The benefit of that is that every entry went in, with no issues. On the other hand, the authors went in as a single string, as did the keywords, which affects our ability to search a little bit later. Note if you run that twice, it will add each entry again, since we do not check if the entry already exists.
&lt;/p&gt;
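
&lt;p&gt;
One way to make the load safe to run twice is to replace on a unique field instead of inserting. The following is only a sketch under some assumptions: that the bibtex key stored under "ID" is unique per entry, and that the collection object is a pymongo collection such as the one created above. The helper name upsert_entries is hypothetical; it relies only on the replace_one method with upsert=True.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;
```python
def upsert_entries(collection, entries, key="ID"):
    """Insert or replace each entry, matching on KEY so reruns add no duplicates.

    COLLECTION is assumed to behave like a pymongo collection,
    e.g. client['bibtex'].entries.
    """
    count = 0
    for entry in entries:
        # With upsert=True, replace_one inserts when no document matches the filter
        collection.replace_one({key: entry[key]}, entry, upsert=True)
        count += 1
    return count
```
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Calling upsert_entries(db, entries) in place of the insert loop should leave one document per bibtex key no matter how often it is run, at the cost of one lookup per entry.
&lt;/p&gt;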

&lt;p&gt;
A database is only useful though if it is easy to get stuff out of it. So, let's consider some test queries. First we find entries that have years less than 1950. The query is basically a little json bundle that describes a field and condition that we want to match. Here we use a less than operator, "$lt". The results come back as a list of dictionaries. This is in stark contrast to a SQL query which is an expression in its own declarative language. A query here is a chunk of data that must get converted to code by the server. Since the years were stored as strings and the query value is also a string, the less than here is in the string sense, but for four-digit years that gives the same answer as the numeric sense.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; pymongo
&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; pymongo &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; MongoClient
&lt;span style="color: #BA36A5;"&gt;client&lt;/span&gt; = MongoClient(&lt;span style="color: #008000;"&gt;'localhost'&lt;/span&gt;, 27017)

&lt;span style="color: #BA36A5;"&gt;db&lt;/span&gt; = client[&lt;span style="color: #008000;"&gt;'bibtex'&lt;/span&gt;].entries

&lt;span style="color: #0000FF;"&gt;for&lt;/span&gt; i, result &lt;span style="color: #0000FF;"&gt;in&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;enumerate&lt;/span&gt;(db.find({&lt;span style="color: #008000;"&gt;"year"&lt;/span&gt; : {&lt;span style="color: #008000;"&gt;"$lt"&lt;/span&gt;: &lt;span style="color: #008000;"&gt;"1950"&lt;/span&gt;}})):
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'{i: 2d}. {author}, {title}, {journal}, {year}.'&lt;/span&gt;.&lt;span style="color: #006FE0;"&gt;format&lt;/span&gt;(i=i+1, **result))
&lt;/pre&gt;
&lt;/div&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;Birch, Francis, Finite Elastic Strain of Cubic Crystals, Phys. Rev., 1947.&lt;/li&gt;
&lt;li&gt;Ditchburn, R. W. and Gilmour, J. C., The Vapor Pressures of Monatomic Vapors, Rev. Mod. Phys., 1941.&lt;/li&gt;
&lt;li&gt;J. Korringa, On the Calculation of the Energy of a Bloch Wave in a Metal, Physica, 1947.&lt;/li&gt;
&lt;li&gt;Nix, F. C. and MacNair, D., The Thermal Expansion of Pure Metals. {II}: Molybdenum, Palladium, Silver, Tantalum, Tungsten, Platinum, and Lead, Phys. Rev., 1942.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;
That seems easy enough, and those strings could easily be used as candidates for a selection tool like helm.
&lt;/p&gt;
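
&lt;p&gt;
On the question of string versus numeric comparison in the year query: the years were stored as strings, so the comparison is between strings. The small sketch below uses plain Python sorting as a stand-in for the database ordering. That is an assumption which holds for strings of ASCII digits, where Python's code point comparison and MongoDB's default binary string comparison agree. The sample years are made up for illustration.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;
```python
# Hypothetical sample of year strings, including one with only three digits
years = ["2006", "987", "1950", "1941"]

# Strings are compared character by character, so "987" lands after "2006"
string_order = sorted(years)

# Converting to int gives the numeric order instead
numeric_order = sorted(years, key=int)

print(string_order)   # ['1941', '1950', '2006', '987']
print(numeric_order)  # ['987', '1941', '1950', '2006']
```
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
For four-digit years the two orders agree, which is why the query returns the expected entries. A year with a different number of digits is where they would differ, and storing the year as an integer would avoid that.
&lt;/p&gt;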

&lt;p&gt;
How about articles published by me and my student Jacob Boes? This requires "and" logic, which is the default for the conditions in a query, so we just combine three conditions. One is an exact match on articles, and the other two are case-insensitive regular expression matches on the author field. I guess this has to be done on every document, since there probably is no way to index a regex match! This search was very fast, but it is not clear how fast it would be for a million entries. This matching is necessary because we stored all authors in a single field rather than splitting them into an array. We might still have to match strings for this even in an array since an author might then be "John R. Kitchin", rather than further decomposed into first and last names.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; pymongo
&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; pymongo &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; MongoClient
&lt;span style="color: #BA36A5;"&gt;client&lt;/span&gt; = MongoClient(&lt;span style="color: #008000;"&gt;'localhost'&lt;/span&gt;, 27017)

&lt;span style="color: #BA36A5;"&gt;db&lt;/span&gt; = client[&lt;span style="color: #008000;"&gt;'bibtex'&lt;/span&gt;]
&lt;span style="color: #BA36A5;"&gt;entries&lt;/span&gt; = db[&lt;span style="color: #008000;"&gt;'entries'&lt;/span&gt;]

&lt;span style="color: #0000FF;"&gt;for&lt;/span&gt; i, result &lt;span style="color: #0000FF;"&gt;in&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;enumerate&lt;/span&gt;(entries.find({&lt;span style="color: #008000;"&gt;"ENTRYTYPE"&lt;/span&gt;: &lt;span style="color: #008000;"&gt;"article"&lt;/span&gt;,
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;&lt;span style="color: #008000;"&gt;"author"&lt;/span&gt; : {&lt;span style="color: #008000;"&gt;"$regex"&lt;/span&gt;: &lt;span style="color: #008000;"&gt;"kitchin"&lt;/span&gt;, &lt;span style="color: #008000;"&gt;'$options'&lt;/span&gt; : &lt;span style="color: #008000;"&gt;'i'&lt;/span&gt;},
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;&lt;span style="color: #008000;"&gt;"author"&lt;/span&gt; : {&lt;span style="color: #008000;"&gt;"$regex"&lt;/span&gt;: &lt;span style="color: #008000;"&gt;"boes"&lt;/span&gt;, &lt;span style="color: #008000;"&gt;'$options'&lt;/span&gt; : &lt;span style="color: #008000;"&gt;'i'&lt;/span&gt;}})):
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #0000FF;"&gt;if&lt;/span&gt; result.get(&lt;span style="color: #008000;"&gt;'doi'&lt;/span&gt;, &lt;span style="color: #D0372D;"&gt;None&lt;/span&gt;):
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;result&lt;/span&gt;[&lt;span style="color: #008000;"&gt;'doi'&lt;/span&gt;] = &lt;span style="color: #008000;"&gt;'https://doi.org/{doi}'&lt;/span&gt;.&lt;span style="color: #006FE0;"&gt;format&lt;/span&gt;(doi=result[&lt;span style="color: #008000;"&gt;'doi'&lt;/span&gt;])
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #0000FF;"&gt;else&lt;/span&gt;:
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #BA36A5;"&gt;result&lt;/span&gt;[&lt;span style="color: #008000;"&gt;'doi'&lt;/span&gt;] = &lt;span style="color: #008000;"&gt;''&lt;/span&gt;
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'{i: 2d}. {author}, {title}, {journal}, {year}. {doi}'&lt;/span&gt;.&lt;span style="color: #006FE0;"&gt;format&lt;/span&gt;(i=i+1, **result).replace(&lt;span style="color: #008000;"&gt;"\n"&lt;/span&gt;, &lt;span style="color: #008000;"&gt;""&lt;/span&gt;))
&lt;/pre&gt;
&lt;/div&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;Jacob R. Boes and Peter Kondratyuk and Chunrong Yin and JamesB. Miller and Andrew J. Gellman and John R. Kitchin, Core Level Shifts in {Cu-Pd} Alloys As a Function of BulkComposition and Structure, Surface Science, 2015. &lt;a href="https://doi.org/10.1016/j.susc.2015.02.011"&gt;https://doi.org/10.1016/j.susc.2015.02.011&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Gamze Gumuslu and James B. Miller and AndrewJ. Gellman and John R. Kitchin, Estimating Bulk-Composition-Dependent \ce{H2} AdsorptionEnergies on \ce{Cu&lt;sub&gt;x&lt;/sub&gt;Pd&lt;sub&gt;1-x&lt;/sub&gt;} Alloy (111) Surfaces, ACS Catalysis, 2015. &lt;a href="https://doi.org/10.1021/cs501585k"&gt;https://doi.org/10.1021/cs501585k&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Gamze Gumuslu and James B. Miller and AndrewJ. Gellman and John R. Kitchin, Supporting Information: Estimating Bulk-Composition-Dependent\ce{H2} Adsorption Energies on \ce{Cu&lt;sub&gt;x&lt;/sub&gt;Pd&lt;sub&gt;1-x&lt;/sub&gt;} Alloy (111)Surfaces, ACS Catalysis, 2015. &lt;a href="https://doi.org/10.1021/cs501585k"&gt;https://doi.org/10.1021/cs501585k&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;G. Gumuslu and P. Kondratyuk and J. R. Boes and B. Morrealeand J. B. Miller and J. R. Kitchin and A. J. Gellman, Correlation of Electronic Structure With Catalytic Activity:\ce{H2}-\ce{D2} Exchange Across \ce{Cu&lt;sub&gt;x&lt;/sub&gt;Pd&lt;sub&gt;1-x&lt;/sub&gt;}Composition Space, ACS Catalysis, 2015. &lt;a href="https://doi.org/10.1021/cs501586t"&gt;https://doi.org/10.1021/cs501586t&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;John D. Michael and Ethan L. Demeter and Steven M. Illes andQingqi Fan and Jacob R. Boes and John R. Kitchin, Alkaline Electrolyte and {Fe} Impurity Effects on thePerformance and Active-Phase Structure of {NiOOH} Thin Filmsfor {OER} Catalysis Applications, J. Phys. Chem. C, 2015. &lt;a href="https://doi.org/10.1021/acs.jpcc.5b02458"&gt;https://doi.org/10.1021/acs.jpcc.5b02458&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Mitchell C. Groenenboom and John A. Keithand John R. Kitchin, Neural Network and {Reaxff} Comparison for {Au} Properties, Int. J. Quantum Chem., 2016. &lt;a href="https://doi.org/10.1002/qua.25115"&gt;https://doi.org/10.1002/qua.25115&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and John R. Kitchin, Neural Network Predictions of Oxygen Interactions on a Dynamic Pd Surface, Molecular Simulation, Accepted 12/2016. &lt;a href="https://doi.org/10.1080/08927022.2016.1274984"&gt;https://doi.org/10.1080/08927022.2016.1274984&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and John R. Kitchin, Modeling Segregation on {AuPd}(111) Surfaces With DensityFunctional Theory and Monte Carlo Simulations, Submitted to J. Phys. Chem. C, 2016.&lt;/li&gt;
&lt;/ol&gt;
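
&lt;p&gt;
A caveat when two conditions apply to the same field, as with the two author patterns here: a Python dict literal keeps only the last value given for a repeated key, so a query written with "author" twice would carry just one of the patterns to the server. The sketch below shows that behavior with plain dictionaries, with no database involved, and one way around it using the $and operator. The variable names are only for illustration.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;
```python
# A dict literal silently keeps only the last value for a repeated key
query = {"ENTRYTYPE": "article",
         "author": {"$regex": "kitchin", "$options": "i"},
         "author": {"$regex": "boes", "$options": "i"}}
print(len(query))       # 2
print(query["author"])  # {'$regex': 'boes', '$options': 'i'}

# Listing each condition under $and keeps both author patterns
and_query = {"$and": [{"ENTRYTYPE": "article"},
                      {"author": {"$regex": "kitchin", "$options": "i"}},
                      {"author": {"$regex": "boes", "$options": "i"}}]}
print(len(and_query["$and"]))  # 3
```
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
In this bibliography the results are probably the same either way, since every entry listed includes both names, but on other data the difference would matter.
&lt;/p&gt;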

&lt;p&gt;
We can find out how many different entry types we have, as well as how many distinct keyword entries there are. The documents do not separate the keywords though, so this is just the number of unique strings of comma-separated keyword values. We would have had to split those in advance to have a list of keywords, which would let us search for a specific one without string matching. Curiously, in my bibtex entries, these are in a field called "keywords". It appears the bibtex parser may have changed the name to "keyword".
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; pymongo
&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; pymongo &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; MongoClient
&lt;span style="color: #BA36A5;"&gt;client&lt;/span&gt; = MongoClient(&lt;span style="color: #008000;"&gt;'localhost'&lt;/span&gt;, 27017)

&lt;span style="color: #BA36A5;"&gt;db&lt;/span&gt; = client[&lt;span style="color: #008000;"&gt;'bibtex'&lt;/span&gt;]
&lt;span style="color: #BA36A5;"&gt;entries&lt;/span&gt; = db[&lt;span style="color: #008000;"&gt;'entries'&lt;/span&gt;]

&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(entries.distinct(&lt;span style="color: #008000;"&gt;"ENTRYTYPE"&lt;/span&gt;))
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #006FE0;"&gt;len&lt;/span&gt;(entries.distinct(&lt;span style="color: #008000;"&gt;"keyword"&lt;/span&gt;)))
&lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(entries.find({&lt;span style="color: #008000;"&gt;"keyword"&lt;/span&gt;: {&lt;span style="color: #008000;"&gt;"$exists"&lt;/span&gt;: &lt;span style="color: #D0372D;"&gt;True&lt;/span&gt;}})[22][&lt;span style="color: #008000;"&gt;'keyword'&lt;/span&gt;])
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
['book', 'article', 'techreport', 'phdthesis', 'inproceedings', 'inbook', 'mastersthesis', 'misc', 'incollection']
176
Bildungsw{\"a}rmen, Dichtefunktionalrechnungen, Perowskite, Thermochemie
&lt;/p&gt;
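
&lt;p&gt;
As a rough sketch of the splitting mentioned above, the comma-separated strings could be broken into individual keywords before (or after) they go into the database. The helper names and the sample entries here are hypothetical and only illustrate the idea; they are not part of the original scripts.
&lt;/p&gt;

```python
def split_keywords(keyword_field):
    """Split a comma-separated bibtex keyword string into a clean list."""
    if not keyword_field:
        return []
    return [kw.strip() for kw in keyword_field.split(",") if kw.strip()]


def unique_keywords(entries):
    """Collect the distinct individual keywords across entry dicts."""
    found = set()
    for entry in entries:
        found.update(split_keywords(entry.get("keyword")))
    return sorted(found)


# Made-up sample documents shaped like the bibtex entries in this post.
sample = [
    {"ID": "a", "keyword": "DESC0004031, early-career"},
    {"ID": "b", "keyword": "early-career, orgmode"},
    {"ID": "c"},  # no keyword field at all
]
print(unique_keywords(sample))
```

&lt;p&gt;
Storing the list instead of the raw string would let a query match one keyword exactly rather than by substring.
&lt;/p&gt;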

&lt;div id="outline-container-org41df17a" class="outline-2"&gt;
&lt;h2 id="org41df17a"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; text searching&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
You can do text search as well. You first have to create a text index on one or more fields, and then use the $text and $search operators. Here I made an index on a few fields, and then searched on it. Note that a collection can have only one text index, so think about which fields to include in advance! This simplifies the query a bit, since we do not have to use the regex syntax for matching on a field.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; pymongo
&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; pymongo &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; MongoClient
&lt;span style="color: #BA36A5;"&gt;client&lt;/span&gt; = MongoClient(&lt;span style="color: #008000;"&gt;'localhost'&lt;/span&gt;, 27017)

&lt;span style="color: #BA36A5;"&gt;db&lt;/span&gt; = client[&lt;span style="color: #008000;"&gt;'bibtex'&lt;/span&gt;]
&lt;span style="color: #BA36A5;"&gt;entries&lt;/span&gt; = db[&lt;span style="color: #008000;"&gt;'entries'&lt;/span&gt;]

entries.create_index([(&lt;span style="color: #008000;"&gt;'author'&lt;/span&gt;, pymongo.TEXT),
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt; (&lt;span style="color: #008000;"&gt;'title'&lt;/span&gt;, pymongo.TEXT),
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt; (&lt;span style="color: #008000;"&gt;'keyword'&lt;/span&gt;, pymongo.TEXT)], sparse=&lt;span style="color: #D0372D;"&gt;True&lt;/span&gt;)

&lt;span style="color: #8D8D84;"&gt;# &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;Quote each term so that both must match; a dict cannot hold two "$search" keys (the last one wins).&lt;/span&gt;
&lt;span style="color: #0000FF;"&gt;for&lt;/span&gt; i, result &lt;span style="color: #0000FF;"&gt;in&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;enumerate&lt;/span&gt;(entries.find({&lt;span style="color: #008000;"&gt;"$text"&lt;/span&gt; : {&lt;span style="color: #008000;"&gt;"$search"&lt;/span&gt;: &lt;span style="color: #008000;"&gt;'"kitchin" "boes"'&lt;/span&gt;}})):
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'{i: 2d}. {author}, {title}, {journal}, {year}.'&lt;/span&gt;.&lt;span style="color: #006FE0;"&gt;format&lt;/span&gt;(i=i, **result).replace(&lt;span style="color: #008000;"&gt;"\n"&lt;/span&gt;, &lt;span style="color: #008000;"&gt;" "&lt;/span&gt;))
&lt;/pre&gt;
&lt;/div&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;G. Gumuslu and P. Kondratyuk and J. R. Boes and B. Morreale and J. B. Miller and J. R. Kitchin and A. J. Gellman, Correlation of Electronic Structure With Catalytic Activity: \ce{H2}-\ce{D2} Exchange Across \ce{Cu&lt;sub&gt;x&lt;/sub&gt;Pd&lt;sub&gt;1-x&lt;/sub&gt;} Composition Space, ACS Catalysis, 2015.&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Peter Kondratyuk and Chunrong Yin and James B. Miller and Andrew J. Gellman and John R. Kitchin, Core Level Shifts in {Cu-Pd} Alloys As a Function of Bulk Composition and Structure, Surface Science, 2015.&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Gamze Gumuslu and James B. Miller and Andrew J. Gellman and John R. Kitchin, Estimating Bulk-Composition-Dependent \ce{H2} Adsorption Energies on \ce{Cu&lt;sub&gt;x&lt;/sub&gt;Pd&lt;sub&gt;1-x&lt;/sub&gt;} Alloy (111) Surfaces, ACS Catalysis, 2015.&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and John R. Kitchin, Neural Network Predictions of Oxygen Interactions on a Dynamic Pd Surface, Molecular Simulation, Accepted 12/2016.&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and John R. Kitchin, Modeling Segregation on {AuPd}(111) Surfaces With Density Functional Theory and Monte Carlo Simulations, Submitted to J. Phys. Chem. C, 2016.&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Gamze Gumuslu and James B. Miller and Andrew J. Gellman and John R. Kitchin, Supporting Information: Estimating Bulk-Composition-Dependent \ce{H2} Adsorption Energies on \ce{Cu&lt;sub&gt;x&lt;/sub&gt;Pd&lt;sub&gt;1-x&lt;/sub&gt;} Alloy (111) Surfaces, ACS Catalysis, 2015.&lt;/li&gt;
&lt;li&gt;John D. Michael and Ethan L. Demeter and Steven M. Illes and Qingqi Fan and Jacob R. Boes and John R. Kitchin, Alkaline Electrolyte and {Fe} Impurity Effects on the Performance and Active-Phase Structure of {NiOOH} Thin Films for {OER} Catalysis Applications, J. Phys. Chem. C, 2015.&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Mitchell C. Groenenboom and John A. Keith and John R. Kitchin, Neural Network and {Reaxff} Comparison for {Au} Properties, Int. J. Quantum Chem., 2016.&lt;/li&gt;
&lt;/ol&gt;
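
&lt;p&gt;
A note on combining terms: as far as I understand the documented $text behavior, bare words in a $search string are ORed together, while every quoted phrase has to match. A small helper (hypothetical name, not something pymongo provides) that quotes each term is one way to sketch an "all of these terms" search:
&lt;/p&gt;

```python
def text_query_all(*terms):
    """Build a $text filter that asks for every term to be present.

    Each term is wrapped in double quotes so that MongoDB treats it as a
    phrase that must match, instead of one of several optional words.
    """
    quoted = " ".join('"{}"'.format(term) for term in terms)
    return {"$text": {"$search": quoted}}


query = text_query_all("kitchin", "boes")
print(query)
# The filter would then be passed to entries.find(query).
```

&lt;p&gt;
This only builds the filter document; it still needs the text index created above to run against the collection.
&lt;/p&gt;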

&lt;p&gt;
We can use this to search for documents with orgmode in a keyword or title too.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;&lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; pymongo
&lt;span style="color: #0000FF;"&gt;from&lt;/span&gt; pymongo &lt;span style="color: #0000FF;"&gt;import&lt;/span&gt; MongoClient
&lt;span style="color: #BA36A5;"&gt;client&lt;/span&gt; = MongoClient(&lt;span style="color: #008000;"&gt;'localhost'&lt;/span&gt;, 27017)

&lt;span style="color: #BA36A5;"&gt;db&lt;/span&gt; = client[&lt;span style="color: #008000;"&gt;'bibtex'&lt;/span&gt;]
&lt;span style="color: #BA36A5;"&gt;entries&lt;/span&gt; = db[&lt;span style="color: #008000;"&gt;'entries'&lt;/span&gt;]

entries.create_index([(&lt;span style="color: #008000;"&gt;'author'&lt;/span&gt;, pymongo.TEXT),
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt; (&lt;span style="color: #008000;"&gt;'title'&lt;/span&gt;, pymongo.TEXT),
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt; (&lt;span style="color: #008000;"&gt;'keyword'&lt;/span&gt;, pymongo.TEXT)], sparse=&lt;span style="color: #D0372D;"&gt;True&lt;/span&gt;)

&lt;span style="color: #0000FF;"&gt;for&lt;/span&gt; i, result &lt;span style="color: #0000FF;"&gt;in&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;enumerate&lt;/span&gt;(entries.find({&lt;span style="color: #008000;"&gt;"$text"&lt;/span&gt; : {&lt;span style="color: #008000;"&gt;"$search"&lt;/span&gt;: &lt;span style="color: #008000;"&gt;"orgmode"&lt;/span&gt;}})):
&lt;span style="color: #9B9B9B; background-color: #EDEDED;"&gt; &lt;/span&gt;   &lt;span style="color: #0000FF;"&gt;print&lt;/span&gt;(&lt;span style="color: #008000;"&gt;'{i: 2d}. {author}, {title}, {journal}, {year}.'&lt;/span&gt;.&lt;span style="color: #006FE0;"&gt;format&lt;/span&gt;(i=i, **result).replace(&lt;span style="color: #008000;"&gt;"\n"&lt;/span&gt;, &lt;span style="color: #008000;"&gt;" "&lt;/span&gt;))
&lt;/pre&gt;
&lt;/div&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;John R. Kitchin, Data Sharing in Surface Science, Surface Science, 2016.&lt;/li&gt;
&lt;li&gt;Zhongnan Xu and John R. Kitchin, Probing the Coverage Dependence of Site and Adsorbate Configurational Correlations on (111) Surfaces of Late Transition Metals, J. Phys. Chem. C, 2014.&lt;/li&gt;
&lt;li&gt;Xu, Zhongnan and Rossmeisl, Jan and Kitchin, John R., A Linear Response {DFT}+{U} Study of Trends in the Oxygen Evolution Activity of Transition Metal Rutile Dioxides, The Journal of Physical Chemistry C, 2015.&lt;/li&gt;
&lt;li&gt;Prateek Mehta and Paul A. Salvador and John R. Kitchin, Identifying Potential \ce{BO2} Oxide Polymorphs for Epitaxial Growth Candidates, ACS Appl. Mater. Interfaces, 2015.&lt;/li&gt;
&lt;li&gt;Xu, Zhongnan and Joshi, Yogesh V. and Raman, Sumathy and Kitchin, John R., Accurate Electronic and Chemical Properties of 3d Transition Metal Oxides Using a Calculated Linear Response {U} and a {DFT+ U(V)} Method, The Journal of Chemical Physics, 2015.&lt;/li&gt;
&lt;li&gt;Zhongnan Xu and John R. Kitchin, Relationships Between the Surface Electronic and Chemical Properties of Doped 4d and 5d Late Transition Metal Dioxides, The Journal of Chemical Physics, 2015.&lt;/li&gt;
&lt;li&gt;Zhongnan Xu and John R Kitchin, Tuning Oxide Activity Through Modification of the Crystal and Electronic Structure: From Strain To Potential Polymorphs, Phys. Chem. Chem. Phys., 2015.&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Gamze Gumuslu and James B. Miller and Andrew J. Gellman and John R. Kitchin, Supporting Information: Estimating Bulk-Composition-Dependent \ce{H2} Adsorption Energies on \ce{Cu&lt;sub&gt;x&lt;/sub&gt;Pd&lt;sub&gt;1-x&lt;/sub&gt;} Alloy (111) Surfaces, ACS Catalysis, 2015.&lt;/li&gt;
&lt;li&gt;Kitchin, John R., Examples of Effective Data Sharing in Scientific Publishing, ACS Catalysis, 2015.&lt;/li&gt;
&lt;li&gt;Curnan, Matthew T. and Kitchin, John R., Effects of Concentration, Crystal Structure, Magnetism, and Electronic Structure Method on First-Principles Oxygen Vacancy Formation Energy Trends in Perovskites, The Journal of Physical Chemistry C, 2014.&lt;/li&gt;
&lt;li&gt;Kitchin, John R. and Van Gulick, Ana E. and Zilinski, Lisa D., Automating Data Sharing Through Authoring Tools, International Journal on Digital Libraries, 2016.&lt;/li&gt;
&lt;li&gt;Jacob R. Boes and Gamze Gumuslu and James B. Miller and Andrew J. Gellman and John R. Kitchin, Estimating Bulk-Composition-Dependent \ce{H2} Adsorption Energies on \ce{Cu&lt;sub&gt;x&lt;/sub&gt;Pd&lt;sub&gt;1-x&lt;/sub&gt;} Alloy (111) Surfaces, ACS Catalysis, 2015.&lt;/li&gt;
&lt;li&gt;Zhongnan Xu and John R. Kitchin, Relating the Electronic Structure and Reactivity of the 3d Transition Metal Monoxide Surfaces, Catalysis Communications, 2014.&lt;/li&gt;
&lt;li&gt;Spencer D. Miller and Vladimir V. Pushkarev and Andrew J. Gellman and John R. Kitchin, Simulating Temperature Programmed Desorption of Oxygen on {P}t(111) Using {DFT} Derived Coverage Dependent Desorption Barriers, Topics in Catalysis, 2014.&lt;/li&gt;
&lt;li&gt;Hallenbeck, Alexander P. and Kitchin, John R., Effects of \ce{O_2} and \ce{SO_2} on the Capture Capacity of a Primary-Amine Based Polymeric \ce{CO_2} Sorbent, Industrial \&amp;amp; Engineering Chemistry Research, 2013.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org6dcd73e" class="outline-2"&gt;
&lt;h2 id="org6dcd73e"&gt;&lt;span class="section-number-2"&gt;2&lt;/span&gt; Querying from emacs-lisp&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
It is hard to get too excited about this if it is not easy to query from emacs and get data in a form we can use in emacs ;) The json library allows us to convert lisp data structures to json pretty easily. For example:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;require&lt;/span&gt; '&lt;span style="color: #D0372D;"&gt;json&lt;/span&gt;)

(json-encode '((ENTRYTYPE . article)
               (author . (($regex . kitchin)
                          ($options . i)))
               (author . (($regex . boes)
                          ($options . i)))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
{"ENTRYTYPE":"article","author":{"$regex":"kitchin","$options":"i"},"author":{"$regex":"boes","$options":"i"}}
&lt;/pre&gt;
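
&lt;p&gt;
The encoded query repeats the author key. How a repeated key is treated depends on whatever parses the JSON, so here is a hedged Python sketch of the same query written with an explicit $and, which does not rely on that behavior. The author_all helper is a made-up name for illustration.
&lt;/p&gt;

```python
import json

# A Python dict literal keeps only the last value for a repeated key,
# so the "kitchin" condition silently disappears here.
collapsed = {
    "author": {"$regex": "kitchin", "$options": "i"},
    "author": {"$regex": "boes", "$options": "i"},
}


def author_all(*names):
    """Require every name to match the author field, case-insensitively."""
    return {"$and": [{"author": {"$regex": name, "$options": "i"}}
                     for name in names]}


query = {"ENTRYTYPE": "article"}
query.update(author_all("kitchin", "boes"))

print(json.dumps(collapsed))
print(json.dumps(query))
```

&lt;p&gt;
The second string keeps both author conditions as separate elements of a list, so nothing depends on how duplicate keys are resolved.
&lt;/p&gt;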

&lt;p&gt;
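So, we can use an a-list syntax to build up the query. Note the repeated author key: how duplicate keys are handled depends on the JSON parser, so an explicit $and is the safer way to require both matches. Then we can send the query to mongo using mongoexport, which returns a json string that we can read back into emacs to get lisp data. Here is an example that runs a query; we print the first element of the result.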
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(pp
 (aref (json-read-from-string
        (shell-command-to-string
         (format &lt;span style="color: #008000;"&gt;"mongoexport --quiet --jsonArray -d bibtex -c entries -q '%s'"&lt;/span&gt;
                 (json-encode '((ENTRYTYPE . article)
                                (author . (($regex . kitchin)
                                           ($options . i)))
                                (author . (($regex . boes)
                                           ($options . i))))))))
       0))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
((_id
  ($oid . "5878d9644c114f59fe86cb36"))
 (author . "Jacob R. Boes and Peter Kondratyuk and Chunrong Yin and James\nB. Miller and Andrew J. Gellman and John R. Kitchin")
 (year . "2015")
 (title . "Core Level Shifts in {Cu-Pd} Alloys As a Function of Bulk\nComposition and Structure")
 (ENTRYTYPE . "article")
 (ID . "boes-2015-core-cu")
 (keyword . "DESC0004031, early-career")
 (volume . "640")
 (doi . "10.1016/j.susc.2015.02.011")
 (link . "https://doi.org/10.1016/j.susc.2015.02.011")
 (issn . "0039-6028")
 (journal . "Surface Science")
 (pages . "127-132"))
&lt;/pre&gt;

&lt;p&gt;
That is pretty sweet, we get a lisp data structure we can use. We can wrap that into a reasonable looking function here:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
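&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;defun&lt;/span&gt; &lt;span style="color: #006699;"&gt;mongo-find&lt;/span&gt; (db collection query)
  &lt;span style="color: #008000;"&gt;"Return the documents in COLLECTION of DB that match QUERY.
QUERY is an a-list that is encoded to json and passed to mongoexport.
The result is a vector of a-lists, one per document."&lt;/span&gt;
  (json-read-from-string
   (shell-command-to-string
    &lt;span style="color: #8D8D84;"&gt;;; &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;quote the json so quotes inside the query do not break the shell command&lt;/span&gt;
    (format &lt;span style="color: #008000;"&gt;"mongoexport --quiet --jsonArray -d %s -c %s -q %s"&lt;/span&gt;
            db collection (shell-quote-argument (json-encode query))))))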
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
mongo-find
&lt;/pre&gt;
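
&lt;p&gt;
For comparison, roughly the same wrapper could be sketched in Python. This assumes mongoexport is on the PATH and a server is running locally; the function names are hypothetical. Passing the arguments as a list avoids shell quoting altogether.
&lt;/p&gt;

```python
import json
import subprocess


def mongoexport_command(db, collection, query):
    """Return the argument list for the mongoexport call used above."""
    return ["mongoexport", "--quiet", "--jsonArray",
            "-d", db, "-c", collection, "-q", json.dumps(query)]


def mongo_find(db, collection, query):
    """Run mongoexport and parse its JSON array into Python dicts.

    Assumes mongoexport is installed and the database is reachable.
    """
    result = subprocess.run(mongoexport_command(db, collection, query),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Only build the command here; nothing is executed without a server.
cmd = mongoexport_command("bibtex", "entries", {"ENTRYTYPE": "article"})
print(cmd)
```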

&lt;p&gt;
Now we can use the function to query the database, and then format the results. Here we look at the example of articles with authors that match "kitchin" and "boes".
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(&lt;span style="color: #0000FF;"&gt;loop&lt;/span&gt; for counter from 1 for entry across
      (mongo-find &lt;span style="color: #008000;"&gt;"bibtex"&lt;/span&gt; &lt;span style="color: #008000;"&gt;"entries"&lt;/span&gt; '((ENTRYTYPE . article)
                                       (author . (($regex . kitchin)
                                                  ($options . i)))
                                       (author . (($regex . boes)
                                                  ($options . i)))))
      do
      (&lt;span style="color: #0000FF;"&gt;setq&lt;/span&gt; entry (append `(,(cons &lt;span style="color: #008000;"&gt;"counter"&lt;/span&gt; counter)) entry))
      &lt;span style="color: #8D8D84;"&gt;;; &lt;/span&gt;&lt;span style="color: #8D8D84; font-style: italic;"&gt;make sure we have a doi field.&lt;/span&gt;
      (&lt;span style="color: #0000FF;"&gt;if&lt;/span&gt; (assoc 'doi entry)
          (&lt;span style="color: #0000FF;"&gt;push&lt;/span&gt; (cons &lt;span style="color: #008000;"&gt;"doi"&lt;/span&gt; (format &lt;span style="color: #008000;"&gt;"https://doi.org/%s"&lt;/span&gt; (cdr (assoc 'doi entry)))) entry)
        (&lt;span style="color: #0000FF;"&gt;push&lt;/span&gt; (cons &lt;span style="color: #008000;"&gt;"doi"&lt;/span&gt; &lt;span style="color: #008000;"&gt;""&lt;/span&gt;) entry))
      concat
      (concat (replace-regexp-in-string
               &lt;span style="color: #008000;"&gt;"\n"&lt;/span&gt; &lt;span style="color: #008000;"&gt;" "&lt;/span&gt;
               (s-format &lt;span style="color: #008000;"&gt;"${counter}. ${author}, ${title} (${year}). ${doi}"&lt;/span&gt;
                         'aget entry)) &lt;span style="color: #008000;"&gt;"\n"&lt;/span&gt;))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1. Jacob R. Boes and Peter Kondratyuk and Chunrong Yin and James B. Miller and Andrew J. Gellman and John R. Kitchin, Core Level Shifts in {Cu-Pd} Alloys As a Function of Bulk Composition and Structure (2015). https://doi.org/10.1016/j.susc.2015.02.011
2. Jacob R. Boes and Gamze Gumuslu and James B. Miller and Andrew J. Gellman and John R. Kitchin, Estimating Bulk-Composition-Dependent \ce{H2} Adsorption Energies on \ce{Cu_{x}Pd_{1-x}} Alloy (111) Surfaces (2015). https://doi.org/10.1021/cs501585k
3. Jacob R. Boes and Gamze Gumuslu and James B. Miller and Andrew J. Gellman and John R. Kitchin, Supporting Information: Estimating Bulk-Composition-Dependent \ce{H2} Adsorption Energies on \ce{Cu_{x}Pd_{1-x}} Alloy (111) Surfaces (2015). https://doi.org/10.1021/cs501585k
4. G. Gumuslu and P. Kondratyuk and J. R. Boes and B. Morreale and J. B. Miller and J. R. Kitchin and A. J. Gellman, Correlation of Electronic Structure With Catalytic Activity: \ce{H2}-\ce{D2} Exchange Across \ce{Cu_{x}Pd_{1-x}} Composition Space (2015). https://doi.org/10.1021/cs501586t
5. John D. Michael and Ethan L. Demeter and Steven M. Illes and Qingqi Fan and Jacob R. Boes and John R. Kitchin, Alkaline Electrolyte and {Fe} Impurity Effects on the Performance and Active-Phase Structure of {NiOOH} Thin Films for {OER} Catalysis Applications (2015). https://doi.org/10.1021/acs.jpcc.5b02458
6. Jacob R. Boes and Mitchell C. Groenenboom and John A. Keith and John R. Kitchin, Neural Network and {Reaxff} Comparison for {Au} Properties (2016). https://doi.org/10.1002/qua.25115
7. Jacob R. Boes and John R. Kitchin, Neural Network Predictions of Oxygen Interactions on a Dynamic Pd Surface (Accepted 12/2016). https://doi.org/10.1080/08927022.2016.1274984
8. Jacob R. Boes and John R. Kitchin, Modeling Segregation on {AuPd}(111) Surfaces With Density Functional Theory and Monte Carlo Simulations (2016).
&lt;/pre&gt;

&lt;p&gt;
Wow, that looks like a pretty lispy way to query the database and use the results. It is probably pretty easy to do similar things for inserting and updating documents. I will save that for another day.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org6d7544a" class="outline-2"&gt;
&lt;h2 id="org6d7544a"&gt;&lt;span class="section-number-2"&gt;3&lt;/span&gt; Summary thoughts&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
This is not an exhaustive study of Mongo for a bibtex database. It does illustrate that it is potentially useful. Imagine a group of users who could enter bibtex entries, and then share them through a central server. Or you could query the server for entries and then select them using helm/ivy. That is probably faster than parsing large bibtex files (note, in org-ref I already cache the files in parsed form for performance reasons!).
&lt;/p&gt;

&lt;p&gt;
It would make sense to split the authors and keywords into lists in another version of this database. It also could make sense to have a field that is the bibtex string, and to do text search on that string. That way you get everything in the entry for searching, and an easy way to generate bibtex files without having to reconstruct them.
&lt;/p&gt;

&lt;p&gt;
It is especially interesting to run the queries through emacs-lisp since we get the benefit of editing lisp code while writing the query, e.g. parenthesis navigation, less quoting, etc&amp;#x2026; and we get back lisp data that can be used to construct helm/ivy queries, or other emacs things. That makes this look competitive with emacsql at least for the syntax. I predict that there will be more posts on this in the future.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2017 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;
&lt;p&gt;&lt;a href="/org/2017/01/15/Querying-a-MongoDB-bibtex-database-with-Python-and-emacs-lisp.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Org-mode version = 9.0.3&lt;/p&gt;
]]></content:encoded>
    </item>
    <item>
      <title>Find stuff in org-mode anywhere</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2017/01/03/Find-stuff-in-org-mode-anywhere</link>
      <pubDate>Tue, 03 Jan 2017 14:33:32 EST</pubDate>
      <category><![CDATA[orgmode]]></category>
      <category><![CDATA[emacs]]></category>
      <category><![CDATA[database]]></category>
      <guid isPermaLink="false">EKN_0P_B1jUiutFJ8PoFGyIL9aY=</guid>
      <description>Find stuff in org-mode anywhere</description>
      <content:encoded><![CDATA[


&lt;div id="table-of-contents"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;div id="text-table-of-contents"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#org961d2be"&gt;1. The database design&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#orgbda3471"&gt;2. Querying the link table&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org8284133"&gt;3. Headline queries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org7d75505"&gt;4. Keyword queries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org08feb51"&gt;5. Full text search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#orgbb3d71f"&gt;6. Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;
I use org-mode &lt;i&gt;extensively&lt;/i&gt;. I write scientific papers, lecture notes, letters of recommendation, and help files for software in it; I keep notes on meetings and scientific articles, and TODO lists in projects; students send me homework solutions in it; and it is my contact database&amp;#x2026; Some files are on Dropbox, Google Drive, or Box, some are in git repos, etc. The problem is that this leaves org-files everywhere on my hard drive. At this point I have several thousand org-files that span about five years of work.
&lt;/p&gt;

&lt;p&gt;
After a while, it is not that easy to find them. Yes, there are things like recent-files, bookmarks, counsel-find-file, helm-for-files, counsel/helm-locate, helm/counsel-grep/ag/pt, projectile for searching within a project, a slew of tools to search open buffers, there is &lt;a href="https://www.lesbonscomptes.com/recoll/"&gt;recoll&lt;/a&gt;, etc&amp;#x2026; There are desktop search tools, and of course, good organization habits. Over a five-year time span though, these tools and habits change, and I have yet to find a solution that reliably finds what I want. What about a file I made a year ago that is not in the current directory or this project, and not in my org-agenda-files list? How do I get a dynamic todo list across all these files? Or find all the files that cite a particular bibtex entry, or that were authored by a particular student?
&lt;/p&gt;

&lt;p&gt;
Previously, &lt;a href="http://kitchingroup.cheme.cmu.edu/blog/2015/07/06/Indexing-headlines-in-org-files-with-swish-e-with-laser-sharp-results/"&gt;I indexed org files with Swish-e&lt;/a&gt; to make it easy to search them, with an ability to search just headlines, or paragraphs, etc. The problem with that is the nightly indexing was slow since I basically had to regenerate the database each time due to limitations in Swish-e. Finally I have gotten around to the next iteration of this idea, which is a better database. In this post, I explore using sqlite to store headlines and links in org-files.
&lt;/p&gt;

&lt;p&gt;
The idea is that anytime I open or save &lt;i&gt;any&lt;/i&gt; org file, it will be added/updated in the database. The database will store the headlines with their properties and content, as well as the location and properties of all links and file keywords. That means I should be able to efficiently query all org files I have ever visited to find TODO headlines, tagged headlines, different types of links, etc. Here we try it out and see if it is useful.
&lt;/p&gt;


&lt;div id="outline-container-org961d2be" class="outline-2"&gt;
&lt;h2 id="org961d2be"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; The database design&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
I used &lt;a href="https://github.com/skeeto/emacsql"&gt;emacsql&lt;/a&gt; to create and interact with a sqlite3 database. It is a lispy way to generate SQL queries. I will not talk about the code much here; you can see this version in &lt;a href="/media/org-db.el"&gt;org-db.el&lt;/a&gt;. The database design consists of several tables that contain the filenames, headlines, tags, properties, (optionally) headline-content, headline-tags, headline-properties, and links. The lisp code is a work in progress, and not something I use on a daily basis yet. This post is a proof of concept to see how well this approach works.
&lt;/p&gt;

&lt;p&gt;
I use hooks to update the database when an org-file is opened (only if it is different from what is in the database, based on an md5 hash) and when it is saved. Basically, these functions delete the current entries in the database for a file, then use regular expressions to go to each headline or link in the file, and add data back to the database. I found this to be faster than parsing the org-file with org-element, especially for large files. Since this is all done by a hook, anytime I open an org-file anywhere it gets added to or updated in the database. The performance of this is ok. This approach will not guarantee the database is 100% accurate all the time (e.g. if something modifies the file outside of emacs, like a git pull), but it doesn't need to be. Most of the files do not change often, the database gets updated each time you open a file, and you can always reindex the database from files it knows about. Time will tell how often that seems necessary.
&lt;/p&gt;
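
&lt;p&gt;
To make the hash check concrete, here is a minimal Python sketch of the "only update if the md5 changed" logic. It is not the code in org-db.el: the function names are hypothetical, the files table is a stand-in with the same three columns, and the headline and link indexing is left out.
&lt;/p&gt;

```python
import hashlib
import os
import sqlite3
import tempfile


def file_md5(path):
    """Return the md5 hex digest of a file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def update_if_changed(conn, path):
    """Re-index path only when its hash differs from the stored one.

    Returns True when the row was (re)written, False when it was skipped.
    """
    digest = file_md5(path)
    row = conn.execute(
        "SELECT md5 FROM files WHERE filename = ?", (path,)).fetchone()
    if row is not None and row[0] == digest:
        return False
    # Delete the stale row, then add the file back with its new hash.
    # A fuller version would also delete and re-add headlines and links.
    conn.execute("DELETE FROM files WHERE filename = ?", (path,))
    conn.execute(
        "INSERT INTO files (filename, md5) VALUES (?, ?)", (path, digest))
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (rowid INTEGER PRIMARY KEY, filename, md5)")

with tempfile.NamedTemporaryFile("w", suffix=".org", delete=False) as tmp:
    tmp.write("* TODO write the post\n")

first = update_if_changed(conn, tmp.name)   # new file, so it is indexed
second = update_if_changed(conn, tmp.name)  # unchanged, so it is skipped
print(first, second)
os.remove(tmp.name)
```

&lt;p&gt;
The second call does no work because the stored hash matches, which is the behavior the open hook relies on.
&lt;/p&gt;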

&lt;p&gt;
emacsql lets you use lisp code to generate SQL that is sent to the database. Here is an example:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql-flatten-sql [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; [name] &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; main:sqlite_master &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (= type table)])
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
SELECT name FROM main.sqlite_master WHERE type = "table";
&lt;/pre&gt;

&lt;p&gt;
There are some nuances, for example, main:sqlite_master gets converted to main.sqlite_master. You use vectors, keywords, and sexps to set up the command. emacsql will turn a name like filename-id into filename_id. It was not too difficult to figure out, and the author of emacsql was really helpful on a few points. I will be referring to this post in the future to remember some of these nuances!
&lt;/p&gt;

&lt;p&gt;
Here is a list of tables in the database. There are a few primary tables, and then some that store tags, properties, and keywords on the headlines. The headline_content_* tables are the auxiliary tables that sqlite creates for a full-text search table. This is typical of emacsql code; it is a lisp expression that generates SQL. In this next expression org-db is a variable that stores the database connection created in org-db.el.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; [name] &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; main:sqlite_master &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (= type table)])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;files&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;tags&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;properties&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;keywords&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headlines&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headline_content&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headline_content_content&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headline_content_segments&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headline_content_segdir&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headline_content_docsize&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headline_content_stat&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headline_tags&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;headline_properties&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;file_keywords&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;links&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
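Here is a description of the columns in the files table. Each row that sqlite returns for table_info holds the column id, name, declared type, not-null flag, default value, and primary-key flag. Only rowid has a declared type here, and the rows for the other columns appear to be shifted one cell to the left as a result: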
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:pragma&lt;/span&gt; (funcall table_info files)])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-right" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-right" /&gt;

&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;rowid&lt;/td&gt;
&lt;td class="org-right"&gt;INTEGER&lt;/td&gt;
&lt;td class="org-left"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;1&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;1&lt;/td&gt;
&lt;td class="org-left"&gt;filename&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;2&lt;/td&gt;
&lt;td class="org-left"&gt;md5&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
And here is the same description for the headlines table:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:pragma&lt;/span&gt; (funcall table_info headlines)])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-right" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-right" /&gt;

&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;rowid&lt;/td&gt;
&lt;td class="org-right"&gt;INTEGER&lt;/td&gt;
&lt;td class="org-left"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;1&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;1&lt;/td&gt;
&lt;td class="org-left"&gt;filename_id&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;2&lt;/td&gt;
&lt;td class="org-left"&gt;title&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;3&lt;/td&gt;
&lt;td class="org-left"&gt;level&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;4&lt;/td&gt;
&lt;td class="org-left"&gt;todo_keyword&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;5&lt;/td&gt;
&lt;td class="org-left"&gt;todo_type&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;6&lt;/td&gt;
&lt;td class="org-left"&gt;archivedp&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;7&lt;/td&gt;
&lt;td class="org-left"&gt;commentedp&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;8&lt;/td&gt;
&lt;td class="org-left"&gt;footnote_section_p&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-right"&gt;9&lt;/td&gt;
&lt;td class="org-left"&gt;begin&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-left"&gt;nil&lt;/td&gt;
&lt;td class="org-right"&gt;0&lt;/td&gt;
&lt;td class="org-right"&gt;&amp;#xa0;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
Tags and properties on a headline are stored in headline-tags and headline-properties. 
&lt;/p&gt;

&lt;p&gt;
The database is not large when it holds only headlines and links (no content). With content it grew to about half a GB and seemed a little slow, so for this post I leave the content out.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-sh"&gt;du -hs ~/org-db/org-db.sqlite
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;56M&lt;/td&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/org-db/org-db.sqlite&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
Here we count how many files are in the database. These are just the org-files in my Dropbox folder. There are a lot of them! If I include all the org-files from my research and teaching projects this number grows to about 10,000! You do not want to run org-map-entries on that. Note this also includes all of the org_archive files.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; (funcall count) &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; files])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;1569&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
Here is the headlines count. You can see there is no chance of remembering where these are because there are so many!
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; (funcall count) &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; headlines])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;38587&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
And the links. So many links!
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; (funcall count) &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; links])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;303739&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
That is a surprising number of links. 
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgbda3471" class="outline-2"&gt;
&lt;h2 id="orgbda3471"&gt;&lt;span class="section-number-2"&gt;2&lt;/span&gt; Querying the link table&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
Let's see how many of these are cite links from org-ref.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; (funcall count) &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; links &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (= type &lt;span style="color: #008000;"&gt;"cite"&lt;/span&gt;)])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;14766&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
Wow, I find that surprisingly large too!  I make a living writing proposals and scientific papers, and I wrote org-ref to make that easier, so maybe it should not be so surprising. We can search the link database for files containing citations of "kitchin-2015-examp" like this.  The links table only stores the filename-id, so we join it with the files table to get useful information. Here we show the list of files that contain a citation of that reference. It is a mix of manuscripts, proposals, presentations, documentation files and notes.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:distinct&lt;/span&gt; [files:filename]
                 &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; links &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; files &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= links:filename-id files:rowid) 
                 &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; (= type &lt;span style="color: #008000;"&gt;"cite"&lt;/span&gt;) (like path &lt;span style="color: #008000;"&gt;"%kitchin-2015-examp%"&lt;/span&gt;))])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/2015/Research_Data_Publishing_Paper/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/2015/Research_Data_Publishing_Paper/manuscript-2015-06-29/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/2015/Research_Data_Publishing_Paper/manuscript-2015-10-10/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/2015/Research_Data_Publishing_Paper/manuscript-2016-03-09/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/2015/Research_Data_Publishing_Paper/manuscript-2016-04-18/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/2015/human-readable-data/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/@archive/2015/Research_Data_Publishing_Paper/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/@archive/2015/Research_Data_Publishing_Paper/manuscript-2015-06-29/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/@archive/2015/Research_Data_Publishing_Paper/manuscript-2015-10-10/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/@archive/2015/Research_Data_Publishing_Paper/manuscript-2016-03-09/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/@archive/2015/Research_Data_Publishing_Paper/manuscript-2016-04-18/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/manuscripts/@archive/2015/human-readable-data/manuscript.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/meetings/@archive/2015/BES-2015/doe-bes-wed-data-briefing/doe-bes-wed-data-sharing.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/meetings/@archive/2015/NIST-july-2015/data-sharing.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/meetings/@archive/2015/UD-webinar/ud-webinar.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/meetings/@archive/2016/AICHE/data-sharing/data-sharing.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/meetings/@archive/2016/Spring-ACS/data-sharing/data-sharing.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/projects/DOE-Early-Career/annual-reports/final-report/kitchin-DESC0004031-final-report.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/proposals/@archive/2015/DOE-renewal/proposal-v2.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/proposals/@archive/2015/DOE-renewal/archive/proposal.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/proposals/@archive/2016/DOE-single-atom-alloy/proposal.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/proposals/@archive/2016/MRSEC/MRSEC-IRG-metastable-materials-preproposal/IRG-concept.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/proposals/@archive/2016/ljaf-open-science/kitchin-proposal.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/proposals/@archive/2016/nsf-germination/project-description.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/proposals/@archive/2016/nsf-reu-supplement/project-description.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/CMU/proposals/@archive/2016/proctor-and-gamble-education/proposal.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/bibliography/notes.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref/citeproc/readme.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref/citeproc/readme-unsrt.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref/citeproc/readme-author-year.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref/tests/test-1.org&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref/tests/sandbox/elpa/org-ref-20160122.1725/citeproc/readme.org&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
Obviously we could use this query to generate candidates for something like helm or ivy. Here is an example with ivy.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(ivy-read &lt;span style="color: #008000;"&gt;"Open: "&lt;/span&gt; (emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; [files:filename links:begin]
                                    &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; links &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; files &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= links:filename-id files:rowid) 
                                    &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; (= type &lt;span style="color: #008000;"&gt;"cite"&lt;/span&gt;) (like path &lt;span style="color: #008000;"&gt;"%kitchin-2015-examp%"&lt;/span&gt;))])
          &lt;span style="color: #006FE0;"&gt;:action&lt;/span&gt; '(1 (&lt;span style="color: #008000;"&gt;"o"&lt;/span&gt;
                       (&lt;span style="color: #0000FF;"&gt;lambda&lt;/span&gt; (c)
                         (find-file (car c))
                         (goto-char (nth 1 c))
                         (org-show-entry)))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
/Users/jkitchin/Dropbox/CMU/manuscripts/2015/human-readable-data/manuscript.org
&lt;/pre&gt;

&lt;p&gt;
Now, you can find every org-file containing any bibtex key as a citation. Since SQL is the query language, you should be able to build really sophisticated queries that combine filters for multiple citations, different kinds of citations, etc.
&lt;/p&gt;
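
&lt;p&gt;
As a minimal sketch of that idea, the citation search above can be wrapped in a function that takes the bibtex key as an argument. The name my-org-db-cite-query is hypothetical and not part of org-db. The function only builds the emacsql query vector, reusing the table and column names from the queries above, so you can inspect it before running it.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;;; Hypothetical helper (not part of org-db): build the query for files citing KEY.
(defun my-org-db-cite-query (key)
  "Return an emacsql query vector listing files that cite KEY."
  (vector :select :distinct [files:filename]
          :from 'links
          :inner :join 'files :on '(= links:filename-id files:rowid)
          ;; Wrap KEY in % wildcards so it matches anywhere in the link path.
          :where `(and (= type "cite") (like path ,(concat "%" key "%")))))

;; Assumes org-db is the open connection used throughout this post.
;; (emacsql org-db (my-org-db-cite-query "kitchin-2015-examp"))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Because the query is plain data, filters for several keys could be combined by building a larger :where form in the same way.
&lt;/p&gt;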
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org8284133" class="outline-2"&gt;
&lt;h2 id="org8284133"&gt;&lt;span class="section-number-2"&gt;3&lt;/span&gt; Headline queries&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
Every headline is stored, along with its location, tags and properties. We can use the database to find headlines with certain tags or properties. You can see here I have 293 tags in the database.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; (funcall count) &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; tags])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;293&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
Here we find headlines tagged with electrolyte. I tagged some papers I read with this at some point.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:distinct&lt;/span&gt; [files:filename headlines:title]
                 &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt; headlines &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; headline-tags &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (=  headlines:rowid headline-tags:headline-id)
                 &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; tags &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= tags:rowid headline-tags:tag-id)
                 &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; files &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= headlines:filename-id files:rowid)
                 &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (= tags:tag &lt;span style="color: #008000;"&gt;"electrolyte"&lt;/span&gt;) &lt;span style="color: #006FE0;"&gt;:limit&lt;/span&gt; 5])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/org-mode/prj-doe-early-career.org&lt;/td&gt;
&lt;td class="org-left"&gt;2010 - Nickel-borate oxygen-evolving catalyst that functions under benign conditions&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/bibliography/notes.org&lt;/td&gt;
&lt;td class="org-left"&gt;1971 - A Correlation of the Solution Properties and the  Electrochemical Behavior of the Nickel Hydroxide  Electrode in Binary Aqueous Alkali Hydroxides&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/bibliography/notes.org&lt;/td&gt;
&lt;td class="org-left"&gt;1981 - Studies concerning charged nickel hydroxide electrodes IV. Reversible potentials in LiOH, NaOH, RbOH and CsOH&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/bibliography/notes.org&lt;/td&gt;
&lt;td class="org-left"&gt;1986 - The effect of lithium in preventing iron poisoning in the nickel hydroxide electrode&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;/Users/jkitchin/Dropbox/bibliography/notes.org&lt;/td&gt;
&lt;td class="org-left"&gt;1996 - The role of lithium in preventing the detrimental effect of iron on alkaline battery nickel hydroxide electrode: A mechanistic aspect&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
Here we see how many entries have an EMAIL property. These could serve as contacts to send email to.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; [(funcall count)] &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt;
                 headlines &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; headline-properties &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (=  headlines:rowid headline-properties:headline-id)
                 &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; properties &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= properties:rowid headline-properties:property-id)
                 &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; (= properties:property &lt;span style="color: #008000;"&gt;"EMAIL"&lt;/span&gt;) (not (null headline-properties:value)))])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;7452&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
If you want to see the ones that match "jkitchin", here they are. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:distinct&lt;/span&gt; [headlines:title headline-properties:value] &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt;
             headlines &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; headline-properties &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (=  headlines:rowid headline-properties:headline-id)
             &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; properties &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= properties:rowid headline-properties:property-id)
             &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; (= properties:property &lt;span style="color: #008000;"&gt;"EMAIL"&lt;/span&gt;) (like headline-properties:value &lt;span style="color: #008000;"&gt;"%jkitchin%"&lt;/span&gt;))])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;John Kitchin&lt;/td&gt;
&lt;td class="org-left"&gt;jkitchin@andrew.cmu.edu&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;John Kitchin&lt;/td&gt;
&lt;td class="org-left"&gt;jkitchin@cmu.edu&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Kitchin, John&lt;/td&gt;
&lt;td class="org-left"&gt;jkitchin@andrew.cmu.edu&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;


&lt;p&gt;
Here is a query to find the number of headlines where the deadline matches 2017. Looks like I am already busy!
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; (funcall count) &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt;
             headlines &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; headline-properties &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (=  headlines:rowid headline-properties:headline-id)
             &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; properties &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= properties:rowid headline-properties:property-id)
             &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; (= properties:property &lt;span style="color: #008000;"&gt;"DEADLINE"&lt;/span&gt;) (glob headline-properties:value &lt;span style="color: #008000;"&gt;"*2017*"&lt;/span&gt;))])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
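
&lt;p&gt;
The property queries above share the same two joins, and only the property name and the condition on its value change. Here is a rough sketch of how that repetition could be factored out. The function my-org-db-property-query is a hypothetical name, not something org-db provides, and it just returns the query vector for emacsql.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;;; Hypothetical helper (not part of org-db).
(defun my-org-db-property-query (property condition)
  "Return a query vector for headlines whose PROPERTY satisfies CONDITION.
CONDITION is an emacsql expression on headline-properties:value."
  (vector :select :distinct [headlines:title headline-properties:value]
          :from 'headlines
          :inner :join 'headline-properties
          :on '(= headlines:rowid headline-properties:headline-id)
          :inner :join 'properties
          :on '(= properties:rowid headline-properties:property-id)
          :where `(and (= properties:property ,property) ,condition)))

;; Assumes org-db is the open connection used throughout this post.
;; (emacsql org-db (my-org-db-property-query
;;                  "DEADLINE" '(glob headline-properties:value "*2017*")))
&lt;/pre&gt;
&lt;/div&gt;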
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org7d75505" class="outline-2"&gt;
&lt;h2 id="org7d75505"&gt;&lt;span class="section-number-2"&gt;4&lt;/span&gt; Keyword queries&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
We also store file keywords, so we can search on document titles, authors, etc. Here are five documents with titles longer than 35 characters sorted in descending order. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:distinct&lt;/span&gt; [value] &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt;
                 file-keywords &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; keywords &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= file-keywords:keyword-id keywords:rowid)
                 &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; (&amp;gt; (funcall length value) 35) (= keywords:keyword &lt;span style="color: #008000;"&gt;"TITLE"&lt;/span&gt;))
                 &lt;span style="color: #006FE0;"&gt;:order&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:by&lt;/span&gt; value &lt;span style="color: #006FE0;"&gt;:desc&lt;/span&gt;
                 &lt;span style="color: #006FE0;"&gt;:limit&lt;/span&gt; 5])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;pycse - Python3 Computations in Science and Engineering&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;org-show - simple presentations in org-mode&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;org-mode - A Human Readable, Machine Addressable Approach to Data Archiving and Sharing in Science and Engineering&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;modifying emacs to make typing easier.&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;jmax - John's customizations to maximize Emacs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
It is possible to search on AUTHOR and other keywords as well. My memos have a #+SUBJECT keyword, so I can find memos on a subject. They also use the LATEX_CLASS of cmu-memo, so I can find all of them easily too:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(emacsql org-db [&lt;span style="color: #006FE0;"&gt;:select&lt;/span&gt; [(funcall count)] &lt;span style="color: #006FE0;"&gt;:from&lt;/span&gt;
                 file-keywords &lt;span style="color: #006FE0;"&gt;:inner&lt;/span&gt; &lt;span style="color: #006FE0;"&gt;:join&lt;/span&gt; keywords &lt;span style="color: #006FE0;"&gt;:on&lt;/span&gt; (= file-keywords:keyword-id keywords:rowid)
                 &lt;span style="color: #006FE0;"&gt;:where&lt;/span&gt; (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; (= value &lt;span style="color: #008000;"&gt;"cmu-memo"&lt;/span&gt;) (= keywords:keyword &lt;span style="color: #008000;"&gt;"LATEX_CLASS"&lt;/span&gt;))
                 &lt;span style="color: #006FE0;"&gt;:limit&lt;/span&gt; 5])
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-right" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-right"&gt;119&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
How about that, 119 memos&amp;#x2026; Still it sure is nice to be able to find them.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org08feb51" class="outline-2"&gt;
&lt;h2 id="org08feb51"&gt;&lt;span class="section-number-2"&gt;5&lt;/span&gt; Full text search&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
In theory, the database has a table for the headline content, and it should be fully searchable. In practice, the database got a little sluggish and grew to nearly half a GB with the content included, so I am leaving it out for now.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgbb3d71f" class="outline-2"&gt;
&lt;h2 id="orgbb3d71f"&gt;&lt;span class="section-number-2"&gt;6&lt;/span&gt; Summary&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;
&lt;p&gt;
The foundation for something really good is here. It is still a little tedious to write the queries with all the table joins, but some of that could be wrapped into a function for a query. I like the lispy style of the queries, although it can be tricky to map all the concepts onto SQL. A call to such a wrapper function might look like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(org-db-query (&lt;span style="color: #0000FF;"&gt;and&lt;/span&gt; (= properties:property &lt;span style="color: #008000;"&gt;"DEADLINE"&lt;/span&gt;) (glob headline-properties:value &lt;span style="color: #008000;"&gt;"*2017*"&lt;/span&gt;)))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
This is what it would ideally look like using the org tag/property match syntax. Somehow that string would have to get expanded to generate the code above. I do not have a sense for how difficult that would be. It might not be hard with &lt;a href="https://github.com/skeeto/rdp"&gt;a recursive descent parser&lt;/a&gt;, written by the same author as emacsql.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;(org-db-query &lt;span style="color: #008000;"&gt;"DEADLINE={2017}"&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;
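
&lt;p&gt;
For the simplest case, a full parser may not be needed. The sketch below handles only a single term of the form PROPERTY={text} with one regular expression and expands it to the where clause used earlier. The function name is hypothetical, and treating the braces as a substring glob is a simplification: in the org match syntax the text in braces is a regular expression.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;;; Hypothetical sketch: supports only the single form "PROPERTY={text}".
(defun my-org-db-match-to-where (match)
  "Convert MATCH such as \"DEADLINE={2017}\" to an emacsql where form."
  (if (string-match "\\`\\([^=]+\\)={\\(.*\\)}\\'" match)
      (let ((property (match-string 1 match))
            (text (match-string 2 match)))
        ;; Simplification: the text is matched as a substring with glob.
        `(and (= properties:property ,property)
              (glob headline-properties:value ,(concat "*" text "*"))))
    (error "Unsupported match string: %s" match)))

;; (my-org-db-match-to-where "DEADLINE={2017}") evaluates to
;; (and (= properties:property "DEADLINE")
;;      (glob headline-properties:value "*2017*"))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Boolean combinations, tags and comparison operators would still need a real parser.
&lt;/p&gt;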

&lt;p&gt;
The performance is only ok. For large org files there is a noticeable lag in updating the database, which matters because Emacs is blocked while it updates. I could try using an idle timer for updates with a queue, or get more clever about when to update. It is not essential that the updates be real-time, only that they are reasonably accurate or done by the time I next search. For now, it is not too annoying though. As a better database, I have had my eye on &lt;a href="https://xapian.org"&gt;xapian&lt;/a&gt; since that is what mu4e (and notmuch) uses. It might be good to have an external library for parsing org-files, i.e. not through emacs, for this. It would certainly be faster. It seems like a big project though, maybe next summer ;)
&lt;/p&gt;
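
&lt;p&gt;
The idle timer with a queue could be sketched roughly as follows. All of the names here are hypothetical, and the function that actually indexes a file is left as a variable to be set, since this is only an outline of the idea and not how org-db currently works.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;;; Hypothetical names throughout; this is not how org-db currently updates.
(defvar my-org-db-queue nil
  "Files waiting to be indexed, oldest first.")

(defvar my-org-db-update-function #'ignore
  "Function called with one filename to index it.
Set this to whatever function actually updates the database.")

(defun my-org-db-enqueue (filename)
  "Add FILENAME to the end of the queue unless it is already queued."
  (unless (member filename my-org-db-queue)
    (setq my-org-db-queue (append my-org-db-queue (list filename)))))

(defun my-org-db-process-one ()
  "Index the oldest queued file, if any, and return its name."
  (when my-org-db-queue
    (let ((filename (pop my-org-db-queue)))
      (funcall my-org-db-update-function filename)
      filename)))

;; Index one queued file each time Emacs becomes idle for 5 seconds.
;; (run-with-idle-timer 5 t #'my-org-db-process-one)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Processing one file per idle period keeps each blocking update short, at the cost of the database lagging behind when many files change at once.
&lt;/p&gt;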

&lt;p&gt;
Another feature this might benefit from is ignore patterns, or some file feature that prevents it from being indexed. For example, I keep an encrypted password file in org-mode, but as soon as I opened it, it got indexed right into the database, in plain text. If you walk your file system, it might make sense to avoid some directories, like .dropbox.cache. Otherwise, this still looks like a promising approach. 
&lt;/p&gt;
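
&lt;p&gt;
A minimal sketch of such ignore patterns is a list of regular expressions that is checked before a file is indexed. The names and the two example patterns are assumptions for illustration. In particular, the .gpg pattern assumes the encrypted file uses that extension.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-emacs-lisp"&gt;;; Hypothetical names and example patterns; adjust them to your own files.
(defvar my-org-db-ignore-patterns
  '("\\.gpg\\'" "/\\.dropbox\\.cache/")
  "Regular expressions matching filenames that should not be indexed.")

(defun my-org-db-ignore-p (filename)
  "Return t if FILENAME matches any pattern in the ignore list."
  (let ((ignored nil))
    (dolist (pattern my-org-db-ignore-patterns ignored)
      (when (string-match-p pattern filename)
        (setq ignored t)))))

;; The indexing code would then skip any file for which
;; (my-org-db-ignore-p filename) is non-nil.
&lt;/pre&gt;
&lt;/div&gt;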
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2017 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;
&lt;p&gt;&lt;a href="/org/2017/01/03/Find-stuff-in-org-mode-anywhere.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Org-mode version = 9.0.3&lt;/p&gt;]]></content:encoded>
    </item>
  </channel>
</rss>
