Querying a MongoDB bibtex database with Python and emacs-lisp

| categories: emacs, python, mongodb, database |

I have been exploring using databases to help with searching my data. In this post we explore using MongoDB for bibtex entries. I am choosing bibtex entries because it is easy to parse bibtex files, I already have a lot of them, and I have several kinds of queries I regularly use. So, they are a good candidate to test out a new database on!

MongoDB is a noSQL database that is pretty easy to use. I installed it from homebrew, and then followed the directions to run the server.

With pymongo you can make a database as easy as this:

import bibtexparser

# Read the bibtex file to get entries
with open('../../../Dropbox/bibliography/references.bib', 'r') as bibfile:
    bp = bibtexparser.load(bibfile)
    entries = bp.entries

print("N = ", len(entries))

print(entries[0])

import pymongo
from pymongo import MongoClient
client = MongoClient('localhost', 27017)

# This creates the "entries" collection
db = client['bibtex'].entries

# add each entry
for entry in entries:
    db.insert_one(entry)

N =  1671
{'keyword': 'test, word', 'year': '2006', 'publisher': 'American Chemical Society (ACS)', 'title': 'The ACS Style Guide', 'ENTRYTYPE': 'book', 'editor': 'Janet S. Dodd', 'address': 'Washington, D.C.', 'ID': '2006-acs-style-guide', 'doi': '10.1021/bk-2006-styg', 'link': 'http://dx.doi.org/10.1021/bk-2006-STYG', 'date_added': 'Wed Apr 1 10:17:54 2015', 'pages': 'nil'}

That was easy. We have a database with 1671 documents in it, and each document is essentially a dictionary of key-value pairs. You might even argue it was too easy. I didn't specify any structure to the entries at all. No required fields, no validation that the keys are spelled correctly, no validation on the values, e.g. you can see the year looks like a string. The benefit of that is that every entry went in, with no issues. On the other hand, the authors went in as a single string, as did the keywords, which affects our ability to search a little bit later. Note if you run that twice, it will add each entry again, since we do not check if the entry already exists.
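Here is a minimal sketch of how the two issues above (single-string authors and keywords, and duplicates on a second run) could be handled. The function names are made up for illustration, and it assumes the bibtex ID is a good unique key. `replace_one` with `upsert=True` is a regular pymongo collection method; the collection is passed in so nothing here needs a running server until you call `upsert_entries`.

```python
import re


def normalize_entry(entry):
    """Return a copy of a parsed bibtex entry with list-valued authors and keywords."""
    doc = dict(entry)
    if "author" in doc:
        # bibtex separates authors with the word "and"; also collapse wrapped lines
        names = re.split(r"\s+and\s+", doc["author"].strip())
        doc["author"] = [" ".join(name.split()) for name in names]
    if "keyword" in doc:
        doc["keyword"] = [k.strip() for k in doc["keyword"].split(",") if k.strip()]
    # only convert clean numeric years; leave values like "Accepted 12/2016" alone
    if str(doc.get("year", "")).isdigit():
        doc["year"] = int(doc["year"])
    return doc


def upsert_entries(collection, entries):
    """Insert or replace each entry, keyed on its bibtex ID, so reruns do not duplicate."""
    for entry in entries:
        doc = normalize_entry(entry)
        collection.replace_one({"ID": doc["ID"]}, doc, upsert=True)
```

With this, `upsert_entries(client['bibtex'].entries, entries)` could stand in for the insert loop, at the cost of one round trip per entry.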

A database is only useful though if it is easy to get stuff out of it. So, let's consider some test queries. First we find entries that have years less than 1950. The query is basically a little json bundle that describes a field and condition that we want to match. Here we use a less than operator, "$lt". The results come back as a cursor that we can iterate over to get dictionaries. This is in stark contrast to a SQL query which is an expression in its own declarative language. A query here is a chunk of data that must get converted to code by the server. Since the years are stored as strings and we compare against the string "1950", the less than here is in the string sense rather than the numeric sense, but for four-digit years the two orderings agree, so it probably does not matter for a long time.

import pymongo
from pymongo import MongoClient
client = MongoClient('localhost', 27017)

db = client['bibtex'].entries

for i, result in enumerate(db.find({"year" : {"$lt": "1950"}})):
    print('{i: 2d}. {author}, {title}, {journal}, {year}.'.format(i=i+1, **result))
  1. Birch, Francis, Finite Elastic Strain of Cubic Crystals, Phys. Rev., 1947.
  2. Ditchburn, R. W. and Gilmour, J. C., The Vapor Pressures of Monatomic Vapors, Rev. Mod. Phys., 1941.
  3. J. Korringa, On the Calculation of the Energy of a Bloch Wave in a Metal, Physica, 1947.
  4. Nix, F. C. and MacNair, D., The Thermal Expansion of Pure Metals. {II}: Molybdenum, Palladium, Silver, Tantalum, Tungsten, Platinum, and Lead, Phys. Rev., 1942.

That seems easy enough, and those strings could easily be used as candidates for a selection tool like helm.
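On the string versus numeric question for the year comparison, plain Python strings make a reasonable stand-in for a simple character-by-character string comparison. This is only an analogy run locally, not a MongoDB call, and the sample values are made up to show where the two senses disagree.

```python
years = ["1947", "1941", "2006", "950", "Accepted 12/2016"]

# String sense: compare character by character against "1950"
as_strings = [y for y in years if y < "1950"]

# Numeric sense: only values that are clean integers can take part
as_numbers = [y for y in years if y.isdigit() and int(y) < 1950]

print(as_strings)  # ['1947', '1941']
print(as_numbers)  # ['1947', '1941', '950']
```

Four-digit years behave the same either way; the two senses only part ways for values like "950", and non-numeric years such as "Accepted 12/2016" sort after the digits.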

How about articles published by myself and my student Jacob Boes? This requires "and" logic. That is the default for the fields in a query, but a Python dictionary cannot hold the same key twice (the last "author" value would silently replace the first), so we put the two author conditions in an explicit "$and" list. One condition is an exact match on articles, and the other two are case-insensitive regular expression matches. I guess this has to be done on every document, since there probably is no way to index a regex match! This search was very fast, but it is not clear how fast it would be for a million entries. This matching is necessary because we stored all authors in a single field rather than splitting them into an array. We might still have to match strings for this even in an array since an author might then be "John R. Kitchin", rather than further decomposed into first and last names.

import pymongo
from pymongo import MongoClient
client = MongoClient('localhost', 27017)

db = client['bibtex']
entries = db['entries']

# Both author patterns must match, so they go in a "$and" list rather than
# under a repeated "author" key.
query = {"ENTRYTYPE": "article",
         "$and": [{"author": {"$regex": "kitchin", "$options": "i"}},
                  {"author": {"$regex": "boes", "$options": "i"}}]}

for i, result in enumerate(entries.find(query)):
    if result.get('doi', None):
        result['doi'] = 'http://dx.doi.org/{doi}'.format(doi=result['doi'])
    else:
        result['doi'] = ''
    print('{i: 2d}. {author}, {title}, {journal}, {year}. {doi}'.format(i=i+1, **result).replace("\n", ""))
  1. Jacob R. Boes and Peter Kondratyuk and Chunrong Yin and JamesB. Miller and Andrew J. Gellman and John R. Kitchin, Core Level Shifts in {Cu-Pd} Alloys As a Function of BulkComposition and Structure, Surface Science, 2015. http://dx.doi.org/10.1016/j.susc.2015.02.011
  2. Jacob R. Boes and Gamze Gumuslu and James B. Miller and AndrewJ. Gellman and John R. Kitchin, Estimating Bulk-Composition-Dependent \ce{H2} AdsorptionEnergies on \ce{CuxPd1-x} Alloy (111) Surfaces, ACS Catalysis, 2015. http://dx.doi.org/10.1021/cs501585k
  3. Jacob R. Boes and Gamze Gumuslu and James B. Miller and AndrewJ. Gellman and John R. Kitchin, Supporting Information: Estimating Bulk-Composition-Dependent\ce{H2} Adsorption Energies on \ce{CuxPd1-x} Alloy (111)Surfaces, ACS Catalysis, 2015. http://dx.doi.org/10.1021/cs501585k
  4. G. Gumuslu and P. Kondratyuk and J. R. Boes and B. Morrealeand J. B. Miller and J. R. Kitchin and A. J. Gellman, Correlation of Electronic Structure With Catalytic Activity:\ce{H2}-\ce{D2} Exchange Across \ce{CuxPd1-x}Composition Space, ACS Catalysis, 2015. http://dx.doi.org/10.1021/cs501586t
  5. John D. Michael and Ethan L. Demeter and Steven M. Illes andQingqi Fan and Jacob R. Boes and John R. Kitchin, Alkaline Electrolyte and {Fe} Impurity Effects on thePerformance and Active-Phase Structure of {NiOOH} Thin Filmsfor {OER} Catalysis Applications, J. Phys. Chem. C, 2015. http://dx.doi.org/10.1021/acs.jpcc.5b02458
  6. Jacob R. Boes and Mitchell C. Groenenboom and John A. Keithand John R. Kitchin, Neural Network and {Reaxff} Comparison for {Au} Properties, Int. J. Quantum Chem., 2016. http://dx.doi.org/10.1002/qua.25115
  7. Jacob R. Boes and John R. Kitchin, Neural Network Predictions of Oxygen Interactions on a Dynamic Pd Surface, Molecular Simulation, Accepted 12/2016. http://dx.doi.org/10.1080/08927022.2016.1274984
  8. Jacob R. Boes and John R. Kitchin, Modeling Segregation on {AuPd}(111) Surfaces With DensityFunctional Theory and Monte Carlo Simulations, Submitted to J. Phys. Chem. C, 2016.
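A Python dictionary literal keeps only the last value for a repeated key, so a small helper that builds an explicit "$and" list is one way to combine any number of author patterns. This is a sketch; `author_query` is a hypothetical name, and `re.escape` is used on the assumption that the names should be matched literally.

```python
import re


def author_query(names, entrytype="article"):
    """Build a MongoDB filter requiring every name to appear in the author field."""
    clauses = [{"author": {"$regex": re.escape(name), "$options": "i"}}
               for name in names]
    return {"ENTRYTYPE": entrytype, "$and": clauses}


# A dict literal silently keeps only the last value for a repeated key.
naive = {"author": {"$regex": "kitchin"}, "author": {"$regex": "boes"}}
print(naive)  # {'author': {'$regex': 'boes'}}

print(author_query(["kitchin", "boes"]))
```

The result can be passed directly to `entries.find(...)`.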

We can find out how many different entry types we have, as well as how many distinct keyword entries there are. The documents do not separate the keywords though, so this is just the unique strings of comma-separated keyword values. We would have had to split those in advance to have a list of keywords to search for a specific one beyond string matching. Curiously, in my bibtex entries, these are in a field called "keywords". It appears the bibtex parser may have changed the name to "keyword".

import pymongo
from pymongo import MongoClient
client = MongoClient('localhost', 27017)

db = client['bibtex']
entries = db['entries']

print(entries.distinct("ENTRYTYPE"))
print(len(entries.distinct("keyword")))
print(entries.find({"keyword": {"$exists": True}})[22]['keyword'])

['book', 'article', 'techreport', 'phdthesis', 'inproceedings', 'inbook', 'mastersthesis', 'misc', 'incollection']
176
Bildungsw{\"a}rmen, Dichtefunktionalrechnungen, Perowskite, Thermochemie
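To count individual keywords rather than distinct comma-separated strings, the splitting can be done on the Python side. Here is a small sketch with made-up sample entries; in practice the input could be the documents returned by `entries.find()`.

```python
from collections import Counter


def keyword_counts(entries):
    """Count individual keywords across entries that store them comma-separated."""
    counts = Counter()
    for entry in entries:
        for keyword in entry.get("keyword", "").split(","):
            keyword = keyword.strip()
            if keyword:
                counts[keyword] += 1
    return counts


sample = [{"keyword": "test, word"},
          {"keyword": "DESC0004031, early-career"},
          {"keyword": "early-career, test"},
          {"title": "no keywords here"}]

counts = keyword_counts(sample)
print(len(counts), counts["test"], counts["early-career"])  # 4 2 2
```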

1 Text searching

You can do text search as well. You first have to create an index on one or more fields, and then use the $text and $search operators. Here I made an index on a few fields, and then searched on it. Note that a collection can only have one text index, so think about it in advance! This simplifies the query a bit, since we do not have to use the regex syntax for matching on a field.

import pymongo
from pymongo import MongoClient
client = MongoClient('localhost', 27017)

db = client['bibtex']
entries = db['entries']

entries.create_index([('author', pymongo.TEXT),
                      ('title', pymongo.TEXT),
                      ('keyword', pymongo.TEXT)], sparse=True)

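# A dictionary cannot repeat the "$search" key (the last one wins), so both
# terms go in one search string. Unquoted terms are OR-ed; quoting each term
# is intended to require both, which is worth checking on your MongoDB version.
query = {"$text": {"$search": '"kitchin" "boes"'}}
for i, result in enumerate(entries.find(query)):
    print('{i: 2d}. {author}, {title}, {journal}, {year}.'.format(i=i+1, **result).replace("\n", ""))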
  1. G. Gumuslu and P. Kondratyuk and J. R. Boes and B. Morrealeand J. B. Miller and J. R. Kitchin and A. J. Gellman, Correlation of Electronic Structure With Catalytic Activity:\ce{H2}-\ce{D2} Exchange Across \ce{CuxPd1-x}Composition Space, ACS Catalysis, 2015.
  2. Jacob R. Boes and Peter Kondratyuk and Chunrong Yin and JamesB. Miller and Andrew J. Gellman and John R. Kitchin, Core Level Shifts in {Cu-Pd} Alloys As a Function of BulkComposition and Structure, Surface Science, 2015.
  3. Jacob R. Boes and Gamze Gumuslu and James B. Miller and AndrewJ. Gellman and John R. Kitchin, Estimating Bulk-Composition-Dependent \ce{H2} AdsorptionEnergies on \ce{CuxPd1-x} Alloy (111) Surfaces, ACS Catalysis, 2015.
  4. Jacob R. Boes and John R. Kitchin, Neural Network Predictions of Oxygen Interactions on a Dynamic Pd Surface, Molecular Simulation, Accepted 12/2016.
  5. Jacob R. Boes and John R. Kitchin, Modeling Segregation on {AuPd}(111) Surfaces With DensityFunctional Theory and Monte Carlo Simulations, Submitted to J. Phys. Chem. C, 2016.
  6. Jacob R. Boes and Gamze Gumuslu and James B. Miller and AndrewJ. Gellman and John R. Kitchin, Supporting Information: Estimating Bulk-Composition-Dependent\ce{H2} Adsorption Energies on \ce{CuxPd1-x} Alloy (111)Surfaces, ACS Catalysis, 2015.
  7. John D. Michael and Ethan L. Demeter and Steven M. Illes andQingqi Fan and Jacob R. Boes and John R. Kitchin, Alkaline Electrolyte and {Fe} Impurity Effects on thePerformance and Active-Phase Structure of {NiOOH} Thin Filmsfor {OER} Catalysis Applications, J. Phys. Chem. C, 2015.
  8. Jacob R. Boes and Mitchell C. Groenenboom and John A. Keithand John R. Kitchin, Neural Network and {Reaxff} Comparison for {Au} Properties, Int. J. Quantum Chem., 2016.

We can use this to search for documents with orgmode in a keyword or title too.

import pymongo
from pymongo import MongoClient
client = MongoClient('localhost', 27017)

db = client['bibtex']
entries = db['entries']

entries.create_index([('author', pymongo.TEXT),
                      ('title', pymongo.TEXT),
                      ('keyword', pymongo.TEXT)], sparse=True)

for i, result in enumerate(entries.find({"$text" : {"$search": "orgmode"}})):
    print('{i: 2d}. {author}, {title}, {journal}, {year}.'.format(i=i+1, **result).replace("\n", ""))
  1. John R. Kitchin, Data Sharing in Surface Science, Surface Science, 2016.
  2. Zhongnan Xu and John R. Kitchin, Probing the Coverage Dependence of Site and AdsorbateConfigurational Correlations on (111) Surfaces of LateTransition Metals, J. Phys. Chem. C, 2014.
  3. Xu, Zhongnan and Rossmeisl, Jan and Kitchin, John R., A Linear Response {DFT}+{U} Study of Trends in the OxygenEvolution Activity of Transition Metal Rutile Dioxides, The Journal of Physical Chemistry C, 2015.
  4. Prateek Mehta and Paul A. Salvador and John R. Kitchin, Identifying Potential \ce{BO2} Oxide Polymorphs for EpitaxialGrowth Candidates, ACS Appl. Mater. Interfaces, 2015.
  5. Xu, Zhongnan and Joshi, Yogesh V. and Raman, Sumathy andKitchin, John R., Accurate Electronic and Chemical Properties of 3d TransitionMetal Oxides Using a Calculated Linear Response {U} and a {DFT+ U(V)} Method, The Journal of Chemical Physics, 2015.
  6. Zhongnan Xu and John R. Kitchin, Relationships Between the Surface Electronic and ChemicalProperties of Doped 4d and 5d Late Transition Metal Dioxides, The Journal of Chemical Physics, 2015.
  7. Zhongnan Xu and John R Kitchin, Tuning Oxide Activity Through Modification of the Crystal andElectronic Structure: From Strain To Potential Polymorphs, Phys. Chem. Chem. Phys., 2015.
  8. Jacob R. Boes and Gamze Gumuslu and James B. Miller and AndrewJ. Gellman and John R. Kitchin, Supporting Information: Estimating Bulk-Composition-Dependent\ce{H2} Adsorption Energies on \ce{CuxPd1-x} Alloy (111)Surfaces, ACS Catalysis, 2015.
  9. Kitchin, John R., Examples of Effective Data Sharing in Scientific Publishing, ACS Catalysis, 2015.
  10. Curnan, Matthew T. and Kitchin, John R., Effects of Concentration, Crystal Structure, Magnetism, andElectronic Structure Method on First-Principles Oxygen VacancyFormation Energy Trends in Perovskites, The Journal of Physical Chemistry C, 2014.
  11. Kitchin, John R. and Van Gulick, Ana E. and Zilinski, Lisa D., Automating Data Sharing Through Authoring Tools, International Journal on Digital Libraries, 2016.
  12. Jacob R. Boes and Gamze Gumuslu and James B. Miller and AndrewJ. Gellman and John R. Kitchin, Estimating Bulk-Composition-Dependent \ce{H2} AdsorptionEnergies on \ce{CuxPd1-x} Alloy (111) Surfaces, ACS Catalysis, 2015.
  13. Zhongnan Xu and John R. Kitchin, Relating the Electronic Structure and Reactivity of the 3dTransition Metal Monoxide Surfaces, Catalysis Communications, 2014.
  14. Spencer D. Miller and Vladimir V. Pushkarev and AndrewJ. Gellman and John R. Kitchin, Simulating Temperature Programmed Desorption of Oxygen on{P}t(111) Using {DFT} Derived Coverage Dependent DesorptionBarriers, Topics in Catalysis, 2014.
  15. Hallenbeck, Alexander P. and Kitchin, John R., Effects of \ce{O_2} and \ce{SO_2} on the Capture Capacity of aPrimary-Amine Based Polymeric \ce{CO_2} Sorbent, Industrial \& Engineering Chemistry Research, 2013.

2 Querying from emacs-lisp

It is hard to get too excited about this if it is not easy to query from emacs and get data in a form we can use in emacs ;) The json library allows us to convert lisp data structures to json pretty easily. For example:

(require 'json)

(json-encode '((ENTRYTYPE . article)
               (author . (($regex . kitchin)
                          ($options . i)))
               (author . (($regex . boes)
                          ($options . i)))))
{"ENTRYTYPE":"article","author":{"$regex":"kitchin","$options":"i"},"author":{"$regex":"boes","$options":"i"}}

So, we can use an a-list syntax to build up the query. Then we can send it to mongo using mongoexport, which will return a json string that we can read back into emacs to get lisp data. Here is an example that runs a query. We print the first element here.

(pp
 (aref (json-read-from-string
        (shell-command-to-string
         (format "mongoexport --quiet --jsonArray -d bibtex -c entries -q '%s'"
                 (json-encode '((ENTRYTYPE . article)
                                (author . (($regex . kitchin)
                                           ($options . i)))
                                (author . (($regex . boes)
                                           ($options . i))))))))
       0))
((_id
  ($oid . "5878d9644c114f59fe86cb36"))
 (author . "Jacob R. Boes and Peter Kondratyuk and Chunrong Yin and James\nB. Miller and Andrew J. Gellman and John R. Kitchin")
 (year . "2015")
 (title . "Core Level Shifts in {Cu-Pd} Alloys As a Function of Bulk\nComposition and Structure")
 (ENTRYTYPE . "article")
 (ID . "boes-2015-core-cu")
 (keyword . "DESC0004031, early-career")
 (volume . "640")
 (doi . "10.1016/j.susc.2015.02.011")
 (link . "http://dx.doi.org/10.1016/j.susc.2015.02.011")
 (issn . "0039-6028")
 (journal . "Surface Science")
 (pages . "127-132"))

That is pretty sweet, we get a lisp data structure we can use. We can wrap that into a reasonable looking function here:

(defun mongo-find (db collection query)
  (json-read-from-string
   (shell-command-to-string
    (format "mongoexport --quiet --jsonArray -d %s -c %s -q '%s'"
            db collection (json-encode query)))))
mongo-find
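For comparison, here is a sketch of the same mongoexport round trip from Python. It uses only the flags that appear in the command above. The function names are made up, and passing the arguments as a list is a design choice that avoids quoting the JSON query for the shell.

```python
import json
import subprocess


def mongoexport_command(db, collection, query):
    """Build the mongoexport argument list for a query given as a Python dict."""
    return ["mongoexport", "--quiet", "--jsonArray",
            "-d", db, "-c", collection, "-q", json.dumps(query)]


def mongo_find(db, collection, query):
    """Run mongoexport and parse its JSON array output (needs mongoexport on PATH)."""
    completed = subprocess.run(mongoexport_command(db, collection, query),
                               check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)


print(mongoexport_command("bibtex", "entries", {"ENTRYTYPE": "article"}))
```

Calling `mongo_find("bibtex", "entries", {...})` would then return a list of dictionaries, assuming the server is running and mongoexport is installed.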

Now we can use the function to query the database, and then format the results. Here we look at the example of articles with authors that match "kitchin" and "boes".

(loop for counter from 1 for entry across
      (mongo-find "bibtex" "entries" '((ENTRYTYPE . article)
                                       (author . (($regex . kitchin)
                                                  ($options . i)))
                                       (author . (($regex . boes)
                                                  ($options . i)))))
      do
      (setq entry (append `(,(cons "counter" counter)) entry))
      ;; make sure we have a doi field.
      (if (assoc 'doi entry)
          (push (cons "doi" (format "http://dx.doi.org/%s" (cdr (assoc 'doi entry)))) entry)
        (push (cons "doi" "") entry))
      concat
      (concat (replace-regexp-in-string
               "\n" " "
               (s-format "${counter}. ${author}, ${title} (${year}). ${doi}"
                         'aget entry)) "\n"))
1. Jacob R. Boes and Peter Kondratyuk and Chunrong Yin and James B. Miller and Andrew J. Gellman and John R. Kitchin, Core Level Shifts in {Cu-Pd} Alloys As a Function of Bulk Composition and Structure (2015). http://dx.doi.org/10.1016/j.susc.2015.02.011
2. Jacob R. Boes and Gamze Gumuslu and James B. Miller and Andrew J. Gellman and John R. Kitchin, Estimating Bulk-Composition-Dependent \ce{H2} Adsorption Energies on \ce{Cu_{x}Pd_{1-x}} Alloy (111) Surfaces (2015). http://dx.doi.org/10.1021/cs501585k
3. Jacob R. Boes and Gamze Gumuslu and James B. Miller and Andrew J. Gellman and John R. Kitchin, Supporting Information: Estimating Bulk-Composition-Dependent \ce{H2} Adsorption Energies on \ce{Cu_{x}Pd_{1-x}} Alloy (111) Surfaces (2015). http://dx.doi.org/10.1021/cs501585k
4. G. Gumuslu and P. Kondratyuk and J. R. Boes and B. Morreale and J. B. Miller and J. R. Kitchin and A. J. Gellman, Correlation of Electronic Structure With Catalytic Activity: \ce{H2}-\ce{D2} Exchange Across \ce{Cu_{x}Pd_{1-x}} Composition Space (2015). http://dx.doi.org/10.1021/cs501586t
5. John D. Michael and Ethan L. Demeter and Steven M. Illes and Qingqi Fan and Jacob R. Boes and John R. Kitchin, Alkaline Electrolyte and {Fe} Impurity Effects on the Performance and Active-Phase Structure of {NiOOH} Thin Films for {OER} Catalysis Applications (2015). http://dx.doi.org/10.1021/acs.jpcc.5b02458
6. Jacob R. Boes and Mitchell C. Groenenboom and John A. Keith and John R. Kitchin, Neural Network and {Reaxff} Comparison for {Au} Properties (2016). http://dx.doi.org/10.1002/qua.25115
7. Jacob R. Boes and John R. Kitchin, Neural Network Predictions of Oxygen Interactions on a Dynamic Pd Surface (Accepted 12/2016). http://dx.doi.org/10.1080/08927022.2016.1274984
8. Jacob R. Boes and John R. Kitchin, Modeling Segregation on {AuPd}(111) Surfaces With Density Functional Theory and Monte Carlo Simulations (2016). 

Wow, that looks like a pretty lispy way to query the database and use the results. It is probably pretty easy to do similar things for inserting and updating documents. I will save that for another day.

3 Summary thoughts

This is not an exhaustive study of Mongo for a bibtex database. It does illustrate that it is potentially useful. Imagine a group of users can enter bibtex entries, and then share them through a central server. Or you query the server for entries and then select them using helm/ivy. That is probably faster than parsing large bibtex files (note, in org-ref I already cache the files in parsed form for performance reasons!).

It would make sense to split the authors, and keywords in another version of this database. It also could make sense to have a field that is the bibtex string, and to do text search on that string. That way you get everything in the entry for searching, and an easy way to generate bibtex files without having to reconstruct them.
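As a rough sketch of the bibtex string field idea: bibtexparser hands back dictionaries, so this rebuilds a string once at insert time. Storing the original text from the file would be preferable if you have it. The function name and the "bibtex" field name are assumptions for illustration.

```python
def entry_to_bibtex(entry):
    """Format a parsed entry (a dict with ENTRYTYPE and ID keys) as a bibtex string."""
    skip = {"ENTRYTYPE", "ID", "_id"}
    fields = ",\n".join("  {} = {{{}}}".format(key, value)
                        for key, value in sorted(entry.items())
                        if key not in skip)
    return "@{}{{{},\n{}\n}}".format(entry["ENTRYTYPE"], entry["ID"], fields)


entry = {"ENTRYTYPE": "book",
         "ID": "2006-acs-style-guide",
         "title": "The ACS Style Guide",
         "year": "2006"}

# Keep the string alongside the parsed fields so it can be text-indexed and exported.
doc = dict(entry, bibtex=entry_to_bibtex(entry))
print(doc["bibtex"])
```

A text index on the "bibtex" field would then cover everything in the entry, and writing a bibtex file becomes a matter of joining the stored strings.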

It is especially interesting to run the queries through emacs-lisp since we get the benefit of editing lisp code while writing the query, e.g. parenthesis navigation, less quoting, etc… and we get back lisp data that can be used to construct helm/ivy queries, or other emacs things. That makes this look competitive with emacsql at least for the syntax. I predict that there will be more posts on this in the future.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

Org-mode version = 9.0.3

New and improved asynchronous org-babel python blocks

| categories: orgmode, emacs, python |

About a year ago I posted some code to run org-babel python blocks asynchronously. This year, my students asked for some enhancements related to debugging. Basically, they were frustrated by a few things when they got errors. First, they found it difficult to find the line number in the Traceback in the src block because there are no line numbers in the block, and it is annoying to do a special edit just for line numbers.

I thought about this, and figured out how to significantly improve the situation. The async python code in scimax now has the following features:

  1. When you get a Traceback, it goes in the results, and each file listed in it is hyperlinked to the source file and line so it is easy to get to them.
  2. The cursor jumps to the last line in the code block that is listed in the Traceback, and a beacon shines to show you the line.
  3. You can turn on temporary line numbers in the code block to see where the lines are in the block, and these disappear when you start typing. This is controlled by the variable `org-babel-async-python-show-line-numbers'.
  4. You can control whether a buffer of the results shows or not via the variable `org-babel-async-python-show-results'.
  5. When you run the block, you get a clickable link in the RESULTS section to kill the process.
  6. You may also find the `autopep8' and `pylint' functions helpful.
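The Traceback handling in the first two items boils down to pulling file and line pairs out of the Traceback text. The actual scimax code is emacs-lisp; the following is only a Python sketch of the idea, with made-up function names, assuming the standard `File "...", line N` format and the "Org SRC" name that shows up in the tracebacks in this post.

```python
import re

# Matches the 'File "<name>", line <number>' part of each traceback frame
FRAME = re.compile(r'File "(?P<file>[^"]+)", line (?P<line>\d+)')


def traceback_frames(text):
    """Return (filename, line number) pairs in the order they appear."""
    return [(m.group("file"), int(m.group("line"))) for m in FRAME.finditer(text)]


def last_line_in(text, filename="Org SRC"):
    """Return the last line number reported for filename, or None if absent."""
    lines = [line for name, line in traceback_frames(text) if name == filename]
    return lines[-1] if lines else None


tb = '''Traceback (most recent call last):
  File "Org SRC", line 11, in <module>
    print(odeint(dFdV, 1.0, Vspan))
  File "/path/to/odepack.py", line 215, in odeint
    ixpr, mxstep, mxhnil, mxordn, mxords)
TypeError: dFdV() missing 1 required positional argument: 'v0'
'''

print(traceback_frames(tb))
print(last_line_in(tb))  # 11
```

Each pair is enough to build a link to a file and line, and the last "Org SRC" line is where the cursor would go.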

The code for this is currently found here: https://github.com/jkitchin/scimax/blob/org-9/scimax-org-babel-python.el

Eventually, I will merge this into master, after I am sure about all the changes needed for org 9.0. That is not likely to happen until the semester ends, so I do not mess up my students who use scimax in class. So, sometime mid-December it will make it into master.

To make async the default way to run a python block use this code, so that you can use C-c C-c to run them:

(require 'scimax-org-babel-python)
(add-to-list 'org-ctrl-c-ctrl-c-hook 'org-babel-async-execute:python)

As with the past few posts, this video will make it much more clear what the post is about:

Here is a prototypical example that shows how it works. While it runs you can view the progress if you click on the link to show the results.

import time

for i in range(5):
    print(i)
    time.sleep(2)

0
1
2
3
4
Traceback (most recent call last):
  File "Org SRC", line 5, in <module>
    time.sleep(2)
KeyboardInterrupt

This block has a pretty obvious issue when we run it. The cursor jumps right to the problem!

print('This line is ok')
# 5 / 0
print('We will not see this')

This line is ok
We will not see this

This block shows we can access any of the links in the Traceback. Here we have an error in calling a function that is raised in an external file.

import numpy as np
from scipy.integrate import odeint

Vspan = np.linspace(0, 2) # L

# dF/dV = F
def dFdV(F, V, v0):
    return F


print(odeint(dFdV, 1.0, Vspan))

Traceback (most recent call last):
  File "Org SRC", line 11, in <module>
    print(odeint(dFdV, 1.0, Vspan))
  File "/Users/jkitchin/anaconda3/lib/python3.5/site-packages/scipy/integrate/odepack.py", line 215, in odeint
    ixpr, mxstep, mxhnil, mxordn, mxords)
TypeError: dFdV() missing 1 required positional argument: 'v0'

Here we show how nice it is to be able to kill a process. This block will not end on its own.

while True:
    pass

Traceback (most recent call last):
  File "Org SRC", line 2, in <module>
    pass
KeyboardInterrupt

1 autopep8

autopep8 is a tool for reformatting Python code. We wrapped this into an Emacs command so you can quickly reformat a Python code block.

a = 4
b = 5
c = a * b  # comment
# another comment


def f(x):
    return x
print(f(5))

2 pylint

pylint is a great tool for checking your Python code for errors, style and conventions. We also wrapped this into an Emacs command so you can run it on a Python src block. The report that is generated has clickable links to help you get right to the lines in your code block with problems.

import numpy as np

a = np.array(5, 5)

def f(x): return x

print(f(6))

Copyright (C) 2016 by John Kitchin. See the License for information about copying.

Org-mode version = 9.0

Writing lisp code from Python

| categories: python, lisp |

Some time ago I wrote about converting python data structures to lisp. I have expanded on that idea to writing lisp programs from Python! The newly expanded code that makes this possible can be found at https://github.com/jkitchin/pycse/blob/master/pycse/lisp.py.

Here are the simple data types known to pycse.lisp:

import pycse.lisp
import numpy as np

print("a string".lisp)
a = 5
b = 5.0
print(a.lisp)
print(b.lisp)
print([1, 2, 3].lisp)
print((1, 2, 3).lisp)
print({'a': 4}.lisp)
print(np.array([1, 2, 3]).lisp)
print(np.array([1.0, 2.0, 3.0]).lisp)
"a string"
5
5.0
(1 2 3)
(1 2 3)
(:a 4)
(1 2 3)
(1.0 2.0 3.0)

There are also some more complex types.

import pycse.lisp as pl

print(pl.Symbol('lambda'))
print(pl.Quote('lambda'))
print(pl.SharpQuote('lambda'))
print(pl.Cons("a", 5))
print(pl.Alist(["a", 2, "b", 5]))
print(pl.Vector([1, 2, 3]))

print(pl.Backquote([]))
print(pl.Comma([1, 2, 3]))
print(pl.Splice([1, 2, 3]))
lambda
'lambda
#'lambda
("a" . 5)
(("a" . 2) ("b" . 5))
[1 2 3]
`()
,(1 2 3)
,@(1 2 3)

You can nest these too.

import pycse.lisp as pl
print(pl.Quote(pl.Alist(["a", 2, "b", 5])))
print(pl.Backquote([pl.Symbol('+'), pl.Comma(pl.Symbol('b')), 5]))
'(("a" . 2) ("b" . 5))
`(+ ,b 5)
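To give a feel for how this kind of conversion can work, here is a small recursive sketch. It is not the pycse.lisp implementation: it uses a plain function instead of a `.lisp` attribute, only covers a few types, and the `Symbol` and `Quote` classes here are stand-ins written for this example.

```python
class Symbol:
    """A bare lisp symbol, printed without quotes."""
    def __init__(self, name):
        self.name = name


class Quote:
    """A quoted form, printed with a leading apostrophe."""
    def __init__(self, form):
        self.form = form


def to_lisp(obj):
    """Convert a small set of Python objects to a lisp source string."""
    if isinstance(obj, Symbol):
        return obj.name
    if isinstance(obj, Quote):
        return "'" + to_lisp(obj.form)
    if isinstance(obj, str):
        return '"{}"'.format(obj.replace("\\", "\\\\").replace('"', '\\"'))
    if isinstance(obj, bool):  # check before int, since bool is a subclass of int
        return "t" if obj else "nil"
    if isinstance(obj, (int, float)):
        return repr(obj)
    if isinstance(obj, dict):  # written as a property list
        return "(" + " ".join(":{} {}".format(k, to_lisp(v)) for k, v in obj.items()) + ")"
    if isinstance(obj, (list, tuple)):
        return "(" + " ".join(to_lisp(x) for x in obj) + ")"
    raise TypeError("no lisp representation for {!r}".format(obj))


print(to_lisp([Symbol("+"), 5, 5]))  # (+ 5 5)
print(to_lisp({"a": 4}))             # (:a 4)
print(to_lisp(Quote([1, 2, 3])))     # '(1 2 3)
```

Nesting falls out of the recursion: a list of lists becomes a nested form, which is what lets a Python list describe a whole program.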

All that means we can use Python code to generate lisp programs. Here is an example where we make two sub-programs, and combine them into an overall program, then add one more subprogram to it. We wrap the results in an emacs-lisp block, then actually run the block!

import pycse.lisp as pl

p1 = [pl.Symbol('mapcar'),
      [pl.Symbol('lambda'),
       [pl.Symbol('x')],
       [pl.Symbol('*'),
        pl.Symbol('x'),
        pl.Symbol('x')]],
      pl.Quote([1, 2, 3, 4])]

p2 = [pl.Symbol('princ'), "Hello world"]

p = [pl.Symbol('list'), p1, p2]
p.append([pl.Symbol('+'), 5, 5])

print(p.lisp)
(list (mapcar (lambda (x) (* x x)) '(1 2 3 4)) (princ "Hello world") (+ 5 5))
(1 4 9 16) Hello world 10

Wow, it worked! Here is another example of setting up a macro and then running it.

import pycse.lisp as pl
s = pl.Symbol
bq = pl.Backquote
c = pl.Comma

p1 = [s('defmacro'), s('f'), [s('x')],
      "A docstring",
      bq([s('*'), c(s('x')), 5])]


p2 = [s('f'), 5]

print(p1.lisp)

print(p2.lisp)
(defmacro f (x) "A docstring" `(* ,x 5))
(f 5)
25

I am not too sure where this will be super useful, but it is an interesting proof of concept. I haven't tested this much beyond the original post and this one. Let me know if you find issues with it.

Copyright (C) 2016 by John Kitchin. See the License for information about copying.

Org-mode version = 8.2.10

Expanding orgmode.py to get better org-python integration

| categories: orgmode, python |

I have only ever been about 80% satisfied with Python/org-mode integration. I have developed a particular workflow that I like a lot, and that works well for solving scientific and engineering problems. I typically use stand-alone Python blocks, i.e. not sessions. I tend to use print statements to create output that I want to see, e.g. the value of a calculation. I also tend to create multiple figures in a single block, which I want to display in the buffer. This workflow is represented extensively in PYCSE and dft-book which collectively have 700+ src blocks! So I use it a lot ;)

There are some deficiencies though. For one, I have had to hand build any figures/tables that are generated from the code blocks. That means duplicating filenames, adding the captions, etc… It is not that easy to update captions from the code blocks, and there has been limited ability to use markup in the output.

Well finally I had some ideas to change this. The ideas are:

  1. Patch matplotlib so that savefig actually returns a figure link that can be printed to the output. savefig works the same otherwise.
  2. Patch matplotlib.pyplot.show to save the figure, and print a figure link in the output.
  3. Create special functions to generate org tables and figures.
  4. Create some other functions to generate some blocks and elements.

Then we could just import the library in our Python scripts (or add it as a prologue) and get this nice functionality. You can find the code for this here:

https://github.com/jkitchin/pycse/blob/master/pycse/orgmode.py
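The patching idea in the first item can be sketched as a wrapper that calls the original save function and then returns an org file link. This is an illustration of the approach and not the code in pycse.orgmode; `returns_org_link` and `fake_savefig` are made-up names, and the stand-in save function is there so the sketch runs without matplotlib.

```python
import functools


def returns_org_link(save_function):
    """Wrap a save function so it also returns an org-mode file link."""
    @functools.wraps(save_function)
    def wrapper(filename, *args, **kwargs):
        save_function(filename, *args, **kwargs)  # the original behavior is unchanged
        return "[[file:{}]]".format(filename)
    return wrapper


saved = []


def fake_savefig(filename, dpi=None):
    """Stand-in for a real savefig; it only records what it was asked to save."""
    saved.append((filename, dpi))


savefig = returns_org_link(fake_savefig)
print(savefig("test.png", dpi=300))  # [[file:test.png]]

# With matplotlib, the same wrapper could be applied as (assumption, untested here):
# plt.savefig = returns_org_link(plt.savefig)
```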

Finally, it seems like a good idea to specify that we want our results to be an org drawer. This makes the figures/tables export, and allows us to generate math and other markup in our programs. That has the downside of making exported results not be in the "verbatim" markup I am used to, but that may be solvable in other ways. We can make the org drawer output the default like this:

(setq org-babel-default-header-args:python
      (cons '(:results . "output org drawer replace")
            (assq-delete-all :results org-babel-default-header-args)))

With these, using Python blocks in org-mode gets quite a bit better!

Here is the first example, with savefig. I have the savefig function return the link, so we have to print it. We use this feature later. The figure is automatically inserted to the buffer. Like magic!

Here is a fun figure from http://matplotlib.org/xkcd/examples/pie_and_polar_charts/polar_scatter_demo.html

import pycse.orgmode

import numpy as np
import matplotlib.pyplot as plt
plt.xkcd()

N = 150
r = 2 * np.random.rand(N)
theta = 2 * np.pi * np.random.rand(N)
area = 200 * r**2 * np.random.rand(N)
colors = theta

ax = plt.subplot(111, polar=True)
c = plt.scatter(theta, r, c=colors, s=area, cmap=plt.cm.hsv)
c.set_alpha(0.75)

print(plt.savefig('test.png'))

How about another example with show. This just prints the link directly. It seems to make sense to do it that way. This is from http://matplotlib.org/xkcd/examples/showcase/xkcd.html.

import pycse.orgmode as org

from matplotlib import pyplot as plt
import numpy as np

plt.xkcd()

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.xticks([])
plt.yticks([])
ax.set_ylim([-30, 10])

data = np.ones(100)
data[70:] -= np.arange(30)

plt.annotate(
    'THE DAY I REALIZED\nI COULD COOK BACON\nWHENEVER I WANTED',
    xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10))

plt.plot(data)

plt.xlabel('time')
plt.ylabel('my overall health')
plt.show()

# An intermediate result
print('Some intermediate result for x - 4 = 6:')
x = 6 + 4
org.fixed_width('x = {}'.format(x))

# And another figure
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.bar([-0.125, 1.0-0.125], [0, 100], 0.25)
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.set_xticks([0, 1])
ax.set_xlim([-0.5, 1.5])
ax.set_ylim([0, 110])
ax.set_xticklabels(['CONFIRMED BY\nEXPERIMENT', 'REFUTED BY\nEXPERIMENT'])
plt.yticks([])

plt.title("CLAIMS OF SUPERNATURAL POWERS")

plt.show()

Some intermediate result for x - 4 = 6:

x = 10

See, the figures show where they belong, with intermediate results that have some formatting, and they export correctly. Nice.

1 A Figure from Python

I have long wanted to generate full figures with captions from code blocks, and to get them where I want, like this one:

Figure 3: An italicized histogram of 10000 points

Here is the code to generate the full figure. Note we use the output of savefig as the filename. That lets us save some intermediate variable construction. That seems nice.

import pycse.orgmode as org
import matplotlib.pyplot as plt
plt.xkcd()

import numpy as np

# example data
mu = 100  # mean of distribution
sigma = 15  # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)

num_bins = 50
# the histogram of the data (density=True replaces the removed normed=1)
n, bins, patches = plt.hist(x, num_bins, density=True, facecolor='green', alpha=0.5)
# add a 'best fit' line: the normal pdf written out with numpy, since
# matplotlib.mlab.normpdf is gone from recent matplotlib releases
y = np.exp(-0.5 * ((bins - mu) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)

org.figure(plt.savefig('smarts.png'),
           label='fig:1',
           caption='An italicized /histogram/ of {} points'.format(len(x)),
           attributes=[('LATEX', ':width 3in'),
                       ('HTML', ':width 300'),
                       ('ORG', ':width 300')])

That is pretty awesome. You cannot put figures in more than one place like this, and you might not want to mix results with this, but it is still pretty awesome!
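
For reference, here is a rough sketch of the kind of org markup a figure function like this has to emit. The function `org_figure` is hypothetical and is not the pycse implementation; it only assumes the standard org keywords `#+ATTR_...`, `#+CAPTION` and `#+NAME`.

```python
def org_figure(link, label=None, caption=None, attributes=()):
    """Return org-mode markup for a captioned, labeled figure.

    Hypothetical helper for illustration; `link` is an org file link.
    """
    lines = []
    # One attribute line per export backend, e.g. #+ATTR_LATEX: :width 3in
    for backend, attrs in attributes:
        lines.append('#+ATTR_{}: {}'.format(backend, attrs))
    if caption is not None:
        lines.append('#+CAPTION: {}'.format(caption))
    if label is not None:
        lines.append('#+NAME: {}'.format(label))
    lines.append(link)
    return '\n'.join(lines)


markup = org_figure('[[file:smarts.png]]',
                    label='fig:1',
                    caption='An italicized /histogram/ of {} points'.format(10000),
                    attributes=[('LATEX', ':width 3in'),
                                ('HTML', ':width 300'),
                                ('ORG', ':width 300')])
print(markup)
```

Printing that string from a code block whose results are interpreted as org gives a figure with a caption, a label and per-backend widths.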

2 An example table.

Finally, I have wanted the same thing for tables. Here is the resulting table.

Table 1: Dependence of the energy on the encut value.

| ENCUT | Energy (eV) |
|-------+-------------|
|   100 |      11.233 |
|   200 |      21.233 |
|   300 |      31.233 |
|   400 |      41.233 |
|   500 |      51.233 |

Here is the code block that generated it.

import pycse.orgmode as org

data = [['<5>', '<11>'],  # Column aligners
        ['ENCUT', 'Energy (eV)'],
        None]

for encut in [100, 200, 300, 400, 500]:
    data += [[encut, 1.233 + 0.1 * encut]]

org.table(data,
          name='table-1',
          caption='Dependence of the energy on the encut value.')
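
Here is a minimal sketch of how a list of rows like this could be turned into org table syntax, with None standing for a horizontal rule. The function `org_table` is hypothetical and not the pycse implementation, and it leaves out the column aligner row.

```python
def org_table(rows, name=None, caption=None):
    """Return org-mode markup for a table.

    Each row is a list of cells; a row of None becomes a horizontal rule.
    Hypothetical helper for illustration only.
    """
    lines = []
    if caption is not None:
        lines.append('#+CAPTION: ' + caption)
    if name is not None:
        lines.append('#+NAME: ' + name)
    for row in rows:
        if row is None:
            lines.append('|-')
        else:
            lines.append('| ' + ' | '.join(str(cell) for cell in row) + ' |')
    return '\n'.join(lines)


data = [['ENCUT', 'Energy (eV)'], None]
for encut in [100, 200, 300, 400, 500]:
    # Format the energy explicitly so float round-off never shows up.
    data.append([encut, '{:.3f}'.format(1.233 + 0.1 * encut)])

table = org_table(data,
                  name='table-1',
                  caption='Dependence of the energy on the encut value.')
print(table)
```

The columns are not padded here; org realigns a table like this the next time it is aligned in the buffer.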

The only obvious improvement is similar to getting images to redisplay after running a code block: it would be nice to realign the tables in the results so they look pretty. Otherwise this is good.

Let's go ahead and try that. Here we narrow down to the results, and align the tables in that region.

(defun org-align-visible-tables ()
  "Align all the tables in the results."
  (let ((location (org-babel-where-is-src-block-result)) start)
    (when location
      (setq start (- location 1))
      (save-restriction
        (save-excursion
          (goto-char location) (forward-line 1)
          (narrow-to-region start (org-babel-result-end))
          (goto-char (point-min))
          (while (re-search-forward org-table-any-line-regexp nil t)
            (save-excursion (org-table-align))
            (or (looking-at org-table-line-regexp)
                (forward-char 1)))
          (re-search-forward org-table-any-border-regexp nil 1))))))

(add-hook 'org-babel-after-execute-hook
          (lambda () (org-align-visible-tables)))
lambda nil (org-align-visible-tables)
lambda nil (org-refresh-images)

And that seems to solve that problem now too!

3 Miscellaneous outputs

Here are some examples of getting org-output from the pycse.orgmode module.

import pycse.orgmode as org

org.verbatim('One liner verbatim')

org.verbatim('''multiline
output
   with indentation
       at a few levels
that is verbatim.''')

org.fixed_width('your basic result')

org.fixed_width('''your
  basic
    result
on a few lines.''')

# A latex block
org.latex(r'\(e^{i\pi} + 1 = 0\)')

org.org(r'The equation is \(E = h \nu\).')

One liner

multiline
output
   with indentation
       at a few levels
that is verbatim.
your basic result
your
  basic
    result
on a few lines.

The equation is \(E = h \nu\).
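
As a rough idea of what these helpers have to produce, here is a sketch that returns the org markup as strings. These are my assumptions about reasonable markup, not the pycse source: fixed-width lines start with a colon and a space, short verbatim text goes between equals signs, and multiline verbatim text goes in an example block.

```python
def fixed_width(text):
    """Return text as org fixed-width lines (each prefixed with ': ')."""
    return '\n'.join(': ' + line for line in str(text).split('\n'))


def verbatim(text):
    """Return text as org verbatim markup.

    One-liners are wrapped in = signs; multiline text goes in an
    example block so indentation is preserved.
    """
    text = str(text)
    if '\n' in text:
        return '#+BEGIN_EXAMPLE\n{}\n#+END_EXAMPLE'.format(text)
    return '={}='.format(text)


print(verbatim('One liner verbatim'))
print(verbatim('multiline\noutput\n   with indentation'))
print(fixed_width('your\n  basic\n    result'))
```

Returning strings instead of printing makes the helpers easy to compose; a thin wrapper that prints the result gives the behavior shown above.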

4 Summary

This looks promising to me. There are a few things to get used to, like always having org output, and some minor differences in making figures. On the whole this looks like a big improvement though! I look forward to working with it more.

Copyright (C) 2016 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

When in python do as Pythonistas unless...

| categories: python | tags: | View Comments

Many lisps have a when/unless conditional syntax that works like this:

(when t (print "when evaluated"))

(unless nil (print "unless evaluated"))
"when evaluated"

"unless evaluated"

Those are actually just macros that expand to the more verbose if special form:

(macroexpand '(unless nil (print "unless evaluated")))
(if nil nil
  (print "unless evaluated"))

In Python, we only have this syntax for this kind of construct:

if True: print("when equivalent")

if not False: print("unless equivalent")
when equivalent
unless equivalent

I thought it would be fun to get as close as possible to the lisp syntax in Python. It is not that easy though. The benefit of a macro is that its arguments are not evaluated until they need to be evaluated. In Python, all the arguments of a function are evaluated before the function is called.
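
A small sketch makes the eager evaluation visible. The names `noisy` and `unless_eager` are made up for this illustration: the "body" is passed as a plain argument, so it runs even when the condition says it should be skipped.

```python
calls = []


def noisy(label):
    """Record that we were evaluated, then return the label."""
    calls.append(label)
    return label


def unless_eager(condition, value):
    """A naive unless: by the time we get here, value is already computed."""
    if not condition:
        return value


# The condition is true, so unless should skip the body...
result = unless_eager(True, noisy('body'))

print(result)  # None, the value was discarded
print(calls)   # ['body'], but the body was evaluated anyway
```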

One way to avoid this is to put code inside a function. Then it will not be executed until the function is called. So, here is an example of how to get an unless function in Python that conditionally evaluates a function.

def unless(condition, f):
    if not condition:
        return f()

def func():
    return "executed. Condition was not true."


print(unless(1 > 0, func))

print(unless(1 < 0, func))
None
executed. Condition was not true.

That is close, but it requires us to wrap our code in a function. There does not seem to be any alternative though. I thought maybe a context manager could be used, but there does not seem to be a way to bypass the execution of the code in the with block (https://www.python.org/dev/peps/pep-0377/ ). Still, it might be a useful way to think about doing some things differently in Python.
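
A lambda gets the call site a little closer to the lisp form, although the body is still wrapped in a function. This is only a sketch: the `when` function and the `evaluated` list are additions for illustration.

```python
def when(condition, thunk):
    """Call thunk and return its value only if condition is true."""
    if condition:
        return thunk()


def unless(condition, thunk):
    """Call thunk and return its value only if condition is false."""
    if not condition:
        return thunk()


evaluated = []

# list.append returns None, so `or 'ran'` supplies the return value.
skipped = unless(1 > 0, lambda: evaluated.append('unless') or 'ran')
ran = unless(1 < 0, lambda: evaluated.append('unless') or 'ran')
greeting = when(True, lambda: 'when evaluated')

print(skipped)    # None
print(ran)        # ran
print(evaluated)  # ['unless']
print(greeting)   # when evaluated
```

The lambda body is limited to a single expression, so anything longer still needs a named function like func above.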

Copyright (C) 2016 by John Kitchin. See the License for information about copying.
