Expanding orgmode.py to get better org-python integration

| categories: orgmode, python | tags:

I have only ever been about 80% satisfied with Python/org-mode integration. I have developed a particular workflow that I like a lot, and that works well for solving scientific and engineering problems. I typically use stand-alone Python blocks, i.e. not sessions. I tend to use print statements to create output that I want to see, e.g. the value of a calculation. I also tend to create multiple figures in a single block, which I want to display in the buffer. This workflow is represented extensively in PYCSE and dft-book, which collectively have 700+ src blocks! So I use it a lot ;)

There are some deficiencies though. For one, I have had to hand build any figures/tables that are generated from the code blocks. That means duplicating filenames, adding the captions, etc… It is not that easy to update captions from the code blocks, and there has been limited ability to use markup in the output.

Well finally I had some ideas to change this. The ideas are:

  1. Patch matplotlib so that savefig actually returns a figure link that can be printed to the output. savefig works the same otherwise.
  2. Patch matplotlib.pyplot.show to save the figure, and print a figure link in the output.
  3. Create special functions to generate org tables and figures.
  4. Create some other functions to generate some blocks and elements.
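The first idea in the list above can be sketched roughly as follows. This is not the actual pycse.orgmode code: to keep it runnable without matplotlib, `fake_plt` is a hypothetical stand-in for `matplotlib.pyplot`, and `patch_savefig` is a name made up for this illustration. The wrapper calls the original function and then returns an org file link.

```python
import types

# Hypothetical stand-in for matplotlib.pyplot, so the sketch runs anywhere.
saved = []


def _original_savefig(fname, **kwargs):
    """Pretend to write a figure; just record the filename."""
    saved.append(fname)


fake_plt = types.SimpleNamespace(savefig=_original_savefig)


def patch_savefig(module):
    """Replace module.savefig with a wrapper that returns an org link."""
    original = module.savefig

    def savefig(fname, *args, **kwargs):
        original(fname, *args, **kwargs)      # save exactly as before
        return '[[file:{}]]'.format(fname)    # org-mode file link

    module.savefig = savefig


patch_savefig(fake_plt)
link = fake_plt.savefig('test.png')
print(link)
```

Because the wrapper only adds a return value, existing scripts that ignore the result of savefig keep working unchanged.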

Then we could just import the library in our Python scripts (or add it as a prologue) and get this nice functionality. You can find the code for this here:

https://github.com/jkitchin/pycse/blob/master/pycse/orgmode.py

Finally, it seems like a good idea to specify that we want our results to be an org drawer. This makes the figures/tables export, and allows us to generate math and other markup in our programs. That has the downside of making exported results not be in the "verbatim" markup I am used to, but that may be solvable in other ways. We can make the org drawer output the default like this:

(setq org-babel-default-header-args:python
      (cons '(:results . "output org drawer replace")
            (assq-delete-all :results org-babel-default-header-args)))

With these, using Python blocks in org-mode gets quite a bit better!

Here is the first example, with savefig. I have the savefig function return the link, so we have to print it. We use this feature later. The figure is automatically inserted into the buffer. Like magic!

Here is a fun figure from http://matplotlib.org/xkcd/examples/pie_and_polar_charts/polar_scatter_demo.html

import pycse.orgmode

import numpy as np
import matplotlib.pyplot as plt
plt.xkcd()

N = 150
r = 2 * np.random.rand(N)
theta = 2 * np.pi * np.random.rand(N)
area = 200 * r**2 * np.random.rand(N)
colors = theta

ax = plt.subplot(111, polar=True)
c = plt.scatter(theta, r, c=colors, s=area, cmap=plt.cm.hsv)
c.set_alpha(0.75)

print(plt.savefig('test.png'))

How about another example with show. This just prints the link directly. It seems to make sense to do it that way. This is from http://matplotlib.org/xkcd/examples/showcase/xkcd.html.

import pycse.orgmode as org

from matplotlib import pyplot as plt
import numpy as np

plt.xkcd()

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.xticks([])
plt.yticks([])
ax.set_ylim([-30, 10])

data = np.ones(100)
data[70:] -= np.arange(30)

plt.annotate(
    'THE DAY I REALIZED\nI COULD COOK BACON\nWHENEVER I WANTED',
    xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10))

plt.plot(data)

plt.xlabel('time')
plt.ylabel('my overall health')
plt.show()

# An intermediate result
print('Some intermediate result for x - 4 = 6:')
x = 6 + 4
org.fixed_width('x = {}'.format(x))

# And another figure
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.bar([-0.125, 1.0-0.125], [0, 100], 0.25)
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.set_xticks([0, 1])
ax.set_xlim([-0.5, 1.5])
ax.set_ylim([0, 110])
ax.set_xticklabels(['CONFIRMED BY\nEXPERIMENT', 'REFUTED BY\nEXPERIMENT'])
plt.yticks([])

plt.title("CLAIMS OF SUPERNATURAL POWERS")

plt.show()

Some intermediate result for x - 4 = 6:

x = 10

See, the figures show where they belong, with intermediate results that have some formatting, and they export correctly. Nice.

1 A Figure from Python

It has been a long desire of mine to generate full figures with captions from code blocks, and to get them where I want like this one:

Figure 3: An italicized histogram of 10000 points

Here is the code to generate the full figure. Note we use the output of savefig as the filename. That lets us save some intermediate variable construction. That seems nice.

import pycse.orgmode as org
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

plt.xkcd()

# example data
mu = 100 # mean of distribution
sigma = 15 # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)

num_bins = 50
# the histogram of the data
n, bins, patches = plt.hist(x, num_bins, normed=1, facecolor='green', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)

org.figure(plt.savefig('smarts.png'),
           label='fig:1',
           caption='An italicized /histogram/ of {} points'.format(len(x)),
           attributes=[('LATEX', ':width 3in'),
                       ('HTML', ':width 300'),
                       ('ORG', ':width 300')])

That is pretty awesome. You cannot put figures in more than one place like this, and you might not want to mix results with this, but it is still pretty awesome!
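For readers wondering what a figure function like this has to emit, here is a minimal sketch. The name `org_figure` and the order of the keyword lines are assumptions for illustration, not the pycse.orgmode implementation. It only assembles the org keywords (`#+ATTR_...`, `#+NAME`, `#+CAPTION`) above the link.

```python
def org_figure(link, label=None, caption=None, attributes=()):
    """Return org markup for a captioned figure (hypothetical helper).

    link is an org file link such as '[[file:smarts.png]]'.
    attributes is a sequence of (backend, attribute-string) pairs.
    """
    lines = []
    for backend, attrs in attributes:
        lines.append('#+ATTR_{}: {}'.format(backend, attrs))
    if label:
        lines.append('#+NAME: {}'.format(label))
    if caption:
        lines.append('#+CAPTION: {}'.format(caption))
    lines.append(link)
    return '\n'.join(lines)


print(org_figure('[[file:smarts.png]]',
                 label='fig:1',
                 caption='An italicized /histogram/',
                 attributes=[('LATEX', ':width 3in')]))
```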

2 An example table.

Finally, I have wanted the same thing for tables. Here is the resulting table.

Table 1: Dependence of the energy on the encut value.
| ENCUT | Energy (eV) |
|-------+-------------|
|   100 |      11.233 |
|   200 |      21.233 |
|   300 |      31.233 |
|   400 |      41.233 |
|   500 |      51.233 |

Here is the code block that generated it.

import pycse.orgmode as org

data = [['<5>', '<11>'],  # Column aligners
        ['ENCUT', 'Energy (eV)'],
        None]

for encut in [100, 200, 300, 400, 500]:
    data += [[encut, 1.233 + 0.1 * encut]]

org.table(data,
          name='table-1',
          caption='Dependence of the energy on the encut value.')
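A table function along these lines does not need much. The sketch below is an assumption about the general shape, not the pycse.orgmode source, and `org_table` is a hypothetical name. Each row becomes a `|`-delimited line and `None` becomes a horizontal rule.

```python
def org_table(data, name=None, caption=None):
    """Return org markup for a table (hypothetical helper).

    data is a list of rows; a row of None is rendered as a rule line.
    """
    lines = []
    if caption:
        lines.append('#+CAPTION: {}'.format(caption))
    if name:
        lines.append('#+NAME: {}'.format(name))
    for row in data:
        if row is None:
            lines.append('|-')
        else:
            lines.append('| ' + ' | '.join(str(cell) for cell in row) + ' |')
    return '\n'.join(lines)


rows = [['ENCUT', 'Energy (eV)'], None, [100, '11.233'], [200, '21.233']]
print(org_table(rows, name='table-1', caption='Energy vs. ENCUT.'))
```

The output is not column-aligned; org will align it the next time the table is touched.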

The only obvious improvement is similar to getting images to redisplay after running a code block: it might be nice to reformat tables so they are sure to look pretty. Otherwise this is good.

Let's go ahead and try that. Here we narrow down to the results, and align the tables in that region.

(defun org-align-visible-tables ()
  "Align all the tables in the results."
  (let ((location (org-babel-where-is-src-block-result)) start)
    (when location
      (setq start (- location 1))
      (save-restriction
        (save-excursion
          (goto-char location) (forward-line 1)
          (narrow-to-region start (org-babel-result-end))
          (goto-char (point-min))
          (while (re-search-forward org-table-any-line-regexp nil t)
            (save-excursion (org-table-align))
            (or (looking-at org-table-line-regexp)
                (forward-char 1)))
          (re-search-forward org-table-any-border-regexp nil 1))))))

(add-hook 'org-babel-after-execute-hook
          (lambda () (org-align-visible-tables)))
lambda nil (org-align-visible-tables)
lambda nil (org-refresh-images)

And that seems to solve that problem now too!
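To make concrete what "aligning" means here, this is a small Python sketch of the padding step. It is only an illustration of the idea, not what `org-table-align` does internally; in particular it left-justifies every cell, whereas org right-aligns numeric columns. `align_org_table` is a hypothetical name.

```python
def align_org_table(lines):
    """Pad the cells of an org table so the column borders line up."""
    rows = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('|-'):
            rows.append(None)  # horizontal rule
        else:
            rows.append([c.strip() for c in stripped.strip('|').split('|')])

    ncols = max(len(r) for r in rows if r is not None)
    widths = [0] * ncols
    for r in rows:
        if r is not None:
            for i, cell in enumerate(r):
                widths[i] = max(widths[i], len(cell))

    out = []
    for r in rows:
        if r is None:
            out.append('|' + '+'.join('-' * (w + 2) for w in widths) + '|')
        else:
            cells = r + [''] * (ncols - len(r))  # fill short rows
            out.append('| ' + ' | '.join(c.ljust(w)
                                         for c, w in zip(cells, widths)) + ' |')
    return out


for line in align_org_table(['| ENCUT | Energy (eV) |', '|-', '| 100 | 11.233 |']):
    print(line)
```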

3 Miscellaneous outputs

Here are some examples of getting org-output from the pycse.orgmode module.

import pycse.orgmode as org

org.verbatim('One liner verbatim')

org.verbatim('''multiline
output
   with indentation
       at a few levels
that is verbatim.''')

org.fixed_width('your basic result')

org.fixed_width('''your
  basic
    result
on a few lines.''')

# A latex block
org.latex(r'\(e^{i\pi} + 1 = 0\)')

org.org(r'The equation is \(E = h \nu\).')

One liner

multiline
output
   with indentation
       at a few levels
that is verbatim.
your basic result
your
  basic
    result
on a few lines.

The equation is \(E = h \nu\).
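Helpers like these are mostly string formatting. The sketch below is an assumption about how such functions could work, using standard org syntax (`: ` prefixes for fixed-width lines, `=...=` for inline verbatim, an example block for multiline text). It is not copied from pycse.orgmode, and it returns strings rather than printing them.

```python
def fixed_width(text):
    """Prefix every line with ': ' so org treats it as fixed-width."""
    return '\n'.join(': ' + line for line in text.split('\n'))


def verbatim(text):
    """Use =...= for one-liners and an example block for multiline text."""
    if '\n' in text:
        return '#+BEGIN_EXAMPLE\n{}\n#+END_EXAMPLE'.format(text)
    return '={}='.format(text)


print(verbatim('One liner verbatim'))
print(fixed_width('your\n  basic\n    result'))
```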

4 Summary

This looks promising to me. There are a few things to get used to, like always having org output, and some minor differences in making figures. On the whole this looks like a big improvement though! I look forward to working with it more.

Copyright (C) 2016 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

ob-hy.el - or better integration of hylang in org-mode

| categories: orgmode, hylang, emacs | tags:

The point of this post is to develop and test a more substantial integration of Hy into org-mode. We develop ob-hy.el here. This is based off of ob-clojure.el.

The next few blocks will get tangled to ob-hy.el. First, some variables.

(require 'ob)

(add-to-list 'org-structure-template-alist
             '("hy" "#+BEGIN_SRC hy\n?\n#+END_SRC" "<src lang=\"hy\">\n?\n</src>"))

(defvar org-babel-tangle-lang-exts)
(add-to-list 'org-babel-tangle-lang-exts '("hy" . "hy"))

(defvar org-babel-default-header-args:hy '())
(defvar org-babel-header-args:hy '((:results . "output")))
org-babel-header-args:hy

Next a function to expand the code body. This will allow us to pass vars in the header.

(defun org-babel-expand-body:hy (body params)
  "Expand BODY according to PARAMS, return the expanded body."
  (let* ((vars (mapcar #'cdr (org-babel-get-header params :var)))
         (result-params (cdr (assoc :result-params params)))
         (print-level nil)
         (print-length nil)
         (body (org-babel-trim
                (if (> (length vars) 0)
                    (concat "(let ["
                            (mapconcat
                             (lambda (var)
                               (format
                                "%S (quote %S)"
                                (car var)
                                (cdr var)))
                             vars "\n      ")
                            "]\n" body ")")
                  body))))
    (when (not (member "output" result-params))
      (setq body (format "(print (do  %s\n))" body)))
    body))
org-babel-expand-body:hy
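The expansion logic is easier to see outside of Emacs Lisp. Here is a rough Python transliteration of the same steps, for illustration only; `hy_literal` and `expand_hy_body` are hypothetical names. Header variables are wrapped in a `let`, and the body is wrapped in a `print` unless output results were requested.

```python
import json


def hy_literal(value):
    """Render a Python value as Hy source text (strings get double quotes)."""
    if isinstance(value, str):
        return json.dumps(value)
    return str(value)


def expand_hy_body(body, variables=(), result_params=()):
    """Wrap BODY in a let form for VARIABLES, mirroring the elisp above."""
    if variables:
        bindings = '\n      '.join(
            '{} (quote {})'.format(name, hy_literal(value))
            for name, value in variables)
        body = '(let [{}]\n{})'.format(bindings, body)
    body = body.strip()
    if 'output' not in result_params:
        # For value results, print the value of the last form.
        body = '(print (do  {}\n))'.format(body)
    return body


print(expand_hy_body('(print "Hy" data)', [('data', 'world')], ['output']))
```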

And a function to execute the body. We still use a simple approach to write the code to a temp-file, execute it, capture the output, and delete the file. This limits things to

(defun org-babel-execute:hy (body params)
  "Execute a block of hy code with Babel."
  (let* ((temporary-file-directory ".")
         (tempfile (make-temp-file "hy-"))
         result
         (result-params (cdr (assoc :result-params params)))
         (body (org-babel-expand-body:hy body params)))

    (with-temp-file tempfile
      (insert body))

    (unwind-protect
        (progn
          (cond
           ((member "body" result-params)
            (setq result body))
           ((member "python" result-params)
            (setq result (shell-command-to-string
                          (format "hy2py %s" tempfile))))
           ((member "ast" result-params)
            (setq result (shell-command-to-string
                          (format "hy2py -a -np %s" tempfile))))
           (t
            (setq result (shell-command-to-string
                          (format "hy %s" tempfile)))))

          (org-babel-result-cond result-params
            result
            (condition-case nil (org-babel-script-escape result)
              (error result))))
      (delete-file tempfile))))

(provide 'ob-hy)
ob-hy
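The write, run, capture, delete pattern used in the execute function looks like this in Python. This is a sketch of the pattern only; `run_in_tempfile` is a hypothetical name, and the usage line runs the Python interpreter as a stand-in for the `hy` executable so the example works without Hy installed.

```python
import os
import subprocess
import sys
import tempfile


def run_in_tempfile(code, command):
    """Write CODE to a temp file, run COMMAND on it, return its stdout.

    The temp file is removed even if the command fails, which is the
    role unwind-protect plays in the elisp version.
    """
    fd, path = tempfile.mkstemp(prefix='hy-')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(code)
        result = subprocess.run(command + [path],
                                capture_output=True, text=True)
        return result.stdout
    finally:
        os.remove(path)


# Stand-in usage: run a Python one-liner instead of a Hy script.
print(run_in_tempfile('print(1 + 2 + 3)', [sys.executable]))
```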

Now we tangle and load those blocks.

(org-babel-tangle)
(load-file "ob-hy.el")
t

Next, we do some tests. They are all simple tests.

1 Tests

1.1 Simple

(print "Hy world")
Hy world

We can see how this turns into Python:

(print "Hy world")
print(u'Hy world')

or the AST:

(print "Hy world")
Module(
    body=[Expr(value=Call(func=Name(id='print'), args=[Str(s=u'Hy world')], keywords=[], starargs=None, kwargs=None))])

Let's test :results value. It is not quite the value since we seem to get everything that is output from the script, but if you don't print stuff, it seems to get it right.

"test"
(+ 1 2 3)
6

1.2 vars in header

Here we test out adding variables to the header lines.

(print "Hy" data)
Hy world

Interesting, I am not sure where the space between them comes from. Let's check out the :results body option. It will show us the hy script that gets run.

(print "Hy" data)
(let [data (quote "world")]
(print "Hy" data))

Nothing obvious about the space there. We can test out passing block results in here.

(print data)
Hy  world

Here is the body of that:

(print data)
(let [data (quote "Hy world
")]
(print data))

2 Summary

It works well enough to make testing in org-mode pretty convenient. I can't think of anything else it "needs" right now, although communication with a repl might make it faster, and sessions are not supported at the moment. Saving that for another day ;)

Copyright (C) 2016 by John Kitchin. See the License for information about copying.


Jump to a tagged src block

| categories: orgmode, emacs | tags:

If you have a lot of src-blocks in your org-file, it might be nice to "tag" them and be able to jump around between them using tag expressions, or by the name of the block, language etc… Here we develop a way to do that and create a handy function to jump to blocks in the current buffer.

First, we look at how to "tag" a src-block. One way is to use a header like this:

#+header: :tags cool idiom two

These are not tags in the usual org-mode sense, they are just a space separated list of words we will later treat as tags. We can get the tags on a src-block with this function.

(defun src-block-tags (src-block)
  "Return tags for SRC-BLOCK (an org element)."
  (let* ((headers (-flatten
                   (mapcar 'org-babel-parse-header-arguments
                           (org-element-property :header src-block))))
         (tags (cdr (assoc :tags headers))))
    (when tags
      (split-string tags))))
src-block-tags

Now, we make a src-block with the tags "test" "one" and "idiom", and see how to tell if the block matches the tag expression "test+idiom".

(let* ((lexical-binding nil)
       (todo-only nil)
       (tags-list (src-block-tags (org-element-context)))
       (tag-expression "test+idiom"))
  (eval (cdr (org-make-tags-matcher tag-expression))))
t

It does, so we wrap that up into a function that tells us if a src-block matches some tag expression.

(defun src-block-match-tag-expression-p (src-block tag-expression)
  "Determine if SRC-BLOCK matches TAG-EXPRESSION."
  (let* ((lexical-binding nil)
         (todo-only nil)
         (tags-list (src-block-tags src-block)))
    (eval (cdr (org-make-tags-matcher tag-expression)))))
src-block-match-tag-expression-p

Here we test that on a block tagged "one three" with the expression "one-two", which means tagged one and not tagged two.

(src-block-match-tag-expression-p (org-element-context) "one-two")
t
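The matching itself is simple for the `+`/`-` expressions used here. Below is a Python sketch that handles only that subset; it is an illustration and not a reimplementation of `org-make-tags-matcher`, which also supports `|`, property matches and more. `block_tags` and `matches` are hypothetical names.

```python
import re


def block_tags(header):
    """Return the words following :tags in a header string."""
    words = header.split()
    if ':tags' not in words:
        return []
    tags = []
    for word in words[words.index(':tags') + 1:]:
        if word.startswith(':'):   # next header argument begins
            break
        tags.append(word)
    return tags


def matches(tags, expression):
    """True if TAGS satisfy a simple +tag/-tag EXPRESSION."""
    for sign, tag in re.findall(r'([+-]?)(\w+)', expression):
        present = tag in tags
        if sign == '-' and present:
            return False
        if sign != '-' and not present:
            return False
    return True   # an empty expression matches everything


print(matches(block_tags(':tags test one idiom'), 'test+idiom'))
print(matches(block_tags(':tags one three'), 'one-two'))
```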

Those are the main pieces we need to jump around. We just need a selection tool with a list of filtered candidates. We get a list of src-block candidates to choose from in the next block as an example. Here we get blocks tagged one but not two. We can incorporate this into a selection backend like helm or ivy.

(org-element-map (org-element-parse-buffer) 'src-block
  (lambda (src-block)
    (when (src-block-match-tag-expression-p src-block "one-two")
      ;; Get a string and marker
      (cons
       (format "%15s|%15s|%s"
               (org-element-property :name src-block)
               (org-element-property :language src-block)
               (org-element-property :header src-block))
       (org-element-property :begin src-block)))))
(("    tag-matcher|     emacs-lisp|(:tags test one idiom)" . 1222)
 ("            nil|     emacs-lisp|(:tags one)" . 1641)
 ("            nil|     emacs-lisp|(:tags one three)" . 2120))
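The candidate strings are just fixed-width columns paired with a buffer position. As a small illustration (with the hypothetical name `candidate`), the same `%15s|%15s|%s` layout in Python is:

```python
def candidate(name, language, header, position):
    """Return a (label, position) pair like the elisp cons cells above."""
    label = '{:>15}|{:>15}|{}'.format(
        name if name is not None else 'nil', language, header)
    return (label, position)


print(candidate('tag-matcher', 'emacs-lisp', '(:tags test one idiom)', 1222))
```

Right-aligning the name and language to 15 characters keeps the columns lined up, which makes narrowing by language or block name in the completion list easier to read.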

Now let us put that into ivy. We will ask for an expression to filter the blocks on, and then use ivy to narrow what is left, and the only action is to jump to the position of the selected block. You can start with a tag expression, or press enter to get all the tags. Then you can use ivy to further narrow by language, block name, or other tags.

(defun ivy-jump-to-src (tag-expression)
  (interactive "sTag expression: ")
  (ivy-read "Select: "
            (org-element-map (org-element-parse-buffer) 'src-block
              (lambda (src-block)
                (when (src-block-match-tag-expression-p src-block tag-expression)
                  ;; Get a string and marker
                  (cons
                   (format "%15s|%15s|%s"
                           (org-element-property :name src-block)
                           (org-element-property :language src-block)
                           (org-element-property :header src-block))
                   (org-element-property :begin src-block)))))
            :require-match t
            :action '(1
                      ("j" (lambda (pos) (interactive) (goto-char pos))))))
ivy-jump-to-src

For fun, here is a python block just for testing.

print(42)
42

That is it! It seems to work ok. There are some variations that might be preferable, like putting the tags in as params in the src-block header to avoid needing a separate header line. It isn't clear how much I would use this, and it is slow if you have a lot of src blocks in a /large/ org-file because of the parsing. (How large? I noticed a notable lag on the 22,800 line org-file this post is in ;)

Copyright (C) 2016 by John Kitchin. See the License for information about copying.


Another approach to embedded molecular data in org-mode

| categories: chemistry, orgmode, emacs | tags:

In the last post we examined a molecule link to a src-block defining a molecule in some format. We blurred the distinction between program and data there. Here we re-separate them to try out some different ideas. We will use an org-mode special block to contain the "data" which is a molecular representation in some format. Then, we will use open-babel to convert the format to various other formats to explore using the data.

Here is a methane molecule (with 4 implicit hydrogens in the SMILES format). We put it in a named special block in org-mode, and even put a header on it to indicate the format and a display name!

C

We can use the SMILES representation block as input to a new command that converts it to the CML format, with coordinates. We use a simple shell command here and pass the contents of the molecule in as a variable. That is nice because in SMILES methane is represented by a single "C", and this CML is much more verbose.

echo $input | obabel -ismi -o cml --gen3d
<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
 <atomArray>
  <atom id="a1" elementType="C" x3="1.047517" y3="-0.064442" z3="0.060284"/>
  <atom id="a2" elementType="H" x3="2.139937" y3="-0.064341" z3="0.059898"/>
  <atom id="a3" elementType="H" x3="0.683568" y3="-0.799429" z3="-0.661322"/>
  <atom id="a4" elementType="H" x3="0.683566" y3="0.927794" z3="-0.216100"/>
  <atom id="a5" elementType="H" x3="0.683669" y3="-0.321317" z3="1.056822"/>
 </atomArray>
 <bondArray>
  <bond atomRefs2="a1 a2" order="1"/>
  <bond atomRefs2="a1 a3" order="1"/>
  <bond atomRefs2="a1 a4" order="1"/>
  <bond atomRefs2="a1 a5" order="1"/>
 </bondArray>
</molecule>

We can also use the CML output as input to a command that generates an SVG image, again, passing the CML in via a variable in the header.

echo $cml | obabel -icml -o svg
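The same two conversions could be driven from Python instead of the shell. This is a sketch under the assumption that the `obabel` executable is on the PATH; `obabel_command` and `convert` are hypothetical helper names, and only the flags already used in this post (`-i<fmt>`, `-o<fmt>`, `--gen3d`) appear. Only the command construction is exercised in the example, so it runs without Open Babel installed.

```python
import subprocess


def obabel_command(in_format, out_format, gen3d=False):
    """Build the obabel argument list for a stdin-to-stdout conversion."""
    cmd = ['obabel', '-i' + in_format, '-o' + out_format]
    if gen3d:
        cmd.append('--gen3d')   # generate 3D coordinates
    return cmd


def convert(data, in_format, out_format, gen3d=False):
    """Pipe DATA through obabel and return the converted text.

    Requires Open Babel to be installed; raises CalledProcessError on failure.
    """
    result = subprocess.run(obabel_command(in_format, out_format, gen3d),
                            input=data, capture_output=True,
                            text=True, check=True)
    return result.stdout


print(obabel_command('smi', 'cml', gen3d=True))
print(obabel_command('cml', 'svg'))
```

With Open Babel available, `convert('C', 'smi', 'cml', gen3d=True)` would correspond to the first shell command above.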

With our previous molecule link we can refer to these in our text now as methane-smiles and methane-cml.

So far it all looks good. Let us do something new. We will use the SMILES representation to create an ase.atoms object in Python. First, we create an xyz format that ase can read. Rather than clutter up our document with the output, we silence it.

echo $input | obabel -ismi -o xyz --gen3d

Now, we can use the string generated in a Python file to generate a tempfile (or you could have saved the result above to a file and just read it in here). I was too lazy to make the file link to the image myself, so I setup a :file header and just print the result to stdout in this block. Although all we do here is create a new image, this demonstrates you can use data from a MOLECULE block and pass it into a Python script where other kinds of calculations might occur.

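import os
from tempfile import mkstemp

from ase.io import read, write

# mkstemp returns an open file descriptor; wrap it so it is closed after writing.
fd, fname = mkstemp(suffix=".xyz")
with os.fdopen(fd, 'w') as f:
    f.write(xyz)  # xyz is passed in from the earlier block (e.g. via a :var header)

atoms = read(fname)
write('-', atoms, format="png")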

The last point to discuss is discoverability. It would be helpful if we could use a program to "extract" molecular information about the molecules we use in our work. Here is a block that will map over the MOLECULE blocks and summarize what is found with a common format (SMILES again). We generate a table of clickable links to each molecule found in the documents. There is a small appendix in this document containing h2o and caffeine that will show in this table.

(defun mlc-to-smiles (blk)
  "Convert a molecule BLK to smiles format using openbabel."
  (let* ((headers (-flatten
                   (mapcar 'org-babel-parse-header-arguments
                           (org-element-property :header blk))))
         (format (cdr (assoc :format headers)))
         (content (buffer-substring-no-properties
                   (org-element-property :contents-begin blk)
                   (org-element-property :contents-end blk)))
         (tempfile (make-temp-file "obabel-")))
    (with-temp-file tempfile
      (insert content))

    ;; convert to smiles. This outputs a smiles string and the file it was
    ;; generated from. I don't know how to suppress the file, so we use awk to
    ;; just get the SMILES strings. It is not pretty. I know.
    (prog1
        (s-trim (shell-command-to-string
                 (format  "obabel %s %s -osmi 2> /dev/null | awk '{print $1}'"
                          (format "-i%s" format) tempfile)))
      (delete-file tempfile))))


;; Generate the table of molecules
(append '(("Display name" "Name" "format" "SMILES representation"))
        '(hline)
        (org-element-map (org-element-parse-buffer) 'special-block
          (lambda (sb)
            (when (string= "MOLECULE" (org-element-property :type sb))
              (let ((headers (-flatten
                              (mapcar 'org-babel-parse-header-arguments
                                      (org-element-property :header sb)))))

                (list
                 (format "[[molecule:%s][%s]]" (org-element-property :name sb)
                         (cdr (assoc :display-name headers)))
                 (org-element-property :name sb)
                 (cdr (assoc :format headers))
                 (mlc-to-smiles sb)))))))
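| Display name   | Name           | format | SMILES representation        |
|----------------+----------------+--------+------------------------------|
| methane-smiles | methane-smiles | smiles | C                            |
| h2o            | h2o            | cml    | OO                           |
| caffeine       | caffeine       | xyz    | Cn1cnc2n(C)c(=O)n(C)c(=O)c12 |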

That seems pretty discoverable to me. We not only can discover the molecules in this post, but can pretty easily convert them to other formats (SMILES in this case). Since we can run any code we want on them, we could just as well import them to a database, or do subsequent calculations on them.
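The same kind of extraction can be done outside Emacs. The sketch below scans org text with a regular expression. The exact block layout in `SAMPLE` (a `#+name:` line, one `#+header:` line, then `#+BEGIN_MOLECULE`) is an assumption based on the description in this post, and `find_molecules` is a hypothetical name. Header values containing spaces are not handled.

```python
import re

SAMPLE = """\
#+name: methane-smiles
#+header: :format smiles :display-name methane
#+BEGIN_MOLECULE
C
#+END_MOLECULE

Some prose in between.

#+name: ethanol
#+header: :format smiles :display-name ethanol
#+BEGIN_MOLECULE
CCO
#+END_MOLECULE
"""

BLOCK = re.compile(
    r'^#\+name:[ \t]*(?P<name>\S+)[ \t]*\n'
    r'#\+header:[ \t]*(?P<header>[^\n]*)\n'
    r'#\+BEGIN_MOLECULE[ \t]*\n'
    r'(?P<body>.*?)\n?'
    r'#\+END_MOLECULE',
    re.MULTILINE | re.DOTALL | re.IGNORECASE,
)


def parse_header(header):
    """Turn ':format smiles :display-name x' into a dict."""
    words = header.split()
    return dict(zip(words[0::2], words[1::2]))


def find_molecules(text):
    """Return one dict per MOLECULE block found in TEXT."""
    molecules = []
    for match in BLOCK.finditer(text):
        header = parse_header(match.group('header'))
        molecules.append({'name': match.group('name'),
                          'format': header.get(':format'),
                          'display-name': header.get(':display-name'),
                          'data': match.group('body')})
    return molecules


for molecule in find_molecules(SAMPLE):
    print(molecule['name'], molecule['format'], molecule['data'])
```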

The MOLECULE block is not standard, and I have only demonstrated here that it is suitable for this purpose. But, it looks like we could extend it and deal with a variety of formats. We can use headers to add metadata, format, etc… Some features I find missing are similar to those in code blocks where we can type C-c ' to edit them in special modes, and the nice syntax highlighting that often comes with that.

It might be helpful to make the export of MOLECULE blocks nicer looking and more functional. The default export, for example, doesn't put an id attribute in the block. First, we rewrite an org-function to add the id attribute to the exported blocks so our molecule links will work.

(defun org-html-special-block (special-block contents info)
  "Transcode a SPECIAL-BLOCK element from Org to HTML.
CONTENTS holds the contents of the block.  INFO is a plist
holding contextual information."
  (let* ((block-type (downcase
                      (org-element-property :type special-block)))
         (contents (or contents ""))
         (html5-fancy (and (org-html-html5-p info)
                           (plist-get info :html-html5-fancy)
                           (member block-type org-html-html5-elements)))
         (attributes (org-export-read-attribute :attr_html special-block)))
    (unless html5-fancy
      (let ((class (plist-get attributes :class)))
        (setq attributes (plist-put attributes :class
                                    (if class (concat class " " block-type)
                                      block-type)))
        (when (org-element-property :name special-block)
          (setq attributes (plist-put
                            attributes :id
                            (org-element-property :name special-block))))))
    (setq attributes (org-html--make-attribute-string attributes))
    (when (not (equal attributes ""))
      (setq attributes (concat " " attributes)))
    (if html5-fancy
        (format "<%s%s>\n%s</%s>" block-type attributes
                contents block-type)
      (format "<div%s>\n%s\n</div>" attributes contents))))
org-html-special-block

It would be nice to add some additional information around the block, e.g. that it is a molecule, maybe some tooltip about the format, etc…, but we leave that to another day. These should probably be handled specially with a dedicated export function. You will note that MOLECULE blocks don't export too well, they should probably be wrapped in <pre> for HTML export. We will at least make them stand out with this bit of css magic.

#+HTML_HEAD_EXTRA:  <style>.molecule {background-color:LightSkyBlue;}</style>

1 Summary thoughts

This looks pretty promising as a way to embed molecular data into org-files so that the data is reusable and discoverable. If there is metadata that cannot go into the MOLECULE format we can put it in headers instead. This seems like it could be useful.

2 Appendix of molecules

2.1 Water

Here is water in the CML format.

<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
 <atomArray>
  <atom id="a1" elementType="O"/>
  <atom id="a2" elementType="O"/>
 </atomArray>
 <bondArray>
  <bond atomRefs2="a1 a2" order="1"/>
 </bondArray>
</molecule>

2.2 Caffeine

This is a simple xyz format of caffeine.

24

C 1.02887 -0.01688 -0.03460
N 2.46332 0.11699 -0.03522
C 3.33799 -0.94083 -0.03530
N 4.59156 -0.53767 -0.03594
C 4.50847 0.82120 -0.03623
N 5.57252 1.69104 -0.03687
C 6.93040 1.17620 -0.03898
C 5.33446 3.06602 -0.03685
O 6.26078 3.88171 -0.03594
N 3.98960 3.48254 -0.03830
C 3.70813 4.90531 -0.04199
C 2.87287 2.63769 -0.03747
O 1.71502 3.04777 -0.03830
C 3.21603 1.25723 -0.03610
H 0.54478 0.95872 -0.03440
H 0.73663 -0.56946 0.86233
H 0.73584 -0.56959 -0.93118
H 3.00815 -1.97242 -0.03493
H 7.67209 1.97927 -0.03815
H 7.07929 0.56516 -0.93486
H 7.08112 0.56135 0.85404
H 4.61163 5.51902 -0.04152
H 3.11230 5.15092 0.84340
H 3.11643 5.14660 -0.93127

Copyright (C) 2016 by John Kitchin. See the License for information about copying.


A molecule link for org-mode

| categories: chemistry, orgmode, emacs | tags:

Here I am exploring some ideas on compact and functional representations of molecules in org-mode. We will use some functionality from OpenBabel (https://openbabel.org/docs/dev/index.html) for conversion of formats.

One approach we could use is the SMILES representation. OpenBabel provides tools to convert SMILES to a visualization like this. Let's check out an old favorite: caffeine.

obabel -:"Cn1cnc2n(C)c(=O)n(C)c(=O)c12" -osvg

We can imagine the SMILES string is a program, and use an org-mode src block to contain it. It isn't quite a program, as it is more like data, but we can make the block executable if we define how to "execute" the block, and for that we will just have obabel generate the svg representation of the molecule. Here is our execute function. It simply generates the svg to stdout. We can use a :file header to capture it in a file.

(defun org-babel-execute:smiles (body params)
  (shell-command-to-string
   (format "obabel -:\"%s\" -osvg 2> /dev/null" body)))
org-babel-execute:smiles

You can find a smiles block in Appendix of molecules that was adapted from here.

Now, we need a link to refer to our molecule. We want the follow action to jump to our src block which should have a name. We will have it export as the name of the block linked to the molecule definition. This should work fine for definitions in the document. It is not robust to link to molecules in other org-files in the export. That would require those files to be exported too. For now we just define an HTML export.

(defun molecule-jump (name)
  "Jump to the src block called NAME."
  (org-mark-ring-push)
  (org-open-link-from-string (format "[[%s]]" name)))

(defun molecule-export (path desc backend)
  (let ((name (save-window-excursion
                (molecule-jump path)
                (org-element-property :name (org-element-context)))))
    (cond
     ((eq 'html backend)
      (format "<a href=\"#%s\">%s</a>" name name)))))

(org-add-link-type
 "molecule"
 'molecule-jump
 'molecule-export)
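The export side of the link is just a backend dispatch that returns an anchor for HTML. A minimal Python sketch of that behavior follows; `export_molecule_link` is a hypothetical name, and the HTML escaping is an addition that the elisp version does not perform.

```python
import html


def export_molecule_link(name, backend):
    """Return an HTML anchor to the molecule definition, or None."""
    if backend == 'html':
        safe = html.escape(name, quote=True)
        return '<a href="#{0}">{0}</a>'.format(safe)
    return None   # other backends are not handled yet


print(export_molecule_link('ethanol', 'html'))
```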

Now we can link to LSD and ethanol, which allows us to navigate to their definitions. We can also refer to a molecule in another file, like ethanol. The links are clickable, and should jump to the molecule definition. On export to HTML they will be links to the definition.

Our link provides some limited functionality. We can provide more by making the follow action open a menu for example. Instead, we created a major mode here. It provides a function to convert smiles to CML. It is readily extensible to do other conversions.

One of the reasons we want to have molecules as "data" is so we can find them in our papers. Here is an example of that. We defined two molecules in the Appendix, and we find them here.

(org-element-map (org-element-parse-buffer)
    'src-block
  (lambda (src)
    (when (string= "smiles" (org-element-property :language src))
      (org-element-property :name src))))
LSD ethanol

There is still a lot to do to make this really functional. For example, we might want to use the molecules to write reactions. We might also want more advanced conversion or lookup functions, and more export options. It might be desirable to have tooltips on the links to see the molecules too. No doubt one might want to fine-tune the way the blocks run, so that options could be passed as header args. Maybe I will work on that another day.

1 Appendix of molecules

Here is an example smiles block.

CCN(CC)C(=O)[C@H]1CN(C)[C@@H]2Cc3c[nH]c4cccc(C2=C1)c34

CCO

2 smiles major mode

It would be nice to have a language mode to do special edits of SMILES src blocks. This mode does very little beyond providing a function that converts SMILES to CML using obabel and opens it in a buffer. We redirect stderr to /dev/null to avoid seeing the messages from obabel. We also provide another function that opens a browser on a page listing names of the molecule.

(require 'easymenu)

(defun smiles-cml ()
  "Convert the smiles string in the buffer to CML."
  (interactive)
  (let ((smiles (buffer-string)))
    (switch-to-buffer (get-buffer-create "SMILES-CML"))
    (erase-buffer)
    (insert
     (shell-command-to-string
      (format "obabel -:\"%s\" -ocml 2> /dev/null"
              smiles)))
    (goto-char (point-min))
    (xml-mode)))

(defun smiles-names ()
  (interactive)
  (browse-url
   (format "http://cactus.nci.nih.gov/chemical/structure/%s/names"
           (buffer-string))))

(defvar smiles-mode-map
  nil
  "Keymap for smiles-mode.")

;; adapted from http://ergoemacs.org/emacs/elisp_menu_for_major_mode.html
(define-derived-mode smiles-mode fundamental-mode "smiles-mode"
  "Major mode for SMILES code."
  (setq buffer-invisibility-spec '(t)
        mode-name " ☺")

  (when (not smiles-mode-map)
    (setq smiles-mode-map (make-sparse-keymap)))
  (define-key smiles-mode-map (kbd "C-c C-c") 'smiles-cml)
  (define-key smiles-mode-map (kbd "C-c C-n") 'smiles-names)

  (define-key smiles-mode-map [menu-bar] (make-sparse-keymap))

  (let ((menuMap (make-sparse-keymap "SMILES")))
    (define-key smiles-mode-map [menu-bar smiles] (cons "SMILES" menuMap))

    (define-key menuMap [cml]
      '("CML" . smiles-cml))
    (define-key menuMap [names]
      '("Names" . smiles-names))))
smiles-mode
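One detail worth considering for the names lookup is URL escaping, since SMILES strings can contain characters such as `#` and `=` that are special in URLs. Here is a hedged Python sketch of building that URL; `names_url` is a hypothetical name, and the percent-encoding is an addition that is not in the elisp function above.

```python
from urllib.parse import quote


def names_url(smiles):
    """Build the name-lookup URL for a SMILES string, percent-encoded."""
    return 'http://cactus.nci.nih.gov/chemical/structure/{}/names'.format(
        quote(smiles.strip(), safe=''))


print(names_url('CCO'))
print(names_url('C#N'))
```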

Copyright (C) 2016 by John Kitchin. See the License for information about copying.
