## Another approach to embedded molecular data in org-mode


In the last post we examined a molecule link to a src-block defining a molecule in some format. We blurred the distinction between program and data there. Here we re-separate them to try out some different ideas. We will use an org-mode special block to contain the "data", which is a molecular representation in some format. Then, we will use OpenBabel to convert that format to various other formats to explore using the data.

Here is a methane molecule (with 4 implicit hydrogens in the SMILES format). We put it in a named special block in org-mode, and even put a header on it to indicate the format and a display name!

C

We can use the SMILES representation block as input to a new command that converts it to the CML format, with coordinates. We use a simple shell command here and pass the contents of the molecule in as a variable. That is nice because in SMILES methane is represented by a single "C", and this CML is much more verbose.

echo $input | obabel -ismi -o cml --gen3d

<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
 <atomArray>
  <atom id="a1" elementType="C" x3="1.047517" y3="-0.064442" z3="0.060284"/>
  <atom id="a2" elementType="H" x3="2.139937" y3="-0.064341" z3="0.059898"/>
  <atom id="a3" elementType="H" x3="0.683568" y3="-0.799429" z3="-0.661322"/>
  <atom id="a4" elementType="H" x3="0.683566" y3="0.927794" z3="-0.216100"/>
  <atom id="a5" elementType="H" x3="0.683669" y3="-0.321317" z3="1.056822"/>
 </atomArray>
 <bondArray>
  <bond atomRefs2="a1 a2" order="1"/>
  <bond atomRefs2="a1 a3" order="1"/>
  <bond atomRefs2="a1 a4" order="1"/>
  <bond atomRefs2="a1 a5" order="1"/>
 </bondArray>
</molecule>

We can also use the CML output as input to a command that generates an SVG image, again, passing the CML in via a variable in the header.

echo $cml | obabel -icml -o svg
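Since CML is ordinary XML, the converted output can also be inspected with nothing more than a standard XML parser. The following is a minimal Python sketch and not part of the original workflow: the function name `summarize_cml` is made up for illustration, and the string is a copy of the methane CML shown above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the CML output.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Copy of the methane CML produced by obabel in this post.
METHANE_CML = """<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
 <atomArray>
  <atom id="a1" elementType="C" x3="1.047517" y3="-0.064442" z3="0.060284"/>
  <atom id="a2" elementType="H" x3="2.139937" y3="-0.064341" z3="0.059898"/>
  <atom id="a3" elementType="H" x3="0.683568" y3="-0.799429" z3="-0.661322"/>
  <atom id="a4" elementType="H" x3="0.683566" y3="0.927794" z3="-0.216100"/>
  <atom id="a5" elementType="H" x3="0.683669" y3="-0.321317" z3="1.056822"/>
 </atomArray>
 <bondArray>
  <bond atomRefs2="a1 a2" order="1"/>
  <bond atomRefs2="a1 a3" order="1"/>
  <bond atomRefs2="a1 a4" order="1"/>
  <bond atomRefs2="a1 a5" order="1"/>
 </bondArray>
</molecule>"""


def summarize_cml(cml_text):
    """Return element symbols, 3D coordinates and bond count from a CML string."""
    root = ET.fromstring(cml_text)
    atoms = root.findall("cml:atomArray/cml:atom", CML_NS)
    bonds = root.findall("cml:bondArray/cml:bond", CML_NS)
    elements = [atom.get("elementType") for atom in atoms]
    # Coordinates are only present when obabel was run with --gen3d.
    coords = [
        tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
        for atom in atoms
        if atom.get("x3") is not None
    ]
    return {"elements": elements, "coords": coords, "n_bonds": len(bonds)}


summary = summarize_cml(METHANE_CML)
print(summary["elements"], summary["n_bonds"])
```

This only reads the atom and bond arrays; anything else a CML file may carry is ignored.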


With our previous molecule link we can refer to these in our text now as methane-smiles and methane-cml.

So far it all looks good. Let us do something new. We will use the SMILES representation to create an ase.atoms object in Python. First, we create an xyz format that ase can read. Rather than clutter up our document with the output, we silence it.

echo $input | obabel -ismi -o xyz --gen3d

Now, we can use the generated string in a Python script to write a temporary file (or you could have saved the result above to a file and just read it in here). I was too lazy to make the file link to the image myself, so I set up a :file header and just print the result to stdout in this block. Although all we do here is create a new image, this demonstrates you can use data from a MOLECULE block and pass it into a Python script where other kinds of calculations might occur.

from ase.io import read, write
from tempfile import mkstemp

fd, fname = mkstemp(suffix=".xyz")
with open(fname, 'w') as f:
    f.write(xyz)

atoms = read(fname)
write('-', atoms, format="png")

The last point to discuss is discoverability. It would be helpful if we could use a program to "extract" molecular information about the molecules we use in our work. Here is a block that will map over the MOLECULE blocks and summarize what is found with a common format (SMILES again). We generate a table of clickable links to each molecule found in the document. There is a small appendix in this document containing h2o and caffeine that will show in this table.

(defun mlc-to-smiles (blk)
  "Convert a molecule BLK to smiles format using openbabel."
  (let* ((headers (-flatten
                   (mapcar 'org-babel-parse-header-arguments
                           (org-element-property :header blk))))
         (format (cdr (assoc :format headers)))
         (content (buffer-substring-no-properties
                   (org-element-property :contents-begin blk)
                   (org-element-property :contents-end blk)))
         (tempfile (make-temp-file "obabel-")))
    (with-temp-file tempfile
      (insert content))

    ;; convert to smiles. This outputs a smiles string and the file it was
    ;; generated from. I don't know how to suppress the file, so we use awk to
    ;; just get the SMILES strings. It is not pretty. I know.
    (prog1
        (s-trim (shell-command-to-string
                 (format "obabel %s %s -osmi 2> /dev/null | awk '{print $1}'"
                         (format "-i%s" format) tempfile)))
      (delete-file tempfile))))

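;; Generate the table of molecules
(append '(("Display name" "Name" "format" "SMILES representation"))
        '(hline)
        (org-element-map (org-element-parse-buffer) 'special-block
          (lambda (sb)
            (when (string= "MOLECULE" (org-element-property :type sb))
              (let ((headers (-flatten
                              (mapcar 'org-babel-parse-header-arguments
                                      (org-element-property :header sb)))))
                (list
                 ;; clickable link to the block, labeled with its name
                 (format "[[molecule:%s][%s]]"
                         (org-element-property :name sb)
                         (org-element-property :name sb))
                 (org-element-property :name sb)
                 (cdr (assoc :format headers))
                 (mlc-to-smiles sb)))))))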

| Display name | Name | format | SMILES representation |
|---|---|---|---|
| methane-smiles | methane-smiles | smiles | C |
| h2o | h2o | cml | OO |
| caffeine | caffeine | xyz | Cn1cnc2n(C)c(=O)n(C)c(=O)c12 |

That seems pretty discoverable to me. Not only can we discover the molecules in this post, we can also convert them fairly easily to other formats (SMILES in this case). Since we can run any code we want on them, we could just as well import them into a database or do subsequent calculations on them.
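The same kind of discovery can be approximated outside Emacs. Here is a rough Python sketch that scans org text for named MOLECULE blocks with a regular expression. It assumes each block is immediately preceded by a `#+name:` line and an optional single `#+header:` line carrying `:format`; the sample text and the function name `find_molecules` are hypothetical, and a regex is only a stand-in for a real org parser.

```python
import re

# Hypothetical org text with two MOLECULE special blocks.
SAMPLE_ORG = """\
Some prose.

#+name: methane-smiles
#+header: :format smiles
#+BEGIN_MOLECULE
C
#+END_MOLECULE

More prose.

#+name: ethanol
#+header: :format smiles
#+BEGIN_MOLECULE
CCO
#+END_MOLECULE
"""

# Assumed layout: name line, optional header line, then the block itself.
BLOCK_RE = re.compile(
    r"^#\+name:[ \t]*(?P<name>\S+)[ \t]*\n"
    r"(?:#\+header:[ \t]*(?P<header>[^\n]*)\n)?"
    r"#\+begin_molecule[ \t]*\n"
    r"(?P<body>.*?)"
    r"^#\+end_molecule",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def find_molecules(org_text):
    """Return a list of dicts with the name, format and raw data of each block."""
    found = []
    for match in BLOCK_RE.finditer(org_text):
        # Turn ":key value" pairs from the header line into a dict.
        header = dict(re.findall(r":(\S+)\s+(\S+)", match.group("header") or ""))
        found.append(
            {
                "name": match.group("name"),
                "format": header.get("format"),
                "data": match.group("body").strip(),
            }
        )
    return found


for molecule in find_molecules(SAMPLE_ORG):
    print(molecule["name"], molecule["format"], molecule["data"])
```

The resulting dictionaries could then be handed to a converter or written to a database, in the spirit of the table above.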

The MOLECULE block is not standard, and I have only demonstrated here that it is suitable for this purpose. But it looks like we could extend it and deal with a variety of formats. We can use headers to add metadata, format, etc. Two features I miss from code blocks are the ability to type C-c ' to edit the contents in a special mode, and the nice syntax highlighting that often comes with that.

It might be helpful to make the export of MOLECULE blocks nicer looking and more functional. The default export, for example, doesn't put an id attribute on the block. First, we redefine the Org function org-html-special-block to add the id attribute to the exported blocks so our molecule links will work.

(defun org-html-special-block (special-block contents info)
  "Transcode a SPECIAL-BLOCK element from Org to HTML.
CONTENTS holds the contents of the block.  INFO is a plist
holding contextual information."
  (let* ((block-type (downcase
                      (org-element-property :type special-block)))
         (contents (or contents ""))
         (html5-fancy (and (org-html-html5-p info)
                           (plist-get info :html-html5-fancy)
                           (member block-type org-html-html5-elements)))
         (attributes (org-export-read-attribute :attr_html special-block)))
    (unless html5-fancy
      (let ((class (plist-get attributes :class)))
        (setq attributes (plist-put attributes :class
                                    (if class (concat class " " block-type)
                                      block-type)))
        ;; use the block name, when there is one, as the id attribute
        (when (org-element-property :name special-block)
          (setq attributes (plist-put
                            attributes :id
                            (org-element-property :name special-block))))))
    (setq attributes (org-html--make-attribute-string attributes))
    (when (not (equal attributes ""))
      (setq attributes (concat " " attributes)))
    (if html5-fancy
        (format "<%s%s>\n%s</%s>" block-type attributes
                contents block-type)
      (format "<div%s>\n%s\n</div>" attributes contents))))

org-html-special-block


It would be nice to add some additional information around the block, e.g. that it is a molecule, maybe a tooltip about the format, etc., but we leave that to another day. These should probably be handled with a dedicated export function. You will note that MOLECULE blocks don't export too well; they should probably be wrapped in <pre> for HTML export. We will at least make them stand out with this bit of CSS magic.

#+HTML_HEAD_EXTRA:  <style>.molecule {background-color:LightSkyBlue;}</style>
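To make the wished-for markup concrete, here is a small Python sketch of what a dedicated export might emit for one block: a div carrying the `molecule` class and the block name as id, a `title` attribute as a simple tooltip for the format, and the data wrapped in <pre>. This is only an illustration of the target HTML, not an Org export function, and the name `molecule_html` is hypothetical.

```python
from html import escape


def molecule_html(name, fmt, data):
    """Return an HTML snippet for one molecule block."""
    return (
        '<div class="molecule" id="{id}" title="format: {fmt}">\n'
        "<pre>{body}</pre>\n"
        "</div>"
    ).format(
        id=escape(name, quote=True),    # the block name becomes the link target
        fmt=escape(fmt, quote=True),    # shown by browsers as a tooltip
        body=escape(data),              # keeps XML-based formats such as CML visible
    )


print(molecule_html("methane-smiles", "smiles", "C"))
```

Escaping the body matters for formats like CML, which would otherwise be swallowed by the browser as unknown tags.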


## 1 Summary thoughts

This looks pretty promising as a way to embed molecular data into org-files so that the data is reusable and discoverable. If there is metadata that cannot go into the MOLECULE format we can put it in headers instead. This seems like it could be useful.

## 2 Appendix of molecules

### 2.1 Water

Here is water in the CML format.

<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
 <atomArray>
  <atom id="a1" elementType="O"/>
  <atom id="a2" elementType="O"/>
 </atomArray>
 <bondArray>
  <bond atomRefs2="a1 a2" order="1"/>
 </bondArray>
</molecule>

### 2.2 Caffeine

This is a simple xyz format of caffeine.

24

C 1.02887 -0.01688 -0.03460
N 2.46332 0.11699 -0.03522
C 3.33799 -0.94083 -0.03530
N 4.59156 -0.53767 -0.03594
C 4.50847 0.82120 -0.03623
N 5.57252 1.69104 -0.03687
C 6.93040 1.17620 -0.03898
C 5.33446 3.06602 -0.03685
O 6.26078 3.88171 -0.03594
N 3.98960 3.48254 -0.03830
C 3.70813 4.90531 -0.04199
C 2.87287 2.63769 -0.03747
O 1.71502 3.04777 -0.03830
C 3.21603 1.25723 -0.03610
H 0.54478 0.95872 -0.03440
H 0.73663 -0.56946 0.86233
H 0.73584 -0.56959 -0.93118
H 3.00815 -1.97242 -0.03493
H 7.67209 1.97927 -0.03815
H 7.07929 0.56516 -0.93486
H 7.08112 0.56135 0.85404
H 4.61163 5.51902 -0.04152
H 3.11230 5.15092 0.84340
H 3.11643 5.14660 -0.93127
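The xyz layout used here is simple enough to read without extra libraries: an atom count, a comment line, then one line per atom with the element symbol and Cartesian coordinates. Below is a minimal Python sketch of such a reader, offered as a stand-in for what ase does when it reads the file. The function name `parse_xyz` is hypothetical, and the sample reuses the methane coordinates from the CML output earlier in the post.

```python
import math

# Methane in xyz form; coordinates copied from the CML output in this post.
METHANE_XYZ = """\
5
methane
C 1.047517 -0.064442 0.060284
H 2.139937 -0.064341 0.059898
H 0.683568 -0.799429 -0.661322
H 0.683566 0.927794 -0.216100
H 0.683669 -0.321317 1.056822
"""


def parse_xyz(text):
    """Parse xyz text into (comment, [(symbol, x, y, z), ...])."""
    lines = text.splitlines()
    n_atoms = int(lines[0].split()[0])           # first line: number of atoms
    comment = lines[1] if len(lines) > 1 else ""  # second line: free-form comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("expected %d atoms, found %d" % (n_atoms, len(atoms)))
    return comment, atoms


comment, atoms = parse_xyz(METHANE_XYZ)
# Distance between the carbon and the first hydrogen, in the file's units.
ch_distance = math.dist(atoms[0][1:], atoms[1][1:])
print(comment, len(atoms), round(ch_distance, 3))
```

Once the atoms are in plain tuples like this, simple geometric checks such as bond lengths need no chemistry toolkit at all.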


Org-mode version = 8.2.10

## A molecule link for org-mode


Here I am exploring some ideas on compact and functional representations of molecules in org-mode. We will use some functionality from OpenBabel (https://openbabel.org/docs/dev/index.html) for conversion of formats.

One approach we could use is the SMILES representation. OpenBabel provides tools to convert SMILES to a visualization like this. Let's check out an old favorite: caffeine.

obabel -:"Cn1cnc2n(C)c(=O)n(C)c(=O)c12" -osvg


We can imagine the SMILES string is a program, and use an org-mode src block to contain it. It isn't quite a program, as it is more like data, but we can make the block executable if we define how to "execute" the block, and for that we will just have obabel generate the svg representation of the molecule. Here is our execute function. It simply generates the svg to stdout. We can use a :file header to capture it in a file.

(defun org-babel-execute:smiles (body params)
  (shell-command-to-string
   (format "obabel -:\"%s\" -osvg 2> /dev/null" body)))

org-babel-execute:smiles
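The execute function above splices the block body into a shell string, so a stray newline or quote character in the body ends up in the command line. As a hedged illustration of the same call made without a shell, here is a short Python sketch; the helper names are hypothetical, and the only obabel options used are the `-:` and `-osvg` forms that already appear in this post.

```python
import shutil
import subprocess


def smiles_to_svg_command(smiles):
    """Build the argument list equivalent to: obabel -:"SMILES" -osvg"""
    # The SMILES string travels as a single argument, so no shell quoting is needed.
    return ["obabel", "-:" + smiles.strip(), "-osvg"]


def smiles_to_svg(smiles):
    """Return the SVG text from obabel, or None when obabel is not installed."""
    if shutil.which("obabel") is None:
        return None
    result = subprocess.run(
        smiles_to_svg_command(smiles),
        capture_output=True,  # stderr is captured, much like 2> /dev/null
        text=True,
        check=True,
    )
    return result.stdout


print(smiles_to_svg_command("Cn1cnc2n(C)c(=O)n(C)c(=O)c12\n"))
```

Stripping whitespace matters because the body of a src block usually arrives with a trailing newline.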


You can find a smiles block, adapted from here, in the Appendix of molecules.

Now, we need a link to refer to our molecule. We want the follow action to jump to our src block, which should have a name. On export, the link is rendered as the name of the block, linked to the molecule definition. This should work fine for definitions in the same document. Links to molecules in other org-files are not robust in the export, since that would require those files to be exported too. For now we just define an HTML export.

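(defun molecule-jump (name)
  (org-mark-ring-push)
  ;; Assumption: the jump itself is missing from this listing; following an
  ;; internal link to the block called NAME is one way to do it.
  (org-open-link-from-string (format "[[%s]]" name)))

(defun molecule-export (path desc backend)
  (let ((name (save-window-excursion
                (molecule-jump path)
                (org-element-property :name (org-element-context)))))
    (cond
     ((eq 'html backend)
      (format "<a href=\"#%s\">%s</a>" name name)))))

(org-add-link-type
 "molecule"
 'molecule-jump
 'molecule-export)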


Now we can link to LSD and ethanol, which lets us navigate to their definitions. We can also refer to a molecule in another file, like ethanol. The links are clickable and should jump to the molecule definition. On export to HTML they become links to the definition.

Our link provides some limited functionality. We could provide more, for example by making the follow action open a menu. Instead, we created a major mode, described in the smiles major mode section below. It provides a function to convert SMILES to CML, and it is readily extensible to other conversions.

One of the reasons we want to have molecules as "data" is so we can find them in our papers. Here is an example of that. We defined two molecules in the Appendix, and we find them here.

(org-element-map (org-element-parse-buffer)
    'src-block
  (lambda (src)
    (when (string= "smiles" (org-element-property :language src))
      (org-element-property :name src))))

 LSD ethanol

There is still a lot to do to make this really functional. For example, we might want to use the molecules to write reactions. We might also want more advanced conversion or lookup functions, and more export options. It might be desirable to have tooltips on the links to see the molecules too. No doubt one might want to fine-tune the way the blocks run, so that options could be passed as header args. Maybe I will work on that another day.

## 1 Appendix of molecules

Here is an example smiles block.

CCN(CC)C(=O)[C@H]1CN(C)[C@@H]2Cc3c[nH]c4cccc(C2=C1)c34

CCO


## 2 smiles major mode

It would be nice to have a language mode to do special edits of SMILES src blocks. This mode does very little: it provides a function that converts SMILES to CML using obabel and opens the result in a buffer. We redirect stderr to /dev/null to avoid seeing the messages from obabel. We also provide another function that opens a browser page listing names of the molecule.

(require 'easymenu)

(defun smiles-cml ()
  "Convert the smiles string in the buffer to CML."
  (interactive)
  (let ((smiles (buffer-string)))
    (switch-to-buffer (get-buffer-create "SMILES-CML"))
    (erase-buffer)
    (insert
     (shell-command-to-string
      (format "obabel -:\"%s\" -ocml 2> /dev/null"
              smiles)))
    (goto-char (point-min))
    (xml-mode)))

(defun smiles-names ()
  "Open a browser page listing names for the smiles string in the buffer."
  (interactive)
  (browse-url
   (format "http://cactus.nci.nih.gov/chemical/structure/%s/names"
           (buffer-string))))

(defvar smiles-mode-map
  nil
  "Keymap for smiles-mode.")

(define-derived-mode smiles-mode fundamental-mode "smiles-mode"
  "Major mode for SMILES code."
  (setq buffer-invisibility-spec '(t)
        mode-name " ☺")

  (when (not smiles-mode-map)
    (setq smiles-mode-map (make-sparse-keymap)))
  (define-key smiles-mode-map (kbd "C-c C-c") 'smiles-cml)
  (define-key smiles-mode-map (kbd "C-c C-n") 'smiles-names)

  ;; Assumption: the menu setup is incomplete in this listing; a menu-bar
  ;; entry for the surviving "CML" item is sketched here.
  (define-key smiles-mode-map [menu-bar smiles]
    (cons "SMILES" (make-sparse-keymap "SMILES")))
  (define-key smiles-mode-map [menu-bar smiles cml]
    '("CML" . smiles-cml)))

smiles-mode
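One detail worth noting about the names lookup: SMILES strings can contain characters such as `#` (triple bond), `[`, `]` and `@`, which have their own meaning inside a URL. In particular, everything after a raw `#` is treated as a fragment and never reaches the server. The Python sketch below shows one way to percent-encode the string before inserting it into the URL used by `smiles-names`. The helper name `names_url` is hypothetical, and whether the service expects exactly this encoding is an assumption.

```python
from urllib.parse import quote

# URL pattern taken from the smiles-names function above.
CACTUS_TEMPLATE = "http://cactus.nci.nih.gov/chemical/structure/{}/names"


def names_url(smiles):
    """Return the lookup URL with the SMILES string percent-encoded."""
    # safe="" also encodes "/" so the string stays a single path segment.
    return CACTUS_TEMPLATE.format(quote(smiles.strip(), safe=""))


print(names_url("CCO"))
print(names_url("C#N"))
```

Plain strings such as ethanol pass through unchanged, so the encoding step only affects the molecules that would otherwise break the URL.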