New publication in Surface Science on data sharing

| categories: publication, news | tags:

In this perspective we illustrate how we use org-mode to prepare manuscripts and supporting information files that are rich in data, and that make it easy to share the code we use for our analysis. We use the supporting information file from boes-2015-core-cu to show examples of how to extract the data and reuse it in new analyses. This approach works for both computational and experimental data. You can see the manuscript I submitted here: ss-manuscript-2015-05-07.zip, and the org file that generated it here: ss-manuscript.org. The references from the manuscript are contained here: ss-manuscript.bib.

http://www.sciencedirect.com/science/article/pii/S0039602815001326

@article{kitchin-2015-data-surfac-scien,
  author =       "John R. Kitchin",
  title =        {Data Sharing in Surface Science},
  journal =      "Surface Science ",
  number =       0,
  pages =        " - ",
  year =         2015,
  doi =          {10.1016/j.susc.2015.05.007},
  url =
                  "http://www.sciencedirect.com/science/article/pii/S0039602815001326",
  issn =         "0039-6028",
  keywords =     "Data sharing ",
}


Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10


Python data structures to lisp

| categories: lisp, emacs, python | tags:

I have an idea in mind that would use the output of python scripts in lisp functions. Xah Lee posted an idea for writing emacs commands in scripting languages. In this post I want to explore an extension of the idea, where a Python script returns output that can be read in Lisp, e.g. we can convert a Python list to a lisp list, or a dictionary to an a-list or p-list. I can already see that simple data structures will be "simple", and arbitrary data structures will offer a lot of challenges, e.g. nested lists or dictionaries…

If I could add some custom functions to the basic builtin types in Python, then I could use another approach to format python objects as lisp data types. This isn't recommended by Pythonistas, but I guess they don't want to use lisp as much as I do ;) I found this approach to modifying builtins:

http://stackoverflow.com/questions/2444680/how-do-i-add-my-own-custom-attributes-to-existing-built-in-python-types-like-a

I use that almost verbatim here to get what I want. This is a super low level way to add functions to the builtins. I add some simple formatting to floats, ints and strings, and a more complex recursive formatting function to lists, tuples and dictionaries. A dictionary can be represented as an alist or plist. Both examples are shown, but I leave the alist version commented out. Finally, I add a lispify function to numpy arrays.

import ctypes as c

class PyObject_HEAD(c.Structure):
    _fields_ = [('HEAD', c.c_ubyte * (object.__basicsize__ -
                                      c.sizeof(c.c_void_p))),
                ('ob_type', c.c_void_p)]

_get_dict = c.pythonapi._PyObject_GetDictPtr
_get_dict.restype = c.POINTER(c.py_object)
_get_dict.argtypes = [c.py_object]

def get_dict(object):
    return _get_dict(object).contents.value

get_dict(str)['lisp'] = lambda s:'"{}"'.format(str(s))
get_dict(float)['lisp'] = lambda f:'{}'.format(str(f))
get_dict(int)['lisp'] = lambda f:'{}'.format(str(f))

import collections
import numpy as np

def lispify(L):
    "Convert a Python object L to a lisp representation."
    if (isinstance(L, str)
        or isinstance(L, float)
        or isinstance(L, int)):
        return L.lisp()
    elif (isinstance(L, list)
          or isinstance(L, tuple)
          or isinstance(L, np.ndarray)):
        s = []
        for element in L:
            s += [element.lisp()]
        return '(' + ' '.join(s) + ')'
    elif isinstance(L, dict):
        s = []
        for key in L:
            # alist format
            # s += ["({0} . {1})".format(key, L[key].lisp())]
            # plist
            s += [":{0} {1}".format(key, L[key].lisp())]
        return '(' + ' '.join(s) + ')'

get_dict(list)['lisp'] = lispify
get_dict(tuple)['lisp'] = lispify
get_dict(dict)['lisp'] = lispify
get_dict(np.ndarray)['lisp'] = lispify

Let us test these out.

from pylisp import *
a = 4.5
print int(a).lisp()
print a.lisp()
print "test".lisp()

print [1, 2, 3].lisp()
print (1, 2, 3).lisp()

print [[1, 3], (5, 6)].lisp()

print {"a": 5}.lisp()
print [[1, 3], (5, 6), {"a": 5, "b": "test"}].lisp()


A = np.array([1, 3, 4])
print A.lisp()
print ({"tree": [5, 6]}, ["a", 4, "list"], 5, 2.0 / 3.0).lisp()
4
4.5
"test"
(1 2 3)
(1 2 3)
((1 3) (5 6))
(:a 5)
((1 3) (5 6) (:a 5 :b "test"))
(1 3 4)
((:tree (5 6)) ("a" 4 "list") 5 0.666666666667)

Now, is that better than a single lisp function with a lot of conditionals to handle each type? I am not sure. This seems to work pretty well.
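For comparison, here is a minimal sketch of the single-function alternative, written for Python 3 rather than the Python 2 used above. The name to_lisp, the mapping of booleans to t and nil, and the string escaping are my assumptions for illustration; none of them come from the pylisp code in this post.

```python
def to_lisp(obj):
    """Return a string holding a Lisp-readable form of obj."""
    # bool is a subclass of int, so it has to be tested first.
    # Mapping True/False to t/nil is an assumption, not part of pylisp.
    if isinstance(obj, bool):
        return "t" if obj else "nil"
    if isinstance(obj, (int, float)):
        return repr(obj)
    if isinstance(obj, str):
        # Escape backslashes and double quotes so the Lisp reader
        # sees a single string.
        return '"' + obj.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(obj, (list, tuple)):
        return "(" + " ".join(to_lisp(x) for x in obj) + ")"
    if isinstance(obj, dict):
        # plist form, as in the post: (:key value ...)
        pairs = (":{} {}".format(k, to_lisp(v)) for k, v in obj.items())
        return "(" + " ".join(pairs) + ")"
    raise TypeError("no Lisp form for {}".format(type(obj).__name__))


print(to_lisp([[1, 3], (5, 6), {"a": 5, "b": "test"}]))
# ((1 3) (5 6) (:a 5 :b "test"))
```

The conditionals keep everything in one place and leave the builtins untouched, at the cost of having to edit this function for every new type. NumPy arrays are left out here only to keep the sketch free of dependencies.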

Here is how I imagine using this idea. We would have some emacs-lisp variables and use them to dynamically generate a python script. We run the python script, capture the output, and read it back in as a lisp data structure. Here is a simple example that generates a dictionary.

(let* ((elisp-var 6)
       (result)
       (script (format "
from pylisp import *
print {x: [2*y for y in range(x)] for x in range(1, %s)}.lisp()
" elisp-var)))

  ;; start a python process
  (run-python)
  (setq result (read (python-shell-send-string-no-output
   script)))
  (plist-get result :5))
(0 2 4 6 8)

That seems to work pretty well. One alternative to this is Pymacs, which I have written about before. That project isn't currently under active development, and I ran into some difficulties with it before.

Here we can solve the problem I previously posed, get the result back as an elisp float, and then reuse the result:

(let* ((myvar 3)
       (result)
       (script (format "from pylisp import *
from scipy.optimize import fsolve
def objective(x):
    return x - 5

ans, = fsolve(objective, %s)
print ans.lisp()" myvar)))
  (run-python)
  (setq result (read (python-shell-send-string-no-output
                       script)))
  (- 5 result))
0.0

Bottom line: we can write python code in lisp functions that are dynamically updated, execute them, and get lisp data structures back for simple data types. I think that could be useful in some applications, where it is easier to do parsing/analysis in Python, but you want to do something else that is easier in Lisp.
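The generate, run, and capture loop can also be tried outside Emacs. The following is only a sketch of that loop in Python 3, under my own assumptions: the names SCRIPT and run_generated_script are made up for illustration, the script is run in a fresh interpreter with subprocess instead of an inferior Python shell, and the plist is formatted by hand so the sketch does not depend on pylisp.

```python
import subprocess
import sys
from string import Template

# Hypothetical script template; $upper plays the role of elisp-var above.
SCRIPT = Template('''
data = {x: [2 * y for y in range(x)] for x in range(1, $upper)}
parts = []
for key, values in data.items():
    parts.append(":{} ({})".format(key, " ".join(str(v) for v in values)))
print("(" + " ".join(parts) + ")")
''')


def run_generated_script(upper):
    """Fill in the template, run it in a new interpreter, return its output."""
    source = SCRIPT.substitute(upper=upper)
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()


print(run_generated_script(3))  # (:1 (0) :2 (0 2))
```

The captured string is the part that Emacs would pass to read to get a plist back.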


Another approach to embedding org-source in html

| categories: data, orgmode | tags:

In an earlier post I examined a way to embed the org-source in a comment in the html of the post, and developed a reasonably convenient way to extract the source in emacs. One downside of that approach was the need to escape at least the dashes, and then unescape them on extraction. I came across another idea, which is to put the org-source in base64 encoded form in a data uri.

First let us see what the encoding means:

(base64-encode-string "<!-- test-->")
PCEtLSB0ZXN0LS0+

And decoding:

(base64-decode-string "PCEtLSB0ZXN0LS0+")
<!-- test-->

The encoding looks random, but it is reversible. More importantly, it should not contain any HTML-like characters that need to be escaped. The idea of a data uri is that the data it serves is embedded in the href attribute of the link. This is basically how to make a data uri. We give the link a class so we can find it later.

<a class="some-org-source" href="data:text/plain;charset=US-ASCII;base64,PCEtLSB0ZXN0LS0+">source</a>

Here is the actual html for the browser. If you click on it, your browser automatically decodes it for you!

source

So, during the blog publish step, we just need to add this little step to the html generation, and the source will be included as a data uri. Here is the function that generates the data uri for us, and an example of using it. The encoded source is not at all attractive to look at, but you almost never need to look at it, since it is invisible in the browser. Interestingly, if you click on the link, you will see the org source right in your browser!

(defun source-data-uri (source)
  "Encode the string in SOURCE to a data uri."
  (format
   "<a class=\"org-source\" href=\"data:text/plain;charset=US-ASCII;base64,%s\">source</a>"
   (base64-encode-string source)))

(source-data-uri (buffer-string))
source
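To check the claim that base64 output contains nothing that needs HTML escaping, and to see the same data uri built outside Emacs, here is a small Python 3 sketch. The function mirrors source-data-uri above; the css_class parameter and the choice to reject non-ASCII text are my assumptions.

```python
import base64


def source_data_uri(source, css_class="org-source"):
    """Return an <a> tag whose href is a base64 data uri holding source."""
    # encode("ascii") matches the charset=US-ASCII label; non-ASCII text
    # raises UnicodeEncodeError here instead of being mislabeled.
    encoded = base64.b64encode(source.encode("ascii")).decode("ascii")
    return ('<a class="{}" href="data:text/plain;charset=US-ASCII;base64,{}">'
            'source</a>').format(css_class, encoded)


# Encode every possible byte value and collect the characters that appear.
alphabet = set(base64.b64encode(bytes(range(256))).decode("ascii"))
html_special = set('<>&"\'')
print(alphabet & html_special)  # set()

print(source_data_uri("<!-- test-->", "some-org-source"))
```

The empty intersection shows that none of the characters base64 emits are among the ones that need escaping inside an attribute value.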

Now, we integrate it into the blogofile function:

(defun bf-get-post-html ()
  "Return a string containing the YAML header, the post html, my
copyright line, and a link to the org-source code."
  (interactive)
  (let ((org-source (buffer-string))
        (url-to-org (bf-get-url-to-org-source))
        (yaml (bf-get-YAML-heading))
        (body (bf-get-HTML)))

    (with-temp-buffer
      (insert yaml)
      (insert body)
      (insert
       (format "<p>Copyright (C) %s by John Kitchin. See the <a href=\"/copying.html\">License</a> for information about copying.</p>"
               (format-time-string "%Y")))
      (insert (format "<p><a href=\"%s\">org-mode source</a></p>"
                      url-to-org))
      (insert (format "<p>Org-mode version = %s</p>" (org-version)))
      ;; this is the only new code we need to add.
      (insert (source-data-uri org-source))
      ;; return value
      (buffer-string))))

Now we need a new adaptation of the grab-org-source function. We still need a regexp search to get the source, and we still need to decode it.

(defun grab-org-source (url)
  "Extract org-source from URL to a buffer named *grab-org-source*."
  (interactive "sURL: ")
  (switch-to-buffer (get-buffer-create "*grab-org-source*"))
  (erase-buffer)
  (org-mode)
  (insert
   (with-current-buffer
       (url-retrieve-synchronously url)
     (let (start)
       (re-search-forward
        "<a class=\"org-source\" href=\"data:text/plain;charset=US-ASCII;base64,\\([^\"]*\\)\\\">" nil t)
       (base64-decode-string  (match-string 1))))))
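The extraction step is the same idea in any language: search for the link, then decode the captured group. Here is a hedged Python 3 sketch that works on an html string that has already been downloaded. The name grab_org_source is borrowed from the function above, but returning None when no link is found is my own choice, not what the elisp does.

```python
import base64
import re

# Same pattern as the elisp regexp above: capture everything up to the
# closing quote of the href.
PATTERN = re.compile(
    r'<a class="org-source" '
    r'href="data:text/plain;charset=US-ASCII;base64,([^"]*)">'
)


def grab_org_source(html):
    """Return the decoded org source embedded in html, or None if absent."""
    match = PATTERN.search(html)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode("ascii")


page = ('<p>post body</p><a class="org-source" '
        'href="data:text/plain;charset=US-ASCII;base64,PCEtLSB0ZXN0LS0+">'
        'source</a>')
print(grab_org_source(page))  # <!-- test-->
```

Checking the match before decoding avoids decoding stale match data when a page has no embedded source.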

What else could we do with this? One idea would be to generate data uris for each code block that you could open in your browser. For example, here we generate a list of data uris for each code block in the buffer. We don't take care to label them or make it easy to see what they are, but if you click on one, you should see a plain text version of the block. If this is done a lot, it might even make sense to change the mime type to download the code in some native app.

(org-element-map (org-element-parse-buffer) 'src-block
  (lambda (src-block)
    (source-data-uri (org-element-property :value src-block))))
(source source source source source source)
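For readers without org-element at hand, the per-block idea can be approximated with a regular expression. This Python 3 sketch is only an approximation under my assumptions: it treats anything between a #+BEGIN_SRC line and the next #+END_SRC line as a block, which is cruder than what org-element-parse-buffer does, and the name src_block_data_uris is made up for illustration.

```python
import base64
import re

# Rough stand-in for org-element: a header line, a lazily matched body,
# then the closing line. Case is ignored since org accepts both.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+BEGIN_SRC[^\n]*\n(.*?)^[ \t]*#\+END_SRC",
    re.MULTILINE | re.DOTALL | re.IGNORECASE,
)


def src_block_data_uris(org_text):
    """Return one plain-text data uri per source block in org_text."""
    uris = []
    for body in SRC_BLOCK.findall(org_text):
        encoded = base64.b64encode(body.encode("ascii")).decode("ascii")
        uris.append("data:text/plain;charset=US-ASCII;base64," + encoded)
    return uris


org = ("Intro\n#+BEGIN_SRC python\nprint(1)\n#+END_SRC\n"
       "text\n#+begin_src emacs-lisp\n(+ 1 2)\n#+end_src\n")
uris = src_block_data_uris(org)
print(len(uris))  # 2
```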

I am not sure if this is better or worse than the other approach. I have not tested it very thoroughly, but it seems like it should work pretty generally. I imagine you could also embed other kinds of files in the html, if for some reason you did not want to put the files on your server. Overall this seems to lack some elegance in searching for data, e.g. the kind of search that RDF or RDFa is supposed to enable, but it might be a step in that direction, using org-mode and Emacs as the editor.


An alternative approach to including org-source in blog posts

| categories: orgmode | tags:

When you publish a Matlab m-file to HTML, Matlab includes the m-file source as an html comment in the output. They also provide a nice function called grabcode that will take a url, and open the source code in the editor. Today, we try a similar approach for org-mode.

This post is not totally self-contained. I have my own emacs-lisp module that converts org-mode to blogofile posts, and so far I have not made it broadly available. This is also a super exploratory idea, so I am just going to show the changes I need to make to my setup to get to the evaluation of the idea.

The idea is pretty simple: we just insert the current buffer string into an HTML comment. I just modify the bf-get-post-html function lightly to do that. This is a somewhat pathological example since there are html comments in the post! So, we will encode all the dashes to get around that.

(require 'browse-url)
(defun bf-get-post-html ()
  "Return a string containing the YAML header, the post html, my
copyright line, and a link to the org-source code."
  (interactive)
  (let ((org-source (buffer-string))
        (url-to-org (bf-get-url-to-org-source))
        (yaml (bf-get-YAML-heading))
        (body (bf-get-HTML)))

    (with-temp-buffer
      (insert yaml)
      (insert body)
      (insert
       (format "<p>Copyright (C) %s by John Kitchin. See the <a href=\"/copying.html\">License</a> for information about copying.</p>"
               (format-time-string "%Y")))
      (insert (format "<p><a href=\"%s\">org-mode source</a></p>"
                      url-to-org))
      (insert (format "<p>Org-mode version = %s</p>" (org-version)))
      ;; this is the only new code we need to add.
      (insert (format "
<!--
  ##### SOURCE BEGIN #####
%s
##### SOURCE END #####
-->" (browse-url-url-encode-chars org-source "[-]")))
      ;; return value
      (buffer-string))))

By itself, that has limited value to me. So, let's also create a grab-org-source function to get the embedded source and open it in a buffer. This might be a naive approach: we just use a regexp to find the source boundaries and open the source in a new buffer. We have to unescape the dashes, which appear as %2D in the comments. Here is our function.

(defun grab-org-source (url)
  "Extract org-source from URL to a buffer named *grab-org-source*."
  (interactive "sURL: ")
  (switch-to-buffer (get-buffer-create "*grab-org-source*"))
  (erase-buffer)
  (org-mode)
  (insert
   (with-current-buffer
       (url-retrieve-synchronously url)
     (let (start)
       (re-search-forward
        "
<!--
  ##### SOURCE BEGIN #####
" nil t)
       (setq start (point))
       (re-search-forward "##### SOURCE END #####
-->" nil t)
       (buffer-substring start (match-beginning 0)))))
  (goto-char (point-min))
  (while (search-forward "%2D" nil t)
    (replace-match "-"))
  (goto-char (point-min)))
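The escape and unescape steps are easy to get subtly wrong, so here is a round-trip sketch in Python 3. The marker strings follow the format used above; the function names are hypothetical, and escaping the percent sign as %25 is an addition of mine (the elisp above only escapes dashes), included so that a literal %2D in the source survives the trip.

```python
# Markers copied from the format string used in bf-get-post-html.
BEGIN = "\n<!--\n  ##### SOURCE BEGIN #####\n"
END = "\n##### SOURCE END #####\n-->"


def embed_source(source):
    """Wrap source in an html comment with every dash escaped."""
    # Escape "%" first (an assumption beyond the post) so that a literal
    # "%2D" in the source is not mistaken for an escaped dash later.
    escaped = source.replace("%", "%25").replace("-", "%2D")
    return BEGIN + escaped + END


def extract_source(html):
    """Return the embedded source from html, or None if it is missing."""
    start = html.find(BEGIN)
    if start == -1:
        return None
    start += len(BEGIN)
    end = html.find(END, start)
    if end == -1:
        return None
    # Undo the two escapes in the reverse order they were applied.
    return html[start:end].replace("%2D", "-").replace("%25", "%")


original = "* heading\n<!-- a comment --> and a literal %2D\n"
page = "<p>post</p>" + embed_source(original)
print(extract_source(page) == original)  # True
```

Because no dash survives in the escaped text, the end marker cannot occur inside the embedded source.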

This concludes my basic proof of concept. I think there is a general escaping challenge in this approach, because it isn't clear if you can put really arbitrary stuff in an html comment, e.g. you cannot put -->! I am going to try incorporating this into my posts and see what other issues come up in the future.


New publication in J. Phys. Chem. C

| categories: publication, news | tags:

In this paper we show that the electrolyte can modify the reactivity of nickel hydroxide based electrodes for electrochemical water oxidation. There are two effects that are important: 1) Fe-impurities, and 2) the identity of the electrolyte cation. Fe-impurities are known to promote water oxidation. We found that a LiOH electrolyte can suppress the oxygen evolution reaction (OER), which is also known from the battery literature. KOH and CsOH are the best electrolytes for the OER on nickel hydroxide based electrodes.

"Alkaline Electrolyte and Fe Impurity Effects on the Performance and Active-phase Structure of NiOOH Thin Films for OER Catalysis Applications"

http://pubs.acs.org/doi/abs/10.1021/acs.jpcc.5b02458

@article{michael-2015-alkal-elect,
  author =       {Michael, John and Demeter, Ethan L and Illes, Steven M. and
                  Fan, Qingqi and Boes, Jacob R. and Kitchin, John R.},
  title =        {Alkaline Electrolyte and Fe Impurity Effects on the
                  Performance and Active-Phase Structure of NiOOH Thin Films for
                  OER Catalysis Applications},
  journal =      {The Journal of Physical Chemistry C},
  volume =       0,
  number =       {ja},
  pages =        {null},
  year =         2015,
  doi =          {10.1021/acs.jpcc.5b02458},
  url =          { https://doi.org/10.1021/acs.jpcc.5b02458 },
  eprint =       { https://doi.org/10.1021/acs.jpcc.5b02458 },
}
