The Kitchin Research Group: data

Another approach to embedding org-source in html

Posted May 09, 2015 at 07:19 PM | categories: data, orgmode | tags:

Updated May 10, 2015 at 09:34 AM

In this post I examined a way to embed the org-source in a comment in the html of the post, and developed a reasonably convenient way to extract the source in emacs. One downside of the approach was the need to escape at least the dashes, and then unescape them on extraction. I came across another idea, which is to put the org-source in base64 encoded form in a data uri .

First let us see what the encoding means:

(base64-encode-string "<!-- test-->")

PCEtLSB0ZXN0LS0+

And decoding:

(base64-decode-string "PCEtLSB0ZXN0LS0+")

<!-- test-->

The encoding looks random, but it is reversible. More importantly, it probably will not have any html like characters in it that need escaped. The idea of a data uri is that the data it serves is embedded in the URL href attribute. This is basically how to make a data uri. We give the url here a class so we can find it later.

<a class="some-org-source" href="data:text/plain;charset=US-ASCII;base64,PCEtLSB0ZXN0LS0+">source</a>

Here is the actual html for the browser. If you click on it, your browser automatically decodes it for you!

source

So, during the blog publish step, we just need to add this little step to the html generation, and it will be included as a data uri. Here is the function that generates the data uri for us, and example of using it. The encoded source is not at all attractive to look at it, but you almost never need to look at it, it is invisible in the browser. Interestingly, if you click on the link, you will see the org source right in your browser!

(defun source-data-uri (source)
  "Encode the string in SOURCE to a data uri."
  (format
   "<a class=\"org-source\" href=\"data:text/plain;charset=US-ASCII;base64,%s\">source</a>"
   (base64-encode-string source)))

(source-data-uri (buffer-string))

source

Now, we integrate it into the blogofile function:

(defun bf-get-post-html ()
  "Return a string containing the YAML header, the post html, my
copyright line, and a link to the org-source code."
  (interactive)
  (let ((org-source (buffer-string))
        (url-to-org (bf-get-url-to-org-source))
        (yaml (bf-get-YAML-heading))
        (body (bf-get-HTML)))

    (with-temp-buffer
      (insert yaml)
      (insert body)
      (insert
       (format "<p>Copyright (C) %s by John Kitchin. See the <a href=\"/copying.html\">License</a> for information about copying.<p>"
               (format-time-string "%Y")))
      (insert (format "<p><a href=\"%s\">org-mode source</a><p>"
                      url-to-org))
      (insert (format "<p>Org-mode version = %s</p>" (org-version)))
      ;; this is the only new code we need to add.
      (insert (source-data-uri org-source))
      ;; return value
      (buffer-string))))

Now we need a new adaptation of the grab-org-source function. We still need a regexp search to get the source, and we still need to decode it.

(defun grab-org-source (url)
  "Extract org-source from URL to a buffer named *grab-org-source*."
  (interactive "sURL: ")
  (switch-to-buffer (get-buffer-create "*grab-org-source*"))
  (erase-buffer)
  (org-mode)
  (insert
   (with-current-buffer
       (url-retrieve-synchronously url)
     (let (start)
       (re-search-forward
        "<a class=\"org-source\" href=\"data:text/plain;charset=US-ASCII;base64,\\([^\"]*\\)\\\">" nil t)
       (base64-decode-string  (match-string 1))))))

What else could we do with this? One idea would be to generate data uris for each code block that you could open in your browser. For example, here we generate a list of data uris for each code block in the buffer. We don't take care to label them or make it easy to see what they are, but if you click on one, you should see a plain text version of the block. If this is done a lot, it might even make sense to change the mime type to download the code in some native app.

(org-element-map (org-element-parse-buffer) 'src-block
  (lambda (src-block)
    (source-data-uri (org-element-property :value src-block))))

(source source source source source source)

I am not sure if this is better or worse than the other approach. I have not tested it very thoroughly, but it seems like it should work pretty generally. I imagine you could also embed other kinds of files in the html, if for some reason you did not want to put the files on your server. Overall this seems to lack some elegance in searching for data, e.g. like RDF or RDFa is supposed to enable, but it might be a step in that direction, using org-mode and Emacs as the editor.

org-mode source

Org-mode version = 8.2.10

source

Discuss on Twitter

Git archives for data sharing

Posted October 26, 2013 at 06:49 PM | categories: data | tags:

1. A molecule
2. A bulk calculation
3. Analysis via the JSON files
4. Create the archive file
5. Summary

in some past posts we have looked at constructing JSON files for data sharing. While functional, that approach requires some extra work to create the data files for sharing, and may not be useful for all sorts of data. For instance you may not want to store electron density in a JSON file.

Here we consider using git archives for packaging exactly the data you used in the same hierarchy as on your file system. The idea is to store your work in a git repository. You commit any data files you would want to share, and then create an archive of that data. This enables you to control what gets shared, while keeping the data that should not be shared out of the archive.

We will run a few VASP calculations, and summarize each one in a JSON file. We will commit those JSON files to the git repository, and finally make a small archive that contains them.

1 A molecule

Calculate the total energy of a CO molecule.

from ase import Atoms, Atom
from jasp import *
import numpy as np
import json
np.set_printoptions(precision=3, suppress=True)

co = Atoms([Atom('C',[0,   0, 0]),
            Atom('O',[1.2, 0, 0])],
            cell=(6., 6., 6.))

with jasp('molecules/simple-co', #output dir
          xc='PBE',  # the exchange-correlation functional
          nbands=6,  # number of bands
          encut=350, # planewave cutoff
          ismear=1,  # Methfessel-Paxton smearing
          sigma=0.01,# very small smearing factor for a molecule
          atoms=co) as calc:
    print 'energy = {0} eV'.format(co.get_potential_energy())
    print co.get_forces()
    with open('JSON', 'wb') as f:
        f.write(json.dumps(calc.dict))
    os.system('git add JSON')

energy = -14.687906 eV
[[ 5.095  0.     0.   ]
 [-5.095  0.     0.   ]]

2 A bulk calculation

Now we run a bulk calculation

from jasp import *

from ase import Atom, Atoms

atoms = Atoms([Atom('Cu',  [0.000,      0.000,      0.000])],
              cell=  [[ 1.818,  0.000,  1.818],
                      [ 1.818,  1.818,  0.000],
                      [ 0.000,  1.818,  1.818]])

with jasp('bulk/alloy/cu',
          xc='PBE',
          encut=350,
          kpts=(13,13,13),
          nbands=9,
          ibrion=2,
          isif=4,
          nsw=10,
          atoms=atoms) as calc:
    print atoms.get_potential_energy()
    with open('JSON', 'wb') as f:
        f.write(json.dumps(calc.dict))
    os.system('git add JSON')

-3.723306

3 Analysis via the JSON files

This analysis is independent of jasp and therefore is portable

We can retrieve the bulk data

import json

with open('bulk/alloy/cu/JSON', 'rb') as f:
    d = json.loads(f.read())
    print d['data']['total_energy']

-3.723306

Or the molecule data:

import json

with open('molecules/simple-co/JSON', 'rb') as f:
    d = json.loads(f.read())
    print d['data']['total_energy']

-14.687906

4 Create the archive file

As you do your work, you add and commit files as needed. For this project all that needs to be shared are the JSON files, and the scripts (which are in this document) that we used to run the calculations and do the analysis. If we are satisfied with the state of the git repository, we create an archive like this:

git archive --format zip HEAD -o archive.zip

Here is the result: archive.zip .

You can download the zip file, unzip it, and rerun the analysis to extract the total energies on any system with a modern Python installation.

5 Summary

This seems to be an easy way to share data from a single project, i.e. a single git repository. It isn't obvious how you would package data from multiple projects, or how you would run multiple projects in a single directory.

org-mode source

Discuss on Twitter

The Kitchin Research Group

Chemical Engineering at Carnegie Mellon University

Another approach to embedding org-source in html

Git archives for data sharing

Table of Contents

1 A molecule

2 A bulk calculation

3 Analysis via the JSON files

4 Create the archive file

5 Summary