The Kitchin Research Group

An xml representation of an org document for indexing with swish-e

Posted July 04, 2015 at 11:49 AM | categories: emacs, search | tags:

Updated July 04, 2015 at 07:34 PM

Swish-e can index xml data, and enable searching by tag. Here we push our org-mode indexing idea a little further. Initially we indexed org files as text. Then, we exported it to html, and indexed the html. That enabled some richer searching. Now, we will create an xml representation of the org file for indexing. This will enable us to use a custom tag system and search for specific text in tables, or src-blocks, or in headlines, or for headlines with certain tags, todo state or properties.

Incidentally, this is a general strategy for indexing arbitrary files. You just make an xml representation of the file containing the data to be indexed, and use swish-e to index that xml.

Let us start with code to generate xml. I adapted this from some code in Land Of Lisp . First, a function that simply prints a tag with attributes.

(defun print-tag (name attrs &optional closingp)
  "Print an xml tag with symbol NAME and ATTRS (a cons list of (attribute . value)).
if CLOSINGP print the closing tag instead."
  (format
   "<%s%s%s>"
   (if closingp "/" "")
   name
   (if (and attrs (not closingp))
       (concat
        " "
        (mapconcat
         (lambda (x)
           (format "%s=\"%s\""
                   (car x)
                   (xml-escape-string (cdr x))))
         attrs
         " "))
     "")))

(print-tag 'html '((color . "blue") (label . "test")))

<html color="blue" label="test">

XML tags almost always come in pairs. We define a macro to make this happen here. The macro prints the opening tag, evaluates the body, and prints the closing body. Note that the body may contain other tags, or a string. The string should be escaped to avoid illegal xml characters.

(defmacro tag (name attributes &rest body)
  `(format "%s%s%s"
           (print-tag ,name ,attributes nil)
           (concat
           ,@body)
           (print-tag ,name nil t)))

;; example usage
(tag "xml" '((test . "id"))
     (tag "body" nil
          (tag "p" nil (xml-escape-string "paragraph & < 1"))
          (tag "p" nil "paragraph 2")))

<xml test="id"><body><p>paragraph &amp; &lt; 1</p><p>paragraph 2</p></body></xml>

Now, we can use this to get an xml representation of the source blocks, e.g.

(mapconcat 'identity
           (org-element-map
               (org-element-parse-buffer)
               'src-block
             (lambda (element)
               (tag
                'src-block
                `((language . ,(org-element-property :language element)))
                (tag 'contents ()
                     (xml-escape-string
                      (org-element-property :value element))))))
           "")

<src-block language="emacs-lisp"><contents>(defun print-tag (name attrs &amp;optional closingp)
  &quot;Print an xml tag with symbol NAME and ATTRS (a cons list of (attribute . value)).
if CLOSINGP print the closing tag instead.&quot;
  (format
   &quot;&lt;%s%s%s&gt;&quot;
   (if closingp &quot;/&quot; &quot;&quot;)
   name
   (if (and attrs (not closingp))
       (concat
	&quot; &quot;
	(mapconcat
	 (lambda (x)
	   (format &quot;%s=\&quot;%s\&quot;&quot;
		   (car x)
		   (xml-escape-string (cdr x))))
	 attrs
	 &quot; &quot;))
     &quot;&quot;)))

(print-tag &apos;html &apos;((color . &quot;blue&quot;) (label . &quot;test&quot;)))
</contents></src-block><src-block language="emacs-lisp"><contents>(defmacro tag (name attributes &amp;rest body)
  `(format &quot;%s%s%s&quot;
	   (print-tag ,name ,attributes nil)
           (concat
	   ,@body)
	   (print-tag ,name nil t)))

(tag &quot;xml&quot; &apos;((test . &quot;id&quot;))
     (tag &quot;body&quot; nil
	  (tag &quot;p&quot; nil (xml-escape-string &quot;paragraph &amp; &lt; 1&quot;))
	  (tag &quot;p&quot; nil &quot;paragraph 2&quot;)))
</contents></src-block><src-block language="emacs-lisp"><contents>(mapconcat &apos;identity
	   (org-element-map
	       (org-element-parse-buffer)
	       &apos;src-block
	     (lambda (element)
	       (tag
		&apos;src-block
		`((language . ,(org-element-property :language element)))
		(tag &apos;contents ()
		     (xml-escape-string
		      (org-element-property :value element))))))
	   &quot;&quot;)
</contents></src-block><src-block language="emacs-lisp"><contents>(let ((xml (tag &apos;root `((filename . ,(buffer-file-name))
			(indexed-on . ,(current-time-string)))
		;; map the headlines
		(mapconcat
		 &apos;identity
		 (org-map-entries
		  (lambda ()
		    (let* ((tags (org-get-tags))
			   (heading-components (org-heading-components))
			   (title (nth 4 heading-components))
			   (level (nth 0 heading-components))
			   (properties (org-entry-properties))
			   (elem (org-element-at-point))
			   (bp (org-element-property :contents-begin elem))
			   (ep (org-element-property :contents-end elem))
			   (content (buffer-substring bp ep)))
		      (tag &apos;heading `((level . ,level))
			   (tag &apos;title () (xml-escape-string title))
			   (tag &apos;tags () (mapconcat &apos;identity tags &quot; &quot;))
			   (tag &apos;properties ()
				(mapconcat
				 (lambda (x)
				   (tag &apos;property `((label . (car ,x))) (cdr x)))
				 properties
				 &quot;&quot;))
			   (tag &apos;content ()
				(format &quot;%s&quot; (xml-escape-string content)))))))
		 &quot;&quot;)

		;; map specific element types
		(tag &apos;source-blocks ()
		     (mapconcat
		      &apos;identity
		      (org-element-map
			  (org-element-parse-buffer)
			  &apos;src-block
			(lambda (element)
			  (tag &apos;src-block
			       `((language .
					   ,(org-element-property
					     :language element)))
			       (tag &apos;contents ()
				    (xml-escape-string
				     (org-element-property :value element)))))) &quot;&quot;))

		(tag &apos;tables ()
		     (mapconcat
		      &apos;identity
		      (org-element-map
			  (org-element-parse-buffer)
			  &apos;table
			(lambda (element)
			  (tag &apos;table ()
			       (when (org-element-property :caption element)
				 (tag &apos;caption ()
				(caaar (org-element-property :caption element))))
			       (xml-escape-string
				(buffer-substring
				 (org-element-property :contents-begin element)
				 (org-element-property :contents-end element))))))
		      &quot;&quot;))

		(tag &apos;paragraphs ()
		     (mapconcat
		      &apos;identity
		      (org-element-map
			  (org-element-parse-buffer)
			  &apos;paragraph
			(lambda (element)
			  (tag &apos;paragraph ()
			       (xml-escape-string
				(buffer-substring
				 (org-element-property :contents-begin element)
				 (org-element-property :contents-end element))))))
		      &quot;&quot;
		      ))
		)))
  (with-temp-file &quot;org2xml.xml&quot;
    (insert xml)))
</contents></src-block><src-block language="emacs-lisp"><contents>(xml-parse-file &quot;org2xml.xml&quot;)
</contents></src-block>

So, finally we can map the entries to get some information about them, e.g. the tags, properties, todo state, etc… Then we create xml representing all that information so we can have a more precise search. Instead of looking for a word, we can specify that the word be in a property for example. Then, we make xml representations of the tables, src-blocks and paragraphs.

I am going to follow the example here that we worked out before on html and create a filter function that takes an org-file and spits out xml at the command line.

:;exec emacs -batch -l $0 -f main "$@"
(require 'org)
(require 'xml)

(defun print-tag (name attrs &optional closingp)
  "Print an xml tag with symbol NAME and ATTRS (a cons list of (attribute . value)).
if CLOSINGP print the closing tag instead.
You should use `xml-escape-string' on text going into the attributes to avoid errors."
  (format
   "<%s%s%s>"
   (if closingp "/" "")
   name
   (if (and attrs (not closingp))
       (concat
        " "
        (mapconcat
         (lambda (x)
           (format "%s=\"%s\"" (car x) (cdr x)))
           attrs
           " "))
     "")))

(defmacro tag (name attributes &rest body)
  `(format "%s%s%s"
           (print-tag ,name ,attributes nil)
           (concat
           ,@body)
           (print-tag ,name nil t)))

(defun main ()
  (find-file (car command-line-args-left))
  (princ (tag 'root `((filename . ,(buffer-file-name))
                      (indexed-on . ,(current-time-string)))
              ;; map the headlines
              (mapconcat
               'identity
               (org-map-entries
                (lambda ()
                  (let* ((tags (org-get-tags))
                         (heading-components (org-heading-components))
                         (todo (nth 2 heading-components))
                         (headline (nth 4 heading-components))
                         (thislevel (nth 0 heading-components))
                         (properties (org-entry-properties)))
                    (tag 'heading `((level . ,thislevel))
                         (tag 'headline () (xml-escape-string headline))
                         (tag 'tags () (mapconcat 'identity tags " "))
                         (when todo
                           (tag 'todo () todo))
                         (tag 'properties ()
                              (mapconcat
                               (lambda (x)
                                 (tag 'property `((name . ,(xml-escape-string (car x))))
                                      (xml-escape-string (cdr x))))
                               properties
                               ""))))))
               "")

              ;; get file keywords, TITLE, authors, etc...
              (tag 'file-keywords ()
                   (mapconcat 'identity
                              (org-element-map (org-element-parse-buffer 'element) 'keyword
                                (lambda (keyword)
                                  (tag (xml-escape-string (org-element-property :key keyword)) ()
                                       (xml-escape-string (org-element-property :value keyword)))))
                              ""))

              ;; map specific element types
              (tag 'source-blocks ()
                   (mapconcat
                    'identity
                    (org-element-map
                        (org-element-parse-buffer)
                        'src-block
                      (lambda (element)
                        (tag 'src-block
                             `((language .
                                         ,(org-element-property
                                           :language element)))
                             (tag 'contents ()
                                  (xml-escape-string
                                   (org-element-property :value element)))))) ""))

              (tag 'tables ()
                   (mapconcat
                    'identity
                    (org-element-map
                        (org-element-parse-buffer)
                        'table
                      (lambda (element)
                        (tag 'table ()
                             (when (org-element-property :caption element)
                               (tag 'caption ()
                                    (format
                                     "%s"
                                     (org-element-property
                                      :caption element))))
                             (xml-escape-string
                              (buffer-substring
                               (org-element-property :contents-begin element)
                               (org-element-property :contents-end element))))))
                    ""))

              (tag 'paragraphs ()
                   (mapconcat
                    'identity
                    (org-element-map
                        (org-element-parse-buffer)
                        'paragraph
                      (lambda (element)
                        (tag 'paragraph ()
                             (xml-escape-string
                              (buffer-substring
                               (org-element-property :contents-begin element)
                               (org-element-property :contents-end element))))))
                    ""
                    )))))

We could do more, e.g. links, or images, but this is pretty good for now. Now, let's configure a swish indexer. We instruct swish-e to use some metanames, and attributes so we can search on them later.

# Example configuration file

# Tell Swish-e what to directories to index
IndexDir /Users/jkitchin/blogofile-jkitchin.github.com/_site

# where to save the index
IndexFile /Users/jkitchin/blogofile-jkitchin.github.com/_blog/index-org2xml.swish-e

# What to index
IndexOnly .org

# Tell Swish-e that .txt files are to use the HTML parser.
IndexContents XML* .org

FileFilter .org /Users/jkitchin/blogofile-jkitchin.github.com/_blog/org2xml.el

# index all tags for searching
UndefinedMetaTags auto
UndefinedXMLAttributes auto

And now, run the index command. I did this at the command line. There might be some problems with the script as there were some warnings about non-zero exits, but there was only a few so we ignore them for now.

swish-e -c swish-org2xml.conf

1 Examples of searching for org-files

1.1 Files with words in the filename

Here we look for filenames with the word "Extracting" in them.

swish-e -f index-org2xml.swish-e -w root.filename=Extracting

# SWISH format: 2.4.7
# Search words: root.filename=Extracting
# Removed stopwords:
# Number of hits: 2
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2014/02/19/Extracting-bibtex-file-from-an-org-buffer.org "Extracting-bibtex-file-from-an-org-buffer.org" 6094
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/media/2014-02-19-Extracting-bibtex-file-from-an-org-buffer/notes.org "notes.org" 195515
.

Or, thanks to the date being in the path, we can find by year, How about July of 2012?

swish-e -f index-org2xml.swish-e -w root.filename="(2012/07)"

# SWISH format: 2.4.7
# Search words: root.filename=(2012/07)
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2012/07/15/Professor-Kitchin-was-awarded-the-Presidential-Early-Career-Award-for-Scientists-and-Engineers-(PECASE).org "Professor-Kitchin-was-awarded-the-Presidential-Early-Career-Award-for-Scientists-and-Engineers-(PECASE).org" 311
.

Interesting we have to use the parentheses here.

1.2 DONE Files with headlines containing a word

Now, lets find documents with "Compiled" in a heading title with level=2

swish-e -f index-org2xml.swish-e -w heading.level=2 title=Compiled -m5

# SWISH format: 2.4.7
# Search words: heading.level=2 title=Compiled
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/media/2014-07-12-Org-mode-is-awesome/why-org-mode.org "why-org-mode.org" 13522
.

1.3 Headlines marked TODO

We can find documents with headlines marked TODO:

swish-e -f index-org2xml.swish-e  -w "todo=TODO" -m 5

# SWISH format: 2.4.7
# Search words: todo=TODO
# Removed stopwords:
# Number of hits: 12
# Search time: 0.000 seconds
# Run time: 0.008 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/media/2014-01-27-Clocking-your-time-in-org-mode/blog.org "blog.org" 134160
624 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2014/02/16/A-dynamic-snippet-for-a-task-due-7-days-from-now.org "A-dynamic-snippet-for-a-task-due-7-days-from-now.org" 2587
425 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2014/02/16/END.org "END.org" 1531
269 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2015/02/01/Handling-multiple-selections-in-helm.org "Handling-multiple-selections-in-helm.org" 3290
269 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2015/01/30/More-adventures-in-helm---more-than-one-action.org "More-adventures-in-helm---more-than-one-action.org" 3236
.

1.4 For a table

so2-capacity-1

swish-e -f index-org2xml.swish-e -w table="energy"

# SWISH format: 2.4.7
# Search words: table=energy
# Removed stopwords:
# Number of hits: 2
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2014/08/21/Using-org-entries-like-a-database.org "Using-org-entries-like-a-database.org" 53035
633 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/07/04/Estimating-uncertainties-in-equations-of-state.org "Estimating-uncertainties-in-equations-of-state.org" 3117
.

1.5 Tagged headlines

Find entries with a "slide" tag.

swish-e -f index-org2xml.swish-e -w "tags=slide"

# SWISH format: 2.4.7
# Search words: tags=slide
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.009 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/media/2014-07-12-Org-mode-is-awesome/why-org-mode.org "why-org-mode.org" 13522
.

Evidently there is one file where I talk about slides in org-show.

1.6 Headlines with a property

Here I find documents with headlines that have thermodynamics in the property "categories".

swish-e -f index-org2xml.swish-e -w "property.label=categories property=thermodynamics"

# SWISH format: 2.4.7
# Search words: property.label=categories property=thermodynamics
# Removed stopwords:
# Number of hits: 10
# Search time: 0.000 seconds
# Run time: 0.009 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/02/01/Water-gas-shift-equilibria-via-the-NIST-Webbook.org "Water-gas-shift-equilibria-via-the-NIST-Webbook.org" 10789
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/03/01/Gibbs-energy-minimization-and-the-NIST-webbook.org "Gibbs-energy-minimization-and-the-NIST-webbook.org" 5441
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/03/01/Finding-equilibrium-composition-by-direct-minimization-of-Gibbs-free-energy-on-mole-numbers.org "Finding-equilibrium-composition-by-direct-minimization-of-Gibbs-free-energy-on-mole-numbers.org" 6155
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/02/27/Reading-parameter-database-text-files-in-python.org "Reading-parameter-database-text-files-in-python.org" 3947
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/02/18/The-Gibbs-free-energy-of-a-reacting-mixture-and-the-equilibrium-composition.org "The-Gibbs-free-energy-of-a-reacting-mixture-and-the-equilibrium-composition.org" 8230
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/02/18/Calculating-a-bubble-point-pressure-of-a-mixture.org "Calculating-a-bubble-point-pressure-of-a-mixture.org" 3203
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/02/15/The-equal-area-method-for-the-van-der-Waals-equation.org "The-equal-area-method-for-the-van-der-Waals-equation.org" 5737
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/02/12/Using-constrained-optimization-to-find-the-amount-of-each-phase-present.org "Using-constrained-optimization-to-find-the-amount-of-each-phase-present.org" 5210
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/02/05/Constrained-minimization-to-find-equilibrium-compositions.org "Constrained-minimization-to-find-equilibrium-compositions.org" 5666
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2014/09/23/Generating-an-atomic-stoichiometric-matrix.org "Generating-an-atomic-stoichiometric-matrix.org" 3487
.

That seems about right, according to http://kitchingroup.cheme.cmu.edu/categories.html there are 9 documents. I am not sure why they don't totally agree, but I can live with it.

Here are documents containing headlines with the property "TOTAL_ENERGY"

swish-e -f index-org2xml.swish-e -w property.label=TOTAL_ENERGY

# SWISH format: 2.4.7
# Search words: property.label=TOTAL_ENERGY
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.008 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2014/08/21/Using-org-entries-like-a-database.org "Using-org-entries-like-a-database.org" 53035
.

1.7 Documents with a Python source block containing a word

Find org files with diffusion in a python source block.

swish-e -f index-org2xml.swish-e -w src-block.language=python -w src-block=diffusion

# SWISH format: 2.4.7
# Search words: src-block.language=python src-block=diffusion
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.011 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2013/04/02/Transient-diffusion---partial-differential-equations.org "Transient-diffusion---partial-differential-equations.org" 3660
.

1.8 An org-file with a UUID

swish-e -f index-org2xml.swish-e -w  property="(38FCCF3D-7FC5-49BF-BB77-486BBAA17CD9)"

# SWISH format: 2.4.7
# Search words: property=(38FCCF3D-7FC5-49BF-BB77-486BBAA17CD9)
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2014/11/23/Machine-gradable-quizzes-in-emacs+org-modex.org "Machine-gradable-quizzes-in-emacs+org-modex.org" 5743
.

Interesting, again the parentheses are necessary to find a match. I think because of the dashes. The next example is similar, but finds an entry with that bibtex key in a CUSTOM_ID property.

swish-e -f index-org2xml.swish-e -w  property="(mantina-2008-first-princ)"

# SWISH format: 2.4.7
# Search words: property=(mantina-2008-first-princ)
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.010 seconds
1000 /Users/jkitchin/blogofile-jkitchin.github.com/_site/media/2014-02-19-Extracting-bibtex-file-from-an-org-buffer/notes.org "notes.org" 195515
.

2 Summary

This is pretty cool. There are still some bugs to work out in the indexing filter I think, but this demonstrates you can index org-files, and have pretty refined searches to find your files. There is still some thinking to do on how to schedule an incremental indexing, and whether we need more or better metanames. The indexing is not fast, but that is probably because I am running this through a FileFilter, rather than the -s prog option in swish-e. This is super promising to me though. Imagine building an agenda from files found with TODO headlines in them; a global todo list! Or, grabbing contacts from wherever they are. No more losing files you have not used in a while. Find all documents containing a citation. With some extra work, you could index links, citations, chemical formulas , or other types of identifiable content.

The logical conclusion of this work might be an ox-swish-e-xml export engine to render the org-file into xml, rather than the script I used here. It would be really great to get some refined output, e.g. rather than just get matching documents, get location information so you could open the document to the matching element. That might be out of reach for swish-e, but could be in reach for other programs like Sphinx that are more integrated with a database. There is a very interesting project here: https://github.com/wvxvw/sphinx-mode to integrate org-mode with the Sphinx search (http://sphinxsearch.com ) engine.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Using swish-e to index org files as html

Posted July 03, 2015 at 10:13 AM | categories: emacs, search | tags:

When we wrote about using swish-e before , we just indexed the org files as text. This worked pretty well, but we lost some resolution, e.g. being able to search for text in a headline. that is more possible if we index html or xml. So, here we try indexing the org files as html. It will be slower to index because we will filter each org file through a command that exports it to html, but hopefully it will be worth it for the enhanced search capability.

We will need a filter shell command that takes an org-file and spits out html. This command is shown as an emacs-lisp script here. This is a pretty bare bones export, and would lack the export of all my custom links from org-ref. I tried this, but org-ref outputs a lot of stuff to stdout when it loads, and unless I can figure out how to suppress that I don't want it here for now.

:;exec emacs -batch -l $0 -f main "$@"
(require 'org)
;(add-to-list 'load-path "/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa")
;(add-to-list 'load-path "/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref")
;(setq package-user-dir "/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa")
;(package-initialize)
;(require 'org-ref)
(defun main ()
  (find-file (car command-line-args-left))
  (org-html-export-as-html nil nil nil t)
  (switch-to-buffer "*Org HTML Export*")
  (print (buffer-string)))

;; Local Variables:
;; mode: emacs-lisp
;; End:

We try it out here:

./org2html.el index-org-as-html.org

"<div id=\"table-of-contents\">
<h2>Table of Contents</h2>
<div id=\"text-table-of-contents\">
<ul>
<li><a href=\"#sec-1\">1. Using swish-e to index org files as html</a></li>
</ul>
</div>
</div>
<div id=\"outline-container-sec-1\" class=\"outline-2\">
<h2 id=\"sec-1\"><span class=\"section-number-2\">1</span> Using swish-e to index org files as html</h2>
<div class=\"outline-text-2\" id=\"text-1\">
<p>
When we wrote about using swish-e <a href=\"http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/\">before</a>, we just indexed the org files as text. This worked pretty well, but we lost some resolution, e.g. being able to search for text in a headline. that is more possible if we index html or xml. So, here we try indexing the org files as html. It will be slower to index because we will filter each org file through a command that exports it to html, but hopefully it will be worth it for the enhanced search capability.
</p>

<p>
We will need a filter shell command that takes an org-file and spits out html. This command is shown as an emacs-lisp script here. This is a pretty bare bones export, and would lack the export of all my custom links
</p>

<p>
cite:dauenhauer-2006-renew
</p>

<div class=\"org-src-container\">

<pre class=\"src src-emacs-lisp\">:;exec emacs -batch -l $0 -f main \"$@\"
(require 'org)
;(add-to-list 'load-path \"/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa\")
;(add-to-list 'load-path \"/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref\")
;(setq package-user-dir \"/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa\")
;(package-initialize)
;(require 'org-ref)
(defun main ()
  (find-file (car command-line-args-left))
  (org-html-export-as-html nil nil nil t)
  (switch-to-buffer \"*Org HTML Export*\")
  (print (buffer-string)))

;; Local Variables:
;; mode: emacs-lisp
;; End:
</pre>
</div>


<div class=\"org-src-container\">

<pre class=\"src src-sh\">./org2html.el index-org-as-html.org
</pre>
</div>

<div class=\"org-src-container\">

<pre class=\"src src-text\"># Example configuration file

# Tell Swish-e what to directories to index
IndexDir /Users/jkitchin/blogofile-jkitchin.github.com

# where to save the index
IndexFile /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/index.swish-e

# What to index
IndexOnly .org

# Tell Swish-e that .txt files are to use the text parser.
IndexContents TXT* .org

FileFilter .org /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/org2html.el

# Ask libxml2 to report any parsing errors and warnings or
# any UTF-8 to 8859-1 conversion errors
ParserWarnLevel 9
</pre>
</div>
</div>
</div>
"

I think that looks good. Now, let's configure a swish indexer.

# Example configuration file

# Tell Swish-e what to directories to index
IndexDir /Users/jkitchin/blogofile-jkitchin.github.com

# where to save the index
IndexFile /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/index.swish-e

# What to index
IndexOnly .org

# Tell Swish-e that .txt files are to use the HTML parser.
IndexContents HTML* .org

FileFilter .org /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/org2html.el

# Ask libxml2 to report any parsing errors and warnings or
# any UTF-8 to 8859-1 conversion errors
ParserWarnLevel 9

MetaNames class swishtitle
HTMLLinksMetaName links

PropertyNames author subjects

StoreDescription HTML <body>

And now, run the index command. I did this at the command line. A lot of output! mostly not being able to fontify source blocks because htmlize was not on the path, and a bunch of attribute parsing errors, and a few utf-8 errors.

swish-e -c swish-org-html.conf

And a test search for files with "selector" in a headline.

swish-e -f index.swish-e -x '%r\t%p\n' -w selector -t h

# SWISH format: 2.4.7
# Search words: selector
# Removed stopwords:
# Number of hits: 4
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000	/Users/jkitchin/blogofile-jkitchin.github.com/org/2015/03/14/A-helm-mu4e-contact-selector.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2015/03/14/A-helm-mu4e-contact-selector.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_deploy/org/2015/03/14/A-helm-mu4e-contact-selector.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_blog/blog-2014.org
.

A phrase in a headline.

swish-e -f index.swish-e -x '%r\t%p\n' -w "information for all documents" -t h

# SWISH format: 2.4.7
# Search words: information for all documents
# Removed stopwords:
# Number of hits: 5
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_blog/blog.org
921	/Users/jkitchin/blogofile-jkitchin.github.com/_blog/blog-2014.org
794	/Users/jkitchin/blogofile-jkitchin.github.com/org/2015/04/03/Getting-data-from-the-Scopus-API.org
794	/Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2015/04/03/Getting-data-from-the-Scopus-API.org
794	/Users/jkitchin/blogofile-jkitchin.github.com/_deploy/org/2015/04/03/Getting-data-from-the-Scopus-API.org
.

Sweet. How about all documents containing this citation:

swish-e -f index.swish-e -x '%r\t%p\n' -w cite:kitchin-2004-modif-pt

# SWISH format: 2.4.7
# Search words: cite:kitchin-2004-modif-pt
# Removed stopwords:
# Number of hits: 3
# Search time: 0.000 seconds
# Run time: 0.008 seconds
1000	/Users/jkitchin/blogofile-jkitchin.github.com/media/2014-02-19-Extracting-bibtex-file-from-an-org-buffer/notes.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_site/media/2014-02-19-Extracting-bibtex-file-from-an-org-buffer/notes.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_deploy/media/2014-02-19-Extracting-bibtex-file-from-an-org-buffer/notes.org
.

Super nice.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Pyparsing meets Emacs to find chemical formulas

Posted July 02, 2015 at 12:22 PM | categories: emacs, python | tags:

Updated July 02, 2015 at 12:38 PM

see the video: https://www.youtube.com/watch?v=sjxS9m8QCoo

Today we expand the concepts of clickable text and merge an idea from Python with Emacs. Here we will use Python to find chemical formulas in the buffer, and then highlight them with Emacs. We will use pyparsing to find the chemical formulas and then use them to create a pattern for button-lock. I chose this approach because regular expressions are hard to use on the most general kinds of chemical formulas, and a (possibly recursive) parser should be better equipped to handle this. I adapted an example grammar to match simple chemical formulas, i.e. ones that do not have any parentheses, or charges different than + or -. I think something like this could be done in Emacs, but I am not as familiar with this kind of parsing in Emacs.

Basically, we treat a formula as a group of one or more Elements that have an optional number following them. Spoiler alert: This mostly works, but in the end I conclude there is a clear benefit to a markup language for chemical formulas. Here is an example usage of a parser:

# adapted from [[https://pyparsing.wikispaces.com/file/view/chemicalFormulas.py/31041705/chemicalFormulas.py]]

from pyparsing import *

element = oneOf( """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No""" )

integer = Word(nums)
elementRef = Group(element + Optional(integer))
chemicalFormula = (WordStart(alphas.upper())
                   + OneOrMore(elementRef).leaveWhitespace()
                   + Optional(Or([Literal("-"),
                                  Literal("+")]))
                   + WordEnd(alphas + nums + "-+"))


s = '''Water is  H2O or OH2  not h2O, methane is CH4 and of course there is PtCl4.
What about H+ and OH-? and carbon or Carbon or H2SO4?

Is this C6H6? or C2H5OH?

and a lot of elements:
H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No'''

matches = []
for match, start, stop in chemicalFormula.scanString(s):
   matches.append(s[start:stop])

print sorted(matches, key=lambda x: len(x), reverse=True)

['C2H5OH', 'PtCl4', 'H2SO4', 'C6H6', 'H2O', 'OH2', 'CH4', 'OH-', 'Uub', 'Uut', 'Uuq', 'Uup', 'Uuh', 'Uus', 'Uuo', 'H+', 'He', 'Li', 'Be', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'Cl', 'Ar', 'Ca', 'Sc', 'Ti', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'Xe', 'Cs', 'Ba', 'Lu', 'Hf', 'Ta', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Lr', 'Rf', 'Db', 'Sg', 'Bh', 'Hs', 'Mt', 'Ds', 'Rg', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Ac', 'Th', 'Pa', 'Np', 'Pu', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'O', 'H', 'B', 'C', 'N', 'O', 'F', 'P', 'S', 'K', 'V', 'Y', 'I', 'W', 'U']

That is pretty good. If the string was actually our buffer, we could use those to create a regexp to put text-properties on them. The trick is how to get the buffer string to the Python function, and then get back usable information in lisp. We actually explored this before ! Rather than use that, we will just create the lisp output manually since this is a simple list of strings.

The first thing we should do is work out a Python script that will output the lisp results we want, which are the found formulas (I tried getting the start and stop positions, but I don't think they map onto the buffer positions very well). Here it is. We set it up as a command line tool that takes a string. We use set to get a unique list, then sort the list by length so we try matching the longest patterns first. There are a few subtle differences in this script and the example above because of some odd false hits I unsuccessfully tried to get rid of.

import sys
from pyparsing import *

element_string =  """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No"""
element = oneOf([x for x in element_string.split()])

integer = Word(nums)
elementRef = Group(element + Optional(integer))
chemicalFormula = (WordStart(alphas.upper()).leaveWhitespace()
                   + OneOrMore(elementRef).leaveWhitespace()
                   + Optional(Or([Literal("-"),
                                  Literal("+")])).leaveWhitespace()
                   + WordEnd(alphas + alphas.lower() + nums + "-+").leaveWhitespace())

s = sys.stdin.read().strip()

matches = []
for match, start, stop in chemicalFormula.scanString(s):
   matches.append(s[start:stop])
matches = list(set(matches))
matches.sort(key=lambda x: len(x), reverse=True)

print "'(" + ' '.join(["\"{}\"".format(m) for m in matches]) + ')'

Now we can test this:

echo "Water is H2O, methane is CH4 and of course PtCl4, what about H+ and OH-? and carbon or Carbon. Water is H2O not h2o or mH2o, methane is CH4 and of course PtCl4, what about H+ and OH-? carbon, Carbon and SRC, or H2SO4? Is this C6H6? Ethanol is C2H5OH in a sentence.

 C2H5OH firs con

This is CH3OH

H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No
" | ./parse_chemical_formulas.py

'("C2H5OH" "CH3OH" "PtCl4" "H2SO4" "C6H6" "CH4" "OH-" "Uub" "Uuq" "Uup" "Uus" "Uuo" "Uuh" "H2O" "Uut" "Ru" "Re" "Rf" "Rg" "Ra" "Rb" "Rn" "Rh" "Be" "Ba" "Bh" "Bi" "Bk" "Br" "Ho" "Os" "Es" "Hg" "Ge" "Gd" "Ga" "Pr" "Pt" "Pu" "Pb" "Pa" "Pd" "Cd" "Po" "Pm" "Hs" "Hf" "He" "Md" "Mg" "Mo" "Mn" "Mt" "Zn" "H+" "Eu" "Zr" "Er" "Ni" "No" "Na" "Nb" "Nd" "Ne" "Np" "Fr" "Fe" "Fm" "Sr" "Kr" "Si" "Sn" "Sm" "Sc" "Sb" "Sg" "Se" "Co" "Cm" "Cl" "Ca" "Cf" "Ce" "Xe" "Tm" "Cs" "Cr" "Cu" "La" "Li" "Tl" "Lu" "Lr" "Th" "Ti" "Te" "Tb" "Tc" "Ta" "Yb" "Db" "Dy" "Ds" "Ac" "Ag" "Ir" "Am" "Al" "As" "Ar" "Au" "At" "In" "H" "P" "C" "K" "O" "S" "W" "B" "F" "N" "V" "I" "U" "Y")

That seems to work great. Now, we have a list of chemical formulas. Now, the Emacs side to call that function. We do not use regexp-opt here because I found it optimizes too much, and doesn't always match the formulas. We want explicit matches on each formula.

(defun shell-command-on-region-to-string (start end command)
  (with-output-to-string
    (shell-command-on-region start end command standard-output)))

(read (shell-command-on-region-to-string
        (point-min) (point-max)
        "./parse_chemical_formulas.py"))

quote

(C2H5OH ext; t CH3OH PtCl4 H2SO4 the fir C6H6 CH4 OH- OH2 Uub co Uuq Uup Uus Uuo Uuh ord H2O Uut Ru Re Rf Rg Ra Rb Rn Rh Be Ba Bh Bi Bk Br Ho Os Es Hg Ge Gd Ga Pr t Pt Pu Pb Pa Pd Cd Po Pm Hs Hf He Md Mg Mo Mn Mt Zn H+ Eu Zr Er Ni No Na Nb Nd Ne Np Fr Fe Fm Sr Kr Si Sn Sm Sc Sb Sg Se Co Cm Cl Ca Cf Ce Xe Tm Cs Cr Cu La Li Tl Lu Lr Th Ti Te Tb Tc as Ta Yb Db Dy Ds In Ac Ag Ir Am Al As Ar Au At n H P l t C r K O S W w B F N V I U Y e i)

That is certainly less than perfect, you can see a few false hits that are not too easy to understand, e.g. why is "fir" or "the " or "as" in the list? They don't even start with an uppercase letter. One day maybe I will figure it out. I assume it is a logic flaw in my parser. Until then, let's go ahead and make the text functional, so it looks up the formula in the NIST webbook. The regexp is a little funny, we have to add word-boundaries to each formula to avoid some funny, bad matches.

(defvar chemical-formula-button nil "store button for removal later.")

(require 'nist-webbook)
(setq chemical-formula-button
      (button-lock-set-button
       (mapconcat
        (lambda (formula)
          (concat "\\<" (regexp-quote formula) "\\>"))
        (eval (read (shell-command-on-region-to-string
                     (point-min) (point-max)
                     "./parse_chemical_formulas.py")))
        "\\|")
       (lambda () (interactive)
         (nist-webbook-formula
          (get-surrounding-text-with-property
           'chemical-formula)))
       :face '((:underline t) (:background "gray80"))
       :help-echo "A chemical formula"
       :additional-property 'chemical-formula))

Here are a few tests: CH4, C2H5OH, C6H6. C(CH3)4. C6H6 is benzene. As you can see our pattern lacks context; the first word of the sentence is "as" not the symbol for arsenic. Also, our parser does not consider formulas with parentheses in them. Whenever I refer to myself, I mean myself, and not the element iodine. There are a few weird matchs I just don't understand, like firs d t x rn lac? These do not seem to match anything, and I wonder how they are getting in the list. I think this really shows that it would be useful to use some light markup for chemical formulas which would a) provide context, and b) enhance parsing accuracy. In LaTeX you would use \ce{I} to indicate that is iodine, and not a reference to myself. That is more clear than saying I use I in chemical reactions ;) And it also clarifies sentences like the letter W is used to represent tungsten as the symbol \ce{W}.

Nevertheless, we can click on the formulas, and get something to happen that is potentially useful. Is this actually useful? Conceptually yes, I think it could be, but clearly the parsing is not recognizing formulas perfectly. Sending the buffer to a dedicated program that can return a list of matches to highlight in Emacs is a good idea, especially if it is not easy to build in Emacs, or if a proven solution already exists.

Finally, we can remove the highlighted text like this. That was the reason for saving the button earlier!

(when chemical-formula-button
  (button-lock-unset-button chemical-formula-button)
  (setq chemical-formula-button nil))

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Spoken translations in Emacs

Posted July 01, 2015 at 11:42 AM | categories: speech, emacs | tags:

Finally, continuing our experiments with computer speech for fun, let us try a translation of text to another language that is then spoken. Here is a free translator that has the courtesy to reply with json with the translated text in it. http://mymemory.translated.net/api/get?q=Hello%20World!&langpair=en|de I had to download a German voice called Anna, then get some translated text.

As with previous posts, there is a video: https://www.youtube.com/watch?v=8CBKnahE0ak . I am trying ScreenFlow for these (instead of Camtasia), and I still have not quite mastered the aspect ratio, so the videos still look a little odd.

As a reminder, we have this easy way to speak text in applescript. If you are on Linux, check out Festival and on windows you may find some inspiration here .

(do-applescript "say \"Hello. My name is John. I am glad to meet you.\"")

You can retrieve json data of the translated text, and then we can use it in our word-speak function we previously developed. Here is an example in in German.

(let* ((words-voice "Anna")
       (text "Hello. My name is John. I am glad to meet you.")
       (url (format "http://mymemory.translated.net/api/get?q=%s!&langpair=en|de"
                    text))
       (json (with-current-buffer
                 (url-retrieve-synchronously url)
               (json-read-from-string
                (buffer-substring url-http-end-of-headers (point-max)))))
       (translated-text (cdr (assoc 'translatedText (cdr (assoc 'responseData json))))))
  (words-speak translated-text)
  translated-text)

"Hallo. Mein Name ist John. Ich freue mich, Sie kennen zu lernen.!"

How about Chinese? Again, I downloaded a Chinese voice called "Ting-Ting".

(let* ((words-voice "Ting-Ting")
       (text "Hello. My name is John. I am glad to meet you.")
       (url (format "http://mymemory.translated.net/api/get?q=%s!&langpair=en|zh"
                    text))
       (json (with-current-buffer
                 (url-retrieve-synchronously url)
               (json-read-from-string
                (buffer-substring url-http-end-of-headers (point-max)))))
       (translated-text (cdr (assoc 'translatedText (cdr (assoc 'responseData json))))))
  (words-speak translated-text)
  translated-text)

"你好。我的名字是约翰。我很高兴见到你。!"

So, can any Chinese readers and listeners confirm if the text translates correctly, and if Ting-Ting said it correctly? Hopefully it is good enough to make some sense and be useful!

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Get spoken definitions from the Meriam dictionary

Posted June 30, 2015 at 11:26 AM | categories: emacs | tags:

Now that I can get Emacs to speak words , here is a new application of the idea. We use it to speak the definition of the word at point. We look up the definition here: http://www.dictionaryapi.com/account/index.htm

You may want to head straight to the video to see how this works here: https://www.youtube.com/watch?v=m529gXMrXZA

I had to get an API key for this. I suppose this key should be secret, but it could only be secure by obscurity in any kind of webapp and I don't anticipate using this much so here are the keys I got for the dictionary and thesaurus.

Key (Dictionary): 64f0950a-03b9-4315-9ba5-a73a964251ed Key (Thesaurus): ff0e39e2-b31f-4f17-833c-24e2875aad5d

(with-current-buffer
    (url-retrieve-synchronously
     (format
      "http://www.dictionaryapi.com/api/v1/references/collegiate/xml/%s?key=%s"
      "synchronous"
      "64f0950a-03b9-4315-9ba5-a73a964251ed"))
  (buffer-substring url-http-end-of-headers (point-max)))

<?xml version="1.0" encoding="utf-8" ?>
<entry_list version="1.0">

: <entry id="synchronous"><ew>synchronous</ew><subj>AE-4b#CP-5#TL-5</subj><hw>syn*chro*nous</hw><sound><wav>synchr14.wav</wav><wpr>!siN-kru-nus</wpr></sound><pr>ˈsiŋ-krə-nəs, ˈsin-</pr><fl>adjective</fl><et>Late Latin <it>synchronos,</it> from Greek, from <it>syn-</it> + <it>chronos</it> time</et><def><date>1669</date> <sn>1</sn> <dt>:happening, existing, or arising at precisely the same time</dt> <sn>2</sn> <dt>:recurring or operating at exactly the same periods</dt> <sn>3</sn> <dt>:involving or indicating <fw>synchronism</fw></dt> <sn>4 a</sn> <dt>:having the same period</dt> <sd>also</sd> <dt>:having the same period and phase</dt> <sn>b</sn> <dt>:<sx>geostationary</sx></dt> <sn>5</sn> <dt>:of, used in, or being digital communication (as between computers) in which a common timing signal is established that dictates when individual bits can be transmitted and which allows for very high rates of data transfer</dt><ss>contemporary</ss></def><uro><ure>syn*chro*nous*ly</ure> <fl>adverb</fl></uro><uro><ure>syn*chro*nous*ness</ure> <fl>noun</fl></uro></entry> : <entry id="synchronous motor"><ew>synchronous motor</ew><subj>ME#EE</subj><hw>synchronous motor</hw><fl>noun</fl><def><date>1897</date><dt>:an electric motor having a speed strictly proportional to the frequency of the operating current</dt></def></entry>

</entry_list>

The idea is to query the url, get some xml back, and collect the definitions from it. Then, construct a string of the word, the number of definitions, then the definitions, and say it.

(defun speak-definition ()
  (interactive)
  (let* ((keyword (thing-at-point 'word))
         (api-key "64f0950a-03b9-4315-9ba5-a73a964251ed")
         (xml (with-current-buffer
                  (url-retrieve-synchronously
                   (format
                    "http://www.dictionaryapi.com/api/v1/references/collegiate/xml/%s?key=%s"
                    keyword
                    api-key))
                (xml-parse-region url-http-end-of-headers (point-max))))
         (entries (xml-get-children (car xml) 'entry))
         (nentries (length entries))
         (defs (loop for entry in entries
                     collect (car (xml-get-children entry 'def))))
         (definition (format
                      "%s"
                      (concat
                       (format "%s has %s definition%s. "
                               keyword
                               nentries
                               (if (or (= 0 nentries)
                                       (> nentries 1))
                                   "s"
                                 ""))
                       (mapconcat
                        'identity
                        (loop for element in
                              (loop for def in defs
                                    collect (car (xml-get-children def 'dt)))
                              for i from 1
                              collect (format "%s %s" i (car (xml-node-children element))))
                        " ")))))
    (message definition)
    (do-applescript
     (format
      "say \"%s\"" definition))))

speak-definition

Let us try this out on a few words: asynchronous synchronous flibbity

I guess this would be helpful sometimes ;)

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

« Previous Page -- Next Page »