Using swish-e to index org files as html

| categories: search, emacs | tags:

When we wrote about using swish-e before , we just indexed the org files as text. This worked pretty well, but we lost some resolution, e.g. being able to search for text in a headline. that is more possible if we index html or xml. So, here we try indexing the org files as html. It will be slower to index because we will filter each org file through a command that exports it to html, but hopefully it will be worth it for the enhanced search capability.

We will need a filter shell command that takes an org-file and spits out html. This command is shown as an emacs-lisp script here. This is a pretty bare bones export, and would lack the export of all my custom links from org-ref. I tried this, but org-ref outputs a lot of stuff to stdout when it loads, and unless I can figure out how to suppress that I don't want it here for now.

:;exec emacs -batch -l $0 -f main "$@"
(require 'org)
;(add-to-list 'load-path "/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa")
;(add-to-list 'load-path "/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref")
;(setq package-user-dir "/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa")
;(package-initialize)
;(require 'org-ref)
(defun main ()
  (find-file (car command-line-args-left))
  (org-html-export-as-html nil nil nil t)
  (switch-to-buffer "*Org HTML Export*")
  (print (buffer-string)))

;; Local Variables:
;; mode: emacs-lisp
;; End:

We try it out here:

./org2html.el index-org-as-html.org
"<div id=\"table-of-contents\">
<h2>Table of Contents</h2>
<div id=\"text-table-of-contents\">
<ul>
<li><a href=\"#sec-1\">1. Using swish-e to index org files as html</a></li>
</ul>
</div>
</div>
<div id=\"outline-container-sec-1\" class=\"outline-2\">
<h2 id=\"sec-1\"><span class=\"section-number-2\">1</span> Using swish-e to index org files as html</h2>
<div class=\"outline-text-2\" id=\"text-1\">
<p>
When we wrote about using swish-e <a href=\"http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/\">before</a>, we just indexed the org files as text. This worked pretty well, but we lost some resolution, e.g. being able to search for text in a headline. that is more possible if we index html or xml. So, here we try indexing the org files as html. It will be slower to index because we will filter each org file through a command that exports it to html, but hopefully it will be worth it for the enhanced search capability.
</p>

<p>
We will need a filter shell command that takes an org-file and spits out html. This command is shown as an emacs-lisp script here. This is a pretty bare bones export, and would lack the export of all my custom links
</p>

<p>
cite:dauenhauer-2006-renew
</p>

<div class=\"org-src-container\">

<pre class=\"src src-emacs-lisp\">:;exec emacs -batch -l $0 -f main \"$@\"
(require 'org)
;(add-to-list 'load-path \"/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa\")
;(add-to-list 'load-path \"/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref\")
;(setq package-user-dir \"/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa\")
;(package-initialize)
;(require 'org-ref)
(defun main ()
  (find-file (car command-line-args-left))
  (org-html-export-as-html nil nil nil t)
  (switch-to-buffer \"*Org HTML Export*\")
  (print (buffer-string)))

;; Local Variables:
;; mode: emacs-lisp
;; End:
</pre>
</div>


<div class=\"org-src-container\">

<pre class=\"src src-sh\">./org2html.el index-org-as-html.org
</pre>
</div>

<div class=\"org-src-container\">

<pre class=\"src src-text\"># Example configuration file

# Tell Swish-e what to directories to index
IndexDir /Users/jkitchin/blogofile-jkitchin.github.com

# where to save the index
IndexFile /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/index.swish-e

# What to index
IndexOnly .org

# Tell Swish-e that .txt files are to use the text parser.
IndexContents TXT* .org

FileFilter .org /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/org2html.el

# Ask libxml2 to report any parsing errors and warnings or
# any UTF-8 to 8859-1 conversion errors
ParserWarnLevel 9
</pre>
</div>
</div>
</div>
"

I think that looks good. Now, let's configure a swish indexer.

# Example configuration file

# Tell Swish-e what to directories to index
IndexDir /Users/jkitchin/blogofile-jkitchin.github.com

# where to save the index
IndexFile /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/index.swish-e

# What to index
IndexOnly .org

# Tell Swish-e that .txt files are to use the HTML parser.
IndexContents HTML* .org

FileFilter .org /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/org2html.el

# Ask libxml2 to report any parsing errors and warnings or
# any UTF-8 to 8859-1 conversion errors
ParserWarnLevel 9

MetaNames class swishtitle
HTMLLinksMetaName links

PropertyNames author subjects

StoreDescription HTML <body>

And now, run the index command. I did this at the command line. A lot of output! mostly not being able to fontify source blocks because htmlize was not on the path, and a bunch of attribute parsing errors, and a few utf-8 errors.

swish-e -c swish-org-html.conf

And a test search for files with "selector" in a headline.

swish-e -f index.swish-e -x '%r\t%p\n' -w selector -t h
# SWISH format: 2.4.7
# Search words: selector
# Removed stopwords:
# Number of hits: 4
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000	/Users/jkitchin/blogofile-jkitchin.github.com/org/2015/03/14/A-helm-mu4e-contact-selector.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2015/03/14/A-helm-mu4e-contact-selector.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_deploy/org/2015/03/14/A-helm-mu4e-contact-selector.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_blog/blog-2014.org
.

A phrase in a headline.

swish-e -f index.swish-e -x '%r\t%p\n' -w "information for all documents" -t h
# SWISH format: 2.4.7
# Search words: information for all documents
# Removed stopwords:
# Number of hits: 5
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_blog/blog.org
921	/Users/jkitchin/blogofile-jkitchin.github.com/_blog/blog-2014.org
794	/Users/jkitchin/blogofile-jkitchin.github.com/org/2015/04/03/Getting-data-from-the-Scopus-API.org
794	/Users/jkitchin/blogofile-jkitchin.github.com/_site/org/2015/04/03/Getting-data-from-the-Scopus-API.org
794	/Users/jkitchin/blogofile-jkitchin.github.com/_deploy/org/2015/04/03/Getting-data-from-the-Scopus-API.org
.

Sweet. How about all documents containing this citation:

swish-e -f index.swish-e -x '%r\t%p\n' -w cite:kitchin-2004-modif-pt
# SWISH format: 2.4.7
# Search words: cite:kitchin-2004-modif-pt
# Removed stopwords:
# Number of hits: 3
# Search time: 0.000 seconds
# Run time: 0.008 seconds
1000	/Users/jkitchin/blogofile-jkitchin.github.com/media/2014-02-19-Extracting-bibtex-file-from-an-org-buffer/notes.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_site/media/2014-02-19-Extracting-bibtex-file-from-an-org-buffer/notes.org
1000	/Users/jkitchin/blogofile-jkitchin.github.com/_deploy/media/2014-02-19-Extracting-bibtex-file-from-an-org-buffer/notes.org
.

Super nice.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Pyparsing meets Emacs to find chemical formulas

| categories: emacs, python | tags:

see the video: https://www.youtube.com/watch?v=sjxS9m8QCoo

Today we expand the concepts of clickable text and merge an idea from Python with Emacs. Here we will use Python to find chemical formulas in the buffer, and then highlight them with Emacs. We will use pyparsing to find the chemical formulas and then use them to create a pattern for button-lock. I chose this approach because regular expressions are hard to use on the most general kinds of chemical formulas, and a (possibly recursive) parser should be better equipped to handle this. I adapted an example grammar to match simple chemical formulas, i.e. ones that do not have any parentheses, or charges different than + or -. I think something like this could be done in Emacs, but I am not as familiar with this kind of parsing in Emacs.

Basically, we treat a formula as a group of one or more Elements that have an optional number following them. Spoiler alert: This mostly works, but in the end I conclude there is a clear benefit to a markup language for chemical formulas. Here is an example usage of a parser:

# adapted from [[https://pyparsing.wikispaces.com/file/view/chemicalFormulas.py/31041705/chemicalFormulas.py]]

from pyparsing import *

element = oneOf( """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No""" )

integer = Word(nums)
elementRef = Group(element + Optional(integer))
chemicalFormula = (WordStart(alphas.upper())
                   + OneOrMore(elementRef).leaveWhitespace()
                   + Optional(Or([Literal("-"),
                                  Literal("+")]))
                   + WordEnd(alphas + nums + "-+"))


s = '''Water is  H2O or OH2  not h2O, methane is CH4 and of course there is PtCl4.
What about H+ and OH-? and carbon or Carbon or H2SO4?

Is this C6H6? or C2H5OH?

and a lot of elements:
H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No'''

matches = []
for match, start, stop in chemicalFormula.scanString(s):
   matches.append(s[start:stop])

print sorted(matches, key=lambda x: len(x), reverse=True)
['C2H5OH', 'PtCl4', 'H2SO4', 'C6H6', 'H2O', 'OH2', 'CH4', 'OH-', 'Uub', 'Uut', 'Uuq', 'Uup', 'Uuh', 'Uus', 'Uuo', 'H+', 'He', 'Li', 'Be', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'Cl', 'Ar', 'Ca', 'Sc', 'Ti', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'Xe', 'Cs', 'Ba', 'Lu', 'Hf', 'Ta', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Lr', 'Rf', 'Db', 'Sg', 'Bh', 'Hs', 'Mt', 'Ds', 'Rg', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Ac', 'Th', 'Pa', 'Np', 'Pu', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'O', 'H', 'B', 'C', 'N', 'O', 'F', 'P', 'S', 'K', 'V', 'Y', 'I', 'W', 'U']

That is pretty good. If the string was actually our buffer, we could use those to create a regexp to put text-properties on them. The trick is how to get the buffer string to the Python function, and then get back usable information in lisp. We actually explored this before ! Rather than use that, we will just create the lisp output manually since this is a simple list of strings.

The first thing we should do is work out a Python script that will output the lisp results we want, which are the found formulas (I tried getting the start and stop positions, but I don't think they map onto the buffer positions very well). Here it is. We set it up as a command line tool that takes a string. We use set to get a unique list, then sort the list by length so we try matching the longest patterns first. There are a few subtle differences in this script and the example above because of some odd false hits I unsuccessfully tried to get rid of.

import sys
from pyparsing import *

element_string =  """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No"""
element = oneOf([x for x in element_string.split()])

integer = Word(nums)
elementRef = Group(element + Optional(integer))
chemicalFormula = (WordStart(alphas.upper()).leaveWhitespace()
                   + OneOrMore(elementRef).leaveWhitespace()
                   + Optional(Or([Literal("-"),
                                  Literal("+")])).leaveWhitespace()
                   + WordEnd(alphas + alphas.lower() + nums + "-+").leaveWhitespace())

s = sys.stdin.read().strip()

matches = []
for match, start, stop in chemicalFormula.scanString(s):
   matches.append(s[start:stop])
matches = list(set(matches))
matches.sort(key=lambda x: len(x), reverse=True)

print "'(" + ' '.join(["\"{}\"".format(m) for m in matches]) + ')'

Now we can test this:

echo "Water is H2O, methane is CH4 and of course PtCl4, what about H+ and OH-? and carbon or Carbon. Water is H2O not h2o or mH2o, methane is CH4 and of course PtCl4, what about H+ and OH-? carbon, Carbon and SRC, or H2SO4? Is this C6H6? Ethanol is C2H5OH in a sentence.

 C2H5OH firs con

This is CH3OH

H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No
" | ./parse_chemical_formulas.py
'("C2H5OH" "CH3OH" "PtCl4" "H2SO4" "C6H6" "CH4" "OH-" "Uub" "Uuq" "Uup" "Uus" "Uuo" "Uuh" "H2O" "Uut" "Ru" "Re" "Rf" "Rg" "Ra" "Rb" "Rn" "Rh" "Be" "Ba" "Bh" "Bi" "Bk" "Br" "Ho" "Os" "Es" "Hg" "Ge" "Gd" "Ga" "Pr" "Pt" "Pu" "Pb" "Pa" "Pd" "Cd" "Po" "Pm" "Hs" "Hf" "He" "Md" "Mg" "Mo" "Mn" "Mt" "Zn" "H+" "Eu" "Zr" "Er" "Ni" "No" "Na" "Nb" "Nd" "Ne" "Np" "Fr" "Fe" "Fm" "Sr" "Kr" "Si" "Sn" "Sm" "Sc" "Sb" "Sg" "Se" "Co" "Cm" "Cl" "Ca" "Cf" "Ce" "Xe" "Tm" "Cs" "Cr" "Cu" "La" "Li" "Tl" "Lu" "Lr" "Th" "Ti" "Te" "Tb" "Tc" "Ta" "Yb" "Db" "Dy" "Ds" "Ac" "Ag" "Ir" "Am" "Al" "As" "Ar" "Au" "At" "In" "H" "P" "C" "K" "O" "S" "W" "B" "F" "N" "V" "I" "U" "Y")

That seems to work great. Now, we have a list of chemical formulas. Now, the Emacs side to call that function. We do not use regexp-opt here because I found it optimizes too much, and doesn't always match the formulas. We want explicit matches on each formula.

(defun shell-command-on-region-to-string (start end command)
  (with-output-to-string
    (shell-command-on-region start end command standard-output)))

(read (shell-command-on-region-to-string
        (point-min) (point-max)
        "./parse_chemical_formulas.py"))
quote (C2H5OH ext; t CH3OH PtCl4 H2SO4 the fir C6H6 CH4 OH- OH2 Uub co Uuq Uup Uus Uuo Uuh ord H2O Uut Ru Re Rf Rg Ra Rb Rn Rh Be Ba Bh Bi Bk Br Ho Os Es Hg Ge Gd Ga Pr t Pt Pu Pb Pa Pd Cd Po Pm Hs Hf He Md Mg Mo Mn Mt Zn H+ Eu Zr Er Ni No Na Nb Nd Ne Np Fr Fe Fm Sr Kr Si Sn Sm Sc Sb Sg Se Co Cm Cl Ca Cf Ce Xe Tm Cs Cr Cu La Li Tl Lu Lr Th Ti Te Tb Tc as Ta Yb Db Dy Ds In Ac Ag Ir Am Al As Ar Au At n H P l t C r K O S W w B F N V I U Y e i)

That is certainly less than perfect, you can see a few false hits that are not too easy to understand, e.g. why is "fir" or "the " or "as" in the list? They don't even start with an uppercase letter. One day maybe I will figure it out. I assume it is a logic flaw in my parser. Until then, let's go ahead and make the text functional, so it looks up the formula in the NIST webbook. The regexp is a little funny, we have to add word-boundaries to each formula to avoid some funny, bad matches.

(defvar chemical-formula-button nil "store button for removal later.")

(require 'nist-webbook)
(setq chemical-formula-button
      (button-lock-set-button
       (mapconcat
        (lambda (formula)
          (concat "\\<" (regexp-quote formula) "\\>"))
        (eval (read (shell-command-on-region-to-string
                     (point-min) (point-max)
                     "./parse_chemical_formulas.py")))
        "\\|")
       (lambda () (interactive)
         (nist-webbook-formula
          (get-surrounding-text-with-property
           'chemical-formula)))
       :face '((:underline t) (:background "gray80"))
       :help-echo "A chemical formula"
       :additional-property 'chemical-formula))

Here are a few tests: CH4, C2H5OH, C6H6. C(CH3)4. C6H6 is benzene. As you can see our pattern lacks context; the first word of the sentence is "as" not the symbol for arsenic. Also, our parser does not consider formulas with parentheses in them. Whenever I refer to myself, I mean myself, and not the element iodine. There are a few weird matchs I just don't understand, like firs d t x rn lac? These do not seem to match anything, and I wonder how they are getting in the list. I think this really shows that it would be useful to use some light markup for chemical formulas which would a) provide context, and b) enhance parsing accuracy. In LaTeX you would use \ce{I} to indicate that is iodine, and not a reference to myself. That is more clear than saying I use I in chemical reactions ;) And it also clarifies sentences like the letter W is used to represent tungsten as the symbol \ce{W}.

Nevertheless, we can click on the formulas, and get something to happen that is potentially useful. Is this actually useful? Conceptually yes, I think it could be, but clearly the parsing is not recognizing formulas perfectly. Sending the buffer to a dedicated program that can return a list of matches to highlight in Emacs is a good idea, especially if it is not easy to build in Emacs, or if a proven solution already exists.

Finally, we can remove the highlighted text like this. That was the reason for saving the button earlier!

(when chemical-formula-button
  (button-lock-unset-button chemical-formula-button)
  (setq chemical-formula-button nil))

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Spoken translations in Emacs

| categories: speech, emacs | tags:

Finally, continuing our experiments with computer speech for fun, let us try a translation of text to another language that is then spoken. Here is a free translator that has the courtesy to reply with json with the translated text in it. http://mymemory.translated.net/api/get?q=Hello%20World!&langpair=en|de I had to download a German voice called Anna, then get some translated text.

As with previous posts, there is a video: https://www.youtube.com/watch?v=8CBKnahE0ak . I am trying ScreenFlow for these (instead of Camtasia), and I still have not quite mastered the aspect ratio, so the videos still look a little odd.

As a reminder, we have this easy way to speak text in applescript. If you are on Linux, check out Festival and on windows you may find some inspiration here .

(do-applescript "say \"Hello. My name is John. I am glad to meet you.\"")

You can retrieve json data of the translated text, and then we can use it in our word-speak function we previously developed. Here is an example in in German.

(let* ((words-voice "Anna")
       (text "Hello. My name is John. I am glad to meet you.")
       (url (format "http://mymemory.translated.net/api/get?q=%s!&langpair=en|de"
                    text))
       (json (with-current-buffer
                 (url-retrieve-synchronously url)
               (json-read-from-string
                (buffer-substring url-http-end-of-headers (point-max)))))
       (translated-text (cdr (assoc 'translatedText (cdr (assoc 'responseData json))))))
  (words-speak translated-text)
  translated-text)
"Hallo. Mein Name ist John. Ich freue mich, Sie kennen zu lernen.!"

How about Chinese? Again, I downloaded a Chinese voice called "Ting-Ting".

(let* ((words-voice "Ting-Ting")
       (text "Hello. My name is John. I am glad to meet you.")
       (url (format "http://mymemory.translated.net/api/get?q=%s!&langpair=en|zh"
                    text))
       (json (with-current-buffer
                 (url-retrieve-synchronously url)
               (json-read-from-string
                (buffer-substring url-http-end-of-headers (point-max)))))
       (translated-text (cdr (assoc 'translatedText (cdr (assoc 'responseData json))))))
  (words-speak translated-text)
  translated-text)
"你好。我的名字是约翰。我很高兴见到你。!"

So, can any Chinese readers and listeners confirm if the text translates correctly, and if Ting-Ting said it correctly? Hopefully it is good enough to make some sense and be useful!

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Get spoken definitions from the Meriam dictionary

| categories: emacs | tags:

Now that I can get Emacs to speak words , here is a new application of the idea. We use it to speak the definition of the word at point. We look up the definition here: http://www.dictionaryapi.com/account/index.htm

You may want to head straight to the video to see how this works here: https://www.youtube.com/watch?v=m529gXMrXZA

I had to get an API key for this. I suppose this key should be secret, but it could only be secure by obscurity in any kind of webapp and I don't anticipate using this much so here are the keys I got for the dictionary and thesaurus.

Key (Dictionary): 64f0950a-03b9-4315-9ba5-a73a964251ed Key (Thesaurus): ff0e39e2-b31f-4f17-833c-24e2875aad5d

(with-current-buffer
    (url-retrieve-synchronously
     (format
      "http://www.dictionaryapi.com/api/v1/references/collegiate/xml/%s?key=%s"
      "synchronous"
      "64f0950a-03b9-4315-9ba5-a73a964251ed"))
  (buffer-substring url-http-end-of-headers (point-max)))
<?xml version="1.0" encoding="utf-8" ?>
<entry_list version="1.0">

: <entry id="synchronous"><ew>synchronous</ew><subj>AE-4b#CP-5#TL-5</subj><hw>syn*chro*nous</hw><sound><wav>synchr14.wav</wav><wpr>!siN-kru-nus</wpr></sound><pr>ˈsiŋ-krə-nəs, ˈsin-</pr><fl>adjective</fl><et>Late Latin <it>synchronos,</it> from Greek, from <it>syn-</it> + <it>chronos</it> time</et><def><date>1669</date> <sn>1</sn> <dt>:happening, existing, or arising at precisely the same time</dt> <sn>2</sn> <dt>:recurring or operating at exactly the same periods</dt> <sn>3</sn> <dt>:involving or indicating <fw>synchronism</fw></dt> <sn>4 a</sn> <dt>:having the same period</dt> <sd>also</sd> <dt>:having the same period and phase</dt> <sn>b</sn> <dt>:<sx>geostationary</sx></dt> <sn>5</sn> <dt>:of, used in, or being digital communication (as between computers) in which a common timing signal is established that dictates when individual bits can be transmitted and which allows for very high rates of data transfer</dt><ss>contemporary</ss></def><uro><ure>syn*chro*nous*ly</ure> <fl>adverb</fl></uro><uro><ure>syn*chro*nous*ness</ure> <fl>noun</fl></uro></entry> : <entry id="synchronous motor"><ew>synchronous motor</ew><subj>ME#EE</subj><hw>synchronous motor</hw><fl>noun</fl><def><date>1897</date><dt>:an electric motor having a speed strictly proportional to the frequency of the operating current</dt></def></entry>

</entry_list>

The idea is to query the url, get some xml back, and collect the definitions from it. Then, construct a string of the word, the number of definitions, then the definitions, and say it.

(defun speak-definition ()
  (interactive)
  (let* ((keyword (thing-at-point 'word))
         (api-key "64f0950a-03b9-4315-9ba5-a73a964251ed")
         (xml (with-current-buffer
                  (url-retrieve-synchronously
                   (format
                    "http://www.dictionaryapi.com/api/v1/references/collegiate/xml/%s?key=%s"
                    keyword
                    api-key))
                (xml-parse-region url-http-end-of-headers (point-max))))
         (entries (xml-get-children (car xml) 'entry))
         (nentries (length entries))
         (defs (loop for entry in entries
                     collect (car (xml-get-children entry 'def))))
         (definition (format
                      "%s"
                      (concat
                       (format "%s has %s definition%s. "
                               keyword
                               nentries
                               (if (or (= 0 nentries)
                                       (> nentries 1))
                                   "s"
                                 ""))
                       (mapconcat
                        'identity
                        (loop for element in
                              (loop for def in defs
                                    collect (car (xml-get-children def 'dt)))
                              for i from 1
                              collect (format "%s %s" i (car (xml-node-children element))))
                        " ")))))
    (message definition)
    (do-applescript
     (format
      "say \"%s\"" definition))))
speak-definition

Let us try this out on a few words: asynchronous synchronous flibbity

I guess this would be helpful sometimes ;)

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter

Getting Emacs to read to me

| categories: emacs | tags:

I thought it would be interesting to have Emacs read text on the screen. Why? Sometimes I get tired of reading ;) Seriously though, this has applications in accessibility, learning to read, translation, taking a break from looking at the screen, reading emails out loud, fun and games, etc… Seems like a worthwhile endeavor!

You may want to see this video: https://www.youtube.com/watch?v=8bgS8yDSkXw to hear how it works.

On a Mac, it turns out to be easy to get a voice with a little applescript:

(do-applescript "say \"Hello John\" using \"Victoria\"")

Interesting idea to integrate some feedback into Emacs-lisp functions! at least if you are on a Mac. All we need are some interactive functions that grab text, and pass them to the applescript with an appropriate amount of escaping any quotes and backslashes.

Here is a function to speak the word at point, or selected region, or the text passed to the function:

(defvar words-voice "Vicki"
  "Mac voice to use for speaking.")

(defun words-speak (&optional text)
  "Speak word at point or region. Mac only."
  (interactive)
  (unless text
    (setq text (if (use-region-p)
                   (buffer-substring
                    (region-beginning) (region-end))
                 (thing-at-point 'word))))
  ;; escape some special applescript chars
  (setq text (replace-regexp-in-string "\\\\" "\\\\\\\\" text))
  (setq text (replace-regexp-in-string "\"" "\\\\\"" text))
  (do-applescript
   (format
    "say \"%s\" using \"%s\""
    text
    words-voice)))
words-speak

Now we can write:

(words-speak "Hello John")

One reason I wrote this is to read org-files to me. So, now we write some functions to read words, sentences and paragraphs. These are all syntactic units in Emacs. We write code to enable us to read the next or previous units with the prefix args. Finally, we bind the commands to some keys and a hydra for fun.

(setq sentence-end-double-space nil)

(defun mac-say-word (&optional arg)
  "Speak word at point. With ARG, go forward ARG words."
  (interactive "P")
  ;; arg can be (4), 4, "-", or -1. we handle these like this.
  (let ((newarg))
    (when arg
      (setq newarg (cond
                    ((listp arg)
                     (round (log (car arg) 4)))
                    ((and (stringp arg) (string= "-" arg))
                     ((< 0 arg) arg)
                     -1)
                    (t arg)))
      (forward-word newarg))
    (when (thing-at-point 'word)
      (words-speak (thing-at-point 'word)))))

(defun mac-say-sentence (&optional arg)
  "Speak sentence at point. With ARG, go forward ARG sentences."
  (interactive "P")
  ;; arg can be (4), 4, "-", or -1. we handle these like this.
  (let ((newarg))
    (when arg
      (setq newarg (cond
                    ((listp arg)
                     (round (log (car arg) 4)))
                    ((and (stringp arg) (string= "-" arg))
                     ((< 0 arg) arg)
                     -1)
                    (t arg)))
      (forward-sentence newarg)
      (when (< 0 newarg) (forward-word)))
    (when (thing-at-point 'sentence)
      (words-speak (thing-at-point 'sentence)))))

(defun mac-say-paragraph (&optional arg)
  "Speak paragraph at point. With ARG, go forward ARG paragraphs."
  (interactive "P")
  ;; arg can be (4), 4, "-", or -1. we handle these like this.
  (let ((newarg))
    (when arg
      (setq newarg (cond
                    ((listp arg)
                     (round (log (car arg) 4)))
                    ((and (stringp arg) (string= "-" arg))
                     ((< 0 arg) arg)
                     -1)
                    (t arg)))
      (forward-paragraph newarg)
      (when (< 0 newarg) (forward-word)))
    (when (thing-at-point 'paragraph)
      (words-speak (thing-at-point 'paragraph)))))
mac-say-paragraph

Now for some key-bindings. I will make a hydra that allows repeating commands, and a keymap for more direct function calls.

(defhydra mac-speak (:color red)
  "word speak"
  ("w" (progn (mac-say-word) (forward-word)) "Next word")
  ("W" (mac-say-word -1) "Previous word")
  ("s" (progn (mac-say-sentence) (forward-sentence)(forward-word)) "Next sentence")
  ("S" (mac-say-sentence -1) "Previous sentence")
  ("p" (progn (mac-say-paragraph) (forward-paragraph)) "Next paragraph")
  ("P" (mac-say-paragraph -1) "Previous paragraph"))

(define-prefix-command 'mac-speak-keymap)
(define-key mac-speak-keymap (vector ?w) 'mac-say-word)
(define-key mac-speak-keymap (vector ?s) 'mac-say-sentence)
(define-key mac-speak-keymap (vector ?p) 'mac-say-paragraph)
(define-key mac-speak-keymap (vector ?h) 'mac-speak/body)
(global-set-key (kbd "\C-xr") 'mac-speak-keymap)
mac-speak-keymap

Now, I can navigate text and have my Mac read it to me. It isn't quite like hearing a real person read it, but it is not too bad either. When you need a break from reading, this might be a nice tool!

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

Discuss on Twitter
« Previous Page -- Next Page »