## Exporting org-mode to Jupyter notebooks


I am going to use Jupyter notebooks to teach from this semester. I really dislike preparing notebooks though. A browser is a really poor editor, and I really dislike Markdown. Notebooks do not seem to have any real structure in them, e.g. the collapsible outline that I am used to in org-mode, so for long notebooks, it is difficult to get a sense for the structure. I am anticipating spending up to 80 hours preparing notebooks this semester, so today I worked out some code to export org-mode to an ipython notebook!

This will let me use the power tools I am accustomed to for the creation of IPython notebooks for my students, and perhaps others who do not use org-mode.

Jupyter notebooks are just JSON files, so all we need to do is generate one from an org document. The basic strategy was to build up a lisp data structure that represents the notebook and then convert that data structure to JSON. I split the document into sequential markdown and code cells, and then encode those in the JSON format the notebook requires.
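The same strategy can be sketched in Python. This is only an illustration, not the Emacs Lisp in ox-ipynb.el: it assumes the nbformat 4 layout, and the helper names `markdown_cell`, `code_cell` and `make_notebook` are made up for this example.

```python
import json


def markdown_cell(text):
    # A markdown cell needs its type, a metadata dict, and its source lines.
    return {"cell_type": "markdown", "metadata": {},
            "source": text.splitlines(keepends=True)}


def code_cell(code):
    # A code cell that has not been run yet: no outputs, no execution count.
    return {"cell_type": "code", "execution_count": None, "metadata": {},
            "outputs": [], "source": code.splitlines(keepends=True)}


def make_notebook(cells):
    # Top-level structure of an nbformat 4 notebook.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


notebook = make_notebook([
    markdown_cell("# Solve a nonlinear problem\nFind x such that x**2 = 4."),
    code_cell("def objective(x):\n    return x**2 - 4"),
])

# Converting the data structure to JSON gives the contents of the .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Writing `notebook_json` to a file with an `.ipynb` extension should give something Jupyter can open, although a real exporter also has to deal with markup conversion and cell metadata.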

So, here is an example of what can be easily written in org-mode, posted to this blog, and exported to an IPython notebook, all from one org-document.

Check out the notebook: exporting-orgmode-to-ipynb.ipynb .

## 1 Solve a nonlinear problem

Consider the equation $$x^2 = 4$$. Find a solution to it in Python using a nonlinear solver.

To do that, we need to define an objective function that will be equal to zero at the solution. Here is the function:

def objective(x):
    return x**2 - 4


Next, we use fsolve with an initial guess. We get fsolve from scipy.optimize.

from scipy.optimize import fsolve

ans = fsolve(objective, 3)
print(ans)

[ 2.]


That should have been an obvious answer. The answer is in brackets because fsolve returns an array. In the next block we will unpack the solution into the answer using the comma operator. Also, we can see that using a different guess leads to a different answer. There are, of course, two answers: $$x = \pm 2$$

ans, = fsolve(objective, -3)
print(ans)

-2.0


Now you see we get a float answer!

Here are some other ways to get a float:

ans = fsolve(objective, -3)

print(float(ans))
print(ans[0])

-2.0000000000000084
-2.0


It is worth noting from the first result that fsolve is iterative and stops when it reaches zero within a tolerance. That is why it is not exactly -2.
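To illustrate what "stops when it reaches zero within a tolerance" means, here is a hand-written Newton iteration. It is a sketch of the general idea only: fsolve wraps a different algorithm from MINPACK, and the `newton` helper with its `tol` and `max_iter` parameters is an assumption made for this example.

```python
def objective(x):
    return x**2 - 4


def newton(f, dfdx, x0, tol=1e-8, max_iter=50):
    """Iterate until |f(x)| < tol and return (x, iterations used)."""
    x = x0
    for iteration in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            # Close enough to zero: stop, even though x is not exactly the root.
            return x, iteration
        x = x - fx / dfdx(x)
    raise RuntimeError("did not converge within max_iter iterations")


root, iterations = newton(objective, lambda x: 2 * x, -3.0)
print(root, iterations, objective(root))
```

The returned value satisfies the tolerance on the objective, which is why an iterative solver can report a number that is very close to, but not exactly, -2.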

## 2 Benefits of export to ipynb

1. I can use org-mode
2. And emacs
3. and ipynb for teaching.

The export supports org-markup: bold, italic, underlined, and ~~strike~~.

We can use tables:

Table 1: A table of squares.

| x | y  |
|---|----|
| 1 | 1  |
| 2 | 4  |
| 3 | 9  |
| 4 | 16 |

We can make plots.

import numpy as np

t = np.linspace(0, 2 * np.pi)

x = np.cos(t)
y = np.sin(t)

import matplotlib.pyplot as plt
plt.plot(x, y)
plt.axis('equal')
plt.xlabel('x')
plt.ylabel('y')
plt.savefig('circle.png')


Even include HTML: <font color="red">Pay special attention to the axis labels!</font>

## 3 Limitations

• Only supports iPython blocks
• Does not do inline images in results
• Will not support src-block variables
• Currently only supports vanilla output results

## 4 Summary

The code that does this is here: ox-ipynb.el . After I use it a while I will put it in scimax. There are some tricks in it to fix up some markdown export of latex fragments and links with no descriptions.

I just run this command in Emacs to get the notebook. It even renders reasonably in the notebook.

(export-ipynb-buffer)


Overall, this looks extremely promising for developing lecture notes and assignments in org-mode, and then exporting them to IPython notebooks for the students.

org-mode source

Org-mode version = 9.0.3

## Find stuff in org-mode anywhere


I use org-mode extensively. I write scientific papers, keep notes on meetings, write letters of recommendation, notes on scientific articles, keep TODO lists in projects, help files for software, write lecture notes, students send me homework solutions in it, it is a contact database, … Some files are on Dropbox, Google Drive, Box, some in git repos, etc. The problem is that leads to org-files everywhere on my hard drive. At this point I have several thousand org-files that span about five years of work.

It is not that easy after a while to find them. Yes there are things like recent-files, bookmarks, counsel-find-file, helm-for-files, counsel/helm-locate, helm/counsel-grep/ag/pt, projectile for searching within a project, a slew of tools to search open buffers, there is recoll, etc… There are desktop search tools, and of course, good organization habits. Over a five year time span though, these change, and I have yet to find a solution to finding what I want. What about a file I made a year ago that is not in the current directory or this project, and not in my org-agenda-files list? How do I get a dynamic todo list across all these files? Or find all the files that cite a particular bibtex entry, or that were authored by a particular student?

Previously, I indexed org files with Swish-e to make it easy to search them, with an ability to search just headlines, or paragraphs, etc. The problem with that is the nightly indexing was slow since I basically had to regenerate the database each time due to limitations in Swish-e. Finally I have gotten around to the next iteration of this idea, which is a better database. In this post, I explore using sqlite to store headlines and links in org-files.

The idea is that anytime I open or save any org file, it will be added/updated in the database. The database will store the headlines and its properties and content, as well as the location and properties of all links and file keywords. That means I should be able to efficiently query all org files I have ever visited to find TODO headlines, tagged headlines, different types of links, etc. Here we try it out and see if it is useful.

## 1 The database design

I used emacsql to create and interact with a sqlite3 database. It is a lispy way to generate SQL queries. I will not talk about the code much here; you can see this version in org-db.el. The database design consists of several tables that contain the filenames, headlines, tags, properties, (optionally) headline-content, headline-tags, headline-properties, and links. The lisp code is a work in progress, and not something I use on a daily basis yet. This post is a proof of concept to see how well this approach works.

I use hooks to update the database when an org-file is opened (only if it is different than what is in the database based on an md5 hash) and when it is saved. Basically, these functions delete the current entries in the database for a file, then use regular expressions to go to each headline or link in the file, and add data back to the database. I found this to be faster than parsing the org-file with org-element especially for large files. Since this is all done by a hook, anytime I open an org-file anywhere it gets added/updated to the database. The performance of this is ok. This approach will not guarantee the database is 100% accurate all the time (e.g. if something modifies the file outside of emacs, like a git pull), but it doesn't need to be. Most of the files do not change often, the database gets updated each time you open a file, and you can always reindex the database from files it knows about. Time will tell how often that seems necessary.
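The update step described above (skip unchanged files using an md5 hash, otherwise delete the old rows and re-add them using regular expressions) can be sketched in Python with the standard sqlite3 module. This is not the code in org-db.el: the simplified schema, the two regular expressions and the function names are assumptions made for illustration.

```python
import hashlib
import re
import sqlite3

# Simplified patterns (assumptions): stars, optional TODO keyword, then title.
HEADLINE_RE = re.compile(r"^(\*+) +(?:(TODO|DONE) +)?(.*)$", re.MULTILINE)
# Only a few link types are recognized in this sketch.
LINK_RE = re.compile(r"\b(cite|file|https?):([^\s\]]+)")


def create_tables(conn):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS files (rowid INTEGER PRIMARY KEY, filename, md5);
        CREATE TABLE IF NOT EXISTS headlines (rowid INTEGER PRIMARY KEY,
            filename_id, title, level, todo_keyword, "begin");
        CREATE TABLE IF NOT EXISTS links (rowid INTEGER PRIMARY KEY,
            filename_id, type, path, "begin");
    """)


def index_org_text(conn, filename, text):
    """Index TEXT under FILENAME. Return False if the md5 hash is unchanged."""
    md5 = hashlib.md5(text.encode("utf-8")).hexdigest()
    row = conn.execute("SELECT rowid, md5 FROM files WHERE filename = ?",
                       (filename,)).fetchone()
    if row is not None and row[1] == md5:
        return False  # nothing changed since the last indexing
    if row is not None:
        file_id = row[0]
        # Delete the current entries for this file before adding them back.
        conn.execute("DELETE FROM headlines WHERE filename_id = ?", (file_id,))
        conn.execute("DELETE FROM links WHERE filename_id = ?", (file_id,))
        conn.execute("UPDATE files SET md5 = ? WHERE rowid = ?", (md5, file_id))
    else:
        file_id = conn.execute("INSERT INTO files (filename, md5) VALUES (?, ?)",
                               (filename, md5)).lastrowid
    for m in HEADLINE_RE.finditer(text):
        conn.execute('INSERT INTO headlines (filename_id, title, level, '
                     'todo_keyword, "begin") VALUES (?, ?, ?, ?, ?)',
                     (file_id, m.group(3), len(m.group(1)), m.group(2), m.start()))
    for m in LINK_RE.finditer(text):
        conn.execute('INSERT INTO links (filename_id, type, path, "begin") '
                     'VALUES (?, ?, ?, ?)',
                     (file_id, m.group(1), m.group(2), m.start()))
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
create_tables(conn)
sample = "* TODO Write report\nSee cite:kitchin-2015-examp\n** Notes\n"
print(index_org_text(conn, "sample.org", sample))  # first visit: indexed
print(index_org_text(conn, "sample.org", sample))  # unchanged: skipped
```

In Emacs the equivalent of `index_org_text` would be called from the open and save hooks; here it is called by hand on a sample string.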

emacsql lets you use lisp code to generate SQL that is sent to the database. Here is an example:

(emacsql-flatten-sql [:select [name] :from main:sqlite_master :where (= type table)])

SELECT name FROM main.sqlite_master WHERE type = "table";


There are some nuances, for example, main:sqlite_master gets converted to main.sqlite_master. You use vectors, keywords, and sexps to set up the command. emacsql will turn a name like filename-id into filename_id. It was not too difficult to figure out, and the author of emacsql was really helpful on a few points. I will be referring to this post in the future to remember some of these nuances!
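As a memory aid, the two renamings mentioned above can be mimicked in a couple of lines of Python. This is purely illustrative and is not how emacsql is implemented; `to_sql_identifier` is a made-up helper.

```python
def to_sql_identifier(name):
    """Mimic the renamings described above: ':' becomes '.', '-' becomes '_'."""
    return name.replace(":", ".").replace("-", "_")


print(to_sql_identifier("main:sqlite_master"))
print(to_sql_identifier("headlines:filename-id"))
```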

Here is a list of tables in the database. There are a few primary tables, and then some that store tags, properties, and keywords on the headlines. This is typical of emacsql code; it is a lisp expression that generates SQL. In this next expression org-db is a variable that stores the database connection created in org-db.el.

(emacsql org-db [:select [name] :from main:sqlite_master :where (= type table)])


Here is a description of the columns in the files table:

(emacsql org-db [:pragma (funcall table_info files)])

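| cid | name     | type    | notnull | dflt_value | pk |
|-----|----------|---------|---------|------------|----|
| 0   | rowid    | INTEGER | 0       | nil        | 1  |
| 1   | filename |         | 0       | nil        | 0  |
| 2   | md5      |         | 0       | nil        | 0  |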

(emacsql org-db [:pragma (funcall table_info headlines)])

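| cid | name               | type    | notnull | dflt_value | pk |
|-----|--------------------|---------|---------|------------|----|
| 0   | rowid              | INTEGER | 0       | nil        | 1  |
| 1   | filename_id        |         | 0       | nil        | 0  |
| 2   | title              |         | 0       | nil        | 0  |
| 3   | level              |         | 0       | nil        | 0  |
| 4   | todo_keyword       |         | 0       | nil        | 0  |
| 5   | todo_type          |         | 0       | nil        | 0  |
| 6   | archivedp          |         | 0       | nil        | 0  |
| 7   | commentedp         |         | 0       | nil        | 0  |
| 8   | footnote_section_p |         | 0       | nil        | 0  |
| 9   | begin              |         | 0       | nil        | 0  |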

The database is not large if all it has is headlines and links (no content). It got up to half a GB with content, and seemed a little slow, so for this post I leave the content out.

du -hs ~/org-db/org-db.sqlite

 56M /Users/jkitchin/org-db/org-db.sqlite

Here we count how many files are in the database. These are just the org-files in my Dropbox folder. There are a lot of them! If I include all the org-files from my research and teaching projects this number grows to about 10,000! You do not want to run org-map-entries on that. Note this also includes all of the org_archive files.

(emacsql org-db [:select (funcall count) :from files])

 1569

Here is the headlines count. You can see there is no chance of remembering where these are because there are so many!

(emacsql org-db [:select (funcall count) :from headlines])

 38587

(emacsql org-db [:select (funcall count) :from links])

 303739

That is a surprising number of links.

## 2 Querying the link table

Let's see how many cite links from org-ref there are.

(emacsql org-db [:select (funcall count) :from links :where (= type "cite")])

 14766

Wow, I find that to also be surprisingly large! I make a living writing proposals and scientific papers, and I wrote org-ref to make that easier, so maybe it should not be so surprising. We can search the link database for files containing citations of "kitchin-2015-examp" like this. The links table only stores the filename-id, so we join it with the files table to get useful information. Here we show the list of files that contain a citation of that reference. It is a mix of manuscripts, proposals, presentations, documentation files and notes.

(emacsql org-db [:select :distinct [files:filename]
                 :from links :inner :join files :on (= links:filename-id files:rowid)
                 :where (and (= type "cite") (like path "%kitchin-2015-examp%"))])


Obviously we could use this to generate candidates for something like helm or ivy like this.

(ivy-read "Open: " (emacsql org-db [:select [files:filename links:begin]
                                    :from links :inner :join files :on (= links:filename-id files:rowid)
                                    :where (and (= type "cite") (like path "%kitchin-2015-examp%"))])
          :action '(1 ("o"
                       (lambda (c)
                         (find-file (car c))
                         (goto-char (nth 1 c))
                         (org-show-entry)))))

/Users/jkitchin/Dropbox/CMU/manuscripts/2015/human-readable-data/manuscript.org


Now, you can find every org-file containing any bibtex key as a citation. Since SQL is the query language, you should be able to build really sophisticated queries that combine filters for multiple citations, different kinds of citations, etc.
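As a sketch of what such a combined query might look like, here is plain SQL run through Python's sqlite3 module that finds files citing two keys at once. The tiny schema follows the links and files tables described in this post, but the sample rows, the file names and the second key `doe-2016-sample` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (rowid INTEGER PRIMARY KEY, filename);
    CREATE TABLE links (rowid INTEGER PRIMARY KEY, filename_id, type, path);

    -- Invented sample data.
    INSERT INTO files (rowid, filename) VALUES (1, 'paper.org'), (2, 'notes.org');
    INSERT INTO links (filename_id, type, path) VALUES
        (1, 'cite', 'kitchin-2015-examp'),
        (1, 'cite', 'doe-2016-sample'),
        (2, 'cite', 'kitchin-2015-examp'),
        (2, 'file', 'figure.png');
""")

# Files that cite the first key, intersected with files that cite the second.
BOTH_KEYS = """
    SELECT files.filename FROM links
    INNER JOIN files ON links.filename_id = files.rowid
    WHERE links.type = 'cite' AND links.path LIKE ?
    INTERSECT
    SELECT files.filename FROM links
    INNER JOIN files ON links.filename_id = files.rowid
    WHERE links.type = 'cite' AND links.path LIKE ?
"""

rows = conn.execute(BOTH_KEYS,
                    ("%kitchin-2015-examp%", "%doe-2016-sample%")).fetchall()
print(rows)
```

Only `paper.org` cites both keys in the sample data, so it is the only file returned. Swapping `INTERSECT` for `UNION` would give files that cite either key.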

## 3 Headline queries

Every headline is stored, along with its location, tags and properties. We can use the database to find headlines with particular tags or properties. You can see here I have 293 tags in the database.

(emacsql org-db [:select (funcall count) :from tags])

 293

Here we find headlines tagged with electrolyte. I tagged some papers I read with this at some point.

(emacsql org-db [:select :distinct [files:filename headlines:title]
:inner :join tags :on (= tags:rowid headline-tags:tag-id)
:inner :join files :on (= headlines:filename-id files:rowid)
:where (= tags:tag "electrolyte") :limit 5])

| filename | title |
|----------|-------|
| /Users/jkitchin/Dropbox/org-mode/prj-doe-early-career.org | 2010 - Nickel-borate oxygen-evolving catalyst that functions under benign conditions |
| /Users/jkitchin/Dropbox/bibliography/notes.org | 1971 - A Correlation of the Solution Properties and the Electrochemical Behavior of the Nickel Hydroxide Electrode in Binary Aqueous Alkali Hydroxides |
| /Users/jkitchin/Dropbox/bibliography/notes.org | 1981 - Studies concerning charged nickel hydroxide electrodes IV. Reversible potentials in LiOH, NaOH, RbOH and CsOH |
| /Users/jkitchin/Dropbox/bibliography/notes.org | 1986 - The effect of lithium in preventing iron poisoning in the nickel hydroxide electrode |
| /Users/jkitchin/Dropbox/bibliography/notes.org | 1996 - The role of lithium in preventing the detrimental effect of iron on alkaline battery nickel hydroxide electrode: A mechanistic aspect |
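The tag lookup is a many-to-many join: headlines and tags live in their own tables, and a headline-tags table links them. Here is a sketch of that join in plain SQL through Python's sqlite3 module. The column name `headline_id` is an assumption (only `tag_id` appears in the query above), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (rowid INTEGER PRIMARY KEY, filename);
    CREATE TABLE headlines (rowid INTEGER PRIMARY KEY, filename_id, title);
    CREATE TABLE tags (rowid INTEGER PRIMARY KEY, tag);
    -- headline_id is an assumed name for the column pointing back to headlines.
    CREATE TABLE headline_tags (headline_id, tag_id);

    -- Invented sample data.
    INSERT INTO files VALUES (1, 'notes.org');
    INSERT INTO headlines VALUES (1, 1, 'A paper about electrolytes'),
                                 (2, 1, 'An untagged headline');
    INSERT INTO tags VALUES (1, 'electrolyte');
    INSERT INTO headline_tags VALUES (1, 1);
""")

TAGGED = """
    SELECT DISTINCT files.filename, headlines.title
    FROM headlines
    INNER JOIN headline_tags ON headlines.rowid = headline_tags.headline_id
    INNER JOIN tags ON tags.rowid = headline_tags.tag_id
    INNER JOIN files ON headlines.filename_id = files.rowid
    WHERE tags.tag = ?
    LIMIT 5
"""

rows = conn.execute(TAGGED, ("electrolyte",)).fetchall()
print(rows)
```

Only the headline linked to the tag through `headline_tags` comes back; the untagged headline in the same file is filtered out by the inner joins.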

Here we see how many entries have an EMAIL property. These could serve as contacts to send email to.

(emacsql org-db [:select [(funcall count)] :from
:inner :join properties :on (= properties:rowid headline-properties:property-id)
:where (and (= properties:property "EMAIL") (not (null headline-properties:value)))])

 7452

If you want to see the ones that match "jkitchin", here they are.

(emacsql org-db [:select :distinct [headlines:title headline-properties:value] :from
:inner :join properties :on (= properties:rowid headline-properties:property-id)
:where (and (= properties:property "EMAIL") (like headline-properties:value "%jkitchin%"))])

| title         | value                   |
|---------------|-------------------------|
| John Kitchin  | jkitchin@andrew.cmu.edu |
| John Kitchin  | jkitchin@cmu.edu        |
| Kitchin, John | jkitchin@andrew.cmu.edu |

Here is a query to find the number of headlines where the deadline matches 2017. Looks like I am already busy!

(emacsql org-db [:select (funcall count) :from
:inner :join properties :on (= properties:rowid headline-properties:property-id)

 50

## 4 Keyword queries

We also store file keywords, so we can search on document titles, authors, etc. Here are five documents with titles longer than 35 characters sorted in descending order.

(emacsql org-db [:select :distinct [value] :from
file-keywords :inner :join keywords :on (= file-keywords:keyword-id keywords:rowid)
:where (and (> (funcall length value) 35) (= keywords:keyword "TITLE"))
:order :by value :desc
:limit 5])

| value |
|-------|
| pycse - Python3 Computations in Science and Engineering |
| org-show - simple presentations in org-mode |
| org-mode - A Human Readable, Machine Addressable Approach to Data Archiving and Sharing in Science and Engineering |
| modifying emacs to make typing easier. |
| jmax - John's customizations to maximize Emacs |

It is possible to search on AUTHOR, and others. My memos have a #+SUBJECT keyword, so I can find memos on a subject. They also use the LATEX_CLASS of cmu-memo, so I can find all of them easily too:

(emacsql org-db [:select [(funcall count)] :from
file-keywords :inner :join keywords :on (= file-keywords:keyword-id keywords:rowid)
:where (and (= value "cmu-memo") (= keywords:keyword "LATEX_CLASS"))
:limit 5])

 119

How about that, 119 memos… Still it sure is nice to be able to find them.

## 5 Full text search

In theory, the database has a table for the headline content, and it should be fully searchable. I found the database got a little sluggish, and nearly 1/2 a GB in size when using it so I am leaving it out for now.

## 6 Summary

The foundation for something really good is here. It is still a little tedious to write the queries with all the table joins, but some of that could be wrapped into a function for a query. I like the lispy style of the queries, although it can be tricky to map all the concepts onto SQL. A function that might wrap this could look like this:

(org-db-query (and (= properties:property "DEADLINE") (glob headline-properties:value "*2017*")))


This is what it would ideally look like using the org tag/property match syntax. Somehow that string would have to get expanded to generate the code above. I do not have a sense for how difficult that would be. It might not be hard with a recursive descent parser, written by the same author as emacsql.

(org-db-query "DEADLINE={2017}")
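To get a feel for the expansion step, here is a sketch in Python that handles only a single term of the form `PROP={regexp}` or `PROP="value"` and turns it into a parameterized SQL WHERE clause. It is nowhere near the full org match syntax; `match_to_where` and the underscore table names are assumptions for this example.

```python
import re

# One term only: a property name, '=', then {regexp} or "literal".
TERM_RE = re.compile(
    r'^(?P<prop>[A-Za-z_][A-Za-z0-9_-]*)='
    r'(?:\{(?P<regexp>.*)\}|"(?P<literal>.*)")$')


def match_to_where(term):
    """Return (where_clause, parameters) for a single property match term."""
    m = TERM_RE.match(term)
    if m is None:
        raise ValueError(f"unsupported match term: {term!r}")
    if m.group("regexp") is not None:
        # SQLite parses REGEXP but ships no implementation of it; one can be
        # registered from Python with sqlite3's Connection.create_function.
        clause = "properties.property = ? AND headline_properties.value REGEXP ?"
        return clause, [m.group("prop"), m.group("regexp")]
    clause = "properties.property = ? AND headline_properties.value = ?"
    return clause, [m.group("prop"), m.group("literal")]


print(match_to_where("DEADLINE={2017}"))
print(match_to_where('EMAIL="jkitchin@cmu.edu"'))
```

Combining terms with `+`, `-`, `|` and tags is where a real parser would be needed; this sketch raises `ValueError` for anything beyond a single property term.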


The performance is only ok. For large org files there is a noticeable lag in updating the database, which matters because Emacs is blocked while updating. I could try using an idle timer for updates with a queue, or get more clever about when to update. It is not essential that the updates be real-time, only that they are reasonably accurate or done by the time I next search. For now, it is not too annoying though. As a better database, I have had my eye on xapian since that is what mu4e (and notmuch) uses. It might be good to have an external library for parsing org-files, i.e. not through emacs, for this. It would certainly be faster. It seems like a big project though, maybe next summer ;)

Another feature this might benefit from is ignore patterns, or some file feature that prevents it from being indexed. For example, I keep an encrypted password file in org-mode, but as soon as I opened it, it got indexed right into the database, in plain text. If you walk your file system, it might make sense to avoid some directories, like .dropbox.cache. Otherwise, this still looks like a promising approach.
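An ignore check could be as simple as the following Python sketch. The patterns and directory names are examples only (`.dropbox.cache` comes from the paragraph above, the rest are assumptions), and `should_index` is a hypothetical helper, not something in org-db.el.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Example settings; a real setup would make these user options.
IGNORE_PATTERNS = ["*.gpg", "*passwords*.org"]
IGNORE_DIRECTORIES = {".dropbox.cache", ".git"}


def should_index(path):
    """Return False if PATH is in an ignored directory or matches a pattern."""
    p = PurePosixPath(path)
    # Skip the file if any parent directory is on the ignore list.
    if IGNORE_DIRECTORIES.intersection(p.parts[:-1]):
        return False
    # Skip the file if its name matches any ignore pattern.
    return not any(fnmatch(p.name, pattern) for pattern in IGNORE_PATTERNS)


for candidate in ["/home/user/org/notes.org",
                  "/home/user/Dropbox/.dropbox.cache/old/notes.org",
                  "/home/user/org/passwords.org"]:
    print(candidate, should_index(candidate))
```

The check would run before a file is added to the database, both in the open/save hooks and when walking the file system.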

org-mode source

Org-mode version = 9.0.3

## Context-specific org-mode speed keys


I have been using org-mode to make a contact database. A contact is basically just a headline with an EMAIL property, e.g. https://julien.danjou.info/projects/emacs-packages#org-contacts. I thought it would be nice to have an org-mode speed key so that if I was at the beginning of a contact headline, I could just press "e" to open an email buffer to that contact. This might generally be useful to have different speed keys that serve different purposes or are only defined on specific types of headlines.

Org-mode already had this feature in mind for speed keys. All you have to do is define the list of speed keys and their functions, provide a function that picks the right one, and add it to the org-speed-command-hook. Here is the code that makes this possible. This defines "c" to copy the email to the clipboard, "e" to email the contact, and "m" to copy a "name <email>" string to the clipboard, but only when you are on a headline with an EMAIL property. If there is not a contact specific speed key defined, then a user-defined speed key or a default key will be used if it is defined. In case I do not remember the keys, "?" will show them to me. It is a small hack, but if you end up using the contact headlines for much, it might be really helpful as an alternative to M-x some-contacts-command.

(setq org-speed-commands-contacts
      '(("c" . (lambda ()
                 "Copy the email address to the clipboard."
                 (message (kill-new (org-entry-get (point) "EMAIL")))))
        ("e" . (lambda ()
                 "Send an email to the contact."
                 (let ((email (org-entry-get (point) "EMAIL")))
                   (compose-mail)
                   (message-goto-to)
                   (insert email)
                   (message-goto-subject))))
        ("m" . (lambda ()
                 "Copy \"name <email>\""
                 (message (kill-new
                           (format "%s <%s>"
                                   ;; The headline text serves as the contact name.
                                   (nth 4 (org-heading-components))
                                   (org-entry-get (point) "EMAIL"))))))
        ("?" . (lambda ()
                 "Print contacts speed key help."
                 (with-output-to-temp-buffer "*Help*"
                   (princ "Contacts Speed commands\n===========================\n")
                   (mapc #'org-print-speed-command org-speed-commands-contacts)
                   (princ "\n")
                   (princ "User-defined Speed commands\n===========================\n")
                   (mapc #'org-print-speed-command org-speed-commands-user)
                   (princ "Built-in Speed commands\n=======================\n")
                   (mapc #'org-print-speed-command org-speed-commands-default))
                 (with-current-buffer "*Help*"
                   (setq truncate-lines t))))))

(defun org-speed-contacts (keys)
  "Return the contacts speed command for KEYS on a headline with an EMAIL property."
  (when (and (bolp) (looking-at org-outline-regexp)
             (not (null (org-entry-get (point) "EMAIL"))))
    (cdr (assoc keys org-speed-commands-contacts))))

;; Register the function so org-mode consults it when a speed key is pressed.
(add-hook 'org-speed-command-hook 'org-speed-contacts)



org-mode source

Org-mode version = 9.0

## Persistent highlighting in Emacs


In this recent post I showed a way to use org-mode links to color text. The main advantage of that approach is it is explicit markup in the file, so it is persistent and exportable to html. The downside of that approach is you cannot use it in code, since the markup will break the code.

An alternative approach is to use overlays to color the text. This allows you to color the text, add annotations as tooltips and to provide a variety of highlighting colors. Overlays are not explicit markup in the file, so it is necessary to think of a way to save them so they can be restored later. We do this by using hook functions to store the overlays in a file-local variable on saving, and a file-local variable to restore the overlays when the file is opened. I bind the primary function ov-highlighter/body to a key, in my case hyper-h, which launches a hydra to access the commands.

You can find the code here: https://github.com/jkitchin/scimax/blob/org-9/ov-highlighter.el. Probably around mid-December it will get merged into the master branch.

Here is what this looks like in my buffer:

You may want to see the video:

1. blue green pink yellow custom
2. Put a comment here.
3. Markup a tpyo.
4. Get a list of the highlights in the buffer.

These highlights are pretty awesome. They work in code blocks, and comments. They also work in non-org files (only in Emacs of course).

a = 5
b = 6

print(a+b)#print the sum of a and b


11

Overall, this is pretty handy. You can highlight your own notes, provide feedback to others, etc. without changing the actual text in the document (well, except for the local variables at the end of the buffer, but these are usually in a "comment" that does not affect the document).

Here are a few limitations though:

1. You can only edit/change the file in Emacs, and the hook functions have to be enabled, or the overlay data will get corrupted. That means a merge conflict can ruin the overlays.
2. Anyone you share the file with needs to have the ov-highlighter library loaded too. Otherwise they will not see the highlights, and any edits will make the overlay data incorrect.
3. The highlights do not export from org-mode (although they do work with htmlize-buffer!).

(let* ((html-buffer (htmlize-buffer))
(html (with-current-buffer html-buffer
(buffer-string))))
(with-temp-file "test.html"
(insert html))
(kill-buffer html-buffer))

(browse-url "test.html")

#<process open test.html>


org-mode source

Org-mode version = 9.0

## New and improved asynchronous org-babel python blocks


About a year ago I posted some code to run org-babel python blocks asynchronously. This year, my students asked for some enhancements related to debugging. Basically, they were frustrated by a few things when they got errors. First, they found it difficult to find the line number in the Traceback in the src block because there are no line numbers in the block, and it is annoying to do a special edit just for line numbers.

I thought about this, and figured out how to significantly improve the situation. The async python code in scimax now has the following features:

1. When you get a Traceback, it goes in the results, and each file listed in it is hyperlinked to the source file and line so it is easy to get to them.
2. The cursor jumps to the last line in the code block that is listed in the Traceback, and a beacon shines to show you the line.
3. You can turn on temporary line numbers in the code block to see where the lines are in the block, and these disappear when you start typing. This is controlled by the variable org-babel-async-python-show-line-numbers.
4. You can control whether a buffer of the results shows or not via the variable org-babel-async-python-show-results.
5. When you run the block, you get a clickable link in the RESULTS section to kill the process.
6. You may also find the autopep8 and pylint functions helpful.
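The first two features rest on pulling file names and line numbers out of the Traceback text. Here is a sketch of that step in Python, not the Emacs Lisp in scimax: the regular expression and the helper names are assumptions, while "Org SRC" is the name the block gets in the Tracebacks shown in this post.

```python
import re

# Each frame in a Python traceback looks like: File "name", line N, in ...
FRAME_RE = re.compile(r'File "(?P<file>[^"]+)", line (?P<line>\d+)')


def traceback_frames(traceback_text):
    """Return a list of (filename, line number) pairs, outermost frame first."""
    return [(m.group("file"), int(m.group("line")))
            for m in FRAME_RE.finditer(traceback_text)]


def last_block_line(traceback_text, block_name="Org SRC"):
    """Return the last line number that refers to the code block, or None."""
    lines = [line for name, line in traceback_frames(traceback_text)
             if name == block_name]
    return lines[-1] if lines else None


example = '''Traceback (most recent call last):
  File "Org SRC", line 11, in <module>
    print(odeint(dFdV, 1.0, Vspan))
  File "/Users/jkitchin/anaconda3/lib/python3.5/site-packages/scipy/integrate/odepack.py", line 215, in odeint
    ixpr, mxstep, mxhnil, mxordn, mxords)
TypeError: dFdV() missing 1 required positional argument: 'v0'
'''

print(traceback_frames(example))
print(last_block_line(example))
```

Each pair is what a hyperlink needs (a file and a line), and the last pair that names the block is where the cursor should land.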

The code for this is currently found here: https://github.com/jkitchin/scimax/blob/org-9/scimax-org-babel-python.el

Eventually, I will merge this into master, after I am sure about all the changes needed for org 9.0. That is not likely to happen until the semester ends, so I do not mess up my students who use scimax in class. So, sometime mid-December it will make it into master.

To make async the default way to run a python block use this code, so that you can use C-c C-c to run them:

(require 'scimax-org-babel-python)


As with the past few posts, this video will make it much more clear what the post is about:

Here is a prototypical example that shows how it works. While it runs you can view the progress if you click on the link to show the results.

import time

for i in range(5):
    print(i)
    time.sleep(2)


0
1
2
3
4
Traceback (most recent call last):
  File "Org SRC", line 5, in <module>
    time.sleep(2)
KeyboardInterrupt

This block has a pretty obvious issue when we run it. The cursor jumps right to the problem!

print('This line is ok')
# 5 / 0
print('We will not see this')


This line is ok
We will not see this

This block shows we can access any of the links in the Traceback. Here we have an error in calling a function that is raised in an external file.

import numpy as np
from scipy.integrate import odeint

Vspan = np.linspace(0, 2) # L

# dF/dV = F
def dFdV(F, V, v0):
    return F

print(odeint(dFdV, 1.0, Vspan))


Traceback (most recent call last):
  File "Org SRC", line 11, in <module>
    print(odeint(dFdV, 1.0, Vspan))
  File "/Users/jkitchin/anaconda3/lib/python3.5/site-packages/scipy/integrate/odepack.py", line 215, in odeint
    ixpr, mxstep, mxhnil, mxordn, mxords)
TypeError: dFdV() missing 1 required positional argument: 'v0'

Here we show how nice it is to be able to kill a process. This block will not end on its own.

while True:
    pass


Traceback (most recent call last):
  File "Org SRC", line 2, in <module>
    pass
KeyboardInterrupt
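Outside of Emacs, the same idea (start the code in a separate process, keep working, and kill it on demand) can be sketched with Python's subprocess module. This is an illustration only, not how the scimax code is written. Note that `terminate` sends a termination signal rather than the keyboard interrupt seen in the Traceback above.

```python
import subprocess
import sys
import time

# Start a child Python process that never finishes on its own.
proc = subprocess.Popen([sys.executable, "-c", "while True:\n    pass"])

# The parent is not blocked; here it just waits briefly and checks on the child.
time.sleep(0.5)
still_running = proc.poll() is None  # poll() returns None while the child runs

# Kill the child on demand, then collect its exit status.
proc.terminate()
proc.wait(timeout=10)
print("was running:", still_running, "exit code:", proc.returncode)
```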

## 1 autopep8

autopep8 is a tool for reformatting Python code. We wrapped this into an Emacs command so you can quickly reformat a Python code block.

a = 4
b = 5
c = a * b  # comment
# another comment

def f(x):
    return x
print(f(5))


## 2 pylint

pylint is a great tool for checking your Python code for errors, style and conventions. We also wrapped this into an Emacs command so you can run it on a Python src block. The report that is generated has clickable links to help you get right to the lines in your code block with problems.

import numpy as np

a = np.array(5, 5)

def f(x): return x

print(f(6))