Getting information about named tables in exported org-files

| categories: orgmode | tags: | View Comments

I have found that the names of tables typically get lost when you export an org-file to another format like html or pdf. Since we may use named tables as data sources, it can become unclear in the exported file what has happened, or which table data came from. In this post, we examine how to include the name of a table in exported html. Here are two named tables tbl-1 and tbl-2 that will form the beginning of our effort.

x y
1 2
2 3

Another table, so we have something to work with later.

a
5
3

Org-buffers get parsed into nested lists, with properties usually in plists. It will be convenient to get a list of the keys for an element, so we can tell what information we have on each element. Some code for this can be found here: http://www.emacswiki.org/emacs/mon-plist-utils.el . Rather than use that recursive approach, here we just loop through the plist and accumulate the keys.

(defun plist-get-keys (plist)
  (interactive)
  (let ((keys))
    (while (car plist)
      (add-to-list 'keys (car plist) t)
      (setq plist (cddr plist)))
    keys))

; example of use
(plist-get-keys '(:a 1 :b 3 :parent '(another plist)))
:a :b :parent

Now, when we parse a buffer for elements, we get a nested lisp data structure, and the best I can tell is we need the cadr of that list to get to the relevant plist of properties. So, here, we map over the tables, and see what properties are available.

(org-element-map
    (org-element-parse-buffer) 'table
  (lambda (element)  (plist-get-keys (cadr element))))
:begin :end :type :tblfm :contents-begin :contents-end :value :post-blank :post-affiliated :name :parent  
:begin :end :type :tblfm :contents-begin :contents-end :value :post-blank :post-affiliated :name :parent  
:begin :end :type :tblfm :contents-begin :contents-end :value :post-blank :post-affiliated :results :parent  
:begin :end :type :tblfm :contents-begin :contents-end :value :post-blank :post-affiliated :caption :parent  
:begin :end :type :tblfm :contents-begin :contents-end :value :post-blank :post-affiliated :name :caption :parent
:begin :end :type :tblfm :contents-begin :contents-end :value :post-blank :post-affiliated :results :parent  

Depending on when you run the codeblock above (i.e. I ran it at different stages of development of this document, so some tables after this point are shown), you see different results; some of the tables are RESULTS from code blocks with no names, and two tables have a caption.

Let us now map over the tables and see if they have names. We add an unnamed table, and a named table, both with captions.

Table 1: an unnamed table of category counts.
category count
emacs 4
orgmode 3
Table 2: an named table of category counts on python.
category count
Python 4
pep8 3

Here we get the names of the tables. Only three tables have names, and several are unnamed.

(org-element-map
    (org-element-parse-buffer) 'table
  (lambda (element)  (plist-get (cadr element) :name)))
tbl-1 tbl-2 python-table

If you think that is a little awkward, I agree. Here is probably a better way to get that information using features in org-mode..

(org-element-map
    (org-element-parse-buffer) 'table
  (lambda (element)  (org-element-property :name element)))
tbl-1 tbl-2 python-table

I had thought we could use a filter to add the name to each table. The issue with filtering is that we get the transcoded text directly, and no practical way to get back to the element it came from (at least none I could find). I have previously used filters (e.g. for changing links on export ) for something like this, but it involved parsing the document once, then exporting, and iterating through the results to change the output. I want to do something different here, and fix the issue on the export.

That requires us to derive a new backend for export, with our new function for formatting. This will give us access to the actual table element, and we can use the original transcoding function to get most of the table, and our own code to modify that before it is exported.

Basically, we just want to add an HTML anchor to the table with some text to indicate the table name. With the anchor we can then link to it elsewhere like this:

See tbl-2

We just define a function that satisfies the transcoding function signature (element contents info), and if our element has a :name property, we will prepend it onto the usual table output for html. We will go ahead and code in some conditional code for different backends, although for now only handle the html backend.

(defun my-table-format (table contents info)
  (let ((tblname (org-element-property :name table)))    
    (cond
     ((eq (elt (plist-get info :back-end) 2) 'html)  
      (concat
       (when tblname
         (format "<br>TBLNAME: <a name=\"%s\"></a>%s<br>" tblname tblname))
       (org-html-table table contents info))))))

(org-export-define-derived-backend 'my-html 'html
  :translate-alist '((table . my-table-format)))


(browse-url (org-export-to-file 'my-html "custom-src-table-export.html"))
#<process open custom-src-table-export.html>

That seems to do it. You may need to see custom-src-table-export.html to see the newly annotated tables, since they probably do not show up in the blog post.

Copyright (C) 2014 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.7c

blog comments powered by Disqus