f-strings in emacs-lisp

| categories: emacs, elisp | tags: | View Comments

I am a big fan of f-strings in Python 3. They let you put variable names and expressions in a string template that get expanded to create new strings. Here is a simple example of using those:

username = 'John Kitchin'
somevar = 5**0.5
print(f'{username:30s}{somevar:1.2f}')
John Kitchin                  2.24


String formatting in emacs-lisp is by comparison not as fun and easy. Out of the box we have:

(let ((username "John Kitchin")
      (somevar (sqrt 5)))
  (format "%-30s%1.2f" username somevar))
John Kitchin                  2.24

That is still three lines of code, but it is ugly and hard to read like the old python code:

print('%-30s%1.2f' % (username, somevar))
John Kitchin                  2.24


My experience has shown that this gets harder to figure out as the strings get larger, and f-strings are easier to read.

The wonderful 's library provides some salvation for emacs-lisp, if you don't want the format fields. You can refer to variables in a lexical environment like this.

(let ((username "John Kitchin")
      (somevar (sqrt 5)))
  (s-lex-format "${username}${somevar}"))
John Kitchin2.23606797749979

Today, I decided to do something about this, and wrote this little macro. It is a variation on s-lex-format that introduces a slightly new syntax. You can now add an optional format field separated from the variable name by a space.

(defmacro f-string (fmt)
  "Like `s-format' but with format fields in it.
FMT is a string to be expanded against the current lexical
environment. It is like what is used in `s-lex-format', but has
an expanded syntax to allow format-strings. For example:
${user-full-name 20s} will be expanded to the current value of
the variable `user-full-name' in a field 20 characters wide.
  (let ((f (sqrt 5)))  (f-string \"${f 1.2f}\"))
  will render as: 2.24
This function is inspired by the f-strings in Python 3.6, which I
enjoy using a lot.
"
  (let* ((matches (s-match-strings-all"${\\(?3:\\(?1:[^} ]+\\) *\\(?2:[^}]*\\)\\)}" fmt))
         (agetter (cl-loop for (m0 m1 m2 m3) in matches
                        collect `(cons ,m3  (format (format "%%%s" (if (string= ,m2 "")
                                                                      (if s-lex-value-as-lisp "S" "s")
                                                                   ,m2))
                                                  (symbol-value (intern ,m1)))))))

    `(s-format ,fmt 'aget (list ,@agetter))))
f-string

Here it is in action.

(let ((username "John Kitchin")
      (somevar (sqrt 5)))
  (f-string "${username -30s}${somevar 1.2f}"))
John Kitchin                  2.24

It still lacks some of the capability of f-strings in python, e.g. in Python, arguments inside the template to be expanded get evaluated. The solution used above is too simple for that, since it just used a regexp and is limited to the value of variables in the lexical environment.

print(f'{5**0.5:1.3f}')
2.236


Nevertheless, this simple solution matches what I do most of the time anyway, so I still consider it an improvement!

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.13

Read and Post Comments

Making it easier to extend the export of org-mode links with generic functions

| categories: emacs, orgmode | tags: | View Comments

I am a big fan of org-mode links. Lately, I have had a need to modify how some links are exported, e.g. defining new exports for different backends, or fine-tuning a particular backend. This can be difficult, depending on how the link was set up. Here is a typical setup I am used to using, where the different options for the backends are handled in a conditional statement in a single function. I will just use a link that just serves to illustrate the issues here. These links are just sytactic sugar for markup, they don't do anything else. We start with an example that just converts text to italic text for different backends like html or latex.

(defun italic-link-export (path desc backend)
  (cond
   ((eq 'html backend)
    (format "<em>%s</em>" path))
   ((eq 'latex backend)
    (format "\\textit{%s}" path))
   ;; fall-through case for everything else
   (t
    path)))

(org-link-set-parameters "italic" :export 'italic-link-export)
:export italic-link-export
(org-export-string-as "italic:text" 'html t)
<p>
<em>text</em></p>

(org-export-string-as "italic:text" 'latex t)
\textit{text}

This falls through though to the default case.

(require 'ox-md)
(org-export-string-as "italic:text" 'md t)

# Table of Contents



text


The point I want to make here is that this is not easy to extend as a user. You have to either modify the italic-link-export function, advise it, or monkey-patch it. None of these are especially nice.

I could define italic-link-export in a way that it retrieves the function to use from an alist or hash-table using the backend, but then you have to do two things to modify the behavior: define a backend specific function and register it in the lookup variable. It is also possible to just look up a function by a derived symbol, e.g. using fboundp, and then using funcall to execute it. This looks something like this:

;; a user definable function for exporting to latex
(defun italic-link-export-latex (path desc backend)
  (format "\\textit{%s}" path))

;; generic export function that looks up functions or defaults to
(defun italic-link-exporter (path desc backend)
  "Run `italic-link-export-BACKEND' if it exists, or return path."
  (let ((func (intern-soft (format "italic-link-export-%s" backend))))
    (if (fboundp func)
        (funcall func path desc backend)
      path)))

This has some indirection, but allows you to just define new functions to add new export backends, or replace single backend exports. It isn't bad, but there is room for improvement.

In this comment in org-ref, I saw a new opportunity to address this issue using generic functions in elisp! The idea is to define a generic function that handles the general export case, and then define additional functions for each specific backend based on the signature of the export function. I will switch to bold markup for this.

(cl-defgeneric bold-link-export (path desc backend)
 "Generic function to export a bold link."
 path)

;; this one runs when the backend is equal to html
(cl-defmethod bold-link-export ((path t) (desc t) (backend (eql html)))
 (format "<b>%s</b>" path))

;; this one runs when the backend is equal to latex
(cl-defmethod bold-link-export ((path t) (desc t) (backend (eql latex)))
 (format "\\textit{%s}" path))

(org-link-set-parameters "bold" :export 'bold-link-export)
:export bold-link-export

Here it is in action:

(org-export-string-as "some bold:text" 'html t)
<p>
some <b>text</b></p>

(org-export-string-as "some bold:text" 'latex t)

This uses the generic function.

(require 'ox-md)
(org-export-string-as "some bold:text" 'md t)

# Table of Contents



some text


The syntax for defining the generic function is pretty similar to a regular function. The specific methods are a little different since they have to provide the specific "signature" that triggers each method. Here we only differentiate on the type of the backend. It is nice these are all separate functions though. It makes it trivial to add new ones, and less intrusive to replace in my opinion.

Generic functions have many other potential applications to replace functions that use lots of conditions to control flow, with a fall-through option at the end. You can learn more about them here: https://www.gnu.org/software/emacs/manual/html_node/elisp/Generic-Functions.html. There is a lot more to them than I have illustrated here.

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.13

Read and Post Comments

New publication in Nature Catalysis

| categories: news, publication | tags: | View Comments

Machine learning (ML) is impacting many fields, including catalysis. In this comment, I briefly discuss the major directions that ML is influencing the field of catalysis, along with some outlook on future directions. There were strict word and reference limits, so apologies in advance if I left out your work!

@article{kitchin-2018-machin-learn-catal,
  author =       {John R. Kitchin},
  title =        {Machine Learning in Catalysis},
  journal =      {Nature Catalysis},
  volume =       1,
  number =       4,
  pages =        {230-232},
  year =         2018,
  doi =          {10.1038/s41929-018-0056-y},
  url =          {https://doi.org/10.1038/s41929-018-0056-y},
  DATE_ADDED =   {Mon Apr 16 12:50:43 2018},
}

You can see a read-only version of the paper here: https://rdcu.be/LGrM

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.6

Read and Post Comments

Caching searches using biblio and only seeing new results

| categories: biblio, arxiv, elisp | tags: | View Comments

In this issue in scimax, Robert asked if it was possible to save searches, and then to repeat them every so often and only see the new results. This needs some persistent caching of the records, and a comparison of the current search results with the previous search results.

biblio provides a nice interface to searching a range of resources for bibliographic references. In this post, I will focus on arxiv. Out of the box, biblio does not seem to support this use case, but as you will see, it has many of the pieces required to achieve it. Let's start picking those pieces apart.

(require 'biblio)
biblio

Here is the first piece we need: a way to run a query, and get results back as a data structure. Here we just look at the first result.

(let* ((query "alloy segregration")
       (backend 'biblio-arxiv-backend)
       (cb (url-retrieve-synchronously (funcall backend 'url query)))
       (results (with-current-buffer cb
                  (funcall backend 'parse-buffer))))
  (car results))
((doi . "10.1103/PhysRevB.76.014112")
 (identifier . "0704.2752v2")
 (year . "2007")
 (title . "Modelling Thickness-Dependence of Ferroelectric Thin Film Properties")
 (authors nil nil nil nil nil nil nil nil nil nil nil nil nil "L. Palova" nil "P. Chandra" nil "K. M. Rabe" nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil)
 (container . "PRB 76, 014112 (2007)")
 (category . "cond-mat.mtrl-sci")
 (references "10.1103/PhysRevB.76.014112" "0704.2752v2")
 (type . "eprint")
 (url . "https://doi.org/10.1103/PhysRevB.76.014112")
 (direct-url . "http://arxiv.org/pdf/0704.2752v2"))

Next, we need a database to store the results in. I will just use a flat file database with a file for each record. The filename will be the md5 hash of the doi or the record itself. Why is that a good idea? Well, the doi is a constant, so if it exists the md5 will also be a constant. The doi itself is not a good filename in general, but the md5 is. The md5 of the record itself will be fragile to any changes, so if it has a doi, we should use it. If it doesn't and later gets one, we should see it again since that could mean it has been published. Also, if it changes because of some new version we might want to see it again. In any case, the existence of that file will be evidence we have seen that record before, and will indicate we need to remove it from the current view.

The flat file database is not super inspired. It is modeled a little after elfeed, but other solutions might work better for large sets of records, but this approach will work fine for this post.

Here is a function that returns nil if the record has been seen, and if not, saves the record and returns it.

(defvar db-dir "~/.arxiv-db/")

(unless (f-dir? db-dir) (make-directory db-dir t))

(defun unseen-record-p (record)
  "Given a RECORD return it if it is unseen.
Also, save the record so next time it will be marked seen. A
record is seen if we have seen the DOI or the record as a string
before."
  (let* ((doi (cdr (assoc 'doi record)))
         (contents (with-temp-buffer
                     (prin1 record (current-buffer))
                     (buffer-string)))
         (hash (md5 (or doi contents)))
         (fname (expand-file-name hash db-dir)))

    (if (f-exists? fname)
        nil
      (with-temp-file fname
        (insert contents))
      record)))
unseen-record-p

Now we can use that as a filter that saves records by side effect.

(defun scimax-arxiv (query)
  (interactive "Query: ")

  (let* ((backend 'biblio-arxiv-backend)
         (cb (url-retrieve-synchronously (funcall backend 'url query)))
         (results (-filter 'unseen-record-p (with-current-buffer cb
                                              (funcall backend 'parse-buffer))))
         (results-buffer (biblio--make-results-buffer (current-buffer) query backend)))
    (with-current-buffer results-buffer
      (biblio-insert-results results ""))
    (pop-to-buffer results-buffer)))

(scimax-arxiv "alloy segregation")
#<buffer *arXiv search*>

Now, when I run that once I see something like this:

and if I run it again:

(scimax-arxiv "alloy segregation")
#<buffer *arXiv search*>

Then the buffer is empty, since we have seen all the entries before.

Here are the files in our database:

ls ~/.arxiv-db/

Here are the contents of one of those files:

(with-temp-buffer
 (insert-file-contents "~/.arxiv-db/18085fe2512e15d66addc7dfb71f7cd2")
 (read (buffer-string)))
((doi) (identifier . 1101.3464v3) (year . 2011) (title . Characterizing Solute Segregation and Grain Boundary Energy in a Binary
  Alloy Phase Field Crystal Model) (authors nil nil nil nil nil nil nil nil nil nil nil nil nil Jonathan Stolle nil Nikolas Provatas nil nil nil nil nil nil nil nil nil nil nil) (container) (category . cond-mat.mtrl-sci) (references nil 1101.3464v3) (type . eprint) (url . http://arxiv.org/abs/1101.3464v3) (direct-url . http://arxiv.org/pdf/1101.3464v3))

So, if you need to read this in again later, no problem.

Now, what could go wrong? I don't know much about how the search results from arxiv are returned. For example, this query returns 10 hits.

(let* ((query "alloy segregration")
       (backend 'biblio-arxiv-backend)
       (cb (url-retrieve-synchronously (funcall backend 'url query)))
       (results (with-current-buffer cb
                  (funcall backend 'parse-buffer))))
  (length results))
10

There is just no way there are only 10 hits for this query. So, there must be a bunch more that you get by either changing the requested number in some argument, or by using subsequent queries to get the rest of them. I don't know if there are more advanced query options with biblio, e.g. to find entries newer than the last time it was run. On the advanced search page for arxiv, it looks like there is only a by year option.

This is still a good idea, and a lot of the pieces are here,

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.6

Read and Post Comments

Zhitao Guo receives the 2017-2018 James C. Meade Fellowship in Chemical Engineering

| categories: news | tags: | View Comments

The James C. Meade Fellowship was made possible by a generous donation by James Meade. This will help support Zhitao during his research this year. Zhitao is a first year PhD student who is co-advised by Andy Gellman and myself (John Kitchin), and is working on segregation in ternary alloy thin films.

Zhitao joined us from Tsinghua University in Beijing, China, where he studied chemical engineering and double majored in economics.

Congratulations Zhitao!

Copyright (C) 2018 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.1.6

Read and Post Comments

« Previous Page -- Next Page »