Overloading mathematical operators in elisp

| categories: elisp, emacs | tags: | View Comments

Table of Contents

In Python I am used to some simple idioms like this:

print([1, 2, 3] * 2)
print("ab" * 3)

[1, 2, 3, 1, 2, 3] ababab

There is even such fanciness as defining operators for objects, as long as they have the appropriate dunder methods defined:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point ({}, {})".format(self.x, self.y)

    def __mul__(self, a):
        return Point(self.x * a, self.y * a)

    def __rmul__(self, a):
        return Point(self.x * a, self.y * a)
    
p = Point(1, 1)
print(p * 2)
print(3 * p)

Point (2, 2) Point (3, 3)

Out of the box, these things are not possible in elisp. Operators like * in elisp only take numbers or markers. We have a few options to change this. The worst option is to simply redefine these functions. That is bad because it is not reversible. We could define new functions that have the behavior we want, but then we lose the semantic meaning of "*" that we were aiming for. A better option is to advise these functions. This is reversible, because you can later unadvise them. Today we look at some strategies to do this.

We will use "around" advise because it will let us bypass the original intent of the function when we want to, or use it when we do. First, we create a function that will be the advice and add it to the * function. This first draft won't actually change the behavior of *; if all the args are numbers or markers it will simply use the original function as before.

(require 'dash)

(defun *--*-around (orig-fun &rest args)
  "if every arg is a number do *, else do something else."
  (cond
   ((-every? (lambda (x) (or (numberp x) (markerp x))) args)
    (apply orig-fun args))))

(advice-add '* :around #'*--*-around)

Let's just confirm

(* 1 2 3)
6

Now, we can start modifying our function to handle some other cases. Let's do the list and string first. The * function is variadic, but in these cases it makes sense to limit to two arguments. We need two cases for each type since we can write (* 2 list) or (* list 2). We also should create a fall-through case that raises an error to alert us we can't multiply things.

(defun *--*-around (orig-fun &rest args)
  "if every arg is a number do *, else do something else."
  (cond
   ;; The original behavior
   ((-every? (lambda (x) (or (numberp x) (markerp x))) args)
    (apply orig-fun args))

   ;; create repeated copies of list
   ((and (listp (first args))
         (integerp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (second args) append (copy-list (first args))))

   ((and (integerp (first args))
         (listp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (first args) append (copy-list (second args))))

   ;; Make repeated string
   ((and (stringp (first args))
         (integerp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (second args) concat (first args)))

   ((and (integerp (first args))
         (stringp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (first args) concat (second args)))

   (t
    (error "You cannot * %s" args))))
*--*-around

Here is the new advice in action.

(list
 (* '(a b) 2)
 (* 2 '(c d))
 (* 2 "ab")
 (* "cd" 2))
(a b a b) (c d c d) abab cdcd

That captures the spirit of overloading * for lists and strings. What about that object example? We have to make some assumptions here. Python looks for an uses a dunder mul method. We will assume a double dash method (–mul–) in a similar spirit. We have to modify the advice one final time. We just add a condition to check if one of the arguments is an eieio-object, and then call the –mul– function on the arguments.

(defun *--*-around (orig-fun &rest args)
  "if every arg is a number do *, else do something else."
  (cond
   ;; The original behavior
   ((-every? (lambda (x) (or (numberp x) (markerp x))) args)
    (apply orig-fun args))

   ;; create repeated copies of list
   ((and (listp (first args))
         (integerp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (second args) append (copy-list (first args))))

   ((and (integerp (first args))
         (listp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (first args) append (copy-list (second args))))

   ;; Make repeated string
   ((and (stringp (first args))
         (integerp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (second args) concat (first args)))

   ((and (integerp (first args))
         (stringp (second args))
         (= 2 (length args)))
    (loop for i from 0 below (first args) concat (second args)))

   ;; Handle object
   ((or (and (eieio-object-p (first args))
             (numberp (second args)))
        (and (numberp (first args))
             (eieio-object-p (second args))))
    (apply '--mul-- args))

   (t
    (error "You cannot * %s" args))))
*--*-around

Now, we can define a class and the –mul– function and show that our overloaded * function works. Note we can define two signatures of –mul– so it is not necessary to define an –rmul– in this case as it was with Python (although we still create two functions in the end).

(require 'eieio)

(defclass Point ()
  ((x :initarg :x)
   (y :initarg :y)))

(cl-defmethod --mul-- ((p Point) a)
  (Point :x (* (oref p :x) a) :y (* (oref p :y) a)))

(cl-defmethod --mul-- (a (p Point))
  (Point :x (* (oref p :x) a) :y (* (oref p :y) a)))

(cl-defmethod --str-- ((p Point))
  (format "Point (%s, %s)" (oref p :x) (oref p :y)))

(let ((P (Point :x 1 :y 1)))
  (list
   (--str-- (* P 2))
   (--str-- (* 3 P))))
Point (2, 2) Point (3, 3)

That is pretty awesome. Before going on, here is how you remove the advice:

(advice-remove '* '*--*-around)

This example has been pretty instructive. You have to handle overloading for all the intrinsic types. We did lists and strings here; you might also consider vectors. For objects, it looks like we can at least try using a generic method like –mul–. One detail I neglected to consider here is that * is natively variadic. For these special cases, we did not implement variadic versions. This isn't a feature of Python which uses infix notation, so every call is with two arguments. In some cases it might make sense to support variadic args, but that seems like a generally challenging thing to do. While (* "a" 2 3) might be expected to create a string of "aaaaaa", (* "a" 2 '(3)) doesn't make sense at all.

It would be straightforward to extend this to other operators like '+ to concatenate strings, lists and vectors, or '- to remove chars or elements, including extensions to objects using double-dash functions like –add–, –subtract–, etc. Another nice idea might be to advise print to use –str– on objects.

On the surface this looks useful so far. Python defines a lot of dunder methods that cover all kinds of scenarios including logical comparisons, bit shifting, mod, incrementing operators, casting, comparisons, right/left operations, indexing and assignment, length and others. That would be a lot of advices. This approach is moderately tedious to expand though; you have to keep adding conditional cases.

An alternative to the big conditional statement used in the advice might be the use of a generic function. With this approach we define a generic function that just does multiplication by default. Then we define specific cases with specific signatures that are used for lists, strings, objects, etc. That is basically all our conditional above was doing, matching signatures and executing a chunk of code accordingly.

Here is our default case that does the original behavior. We still use advice to apply the function.

(cl-defgeneric generic-multiply (orig-fun &rest args)
  "Generic multiply for when no specific case exists."
  (apply orig-fun args))

(defun *--*-around-generic (orig-fun &rest args)
  (apply 'generic-multiply orig-fun args))

(advice-add '* :around #'*--*-around-generic)

That should just work as usual for regular multiplication.

(* 1 2 3 4)
24

Sure enough it does. Now, we can define a specific method for a string. We need a specialized method for each signature, e.g. pre and post multiplication.

(cl-defmethod generic-multiply ((orig-fun subr) (s string) (n integer))
  (loop for i from 0 below n concat s))

(cl-defmethod generic-multiply ((orig-fun subr) (n integer) (s string))
  (loop for i from 0 below n concat s))

(list
 (* "Ac" 2)
 (* 2 "Ad"))
AcAc AdAd

That works fine, and we did not have to modify our original advice function at all! Next the list:

(cl-defmethod generic-multiply ((orig-fun subr) (L list) (n integer))
  (loop for i from 0 below n append (copy-list L)))

(cl-defmethod generic-multiply ((orig-fun subr) (n integer) (L list))
  (loop for i from 0 below n append (copy-list L)))

(list (* '(1 2) 2)
      (* 2 '(3 4)))
1 2 1 2
3 4 3 4

That also works fine. Last, our class example. This should work on all objects I think (unless there is some way to make classes that do not inherit the default superclass).

(cl-defmethod generic-multiply ((orig-fun subr) (n integer) (obj eieio-default-superclass))
  (--mul-- n obj))

(cl-defmethod generic-multiply ((orig-fun subr) (obj eieio-default-superclass) (n integer))
  (--mul-- n obj))

(let ((P (Point :x 1 :y 1)))
  (list
   (--str-- (* P 2))
   (--str-- (* 3 P))))
Point (2, 2) Point (3, 3)

This is a much better approach to extending the multiplication operator! If I continue this path in the future I would probably take this one. This could be useful to make elisp more like some more popular contemporary languages like Python, as well as to add linear algebra like notation or mathematical operations on objects in elisp. It kind of feels like these operations ought to be generic functions to start with to make this kind of overloading easier from the beginning. Functions like "*" are currently defined in the C source code though, maybe for performance reasons. It is not obvious what the consequences of making them generic might be.

1 Addendum

Christopher Wellons pointed out an important limitation of advice: they don't work on byte-compiled functions. Let's see what he means. Here is a simple function that will just multiply a Point object by an integer:

(defun to-be-bytten (p1 n)
  (* p1 n))
to-be-bytten

Here it is in action, and here it works fine.

(to-be-bytten (Point :x 1 :y 1) 2)
[eieio-class-tag--Point 2 2]

Now, let's byte-compile that function and try it again:

(byte-compile 'to-be-bytten)

(condition-case err
    (to-be-bytten (Point :x 1 :y 1) 2)
  ((error r)
   (message "Doh! Christopher was right. It did not work...\n%s" err)))
Doh! Christopher was right. It did not work...
(wrong-type-argument number-or-marker-p [eieio-class-tag--Point 1 1])

So the advice is pretty limited since most of the functions in Emacs core are likely to be byte-compiled, and it might mean you have to redefine * completely, or define some new function that looks like it. Too bad, the advice was pretty easy!

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.7

Read and Post Comments

Linear algebra in Emacs using MKL and dynamic modules

| categories: dynamic-module, emacs | tags: | View Comments

In a previous post I integrated some linear algebra into Emacs using the GNU Scientific library and a dynamic module. In this post, I use a similar approach that uses the Intel MKL library in conjunction with some helper elisp functions to mimic the array broadcasting features in Numpy. I thought this might be easier and lead to at least a complementary set of functionalities.

Note: I had to follow the directions here to disable some security feature on my Mac so that it would use the MKL libraries. Thanks Apple.

It is convenient to use vectors for the representation of arrays in Emacs because there are nice functions in the emacs-module.h for accessing vector properties. Also vectors sound closer to an array than a list. So what about array broadcasting, e.g. the way numpy lets you multiply a 2d array with a 1d array? If you multiply two arrays with size (m1, n1) * (m2, n2), it is required that the number of columns in the first array (n1) be equal to the number of rows in the second one (m2), and the resulting size of the array product will be (m1, n2). What should happen though when we have 1d array? This is neither a row or column vector itself, but we can treat as either one if we choose too. For example the vector [1 2 3] can be thought of as an array with the shape (1, 3), e.g. a single row with three columns, or (3, 1), i.e. three rows in a single column. We will build this capability into the module for convenience.

I still find it moderately tedious to write c functions that take emacs arguments, transform them to c arguments, do some c computations, and convert the results back to emacs values. So, we only implement one c function for this that multiplies two 2d arrays together using the cblas_dgemm routine in the MKL library. Then, we will create a complementary elisp library that will provide some additional functionality to get the shapes of vector arrays, dimensions, and allow us to multiply 1d and 2d vectors together the same way Numpy does array broadcasting.

The dynamic module code is listed in The module code. The elisp code is listed in Elisp helper functions. In the following sections we just demonstrate how to use the results.

1 Convenience functions to get array properties

I found it convenient to do array shape and dimension analysis in elisp before sending arrays to the dynamic module. The shape of an array is just the number of elements in each dimension. Here we look at a 2× 3 array.

(vector-shape [[1 2 3]
               [3 4 5]])
[2 3]

You see it returns a vector showing two rows and three columns. There are two convenience commands to get the number of rows (vector-nrows) and columns (vector-ncols). Here is one of them.

(vector-ncols [[1 2 3]
               [3 4 5]])
3

2 Matrix multiplication

The main problem we want to calculate is the matrix multiplication \(A\cdotB\) where \(A\) and \(B\) are either 1d vectors or 2d arrays. Here we examine several representative cases of matrix multiplication.

2.1 1d * 1d

This is a simple dot-product that is actually calculated in elisp.

\([1 1 1] \cdot [2 2 2] = 6\)

(matrix-multiply [1 1 1] [2 2 2])
6.0

Note we get a float. That is because we initialize the sum with 0.0 to be consistent with all the other cases which are done with floats. dgemm is a double routine in MKL, which means it should return floats. Internally in the module, we cast all numbers as doubles for the multiplication.

2.2 2d * 1d

This is a matrix multiplication that is typically like \(A b\) where \(b\) is a column vector. We return a 1d array as a result, rather than a 2d array of nrows and 1 column.

\[ \left[\begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array}\right] \left [ \begin{array}{c} 1 \\ 1 \end{array}\right] = \left[\begin{array}{c}3\\7\end{array}\right]\]

(let ((A [[1 2]
          [3 4]])
      (b [1 1]))
  (matrix-multiply  A b))
[3.0 7.0]

2.3 1d * 2d

This case is \(b A\) where \(b\) is a row vector. For example:

\[\left[\begin{array}{cc}1 & 1\end{array}\right] \left[\begin{array}{cc} 1 & 2\\ 3 & 4\end{array}\right] = \left[\begin{array}{cc} 4 & 6 \end{array}\right ]\]

(matrix-multiply [1 1]
                 [[1 2]
                  [3 4]])
[4.0 6.0]

As with the previous case, we return a 1d vector result rather than a 2d array with 1 row and ncolumns.

2.4 2d * 2d

Finally we have the case of \(A B\). The number of columns in A must be the same as the number of rows in B, and the result has a size that is the number of rows in A and the number of columns in B. Here is one example:

\[\left[\begin{array}{cc} 0 & 1\\ 0 & 0\end{array}\right] \left[\begin{array}{cc} 0 & 0\\ 1 & 0\end{array}\right] = \left[\begin{array}{cc} 1 & 0\\ 0 & 0\end{array}\right] \]

(matrix-multiply [[0 1]
                  [0 0]]
                 [[0 0]
                  [1 0]])
[[1.0 0.0] [0.0 0.0]]

This example is adapted from here. The correct answer is at the bottom of that page, and shown here.

\[\left[\begin{array}{cccc} 1 & 2 & -2 & 0 \\ -3 & 4 & 7 & 2 \\ 6 & 0 & 3 & 1\end{array}\right] \left[\begin{array}{cc} -1 & 3 \\ 0 & 9 \\ 1 & -11 \\ 4 & -5 \end{array}\right] = \left[\begin{array}{cc} -3 & 43 \\ 18 & -60 \\ 4 & -5\end{array}\right] \]

For readability we use temporary variables here, and pretty-print the result.

(let ((A [[1 2 -2 0]
          [-3 4 7 2]
          [6 0 3 1]])
      (B [[-1 3]
          [0  9]
          [1 -11]
          [4 -5]]))
  (pp (matrix-multiply A B)))
[[-3.0 43.0]
 [18.0 -60.0]
 [1.0 -20.0]]

So, all these example work as we expect. The elisp function for matrix-multiply does a lot of work for you to make these cases work, including error checking for dimensional consistency.

3 Summary thoughts

It was not any easier to write this dynamic module than the previous one I used with the Gnu Scientific Library. The approach and code are remarkably similar. In one way the GSL was easier to use; it worked out of the box, whereas I had to fiddle with a security option in my OS to get it to run MKL! My Anaconda Python distribution must get around that somehow since it ships with an MKL compiled Numpy and scipy.

The idea of using elisp for analysis of the inputs and making sure they are correct is a good one and helps prevent segfaults. Of course it is a good idea to write defensive c-code to avoid that too. Overall, this is another good example of expanding the capabilities of Emacs with a dynamic module.

4 The module code

The c-code is loosely adapted from https://software.intel.com/en-us/node/529735. We do not implement the full dgemm behavior which is able to calculate \(C = \alpha A * B + \beta*C\). We set α=1, and β=0 in this example. We should do some dimension checking here, but it is easier to do it in emacs in a helper function. As long as you use the helper function there should not be an issue, but it is possible to segfault Emacs if you use the module function incorrectly.

#include "emacs-module.h"
#include "emacs-module-helpers.h"
#include <mkl.h>

int plugin_is_GPL_compatible;

emacs_value Fmkl_dgemm (emacs_env *env, ptrdiff_t nargs, emacs_value args[], void *data)
{
  double *A, *B, *C;
  int m, n, k, i, j;
  double alpha = 1.0;
  double beta = 0.0;
  
  // These will be 2d vectors
  emacs_value M0 = args[0]; // array 1 - A (m x k)
  emacs_value M1 = args[1]; // array 2 - B (k x n)

  // I need to get the number of rows and columns of each one.
  m = env->vec_size(env, M0);
  k  = 0;
  // We assume that we have a 2d array.
  emacs_value el1 = env->vec_get (env, M0, 0);
  k = env->vec_size(env, el1);
  
  // Now we know A has dimensions (m, k)
 
  emacs_value el2 = env->vec_get (env, M1, 0);
  n = env->vec_size(env, el2);
  
  // Now we know M1 had dimensions (k, n)
  
  // Now we have to build up arrays.
  // We are looking at a * b = c
  A = (double *)mkl_malloc( m*k*sizeof( double ), 64 );
  B = (double *)mkl_malloc( k*n*sizeof( double ), 64 );
  C = (double *)mkl_malloc( m*n*sizeof( double ), 64 );
  if (A == NULL || B == NULL || C == NULL) {
    printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
    mkl_free(A);
    mkl_free(B);
    mkl_free(C);
    return 1;
  }

  //populate A
  emacs_value row, ij;
  for (int i = 0; i < m; i++)
    {
      row = env->vec_get(env, M0, i);
      for (int j = 0; j < k; j++)
        {
          // get M0[i, j]
          ij = env->vec_get(env, row, j);
          A[k * i + j] = extract_double(env, ij);
        }
    }

  // populate B
  for (int i = 0; i < k; i++)
    {
      row = env->vec_get(env, M1, i);
      for (int j = 0; j < n; j++)
        {         
          // get M0[i, j]
          ij = env->vec_get(env, row, j);
          B[n * i + j] = extract_double(env, ij);
        }
    }

  // initialize C.  The solution will have dimensions of (rows1, cols2).
  for (int i = 0; i < m; i++)
    {
      for (int j = 0; j < n; j++)
        {
          C[n * i + j] = 0.0;
        }
    }

  // the multiplication is done here.
  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 
                m, n, k, alpha, A, k, B, n, beta, C, n);

  // now we build up the vector to return
  emacs_value vector = env->intern(env, "vector");
  emacs_value *array = malloc(sizeof(emacs_value) * m);
  emacs_value *row1;
  emacs_value vec;
  for (int i = 0; i < m; i++)
    {
      row1 = malloc(sizeof(emacs_value) * n);
      for (int j = 0; j < n; j++)
        {
          row1[j] = env->make_float(env, C[j + i * n]);
        }
      vec = env->funcall(env, vector, n, row1);
      array[i] = vec;
      free(row1);
    }

  emacs_value result = env->funcall(env, vector, m, array);
  free(array);
  return result;
}


int emacs_module_init(struct emacs_runtime *ert)
{
  emacs_env *env = ert->get_environment(ert);
  
  DEFUN("mkl-dgemm", Fmkl_dgemm, 2, 2,
        "(mkl-dgemm A B)\n"\
        "Multiply the matrices A and B. A and B must both be 2d vectors.\n" \
        "Returns the product as a vector.",
        NULL);
  provide(env, "mkl");
  
  return 0;
}

To build this we have to run org-babel-tangle to generate the mkl.c file, and then run this shell block to compile it.

sh /opt/intel/mkl/bin/mklvars.sh intel64
gcc -Wall -m64 -I${MKLROOT}/include -fPIC -c mkl.c
gcc -shared -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -lmkl_rt -lpthread -lm -ldl -L. -lemacs-module-helpers -o mkl.so mkl.o

5 Elisp helper functions

We will often want to know the shape of our arrays. The shape is how many elements there are in each dimension. Here we define a recursive function that gets the shape of arbitrarily nested vectors and returns a vector of the shape. We define some helper functions to get the number of dimensions, elements, rows and columns.

The main function is a helper elisp function that multiplies two arrays. The function analyzes the shapes and transforms 1d vectors into the right 2d shape to multiply them together, and then returns the shape that makes sense. The c-code is not very robust to mistakes in the array dimensions. It tends to make emacs segfault if you get it wrong. So we try to avoid that if possible.

We have four cases to consider for multiplication:

2d * 2d
(assert (= m1 n2)) return (n1, m2)
1d * 2d
1d is a row vector (1, n1) (assert (= n1 m2)) return vector with n2 elements
2d * 1d
1d is a column vector (m2, 1) (assert (= n1 m2)) return vector with m2 elements
1d * 1d
(assert (= (length m1) (length m2)) return a scalar

Here is the

(add-to-list 'load-path (expand-file-name "."))
(require 'mkl)
(require 'cl)
(require 'seq)

(defun vector-shape (vec)
  "Return a vector of the shape of VEC."
  (let ((shape (vector (length vec))))
    (if (vectorp (aref vec 0))
        (vconcat shape (vector-shape (aref vec 0)))
      shape)))

(defun vector-ndims (vec)
  "Returns the number of dimensions in VEC."
  (length (vector-shape vec)))


(defun vector-numel (vec)
  "Returns the number of elements in VEC."
  (if (> (length vec) 0)
      (seq-reduce '* (vector-shape vec) 1)
    0))


(defun vector-nrows (vec)
 "Return the number of rows in VEC."
 (aref (vector-shape vec) 0))


(defun vector-ncols (vec)
 "Return the number of columns in VEC."
 (aref (vector-shape vec) 1))


(defun matrix-multiply (A B)
  "Return A * B in the matrix multiply sense."
  (cond
   ;; 1d * 1d i.e. a dot-product
   ((and (= 1 (vector-ndims A))
         (= 1 (vector-ndims B))
         (= (length A) (length B)))
    ;; this is easy to compute so we don't use dgemm.
    (seq-reduce '+ (mapcar* (lambda (a b) (* a b)) A B) 0.0))

   ;; 2d * 1d (m1, n1) * (n2, 1)
   ((and (= 2 (vector-ndims A))
         (= 1 (vector-ndims B))
         ;; ncols-A = len-B
         (= (vector-ncols A) (length B)))
    ;; transform B into a 2d column vector
    (let* ((B2d (apply 'vector (mapcar 'vector B)))
           (result  (mkl-dgemm A B2d)))
      ;; Now call (dgemm A B2d) -> (m2, 1) column vector
      ;; and convert it back to a 1d result
      (cl-map 'vector (lambda (v) (aref v 0)) result)))

   ;; 1d * 2d (1, n1) * (m2, n2) len-A = nrows-B
   ((and (= 1 (vector-ndims A))
         (= 2 (vector-ndims B))
         (= (length A) (vector-nrows B)))
    ;; transform B into a 2d row vector
    (let* ((A2d (vector A))
           (result  (mkl-dgemm A2d B)))
      ;; should be a 2d row vector
      (aref result 0)))

   ;; 2d * 2d (m1, n1) * (m2, n2) rows-A = ncols-B
   ((and (= 2 (vector-ndims A))
         (= 2 (vector-ndims B))
         (= (vector-ncols A)
            (vector-nrows B)))
    ;; call (dgemm A B) and return result
    (mkl-dgemm A B))
   (t
    ;; Error checking, getting here means none of the cases above were caught.
    ;; something is probably wrong.
    (cond
     ((or (> (vector-ndims A) 2)
          (> (vector-ndims B) 2))
      (error "One of your arrays has more than 2 dimensions. Only 1 or 2d arrays are supported"))
     ((and (= 1 (vector-ndims A))
           (= 1 (vector-ndims B))
           (not (= (length A) (length B))))
      (error "A and B must be the same length.
len(A) = %d
len(B) = %d" (length A) (length B)))
     ((and
       (= (vector-ndims A) 2)
       (= (vector-ndims B) 2)
       (not (= (vector-nrows A) (vector-ncols B))))
      (error "Your array shapes are not correct.
The number of rows in array A must equal the number of columns in array B.
There are %d rows in A and %d columns in B" (vector-nrows A) (vector-ncols B)))
     ((and
       (= (vector-ndims A) 2)
       (= (vector-ndims B) 1)
       (not (= (vector-nrows A) (length B))))
      (error "Your array shapes are not correct.
The number of rows in array A must equal the number of columns in array B.
There are %d rows in A and %d columns in B" (vector-nrows A) (length B)))
     (t
      (error "Unknown error"))))))
matrix-multiply

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.7

Read and Post Comments

An Emacs zeromq library using an ffi

| categories: ffi, dynamic-module, zeromq, emacs | tags: | View Comments

An alternative approach to writing your own dynamic module (which requires some proficiency in c) is to use a foreign function interface (ffi). There is one for emacs at https://github.com/tromey/emacs-ffi, and it is also a dynamic module itself that uses libffi. This lets you use elisp to create functions in Emacs that actually call functions in some other library installed on your system. Here, we use this module to recreate our zeromq bindings that I previously posted.

The emacs-ffi module works fine as it is, but I found it useful to redefine one of the main macros (define-ffi-function) with the following goals in mind:

  1. Allow me to specify the argument names and docstrings for each arg that contain its type and a description of the arg.
  2. Document what each function returns (type and description).
  3. Combine those two things into an overall docstring on the function.

These are important to me because it allows Emacs to work at its fullest potential while writing elisp code, including having the right function signatures in eldoc, and easy access to documentation of each function. You can see the new definition here. For example, here is a docstring for zmq-send using that new macro:

zmq-send is a Lisp function.

(zmq-send *SOCKET *MSG LEN FLAGS)

For more information check the manuals.

send a message part on a socket.
http://api.zeromq.org/4-2:zmq-send

*SOCKET (:pointer) Pointer to a socket.
*MSG (:pointer) Pointer to a C-string to send
LEN (:size_t) Number of bytes to send
FLAGS (:int) 

Returns: Number of bytes sent or -1 on failure. (:int)

That has everything you need to know

(define-ffi-function zmq-send-ori "zmq_send" :int (:pointer :pointer :size_t :int) zmq)
zmq-send-ori

Compare that to this docstring from the original macro.

zmq-send-ori is a Lisp function.

(zmq-send-ori G251 G252 G253 G254)

For more information check the manuals.

You can see the zeromq function definitions in elisp here. Here is a list of the functions we have created:

Type RET on a type label to view its full documentation.

zmq
  Function: Returns a pointer to the libzmq library.
zmq-close
  Function: close ØMQ socket.
zmq-connect
  Function: create outgoing connection from socket.
zmq-ctx-destroy
  Function: terminate a ØMQ context.
zmq-ctx-new
  Function: create new ØMQ context.
zmq-recv
  Function: receive a message part from a socket.
zmq-send
  Function: send a message part on a socket.
zmq-socket
  Function: create ØMQ socket.

Now we can use these to create the client, this time in elisp. Just as in the last post, you need to run the hwserver in a terminal for this to work. Here is the client code.

(let* ((context (zmq-ctx-new))
       (socket (zmq-socket context ZMQ-REQ)))

  (with-ffi-string (endpoint "tcp://localhost:5555")
    (zmq-connect socket endpoint))

  (with-ffi-string (msg "Hi there")
    (zmq-send socket msg 5 0))

  (with-ffi-string (recv (make-string 10 ""))
    (let ((status -1))
      (cl-loop do (setq status (zmq-recv socket recv 10 0)) until (not (= -1 status)))) 
    (print (ffi-get-c-string recv)))

  (zmq-close socket)
  (zmq-ctx-destroy context))

"World     "

This client basically performs the same as the previously one we built. You can see we are mixing some programming styles here. For example, we have to create pointers to string variables in advance that the ffi will be writing to later like we would do in c. We use the with-ffi-string macro which frees the pointer when we are done with it. It basically just avoids me having to create, use, and destroy the pointers myself. So, there it is, a working elisp zeromq client!

1 Summary thoughts

For this example, I feel like the ffi approach here (with my modified function making macro) was much easier than what I previously did with a compiled c-library (although it benefited a lot from my recent work on the c-library). I really like working in elisp, which is a much greater strength of mine than programming in c. It is pretty clear, however, that you have to know how c works to use this, otherwise it isn't so obvious that some functions will return a status, and do something by side effect, e.g. put results in one of the arguments. The signatures of the ffi functions are basically limited by the signatures of the c-functions. If you want to change the signature in Emacs, you have to write wrapper functions to do that.

The macro I used here to create the functions creates really good (the kind I like anyway) docstrings when you use it fully. That isn't a totally new idea, I tried it out here before. In contrast, the original version not only didn't have a docstring, but every arg had a gensym (i.e. practically random) name! I think it would be very difficult to get the same level of documentation when writing c-code to make a module. In the c-code, there is a decoupling of the definition of a c-function (which always has the same signature) that gets data from the Emacs env, e.g. arguments, does stuff with them, and creates data to put back into the env, and the emacs_module_init function where you declare these functions to Emacs and tell it what to call the function in emacs, about how many arguments it takes, etc… The benefit of this is you define what the Emacs signature will look like, and then write the c-function that does the required work. The downside of this is the c-function and Emacs declarations are often far apart in the editor, and there is no easy way to auto-generate docstrings like I can with lisp macros. You would have to manually build them up yourself, and keep them synchronized. Also, I still have not figured out how to get emacs to show the right signature for c-generated functions.

The ffi approach still uses a dynamic module approach, so it still requires a modern Emacs with the module compiled and working. It still requires (in this case) the zeromq library to be installed on the system too. Once you have those, however, the elisp zeromq bindings by this approach is done completely in elisp!

It will be interesting in the coming weeks to see how this approach works with the GNU Scientific Library, particularly with arrays. Preliminary work shows that while the elisp ffi code is much shorter and easier to write than the corresponding c-code for some examples (e.g. a simple mathematical function), it is not as fast. So if performance is crucial, it may still pay off to write the c-code.

2 Modified ffi-define-function macro

Here are two macros I modified to add docstrings and named arguments too.

(defmacro define-ffi-library (symbol name)
  "Create a pointer named to the c library."
  (let ((library (cl-gensym))
        (docstring (format "Returns a pointer to the %s library." name)))
    (set library nil)
    `(defun ,symbol ()
       ,docstring
       (or ,library
           (setq ,library (ffi--dlopen ,name))))))

(defmacro define-ffi-function (name c-name return args library &optional docstring)
  "Create an Emacs function from a c-function.
NAME is a symbol for  the emacs function to create.
C-NAME is a string of the c-function to use.
RETURN is a type-keyword or (type-keyword docstring)
ARGS is a list of type-keyword or (type-keyword name &optional arg-docstring)
LIBRARY is a symbol usually defined by `define-ffi-library'
DOCSTRING is a string for the function to be created.

An overall docstring is created for the function from the arg and return docstrings.
"
  ;; Turn variable references into actual types; while keeping
  ;; keywords the same.
  (let* ((return-type (if (keywordp return)
                          return
                        (car return)))
         (return-docstring (format "Returns: %s (%s)"
                                   (if (listp return)
                                       (second return)
                                     "")
                                   return-type))
         (arg-types (vconcat (mapcar (lambda (arg)
                                       (if (keywordp arg)
                                           (symbol-value arg)
                                         ;; assume list (type-keyword name &optional doc)
                                         (symbol-value (car arg))))
                                     args)))
         (arg-names (mapcar (lambda (arg)
                              (if (keywordp arg)
                                  (cl-gensym)
                                ;; assume list (type-keyword name &optional doc)
                                (second arg)))
                            args))
         (arg-docstrings (mapcar (lambda (arg)
                                   (cond
                                    ((keywordp arg)
                                     "")
                                    ((and (listp arg) (= 3 (length arg)))
                                     (third arg))
                                    (t "")))
                                 args))
         ;; Combine all the arg docstrings into one string
         (arg-docstring (mapconcat 'identity
                                   (mapcar* (lambda (name type arg-doc)
                                              (format "%s (%s) %s"
                                                      (upcase (symbol-name name))
                                                      type
                                                      arg-doc))
                                            arg-names arg-types arg-docstrings)
                                   "\n"))
         (function (cl-gensym))
         (cif (ffi--prep-cif (symbol-value return-type) arg-types)))
    (set function nil)
    `(defun ,name (,@arg-names)
       ,(concat docstring "\n\n" arg-docstring "\n\n" return-docstring)
       (unless ,function
         (setq ,function (ffi--dlsym ,c-name (,library))))
       ;; FIXME do we even need a separate prep?
       (ffi--call ,cif ,function ,@arg-names))))
define-ffi-function

3 The zeromq bindings

These define the ffi functions we use in this post. I use a convention that pointer args start with a * so they look more like the c arguments. I also replace all _ with - so it looks more lispy, and the function names are easier to type.

(add-to-list 'load-path (expand-file-name "."))
(require 'ffi)

(define-ffi-library zmq "libzmq")


(define-ffi-function zmq-ctx-new "zmq_ctx_new"
  (:pointer "Pointer to a context")
  nil zmq
  "create new ØMQ context.
http://api.zeromq.org/4-2:zmq-ctx-new")


(define-ffi-function zmq-ctx-destroy "zmq_ctx_destroy"
  (:int "status")
  ((:pointer *context)) zmq
  "terminate a ØMQ context.
http://api.zeromq.org/4-2:zmq-ctx-destroy")


(define-ffi-function zmq-socket "zmq_socket"
  (:pointer "Pointer to a socket.")
  ((:pointer *context "Created by `zmq-ctx-new '.") (:int type)) zmq
  "create ØMQ socket.
http://api.zeromq.org/4-2:zmq-socket")


(define-ffi-function zmq-close "zmq_close"
  (:int "Status")
  ((:pointer *socket "Socket pointer created by `zmq-socket'")) zmq
  "close ØMQ socket.
http://api.zeromq.org/4-2:zmq-close")


(define-ffi-function zmq-connect "zmq_connect" 
  (:int "Status")
  ((:pointer *socket "Socket pointer created by `zmq-socket'")
   (:pointer *endpoint "Char pointer, e.g. (ffi-make-c-string \"tcp://localhost:5555\")"))
  zmq
  "create outgoing connection from socket.
http://api.zeromq.org/4-2:zmq-connect")


(define-ffi-function zmq-send "zmq_send"
  (:int "Number of bytes sent or -1 on failure.")
  ((:pointer *socket "Pointer to a socket.")
   (:pointer *msg "Pointer to a C-string to send")
   (:size_t len "Number of bytes to send")
   (:int flags)) 
  zmq
   "send a message part on a socket.
http://api.zeromq.org/4-2:zmq-send")


(define-ffi-function zmq-recv "zmq_recv"
  (:int "Number of bytes received or -1 on failure.")
  ((:pointer *socket)
   (:pointer *buf "Pointer to c-string to put result in.")
   (:size_t len "Length to truncate message at.")
   (:int flags)) 
  zmq
   "receive a message part from a socket.
http://api.zeromq.org/4-2:zmq-recv")


;; We cannot get these through a ffi because the are #define'd for the CPP and
;; invisible in the library. They only exist in the zmq.h file.
(defconst ZMQ-REQ 3
  "A socket of type ZMQ_REQ is used by a client to send requests
  to and receive replies from a service. This socket type allows
  only an alternating sequence of zmq_send(request) and
  subsequent zmq_recv(reply) calls. Each request sent is
  round-robined among all services, and each reply received is
  matched with the last issued request.")

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.7

Read and Post Comments

Zeromq bindings for Emacs via dynamic modules

| categories: dynamic-module, emacs | tags: | View Comments

I do a lot of scientific programming, and it is one of the reasons I have been learning to extend Emacs with dynamic modules. They have allowed me to add physical constants, numerical integration, root finding and linear algebra from established c-libraries to Emacs. Today I am taking a break from that and finally getting to another one of the reasons I started playing around with dynamic modules: zeromq. Zeromq is a messaging library that Jupyter uses to communicate with kernels. I thought we might get a smoother integration with Emacs and Jupyter if we could use zeromq directly to communicate between org-mode and the kernel. Currently we have to run a web server that does the communication for us via http requests. We won't solve the Jupyter problem today, but we will look at communicating with a Zeromq server from Emacs.

This might have lots of other useful applications. Suppose Emacs could communicate directly with other zeromq servers to retrieve data from, perhaps scientific data. It might even be possible for Emacs to run its own zeromq server, and other instances of Emacs could then communicate with it. Collaborative editing anyone?

Here we just implement the "Hello world" client example in the zeromq guide. The code for the server, a c-client, the mod-zmq library, a makefile and tests can be found at https://github.com/jkitchin/emacs-modules/tree/master/zeromq. All the server does is receive a string, and then send a response (in this case just the string "World") back to the client.

To run this, make sure to run the hwserver executable in a terminal. I wrapped the zeromq commands required to implement the client into a dynamic module. Since this example focuses on strings, the module returns strings to Emacs. I am not sure if that is always the right thing to do, as zeromq more generically uses bytes, but I will have to wait until I know more about zeromq to know if this is an issue.

This dynamic module uses a new feature that none of the previous posts used, and that is the user_ptr. These allow you to essentially return a reference pointer back to emacs that you can pass back to another function. That way they stay alive between function calls. For example, here we have to create a context and socket and pass these items to functions like zmq_send and zmq_recv.

The directory this library is in is not on my path, so we load it like this:

(add-to-list 'load-path (expand-file-name "."))
(require 'mod-zmq)

Here are the functions and their signatures that have been implemented so far. I only implemented the ones I needed for the client. The signatures may change in the future; this is just a proof of concept for now for the purpose of building the client.

(apropos-command "zmq*" t)
(with-current-buffer "*Apropos*" (buffer-string))
Type RET on a type label to view its full documentation.

zmq-close
  Function: (zmq-close SOCKET)
zmq-connect
  Function: (zmq-connect SOCKET ENDPOINT)
zmq-ctx-destroy
  Function: (zmq-ctx-destroy CONTEXT)
zmq-ctx-new
  Function: (zmq-ctx-new)
zmq-recv
  Function: (zmq-recv SOCKET LEN FLAGS)
zmq-send
  Function: (zmq-send SOCKET BUF FLAGS)
zmq-socket
  Function: (zmq-socket CONTEXT TYPE)

You can see the c code for the client here: hwclient.c . Here is a simple elisp version of the hwclient that basically does the same thing! The main difference is I added a while loop around the zmq-recv because sometimes it returns -1 and no result. So, here we loop until the return result is not -1. That seems to do the right thing.

(let* ((context (zmq-ctx-new))
       (socket (zmq-socket context ZMQ-REQ))
       (recv-ret -1)
       (result))

  (zmq-connect socket "tcp://localhost:5555")
  (zmq-send socket "Hello" 0)

  (while (= recv-ret -1)
    (setq result (zmq-recv socket 10 0)
          recv-ret (second result)))

  (print result)

  (zmq-close socket)
  (zmq-ctx-destroy context))

("World" 5)

Basically this creates the context, then the socket, and connects to it on port 5555 of the localhost where the server is running. Then we send the string "Hello". The server returns the string "World" and tells us it sent 5 bytes. Then we close the socket and destroy the context. There is a lot of code in the module to make this happen. A lot of it is converting args in emacs functions to things we can use in c, running a few lines of zmq commands, and then code to convert those results back to emacs values. Finally, there is code to register each function and define docstrings for them. I am not totally convinced this is the best way to do this, but it does work! An alternative might be emacs-ffi, which might enable most of this to be developed in just elisp.

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.7

Read and Post Comments

Adding linear algebra to Emacs with the GSL and dynamic modules

| categories: dynamic-module, emacs | tags: | View Comments

The goal of this post is to be able to solve equations like this one:

\[\left(\begin{array}{cccc} 0.18& 0.60& 0.57& 0.96 \\ 0.41& 0.24& 0.99& 0.58 \\ 0.14& 0.30& 0.97& 0.66 \\ 0.51& 0.13& 0.19& 0.85 \end{array} \right ) \left ( \begin{array}{c} x_0 \\ x_1 \\ x_2 \\ x_3 \end{array} \right ) = \left ( \begin{array}{c} 1.0 \\ 2.0 \\ 3.0 \\ 4.0 \end{array} \right ) \]

The answer is given as

\[x = \left ( \begin{array}{c} -4.05205 \\ -12.6056 \\ 1.66091 \\ 8.69377 \end{array} \right ) \]

The syntax we want to use is shown below, and we want it to return a vector containing the solution:

(let ((A [[0.18 0.60 0.57 0.96]
          [0.41 0.24 0.99 0.58]
          [0.14 0.30 0.97 0.66]
          [0.51 0.13 0.19 0.85]])
      (b [1.0 2.0 3.0 4.0]))
  (gsl-linalg-LU-solve A b))

Rather than put all the code in here like I have for the past several posts, I started a git repo at https://github.com/jkitchin/emacs-modules that contains this code.

The module for this post can be found here: https://github.com/jkitchin/emacs-modules/blob/master/gsl-linalg.c. There are a few notable features in it. First, I started writing/collecting some helper functions to make these modules simpler to write. For example, look how nice this looks to declare the functions and provide the feature.

int emacs_module_init(struct emacs_runtime *ert)
{
  emacs_env *env = ert->get_environment(ert);
  
  DEFUN("gsl-linalg-LU-solve", Fgsl_linalg_LU_solve, 2, 2,
        "(gsl-linalg-LU-solve A b).\n" \
        "Solve A x = b for x.\n" \
        "Returns a vector containing the solution x.",
        NULL);
  provide(env, "gsl-linalg");
  
  return 0;
}

The DEFUN and provide function are defined in https://github.com/jkitchin/emacs-modules/blob/master/emacs-module-helpers.c.

Within the module itself, we have to loop over the inputs to create the arrays that GSL wants to solve this problem. Second, after the solution is obtained, we have to build up a vector to return. The solution is in a gsl_vector, and we need to create an array of emacs_value elements containing each element of the gsl_vector as a float, and then create a vector to return to emacs. I use vectors here because it was easy to get the size of the b vector, which is also related to the size of the A matrix.

The repo has a Makefile in it, so we can build this module with:

make gsl-linalg.so

Once it is compiled, we load it like this. In this post we are in the emacs-modules directory where the gsl-linalg.so library is, and it is not on my load-path, so I add it here.

(add-to-list 'load-path (expand-file-name "."))
(require 'gsl-linalg)
gsl-linalg

Here is one function in the module:

(describe-function 'gsl-linalg-LU-solve)
gsl-linalg-LU-solve is a Lisp function.

(gsl-linalg-LU-solve &rest ARGS)

For more information check the manuals.

(gsl-linalg-LU-solve A b).
Solve A x = b for x.
Returns a vector containing the solution x.

Now, we can solve linear equations like this:

(gsl-linalg-LU-solve
 [[0.18 0.60 0.57 0.96]
  [0.41 0.24 0.99 0.58]
  [0.14 0.30 0.97 0.66]
  [0.51 0.13 0.19 0.85]]
 [1.0 2.0 3.0 4.0])
[-4.052050229573973 -12.605611395906903 1.6609116267088417 8.693766928795227]

We have a limited ability to confirm this answer. I have written a function that uses blas for multiplication of 2d vectors. You can see from this:

(gsl-blas-dgemm [[0.18 0.60 0.57 0.96]
                 [0.41 0.24 0.99 0.58]
                 [0.14 0.30 0.97 0.66]
                 [0.51 0.13 0.19 0.85]]
                [[-4.052050229573973]
                 [-12.605611395906903]
                 [1.6609116267088417]
                 [8.693766928795227]])
[[1.0] [1.9999999999999991] [2.9999999999999996] [4.0]]

That within float that indeed \(A x = b\).

The main limitation of this module at the moment is that you have to use vectors; you cannot put in a list of numbers. It is possible to make it take lists and vectors, but for now I am leaving it at vectors. Also, it only produces solutions of float numbers (not integers).

The module does not handle 1d vectors well,, e.g. in gsl-linalg-LU-solve example, the right hand side is implied to be a column vector, and we don't have the array broadcasting features of Python yet. Those are doable things for some future day perhaps. For now I am happy to have figured out how to handle arrays!

Copyright (C) 2017 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 9.0.7

Read and Post Comments

Next Page »