Getting a dictionary of counts

| categories: programming | tags:

I frequently want to take a list and get a dictionary of keys that have the count of each element in the list. Here is how I have typically done this countless times in the past.

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']

d = {}
for el in L:
    if el in d:
        d[el] += 1
    else:
        d[el] = 1

print d
{'a': 3, 'b': 2, 'e': 2, 'd': 1}

That seems like too much code, and that there must be a list comprehension approach combined with a dictionary constructor.

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']

print dict((el,L.count(el)) for el in L)
{'a': 3, 'b': 2, 'e': 2, 'd': 1}

Wow, that is a lot simpler! I suppose for large lists this might be slow, since count must look through the list for each element, whereas the longer code looks at each element once, and does one conditional analysis.

Here is another example of much shorter and cleaner code.

from collections import Counter
L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']
print Counter(L)
print Counter(L)['a']
Counter({'a': 3, 'b': 2, 'e': 2, 'd': 1})
3

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter

Is your ice cream float bigger than mine

| categories: math | tags:

Float numbers (i.e. the ones with decimals) cannot be perfectly represented in a computer. This can lead to some artifacts when you have to compare float numbers that on paper should be the same, but in silico are not. Let us look at some examples. In this example, we do some simple math that should result in an answer of 1, and then see if the answer is “equal” to one.

print 3.0 * (1.0/3.0)
print 1.0 == 3.0 * (1.0/3.0)
1.0
True

Everything looks fine. Now, consider this example.

print 49.0 * (1.0/49.0)
print 1.0 == 49.0 * (1.0/49.0)
1.0
False

The first line looks like everything is find, but the equality fails!

1.0
False

You can see here why the equality statement fails. We will print the two numbers to sixteen decimal places.

print '{0:1.16f}'.format(49.0 * (1.0/49.0) )
print '{0:1.16f}'.format(1.0)
print 1 - 49.0 * (1.0/49.0)
0.9999999999999999
1.0000000000000000
1.11022302463e-16

The two numbers actually are not equal to each other because of float math. They are very, very close to each other, but not the same.

This leads to the idea of asking if two numbers are equal to each other within some tolerance. The question of what tolerance to use requires thought. Should it be an absolute tolerance? a relative tolerance? How large should the tolerance be? We will use the distance between 1 and the nearest floating point number (this is eps in Matlab). numpy can tell us this number with the np.spacing command.

Below, we implement a comparison function from 10.1107/S010876730302186X that allows comparisons with tolerance.

# Implemented from Acta Crystallographica A60, 1-6 (2003). doi:10.1107/S010876730302186X

import numpy as np
print np.spacing(1)

def feq(x, y, epsilon):
    'x == y'
    return not((x < (y - epsilon)) or (y < (x - epsilon)))

print feq(1.0, 49.0 * (1.0/49.0), np.spacing(1))
2.22044604925e-16
True

For completeness, here are the other float comparison operators from that paper. We also show a few examples.

import numpy as np

def flt(x, y, epsilon):
    'x < y'
    return x < (y - epsilon)

def fgt(x, y, epsilon):
    'x > y'
    return y < (x - epsilon)

def fle(x, y, epsilon):
    'x <= y'
    return not(y < (x - epsilon))

def fge(x, y, epsilon):
    'x >= y'
    return not(x < (y - epsilon))

print fge(1.0, 49.0 * (1.0/49.0), np.spacing(1))
print fle(1.0, 49.0 * (1.0/49.0), np.spacing(1))

print fgt(1.0 + np.spacing(1), 49.0 * (1.0/49.0), np.spacing(1))
print flt(1.0 - 2 * np.spacing(1), 49.0 * (1.0/49.0), np.spacing(1))
True
True
True
True

As you can see, float comparisons can be tricky. You have to give a lot of thought to how to make the comparisons, and the functions shown above are not the only way to do it. You need to build in testing to make sure your comparisons are doing what you want.

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter

Zhongnan Xu receives an ICMR International Research Fellowship

| categories: news | tags:

Zhongnan will be visiting the Denmark Technical University to collaborate with Jan Rossmeisl in the next year! This fellowship is supported by the IMI Program of the National Science Foundation under Award No. DMR 08-43934 through UC Santa Barbara. Congratulations Zhongnan!

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter

Calling lapack directly from scipy

| categories: linear algebra | tags:

If the built in linear algebra functions in numpy and scipy do not meet your needs, it is often possible to directly call lapack functions. Here we call a function to solve a set of complex linear equations. The lapack function for this is ZGBSV. The description of this function (http://linux.die.net/man/l/zgbsv) is:

ZGBSV computes the solution to a complex system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

The python signature is (http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lapack.zgbsv.html#scipy.linalg.lapack.zgbsv):

lub,piv,x,info = zgbsv(kl,ku,ab,b,[overwrite_ab,overwrite_b])

We will look at an example from http://www.nag.com/lapack-ex/node22.html.

We solve \(A x = b\) with

\( A = \left(

\begin{array}{cccc} -1.65 + 2.26 i & -2.05 - 0.85 i & 0.97 - 2.84 i & 0 \\ 6.30 i & -1.48 - 1.75 i & -3.99 + 4.01 i & 0.59 - 0.48 i \\ 0 & -0.77 + 2.83 i & -1.06 + 1.94 i & 3.33 - 1.04 i \\ 0 & 0 & 4.48 - 1.09 i & -0.46 - 1.72 i \end{array}

\right) \)

\( b = \left(

\begin{array}{cc} -1.06 + 21.50 i \\ -22.72 - 53.90 i \\ 28.24 - 38.60 i \\ -34.56 + 16.73 i \end{array}

\right). \)

The \(A\) matrix has one lower diagonal (kl = 1) and two upper diagonals (ku = 2), four equations (n = 4) and one right-hand side.

import scipy.linalg.lapack as la

# http://www.nag.com/lapack-ex/node22.html
import numpy as np
A = np.array([[-1.65 + 2.26j, -2.05 - 0.85j,  0.97 - 2.84j,  0.0         ],
              [6.30j,         -1.48 - 1.75j, -3.99 + 4.01j,  0.59 - 0.48j],
              [0.0,           -0.77 + 2.83j, -1.06 + 1.94j,  3.33 - 1.04j],
              [0.0,            0.0,           4.48 - 1.09j, -0.46 - 1.72j]])

# construction of Ab is tricky.  Fortran indexing starts at 1, not
# 0. This code is based on the definition of Ab at
# http://linux.die.net/man/l/zgbsv. First, we create the Fortran
# indices based on the loops, and then subtract one from them to index
# the numpy arrays.
Ab = np.zeros((5,4),dtype=np.complex)
n, kl, ku = 4, 1, 2

for j in range(1, n + 1):
    for i in range(max(1, j - ku), min(n, j + kl) + 1):
        Ab[kl + ku + 1 + i - j - 1, j - 1] = A[i-1, j-1]

b = np.array([[-1.06  + 21.50j],
              [-22.72 - 53.90j],
              [28.24 - 38.60j],
              [-34.56 + 16.73j]])

lub, piv, x, info = la.flapack.zgbsv(kl, ku, Ab, b)

# compare to results at http://www.nag.com/lapack-ex/examples/results/zgbsv-ex.r
print 'x = ',x
print 'info = ',info

# check solution
print 'solved: ',np.all(np.dot(A,x) - b < 1e-12)

# here is the easy way!!!
print '\n\nbuilt-in solver'
print np.linalg.solve(A,b)
x =  [[-3.+2.j]
 [ 1.-7.j]
 [-5.+4.j]
 [ 6.-8.j]]
info =  0
solved:  True


built-in solver
[[-3.+2.j]
 [ 1.-7.j]
 [-5.+4.j]
 [ 6.-8.j]]

Some points of discussion.

  1. Kind of painful! but, nevertheless, possible. You have to do a lot more work figuring out the dimensions of the problem, how to setup the problem, keeping track of indices, etc…

But, one day it might be helpful to know this can be done, e.g. to debug an installation, to validate an approach against known results, etc…

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter

Lambda Lambda Lambda

| categories: programming | tags:

Is that some kind of fraternity? of anonymous functions? What is that!? There are many times where you need a callable, small function in python, and it is inconvenient to have to use def to create a named function. Lambda functions solve this problem. Let us look at some examples. First, we create a lambda function, and assign it to a variable. Then we show that variable is a function, and that we can call it with an argument.

f = lambda x: 2*x
print f
print f(2)
<function <lambda> at 0x0000000001E6AAC8>
4

We can have more than one argument:

f = lambda x,y: x + y
print f
print f(2, 3)
<function <lambda> at 0x0000000001E3AAC8>
5

And default arguments:

f = lambda x, y=3: x + y
print f
print f(2)
print f(4, 1)
<function <lambda> at 0x0000000001E9AAC8>
5
5

It is also possible to have arbitrary numbers of positional arguments. Here is an example that provides the sum of an arbitrary number of arguments.

import operator
f = lambda *x: reduce(operator.add, x)
print f

print f(1)
print f(1, 2)
print f(1, 2, 3)
<function <lambda> at 0x0000000001DFAAC8>
1
3
6

You can also make arbitrary keyword arguments. Here we make a function that simply returns the kwargs as a dictionary. This feature may be helpful in passing kwargs to other functions.

f = lambda **kwargs: kwargs

print f(a=1, b=3)
{'a': 1, 'b': 3}

Of course, you can combine these options. Here is a function with all the options.

f = lambda a, b=4, *args, **kwargs: (a, b, args, kwargs)

print f('required', 3, 'optional-positional', g=4)
('required', 3, ('optional-positional',), {'g': 4})

One of the primary limitations of lambda functions is they are limited to single expressions. They also do not have documentation strings, so it can be difficult to understand what they were written for later.

1 Applications of lambda functions

Lambda functions are used in places where you need a function, but may not want to define one using def. For example, say you want to solve the nonlinear equation \(\sqrt{x} = 2.5\).

from scipy.optimize import fsolve
import numpy as np

sol, = fsolve(lambda x: 2.5 - np.sqrt(x), 8)
print sol
6.25

Another time to use lambda functions is if you want to set a particular value of a parameter in a function. Say we have a function with an independent variable, \(x\) and a parameter \(a\), i.e. \(f(x; a)\). If we want to find a solution \(f(x; a) = 0\) for some value of \(a\), we can use a lambda function to make a function of the single variable \(x\). Here is a example.

from scipy.optimize import fsolve
import numpy as np

def func(x, a):
    return a * np.sqrt(x) - 4.0

sol, = fsolve(lambda x: func(x, 3.2), 3)
print sol
1.5625

Any function that takes a function as an argument can use lambda functions. Here we use a lambda function that adds two numbers in the reduce function to sum a list of numbers.

print reduce(lambda x, y: x + y, [0, 1, 2, 3, 4])
10

We can evaluate the integral \(\int_0^2 x^2 dx\) with a lambda function.

from scipy.integrate import quad

print quad(lambda x: x**2, 0, 2)
(2.666666666666667, 2.960594732333751e-14)

2 Summary

Lambda functions can be helpful. They are never necessary. You can always define a function using def, but for some small, single-use functions, a lambda function could make sense. Lambda functions have some limitations, including that they are limited to a single expression, and they lack documentation strings.

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter
« Previous Page -- Next Page »