Getting a dictionary of counts

| categories: programming | tags:

I frequently want to take a list and build a dictionary that maps each distinct element to the number of times it appears in the list. Here is how I have typically done this, countless times in the past.

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']

d = {}
for el in L:
    if el in d:
        d[el] += 1
    else:
        d[el] = 1

print d
{'a': 3, 'b': 2, 'e': 2, 'd': 1}

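That seems like too much code; there must be a more compact approach that combines a comprehension with the dictionary constructor. Here a generator expression produces (element, count) pairs, and dict turns them into a dictionary.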

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']

print dict((el,L.count(el)) for el in L)
{'a': 3, 'b': 2, 'e': 2, 'd': 1}

Wow, that is a lot simpler! For large lists this will be slow, though: count must scan the whole list once for every element, so the work grows with the square of the list length, whereas the longer code visits each element once and performs one conditional test.
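One way to soften the repeated scanning is sketched below. It is written in Python 3 syntax (print as a function), unlike the Python 2 listings in this post, and the names counts and single_pass are only illustrative. Counting over set(L) scans the list once per distinct element rather than once per item, and the dict.get loop is a compact single-pass variant of the longer code.

```python
L = ['a', 'a', 'b', 'd', 'e', 'b', 'e', 'a']

# Call count once per distinct element instead of once per list item.
counts = {el: L.count(el) for el in set(L)}

# Single pass over the list: dict.get supplies 0 for keys not seen yet.
single_pass = {}
for el in L:
    single_pass[el] = single_pass.get(el, 0) + 1

print(counts == single_pass)
```

The set-based version still scans the list several times, so for long lists with many distinct values the single-pass loop is likely the better choice.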

Here is another example of much shorter and cleaner code, using the Counter class from the collections module in the standard library.

from collections import Counter
L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']
print Counter(L)
print Counter(L)['a']
Counter({'a': 3, 'b': 2, 'e': 2, 'd': 1})
3
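A Counter offers a little more than the plain dictionary. The sketch below (Python 3 syntax, with the illustrative name c) shows two conveniences: most_common returns elements ordered by count, and looking up a key that never appeared gives 0 instead of raising KeyError.

```python
from collections import Counter

L = ['a', 'a', 'b', 'd', 'e', 'b', 'e', 'a']
c = Counter(L)

# The single most frequent element, as a list of (element, count) pairs.
print(c.most_common(1))

# A key that was never counted reports 0 rather than raising KeyError.
print(c['z'])
```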

Copyright (C) 2013 by John Kitchin. See the License for information about copying.


Lambda Lambda Lambda

| categories: programming | tags:

Is that some kind of fraternity? Of anonymous functions? What is that!? There are many times when you need a small, callable function in Python, and it is inconvenient to have to use def to create a named function. Lambda functions solve this problem. Let us look at some examples. First, we create a lambda function and assign it to a variable. Then we show that the variable is a function, and that we can call it with an argument.

f = lambda x: 2*x
print f
print f(2)
<function <lambda> at 0x0000000001E6AAC8>
4

We can have more than one argument:

f = lambda x,y: x + y
print f
print f(2, 3)
<function <lambda> at 0x0000000001E3AAC8>
5

And default arguments:

f = lambda x, y=3: x + y
print f
print f(2)
print f(4, 1)
<function <lambda> at 0x0000000001E9AAC8>
5
5

It is also possible to have arbitrary numbers of positional arguments. Here is an example that provides the sum of an arbitrary number of arguments.

import operator
f = lambda *x: reduce(operator.add, x)
print f

print f(1)
print f(1, 2)
print f(1, 2, 3)
<function <lambda> at 0x0000000001DFAAC8>
1
3
6
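The listing above relies on reduce being a builtin, which is true in Python 2. In Python 3 it lives in the functools module, so a Python 3 version of the same idea might look like the sketch below. The second function, g, is my own addition showing that the builtin sum covers the numeric case.

```python
from functools import reduce  # reduce is no longer a builtin in Python 3
import operator

# Sum an arbitrary number of positional arguments.
f = lambda *x: reduce(operator.add, x)

# For numbers, the builtin sum does the same job without reduce.
g = lambda *x: sum(x)

print(f(1))
print(f(1, 2, 3))
print(g(1, 2, 3))
```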

You can also accept arbitrary keyword arguments. Here we make a function that simply returns the kwargs as a dictionary. This feature may be helpful in passing kwargs to other functions.

f = lambda **kwargs: kwargs

print f(a=1, b=3)
{'a': 1, 'b': 3}

Of course, you can combine these options. Here is a function with all the options.

f = lambda a, b=4, *args, **kwargs: (a, b, args, kwargs)

print f('required', 3, 'optional-positional', g=4)
('required', 3, ('optional-positional',), {'g': 4})

One of the primary limitations of lambda functions is that they are limited to a single expression. They also do not have documentation strings, so it can be difficult to understand later what they were written for.
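To make the documentation point concrete, here is a small comparison in Python 3 syntax. The names double and double_documented are made up for this illustration.

```python
# A lambda has no docstring and no meaningful name.
double = lambda x: 2 * x

# The equivalent def can carry both.
def double_documented(x):
    """Return twice x."""
    return 2 * x

print(double.__doc__)
print(double.__name__)
print(double_documented.__doc__)
print(double_documented.__name__)
```

The missing name also shows up in tracebacks, where a lambda is reported only as <lambda>.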

1 Applications of lambda functions

Lambda functions are used in places where you need a function, but may not want to define one using def. For example, say you want to solve the nonlinear equation \(\sqrt{x} = 2.5\).

from scipy.optimize import fsolve
import numpy as np

sol, = fsolve(lambda x: 2.5 - np.sqrt(x), 8)
print sol
6.25

Another time to use lambda functions is when you want to fix the value of a parameter in a function. Say we have a function with an independent variable \(x\) and a parameter \(a\), i.e. \(f(x; a)\). If we want to find a solution of \(f(x; a) = 0\) for some value of \(a\), we can use a lambda function to make a function of the single variable \(x\). Here is an example.

from scipy.optimize import fsolve
import numpy as np

def func(x, a):
    return a * np.sqrt(x) - 4.0

sol, = fsolve(lambda x: func(x, 3.2), 3)
print sol
1.5625
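The standard library also provides functools.partial for fixing a parameter, which some people find more readable than a lambda. The sketch below uses Python 3 syntax and swaps np.sqrt for math.sqrt so that it runs without numpy or scipy. It only checks that the solution found above is a root; it does not solve the equation.

```python
from functools import partial
import math

def func(x, a):
    return a * math.sqrt(x) - 4.0

# Fix a = 3.2, leaving a function of the single variable x.
g = partial(func, a=3.2)

# Evaluate at the solution reported above; the result should be about zero.
print(g(1.5625))
```

With np.sqrt in place of math.sqrt, g could be handed to fsolve in the same way as the lambda.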

Any function that takes a function as an argument can use lambda functions. Here we use a lambda function that adds two numbers in the reduce function to sum a list of numbers.

print reduce(lambda x, y: x + y, [0, 1, 2, 3, 4])
10

We can evaluate the integral \(\int_0^2 x^2 dx\) with a lambda function.

from scipy.integrate import quad

print quad(lambda x: x**2, 0, 2)
(2.666666666666667, 2.960594732333751e-14)
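The first number returned by quad is the value of the integral and the second is an estimate of the absolute error. As a check, this integral can be worked out by hand:

```latex
\[
\int_0^2 x^2 \, dx = \left[ \frac{x^3}{3} \right]_0^2 = \frac{8}{3} \approx 2.6667
\]
```

This agrees with the numerical result above.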

2 Summary

Lambda functions can be helpful. They are never necessary. You can always define a function using def, but for some small, single-use functions, a lambda function could make sense. Lambda functions have some limitations, including that they are limited to a single expression, and they lack documentation strings.


Redirecting the print function

| categories: programming | tags:

Ordinarily a print statement prints to stdout, which is usually your terminal/screen. You can redirect this so that printing is done to a file, for example. This might be helpful if you use print statements for debugging, and later want to save what is printed to a file. Here we make a simple function that prints some things.

def debug():
    print 'step 1'
    print 3 + 4
    print 'finished'

debug()
step 1
7
finished

Now, let us redirect the printed lines to a file. We create a file object, and set sys.stdout equal to that file object.

import sys
print >> sys.__stdout__, '__stdout__ before = ', sys.__stdout__
print >> sys.__stdout__, 'stdout before = ', sys.stdout

f = open('data/debug.txt', 'w')
sys.stdout = f

# note that sys.__stdout__ does not change, but stdout does.
print >> sys.__stdout__, '__stdout__ after = ', sys.__stdout__
print >> sys.__stdout__, 'stdout after = ', sys.stdout

debug()

# reset stdout back to console
sys.stdout = sys.__stdout__

print f
f.close() # try to make it a habit to close files
print f
__stdout__ before =  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
stdout before =  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
__stdout__ after =  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
stdout after =  <open file 'data/debug.txt', mode 'w' at 0x2ae7dbcbbb70>
<open file 'data/debug.txt', mode 'w' at 0x2ae7dbcbbb70>
<closed file 'data/debug.txt', mode 'w' at 0x2ae7dbcbbb70>

Note that it can be important to close files. If you are looping through large numbers of files and never close them, you will eventually run out of file handles, causing an error. We can use a context manager to automatically close the file, like this.

import sys

# use the open context manager to automatically close the file
with open('data/debug.txt', 'w') as f:
    sys.stdout = f
    debug()
    print >> sys.__stdout__, f

# reset stdout
sys.stdout = sys.__stdout__
print f
<open file 'data/debug.txt', mode 'w' at 0x0000000002071C00>
<closed file 'data/debug.txt', mode 'w' at 0x0000000002071C00>

See, the file is closed for us! We can see the contents of our file like this.

cat data/debug.txt
step 1
7
finished
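The listings in this post use the Python 2 print statement. In Python 3, print is a function that accepts a file argument, so a single call can be sent elsewhere without touching sys.stdout at all. The sketch below is one way to do that. The out parameter is my own addition to debug, and io.StringIO stands in for an open file so the example leaves nothing on disk.

```python
import io

def debug(out=None):
    # With out=None, print falls back to the current sys.stdout.
    print('step 1', file=out)
    print(3 + 4, file=out)
    print('finished', file=out)

buf = io.StringIO()   # in-memory stand-in for a file opened for writing
debug(buf)            # nothing appears on the console
print(buf.getvalue())
```

This only helps when you control the print calls. Redirecting sys.stdout, as in the rest of this post, also captures output printed by code you cannot edit.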

The approaches above are not fault safe. Suppose our debug function raised an exception. Then the line that resets stdout would not be executed, and later printing would still go to the file. We can solve this with try/finally code.

import sys

print 'before: ', sys.stdout
try:
    with open('data/debug-2.txt', 'w') as f:
        sys.stdout = f
        # print to the original stdout
        print >> sys.__stdout__, 'during: ', sys.stdout
        debug()
        raise Exception('something bad happened')
finally:
    # reset stdout
    sys.stdout = sys.__stdout__

print 'after: ', sys.stdout
print f # verify it is closed
print sys.stdout # verify this is reset
before:  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
during:  <open file 'data/debug-2.txt', mode 'w' at 0x2ae7dbcbbf60>
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
Exception: something bad happened
after:  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
<closed file 'data/debug-2.txt', mode 'w' at 0x2ae7dbcbbf60>
<open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>

cat data/debug-2.txt
step 1
7
finished
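For readers on Python 3, the same try/finally idea can be sketched as follows. This is not a translation of the listing above: an in-memory io.StringIO replaces the file, an except clause is added so the script keeps running, and the names original, buf and caught are illustrative.

```python
import io
import sys

original = sys.stdout      # remember where printing currently goes
buf = io.StringIO()        # in-memory stand-in for a debug file

try:
    sys.stdout = buf
    print('step 1')
    raise Exception('something bad happened')
except Exception as err:
    caught = str(err)      # keep the message so we can inspect it later
finally:
    sys.stdout = original  # runs whether or not an exception was raised

print(sys.stdout is original)
print(repr(buf.getvalue()))
```

Saving the current value of sys.stdout, rather than assuming sys.__stdout__, means the code still behaves if something else had already redirected output.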

This is the kind of situation where a context manager is handy. A context manager is typically a class that executes some code when you “enter” the context, and some other code when you “exit” the context. Here we want to change sys.stdout to a new value inside our context, and change it back when we exit the context. We will store the value of sys.stdout going in, and restore it on the way out.

import sys

class redirect:
    def __init__(self, f=sys.stdout):
        "redirect print statement to f. f must be a file-like object"
        self.f = f
        self.stdout = sys.stdout
        print >> sys.__stdout__, 'init stdout: ', sys.stdout        
    def __enter__(self): 
        sys.stdout = self.f
        print >> sys.__stdout__,  'stdout in context-manager: ',sys.stdout
    def __exit__(self, *args):
        sys.stdout = self.stdout
        print '__stdout__ at exit = ',sys.__stdout__        

# regular printing
with redirect():
    debug()

# write to a file
with open('data/debug-3.txt', 'w') as f:
    with redirect(f):
        debug()

# mixed regular and redirected printing
with open('data/debug-4.txt', 'w') as f:
    with redirect(f):
        print 'testing redirect'
        with redirect():
            print 'temporary console printing'
            debug()
        print 'Now outside the inner context. this should go to data/debug-4.txt'
        debug()
        raise Exception('something else bad happened')

print sys.stdout
init stdout:  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
stdout in context-manager:  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
step 1
7
finished
__stdout__ at exit =  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
init stdout:  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
stdout in context-manager:  <open file 'data/debug-3.txt', mode 'w' at 0x2ae7dbcbbb70>
__stdout__ at exit =  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
init stdout:  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
stdout in context-manager:  <open file 'data/debug-4.txt', mode 'w' at 0x2ae7dca4d030>
init stdout:  <open file 'data/debug-4.txt', mode 'w' at 0x2ae7dca4d030>
stdout in context-manager:  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
temporary console printing
step 1
7
finished
__stdout__ at exit =  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
Exception: something else bad happened
<open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>

Here are the contents of the debug file.

cat data/debug-3.txt
step 1
7
finished

The contents of the other debug file have some additional lines, because we printed some things while in the redirect context.

cat data/debug-4.txt
testing redirect
__stdout__ at exit =  <open file '<stdout>', mode 'w' at 0x2ae7d70e01e0>
Now outside the inner context. this should go to data/debug-4.txt
step 1
7
finished

See http://www.python.org/dev/peps/pep-0343/ (number 5) for another example of redirecting using a function decorator. I think it is harder to understand, because it uses a generator.
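For comparison, a generator-based context manager in that spirit can be sketched with contextlib.contextmanager. This is my own Python 3 sketch, not a copy of the PEP's code, and the name redirected is made up. Everything before the yield runs on entering the with block, and the finally clause runs on leaving it, even if an exception was raised.

```python
import io
import sys
from contextlib import contextmanager

@contextmanager
def redirected(f):
    saved = sys.stdout      # runs on entry to the with block
    sys.stdout = f
    try:
        yield f             # the body of the with block runs here
    finally:
        sys.stdout = saved  # runs on exit, exception or not

buf = io.StringIO()
before = sys.stdout
with redirected(buf):
    print('step 1')
restored = sys.stdout is before

print(restored)
print(repr(buf.getvalue()))
```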

There were three points in this section:

  1. You can control where things are printed in your programs by modifying the value of sys.stdout.
  2. You can use try/except/finally blocks to make sure code gets executed in the event an exception is raised.
  3. You can use context managers to make sure files get closed, and code gets executed if exceptions are raised.
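If you are on Python 3.4 or later, the standard library already ships a context manager for the first and third points, contextlib.redirect_stdout. A minimal sketch, with io.StringIO standing in for a file and an except clause added so the script continues:

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()
try:
    with redirect_stdout(buf):
        print('step 1')
        raise Exception('something bad happened')
except Exception:
    pass  # sys.stdout has already been restored at this point

print('back on the console')
print(repr(buf.getvalue()))
```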


Mail merge with python

| categories: email, programming | tags:

Suppose you are organizing some event, and you have a mailing list of people and email addresses, and you need to tell each person what room they will be in. You would like to send a personalized email to each person, and you do not want to type each one by hand. Python can automate this for you. All you need is the mailing list in some kind of structured format, and then you can go through it line by line to create and send emails.

We will use an org-table to store the data in.

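| First name | Last name | email address        | Room number |
|------------+-----------+----------------------+-------------|
| Jane       | Doe       | jane-doe@gmail.com   |           1 |
| John       | Doe       | john-doe@gmail.com   |           2 |
| Jimmy      | John      | jimmy-john@gmail.com |           3 |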

We pass that table into an org-mode source block as a variable called data, which will be a list of lists, one for each row of the table. You could alternatively read these from an excel spreadsheet, a csv file, or some kind of python data structure.
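If you are not using org-mode, the csv route might look like the sketch below (Python 3 syntax). The CSV text is a hypothetical stand-in for a file on disk, read through io.StringIO so the example is self-contained. Note the conversion of the room number to an integer, which a template field formatted with :d needs.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from an opened file.
csv_text = """First name,Last name,email address,Room number
Jane,Doe,jane-doe@gmail.com,1
John,Doe,john-doe@gmail.com,2
Jimmy,John,jimmy-john@gmail.com,3
"""

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row

# Build the same list of lists, converting the room number to an int.
data = [[first, last, email, int(room)]
        for first, last, email, room in reader]

print(data)
```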

We do not actually send the emails in this example. To do that you need to have access to a mail server, which could be on your own machine, or it could be a relay server you have access to.

We create a string that is a template with some fields to be substituted, e.g. the firstname and room number in this case. Then we loop through each row of the table, and format the template with those values, and create an email message to the person. First we print each message to check that they are correct.

import smtplib
from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText
from email.Utils import  formatdate

template = '''
Dear {firstname:s},

I am pleased to inform you that your talk will be in room {roomnumber:d}.

Sincerely,
John
'''

for firstname, lastname, emailaddress, roomnumber in data:
    msg = MIMEMultipart()
    msg['From'] = "youremail@gmail.com"
    msg['To'] = emailaddress
    msg['Date'] = formatdate(localtime=True)

    msgtext = template.format(**locals())
    print msgtext

    msg.attach(MIMEText(msgtext))

    ## Uncomment these lines and fix 
    #server = smtplib.SMTP('your.relay.server.edu')
    #server.sendmail('your_email@gmail.com', # from
    #                emailaddress,
    #                msg.as_string())
    #server.quit()

    print msg.as_string()
    print '------------------------------------------------------------------'
Dear Jane,

I am pleased to inform you that your talk will be in room 1.

Sincerely,
John

Content-Type: multipart/mixed; boundary="===============1191311863=="
MIME-Version: 1.0
From: youremail@gmail.com
To: jane-doe@gmail.com
Date: Tue, 16 Apr 2013 16:10:23 -0400

--===============1191311863==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit


Dear Jane,

I am pleased to inform you that your talk will be in room 1.

Sincerely,
John

--===============1191311863==--
------------------------------------------------------------------

Dear John,

I am pleased to inform you that your talk will be in room 2.

Sincerely,
John

Content-Type: multipart/mixed; boundary="===============1713881863=="
MIME-Version: 1.0
From: youremail@gmail.com
To: john-doe@gmail.com
Date: Tue, 16 Apr 2013 16:10:23 -0400

--===============1713881863==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit


Dear John,

I am pleased to inform you that your talk will be in room 2.

Sincerely,
John

--===============1713881863==--
------------------------------------------------------------------

Dear Jimmy,

I am pleased to inform you that your talk will be in room 3.

Sincerely,
John

Content-Type: multipart/mixed; boundary="===============0696685580=="
MIME-Version: 1.0
From: youremail@gmail.com
To: jimmy-john@gmail.com
Date: Tue, 16 Apr 2013 16:10:23 -0400

--===============0696685580==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit


Dear Jimmy,

I am pleased to inform you that your talk will be in room 3.

Sincerely,
John

--===============0696685580==--
------------------------------------------------------------------
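The import lines above are the Python 2 spellings. In Python 3 the same classes live in email.mime.multipart, email.mime.text and email.utils. Here is a sketch of the message-building step under that assumption. It uses a one-row data list of my own, passes the template fields explicitly instead of through locals(), and does not send anything.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formatdate

template = '''
Dear {firstname:s},

I am pleased to inform you that your talk will be in room {roomnumber:d}.

Sincerely,
John
'''

# One illustrative row; the real data would be the full table.
data = [['Jane', 'Doe', 'jane-doe@gmail.com', 1]]

messages = []
for firstname, lastname, emailaddress, roomnumber in data:
    msg = MIMEMultipart()
    msg['From'] = 'youremail@gmail.com'
    msg['To'] = emailaddress
    msg['Date'] = formatdate(localtime=True)

    # Name the template fields explicitly rather than relying on locals().
    msgtext = template.format(firstname=firstname, roomnumber=roomnumber)
    msg.attach(MIMEText(msgtext))
    messages.append(msg)

print(messages[0].as_string())
```

Collecting the messages in a list first makes it easy to inspect all of them before any sending code is enabled.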


Working with lists

| categories: programming | tags:

It is common to have a list of data and to want to apply a function to every element, or to filter the list to extract the elements that meet some criteria. In this example, we take a string and split it into words. Then, we will examine several ways to apply functions to the words and to filter the list to get data that meets some criteria. Here is the string splitting.

text = '''
 As we have seen, handling units with third party functions is fragile, and often requires additional code to wrap the function to handle the units. An alternative approach that avoids the wrapping is to rescale the equations so they are dimensionless. Then, we should be able to use all the standard external functions without modification. We obtain the final solutions by rescaling back to the answers we want.

Before doing the examples, let us consider how the quantities package handles dimensionless numbers.

import quantities as u

a = 5 * u.m
L = 10 * u.m # characteristic length

print a/L
print type(a/L)

'''

words = text.split()
print words
['As', 'we', 'have', 'seen,', 'handling', 'units', 'with', 'third', 'party', 'functions', 'is', 'fragile,', 'and', 'often', 'requires', 'additional', 'code', 'to', 'wrap', 'the', 'function', 'to', 'handle', 'the', 'units.', 'An', 'alternative', 'approach', 'that', 'avoids', 'the', 'wrapping', 'is', 'to', 'rescale', 'the', 'equations', 'so', 'they', 'are', 'dimensionless.', 'Then,', 'we', 'should', 'be', 'able', 'to', 'use', 'all', 'the', 'standard', 'external', 'functions', 'without', 'modification.', 'We', 'obtain', 'the', 'final', 'solutions', 'by', 'rescaling', 'back', 'to', 'the', 'answers', 'we', 'want.', 'Before', 'doing', 'the', 'examples,', 'let', 'us', 'consider', 'how', 'the', 'quantities', 'package', 'handles', 'dimensionless', 'numbers.', 'import', 'quantities', 'as', 'u', 'a', '=', '5', '*', 'u.m', 'L', '=', '10', '*', 'u.m', '#', 'characteristic', 'length', 'print', 'a/L', 'print', 'type(a/L)']

Let us get the length of each word.

print [len(word) for word in words]

# functional approach with a lambda function
print map(lambda word: len(word), words)

# functional approach with a builtin function
print map(len, words)

# functional approach with a user-defined function
def get_length(word):
    return len(word)

print map(get_length, words)
[2, 2, 4, 5, 8, 5, 4, 5, 5, 9, 2, 8, 3, 5, 8, 10, 4, 2, 4, 3, 8, 2, 6, 3, 6, 2, 11, 8, 4, 6, 3, 8, 2, 2, 7, 3, 9, 2, 4, 3, 14, 5, 2, 6, 2, 4, 2, 3, 3, 3, 8, 8, 9, 7, 13, 2, 6, 3, 5, 9, 2, 9, 4, 2, 3, 7, 2, 5, 6, 5, 3, 9, 3, 2, 8, 3, 3, 10, 7, 7, 13, 8, 6, 10, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 3, 1, 14, 6, 5, 3, 5, 9]
>>> ... [2, 2, 4, 5, 8, 5, 4, 5, 5, 9, 2, 8, 3, 5, 8, 10, 4, 2, 4, 3, 8, 2, 6, 3, 6, 2, 11, 8, 4, 6, 3, 8, 2, 2, 7, 3, 9, 2, 4, 3, 14, 5, 2, 6, 2, 4, 2, 3, 3, 3, 8, 8, 9, 7, 13, 2, 6, 3, 5, 9, 2, 9, 4, 2, 3, 7, 2, 5, 6, 5, 3, 9, 3, 2, 8, 3, 3, 10, 7, 7, 13, 8, 6, 10, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 3, 1, 14, 6, 5, 3, 5, 9]
>>> [2, 2, 4, 5, 8, 5, 4, 5, 5, 9, 2, 8, 3, 5, 8, 10, 4, 2, 4, 3, 8, 2, 6, 3, 6, 2, 11, 8, 4, 6, 3, 8, 2, 2, 7, 3, 9, 2, 4, 3, 14, 5, 2, 6, 2, 4, 2, 3, 3, 3, 8, 8, 9, 7, 13, 2, 6, 3, 5, 9, 2, 9, 4, 2, 3, 7, 2, 5, 6, 5, 3, 9, 3, 2, 8, 3, 3, 10, 7, 7, 13, 8, 6, 10, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 3, 1, 14, 6, 5, 3, 5, 9]
>>> ... ... ... >>> [2, 2, 4, 5, 8, 5, 4, 5, 5, 9, 2, 8, 3, 5, 8, 10, 4, 2, 4, 3, 8, 2, 6, 3, 6, 2, 11, 8, 4, 6, 3, 8, 2, 2, 7, 3, 9, 2, 4, 3, 14, 5, 2, 6, 2, 4, 2, 3, 3, 3, 8, 8, 9, 7, 13, 2, 6, 3, 5, 9, 2, 9, 4, 2, 3, 7, 2, 5, 6, 5, 3, 9, 3, 2, 8, 3, 3, 10, 7, 7, 13, 8, 6, 10, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 3, 1, 14, 6, 5, 3, 5, 9]
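One caveat if you run these on Python 3: map no longer returns a list there. It returns a lazy iterator, so you need to wrap it in list to see the values. A short sketch on the first few words of the text (the names lengths and as_list are illustrative):

```python
words = 'As we have seen, handling units'.split()

lengths = map(len, words)
print(lengths)             # a map object, not a list

# Wrap the iterator in list to materialize the values.
as_list = list(map(len, words))
print(as_list)
```

The list comprehension form behaves the same way in both Python versions.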

Now let us get all the words that start with the letter “a”. This is sometimes called filtering a list. We use the string method startswith, checking for both the upper- and lower-case letter, in a list comprehension with a condition.

print [word for word in words if word.startswith('a') or word.startswith('A')]

# make word lowercase to simplify the conditional statement
print [word for word in words if word.lower().startswith('a')]
['As', 'and', 'additional', 'An', 'alternative', 'approach', 'avoids', 'are', 'able', 'all', 'answers', 'as', 'a', 'a/L']
['As', 'and', 'additional', 'An', 'alternative', 'approach', 'avoids', 'are', 'able', 'all', 'answers', 'as', 'a', 'a/L']
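A third option: startswith also accepts a tuple of prefixes, which avoids both the or and the call to lower. A small sketch in Python 3 syntax on a shortened word list of my own:

```python
words = ['As', 'we', 'and', 'units', 'a/L']

# startswith returns True if the word starts with any prefix in the tuple.
a_words = [word for word in words if word.startswith(('a', 'A'))]
print(a_words)
```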

A slightly harder example is to find all the words that are actually numbers. We could use a regular expression for that, but we will instead use a function we create. We use a function that tries to cast a word as a float. If this fails, we know the word is not a float, so we return False.

def float_p(word):
    try:
        float(word)
        return True
    except ValueError:
        return False

print [word for word in words if float_p(word)]

# here is a functional approach
print filter(float_p, words)
['5', '10']
['5', '10']
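For comparison, here is roughly what the regular expression route could look like (Python 3 syntax). The pattern is my own assumption about what counts as a number: an optional sign, digits with an optional decimal part, and an optional exponent. It is narrower than float, which also accepts strings such as 'nan' and 'inf'.

```python
import re

# Assumed pattern: optional sign, digits/decimal point, optional exponent.
number = re.compile(r'[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?')

words = ['a', '=', '5', '*', 'u.m', 'L', '=', '10', '1e-3', '.', 'nan']

# fullmatch requires the whole word to match, not just a prefix.
numeric = [w for w in words if number.fullmatch(w)]
print(numeric)
```

The try/except version is shorter and matches exactly what Python itself will convert, which is a reasonable argument for preferring it here.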

Finally, we consider filtering the list to find all words that contain certain symbols, say any character in this string “./=*#”. Any of those characters will do, so we check each symbol against the word, returning True as soon as one is found and False if the word contains none of them.

def punctuation_p(word):
    S = './=*#'
    for s in S:
        if s in word:
            return True
    return False

print [word for word in words if punctuation_p(word)]
print filter(punctuation_p, words)
['units.', 'dimensionless.', 'modification.', 'want.', 'numbers.', '=', '*', 'u.m', '=', '*', 'u.m', '#', 'a/L', 'type(a/L)']
['units.', 'dimensionless.', 'modification.', 'want.', 'numbers.', '=', '*', 'u.m', '=', '*', 'u.m', '#', 'a/L', 'type(a/L)']
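The loop in punctuation_p can also be written with the builtin any and a generator expression. A sketch in Python 3 syntax; the symbols keyword argument and the short word list are my additions:

```python
def punctuation_p(word, symbols='./=*#'):
    # any stops at the first symbol found in the word.
    return any(s in word for s in symbols)

words = ['units.', 'we', 'u.m', '=', 'length', 'a/L']
found = [w for w in words if punctuation_p(w)]
print(found)
```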

In this section we examined a few ways to interact with lists using list comprehension and functional programming. These approaches make it possible to work on arbitrary size lists, without needing to know in advance how big the lists are. New lists are automatically generated as results, without the need to preallocate lists, i.e. you do not need to know the size of the output. This can be handy as it avoids needing to write loops in some cases and leads to more compact code.
