The and-or trick in python

| categories: programming, logic | tags:

The boolean operators and and or return values in python. Let us first briefly review what these operators do through examples. Their typical usage is in conditional statements. First, we look at which values evaluate to "True" or "False" in python. Anything that is "empty" usually evaluates to False, along with the integer 0, None, and the boolean value False.

for value in ('', 0, None, [], (), {}, False):
    if value:
        print value, "True"
    else:
        print value, "False"
 False
0 False
None False
[] False
() False
{} False
False False

Objects that are not empty evaluate to "True", along with numbers not equal to 0, and the boolean value True.

for value in (' ', 2, 1, "a", [1], (3,), True):
    if value:
        print value, "True"
    else:
        print value, "False"
  True
2 True
1 True
a True
[1] True
(3,) True
True True
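The same truth values can be inspected directly with the built-in bool function, without an if statement. This is a minimal sketch, written with print() so that it runs on Python 3; the list names falsy and truthy are only for illustration.

```python
# bool() reports the same truth value that an if statement would use.
falsy = ['', 0, None, [], (), {}, False]
truthy = [' ', 2, 1, 'a', [1], (3,), True]

falsy_results = [bool(v) for v in falsy]
truthy_results = [bool(v) for v in truthy]

print(falsy_results)   # every entry is False
print(truthy_results)  # every entry is True
```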

The and and or operators combine the truth values of two objects. and evaluates to "True" if both objects evaluate to "True", and or evaluates to "True" if either object evaluates to "True". Here are some examples.

a = None
b = 5

if a and b:
    print True
else:
    print False
False
a = None
b = 5

if a or b:
    print True
else:
    print False
True

Now the interesting part. The and and or operators actually return values! With the and operator, the arguments are evaluated from left to right. If they all evaluate to True, the last argument is returned. Otherwise the first argument that evaluates to False is returned, and the arguments after it are not evaluated.

a = 1
b = 5
print a and b
print b and a
print a and False
print a and True
print a and None
print False and a
print None and a
print True and 'a' and 0 and True # first False item is zero
5
1
False
True
None
False
None
0

The or operator returns the first True value or the last value if nothing is True. Note that if a True value is found, the values in the expressions after that value are not evaluated.

print 2 or False
print 0 or False
print 0 or False or 4 or {}
2
False
4
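The claim that later operands are not evaluated can be checked with a small helper that records each evaluation. This is only a sketch; the helper check and the list calls are hypothetical names introduced for the illustration, and print() is used so the code runs on Python 3.

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value

# or stops at the first operand that evaluates to True
result_or = check('first', 2) or check('second', 4)
or_calls = list(calls)

# and stops at the first operand that evaluates to False
del calls[:]
result_and = check('first', 0) and check('second', 4)
and_calls = list(calls)

print(result_or, or_calls)
print(result_and, and_calls)
```

In both cases only the first operand is recorded, because the result is already decided after it.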

One place you might see this used is in setting variables depending on which command-line arguments were passed to a script. For example:

import sys

# replace this:
if 'plot' in sys.argv:
    PLOT = True
else:
    PLOT = False

# with this
PLOT = 'plot' in sys.argv or False

# later in your code:
if PLOT:
    # do something to make a plot
    pass
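A small aside on the one-line form: the in operator already returns True or False, so the trailing or False does not change the result. The or operator is arguably more useful for supplying a default when a value may be empty. The sketch below assumes a hypothetical argument list and a hypothetical title setting, and uses print() so it runs on Python 3.

```python
def plot_flag(argv):
    # the membership test alone already gives a boolean
    return 'plot' in argv

def plot_flag_with_or(argv):
    # same result; "or False" is redundant here
    return 'plot' in argv or False

def title_or_default(title):
    # or returns the first operand unless it evaluates to False
    return title or 'untitled'

with_plot = ['script.py', 'plot']      # hypothetical command lines
without_plot = ['script.py']

print(plot_flag(with_plot), plot_flag_with_or(with_plot))
print(plot_flag(without_plot), plot_flag_with_or(without_plot))
print(title_or_default(''), title_or_default('Figure 1'))
```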

Now we get to the and-or trick. The trick is to assign a variable one value if some boolean value is True, and another value if the expression is False.

a = True
b = True

if a and b:
    c = "value1"
else: 
    c = "value2"

print c
value1

We can replace the if/else code above with this one line expression:

a = True
b = True

c = (a and b) and "value1" or "value2"
print c
value1

There is a problem. If the value you want returned when the condition is True itself evaluates to False, you will not get what you expect:

a = True
b = True

c = (a and b) and None or "value2"
print c
value2

In this case, (a and b) evaluates to True, so we would expect the value of c to be the first value. However, None evaluates to False, so the or operator returns the first "True" value, which is the second value. We have to modify the code so that both arguments of or evaluate to True. We do this by putting each argument inside a list, which will then always evaluate to True because the list is not empty. This will assign the first list to c if the expression is True, and the second list if it is False. We wrap the whole thing in parentheses, and then index the returned list to get the contents of the list.

a = True
b = True

c = ((a and b) and [None] or ["value2"])[0]

print c
None
a = True
b = True

c = (not (a and b) and [None] or ["value2"])[0]

print c
value2
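To see where the bare form breaks and the list-wrapped form does not, the two can be compared over several candidate values. This is a sketch only: the function names bare_and_or and wrapped_and_or are made up for the comparison, and wrapping the trick in a function means both values are evaluated before the call, unlike the inline expression. It uses print() so it runs on Python 3.

```python
def bare_and_or(condition, if_true, if_false):
    # fails whenever if_true itself evaluates to False
    return condition and if_true or if_false

def wrapped_and_or(condition, if_true, if_false):
    # a one-element list always evaluates to True, so or never skips it
    return (condition and [if_true] or [if_false])[0]

for candidate in ('value1', None, 0, '', []):
    print(repr(candidate),
          repr(bare_and_or(True, candidate, 'value2')),
          repr(wrapped_and_or(True, candidate, 'value2')))
```

Only the first row agrees between the two forms; for every candidate that evaluates to False the bare form falls through to "value2".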

This is definitely a trick. I find the syntax difficult to read, especially compared to the more verbose if/else statements. It is interesting though, and there might be places where the return value of the boolean operators might be useful, now that you know you can get them.

Here is a tough example of using this to update a dictionary entry. Previously we used a dictionary to count unique entries in a list.

d = {}

d['key'] = (d.get('key', None) and [d['key'] + 1] or [1])[0]

print d

d['key'] = (d.get('key', None) and [d['key'] + 1] or [1])[0]
print d
{'key': 1}
{'key': 2}

This works because the .get function on a dictionary returns None if the key does not exist, resulting in assigning the value of 1 to that key, or it returns something that evaluates to True if the key does exist, so the key gets incremented.

Let us see this trick in action. Before we used if/else statements to achieve our goal, checking if the key is in the dictionary and incrementing by one if it is, and if not, setting the key to 1.

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']

# old method
d = {}
for el in L:
    if el in d:
        d[el] += 1
    else:
        d[el] = 1

print d
{'a': 3, 'b': 2, 'e': 2, 'd': 1}

Here is the new method:

# new method:
L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']
d = {}
for el in L:
    d[el] = (d.get(el, None) and [d[el] + 1] or [1])[0]
print d
{'a': 3, 'b': 2, 'e': 2, 'd': 1}

We have replaced the if/else statement with a single (admittedly hard to read) line! I have thought for a long time that this should be possible. I am somewhat disappointed that it is not easier to read though.
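For comparison, the same count can also be written without the and-or trick by giving .get a default of 0 instead of None. This is an alternative sketch, not the method used above, and it uses print() so it runs on Python 3.

```python
L = ['a', 'a', 'b', 'd', 'e', 'b', 'e', 'a']

d = {}
for el in L:
    # .get returns 0 for a key that is not yet in the dictionary
    d[el] = d.get(el, 0) + 1

print(d)
```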

Update 7/8/2013

1 Using more modern python syntax than the and-or trick

@Mark_ pointed out in a comment the more modern syntax in python is "value1" if a else "value2". Here is how it works.

a = True
c = "value1" if a else "value2"
print c
value1
a = ''
c = "value1" if a else "value2"
print c
value2

This is indeed very clean to read. This leads to a cleaner and easier to read implementation of the counting code.

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']
d = {}
for el in L:
    d[el] = (d[el] + 1) if (el in d) else 1
print d
{'a': 3, 'b': 2, 'e': 2, 'd': 1}

See the next section for an even cleaner implementation.

2 Using defaultdict

@Mark_ also suggested the use of defaultdict for the counting code. That is pretty concise! It is not obvious that the default value is equal to zero, but int() returns zero. This is much better than the and-or trick.

from collections import defaultdict
print int()

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']
d = defaultdict(int)
for el in L:
    d[el] += 1
print d
0
defaultdict(<type 'int'>, {'a': 3, 'b': 2, 'e': 2, 'd': 1})
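The standard library also provides collections.Counter, which is built for exactly this kind of counting. It was not part of the suggestion above; this is just a short sketch of how it could be used on the same list, with print() so it runs on Python 3.

```python
from collections import Counter

L = ['a', 'a', 'b', 'd', 'e', 'b', 'e', 'a']

counts = Counter(L)           # counts every element in one call
print(counts)
print(counts['a'])            # count for one key
print(counts['z'])            # missing keys count as zero
print(counts.most_common(1))  # the most frequent element and its count
```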

Copyright (C) 2013 by John Kitchin. See the License for information about copying.


Memoizing instance methods in a class

| categories: programming | tags:

Suppose you have a module that you import a class from, and the class defines some methods that you want to memoize. You do not want to modify the source code, maybe because it is not your code, or because you do not want to maintain it, etc… Here is one way to modify the class functions at runtime. We will use the memoize decorator and replace the class function definition with the wrapped function that caches the results. We also allow arbitrary arguments and keyword arguments. A subtle wrinkle here is that you cannot use a dictionary as a key to a dictionary because dictionaries are not hashable. We use the pickle module to create a string that should uniquely represent the args and keyword args, and we use that string as the key.

class Calculator:
    def __init__(self):
        pass

    def calculate(self, a):
        'returns the answer to everything'
        return 42

    def method_2(self, *args, **kwargs):
        return (args, kwargs)


import pickle

from functools import wraps
def memoize(func):
    cache = {}
    @wraps(func)
    def wrap(*args,**kwargs):
        key = pickle.dumps((args, kwargs))
        if key not in cache:
            print 'Running func with ', args
            cache[key] = func(*args, **kwargs)
        else:
            print 'result in cache'
        return cache[key]
    return wrap

# now monkey patch/decorate the class function
Calculator.calculate = memoize(Calculator.calculate)
Calculator.method_2 = memoize(Calculator.method_2)

calc = Calculator()
print calc.calculate(3)
print calc.calculate(3)
print calc.calculate(4)
print calc.calculate(3)


print calc.method_2()
print calc.method_2()

print calc.method_2(1,2)
print calc.method_2(1,2)

print calc.method_2(1,2,a=5)
print calc.method_2(1,2,a=5)
Running func with  (<__main__.Calculator instance at 0x0000000001E9B3C8>, 3)
42
result in cache
42
Running func with  (<__main__.Calculator instance at 0x0000000001E9B3C8>, 4)
42
result in cache
42
Running func with  (<__main__.Calculator instance at 0x0000000001E9B3C8>,)
((), {})
result in cache
((), {})
Running func with  (<__main__.Calculator instance at 0x0000000001E9B3C8>, 1, 2)
((1, 2), {})
result in cache
((1, 2), {})
Running func with  (<__main__.Calculator instance at 0x0000000001E9B3C8>, 1, 2)
((1, 2), {'a': 5})
result in cache
((1, 2), {'a': 5})

This particular memoize decorator is not persistent; the data is only stored in memory. You would have to write the data out to a file and reread the file to make it persistent.

It is not obvious this practice is good; you have in essence changed the behavior of the original function in a way that may be hard to debug, and could conceivably be incompatible with the documentation of the function.
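If every argument is hashable, one alternative to the pickle-based key is a tuple made from the positional arguments and the sorted keyword items. The sketch below is a variation on the decorator above, not a replacement for it: it assumes hashable arguments, and the function add and the counter call_count are hypothetical names used only to show that keyword order does not create a second cache entry. It uses print() so it runs on Python 3.

```python
from functools import wraps

def memoize(func):
    cache = {}
    @wraps(func)
    def wrap(*args, **kwargs):
        # sorting the keyword items makes the key independent of keyword order;
        # this only works when all argument values are hashable
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return wrap

call_count = [0]  # counts how often the wrapped function really runs

@memoize
def add(a, b=0, c=0):
    call_count[0] += 1
    return a + b + c

first = add(1, b=2, c=3)
second = add(1, c=3, b=2)  # same call, keywords in another order

print(first, second, call_count[0])
```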

An alternative approach is writing another function that wraps the code you want, and memoize that function.

class Calculator:
    def __init__(self):
        pass

    def calculate(self, a):
        'returns the answer to everything'
        return 42



from functools import wraps
def memoize(func):
    cache = {}
    @wraps(func)
    def wrap(*args):
        if args not in cache:
            print 'Running func with ', args
            cache[args] = func(*args)
        else:
            print 'result in cache'
        return cache[args]
    return wrap

calc = Calculator()

@memoize
def my_calculate(a):
    return calc.calculate(a)

print my_calculate(3)
print my_calculate(3)
print my_calculate(4)
print my_calculate(3)
Running func with  (3,)
42
result in cache
42
Running func with  (4,)
42
result in cache
42

It is debatable whether this is cleaner. One argument for this is that it does not monkey with the original code at all.
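On Python 3.2 and later, the standard library offers functools.lru_cache, which can stand in for the hand-written decorator when the arguments are hashable. This is a sketch under that assumption; it is not available in the Python 2 used elsewhere in this post.

```python
from functools import lru_cache

class Calculator:
    def calculate(self, a):
        'returns the answer to everything'
        return 42

calc = Calculator()

@lru_cache(maxsize=None)  # maxsize=None means the cache never evicts entries
def my_calculate(a):
    return calc.calculate(a)

results = [my_calculate(3), my_calculate(3), my_calculate(4), my_calculate(3)]
info = my_calculate.cache_info()

print(results)
print(info.hits, info.misses)
```

The cache_info method reports hits and misses, which takes the place of the print statements in the hand-written version.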


Memoizing expensive functions in python and saving results

| categories: programming | tags:

Sometimes a function is expensive (time-consuming) to run, and you would like to save the result of every call so that you never have to rerun it with the same arguments. This is called memoization. A wrinkle on this problem is to save the results in a file so that later you can come back to a function and not have to run simulations over again.

In python, a good way to do this is to "decorate" your function. This way, you write the function to do what you want, and then "decorate" it. The decoration wraps your function and in this case checks if the arguments you passed to the function are already stored in the cache. If so, it returns the stored result; if not, it runs the function and stores the result. The memoize decorator below was adapted from here.

from functools import wraps
def memoize(func):
    cache = {}
    @wraps(func)
    def wrap(*args):
        if args not in cache:
            print 'Running func'
            cache[args] = func(*args)
        else:
            print 'result in cache'
        return cache[args]
    return wrap

@memoize
def myfunc(a):
    return a**2

print myfunc(2)
print myfunc(2)

print myfunc(3)
print myfunc(2)
Running func
4
result in cache
4
Running func
9
result in cache
4

The example above shows the principle, but each time you run that script you start from scratch. If those were expensive calculations that would not be desirable. Let us now write out the cache to a file. We use a simple pickle file to store the results.

import os, pickle
from functools import wraps
def memoize(func):
    if os.path.exists('memoize.pkl'):
        print 'reading cache file'
        with open('memoize.pkl', 'rb') as f:
            cache = pickle.load(f)
    else:
        cache = {}
    @wraps(func)
    def wrap(*args):
        if args not in cache:
            print 'Running func'
            cache[args] = func(*args)
            # update the cache file
            with open('memoize.pkl', 'wb') as f:
                pickle.dump(cache, f)
        else:
            print 'result in cache'
        return cache[args]
    return wrap

@memoize
def myfunc(a):
    return a**2


print myfunc(2)
print myfunc(2)

print myfunc(3)
print myfunc(2)
reading cache file
result in cache
4
result in cache
4
result in cache
9
result in cache
4

Now you can see that if we run this script a few times, the results are read from the cache file.
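One limitation of the version above is that every decorated function reads and writes the same memoize.pkl, so two functions called with the same arguments could pick up each other's results. A possible variation is to pass the cache file name to the decorator. The sketch below assumes a hypothetical decorator factory named memoize_to_file and writes its files to a temporary directory; it uses print() so it runs on Python 3.

```python
import os
import pickle
import tempfile
from functools import wraps

def memoize_to_file(path):
    """Return a memoize decorator that stores its cache in the given file."""
    def decorator(func):
        if os.path.exists(path):
            with open(path, 'rb') as f:
                cache = pickle.load(f)
        else:
            cache = {}
        @wraps(func)
        def wrap(*args):
            if args not in cache:
                cache[args] = func(*args)
                # update this function's own cache file
                with open(path, 'wb') as f:
                    pickle.dump(cache, f)
            return cache[args]
        return wrap
    return decorator

cache_dir = tempfile.mkdtemp()  # hypothetical location for the cache files
square_path = os.path.join(cache_dir, 'square.pkl')
cube_path = os.path.join(cache_dir, 'cube.pkl')

@memoize_to_file(square_path)
def square(a):
    return a**2

@memoize_to_file(cube_path)
def cube(a):
    return a**3

print(square(2), cube(2))
```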


Using an external solver with Aspen

| categories: programming | tags: aspen

One reason to interact with Aspen via python is to use external solvers to drive the simulations. Aspen has some built-in solvers, but it does not have everything. You may also want to integrate additional calculations, e.g. capital costs, water usage, etc… and integrate those results into a report.

Here is a simple example where we use fsolve to find the temperature of the flash tank that will give a vapor phase mole fraction of ethanol of 0.8. It is a simple example, but it illustrates the possibility.

import os
import win32com.client as win32
aspen = win32.Dispatch('Apwn.Document')

aspen.InitFromArchive2(os.path.abspath(r'data\Flash_Example.bkp'))  # raw string keeps the backslash literal

from scipy.optimize import fsolve

def func(flashT):
    flashT = float(flashT) # COM objects do not understand numpy types
    # raw strings keep the backslashes in the Aspen node paths literal
    aspen.Tree.FindNode(r'\Data\Blocks\FLASH\Input\TEMP').Value = flashT
    aspen.Engine.Run2()
    y = aspen.Tree.FindNode(r'\Data\Streams\VAPOR\Output\MOLEFRAC\MIXED\ETHANOL').Value
    return y - 0.8

sol, = fsolve(func, 150.0)
print 'A flash temperature of {0:1.2f} degF will have y_ethanol = 0.8'.format(sol)
A flash temperature of 157.38 degF will have y_ethanol = 0.8

One unexpected detail was that the Aspen COM objects cannot be assigned numpy number types, so it was necessary to recast the argument as a float. Otherwise, this worked about as expected for an fsolve problem.


Automatic, temporary directory changing

| categories: programming | tags:

If you are doing some analysis that requires you to change directories, e.g. to read a file, and then change back to another directory to read another file, you have probably run into problems if there is an error somewhere. You would like to make sure that the code changes back to the original directory after each error. We will look at a few ways to accomplish that here.

The try/except/finally method is the traditional way to handle exceptions, and make sure that some code "finally" runs. Let us look at two examples here. In the first example, we try to change into a directory that does not exist.

import os, sys

CWD = os.getcwd() # store initial position
print 'initially inside {0}'.format(os.getcwd())
TEMPDIR = 'data/run1' # this does not exist

try:
    os.chdir(TEMPDIR)
    print 'inside {0}'.format(os.getcwd())
except:
    print 'Exception caught: ',sys.exc_info()[0]
finally:
    print 'Running final code'
    os.chdir(CWD)
    print 'finally inside {0}'.format(os.getcwd())
initially inside c:\users\jkitchin\Dropbox\pycse
Exception caught:  <type 'exceptions.WindowsError'>
Running final code
finally inside c:\users\jkitchin\Dropbox\pycse

Now, let us look at an example where the directory does exist. We will change into the directory, run some code, and then raise an Exception.

import os, sys

CWD = os.getcwd() # store initial position
print 'initially inside {0}'.format(os.getcwd())
TEMPDIR = 'data'

try:
    os.chdir(TEMPDIR)
    print 'inside {0}'.format(os.getcwd())
    print os.listdir('.')
    raise Exception('boom')
except:
    print 'Exception caught: ',sys.exc_info()[0]
finally:
    print 'Running final code'
    os.chdir(CWD)
    print 'finally inside {0}'.format(os.getcwd())
initially inside c:\users\jkitchin\Dropbox\pycse
inside c:\users\jkitchin\Dropbox\pycse\data
['antoine_data.dat', 'antoine_database.mat', 'commonshellsettings.xml', 'cstr-zeroth-order.xlsx', 'debug-2.txt', 'debug-3.txt', 'debug-4.txt', 'debug.txt', 'example.xlsx', 'example2.xls', 'example3.xls', 'example4.xls', 'example4.xlsx', 'Flash_Example.apw', 'Flash_Example.bkp', 'Flash_Example.def', 'gc-data-21.txt', 'PT.txt', 'raman.txt', 'testdata.txt']
Exception caught:  <type 'exceptions.Exception'>
Running final code
finally inside c:\users\jkitchin\Dropbox\pycse

You can see that we changed into the directory, ran some code, and then caught an exception. Afterwards, we changed back to our original directory. This code works fine, but it is somewhat verbose, and tedious to write over and over.

We can get a cleaner syntax with a context manager, which is used with the with keyword in python. In a context manager, some code is executed on entering the "context", and other code is run on exiting it. We can use that to automatically change directory and, when done, change back to the original directory. We use the contextlib.contextmanager decorator on a function. In that function, the code up to the yield statement is run on entering the context, and the code after the yield statement is run on exiting. We wrap the yield statement in a try/except/finally block to make sure our final code gets run.

import contextlib
import os, sys

@contextlib.contextmanager
def cd(path):
    print 'initially inside {0}'.format(os.getcwd())
    CWD = os.getcwd()
    
    os.chdir(path)
    print 'inside {0}'.format(os.getcwd())
    try:
        yield
    except:
        print 'Exception caught: ',sys.exc_info()[0]
    finally:
        os.chdir(CWD)  # change back first so the message reports the restored directory
        print 'finally inside {0}'.format(os.getcwd())

# Now we use the context manager
with cd('data'):
    print os.listdir('.')
    raise Exception('boom')

print
with cd('data/run2'):
    print os.listdir('.')

One case that is not handled well with this code is if the directory you want to change into does not exist. In that case an exception is raised on entering the context, when you try to change into a directory that does not exist. An alternative class based context manager can be found here.
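A class based context manager for this task could look roughly like the sketch below. It is not the one linked above; the class is written here only as an illustration. Unlike the function version, it lets exceptions from the body propagate instead of printing them, and if the directory does not exist the error is raised before the working directory has changed, so there is nothing to undo. The temporary directory is a stand-in for a real data directory, and print() is used so the code runs on Python 3.

```python
import os
import tempfile

class cd(object):
    """Change into path on entry and back to the original directory on exit."""
    def __init__(self, path):
        self.path = path
        self.original = None

    def __enter__(self):
        self.original = os.getcwd()
        os.chdir(self.path)  # raises OSError here if the path does not exist
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        os.chdir(self.original)
        return False  # do not swallow exceptions raised in the body

start = os.getcwd()
target = tempfile.mkdtemp()  # stand-in for a real data directory

try:
    with cd(target):
        inside = os.getcwd()
        raise Exception('boom')
except Exception as err:
    caught = str(err)

after = os.getcwd()
print(inside)
print(caught)
print(after == start)
```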
