Programming

Some of this, sum of that

Matlab plot

Python provides a sum function to compute the sum of a list. However, the sum function does not work on every arrangement of numbers, and it certainly does not work on nested lists. We will solve this problem with recursion.

Here is a simple example.

v = [1, 2, 3, 4, 5, 6, 7, 8, 9] # a list
print(sum(v))

v = (1, 2, 3, 4, 5, 6, 7, 8, 9)  # a tuple
print(sum(v))
45
45

If you have data in a dictionary, sum works by default on the keys. You can give the sum function the values like this.

v = {'a':1, 'b':3, 'c':4}
print(sum(v.values()))
8

Nested lists

Suppose now we have nested lists. This kind of structured data might come up if you had grouped several things together. For example, suppose we have 5 departments, with 1, 5, 15, 7 and 17 people in them, and in each department they are divided into groups.

Department 1: 1 person Department 2: group of 2 and group of 3 Department 3: group of 4 and 11, with a subgroups of 5 and 6 making up the group of 11. Department 4: 7 people Department 5: one group of 8 and one group of 9.

We might represent the data like this nested list. Now, if we want to compute the total number of people, we need to add up each group. We cannot simply sum the list, because some elements are single numbers, and others are lists, or lists of lists. We need to recurse through each entry until we get down to a number, which we can add to the running sum.

v = [1,
    [2, 3],
    [4, [5, 6]],
    7,
    [8,9]]

def recursive_sum(X):
    'compute sum of arbitrarily nested lists'
    s = 0 # initial value of the sum

    for i in range(len(X)):
        import types  # we use this to test if we got a number
        if isinstance(X[i], (int, float, complex)):
            # this is the terminal step
            s += X[i]
        else:
            # we did not get a number, so we recurse
            s += recursive_sum(X[i])
    return s

print(recursive_sum(v))
print(recursive_sum([1, 2, 3, 4, 5, 6, 7, 8, 9])) # test on non-nested list
45
45

In Post 1970 we examined recursive functions that could be replaced by loops. Here we examine a function that can only work with recursion because the nature of the nested data structure is arbitrary. There are arbitrary branches and depth in the data structure. Recursion is nice because you do not have to define that structure in advance.

Sorting in python

Matlab post

Occasionally it is important to have sorted data. Python has a few sorting options.

a = [4, 5, 1, 6, 8, 3, 2]
print(a)
a.sort()  # inplace sorting
print(a)

a.sort(reverse=True)
print(a)
[4, 5, 1, 6, 8, 3, 2]
[1, 2, 3, 4, 5, 6, 8]
[8, 6, 5, 4, 3, 2, 1]

If you do not want to modify your list, but rather get a copy of a sorted list, use the sorted command.

a = [4, 5, 1, 6, 8, 3, 2]
print('sorted a = ',sorted(a))  # no change to a
print('sorted a = ',sorted(a, reverse=True))  # no change to a
print('a        = ',a)
sorted a =  [1, 2, 3, 4, 5, 6, 8]
sorted a =  [8, 6, 5, 4, 3, 2, 1]
a        =  [4, 5, 1, 6, 8, 3, 2]

This works for strings too:

a = ['b', 'a', 'c', 'tree']
print(sorted(a))
['a', 'b', 'c', 'tree']

Here is a subtle point though. A capitalized letter comes before a lowercase letter. We can pass a function to the sorted command that is called on each element prior to the sort. Here we make each word lower case before sorting.

a = ['B', 'a', 'c', 'tree']
print(sorted(a))

# sort by lower case letter
print(sorted(a, key=str.lower))
['B', 'a', 'c', 'tree']
['a', 'B', 'c', 'tree']

Here is a more complex sorting problem. We have a list of tuples with group names and the letter grade. We want to sort the list by the letter grades. We do this by creating a function that maps the letter grades to the position of the letter grades in a sorted list. We use the list.index function to find the index of the letter grade, and then sort on that.

groups = [('group1', 'B'),
          ('group2', 'A+'),
          ('group3', 'A')]

def grade_key(gtup):
    '''gtup is a tuple of ('groupname', 'lettergrade')'''
    lettergrade = gtup[1]

    grades = ['A++', 'A+', 'A', 'A-', 'A/B'
              'B+', 'B', 'B-', 'B/C',
              'C+', 'C', 'C-', 'C/D',
              'D+', 'D', 'D-', 'D/R',
              'R+', 'R', 'R-', 'R--']

    return grades.index(lettergrade)

print(sorted(groups, key=grade_key))
[('group2', 'A+'), ('group3', 'A'), ('group1', 'B')]

Unique entries in a vector

Matlab post

It is surprising how often you need to know only the unique entries in a vector of entries. In python, we create a “set” from a list, which only contains unique entries. Then we convert the set back to a list.

a = [1, 1, 2, 3, 4, 5, 3, 5]

b = list(set(a))
print(b)
[1, 2, 3, 4, 5]
a = ['a',
    'b',
    'abracadabra',
    'b',
    'c',
    'd',
    'b']

print(list(set(a)))
['abracadabra', 'b', 'd', 'c', 'a']

Lather, rinse and repeat

Matlab post

Recursive functions are functions that call themselves repeatedly until some exit condition is met. Today we look at a classic example of recursive function for computing a factorial. The factorial of a non-negative integer n is denoted n!, and is defined as the product of all positive integers less than or equal to n.

The key ideas in defining a recursive function is that there needs to be some logic to identify when to terminate the function. Then, you need logic that calls the function again, but with a smaller part of the problem. Here we recursively call the function with n-1 until it gets called with n=0. 0! is defined to be 1.

def recursive_factorial(n):
    '''compute the factorial recursively. Note if you put a negative
    number in, this function will never end. We also do not check if
    n is an integer.'''
    if n == 0:
        return 1
    else:
        return n * recursive_factorial(n - 1)

print(recursive_factorial(5))
120
from scipy.special import factorial
print(factorial(5))
120.0

Compare to a loop solution

This example can also be solved by a loop. This loop is easier to read and understand than the recursive function. Note the recursive nature of defining the variable as itself times a number.

n = 5
factorial_loop = 1
for i in range(1, n + 1):
    factorial_loop *= i

print(factorial_loop)
120

There are some significant differences in this example than in Matlab.

  1. the syntax of the for loop is quite different with the use of the in operator.

  2. python has the nice *= operator to replace a = a * i

  3. We have to loop from 1 to n+1 because the last number in the range is not returned.

Conclusions

Recursive functions have a special niche in mathematical programming. There is often another way to accomplish the same goal. That is not always true though, and in a future post we will examine cases where recursion is the only way to solve a problem.

Brief intro to regular expressions

Matlab post

This example shows how to use a regular expression to find strings matching the pattern :cmd:`datastring`. We want to find these strings, and then replace them with something that depends on what cmd is, and what datastring is.

Let us define some commands that will take datasring as an argument, and return the modified text. The idea is to find all the cmds, and then run them. We use python’s eval command to get the function handle from a string, and the cmd functions all take a datastring argument (we define them that way). We will create commands to replace :cmd:`datastring` with html code for a light gray background, and :red:`some text` with html code making the text red.

text = r'''Here is some text. use the :cmd:`open` to get the text into
          a variable. It might also be possible to get a multiline
            :red:`line
     2` directive.'''

print(text)
print('---------------------------------')
Here is some text. use the :cmd:`open` to get the text into
          a variable. It might also be possible to get a multiline
            :red:`line
     2` directive.
---------------------------------

Now, we define our functions.

def cmd(datastring):
    ' replace :cmd:`datastring` with html code with light gray background'
    s = '<FONT style="BACKGROUND-COLOR: LightGray">%{0}</FONT>';
    html = s.format(datastring)
    return html

def red(datastring):
    'replace :red:`datastring` with html code to make datastring in red font'
    html = '<font color=red>{0}</font>'.format(datastring)
    return html

Finally, we do the regular expression. Regular expressions are hard. There are whole books on them. The point of this post is to alert you to the possibilities. I will break this regexp down as follows. 1. we want everything between :*: as the directive. ([^:]*) matches everything not a :. :([^:]*): matches the stuff between two :. 2. then we want everything between `*`. ([^`]*) matches everything not a `. 3. The () makes a group that python stores so we can refer to them later.

import re
regex = ':([^:]*):`([^`]*)`'
matches = re.findall(regex, text)
for directive, datastring in matches:
    directive = eval(directive) # get the function
    text = re.sub(regex, directive(datastring), text)

print('Modified text:')
print(text)
Modified text:
Here is some text. use the <FONT style="BACKGROUND-COLOR: LightGray">%open</FONT> to get the text into
          a variable. It might also be possible to get a multiline
            <FONT style="BACKGROUND-COLOR: LightGray">%open</FONT> directive.

Working with lists

It is not too uncommon to have a list of data, and then to apply a function to every element, to filter the list, or extract elements that meet some criteria. In this example, we take a string and split it into words. Then, we will examine several ways to apply functions to the words, to filter the list to get data that meets some criteria. Here is the string splitting.

text = '''
 As we have seen, handling units with third party functions is fragile, and often requires additional code to wrap the function to handle the units. An alternative approach that avoids the wrapping is to rescale the equations so they are dimensionless. Then, we should be able to use all the standard external functions without modification. We obtain the final solutions by rescaling back to the answers we want.

Before doing the examples, let us consider how the quantities package handles dimensionless numbers.

import quantities as u

a = 5 * u.m
L = 10 * u.m # characteristic length

print a/L
print type(a/L)

'''

words = text.split()
print(words)
['As', 'we', 'have', 'seen,', 'handling', 'units', 'with', 'third', 'party', 'functions', 'is', 'fragile,', 'and', 'often', 'requires', 'additional', 'code', 'to', 'wrap', 'the', 'function', 'to', 'handle', 'the', 'units.', 'An', 'alternative', 'approach', 'that', 'avoids', 'the', 'wrapping', 'is', 'to', 'rescale', 'the', 'equations', 'so', 'they', 'are', 'dimensionless.', 'Then,', 'we', 'should', 'be', 'able', 'to', 'use', 'all', 'the', 'standard', 'external', 'functions', 'without', 'modification.', 'We', 'obtain', 'the', 'final', 'solutions', 'by', 'rescaling', 'back', 'to', 'the', 'answers', 'we', 'want.', 'Before', 'doing', 'the', 'examples,', 'let', 'us', 'consider', 'how', 'the', 'quantities', 'package', 'handles', 'dimensionless', 'numbers.', 'import', 'quantities', 'as', 'u', 'a', '=', '5', '*', 'u.m', 'L', '=', '10', '*', 'u.m', '#', 'characteristic', 'length', 'print', 'a/L', 'print', 'type(a/L)']

Let us get the length of each word.

print([len(word) for word in words])

# functional approach with a lambda function
print(list(map(lambda word: len(word), words)))

# functional approach with a builtin function
print(list(map(len, words)))

# functional approach with a user-defined function
def get_length(word):
    return len(word)

print(list(map(get_length, words)))
[2, 2, 4, 5, 8, 5, 4, 5, 5, 9, 2, 8, 3, 5, 8, 10, 4, 2, 4, 3, 8, 2, 6, 3, 6, 2, 11, 8, 4, 6, 3, 8, 2, 2, 7, 3, 9, 2, 4, 3, 14, 5, 2, 6, 2, 4, 2, 3, 3, 3, 8, 8, 9, 7, 13, 2, 6, 3, 5, 9, 2, 9, 4, 2, 3, 7, 2, 5, 6, 5, 3, 9, 3, 2, 8, 3, 3, 10, 7, 7, 13, 8, 6, 10, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 3, 1, 14, 6, 5, 3, 5, 9]
[2, 2, 4, 5, 8, 5, 4, 5, 5, 9, 2, 8, 3, 5, 8, 10, 4, 2, 4, 3, 8, 2, 6, 3, 6, 2, 11, 8, 4, 6, 3, 8, 2, 2, 7, 3, 9, 2, 4, 3, 14, 5, 2, 6, 2, 4, 2, 3, 3, 3, 8, 8, 9, 7, 13, 2, 6, 3, 5, 9, 2, 9, 4, 2, 3, 7, 2, 5, 6, 5, 3, 9, 3, 2, 8, 3, 3, 10, 7, 7, 13, 8, 6, 10, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 3, 1, 14, 6, 5, 3, 5, 9]
[2, 2, 4, 5, 8, 5, 4, 5, 5, 9, 2, 8, 3, 5, 8, 10, 4, 2, 4, 3, 8, 2, 6, 3, 6, 2, 11, 8, 4, 6, 3, 8, 2, 2, 7, 3, 9, 2, 4, 3, 14, 5, 2, 6, 2, 4, 2, 3, 3, 3, 8, 8, 9, 7, 13, 2, 6, 3, 5, 9, 2, 9, 4, 2, 3, 7, 2, 5, 6, 5, 3, 9, 3, 2, 8, 3, 3, 10, 7, 7, 13, 8, 6, 10, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 3, 1, 14, 6, 5, 3, 5, 9]
[2, 2, 4, 5, 8, 5, 4, 5, 5, 9, 2, 8, 3, 5, 8, 10, 4, 2, 4, 3, 8, 2, 6, 3, 6, 2, 11, 8, 4, 6, 3, 8, 2, 2, 7, 3, 9, 2, 4, 3, 14, 5, 2, 6, 2, 4, 2, 3, 3, 3, 8, 8, 9, 7, 13, 2, 6, 3, 5, 9, 2, 9, 4, 2, 3, 7, 2, 5, 6, 5, 3, 9, 3, 2, 8, 3, 3, 10, 7, 7, 13, 8, 6, 10, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 3, 1, 14, 6, 5, 3, 5, 9]

Now let us get all the words that start with the letter “a”. This is sometimes called filtering a list. We use a string function startswith to check for upper and lower-case letters. We will use list comprehension with a condition.

print([word for word in words if word.startswith('a') or word.startswith('A')])

# make word lowercase to simplify the conditional statement
print([word for word in words if word.lower().startswith('a')])
['As', 'and', 'additional', 'An', 'alternative', 'approach', 'avoids', 'are', 'able', 'all', 'answers', 'as', 'a', 'a/L']
['As', 'and', 'additional', 'An', 'alternative', 'approach', 'avoids', 'are', 'able', 'all', 'answers', 'as', 'a', 'a/L']

A slightly harder example is to find all the words that are actually numbers. We could use a regular expression for that, but we will instead use a function we create. We use a function that tries to cast a word as a float. If this fails, we know the word is not a float, so we return False.

def float_p(word):
    try:
        float(word)
        return True
    except ValueError:
        return False

print([word for word in words if float_p(word)])

# here is a functional approach
print(list(filter(float_p, words)))
['5', '10']
['5', '10']

Finally, we consider filtering the list to find all words that contain certain symbols, say any character in this string “./=*#”. Any of those characters will do, so we search each word for one of them, and return True if it contains it, and False if none are contained.

def punctuation_p(word):
    S = './=*#'
    for s in S:
        if s in word:
            return True
    return False

print([word for word in words if punctuation_p(word)])
print(list(filter(punctuation_p, words)))
['units.', 'dimensionless.', 'modification.', 'want.', 'numbers.', '=', '*', 'u.m', '=', '*', 'u.m', '#', 'a/L', 'type(a/L)']
['units.', 'dimensionless.', 'modification.', 'want.', 'numbers.', '=', '*', 'u.m', '=', '*', 'u.m', '#', 'a/L', 'type(a/L)']

In this section we examined a few ways to interact with lists using list comprehension and functional programming. These approaches make it possible to work on arbitrary size lists, without needing to know in advance how big the lists are. New lists are automatically generated as results, without the need to preallocate lists, i.e. you do not need to know the size of the output. This can be handy as it avoids needing to write loops in some cases and leads to more compact code.

Ordinarily a print statement prints to stdout, or your terminal/screen. You can redirect this so that printing is done to a file, for example. This might be helpful if you use print statements for debugging, and later want to save what is printed to a file. Here we make a simple function that prints some things.

Getting a dictionary of counts

I frequently want to take a list and get a dictionary of keys that have the count of each element in the list. Here is how I have typically done this countless times in the past.

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']

d = {}
for el in L:
    if el in d:
        d[el] += 1
    else:
        d[el] = 1

print(d)
{'a': 3, 'b': 2, 'd': 1, 'e': 2}

That seems like too much code, and that there must be a list comprehension approach combined with a dictionary constructor.

L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']

print(dict((el,L.count(el)) for el in L))
{'a': 3, 'b': 2, 'd': 1, 'e': 2}

Wow, that is a lot simpler! I suppose for large lists this might be slow, since count must look through the list for each element, whereas the longer code looks at each element once, and does one conditional analysis.

Here is another example of much shorter and cleaner code.

from collections import Counter
L = ['a', 'a', 'b','d', 'e', 'b', 'e', 'a']
print(Counter(L))
print(Counter(L)['a'])
Counter({'a': 3, 'b': 2, 'e': 2, 'd': 1})
3

About your python

import sys

print(sys.version)

print(sys.executable)

print(sys.platform)

# where the platform independent Python files are installed
print(sys.prefix)
3.8.18 (default, Aug 28 2023, 08:27:22) 
[GCC 11.4.0]
/opt/hostedtoolcache/Python/3.8.18/x64/bin/python
linux
/opt/hostedtoolcache/Python/3.8.18/x64

The platform module provides similar, complementary information.

import platform

print(platform.uname())
print(platform.system())
print(platform.architecture())
print(platform.machine())
print(platform.node())
print(platform.platform())
print(platform.processor())
print(platform.python_build())
print(platform.python_version())
uname_result(system='Linux', node='fv-az338-771', release='6.2.0-1011-azure', version='#11~22.04.1-Ubuntu SMP Wed Aug 23 19:26:19 UTC 2023', machine='x86_64', processor='x86_64')
Linux
('64bit', 'ELF')
x86_64
fv-az338-771
Linux-6.2.0-1011-azure-x86_64-with-glibc2.34
x86_64
('default', 'Aug 28 2023 08:27:22')
3.8.18

If you are doing some analysis that requires you to change directories, e.g. to read a file, and then change back to another directory to read another file, you have probably run into problems if there is an error somewhere. You would like to make sure that the code changes back to the original directory after each error. We will look at a few ways to accomplish that here.

The try/except/finally method is the traditional way to handle exceptions, and make sure that some code “finally” runs. Let us look at two examples here. In the first example, we try to change into a directory that does not exist.

import os, sys

CWD = os.getcwd() # store initial position
print('initially inside {0}'.format(os.getcwd()))
TEMPDIR = 'data/run1' # this does not exist

try:
    os.chdir(TEMPDIR)
    print('inside {0}'.format(os.getcwd()))
except:
    print('Exception caught: ',sys.exc_info()[0])
finally:
    print('Running final code')
    os.chdir(CWD)
    print('finally inside {0}'.format(os.getcwd()))
initially inside /home/runner/work/pycse/pycse/pycse-jb/pycse___python_computations_in_science_and_engineering/blog
Exception caught:  <class 'FileNotFoundError'>
Running final code
finally inside /home/runner/work/pycse/pycse/pycse-jb/pycse___python_computations_in_science_and_engineering/blog

Now, let us look at an example where the directory does exist. We will change into the directory, run some code, and then raise an Exception.

import os, sys

CWD = os.getcwd() # store initial position
print('initially inside {0}'.format(os.getcwd()))
TEMPDIR = 'data'

try:
    os.chdir(TEMPDIR)
    print('inside {0}'.format(os.getcwd()))
    print(os.listdir('.'))
    raise Exception('boom')
except:
    print('Exception caught: ',sys.exc_info()[0])
finally:
    print('Running final code')
    os.chdir(CWD)
    print('finally inside {0}'.format(os.getcwd()))
initially inside /home/runner/work/pycse/pycse/pycse-jb/pycse___python_computations_in_science_and_engineering/blog
inside /home/runner/work/pycse/pycse/pycse-jb/pycse___python_computations_in_science_and_engineering/blog/data
['debug-3.txt', 'example.xlsx', 'example2.xls', 'example4.xlsx', 'test.docx', 'gc-data-21.txt', 'testdata.txt', 'debug-4.txt', 'antoine_database.mat', 'example4.xls', 'raman.txt', 'debug.txt', 'PT.txt', 'debug-2.txt', 'example3.xls', 'antoine_data.dat']
Exception caught:  <class 'Exception'>
Running final code
finally inside /home/runner/work/pycse/pycse/pycse-jb/pycse___python_computations_in_science_and_engineering/blog

You can see that we changed into the directory, ran some code, and then caught an exception. Afterwards, we changed back to our original directory. This code works fine, but it is somewhat verbose, and tedious to write over and over. We can get a cleaner syntax with a context manager. The context manager uses the with keyword in python. In a context manager some code is executed on entering the “context”, and code is run on exiting the context. We can use that to automatically change directory, and when done, change back to the original directory. We use the contextlib.contextmanager decorator on a function. With a function, the code up to a yield statement is run on entering the context, and the code after the yield statement is run on exiting. We wrap the yield statement in try/except/finally block to make sure our final code gets run.

import contextlib
import os, sys

@contextlib.contextmanager
def cd(path):
    print('initially inside {0}'.format(os.getcwd()))
    CWD = os.getcwd()

    os.chdir(path)
    print('inside {0}'.format(os.getcwd()))
    try:
        yield
    except:
        print('Exception caught: ',sys.exc_info()[0])
    finally:
        print('finally inside {0}'.format(os.getcwd()))
        os.chdir(CWD)

# Now we use the context manager
with cd('../data'):
    print(os.listdir('.'))
    raise Exception('boom')

print()
with cd('../data/run2'):
    print(os.listdir('.'))
initially inside /home/runner/work/pycse/pycse/pycse-jb/pycse___python_computations_in_science_and_engineering/blog
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[29], line 20
     17         os.chdir(CWD)
     19 # Now we use the context manager
---> 20 with cd('../data'):
     21     print(os.listdir('.'))
     22     raise Exception('boom')

File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/contextlib.py:113, in _GeneratorContextManager.__enter__(self)
    111 del self.args, self.kwds, self.func
    112 try:
--> 113     return next(self.gen)
    114 except StopIteration:
    115     raise RuntimeError("generator didn't yield") from None

Cell In[29], line 9, in cd(path)
      6 print('initially inside {0}'.format(os.getcwd()))
      7 CWD = os.getcwd()
----> 9 os.chdir(path)
     10 print('inside {0}'.format(os.getcwd()))
     11 try:

FileNotFoundError: [Errno 2] No such file or directory: '../data'

One case that is not handled well with this code is if the directory you want to change into does not exist. In that case an exception is raised on entering the context when you try change into a directory that does not exist. An alternative class based context manager can be found here.