Using tags to filter lists in Python
Posted January 29, 2014 at 12:52 PM | categories: python | tags:
Updated January 29, 2014 at 12:52 PM
Suppose you have a collection of items in a list, and you want to filter the list based on some properties of the items, and then accumulate some other property on the filtered items. We will look at some strategies for this here.
The particular application is that I have a list of courses that make up a curriculum, and I want to summarize the curriculum in a variety of ways. For example, I might want to know how many Gen Ed courses there are, or how many math, chemistry, biology and physics courses there are. I may want to know how many units overall are required.
A course will be represented by a class, which simply holds the data about the course. Here we consider the course number (which is really a string), the number of units of the course, and what category the course fits into. There will be 7 categories here: chemistry, biology, physics, math, engineering, general education, and free elective.
We will use some binary math to represent the categories. Essentially we define tags as if they are binary numbers, and then we can use binary operators to tell if an item is tagged a particular way. We use & to do a bitwise AND between a variable and a TAG. If the result is nonzero (which Python treats as True), the variable has that tag.
This works basically by defining a TAG like a binary number, e.g. TAG1 = 100, TAG2 = 010, TAG3 = 001 (written here with the least significant bit first, so these are the integers 1, 2 and 4). Then, if you have a number like 110, you know it is tagged with TAG1 and TAG2, but not TAG3. We can figure that out with code too.
100 & 110 = 100 = 1
010 & 110 = 010 = 2
    print(1 & 3)
    print(2 & 3)
1
2
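Python can print these bit patterns directly. Here is a small sketch using the built-in format function, which writes the most significant bit on the left. That is the mirror image of the patterns written in this post, so the value 3 prints as 011 rather than 110. The variable names are introduced here only for illustration.

```python
# format(n, '03b') gives n as three binary digits, most significant bit first.
for n in (1, 2, 3):
    print(n, format(n, '03b'))

bits_of_3 = format(3, '03b')   # '011': the 1-bit and the 2-bit are both set
and_1_3 = 1 & 3                # keeps only the 1-bit
and_2_3 = 2 & 3                # keeps only the 2-bit
print(bits_of_3, and_1_3, and_2_3)
```

Whichever way the digits are written, the integers behind them are the same, so the & tests in the rest of the post are unaffected.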
Let us try out an example. The easiest way to define the tags is as powers of two.
    # define some tags
    TAG1 = 2**0  # 100
    TAG2 = 2**1  # 010

    # Now define a variable that is "tagged"
    a = TAG1

    print(a & TAG1)  # remember that 0 = False, everything else is true
    print(a & TAG2)
1
0
We can use multiple tags by adding them together.
    # define some tags
    TAG1 = 2**0  # 100
    TAG2 = 2**1  # 010
    TAG3 = 2**2  # 001

    # Now define a variable that is "tagged"
    a = TAG1 + TAG2  # 1 + 2 = 3 = 110 in binary (least significant bit first)

    print(a & TAG1)
    print(a & TAG2)
    print(a & TAG3)
1
2
0
You can see that the variable is tagged with TAG1 and TAG2, but not with TAG3. This is how we tag an item with more than one tag: we create a group of tags by adding them together, and we can still check for any single tag as we did before.
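Adding tags together works as long as each tag is added only once. The following is a minimal sketch of why the bitwise OR operator | may be a safer way to combine tags. The names a_or and a_plus are introduced here for illustration only.

```python
TAG1 = 2**0
TAG2 = 2**1
TAG3 = 2**2

a = TAG1 | TAG2   # same value as TAG1 + TAG2 when the tags are distinct

# OR-ing in a tag that is already present leaves the value unchanged...
a_or = a | TAG1
# ...but adding it again carries into the next bit and changes the tags.
a_plus = a + TAG1

print(a_or & TAG1, a_or & TAG2, a_or & TAG3)
print(a_plus & TAG1, a_plus & TAG2, a_plus & TAG3)
```

Here a_plus ends up equal to TAG3 alone, so the item silently loses TAG1 and TAG2. With | the same mistake is harmless.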
It is trickier to tell whether a variable is tagged with a particular set of tags. Let us consider why. The binary representation of TAG1 + TAG2 is 110. The binary representation of TAG2 + TAG3 is 011. If we simply consider (TAG1 + TAG2) & (TAG2 + TAG3) we get 010. That result is nonzero, but it does not mean we have a match, because 010 is not equal to TAG2 + TAG3 = 011. In other words, the bitwise AND of a variable with a sum of tags equals that sum of tags only when the variable carries every tag in the sum. So, we can check for a match with an equality comparison.
    # define some tags
    TAG1 = 2**0  # 100
    TAG2 = 2**1  # 010
    TAG3 = 2**2  # 001

    # Now define a variable that is "tagged"
    a = TAG1 + TAG2  # 1 + 2 = 3 = 110 in binary (least significant bit first)

    # a matches a set of tags only if the AND returns the whole set
    print((a & (TAG1 + TAG2)) == TAG1 + TAG2)
    print((a & (TAG1 + TAG3)) == TAG1 + TAG3)
    print((a & (TAG2 + TAG3)) == TAG2 + TAG3)
True
False
False
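The two kinds of test used so far can be wrapped in small helper functions. This is only a sketch: the names has_all and has_any are hypothetical and do not come from any library.

```python
TAG1 = 2**0
TAG2 = 2**1
TAG3 = 2**2


def has_all(value, tags):
    """True if every bit set in tags is also set in value."""
    return (value & tags) == tags


def has_any(value, tags):
    """True if at least one bit set in tags is also set in value."""
    return (value & tags) != 0


a = TAG1 | TAG2

print(has_all(a, TAG1 | TAG2))
print(has_all(a, TAG2 | TAG3))
print(has_any(a, TAG2 | TAG3))
```

The last two calls show the difference: a shares TAG2 with the set TAG2 | TAG3, so has_any succeeds, but it lacks TAG3, so has_all fails.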
OK, enough binary math; let us see an application. Below we create a set of tags indicating the category a course falls into, a class definition to store course data in attributes of an object, and a list of courses. Then, we show some examples of filtering with list comprehensions based on the tags to summarize properties of the list. The comparisons below are simple, as the courses are not multiply tagged at this point.
    CHEMISTRY = 2**0
    BIOLOGY = 2**1
    PHYSICS = 2**2
    MATH = 2**3
    ENGINEERING = 2**4
    GENED = 2**5
    FREE = 2**6

    class Course:
        '''simple container for course information'''
        def __init__(self, number, units, category):
            self.number = number
            self.units = units
            self.category = category

        def __repr__(self):
            return self.number

    courses = [Course('09-105', 9, CHEMISTRY),
               Course('09-106', 9, CHEMISTRY),
               Course('33-105', 12, PHYSICS),
               Course('33-106', 12, PHYSICS),
               Course('21-120', 10, MATH),
               Course('21-122', 10, MATH),
               Course('21-259', 10, MATH),
               Course('06-100', 12, ENGINEERING),
               Course('xx-xxx', 9, GENED),
               Course('xx-xxx', 9, FREE),
               Course('03-232', 9, BIOLOGY)]

    # print the total units
    print('Total units = {0}'.format(sum([x.units for x in courses])))

    # get units of math required
    math_units = sum([x.units for x in courses if x.category & MATH])

    # Get total units of math, chemistry, physics and biology.
    # a | b is a bitwise OR, so BASIC_MS matches anything tagged with
    # MATH OR CHEMISTRY OR PHYSICS OR BIOLOGY.
    BASIC_MS = MATH | CHEMISTRY | PHYSICS | BIOLOGY

    # total units in those categories
    basic_math_science = sum([x.units for x in courses if x.category & BASIC_MS])

    print('We require {0} units of math out of {1} units of basic math and science courses.'.format(math_units, basic_math_science))

    # We are required to have at least 96 units of Math and Sciences.
    print('We are compliant on number of Math and science:', basic_math_science >= 96)
Total units = 111
We require 30 units of math out of 81 units of basic math and science courses.
We are compliant on number of Math and science: False
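The courses above each carry a single tag. As a sketch of how the same filters might behave once courses are multiply tagged, here is a variation on the example. The course numbers (aa-001 and so on), their units and their tag combinations are made up for illustration and are not part of the curriculum above.

```python
CHEMISTRY = 2**0
BIOLOGY = 2**1
ENGINEERING = 2**4


class Course:
    '''simple container for course information'''
    def __init__(self, number, units, category):
        self.number = number
        self.units = units
        self.category = category

    def __repr__(self):
        return self.number


# Hypothetical courses, some carrying two tags.
courses = [Course('aa-001', 9, CHEMISTRY),
           Course('aa-002', 12, CHEMISTRY | ENGINEERING),
           Course('aa-003', 9, BIOLOGY | ENGINEERING),
           Course('aa-004', 10, ENGINEERING)]

# Any course carrying the CHEMISTRY tag, alone or with other tags.
chem_units = sum(c.units for c in courses if c.category & CHEMISTRY)

# Only courses tagged with both CHEMISTRY and ENGINEERING.
both = CHEMISTRY | ENGINEERING
chem_eng_units = sum(c.units for c in courses if (c.category & both) == both)

# Only courses tagged ENGINEERING and nothing else.
eng_only_units = sum(c.units for c in courses if c.category == ENGINEERING)

print(chem_units, chem_eng_units, eng_only_units)
```

Note that once courses carry several tags, summing units per category counts a doubly tagged course in each of its categories, so the category totals no longer add up to the total units.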
That is all for this example. With more data for each course, you could see what courses are taken in what semesters, how many units are in each semester, maybe create a prerequisite map, and view the curriculum by categories of courses, etc…
Copyright (C) 2014 by John Kitchin. See the License for information about copying.