Using tag searches on objects in Python

| categories: python | tags:

I am exploring the possibility of using tags on python objects in conjunction with searches to find sets of objects. Here I want to explore some syntax and methods for doing that.

In org-mode there is a syntax like '+boss+urgent-project1', where + and - act as the and and not operators, and 'A|B' for the or operator. I think we need pyparsing to untangle this kind of syntax. See http://pyparsing.wikispaces.com/file/view/simpleBool.py for an example. An alternative might be the Natural Language Toolkit (nltk). Before we dig into those, let us see some Python ways of doing the logic.
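To get a feel for what that org-mode syntax encodes, here is a minimal sketch using only the re module, not pyparsing. The names parse_match and matches are hypothetical helpers made up for this illustration. I am also assuming that tags are plain word characters and that '|' separates alternatives, each of which is a run of +tag and -tag terms. It handles only this small subset of org-mode matching, with no parentheses or property tests.

```python
import re

def parse_match(match):
    """Split a string like '+boss+urgent-project1' into (required, excluded) tag sets.

    Hypothetical helper: a tag with no sign is treated as required.
    """
    required, excluded = set(), set()
    for sign, tag in re.findall(r'([+-]?)(\w+)', match):
        if sign == '-':
            excluded.add(tag)
        else:
            required.add(tag)
    return required, excluded

def matches(match, tags):
    """Return True if the tag list satisfies any '|'-separated alternative."""
    tags = set(tags)
    for alternative in match.split('|'):
        required, excluded = parse_match(alternative)
        # every required tag must be present, and no excluded tag may be
        if required <= tags and not (excluded & tags):
            return True
    return False

required, excluded = parse_match('+boss+urgent-project1')
print(sorted(required), sorted(excluded))   # ['boss', 'urgent'] ['project1']
print(matches('+A-B', ['A', 'C']))          # True
print(matches('A|B', ['B', 'C']))           # True
```

This only covers flat expressions. Nested logic such as not(A and B) still needs a real parser.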

Below we define some lists containing tags (strings). We then use list comprehensions with logical operators to select the lists that match each condition.

a = ['A', 'B', 'C']

b = ['A', 'B']

c = ['A', 'C']

d = ['B', 'C']

all_lists = [a, b, c, d]

# get lists with tags A and B
print 'A and B ',[x for x in all_lists if ('A' in x) & ('B' in x)]

# A not B
print 'A not B ',[x for x in all_lists if ('A' in x) & ('B' not in x)]

# B or C
print 'B or C ', [x for x in all_lists if ('B' in x) | ('C' in x)]

# B or C but not both
print 'B xor C ',[x for x in all_lists if ('B' in x) ^ ('C' in x)]
A and B  [['A', 'B', 'C'], ['A', 'B']]
A not B  [['A', 'C']]
B or C  [['A', 'B', 'C'], ['A', 'B'], ['A', 'C'], ['B', 'C']]
B xor C  [['A', 'B'], ['A', 'C']]
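The same four queries can also be written with set operations, which may read a little closer to the tag logic. This is only a sketch in Python 3 syntax (print as a function). The variable names are made up for the illustration, and it assumes the order of tags within a list does not matter.

```python
all_lists = [['A', 'B', 'C'], ['A', 'B'], ['A', 'C'], ['B', 'C']]

# convert each tag list to a set so subset and intersection tests are available
tag_sets = [set(x) for x in all_lists]

# A and B: {'A', 'B'} must be a subset of the tags
a_and_b = [sorted(s) for s in tag_sets if {'A', 'B'} <= s]

# A not B
a_not_b = [sorted(s) for s in tag_sets if 'A' in s and 'B' not in s]

# B or C: the intersection with {'B', 'C'} is non-empty
b_or_c = [sorted(s) for s in tag_sets if s & {'B', 'C'}]

# B xor C: exactly one of the two tags is present
b_xor_c = [sorted(s) for s in tag_sets if len(s & {'B', 'C'}) == 1]

print('A and B ', a_and_b)   # [['A', 'B', 'C'], ['A', 'B']]
print('A not B ', a_not_b)   # [['A', 'C']]
print('B or C ', b_or_c)     # all four lists
print('B xor C ', b_xor_c)   # [['A', 'B'], ['A', 'C']]
```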

Those are not too bad. Somehow I would have to get pyparsing to generate that syntax. That will take a lot of studying. There are some other ways to do this too. Let us try that out with itertools.

a = ['A', 'B', 'C']

b = ['A', 'B']

c = ['A', 'C']

d = ['B', 'C']

all_lists = [a, b, c, d]

import itertools as it

# ifilter returns an iterator
print 'A and B ', list(it.ifilter(lambda x: ('A' in x) & ('B' in x), all_lists))
A and B  [['A', 'B', 'C'], ['A', 'B']]
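For anyone running this on Python 3: itertools.ifilter was removed there, and the built-in filter returns a lazy iterator instead. A sketch of the equivalent call, with the variable names matches and result chosen only for this example:

```python
all_lists = [['A', 'B', 'C'], ['A', 'B'], ['A', 'C'], ['B', 'C']]

# in Python 3 the built-in filter is lazy, like itertools.ifilter was in Python 2
matches = filter(lambda x: ('A' in x) and ('B' in x), all_lists)

# nothing is evaluated until the iterator is consumed
result = list(matches)
print('A and B ', result)   # A and B  [['A', 'B', 'C'], ['A', 'B']]
```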

I do not like this syntax any better. The iterator is lazy, so we have to wrap it in a list to get the results. Eventually, I want to be able to write something like this:

filter('A and B', all_lists)
A or B
A xor B
not A and B
not(A and B)

I think that calls for pyparsing. I think the word-based syntax above is more readable than this symbolic version:

filter('A & B', all_lists)
A | B
A ^ B
~A & B
~(A & B)

It is not obvious, though, how to get from that syntax to the code I illustrated above.
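One possible route, sketched here with Python's own ast module rather than pyparsing, is to let Python parse the query and then walk the resulting tree, treating each bare name as a test for tag membership. The function names tag_predicate and filter_by_tags are hypothetical (the second avoids shadowing the built-in filter). The sketch is written for Python 3 and assumes every tag is a valid Python identifier. It accepts and, or, not as well as &, |, ^, ~, but not the word xor, which is not a Python operator.

```python
import ast

def tag_predicate(query):
    """Turn a query such as 'A and not B' into a function that tests a tag list."""
    tree = ast.parse(query, mode='eval')

    def evaluate(node, tags):
        if isinstance(node, ast.Expression):
            return evaluate(node.body, tags)
        if isinstance(node, ast.Name):
            # a bare name is read as "this tag is present"
            return node.id in tags
        if isinstance(node, ast.BoolOp):
            values = [evaluate(v, tags) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.Not, ast.Invert)):
            return not evaluate(node.operand, tags)
        if isinstance(node, ast.BinOp):
            left = evaluate(node.left, tags)
            right = evaluate(node.right, tags)
            if isinstance(node.op, ast.BitAnd):
                return left and right
            if isinstance(node.op, ast.BitOr):
                return left or right
            if isinstance(node.op, ast.BitXor):
                return left != right
        raise ValueError('unsupported syntax in query: %r' % query)

    return lambda tags: evaluate(tree, tags)

def filter_by_tags(query, tag_lists):
    """Return the tag lists that satisfy the query."""
    predicate = tag_predicate(query)
    return [tags for tags in tag_lists if predicate(tags)]

all_lists = [['A', 'B', 'C'], ['A', 'B'], ['A', 'C'], ['B', 'C']]
print(filter_by_tags('A and B', all_lists))        # [['A', 'B', 'C'], ['A', 'B']]
print(filter_by_tags('not (A and B)', all_lists))  # [['A', 'C'], ['B', 'C']]
print(filter_by_tags('B ^ C', all_lists))          # [['A', 'B'], ['A', 'C']]
```

Because the tree is walked by hand and nothing is passed to eval, only the listed operators and bare names are accepted. Operator precedence is Python's, so '~A & B' means (not A) and B. Tags containing hyphens or spaces would need a dedicated grammar, which is where pyparsing would come in.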

Copyright (C) 2014 by John Kitchin. See the License for information about copying.


Org-mode version = 8.2.5h
