Using tags searches on objects in python
Posted March 24, 2014 at 09:52 PM | categories: python | tags:
I am exploring the possibility of using tags on python objects in conjunction with searches to find sets of objects. Here I want to explore some syntax and methods for doing that.
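Python functions accept arbitrary attributes, so one simple way to attach tags to them is a decorator. This is a minimal sketch: the decorator name `tag` and the `tags` attribute (stored as a set) are assumptions for illustration, not something established in this post.

```python
def tag(*tags):
    """Hypothetical decorator: stores the given tags as a set on obj.tags."""
    def decorate(obj):
        obj.tags = set(tags)  # assumption: tags live in a 'tags' attribute
        return obj
    return decorate


@tag('A', 'B', 'C')
def f1():
    return 1


@tag('A', 'B')
def f2():
    return 2


@tag('B', 'C')
def f3():
    return 3


functions = [f1, f2, f3]

# Select the functions tagged with both A and B.
print([f.__name__ for f in functions if {'A', 'B'} <= f.tags])  # ['f1', 'f2']
```

Searching then reduces to the same kind of membership logic as for plain lists of tags.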
In org-mode there is a syntax like '+boss+urgent-project1' for and and not operators ('+' means the tag must be present, '-' means it must be absent), and 'A|B' for or operators. I think we need pyparsing to untangle this kind of syntax. See http://pyparsing.wikispaces.com/file/view/simpleBool.py for an example. Another alternative might be the natural language toolkit (nltk). Before we dig into those, let us see some Python ways of doing the logic.
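For the org-mode form alone, a regular expression may be enough before reaching for pyparsing. This sketch assumes a reduced grammar: only +tag, -tag, a bare tag (treated as +tag), and | between alternatives. It ignores the rest of org-mode's match syntax (such as & and property matches), and the name `org_match` is hypothetical.

```python
import re


def org_match(query, tags):
    """Return True if tags satisfy a reduced org-style tag query (sketch).

    Supported: '+tag' (required), '-tag' (excluded), a bare tag
    (treated as required), and '|' between alternative groups.
    """
    tags = set(tags)
    for group in query.split('|'):
        terms = re.findall(r'([+-]?)(\w+)', group)
        # A term is satisfied when presence differs from "is excluded";
        # a group matches when all of its terms are satisfied.
        if all((t in tags) != (sign == '-') for sign, t in terms):
            return True
    return False


a = ['A', 'B', 'C']
b = ['A', 'B']
c = ['A', 'C']
d = ['B', 'C']
all_lists = [a, b, c, d]

print([x for x in all_lists if org_match('+A-B', x)])  # [['A', 'C']]
print([x for x in all_lists if org_match('A|B', x)])   # all four lists
```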
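Below we define some lists containing tags (strings). We then select the lists that match a query with list comprehensions that combine membership tests using the &, | and ^ operators.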
a = ['A', 'B', 'C']
b = ['A', 'B']
c = ['A', 'C']
d = ['B', 'C']
all_lists = [a, b, c, d]

# lists tagged with both A and B
print('A and B', [x for x in all_lists if ('A' in x) & ('B' in x)])
# A but not B
print('A not B', [x for x in all_lists if ('A' in x) & ('B' not in x)])
# B or C
print('B or C', [x for x in all_lists if ('B' in x) | ('C' in x)])
# B or C, but not both
print('B xor C', [x for x in all_lists if ('B' in x) ^ ('C' in x)])
A and B [['A', 'B', 'C'], ['A', 'B']]
A not B [['A', 'C']]
B or C [['A', 'B', 'C'], ['A', 'B'], ['A', 'C'], ['B', 'C']]
B xor C [['A', 'B'], ['A', 'C']]
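Since a list of tags is really a set of tags, the same four queries can also be phrased with Python set operations. This is an alternative sketch, not the approach used in the post; the names `tagged`, `a_and_b` and so on are illustrative.

```python
a = ['A', 'B', 'C']
b = ['A', 'B']
c = ['A', 'C']
d = ['B', 'C']
all_lists = [a, b, c, d]

# Pair each original list with a set of its tags, built once.
tagged = [(x, set(x)) for x in all_lists]

# A and B: the required tags form a subset of the item's tags.
a_and_b = [x for x, s in tagged if {'A', 'B'} <= s]
# A not B: A is present and the excluded tag is absent.
a_not_b = [x for x, s in tagged if 'A' in s and s.isdisjoint({'B'})]
# B or C: the item shares at least one tag with {'B', 'C'}.
b_or_c = [x for x, s in tagged if not s.isdisjoint({'B', 'C'})]
# B xor C: exactly one of the two tags is present.
b_xor_c = [x for x, s in tagged if len(s & {'B', 'C'}) == 1]

print('A and B', a_and_b)
print('A not B', a_not_b)
print('B or C', b_or_c)
print('B xor C', b_xor_c)
```

This prints the same four lines as the list-comprehension version. The set form may scale better when a query names many tags, since `{'A', 'B', 'C'} <= s` replaces a chain of `in` tests.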
Those are not too bad. Somehow I would have to get pyparsing to generate that syntax, which will take a lot of study. There are other ways to do this too; let us try the lazy filtering from itertools (ifilter in Python 2, which became the built-in filter in Python 3).
a = ['A', 'B', 'C']
b = ['A', 'B']
c = ['A', 'C']
d = ['B', 'C']
all_lists = [a, b, c, d]

# itertools.ifilter (Python 2) is the built-in filter in Python 3.
# Either way it returns a lazy iterator, so wrap it in list().
print('A and B', list(filter(lambda x: ('A' in x) & ('B' in x), all_lists)))
A and B [['A', 'B', 'C'], ['A', 'B']]
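To see what "lazy" means here, the predicate below records every list it is asked about. This sketch uses Python 3's built-in `filter`, which behaves like Python 2's `itertools.ifilter`; the names `checked` and `has_a_and_b` are illustrative.

```python
a = ['A', 'B', 'C']
b = ['A', 'B']
c = ['A', 'C']
d = ['B', 'C']
all_lists = [a, b, c, d]

checked = []  # every list the predicate has been asked about


def has_a_and_b(tags):
    checked.append(tags)
    return ('A' in tags) & ('B' in tags)


lazy = filter(has_a_and_b, all_lists)
after_create = len(checked)  # 0: creating the filter tests nothing
first = next(lazy)
after_next = len(checked)    # 1: only a was tested to find the first match
rest = list(lazy)            # list() drains the remaining items
after_list = len(checked)    # 4: every list has now been tested

print(after_create, after_next, after_list)  # 0 1 4
print([first] + rest)  # [['A', 'B', 'C'], ['A', 'B']]
```

Nothing is tested until a result is requested, which is why the results only appear once the iterator is consumed.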
I do not like this syntax any better. The iterator is lazy, so we have to wrap it in list() to get the results. Eventually, I want to be able to write queries like these:
filter('A and B', all_lists)
A or B
A xor B
not A and B
not(A and B)
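Because these queries use plain words for the operators, a small hand-written recursive-descent parser can handle them without pyparsing. This is a substitute technique, not the pyparsing approach the post has in mind. The names `tokenize`, `parse`, `evaluate` and `tag_filter` are hypothetical (`tag_filter` avoids shadowing the built-in `filter`). The sketch assumes tags are single words and gives `not` the tightest binding, then `and`, `xor` and `or`, mirroring Python's `&`, `^` and `|`.

```python
import re

LEVELS = ('or', 'xor', 'and')  # binary operators, loosest to tightest
KEYWORDS = {'and', 'or', 'xor', 'not'}


def tokenize(query):
    # Parentheses, words, or any other single non-space character
    # (the parser rejects the latter).
    return re.findall(r'\(|\)|\w+|\S', query)


def parse(query):
    """Parse a word-based boolean query into a nested tuple tree."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected and tok != expected):
            raise ValueError(f'expected {expected or "more input"}, got {tok!r}')
        pos += 1
        return tok

    def expr(level=0):
        if level == len(LEVELS):
            return unary()
        node = expr(level + 1)
        while peek() == LEVELS[level]:  # left-associative chain
            take()
            node = (LEVELS[level], node, expr(level + 1))
        return node

    def unary():
        tok = take()
        if tok == 'not':
            return ('not', unary())
        if tok == '(':
            node = expr()
            take(')')
            return node
        if tok in KEYWORDS or not re.fullmatch(r'\w+', tok):
            raise ValueError(f'unexpected {tok!r}')
        return ('tag', tok)

    tree = expr()
    if peek() is not None:
        raise ValueError(f'unexpected {peek()!r}')
    return tree


def evaluate(node, tags):
    op = node[0]
    if op == 'tag':
        return node[1] in tags
    if op == 'not':
        return not evaluate(node[1], tags)
    left, right = evaluate(node[1], tags), evaluate(node[2], tags)
    if op == 'and':
        return left and right
    if op == 'or':
        return left or right
    return left != right  # xor


def tag_filter(query, lists):
    tree = parse(query)  # parse once, then test every list
    return [x for x in lists if evaluate(tree, x)]


a = ['A', 'B', 'C']
b = ['A', 'B']
c = ['A', 'C']
d = ['B', 'C']
all_lists = [a, b, c, d]

for q in ['A and B', 'A or B', 'A xor B', 'not A and B', 'not(A and B)']:
    print(q, tag_filter(q, all_lists))
```

For example, `tag_filter('not A and B', all_lists)` returns `[['B', 'C']]`.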
I think that calls for pyparsing. I also think the word-based syntax above is more readable than this operator-based one:
filter('A & B', all_lists)
A | B
A ^ B
~A & B
~(A & B)
It is not obvious, though, how to get from that syntax to the code I illustrated above.
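One possible bridge for the operator form: 'A & B', '~A & B' and the other operator queries are already valid Python expressions, so Python's `ast` module can parse them and a short tree walk can interpret each operator as a tag test. This is an alternative to pyparsing, not the post's method. `query_filter` and `_eval` are hypothetical names, and tags must be valid Python identifiers. The tree is interpreted rather than passed to `eval()`, so unsupported syntax raises an error instead of running. It also happens to accept `and`, `or` and `not`, but not `xor`, which is not Python syntax.

```python
import ast


def _eval(node, tags):
    """Interpret a parsed query node as a test on a collection of tags."""
    if isinstance(node, ast.Name):
        return node.id in tags
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.Invert, ast.Not)):
        return not _eval(node.operand, tags)
    if isinstance(node, ast.BoolOp):  # 'and' / 'or' with any number of operands
        results = [_eval(v, tags) for v in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.BinOp):
        left, right = _eval(node.left, tags), _eval(node.right, tags)
        if isinstance(node.op, ast.BitAnd):
            return left and right
        if isinstance(node.op, ast.BitOr):
            return left or right
        if isinstance(node.op, ast.BitXor):
            return left != right
    raise ValueError(f'unsupported syntax: {ast.dump(node)}')


def query_filter(query, lists):
    tree = ast.parse(query, mode='eval').body  # parse only, never execute
    return [x for x in lists if _eval(tree, x)]


a = ['A', 'B', 'C']
b = ['A', 'B']
c = ['A', 'C']
d = ['B', 'C']
all_lists = [a, b, c, d]

for q in ['A & B', 'A | B', 'A ^ B', '~A & B', '~(A & B)']:
    print(q, query_filter(q, all_lists))
```

Python's own precedence applies here (`~` binds tightest, then `&`, `^`, `|`), so `~A & B` means "not A, and B".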
Copyright (C) 2014 by John Kitchin. See the License for information about copying.
Org-mode version = 8.2.5h