** Brief intro to regular expressions
:PROPERTIES:
:categories: regular expressions
:date: 2013/03/03 15:04:31
:updated: 2013/03/03 15:04:31
:END:
[[http://matlab.cheme.cmu.edu/2012/05/07/1701/][Matlab post]]
This example shows how to use a regular expression to find strings matching the pattern :cmd:`datastring`. We want to find these strings, and then replace them with something that depends on what cmd is, and what datastring is.
Let us define some commands that will take datasring as an argument, and return the modified text. The idea is to find all the cmds, and then run them. We use python's =eval= command to get the function handle from a string, and the cmd functions all take a datastring argument (we define them that way). We will create commands to replace :cmd:`datastring` with html code for a light gray background, and :red:`some text` with html code making the text red.
#+BEGIN_SRC python :session
text = r'''Here is some text. use the :cmd:`open` to get the text into
a variable. It might also be possible to get a multiline
:red:`line
2` directive.'''
print text
print '---------------------------------'
#+END_SRC
#+RESULTS:
:
: ... ... >>> >>> Here is some text. use the :cmd:`open` to get the text into
: a variable. It might also be possible to get a multiline
: :red:`line
: 2` directive.
: ---------------------------------
Now, we define our functions.
#+BEGIN_SRC python :session
def cmd(datastring):
' replace :cmd:`datastring` with html code with light gray background'
s = '%{0}';
html = s.format(datastring)
return html
def red(datastring):
'replace :red:`datastring` with html code to make datastring in red font'
html = '{0}'.format(datastring)
return html
#+END_SRC
#+RESULTS:
Finally, we do the regular expression. Regular expressions are hard. There are whole books on them. The point of this post is to alert you to the possibilities. I will break this regexp down as follows. 1. we want everything between :*: as the directive. ([^:]*) matches everything not a :. :([^:]*): matches the stuff between two :. 2. then we want everything between `*`. ([^`]*) matches everything not a `. 3. The () makes a group that python stores so we can refer to them later.
#+BEGIN_SRC python :session
import re
regex = ':([^:]*):`([^`]*)`'
matches = re.findall(regex, text)
for directive, datastring in matches:
directive = eval(directive) # get the function
text = re.sub(regex, directive(datastring), text)
print 'Modified text:'
print text
#+END_SRC
#+RESULTS:
:
: >>> >>> ... ... ... >>> Modified text:
: Here is some text. use the %open to get the text into
: a variable. It might also be possible to get a multiline
: %open directive.