## Brief intro to regular expressions

| categories: regular expressions | tags: | View Comments

This example shows how to use a regular expression to find strings matching the pattern :cmd:datastring. We want to find these strings, and then replace them with something that depends on what cmd is, and what datastring is.

Let us define some commands that will take datasring as an argument, and return the modified text. The idea is to find all the cmds, and then run them. We use python's eval command to get the function handle from a string, and the cmd functions all take a datastring argument (we define them that way). We will create commands to replace :cmd:datastring with html code for a light gray background, and :red:some text with html code making the text red.

text = r'''Here is some text. use the :cmd:open to get the text into
a variable. It might also be possible to get a multiline
:red:line
2 directive.'''

print text
print '---------------------------------'

... ... >>> >>> Here is some text. use the :cmd:open to get the text into
a variable. It might also be possible to get a multiline
:red:line
2 directive.
---------------------------------


Now, we define our functions.

def cmd(datastring):
' replace :cmd:datastring with html code with light gray background'
s = '<FONT style="BACKGROUND-COLOR: LightGray">%{0}</FONT>';
html = s.format(datastring)
return html

def red(datastring):
'replace :red:datastring with html code to make datastring in red font'
html = '<font color=red>{0}</font>'.format(datastring)
return html


Finally, we do the regular expression. Regular expressions are hard. There are whole books on them. The point of this post is to alert you to the possibilities. I will break this regexp down as follows. 1. we want everything between :*: as the directive. ([^:]*) matches everything not a :. :([^:]*): matches the stuff between two :. 2. then we want everything between *. ([^]*) matches everything not a . 3. The () makes a group that python stores so we can refer to them later.

import re
regex = ':([^:]*):([^]*)'
matches = re.findall(regex, text)
for directive, datastring in matches:
directive = eval(directive) # get the function
text = re.sub(regex, directive(datastring), text)

print 'Modified text:'
print text

>>> >>> ... ... ... >>> Modified text:
Here is some text. use the <FONT style="BACKGROUND-COLOR: LightGray">%open</FONT> to get the text into
a variable. It might also be possible to get a multiline
<FONT style="BACKGROUND-COLOR: LightGray">%open</FONT> directive.
`