Brief intro to regular expressions

| categories: regular expressions | tags:

Matlab post

This example shows how to use a regular expression to find strings matching the pattern :cmd:`datastring`. We want to find these strings, and then replace them with something that depends on what cmd is, and what datastring is.

Let us define some commands that will take datasring as an argument, and return the modified text. The idea is to find all the cmds, and then run them. We use python's eval command to get the function handle from a string, and the cmd functions all take a datastring argument (we define them that way). We will create commands to replace :cmd:`datastring` with html code for a light gray background, and :red:`some text` with html code making the text red.

text = r'''Here is some text. use the :cmd:`open` to get the text into
          a variable. It might also be possible to get a multiline
            :red:`line     
     2` directive.'''

print text
print '---------------------------------'
... ... >>> >>> Here is some text. use the :cmd:`open` to get the text into
          a variable. It might also be possible to get a multiline
            :red:`line     
     2` directive.
---------------------------------

Now, we define our functions.

def cmd(datastring):
    ' replace :cmd:`datastring` with html code with light gray background'
    s = '<FONT style="BACKGROUND-COLOR: LightGray">%{0}</FONT>';
    html = s.format(datastring)
    return html

def red(datastring):
    'replace :red:`datastring` with html code to make datastring in red font'
    html = '<font color=red>{0}</font>'.format(datastring)
    return html

Finally, we do the regular expression. Regular expressions are hard. There are whole books on them. The point of this post is to alert you to the possibilities. I will break this regexp down as follows. 1. we want everything between :*: as the directive. ([^:]*) matches everything not a :. :([^:]*): matches the stuff between two :. 2. then we want everything between `*`. ([^`]*) matches everything not a `. 3. The () makes a group that python stores so we can refer to them later.

import re
regex = ':([^:]*):`([^`]*)`'
matches = re.findall(regex, text)
for directive, datastring in matches:
    directive = eval(directive) # get the function
    text = re.sub(regex, directive(datastring), text)

print 'Modified text:'
print text
>>> >>> ... ... ... >>> Modified text:
Here is some text. use the <FONT style="BACKGROUND-COLOR: LightGray">%open</FONT> to get the text into
          a variable. It might also be possible to get a multiline
            <FONT style="BACKGROUND-COLOR: LightGray">%open</FONT> directive.

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter