@johanlindberg
Created June 2, 2010 18:27
# Experimental extensions to Python's re module.
import re

def int_class_expansion(match):
    return "(%s)" % \
           ('|'.join(reversed( [str(x) for x \
                                in xrange(int(match.group(1)),
                                          int(match.group(2))+ 1)] )))

int_class_rule = ( r"\[(\d+)\.\.(\d+)\]", int_class_expansion)
rules = [int_class_rule,]

def expand(pattern):
    result = pattern
    for rule in rules:
        match = re.search(rule[0], result)
        while match:
            result = result[:match.start()] + \
                     rule[1](match) + \
                     result[match.end():]
            match = re.search(rule[0], result)
    return result

def match(pattern, string, flags=0):
    return re.match(expand(pattern), string, flags)

def search(pattern, string, flags=0):
    return re.search(expand(pattern), string, flags)
@johanlindberg
Author

This is a rewrite (from memory) of functionality that I toyed around with a few years ago. I cannot find the original code, but this shows the gist of the idea I was working on, applied to Staffan Nöteberg's blog post on the Integer class extension for regex (see http://bit.ly/aN2tk1).

The idea is basically to wrap the re module and add a pre-processing stage for regex patterns (the expand function). I've chosen the absolutely simplest way of implementing integer matching as discussed in Staffan's post. It can be made better.
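The pre-processing stage can also be written with `re.sub` and a callback instead of the explicit search loop. This is only a sketch of the same idea, not the code from the gist: it assumes Python 3 (hence `range` rather than `xrange`), and the names `INT_CLASS`, `RULES` and `expand_int_class` are made up for illustration.

```python
import re

# Hypothetical rule: "[m..n]" becomes an alternation of every integer m..n.
INT_CLASS = re.compile(r"\[(\d+)\.\.(\d+)\]")

def expand_int_class(m):
    low, high = int(m.group(1)), int(m.group(2))
    # Largest number first, so "10" is tried before its prefix "1".
    return "(%s)" % "|".join(str(n) for n in range(high, low - 1, -1))

# Each rule is a (compiled pattern, replacement callback) pair.
RULES = [(INT_CLASS, expand_int_class)]

def expand(pattern):
    # Rewrite the extended syntax into plain regex syntax, rule by rule.
    for regex, callback in RULES:
        pattern = regex.sub(callback, pattern)
    return pattern

def match(pattern, string, flags=0):
    # Same signature as re.match, with the expansion applied first.
    return re.match(expand(pattern), string, flags)

print(expand(r"[0..3]"))   # (3|2|1|0)
print(expand(r"[0.10]"))   # [0.10]
```

A single `sub` pass is enough here because the expanded text never contains another `[m..n]` form; a rule whose output could itself need expanding would need the loop.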

To use it:
~ $ cd Projects/extre/
~/Projects/extre $ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import extre

You can use expand to see what the regex turns into before being sent into the re module.

>>> extre.expand(r'[0..10]')
'(0|1|2|3|4|5|6|7|8|9|10)'

As expected, expand doesn't touch plain regexes:

>>> extre.expand(r'[0.10]')
'[0.10]'

Let's try matching an IP address:

>>> m = extre.match(r'([0..255]\.){3}[0..255]', '123.125.126.107')
>>> m.group()
'123.125.126.1'

Hm. I'm not sure what to make of this. It's not what Staffan intended but I'm not sure it's "wrong" either. Anyway, it's "easily" solved but that is left as an exercise for the reader.
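The truncated result comes from how `re` handles alternation: alternatives are tried left to right and the first one that lets the overall match succeed wins, even if a later alternative would match more text. A small check with the plain `re` module (Python 3 assumed, variable names are illustrative) shows that the order of the expanded numbers decides the outcome:

```python
import re

# Same set of numbers, two different orders.
ascending = "(%s)" % "|".join(str(n) for n in range(0, 256))
descending = "(%s)" % "|".join(str(n) for n in range(255, -1, -1))

# "1" appears before "107" in the ascending list, so the match stops there.
print(re.match(ascending, "107").group())   # 1

# With the largest numbers first, "107" is tried before "10" and "1".
print(re.match(descending, "107").group())  # 107
```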

>>> m = extre.match(r'([0..255]\.){3}[0..255]', '999.888.777.666')
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

We get the error because m is None: the pattern did not match, so there is no match object to call group() on.
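In real code the result should be checked before calling `group()`. The sketch below does that with the plain `re` module; it assumes Python 3, builds the expanded alternation by hand instead of importing `extre`, and anchors the pattern with `\Z` so trailing characters are rejected. `OCTET`, `IPV4` and `ip_or_none` are hypothetical names.

```python
import re

# 255 down to 0, so longer numbers are tried before their prefixes.
OCTET = "(?:%s)" % "|".join(str(n) for n in range(255, -1, -1))

# Three "octet." groups, a final octet, then end of string.
IPV4 = re.compile(r"(?:%s\.){3}%s\Z" % (OCTET, OCTET))

def ip_or_none(text):
    m = IPV4.match(text)
    # re.match returns None when nothing matches, so test before using it.
    if m is None:
        return None
    return m.group()

print(ip_or_none("123.125.126.107"))  # 123.125.126.107
print(ip_or_none("999.888.777.666"))  # None
```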

@johanlindberg
Author

I re-read Staffan's post and realized that he actually specifies that it should be greedy, so my example above is clearly wrong. I've updated the code to make it behave as it should:

>>> m = extre.match(r'([0..255]\.){3}[0..255]', '123.125.126.107')
>>> m.group()
'123.125.126.107'
