Skip to content

Instantly share code, notes, and snippets.

@alisey
Created May 25, 2012 19:03
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save alisey/2789897 to your computer and use it in GitHub Desktop.
Save alisey/2789897 to your computer and use it in GitHub Desktop.
Python port of regexp matcher from 'The Practice of Programming'
# Python port of Rob Pike's regexp matcher from 'The Practice of Programming'
# Idea for match() stolen from https://github.com/darius/sketchbook/blob/master/misc/pike.py
def match(re, text):
"""
Search for regexp anywhere in text
c - any literal character
. - any single character
^ - the beginning of the input string
$ - the end of the input string
* - zero or more occurrences of the previous character
"""
return (matchhere(re[1:], text) if re[0] == '^' else
matchstar('.', re, text))
def matchhere(re, text):
"""Search for regexp at beginning of text"""
if re == '':
return True
if re == '$':
return text == ''
if len(re) > 1 and re[1] == '*':
return matchstar(re[0], re[2:], text)
if text and (re[0] == '.' or re[0] == text[0]):
return matchhere(re[1:], text[1:])
return False
def matchstar(c, re, text):
"""Search for c*regexp at beginning of text"""
for i in range(len(text)) or [0]:
if matchhere(re, text[i:]):
return True
if text[i] != c and c != '.':
return False
return False
@darius
Copy link

darius commented Jun 6, 2012

I like how you turned matchstar into a for loop -- nice and Pythonic! But if it reaches the end doesn't it need a final "return matchhere(regexp, '')"? Pike's version avoids that by putting the loop test at the bottom.

Same issue with match(). I'd use explicit "return False" instead of implicit "return None" generally for this sort of thing -- I noticed the issue above because of thinking of that change and then going "wait, is false the right answer there?"

I'll post my version (which for some reason I hadn't committed) at https://github.com/darius/sketchbook/tree/master/misc

@alisey
Copy link
Author

alisey commented Jun 6, 2012

Hi Darius! The reason for Pike's do-while loop is to allow zero-width matches such as match('aX*z', 'az'). I too start with null match when i=0. And the last iteration is matchhere(regexp, ''). Or do I miss something?

You're right about explicit return False. When writing production code I'm very strict about casting data to the correct type. But here I took the liberty for better effect.

Love your idea of return matchstar('.', re, text).

@darius
Copy link

darius commented Jun 6, 2012

The problem comes up with match('x*', '') returning None instead of True.

Thanks and I'm pleased to find you here -- you posted some great code for the class.

@alisey
Copy link
Author

alisey commented Jun 7, 2012

Empty text, damn. Thanks for pointing this out. If you don't mind I'll steal your idea to fix match().

@darius
Copy link

darius commented Nov 21, 2012

This is only very slightly related, but would you like to review https://github.com/darius/peglet ? I'm trying to improve on the grammar parser from cs212 -- to make something more useful without making it less accessible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment