@benauthor
Created August 22, 2014 04:16
Simulate a right-to-left regex with negative lookahead and backref
import re
def get_matches(regex, mystring):
    """Return the full text of every non-overlapping match."""
    return [i.group(0) for i in re.finditer(regex, mystring)]
input1 = 'A Foo, A Foo qux: Foo qux: Foo qux:'
regex1 = r'Foo.*?qux:'
print(get_matches(regex1, input1))
# >>> ['Foo, A Foo qux:', 'Foo qux:', 'Foo qux:']
# Oops, not quite: the lazy .*? still starts at the leftmost 'Foo',
# so the first match swallows a second 'Foo'.
input2 = input1[::-1] # reverse the string
regex2 = r':xuq.*?ooF'
print([i[::-1] for i in get_matches(regex2, input2)])
# >>> ['Foo qux:', 'Foo qux:', 'Foo qux:']
# OK... but that's a dumb, ugly hack.
# Better: use negative lookahead with the backreference
regex3 = r'(Foo)(?:(?!\1).)*qux:'
# (?: ) is a non-capturing group; a capturing group would
# work just as well but is less efficient.
# (?! ) is a negative lookahead
# \1 is the backreference to the first captured group, i.e. 'Foo'
# (?:(?!\1).)* therefore consumes one character at a time, but only
# while 'Foo' does not start at the current position.
#
# Net result: match every Foo.*qux: except those that have
# a second 'Foo' in between.
print(get_matches(regex3, input1))
# >>> ['Foo qux:', 'Foo qux:', 'Foo qux:']
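
The lookahead trick above can be wrapped in a small helper so the delimiters do not have to be hard-coded. This is only a sketch: the name `innermost_spans` and its parameters are hypothetical and not part of the original gist. It inlines the start delimiter in the lookahead instead of using the `\1` backreference, and passes both delimiters through `re.escape` so they may contain regex metacharacters.

```python
import re


def innermost_spans(start, end, text):
    """Return every start...end span that contains no second `start`.

    Hypothetical helper, not from the original gist.
    """
    # Escape the delimiters so characters like '(' are taken literally.
    s, e = re.escape(start), re.escape(end)
    # Same shape as the gist's pattern: after the start delimiter, consume
    # one character at a time, but only while the start delimiter does not
    # begin at the current position; then require the end delimiter.
    pattern = r'%s(?:(?!%s).)*%s' % (s, s, e)
    return [m.group(0) for m in re.finditer(pattern, text)]


text = 'A Foo, A Foo qux: Foo qux: Foo qux:'
print(innermost_spans('Foo', 'qux:', text))
# >>> ['Foo qux:', 'Foo qux:', 'Foo qux:']
```

As in the original pattern, `.` does not match newlines unless `re.DOTALL` is set, so this sketch only finds spans that sit on a single line.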