Created
August 22, 2014 04:16
Simulate a right-to-left regex with negative lookahead and backref
import re

def get_matches(regex, mystring):
    """Return the full text of every non-overlapping match of regex in mystring."""
    return [m.group(0) for m in re.finditer(regex, mystring)]

input1 = 'A Foo, A Foo qux: Foo qux: Foo qux:'
regex1 = r'Foo.*?qux:'
print(get_matches(regex1, input1))
# >>> ['Foo, A Foo qux:', 'Foo qux:', 'Foo qux:']
# Oops, not quite: the lazy .*? still starts at the first 'Foo'
# and runs past the second one to reach 'qux:'.

input2 = input1[::-1]  # reverse the string
regex2 = r':xuq.*?ooF'
print([m[::-1] for m in get_matches(regex2, input2)])
# >>> ['Foo qux:', 'Foo qux:', 'Foo qux:']
# OK... but that's a dumb, ugly hack.

# Better: use negative lookahead with the backreference
regex3 = r'(Foo)(?:(?!\1).)*qux:'
# (?: ) is a non-capturing group... a capturing group would
# work just as well but is less efficient.
# (?! ) is a negative lookahead
# \1 is the backreference to the first captured group
#
# Net result: match every Foo...qux: span except those that have
# a second 'Foo' in between.
print(get_matches(regex3, input1))
# >>> ['Foo qux:', 'Foo qux:', 'Foo qux:']
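
The lookahead trick above is hard-coded to 'Foo' and 'qux:'. As a rough sketch only, it could be wrapped in a helper that builds the same kind of pattern for any pair of literal delimiters. The name `innermost_spans` and its parameters are hypothetical and not part of the original gist. Two deliberate differences from the pattern above: the delimiters are passed through `re.escape` and repeated in the lookahead instead of using a `\1` backreference, and the quantifier is lazy (`*?`), so a span stops at the first end delimiter.

```python
import re

def innermost_spans(start, end, text):
    """Return every start...end span that contains no second `start`.

    Hypothetical helper for illustration; `start` and `end` are treated
    as literal strings, not as regular expressions.
    """
    # Escape the delimiters so regex metacharacters in them are literal.
    s, e = re.escape(start), re.escape(end)
    # (?:(?!s).)*? consumes one character at a time and refuses to step
    # onto any position where another `start` begins.
    pattern = re.compile(s + r'(?:(?!' + s + r').)*?' + e)
    return [m.group(0) for m in pattern.finditer(text)]

print(innermost_spans('Foo', 'qux:', 'A Foo, A Foo qux: Foo qux: Foo qux:'))
# ['Foo qux:', 'Foo qux:', 'Foo qux:']
```

Because no `re.DOTALL` flag is passed, `.` does not match a newline here, so a span cannot cross a line break. That matches the behaviour of the patterns above.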