@InPermutation
Created October 15, 2014 20:29
Shorten a regex, allowing false positives
s = 'amsterdam|london|tokyo|indianapolis|new york|shanghai|toronto|san francisco'
rg = s.split('|')

# count: the length of the shortest word; longer words are matched on this prefix only
count = len(min(rg, key=len))

# v: list of sets. v[0] is the set of all the first letters, etc.
v = []
for i in range(count):
    v.append(set(w[i] for w in rg))

# now generate a regex from v by combining the sets in v
def make_block(chars):
    # sort so the output is the same on every run (set iteration order is arbitrary)
    return '[' + ''.join(sorted(chars)) + ']'

print(''.join(map(make_block, v)))  # '[ailnst][aehmno][adknrsw][ dinoty][aefgnoy]'
@InPermutation
Author

Use case: You have a bunch of IDs in your Google Analytics that you want to match on. The regex listing them has become very long and doesn't fit anymore.

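If false positives are OK, you can try to make it shorter by using character classes, one per character position, up to the length of the shortest alternative. Some false positives from this example: strings starting with lenin, santa, toddy, nnnnn, thane, and 12,335 other unintended prefixes (the five classes admit 6 × 6 × 7 × 7 × 7 = 12,348 five-character prefixes, only 8 of which are intended). If that's acceptable, you can use this toy.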
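
To judge whether the trade-off is acceptable for your own list, here is a rough sketch that counts how many prefixes the shortened pattern admits and checks that every original alternative still matches. The names `shorten`, `admitted`, and `intended` are mine, not part of the gist, and the `re.escape` call is an extra precaution I am assuming you would want if your IDs can contain characters like `]` or `^`. It needs Python 3.8+ for `math.prod`.

```python
import re
from math import prod

def shorten(pattern):
    """Collapse an alternation like 'a|b|c' into one character class per position."""
    words = pattern.split('|')
    count = min(len(w) for w in words)
    # One sorted list of distinct characters per position, up to the shortest word.
    classes = [sorted({w[i] for w in words}) for i in range(count)]
    # re.escape keeps characters such as ']' or '^' from breaking the class.
    short = ''.join(
        '[' + ''.join(re.escape(c) for c in chars) + ']' for chars in classes
    )
    return words, classes, short

s = 'amsterdam|london|tokyo|indianapolis|new york|shanghai|toronto|san francisco'
words, classes, short = shorten(s)

# Every original alternative must still match the shortened pattern.
all_match = all(re.match(short, w) for w in words)

# Distinct prefixes the shortened pattern admits, and how many were intended.
admitted = prod(len(chars) for chars in classes)
intended = len({w[:len(classes)] for w in words})
false_positives = admitted - intended

print(short)
print(all_match, admitted, intended, false_positives)
```

For the city list above this reports 12348 admitted prefixes, 8 intended ones, and so 12340 unintended ones. If that ratio is too high for your data, this approach is probably not a good fit.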
