Skip to content

Instantly share code, notes, and snippets.

@EricDuminil
Forked from atiking/regexp-trie.py
Last active December 31, 2023 06:04
  • Star 20 You must be signed in to star a gist
  • Fork 10 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save EricDuminil/8faabc2f3de82b24e5a371b6dc0fd1e0 to your computer and use it in GitHub Desktop.
#!/usr/bin/python
# -*- coding: utf-8 -*-
#
#author: rex
#blog: http://iregex.org
#filename trie.py
#created: 2010-08-01 20:24
#source uri: http://iregex.org/blog/trie-in-python.html
# escape bug fix by fcicq @ 2012.8.19
# python3 compatible by EricDuminil @ 2017.03.
import re
class Trie():
"""Regexp::Trie in python. Creates a Trie out of a list of words. The trie can be exported to a Regexp pattern.
The corresponding Regexp should match much faster than a simple Regexp union."""
def __init__(self):
self.data = {}
def add(self, word):
ref = self.data
for char in word:
ref[char] = char in ref and ref[char] or {}
ref = ref[char]
ref[''] = 1
def dump(self):
return self.data
def quote(self, char):
return re.escape(char)
def _pattern(self, pData):
data = pData
if "" in data and len(data.keys()) == 1:
return None
alt = []
cc = []
q = 0
for char in sorted(data.keys()):
if isinstance(data[char], dict):
try:
recurse = self._pattern(data[char])
alt.append(self.quote(char) + recurse)
except:
cc.append(self.quote(char))
else:
q = 1
cconly = not len(alt) > 0
if len(cc) > 0:
if len(cc) == 1:
alt.append(cc[0])
else:
alt.append('[' + ''.join(cc) + ']')
if len(alt) == 1:
result = alt[0]
else:
result = "(?:" + "|".join(alt) + ")"
if q:
if cconly:
result += "?"
else:
result = "(?:%s)?" % result
return result
def pattern(self):
return self._pattern(self.dump())
if __name__ == '__main__':
t = Trie()
for w in ['foobar', 'foobah', 'fooxar', 'foozap', 'fooza']:
t.add(w)
print(t.pattern())
#=> "foo(?:ba[hr]|xar|zap?)"
@p16i
Copy link

p16i commented Aug 6, 2019

Hi Eric,

I'm currently working on a project and using this gist. Comparing to other Tries libraries, yours is quite minimal and sufficient for our work.

It would be good if we could import your gist as a module instead of this snippet.

Would you mind creating a pip module for this snippet? I could also do that if you don't have time and later transfer to your Github.

Thanks!
Pat

@EricDuminil
Copy link
Author

EricDuminil commented Aug 6, 2019 via email

@ddelange
Copy link

ddelange commented Apr 9, 2020

@EricDuminil I took the liberty and mentioned you in the MIT License! https://pypi.org/project/retrie

cc @heytitle

@EricDuminil
Copy link
Author

@ddelange: Thank you very much. The project looks good!

@Yoric
Copy link

Yoric commented Dec 11, 2020

Trying to port this code to Rust :)

What is q, exactly?

@ddelange
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment