@AniX
Last active July 4, 2016 21:10
Tokenize text-field values work-around for Google App Engine Search API
def partialize(phrase, shortest=5):
    """Tokenize the string `phrase` argument into all possible sub-strings
    that are at least `shortest` characters long. Words no longer than
    `shortest` are kept whole.

    This is a work-around for Google App Engine's Search API not supporting
    partial full-text search (as of time of writing, April 2013).

    In case of a BBCode-formatted phrase, you should first strip away all
    BBCode tags before passing the string to this method.
    """
    # See http://stackoverflow.com/questions/12899083/partial-matching-gae-search-api
    # for the original pattern (without the `shortest` keyword)
    if shortest < 1:
        shortest = 1
    if phrase is None:
        return [u'']
    tokens = []
    for w in phrase.split():
        j = shortest
        while True:
            if len(w) <= j:
                # Window covers the whole word: emit it and move on.
                tokens.append(w)
                break
            # Emit every sub-string of length j, then widen the window.
            for i in range(len(w) - j + 1):
                tokens.append(w[i:i + j])
            j += 1
    return tokens
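
To make the token pattern concrete, here is a minimal, self-contained sketch of the same sliding-window idea. The names `partial_tokens` and `partial_field_value` are hypothetical and not part of the gist. As assumptions that differ from `partialize`, this variant drops duplicate tokens and returns an empty list (rather than `[u'']`) when `phrase` is `None`.

```python
def partial_tokens(phrase, shortest=5):
    """Hypothetical variant: unique sub-strings per word, in first-seen order."""
    shortest = max(shortest, 1)
    seen = set()
    tokens = []
    for word in (phrase or u'').split():
        # Words no longer than `shortest` are kept whole.
        start = min(shortest, len(word))
        for size in range(start, len(word) + 1):
            for i in range(len(word) - size + 1):
                piece = word[i:i + size]
                if piece not in seen:
                    seen.add(piece)
                    tokens.append(piece)
    return tokens


def partial_field_value(phrase, shortest=5):
    """Join the tokens into one space-separated string (assumed storage format)."""
    return u' '.join(partial_tokens(phrase, shortest))


print(partial_tokens(u'search api', shortest=4))
```

With `shortest=4`, the word `search` yields `sear`, `earc`, `arch`, `searc`, `earch` and `search`, while `api` is shorter than the minimum and is kept whole. The joined string could then be stored as the value of a text field in a search document, so that a query for a fragment such as `earc` matches the document. Note that the number of tokens grows roughly quadratically with word length, so a larger `shortest` keeps the indexed value smaller.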