
@anna-hope
Last active November 5, 2015 23:12
Get n-grams of every length from a sequence: all contiguous slices, plus all evenly spaced (fixed-step) subsequences.
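In the usual sense, the n-grams of one fixed size n are the contiguous windows of n items. The hypothetical helper `ngrams_of_size` below is not part of the gist. It is a minimal sketch that assumes a sliceable sequence such as a `str` or `list`, and it shows that single-size case. `get_ngrams` yields these windows for every n from 1 to the length of the input, and it also yields evenly spaced subsequences.

```python
def ngrams_of_size(seq, n):
    """Return the contiguous windows of length n, as slices of seq.

    Hypothetical helper for illustration; seq must support len() and slicing.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # one window per valid start index; empty list if n > len(seq)
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


print(ngrams_of_size("abcd", 2))  # ['ab', 'bc', 'cd']
```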
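def get_ngrams(iterable):
    """Yield every contiguous slice of `iterable`, plus every evenly
    spaced (fixed-step) subsequence of two or more items.

    `iterable` must be a sequence (it needs len() and slicing),
    e.g. a str, list or tuple.
    """
    length = len(iterable)
    if length == 1:
        # the loops below yield nothing for a single item, so emit it
        # here as a length-1 slice (consistent with the other yields)
        yield iterable[0:1]
        return
    # the 'starting position' loop
    for n in range(length):
        # the 'skip step' loop
        for step in range(1, length):
            # make slices starting at position 'n'
            # and going up to the length of the sequence
            for index in range(n, length, step):
                # make the end index exclusive, so the item at
                # position `index` is included in the slice
                index += 1
                # don't emit duplicates: if the step is greater than one
                # and also greater than the span of the would-be slice
                # (index - n), the slice holds a single item, which the
                # step == 1 pass has already yielded
                if step > 1 and index - n < step:
                    continue
                ngram = iterable[n:index:step]
                yield ngram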
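The loops only pick positions that form an arithmetic progression (one fixed `step`). So "every length" does not mean every possible subsequence once the input has four or more items. The sketch below checks this by comparing index positions rather than values. It uses two hypothetical helpers that are not from the gist. `strided_positions` is my reading of which positions `get_ngrams` slices for a given length. `all_positions` lists every non-empty subsequence using `itertools.combinations`.

```python
from itertools import combinations


def strided_positions(length):
    """Index tuples that get_ngrams slices out of a sequence of `length`
    items (hypothetical helper mirroring the loops, for illustration)."""
    picked = set()
    for start in range(length):
        # step 1: every contiguous run beginning at `start`
        for end in range(start + 1, length + 1):
            picked.add(tuple(range(start, end)))
        # step >= 2: every evenly spaced run of two or more positions
        for step in range(2, length):
            for last in range(start + step, length, step):
                picked.add(tuple(range(start, last + 1, step)))
    return picked


def all_positions(length):
    """Every non-empty subsequence, as sorted index tuples."""
    return {c for k in range(1, length + 1) for c in combinations(range(length), k)}


picked = strided_positions(4)
every = all_positions(4)
print(len(picked), len(every))  # 13 15
missing = sorted(every - picked)
print(["".join("abcd"[i] for i in pos) for pos in missing])  # ['abd', 'acd']
```

Up to length 3 the two sets coincide, which is why a short input like "abc" appears to yield every subsequence. From length 4 on, subsequences with uneven gaps (such as "abd") are not produced.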