@vphill
Created May 2, 2018 18:31
simple stemmer
def lemmatize(token_list):
    """Very simple implementation: strips common plural and possessive suffixes."""
    out_tokens = []
    for t in token_list:
        if t.endswith('ies'):
            # "libraries" -> "library"
            t = t[:-3] + 'y'
        elif t.endswith("'s"):
            # Singular possessive: "dog's" -> "dog"
            t = t[:-2]
        elif t.endswith(('s', "s'")) and not t.endswith(('us', 'as', 'ss', 'is')):
            # Plural or plural possessive; words like "bus" and "class" are skipped
            if t.endswith('s'):
                t = t[:-1]
            else:
                # Token ends with "s'": "students'" -> "student"
                t = t[:-2]
        elif t.endswith("'"):
            # Stray trailing apostrophe
            t = t[:-1]
        out_tokens.append(t)
    return out_tokens
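
The same rules can also be read as an ordered list of checks applied to one token at a time. The sketch below is only an illustration of that reading, not part of the original gist: the names `strip_suffix` and `EXCEPTIONS` are hypothetical, and the behavior is intended to match `lemmatize` above rule for rule.

```python
# Hypothetical single-token variant of the suffix rules; names are illustrative.
EXCEPTIONS = ('us', 'as', 'ss', 'is')  # endings left untouched, e.g. "bus", "class"


def strip_suffix(token):
    """Apply the first matching suffix rule to a single token."""
    if token.endswith('ies'):
        return token[:-3] + 'y'   # "libraries" -> "library"
    if token.endswith("'s"):
        return token[:-2]         # "dog's" -> "dog"
    if token.endswith("s'"):
        return token[:-2]         # "students'" -> "student"
    if token.endswith('s') and not token.endswith(EXCEPTIONS):
        return token[:-1]         # "cats" -> "cat"
    if token.endswith("'"):
        return token[:-1]         # stray trailing apostrophe
    return token


tokens = ["libraries", "dog's", "cats", "students'", "bus", "class"]
print([strip_suffix(t) for t in tokens])
# prints ['library', 'dog', 'cat', 'student', 'bus', 'class']
```

Because the rules only look at suffixes, some words are over-stripped: "series" becomes "sery", for example. For anything beyond quick normalization, a dictionary-backed lemmatizer is likely a better fit.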