@MaryamZi
Last active July 7, 2016 16:17
Blog - NLP - Stemming Example
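from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# word_tokenize requires NLTK's Punkt tokenizer data to be downloaded first.
ps = PorterStemmer()
test_sentence = "learn learned learning learns"
tokenized_words = word_tokenize(test_sentence)

# Reduce each token to its stem.
stemmed_words = []
for word in tokenized_words:
    stemmed_words.append(ps.stem(word))

# The parenthesized form works on both Python 2 and Python 3.
print(stemmed_words)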
@sherpanee

Thanks for sharing!

Do you know how to remove the prefix that is returned with each word?

[u'learn', u'learn', u'learn', u'learn']

In the above example, how can I remove the u in front of each word?
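
For reference, the u in that output is not part of the stemmed words. It is how Python 2 marks unicode strings when a list is printed, and Python 3 prints the same list without it. A minimal sketch, assuming the stems are hard-coded (so it runs without NLTK) and that the goal is only to display the bare words:

```python
# Stems as shown in the output above, hard-coded so this sketch needs no NLTK.
stemmed_words = [u'learn', u'learn', u'learn', u'learn']

# Printing a list shows the repr of each element, which in Python 2
# includes the u prefix. Joining the elements yields the bare text instead.
joined = ", ".join(stemmed_words)
print(joined)  # learn, learn, learn, learn
```

Printing each word individually in a loop would also show it without the prefix, since the prefix only appears in the repr used for list elements.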
