@mlai-demo
Last active August 11, 2019 22:22
NLTK stemmer and lemmatizer
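import nltk
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize

# fpath (the directory holding the token file) must be defined before this runs.
# word_tokenize needs NLTK's Punkt tokenizer data to be downloaded.
with open(fpath + '/Plutarch_tokens.txt') as f, open(fpath + '/Plutarch_stem.txt', 'w') as out_f:
    text = f.read()
    tokens = word_tokenize(text)
    porter = PorterStemmer()
    stemmed = [porter.stem(word) for word in tokens]  # reduce each token to its Porter stem
    print(stemmed[:100])  # preview the first 100 stems
    new_stem_text = ' '.join(stemmed)
    fd_stemmed = nltk.FreqDist(stemmed)  # frequency distribution of the stems
    out_f.write(new_stem_text)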
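import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download('wordnet')  # needed if using Google Colab

# fpath (the directory holding the token file) must be defined before this runs.
with open(fpath + '/Plutarch_tokens.txt') as f, open(fpath + '/Plutarch_lemma.txt', 'w') as out_f:
    text = f.read()
    tokens = word_tokenize(text)
    lemma = WordNetLemmatizer()
    lemmed = [lemma.lemmatize(word) for word in tokens]  # default part of speech is noun
    print(lemmed[:100])  # preview the first 100 lemmas
    new_lem_text = ' '.join(lemmed)
    out_f.write(new_lem_text)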
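
The stemming script builds `fd_stemmed` but never inspects it. A minimal sketch of how that frequency distribution could be queried is below; the short token list is a made-up stand-in for the Plutarch text, so no file or tokenizer data is needed.

```python
import nltk
from nltk.stem.porter import PorterStemmer

# Hypothetical sample tokens standing in for the Plutarch token file.
tokens = ["running", "runs", "run", "cities", "city"]

porter = PorterStemmer()
stemmed = [porter.stem(word) for word in tokens]

# FreqDist is a Counter subclass, so most_common() is available.
fd_stemmed = nltk.FreqDist(stemmed)
top = fd_stemmed.most_common(2)
print(top)
```

Inflected forms collapse onto a shared stem, so the counts reflect word families rather than surface forms. The stems themselves need not be dictionary words.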