@morkapronczay
Created October 14, 2019 12:18
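from nltk.stem import SnowballStemmer

# `languages`, `languages_long`, `text_bylang` and `text_bylang_stop` are not defined here;
# presumably: language codes, code -> Snowball language name, and code -> list of tokens
# define stemmer objects by language
stemmers = {lan: SnowballStemmer(languages_long[lan]) for lan in languages}
# stem the full token lists
text_bylang_stemmed = {lan: [stemmers[lan].stem(word) for word in text_bylang[lan]] for lan in languages}
# stem the token lists that already have stopwords removed
text_bylang_stop_stemmed = {lan: [stemmers[lan].stem(word) for word in text_bylang_stop[lan]] for lan in languages}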
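
The snippet above depends on variables defined elsewhere, so here is a minimal self-contained sketch of the same per-language stemming. The language codes, the sample tokens, and the tiny hand-written stopword sets (`stopwords_bylang`) are assumptions made up for illustration; they are not taken from the original gist.

```python
from nltk.stem import SnowballStemmer

# Hypothetical inputs, for illustration only
languages = ["en", "de"]
languages_long = {"en": "english", "de": "german"}  # names accepted by SnowballStemmer
text_bylang = {
    "en": ["the", "cats", "are", "running"],
    "de": ["die", "katzen", "laufen"],
}

# Tiny hand-written stopword sets (a real pipeline would use a full stopword list)
stopwords_bylang = {"en": {"the", "are"}, "de": {"die"}}
text_bylang_stop = {
    lan: [word for word in text_bylang[lan] if word not in stopwords_bylang[lan]]
    for lan in languages
}

# One stemmer per language, then stem both versions of the token lists
stemmers = {lan: SnowballStemmer(languages_long[lan]) for lan in languages}
text_bylang_stemmed = {
    lan: [stemmers[lan].stem(word) for word in text_bylang[lan]] for lan in languages
}
text_bylang_stop_stemmed = {
    lan: [stemmers[lan].stem(word) for word in text_bylang_stop[lan]] for lan in languages
}

print(text_bylang_stop_stemmed["en"])  # ['cat', 'run']
```

Building the stemmers once in a dictionary avoids constructing a new `SnowballStemmer` for every word. The supported language names can be checked with `SnowballStemmer.languages`.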