@abhishek-shrm
Created August 4, 2020 03:24
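# Stopword removal and lemmatization of tokens using spaCy
import spacy
from tqdm import tqdm
tqdm.pandas()  # registers progress_apply on pandas objects; harmless if already called earlier
# NER and the dependency parser are not needed for lemmatization, so they are disabled for speed
nlp = spacy.load('en_core_web_sm', disable=['ner', 'parser'])
nlp.max_length = 5000000  # allow texts of up to 5,000,000 characters

def remove_stopwords_and_lemmatize(text):
    # Join the lemmas of all non-stopword tokens back into a single string
    return ' '.join(token.lemma_ for token in nlp(text) if not token.is_stop)
# training_corpus and testing_corpus are assumed to be DataFrames with a 'cleaned' text column, defined elsewhere
training_corpus['lemmatized'] = training_corpus['cleaned'].progress_apply(remove_stopwords_and_lemmatize)
testing_corpus['lemmatized'] = testing_corpus['cleaned'].progress_apply(remove_stopwords_and_lemmatize)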