@amankharwal
Created December 1, 2020 04:03
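from sklearn.feature_extraction.text import CountVectorizer

# `docs` is expected to be an iterable of strings, one per document.
# Uncomment if `docs` must first be converted to a plain list (e.g. from a pandas Series):
# docs = docs.tolist()

# Build a vocabulary of words and n-grams, then count their occurrences.
cv = CountVectorizer(
    max_df=0.95,         # ignore terms that appear in more than 95% of documents
    max_features=10000,  # keep at most the 10,000 most frequent terms
    ngram_range=(1, 3),  # vocabulary contains single words, bigrams, and trigrams
)
word_count_vector = cv.fit_transform(docs)  # sparse document-term count matrix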
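
The snippet above assumes `docs` already exists. As a minimal sketch of how the same settings behave, the example below uses a hypothetical three-document toy corpus (not from the original gist) and inspects the fitted vocabulary. With three documents, `max_df=0.95` drops any term that occurs in all three.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the gist's `docs` (an assumption).
docs = [
    "machine learning with python",
    "deep learning with python",
    "text mining in python",
]

# Same settings as the gist.
cv = CountVectorizer(max_df=0.95, max_features=10000, ngram_range=(1, 3))
word_count_vector = cv.fit_transform(docs)

# `vocabulary_` maps each kept term (word, bigram, or trigram) to its column index.
vocab = cv.vocabulary_

print(word_count_vector.shape[0])        # number of documents (rows)
print("python" in vocab)                 # unigram present in every document
print("with python" in vocab)            # bigram present in two of three documents
print("machine learning with" in vocab)  # trigram present in one document
```

Here the unigram `python` is filtered out because it appears in every document, while n-grams containing it, such as `with python`, are kept because they appear in fewer documents.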