@sharma-ji
Created July 16, 2018 06:18
Pipeline for Bag of Words
from sklearn.feature_extraction.text import CountVectorizer
# list of text documents
text = ["this is test doc", "this is another test doc"]
# create the transform
vector = CountVectorizer()
# tokenize and build vocab
vector.fit(text)
# Print the learned vocabulary (a dict mapping each token to its column index)
print(vector.vocabulary_)
# Transform the documents into a document-term count matrix
X_Train = vector.transform(text)
# Print the shape (documents x vocabulary size) and type (a SciPy sparse matrix)
print(X_Train.shape)
print(type(X_Train))
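
To make the counting step less of a black box, here is a minimal standard-library sketch of roughly what `CountVectorizer` does under its default settings: lowercase the text, keep tokens of two or more word characters, sort the vocabulary alphabetically, and count occurrences per document. The names `bag_of_words` and `TOKEN_PATTERN` are hypothetical helpers for illustration and are not part of scikit-learn. The sketch returns dense Python lists rather than a sparse matrix.

```python
import re
from collections import Counter

# Assumption: mirrors CountVectorizer's default tokenization
# (tokens of two or more word characters, after lowercasing).
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def bag_of_words(docs):
    """Return (vocabulary, matrix) for a list of text documents.

    vocabulary maps each token to a column index (alphabetical order);
    matrix is a list of rows, one per document, holding token counts.
    """
    # Lowercase and tokenize every document
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]

    # Build the vocabulary from all distinct tokens, sorted alphabetically
    all_terms = sorted({tok for toks in tokenized for tok in toks})
    vocabulary = {term: idx for idx, term in enumerate(all_terms)}

    # Count occurrences of each vocabulary term in each document
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


text = ["this is test doc", "this is another test doc"]
vocabulary, matrix = bag_of_words(text)
print(vocabulary)
print(matrix)
```

Each row of `matrix` corresponds to one document and each column to one vocabulary term, which is the same layout as `X_Train.toarray()` in the snippet above.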