@kovid-rathee
Created May 27, 2017 12:09
How to vectorize sentences using pandas and sklearn's CountVectorizer
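import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Five short example sentences to vectorize.
corpus = [
    'This is a sentence',
    'Another sentence is here',
    'Wait for another sentence',
    'The sentence is coming',
    'The sentence has come',
]

# Learn the vocabulary and build the sparse document-term count matrix.
vectorizer = CountVectorizer()
x = vectorizer.fit_transform(corpus)

# One row per sentence, one column per vocabulary term.
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
print(pd.DataFrame(x.toarray(), columns=vectorizer.get_feature_names_out()).to_string())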
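
To make the resulting table easier to read, here is a rough standard-library sketch of what the count matrix contains. It is not scikit-learn's implementation. It assumes CountVectorizer's default settings (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary), and the helper names `tokenize` and `count_vector` are made up for illustration.

```python
import re
from collections import Counter

corpus = [
    'This is a sentence',
    'Another sentence is here',
    'Wait for another sentence',
    'The sentence is coming',
    'The sentence has come',
]

def tokenize(text):
    # Assumption: mimic the default behaviour by lowercasing and keeping
    # only runs of two or more word characters (so 'a' is dropped).
    return re.findall(r"\b\w\w+\b", text.lower())

# Vocabulary: every distinct token in the corpus, sorted alphabetically.
vocabulary = sorted({tok for doc in corpus for tok in tokenize(doc)})

def count_vector(text):
    # Count each token, then read the counts off in vocabulary order.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]

# One row per sentence, one column per vocabulary term.
matrix = [count_vector(doc) for doc in corpus]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row corresponds to one sentence and each column to one term, which is the same layout the DataFrame above prints. The single-letter word 'a' does not get a column under these assumptions.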