@andy51002000
Created July 1, 2019 07:17
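import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Three short documents that differ in a single word.
corpus = ['The cat sat on the mat', 'The dog sat on the mat', 'The goat sat on the mat']
# Lowercase each document, split it into words, and keep raw counts (binary=False).
vectorizer = CountVectorizer(lowercase=True, analyzer='word', binary=False)
representation = vectorizer.fit_transform(corpus)  # sparse document-term matrix
# Feature indices are assigned in sorted term order, so the sorted vocabulary keys match the columns.
representation_df = pd.DataFrame(data=representation.toarray(), columns=sorted(vectorizer.vocabulary_))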
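
To make the resulting table easier to picture, here is a minimal standard-library sketch of the same bag-of-words counting. It is an illustration, not scikit-learn's implementation: the names `TOKEN_PATTERN`, `tokenize`, `counts_per_doc`, `vocabulary`, and `matrix` are hypothetical, and the regular expression is assumed to mirror CountVectorizer's default token pattern (lowercased tokens of two or more word characters).

```python
import re
from collections import Counter

corpus = ['The cat sat on the mat', 'The dog sat on the mat', 'The goat sat on the mat']

# Assumption: approximate CountVectorizer's default tokenization,
# i.e. lowercase the text and keep tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return TOKEN_PATTERN.findall(text.lower())

# One Counter of term frequencies per document.
counts_per_doc = [Counter(tokenize(doc)) for doc in corpus]

# The vocabulary is every distinct term, in sorted order (one column per term).
vocabulary = sorted(set().union(*counts_per_doc))

# Document-term matrix: one row per document, one count per vocabulary term.
matrix = [[counts[term] for term in vocabulary] for counts in counts_per_doc]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row holds one document's counts. The three rows differ only in the `cat`, `dog`, and `goat` columns, and `the` is counted twice in every document because lowercasing merges `The` and `the`.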