@abhishek-shrm
Last active April 1, 2021 10:11
# Creating Document Term Matrix
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
cv=CountVectorizer(analyzer='word')
data=cv.fit_transform(df_grouped['lemmatized'])
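# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() needs scikit-learn >= 1.0
df_dtm = pd.DataFrame(data.toarray(), columns=cv.get_feature_names_out())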
df_dtm.index=df_grouped.index
df_dtm.head(3)
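
The snippet above relies on a `df_grouped` DataFrame defined elsewhere. As a minimal, self-contained sketch of the same steps, the example below builds a tiny stand-in for it; the sample sentences and the `doc_a`/`doc_b`/`doc_c` index labels are made up for illustration and are not from the original data.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for df_grouped: one lemmatized string per document
df_grouped = pd.DataFrame(
    {"lemmatized": ["cat sit mat", "dog sit log", "cat chase dog"]},
    index=["doc_a", "doc_b", "doc_c"],
)

# Count word occurrences per document (returns a sparse matrix)
cv = CountVectorizer(analyzer="word")
data = cv.fit_transform(df_grouped["lemmatized"])

# get_feature_names_out() exists in scikit-learn >= 1.0;
# fall back to the older get_feature_names() otherwise
if hasattr(cv, "get_feature_names_out"):
    terms = cv.get_feature_names_out()
else:
    terms = cv.get_feature_names()

# One row per document, one column per term
df_dtm = pd.DataFrame(data.toarray(), columns=terms, index=df_grouped.index)
print(df_dtm)
```

Each row of `df_dtm` is a document and each column a vocabulary term. `toarray()` densifies the matrix, which is fine for a small corpus but can use a lot of memory on a large one.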

leidmarfesta commented Mar 31, 2021

Just a note about this code: in Python the import and the assignment need to be on separate lines (two lines):

from sklearn.feature_extraction.text import CountVectorizer
cv=CountVectorizer(analyzer='word')

@abhishek-shrm
Author

Thank you for pointing it out. I have updated the code.
