# Build a document-term matrix from the text column.
# `data` is assumed to be a pandas DataFrame with a 'text' column.
# TfidfVectorizer can be used in place of CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

cvec = CountVectorizer()
docTermMat = cvec.fit_transform(data['text'].values)

# Truncated SVD (LSA) keeping 20 components, i.e. 20 latent topics.
# n_iter sets the number of iterations of the default randomized solver.
lsa = TruncatedSVD(n_components=20, n_iter=500)
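# The snippet above only constructs the TruncatedSVD object. It is never fitted.
# A minimal sketch of the fitting step follows, run on a hypothetical toy corpus
# (toy_docs stands in for data['text'].values) with 2 components instead of 20.
# The toy_* names are illustrative assumptions and are not part of the original gist.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for data['text'].values
toy_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
    "markets rallied after the report",
]

# Document-term matrix: one row per document, one column per vocabulary term
toy_cvec = CountVectorizer()
toy_dtm = toy_cvec.fit_transform(toy_docs)

# Fit the SVD and project each document onto the latent components
toy_lsa = TruncatedSVD(n_components=2, random_state=0)
toy_doc_topic = toy_lsa.fit_transform(toy_dtm)  # shape: (n_docs, n_components)

# Recover the vocabulary in column order from the fitted vectorizer
terms = sorted(toy_cvec.vocabulary_, key=toy_cvec.vocabulary_.get)

# Each row of components_ holds one weight per term for that component;
# list the five highest-weighted terms of each component
for topic_idx, weights in enumerate(toy_lsa.components_):
    top = weights.argsort()[::-1][:5]
    print(topic_idx, [terms[i] for i in top])
```

# On real data the same two calls apply: lsa.fit_transform(docTermMat) returns the
# document-topic matrix, and lsa.components_ holds the term weights per topic.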