@KhyatiMahendru
Created July 17, 2019 10:06
# Build a document-term matrix from the text column.
# `data` is assumed to be a pandas DataFrame with a 'text' column.
# TfidfVectorizer can be used in place of CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

cvec = CountVectorizer()
docTermMat = cvec.fit_transform(data['text'].values)

# Truncated SVD (LSA) keeping 20 components, i.e. 20 topics.
# n_iter=500 is well above scikit-learn's default of 5.
lsa = TruncatedSVD(n_components=20, n_iter=500)
lsa.fit(docTermMat)