@Keiku
Last active April 11, 2017 07:40
Extract the tf-idf vector.
from sklearn.feature_extraction.text import TfidfVectorizer

text = ['This is a string', 'This is another string', 'TFIDF computation calculation', 'TfIDF is the product of TF and IDF']
# norm=None keeps the raw tf * idf weights instead of L2-normalizing each row
vectorizer = TfidfVectorizer(max_df=1.0, min_df=1, stop_words='english', norm=None)
X = vectorizer.fit_transform(text)
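# get_feature_names() was removed in scikit-learn 1.2; on older versions call it instead
X_vocab = vectorizer.get_feature_names_out().tolist()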
# Out[1]: ['calculation', 'computation', 'idf', 'product', 'string', 'tf', 'tfidf']
X_mat = X.todense()
# Out[2]:
# matrix([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.51082562,  0.        ,  0.        ],
#         [ 0.        ,  0.        ,  0.        ,  0.        ,  1.51082562,  0.        ,  0.        ],
#         [ 1.91629073,  1.91629073,  0.        ,  0.        ,  0.        ,  0.        ,  1.51082562],
#         [ 0.        ,  0.        ,  1.91629073,  1.91629073,  0.        ,  1.91629073,  1.51082562]])
X_idf = vectorizer.idf_
# Out[3]:
# array([ 1.91629073,  1.91629073,  1.91629073,  1.91629073,  1.51082562,  1.91629073,  1.51082562])
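
The idf values above can be checked by hand. The following is a minimal sketch, assuming the vectorizer's default `smooth_idf=True`, for which scikit-learn documents idf(t) = ln((1 + n) / (1 + df(t))) + 1. The helper name `smooth_idf` and the `doc_freq` dictionary are illustrative and not part of the original gist; the document frequencies are read off the matrix above. Because `norm=None` and every term occurs at most once per document (tf = 1), the non-zero matrix entries equal these idf values.

```python
import math


def smooth_idf(n_docs, doc_freq):
    """Smoothed idf: ln((1 + n) / (1 + df)) + 1 (hypothetical helper, not a scikit-learn function)."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1


# Number of documents in `text`
n_docs = 4
# Document frequency of each vocabulary term, counted from the non-zero columns of the matrix
doc_freq = {'calculation': 1, 'computation': 1, 'idf': 1, 'product': 1,
            'string': 2, 'tf': 1, 'tfidf': 2}

idf = {term: smooth_idf(n_docs, df) for term, df in doc_freq.items()}

print(round(idf['string'], 8))       # 1.51082562
print(round(idf['calculation'], 8))  # 1.91629073
```

Terms that appear in two documents ('string', 'tfidf') get the lower weight, and terms that appear in only one document get the higher weight, matching `vectorizer.idf_`.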