@vporiz
Last active December 3, 2020 13:32
TF-IDF computation in Sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
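# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() exists since 1.0
print(vectorizer.get_feature_names_out().tolist())
# prints ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
print(X.shape)
# prints (4, 9)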
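
To see what the vectorizer computes under the hood, here is a rough sketch that rebuilds the same matrix by hand. It assumes the default TfidfVectorizer settings (smooth_idf=True, norm='l2', sublinear_tf=False); the variable names (counts, df, idf, tfidf, matches) are just illustrative and not part of the original gist.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

# Raw term counts per document (the "tf" part), shape (n_docs, n_terms)
counts = CountVectorizer().fit_transform(corpus).toarray()
n_docs = counts.shape[0]

# Document frequency: in how many documents each term appears
df = (counts > 0).sum(axis=0)

# Smoothed idf, assuming the default smooth_idf=True:
# idf(t) = ln((1 + n) / (1 + df(t))) + 1
idf = np.log((1 + n_docs) / (1 + df)) + 1

# tf * idf, then scale each row to unit L2 norm (assuming the default norm='l2')
tfidf = counts * idf
tfidf = tfidf / np.linalg.norm(tfidf, axis=1, keepdims=True)

# Compare against what TfidfVectorizer produces with its defaults
X = TfidfVectorizer().fit_transform(corpus).toarray()
matches = np.allclose(tfidf, X)
print(matches)
```

If other options are passed to TfidfVectorizer (for example smooth_idf=False or norm=None), the manual formula above would need to change accordingly.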