Last active December 3, 2020 13:32
TF-IDF computation in Sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
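# Vocabulary learned from the corpus, in column order.
# get_feature_names_out() needs scikit-learn >= 1.0; older releases used get_feature_names().
print(vectorizer.get_feature_names_out().tolist())
# prints ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
# X is a sparse matrix with one row per document and one column per term.
print(X.shape)
# prints (4, 9)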
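
To see what the vectorizer computes, here is a minimal pure-Python sketch of the same weighting. It assumes the scikit-learn defaults (smooth_idf=True, norm='l2', sublinear_tf=False, lowercasing, tokens of two or more word characters). The helper names tokenize and tfidf_row are made up for this illustration and are not part of scikit-learn.

```python
import math
import re
from collections import Counter

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

def tokenize(text):
    # Lowercase, then keep tokens of 2+ word characters
    # (assumed to mirror the default token_pattern).
    return re.findall(r"\b\w\w+\b", text.lower())

# Raw term counts per document.
docs = [Counter(tokenize(text)) for text in corpus]

# Sorted vocabulary gives the column order.
vocab = sorted(set().union(*docs))
n_docs = len(docs)

# Document frequency: number of documents containing each term.
df = {term: sum(term in doc for doc in docs) for term in vocab}

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

def tfidf_row(counts):
    # tf * idf per term, then scale the row to unit L2 norm.
    weights = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

tfidf = [tfidf_row(doc) for doc in docs]

print(vocab)
print(len(tfidf), len(tfidf[0]))
```

Under those assumptions the rows should agree with X.toarray() up to floating-point error. Terms that occur in every document ('is', 'the', 'this') get an IDF of exactly 1, the minimum possible under this formula.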
