Sentiment analysis with NLTK and Scikit-learn TfidfVectorizer
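# https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
from sklearn.feature_extraction.text import TfidfVectorizer

# `preprocessing` (a callable that turns one raw tweet into a list of tokens)
# and `train_tweets` (a DataFrame with a 'tweet' column) are defined
# elsewhere; they are not part of this snippet.
#
# min_df=2:          discard terms that appear in fewer than 2 documents
# max_df=0.9:        discard terms that appear in more than 90% of the documents
# sublinear_tf=True: replace the raw term frequency tf with 1 + log(tf)
# use_idf=True:      weight terms by inverse document frequency (the default)
vec = TfidfVectorizer(
    analyzer=preprocessing,  # callable analyzer replaces the built-in tokenizer
    min_df=2,
    max_df=0.9,
    sublinear_tf=True,
    use_idf=True,
)

# Learn the vocabulary and IDF weights from the training tweets and return
# a sparse document-term matrix of TF-IDF scores.
train_vec = vec.fit_transform(train_tweets['tweet'])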