@yuyasugano
Created September 25, 2020 06:48
Sentiment analysis with NLTK and scikit-learn TfidfVectorizer
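# https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumed to be defined elsewhere (not shown in this gist):
#   preprocessing  a callable that turns one raw tweet into a list of tokens;
#                  passing it as `analyzer` replaces TfidfVectorizer's built-in
#                  lowercasing, tokenization and stop-word handling
#   train_tweets   e.g. a pandas DataFrame with a 'tweet' text column
#
# min_df=2           ignore terms that appear in fewer than 2 documents
# max_df=0.9         ignore terms that appear in more than 90% of the documents
# sublinear_tf=True  replace the raw term frequency tf with 1 + log(tf)
# use_idf=True       enable inverse-document-frequency reweighting (the default)
vec = TfidfVectorizer(
    analyzer=preprocessing,
    min_df=2,
    max_df=0.9,
    sublinear_tf=True,
    use_idf=True,
)
# Learn the vocabulary and IDF weights from the training tweets and return a
# sparse document-term matrix of TF-IDF scores.
train_vec = vec.fit_transform(train_tweets['tweet'])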