@srang992
Created April 16, 2022 13:22
Converting the words into vectors
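import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Create a TfidfVectorizer that ignores words appearing in fewer than 2 documents (min_df=2)
# and words appearing in more than 70% of the documents (max_df=0.7).
tfidf = TfidfVectorizer(min_df=2, max_df=0.7)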
# Fit the vectorizer on the cleaned descriptions and transform them into a sparse TF-IDF matrix.
X = tfidf.fit_transform(netflix_data_copy['clean_desc'])
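# Build a DataFrame (one row per title, one column per word) for calculating cosine similarity, then save it.
# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2.
tfidf_df = pd.DataFrame(X.toarray(), columns=tfidf.get_feature_names_out())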
tfidf_df.index = netflix_data_copy['title']
tfidf_df.to_csv("data/tfidf_data.csv")