@mlai-demo
Created January 19, 2020 03:51
TF-IDF, pairwise similarity matrix, and pandas dataframe
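import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# Assumes `texts` (file paths), `titles` (one label per file, same order) and
# `my_stop_words` (a list of stop words) are defined earlier.
documents = []
for f in texts:
    with open(f) as fh:  # close each file after reading it
        documents.append(fh.read())
# TfidfVectorizer L2-normalizes each row by default (norm="l2"), so row dot products are cosine similarities.
tfidf = TfidfVectorizer(stop_words=my_stop_words).fit_transform(documents)
pairwise_similarity = tfidf @ tfidf.T  # sparse matrix product, shape (n_docs, n_docs); `@` stays a matrix product for SciPy sparse arrays too
pairwise_similarity_matrix = pairwise_similarity.toarray()  # dense ndarray rather than np.matrix
psm_df = pd.DataFrame(pairwise_similarity_matrix, index=titles, columns=titles).round(3)
psm_df  # displays the table in a notebook; use print(psm_df) in a script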
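A self-contained variant of the gist, runnable without the author's files. The three short documents, their `titles`, and the use of scikit-learn's built-in `"english"` stop-word list in place of `my_stop_words` are hypothetical stand-ins. It also cross-checks the result against `sklearn.metrics.pairwise.cosine_similarity`. The two should agree because `TfidfVectorizer` L2-normalizes each row by default.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical in-memory corpus standing in for the file contents (`documents`).
titles = ["cats", "dogs", "stocks"]
documents = [
    "The cat sat on the mat with another cat.",
    "The dog sat on the log next to the cat.",
    "Stock markets rallied as investors bought shares.",
]

# Built-in English stop-word list, used here in place of the gist's `my_stop_words`.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)  # sparse (n_docs, n_terms), rows L2-normalized

# With L2-normalized rows, the Gram matrix equals the cosine similarity matrix.
pairwise_similarity = (tfidf @ tfidf.T).toarray()

psm_df = pd.DataFrame(pairwise_similarity, index=titles, columns=titles).round(3)
print(psm_df)

# Cross-check against scikit-learn's own cosine similarity.
assert np.allclose(pairwise_similarity, cosine_similarity(tfidf))
```

After stop-word removal, the third document shares no terms with the other two, so its off-diagonal entries are 0. The first two share `cat` and `sat`, so they get a positive score. Every diagonal entry is 1 because each row has unit length.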