@MaartenGr
Created October 15, 2020 13:17
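import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Pairwise cosine similarity between the class vectors in c_tf_idf
# (assumed to hold one row per class). Despite its name, `distances`
# stores similarities: a higher value means the classes are more alike.
distances = cosine_similarity(c_tf_idf, c_tf_idf)
# Zero the diagonal so a class is never matched with itself
np.fill_diagonal(distances, 0)

# For each class, extract the most similar class
result = pd.DataFrame([(newsgroups.target_names[index],
                        newsgroups.target_names[distances[index].argmax()])
                       for index in range(len(docs_per_class))],
                      columns=["From", "To"])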
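
The snippet above relies on `c_tf_idf`, `newsgroups`, and `docs_per_class` being defined elsewhere. As a minimal, self-contained sketch of the same idea, the example below uses a small hand-made matrix and hypothetical class names as stand-ins; they are assumptions for illustration, not the data behind the original snippet.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for newsgroups.target_names and c_tf_idf
# (one row per class, one column per term); not from the original gist.
class_names = ["rec.autos", "rec.motorcycles", "sci.space"]
c_tf_idf = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.0],
    [0.0, 0.1, 0.9],
])

# Class-by-class cosine similarity matrix
similarities = cosine_similarity(c_tf_idf)
# Zero the diagonal so a class cannot be its own nearest neighbour
np.fill_diagonal(similarities, 0)

# Index of the most similar other class for every row
nearest = similarities.argmax(axis=1)
result = pd.DataFrame({
    "From": class_names,
    "To": [class_names[i] for i in nearest],
})
print(result)
```

Zeroing the diagonal only works as a mask because TF-IDF style weights are non-negative, so every genuine similarity is at least 0. For vectors that can have negative components, filling the diagonal with `-np.inf` would be the safer choice.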