@MathiasGruber
Last active April 19, 2021 19:02
Getting the best match from a set of embeddings including the query itself
import numpy as np
from sklearn.preprocessing import normalize
# Use the first question as the query
QUERY_ID = 0
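# Normalize each embedding to unit L2 length, so that dot products equal cosine similarities
# (sentence_embeddings is assumed to be a 2-D array of shape (n_sentences, dim) computed beforehand)
norm_data = normalize(sentence_embeddings, norm='l2')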
# Calculate scores as the dot product between every embedding and the query
# (the query row is 1-D, so no transpose is needed)
scores = np.dot(norm_data, norm_data[QUERY_ID])
# The best match is the entry with the second-highest score
# (the highest is normally the query itself, since its cosine similarity with itself is 1)
MATCH_ID = np.argsort(scores)[-2]
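
The snippet above depends on a `sentence_embeddings` array that is defined elsewhere. As a rough, self-contained sketch of the same idea, the example below uses a small made-up matrix (`toy_embeddings`) and a hypothetical helper named `best_match`, neither of which is part of the original gist. It also swaps in two substitutions: plain NumPy normalization instead of `sklearn.preprocessing.normalize`, and masking the query's own score instead of taking the second-highest entry, which avoids relying on the query sorting last when another row is an exact duplicate of it. It assumes no embedding row is all zeros.

```python
import numpy as np


def best_match(embeddings, query_id):
    """Return the index of the row most similar to the query row, excluding the query.

    Hypothetical helper for illustration; assumes no row is all zeros.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # Scale each row to unit L2 length so dot products are cosine similarities
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    norm_data = embeddings / norms
    # Cosine similarity between every row and the query row
    scores = norm_data @ norm_data[query_id]
    # Exclude the query itself from the ranking
    scores[query_id] = -np.inf
    return int(np.argmax(scores))


# Made-up 2-D "embeddings", purely for illustration
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])

match_id = best_match(toy_embeddings, query_id=0)
print(match_id)  # prints 1: row 1 points in nearly the same direction as row 0
```

Masking the query's score gives the same result as the second-highest approach whenever the query is the unique top scorer, which is the usual case.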