@oborchers
Created June 7, 2019 16:45
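import numpy as np

# gensim stores its vectors as float32 ("REAL"); defined here so the snippet is self-contained.
REAL = np.float32


def sif_embeddings(sentences, model, alpha=1e-3):
    """Precompute the vocabulary indices of each sentence and use NumPy fancy
    indexing to multiply and sum the word vectors in a single step.

    Note: `alpha` is not used in the body; it is presumably already baked into
    the precomputed `model.wv.sif` weights. Relies on the gensim 3.x `wv.vocab` API.
    """
    vlookup = model.wv.vocab
    vectors = model.wv
    output = []
    for s in sentences:
        # Pre-compute sentence indices (out-of-vocabulary words are skipped)
        idx = [vlookup[w].index for w in s if w in vlookup]
        # Note: vectors.sif is a pre-computed numpy array containing the weights for all the word vectors.
        v = np.sum(vectors.vectors[idx] * vectors.sif[idx][:, None], axis=0)
        if len(idx) > 0:
            # Average over the in-vocabulary words
            v *= 1 / len(idx)
        output.append(v)
    return np.vstack(output).astype(REAL)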
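The function assumes `vectors.sif` already exists but does not show how it is built. Below is a minimal sketch. It assumes the weights follow the SIF scheme of Arora et al., where each word gets `alpha / (alpha + p(w))` and `p(w)` is its relative corpus frequency. The helpers `sif_weights` and `weighted_average` are hypothetical, and the toy vectors and counts are made up. The sketch also checks that the fancy-indexing line matches an explicit loop.

```python
import numpy as np


def sif_weights(counts, alpha=1e-3):
    """Hypothetical helper: SIF weight alpha / (alpha + p(w)) per word,
    with p(w) estimated as the word's relative frequency from raw counts."""
    counts = np.asarray(counts, dtype=np.float64)
    p = counts / counts.sum()
    return alpha / (alpha + p)


def weighted_average(word_vectors, weights, idx):
    """Hypothetical helper mirroring the fancy-indexing line in sif_embeddings:
    weight each selected vector, sum them, then divide by the word count."""
    v = np.sum(word_vectors[idx] * weights[idx][:, None], axis=0)
    if len(idx) > 0:
        v *= 1 / len(idx)
    return v


# Toy vocabulary: 3 words with 2-dimensional vectors (made-up numbers)
word_vectors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [1.0, 1.0]])
counts = [8, 1, 1]
weights = sif_weights(counts, alpha=0.1)

# A "sentence" made of vocabulary entries 0 and 2
idx = [0, 2]
vec = weighted_average(word_vectors, weights, idx)

# The same result computed with an explicit Python loop
loop = sum(weights[i] * word_vectors[i] for i in idx) / len(idx)
print(np.allclose(vec, loop))  # True
```

Frequent words get small weights (here 1/9 for the word seen 8 times out of 10), so they contribute less to the sentence vector. The vectorized form avoids a Python-level loop over words inside each sentence.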