
@ashunigion
Created February 1, 2019 05:14
from sklearn.decomposition import TruncatedSVD


def reduce_to_k_dim(M, k=2):
    """ Reduce a co-occurrence count matrix of dimensionality (num_corpus_words, num_corpus_words)
        to a matrix of dimensionality (num_corpus_words, k) using the following SVD function from Scikit-Learn:
            - http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html

        Params:
            M (numpy matrix of shape (number of corpus words, number of corpus words)): co-occurrence matrix of word counts
            k (int): embedding size of each word after dimension reduction
        Return:
            M_reduced (numpy matrix of shape (number of corpus words, k)): matrix of k-dimensional word embeddings.
                    In terms of the SVD from math class, this actually returns U * S
    """
    n_iters = 10     # Number of iterations for the randomized SVD solver
    print("Running Truncated SVD over %i words..." % (M.shape[0]))

    # A fixed random_state keeps the randomized solver reproducible.
    svd = TruncatedSVD(n_components=k, n_iter=n_iters, random_state=42)
    # fit_transform fits the decomposition and projects M onto the top k components.
    M_reduced = svd.fit_transform(M)

    print("Done.")
    return M_reduced
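
The docstring's remark that the result is U * S can be checked on a small input. The sketch below is illustrative only: the 4x4 co-occurrence matrix is made up, and it calls `TruncatedSVD` directly instead of the function above so that it runs on its own. Because the sign of each singular vector is arbitrary, it compares column norms, which should match the top k singular values reported by `numpy.linalg.svd`.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical symmetric co-occurrence counts for a 4-word vocabulary.
M = np.array([
    [0, 2, 1, 0],
    [2, 0, 3, 1],
    [1, 3, 0, 2],
    [0, 1, 2, 0],
], dtype=float)

k = 2
svd = TruncatedSVD(n_components=k, n_iter=10, random_state=42)
embeddings = svd.fit_transform(M)  # one k-dimensional row per word

# Singular values of M from a full SVD, sorted in descending order.
singular_values = np.linalg.svd(M, compute_uv=False)

# Columns of U have unit length, so each column of U * S has norm s_i.
column_norms = np.linalg.norm(embeddings, axis=0)

print(embeddings.shape)
print(np.allclose(column_norms, singular_values[:k]))
```

If the second line printed is `True`, the embeddings are consistent with the first k columns of U scaled by their singular values, up to sign.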