1) Create word embeddings with gensim's Word2Vec (skip-gram)
from gensim.models import word2vec

# word2vec_sentences: an iterable of tokenized sentences (list of lists of tokens)
model_glove = word2vec.Word2Vec(word2vec_sentences, size=200, window=10, sg=1, hs=0,
                                min_count=1, negative=10, workers=4, iter=5)  # workers must be a positive thread count; -1 is not supported
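The call above assumes word2vec_sentences already exists as a list of token lists. A minimal sketch of how such input might be prepared (the raw_docs corpus and the use of simple_preprocess are illustrative assumptions, not part of the original gist):

# Hypothetical preparation of word2vec_sentences: gensim expects an iterable of token lists.
from gensim.utils import simple_preprocess

raw_docs = [
    "Word embeddings map words to dense vectors.",
    "Similar words end up close together in the vector space.",
]
word2vec_sentences = [simple_preprocess(doc) for doc in raw_docs]
# e.g. [['word', 'embeddings', 'map', 'words', 'to', 'dense', 'vectors'], ...]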
2) Cluster the word vectors with scikit-learn's AgglomerativeClustering
from sklearn.cluster import AgglomerativeClustering
---------------- to form clusters, use this ------------------
In [37]:
wv_clusters = AgglomerativeClustering(n_clusters=50, affinity="cosine", linkage="average")
In [38]:
model_label_specific.init_sims()  # populate syn0norm (unit-normalised word vectors)
wv_clusters.fit(model_label_specific.syn0norm)
Out[38]:
AgglomerativeClustering(affinity='cosine', compute_full_tree='auto',
connectivity=None, linkage='average',
memory=Memory(cachedir=None), n_clusters=50, n_components=None,
pooling_func=<function mean at 0x7f8f3c207cf8>)
In [39]:
wv_cluster_ids = wv_clusters.fit_predict(model_label_specific.syn0norm)
In [40]:
# map each vocabulary word to the id of the cluster it was assigned to
wv_cluster_mappings = {k: wv_cluster_ids[v.index] for k, v in model_label_specific.vocab.items()}
In [41]:
# first 10 words that fall in cluster 4 (wrap filter in list() so it can be sliced in Python 3)
list(filter(lambda x: x[1] == 4, wv_cluster_mappings.items()))[:10]
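To see what each cluster contains, one option (not from the original gist) is to invert the word-to-cluster mapping and print a few members per cluster:

# Sketch: group words by their assigned cluster id and show a sample of each cluster.
from collections import defaultdict

clusters = defaultdict(list)
for word, cluster_id in wv_cluster_mappings.items():
    clusters[cluster_id].append(word)

for cluster_id in sorted(clusters)[:5]:           # first 5 clusters only
    print(cluster_id, clusters[cluster_id][:10])  # up to 10 sample words each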