@henry16lin
Created February 13, 2020 16:27
load_glove
### load GloVe vectors (converted to word2vec text format) ###
# pre-trained vectors: https://github.com/stanfordnlp/GloVe
# converting to word2vec format: https://stackoverflow.com/questions/51323344/cant-load-glove-6b-300d-txt
import gensim
import torch

wvmodel = gensim.models.KeyedVectors.load_word2vec_format('glove.6B.100d.w2vformat.txt', binary=False, encoding='utf-8')

## map the pre-trained GloVe weights to a PyTorch embedding weight matrix
# word_to_idx (dict: word -> row index) and vocab_size are assumed to come from your own vocabulary
embed_size = 100
weight = torch.zeros(vocab_size + 1, embed_size)  # rows stay all-zero for words that are not in GloVe
# note: gensim >= 4.0 renames index2word to index_to_key
for word in wvmodel.index2word:
    try:
        index = word_to_idx[word]  # look up the row for this word in our own vocabulary
    except KeyError:
        continue  # this GloVe word is not in our vocabulary, skip it
    weight[index, :] = torch.from_numpy(wvmodel.get_vector(word))
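
A minimal sketch of how the resulting `weight` matrix could be plugged into a model, using `nn.Embedding.from_pretrained`. The small `vocab_size` and the random rows are hypothetical stand-ins so the example runs on its own. Treating index 0 as the padding row is an assumption suggested by the `vocab_size+1` rows above, not something the gist states.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins so the sketch is self-contained; in practice reuse
# the vocab_size, embed_size and weight built from the GloVe vectors.
vocab_size, embed_size = 4, 100
weight = torch.zeros(vocab_size + 1, embed_size)
weight[1:] = torch.randn(vocab_size, embed_size)  # pretend rows 1..4 were filled from GloVe

# freeze=False keeps the vectors trainable (fine-tuning);
# padding_idx=0 assumes row 0 is reserved for padding.
embedding = nn.Embedding.from_pretrained(weight, freeze=False, padding_idx=0)

batch = torch.tensor([[1, 2, 0], [3, 4, 4]])  # (batch, seq_len) of word indices
vectors = embedding(batch)                    # (batch, seq_len, embed_size)
print(vectors.shape)
```

Passing `freeze=True` instead would keep the GloVe vectors fixed during training, which can help when the training set is small.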