Created
February 13, 2020 16:27
load_glove
### load word2vec model ###
# Pre-trained vectors can be downloaded from: https://github.com/stanfordnlp/GloVe
# Preprocessing (convert GloVe text to word2vec format): https://stackoverflow.com/questions/51323344/cant-load-glove-6b-300d-txt
import gensim
import torch

wvmodel = gensim.models.KeyedVectors.load_word2vec_format('glove.6B.100d.w2vformat.txt', binary=False, encoding='utf-8')

## map GloVe pretrained weights to a PyTorch embedding weight matrix
# vocab_size and word_to_idx (word -> row index) must already be defined for your own vocabulary
embed_size = 100
weight = torch.zeros(vocab_size + 1, embed_size)  # rows stay 0 for words that are not in GloVe
# gensim >= 4 renamed index2word to index_to_key; fall back to the old name on gensim < 4
glove_words = getattr(wvmodel, 'index_to_key', None) or wvmodel.index2word
for word in glove_words:
    try:
        index = word_to_idx[word]  # translate to our own word index
    except KeyError:
        continue  # this GloVe word is not in our vocabulary
    weight[index, :] = torch.from_numpy(wvmodel.get_vector(word))
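
The preprocessing step linked in the comments above exists because raw GloVe text files lack the `<vocab size> <dimension>` header line that `load_word2vec_format` expects by default. Below is a minimal sketch of that conversion using only the standard library. The function name `glove_to_word2vec_format` and its parameters are hypothetical, not part of gensim or the original gist, and it assumes the usual space-separated GloVe layout (one word followed by its vector per line). With gensim 4 or later, passing `no_header=True` to `load_word2vec_format` should make this step unnecessary.

```python
def glove_to_word2vec_format(glove_path, output_path, encoding="utf-8"):
    """Copy a raw GloVe text file to output_path, prepending a word2vec header.

    Hypothetical helper: assumes each non-empty line is "<word> v1 v2 ... vD".
    Returns (number of vectors, embedding dimension).
    """
    num_vectors = 0
    embed_size = None

    # First pass: count the vectors and read the dimension from the first line.
    with open(glove_path, encoding=encoding) as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if embed_size is None:
                embed_size = len(line.rstrip().split(" ")) - 1
            num_vectors += 1

    if embed_size is None:
        raise ValueError("no vectors found in " + glove_path)

    # Second pass: write the header, then copy the vectors unchanged.
    with open(glove_path, encoding=encoding) as src, \
            open(output_path, "w", encoding=encoding) as dst:
        dst.write(f"{num_vectors} {embed_size}\n")
        for line in src:
            if line.strip():
                dst.write(line)

    return num_vectors, embed_size
```

For the file used above, a call might look like `glove_to_word2vec_format('glove.6B.100d.txt', 'glove.6B.100d.w2vformat.txt')`, where the input filename is an assumption about how the downloaded archive was unpacked.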