Last active
April 8, 2022 01:26
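import os

from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.test.utils import datapath, get_tmpfile

# load GloVe files (datapath resolves names relative to gensim's test-data folder)
glove_file = datapath(os.path.join('glove.6B', 'glove.6B.100d.txt'))
glove_file_300 = datapath(os.path.join('glove.6B', 'glove.6B.300d.txt'))
# convert from GloVe to word2vec text format (adds the "<vocab size> <dimensions>" header line)
# note: gensim >= 4.0 can also read GloVe files directly via load_word2vec_format(..., no_header=True)
word2vec_glove_file = get_tmpfile("glove.6B.100d.word2vec.txt")
glove2word2vec(glove_file, word2vec_glove_file)
word2vec_glove_file_300 = get_tmpfile("glove.6B.300d.word2vec.txt")
glove2word2vec(glove_file_300, word2vec_glove_file_300)
# load the word2vec-format vectors as KeyedVectors {word => 1D numpy array}
model = KeyedVectors.load_word2vec_format(word2vec_glove_file, binary=False)
model_300 = KeyedVectors.load_word2vec_format(word2vec_glove_file_300, binary=False)
# display a sample vector: skip the header line, then print the first word and its vector
with open(word2vec_glove_file, encoding='utf-8') as f:
    f.readline()
    print(f.readline())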
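
The conversion step above differs from the raw GloVe file only by one header line, which gives the vocabulary size and the vector dimension. The following is a minimal standard-library sketch of that idea on a toy file. It is not gensim's implementation. The function name `add_word2vec_header` and the two-word toy vocabulary are hypothetical, and the sketch assumes tokens contain no spaces.

```python
import os
import tempfile


def add_word2vec_header(glove_path, out_path):
    """Copy a GloVe text file to out_path, prefixed with a '<count> <dim>' header.

    Hypothetical helper for illustration; assumes one 'token v1 v2 ...' entry
    per line and tokens without embedded spaces.
    """
    with open(glove_path, encoding="utf-8") as src:
        lines = [line for line in src if line.strip()]
    # Every field after the token is one vector component.
    dim = len(lines[0].split()) - 1
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(f"{len(lines)} {dim}\n")
        dst.writelines(lines)
    return len(lines), dim


# Toy data standing in for a real GloVe file (made-up values).
tmp_dir = tempfile.mkdtemp()
toy_glove = os.path.join(tmp_dir, "toy.glove.txt")
toy_w2v = os.path.join(tmp_dir, "toy.word2vec.txt")
with open(toy_glove, "w", encoding="utf-8") as f:
    f.write("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")

count, dim = add_word2vec_header(toy_glove, toy_w2v)

# Same read pattern as the snippet above: header first, then the first vector.
with open(toy_w2v, encoding="utf-8") as f:
    header = f.readline().strip()
    first_vector = f.readline().strip()
print(header)
print(first_vector)
```

This also shows why the snippet above calls `f.readline()` once before printing: the first line of the converted file is the header, not a word vector.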