@awaemmanuel
Forked from erickrf/read_embeddings.py
Created March 3, 2019 23:17
Read an embeddings file in text format and convert it to a NumPy matrix (.npy) plus a vocabulary file (.txt)
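The script expects one token per line followed by its vector components, separated by whitespace (the GloVe-style layout). Files exported in the word2vec or fastText text format also start with a header line giving the vocabulary size and dimension. The loop below would read that header as a word with a one-element vector, after which `np.array(vectors)` cannot build a rectangular float matrix. Below is a minimal sketch of a line parser that skips such a header. It assumes the header, if present, is the first line and consists of exactly two integers. `parse_embedding_lines` is a hypothetical helper and is not part of the gist:

```python
import numpy as np


def parse_embedding_lines(lines):
    """Parse byte lines of the form b'word v1 v2 ... vd' into (words, matrix).

    Hypothetical helper: skips blank lines and an optional first-line
    "vocab_size dimension" header like word2vec/fastText text files carry.
    """
    words = []
    vectors = []
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # blank line
        # Heuristic: a first line of exactly two integers is treated as a header.
        if i == 0 and len(fields) == 2 and all(f.isdigit() for f in fields):
            continue
        words.append(fields[0].decode('utf-8'))
        vectors.append(np.fromiter((float(x) for x in fields[1:]),
                                   dtype=np.float64))
    # vstack raises ValueError if the rows have different lengths.
    return words, np.vstack(vectors)


# Usage on an in-memory sample that starts with a header line.
sample = [b"3 2\n", b"the 0.1 0.2\n", b"cat 0.5 -1.0\n", b"dog 1.5 2.5\n"]
words, matrix = parse_embedding_lines(sample)
print(words)         # ['the', 'cat', 'dog']
print(matrix.shape)  # (3, 2)
```

An open binary file object (`open(path, 'rb')`) can be passed in place of the sample list, because iterating it yields byte lines just as the script's loop does.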
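import argparse

import numpy as np

parser = argparse.ArgumentParser()
parser.add_argument('input', help='Single embedding file')
parser.add_argument('output', help='Output basename without extension')
args = parser.parse_args()

embeddings_file = args.output + '.npy'
vocabulary_file = args.output + '.txt'

words = []
vectors = []
# Each line holds a token followed by its vector components, separated by
# whitespace. The file is read in binary mode; only the token is decoded as
# UTF-8, and float() accepts the byte strings of the components directly.
with open(args.input, 'rb') as f:
    for line in f:
        fields = line.split()
        if not fields:
            continue  # skip blank lines instead of failing on fields[0]
        word = fields[0].decode('utf-8')
        # np.float (an alias of Python's float) was removed in NumPy 1.24;
        # np.float64 is the same 64-bit dtype.
        vector = np.fromiter((float(x) for x in fields[1:]),
                             dtype=np.float64)
        words.append(word)
        vectors.append(vector)

# Row i of the matrix is the vector for words[i].
matrix = np.array(vectors)
np.save(embeddings_file, matrix)

# Vocabulary: one word per line, in the same order as the matrix rows.
text = '\n'.join(words)
with open(vocabulary_file, 'wb') as f:
    f.write(text.encode('utf-8'))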
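The script writes two files that share a row order. `<output>.npy` holds the matrix and `<output>.txt` holds one word per line, so line i of the vocabulary labels row i of the matrix. Below is a minimal sketch of loading them back and looking up a word's vector. It assumes both files were produced together by the script above with the same basename. `load_embeddings` is a hypothetical helper, not part of the gist:

```python
import os
import tempfile

import numpy as np


def load_embeddings(basename):
    """Load the matrix and vocabulary written by the conversion script.

    Assumes '<basename>.npy' and '<basename>.txt' were written together,
    so row i of the matrix belongs to line i of the vocabulary.
    """
    matrix = np.load(basename + '.npy')
    with open(basename + '.txt', 'rb') as f:
        # The script joins words with '\n' and adds no trailing newline,
        # so split('\n') is the exact inverse.
        words = f.read().decode('utf-8').split('\n')
    if len(words) != matrix.shape[0]:
        raise ValueError('vocabulary and matrix row counts differ')
    # Map each word to its row index (a later duplicate overwrites an earlier one).
    index = {word: i for i, word in enumerate(words)}
    return words, matrix, index


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Write a tiny pair of files the same way the script does.
        base = os.path.join(tmp, 'demo')
        np.save(base + '.npy', np.array([[0.1, 0.2], [0.3, 0.4]]))
        with open(base + '.txt', 'wb') as f:
            f.write('\n'.join(['cat', 'dog']).encode('utf-8'))

        words, matrix, index = load_embeddings(base)
        print(words)                 # ['cat', 'dog']
        print(matrix[index['dog']])  # the second row of the matrix
```

The separate word-to-row dictionary keeps the `.npy` file a plain numeric array that `np.load` can read without `allow_pickle`.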