rjurney/gensim_word2vec.py
Last active Oct 22, 2019

Encoding tokenized text with gensim.models.Word2Vec
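The snippet assumes `documents` is already tokenized: `Word2Vec` takes an iterable of documents where each document is a list of string tokens, and the final encoding loop iterates `for word in post` in the same way. The gist does not show how the posts were tokenized, so the sketch below uses a hypothetical lowercase regex tokenizer (`tokenize`, `TOKEN_RE` and `raw_posts` are all illustrative assumptions) just to show the expected list-of-lists shape.

```python
import re
from typing import List

# Hypothetical tokenizer: lowercase the text and keep runs of letters,
# digits and apostrophes. This is an assumption, not the gist's tokenizer.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Split one raw post into a list of lowercase tokens."""
    return TOKEN_RE.findall(text.lower())


# Hypothetical raw posts standing in for the real corpus
raw_posts = [
    "How do I run a Python program from the command line?",
    "My program crashes when I call Word2Vec.load()",
]

# Word2Vec expects an iterable of token lists: one list of strings per document
documents = [tokenize(post) for post in raw_posts]
print(documents[0])
```

A real pipeline would likely use a proper tokenizer; the only thing the gist's code relies on is that each element of `documents` is a list of strings.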
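import os

from gensim.models import Word2Vec

# Assumes `documents` (a list of tokenized posts, i.e. a list of lists of
# strings), EMBEDDING_SIZE and NUM_CORES are defined earlier.
model_path = 'models/word2vec.model'

# Load the Word2Vec model if it exists, otherwise train and save a new one
if os.path.exists(model_path):
    w2v_model = Word2Vec.load(model_path)
else:
    w2v_model = Word2Vec(
        documents,
        vector_size=EMBEDDING_SIZE,  # this argument was named `size` before gensim 4.0
        min_count=1,                 # keep every token, so every word gets a vector
        window=5,
        workers=NUM_CORES,
        # Fully reproducible runs also need workers=1 and a fixed PYTHONHASHSEED
        seed=1337,
    )
    os.makedirs(os.path.dirname(model_path), exist_ok=True)
    w2v_model.save(model_path)

# Print the words most similar to 'program' as a sanity check
print(w2v_model.wv.most_similar(positive=['program']))

# Encode each document as a list of word vectors using the new embedding
encoded_docs = [[w2v_model.wv[word] for word in post] for post in documents]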
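The `most_similar` sanity check ranks vocabulary words by cosine similarity. Roughly, gensim averages the unit-length vectors of the `positive` words, normalizes that mean, and scores every other word by its dot product with it. The pure-Python sketch below illustrates that idea on hypothetical 2-D `toy_vectors`; the `most_similar` function here is an illustrative stand-in, not gensim's implementation (which works on NumPy arrays and also accepts `negative` words).

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def unit(v: Vector) -> List[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def most_similar(
    vectors: Dict[str, Vector], positive: List[str], topn: int = 10
) -> List[Tuple[str, float]]:
    """Rank words by cosine similarity to the mean of the positive words."""
    # Average the unit-length query vectors, then normalize the mean
    dims = len(next(iter(vectors.values())))
    mean = [0.0] * dims
    for word in positive:
        for i, x in enumerate(unit(vectors[word])):
            mean[i] += x / len(positive)
    query = unit(mean)
    # Cosine similarity is the dot product of two unit vectors
    scores = [
        (word, sum(a * b for a, b in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in positive  # the query words themselves are skipped
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical 2-D "embeddings"; real vectors have EMBEDDING_SIZE dimensions
toy_vectors = {
    "program": [1.0, 0.2],
    "code": [0.9, 0.3],
    "script": [0.8, 0.5],
    "banana": [-0.1, 1.0],
}
print(most_similar(toy_vectors, positive=["program"], topn=2))
```

Here "code" and "script" point in nearly the same direction as "program", so they rank first, while "banana" scores close to zero.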
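`encoded_docs` is ragged: each post becomes a list with one vector per token, so posts of different lengths produce lists of different lengths. Many downstream models want a single fixed-size vector per post, and averaging the word vectors is one simple option; the gist itself stops at the per-word vectors. The sketch below shows mean pooling in pure Python, with a hypothetical toy `encoded_docs` of 2-D vectors and a hypothetical `mean_pool` helper.

```python
from typing import List, Sequence


def mean_pool(word_vectors: Sequence[Sequence[float]], dims: int) -> List[float]:
    """Average one post's word vectors into a single fixed-size vector."""
    if not word_vectors:
        # Hypothetical choice: an empty post maps to the zero vector
        return [0.0] * dims
    # zip(*...) walks the vectors column by column, one column per dimension
    return [sum(column) / len(word_vectors) for column in zip(*word_vectors)]


# Hypothetical stand-in for `encoded_docs`: posts of different lengths
encoded_docs = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # a 3-token post
    [[0.5, 0.5]],                          # a 1-token post
    [],                                    # an empty post
]
doc_vectors = [mean_pool(post, dims=2) for post in encoded_docs]
print(doc_vectors)
```

Averaging discards word order; if order matters, padding or truncating each post to a fixed number of tokens is the usual alternative.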