@tmylk
Last active February 8, 2017 14:41
Code for "how to get the similarity you need", from the talk at https://speakerdeck.com/tmylk/wordrank-pydata-5-min-talk
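# Run this at the end of 01_pride_and_predjudice.ipynb from https://github.com/cytora/pycon-nlp-in-10-lines
# (processed_text is defined earlier in that notebook).
import multiprocessing
from gensim.models import Word2Vec

processed_sentences = [sent.lemma_.split() for sent in processed_text.sents]
workers = max(1, multiprocessing.cpu_count() - 1)  # use your cores, but keep at least one worker

# Small window: neighbours tend to be interchangeable words.
interchangeable_words_model = Word2Vec(
    sentences=processed_sentences,
    workers=workers,
    window=2, sg=1)  # sg=1 selects skip-gram

# Large window: neighbours tend to be attributes of the query word.
attributes_of_model = Word2Vec(
    sentences=processed_sentences,
    workers=workers,
    window=50, sg=1)

# Query through .wv; Word2Vec.most_similar itself was removed in gensim 4.0.
attributes_of_model.wv.most_similar(u'darcy')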
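
The two models differ only in `window`, so it may help to see what that parameter changes. The sketch below is a rough illustration that does not use gensim: `context_words` is a hypothetical helper and the sentence is a made-up example, not text from the notebook. It also ignores details of real word2vec training, such as window sampling and subsampling of frequent words. It only shows which words count as context for `darcy` under a narrow and a wide window.

```python
from collections import defaultdict


def context_words(tokens, window):
    """Map each token to the set of tokens within `window` positions of it."""
    contexts = defaultdict(set)
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                contexts[center].add(tokens[j])
    return contexts


# Hypothetical toy sentence (already lemmatised), for illustration only.
sentence = "mr darcy be a proud and wealthy gentleman from derbyshire".split()

narrow = context_words(sentence, window=2)   # same setting as interchangeable_words_model
wide = context_words(sentence, window=50)    # same setting as attributes_of_model

print(sorted(narrow["darcy"]))
print(sorted(wide["darcy"]))
```

With `window=2` only the immediate neighbours of `darcy` count as context, so words that occur in the same slot end up with similar vectors. With `window=50` the whole sentence counts, which pulls in descriptive words such as `proud` and `wealthy`.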