Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
@arielgamino

commented Mar 2, 2018

Very nice tutorial!

@carltoews

commented Mar 2, 2018

Thanks, this is great!

@tomnis

commented Mar 2, 2018

Awesome! Very intuitive explanations.

@marcboeker

commented Mar 2, 2018

Great tutorial, thanks!

@vnhnhm

commented Mar 8, 2018

Not sure why I'm getting the following error, working on macOS with Jupyter Lab, Python 2.7 and spaCy 2.0.9:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-090b6e832a74> in <module>()
      3 # It creates a list of unique words in the text
      4 tokens = list(set([w.text for w in doc if w.is_alpha]))
----> 5 print nlp.vocab['cheese'].vector

lexeme.pyx in spacy.lexeme.Lexeme.vector.__get__()

ValueError: Word vectors set to length 0. This may be because you don't have a model installed or loaded, or because your model doesn't include word vectors. For more info, see the documentation: 
https://spacy.io/usage/models
@rgibson

commented Mar 12, 2018

@vnhnhm
I was getting the same error. I fixed it by downloading a different spaCy language model than the one the instructions indicate (one that includes vectors). See the spaCy documentation here: https://spacy.io/usage/models

So instead of running this from the command line:
python -m spacy download en
...and using this command in Jupyter:
nlp = spacy.load('en')

I ran this from the command line:
python -m spacy download en_core_web_md
...and used this in Jupyter:
nlp = spacy.load('en_core_web_md')

Hope this helps!

@razodactyl

commented Mar 15, 2018

This write up is amazing, great work!

@VivekParupudi

commented Feb 17, 2019

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 32764: character maps to

I get this error when I try to open the colors JSON file.
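This error usually means open() is falling back to the platform's default codec (cp1252 on Windows) for a UTF-8 file. A minimal sketch of the fix — passing the encoding explicitly — using a stand-in file here so the example is self-contained (the tutorial's actual file is the xkcd color data):

```python
import json

# A tiny stand-in for the tutorial's color data, written as UTF-8
# so the round trip below is reproducible anywhere.
sample = {"colors": [{"color": "café au lait", "hex": "#a67b5b"}]}
with open("colors_demo.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Passing encoding="utf-8" avoids the platform default; on Windows,
# open() defaults to cp1252, which cannot decode some UTF-8 bytes.
with open("colors_demo.json", encoding="utf-8") as f:
    color_data = json.load(f)

print(color_data["colors"][0]["color"])
```

On Python 2.7 (which the tutorial targets), the equivalent is io.open(path, encoding='utf-8'), since the built-in open() there takes no encoding argument.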

@san-cc

commented Mar 18, 2019

I was looking for word2vec, but your tutorial helped me a lot. Nice work!

@lyons7

commented Mar 22, 2019

This is the best tutorial I've ever read! So clear and easy to understand. Thanks for making this and putting it out there!

@suntaorus

commented Mar 23, 2019

This is a great tutorial. Thank you! Found this one through The Coding Train channel on YouTube.

@proy251183

commented Mar 23, 2019

One of the most intuitive tutorials!

@joseberlines

commented Apr 7, 2019

One of the best tutorials on word2vec. Nevertheless, there is a "quantum leap" in the explanation when it comes to "Word vectors in spaCy". Suddenly we have vectors of a predetermined dimension associated with any word. Why? Where do those vectors come from? How are they calculated? Based on which texts? Since word2vec takes context into account, the vector representations will be very different in technical papers, literature, poetry, Facebook posts, etc. How do you create your own vectors for a particular collection of concepts over a particular set of documents? I have observed this problem in many word2vec tutorials: the explanation starts very smoothly and simply, well explained down to the details, and then suddenly there is a big hole. In any case, this is one of the best explanations of word2vec theory I have found. Thanks!
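On the question of building vectors from your own documents: in practice you would train a model with a library such as gensim, but the underlying distributional idea can be sketched in plain Python by counting context words. This is only a toy illustration of the principle (count-based vectors, not word2vec's learned embeddings, and not the tutorial's method):

```python
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Build count-based context vectors from tokenized sentences.

    Each word is represented by a Counter of the words that appear
    within `window` positions of it across the whole corpus.
    """
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), i + window + 1
            for j in range(lo, hi):
                if j != i and j < len(sent):
                    vectors[word][sent[j]] += 1
    return vectors

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
vecs = context_vectors(corpus)

# "cat" and "dog" end up with identical context counts here
# ("the", "sat", "on") -- the core intuition behind distributional
# vectors: words in similar contexts get similar representations.
print(vecs["cat"])
print(vecs["dog"])
```

Because the counts come entirely from your own corpus, vectors built this way reflect exactly the domain you feed in, which is also why vectors trained on technical papers, poetry, or social-media text differ so much.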
