
# aparrish/understanding-word-vectors.ipynb

Last active September 7, 2024 02:18
Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/

### lewiuberg commented Jan 4, 2021

One of the best tutorials on word2vec. Nevertheless, there is a "quantum leap" in the explanation when it comes to "Word vectors in spaCy". Suddenly we have vectors of a predetermined dimension associated with every word. Why? Where do those vectors come from? How are they calculated? Based on which texts? Since word2vec takes context into account, the vector representations will be very different for technical papers, literature, poetry, Facebook posts, etc. How do you create your own vectors for a particular collection of concepts over a particular set of documents? I have seen this problem in many word2vec tutorials: the explanation starts smoothly, with the basics explained in detail, and then suddenly there is a big hole. In any case, this is one of the best explanations of word2vec theory I have found. Thanks!

I agree! I thought I had deleted many cells and downloaded it again looking for the gap.
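On the question above about building vectors from your own documents: one simple approach is to count which words appear next to each other. The sketch below is only an illustration under stated assumptions. The function name `cooccurrence_vectors`, the `window` parameter, and the two-sentence corpus are made up for this example. It is a count-based method, not word2vec, and it is not how the pretrained vectors that ship with a spaCy model were produced.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=1):
    """Build count-based word vectors from a list of sentences.

    Each word's vector holds how often every vocabulary word appears
    within `window` positions of it. (Hypothetical helper for illustration.)
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][words[j]] += 1
    vocab = sorted(counts)
    vectors = {w: [counts[w][c] for c in vocab] for w in vocab}
    return vocab, vectors

# Toy corpus, invented for this example.
corpus = ["the cat sat", "the dog sat"]
vocab, vectors = cooccurrence_vectors(corpus)
for word in vocab:
    print(word, vectors[word])
```

In this toy corpus "cat" and "dog" end up with identical vectors because they occur in identical contexts. That also illustrates the point raised above: a different corpus gives different contexts, and so different vectors.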

### jdmedenilla commented Feb 12, 2021 • edited

When I ran the snippets of code that load external files, I got errors like this: "FileNotFoundError: [Errno 2] No such file or directory: 'pg345.txt'". The same thing happened with the color file: "FileNotFoundError: [Errno 2] No such file or directory: 'xkcd.json'"
I ran those in Jupyter Notebook. Do you know what's wrong?

Note: I tried running it in Visual Studio Code but got the same problem, even after saving it in the same directory. I've also read online that I should use the absolute path, but that still did not work.
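A `FileNotFoundError` like the one above usually means the file is not in the directory the Python process is actually running from, which is not always the folder the notebook is saved in. A minimal sketch for checking this follows. The helper name `find_missing` is made up for this example, and the snippet assumes that `pg345.txt` and `xkcd.json` are data files you have to download yourself.

```python
import os

def find_missing(filenames, directory=None):
    """Return the names from `filenames` that do not exist in `directory`.

    Defaults to the current working directory, which is where a bare
    open('pg345.txt') looks. (Hypothetical helper for illustration.)
    """
    directory = directory or os.getcwd()
    return [name for name in filenames
            if not os.path.isfile(os.path.join(directory, name))]

# Show where Python is looking, then list which data files are absent.
print("Working directory: " + os.getcwd())
print("Missing files: " + str(find_missing(["pg345.txt", "xkcd.json"])))
```

If both names are reported missing, copying the files into the printed working directory (or passing their full paths to `open`) should clear the error.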

### Zaravanon commented Mar 4, 2021

Great, Thank You!

### tugcekizilltepe commented Apr 28, 2021

Great, well-explained tutorial, thank you!

### prakashr7d commented May 28, 2021

Not sure why I'm getting the following error. I'm working on macOS with JupyterLab, Python 2.7, and spaCy 2.0.9:

``````
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-090b6e832a74> in <module>()
3 # It creates a list of unique words in the text
4 tokens = list(set([w.text for w in doc if w.is_alpha]))
----> 5 print nlp.vocab['cheese'].vector

lexeme.pyx in spacy.lexeme.Lexeme.vector.__get__()

ValueError: Word vectors set to length 0. This may be because you don't have a model installed or loaded, or because your model doesn't include word vectors. For more info, see the documentation:
https://spacy.io/usage/models
``````
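The error message itself points at the likely cause: the loaded model carries no word vectors. A minimal sketch for checking this before asking for a vector follows. The helper name `has_word_vectors` is made up for this example. The model name in the comments is an assumption about which package is installed; the small English models are commonly distributed without real word vectors, while the medium and large ones include them.

```python
def has_word_vectors(nlp):
    """Return True if the loaded pipeline's vocabulary carries word vectors.

    Relies on spaCy's Vocab.vectors_length, which is 0 when no vectors
    are loaded. (Hypothetical helper for illustration.)
    """
    return nlp.vocab.vectors_length > 0

# Usage, assuming spaCy and a model with vectors are installed
# (the model name below is an assumption, not taken from the tutorial):
#
#   import spacy
#   nlp = spacy.load('en_core_web_md')
#   if has_word_vectors(nlp):
#       print(nlp.vocab['cheese'].vector)
#   else:
#       print("This model has no word vectors; install one that does.")
```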

### saiankit commented Aug 31, 2021

OMG!! I really had a great time reading this beautiful gist. Very well explained.

Thanks!

### mikeolubode commented Feb 4, 2022

I was led here by a tutorial on word vectors from YouTube. Thanks for the simplicity!

very good

### robertocsa commented Sep 15, 2022

Thank you for sharing this. Excellent job!

### avneesh91 commented May 5, 2023

This is amazing, thank you for the explanation!!

Thanks!!

### adebiasi commented Nov 29, 2023

Very nice tutorial!

One question:
A word near the origin (0, 0, 0, ...) of the n-dimensional space is less likely to be the result of adding other words together. Conversely, a word very far from the origin could be the result of many possible additions of many words. Does this mean that complex concepts lie far from the origin and basic concepts lie near it?
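One way to probe the question above is to try the arithmetic on small vectors. The sketch below uses made-up 2-D vectors, not real word vectors, and the helper names `add` and `norm` are invented for this example. It shows that vector components can be negative, so two vectors far from the origin can sum to one close to it.

```python
import math

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def norm(v):
    """Euclidean distance of a vector from the origin."""
    return math.sqrt(sum(x * x for x in v))

# Invented 2-D vectors, chosen so that their components partly cancel.
a = [3.0, 4.0]
b = [-3.0, -3.0]
total = add(a, b)

print(norm(a), norm(b), norm(total))
```

Here the sum is closer to the origin than either addend. More generally, any vector v can be written as u + (v - u) for any u you like, so distance from the origin does not by itself limit how many sums can land on a point. Whether real word vectors show the pattern the question describes would have to be checked empirically.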
