@twentyse7en
Last active October 20, 2020 14:12
Intro to Stanford NLP

Lecture 1 summary

How do we represent the meaning of a word?

Meaning: the idea that is represented by a word, phrase, etc.

Wordnet

  • Contains the nouns, adjectives, etc. related to a word.
  • Impossible to keep up-to-date with new words
  • Requires human effort to create and adapt.

Traditional representation

  • one hot vector
ex: motel [0 0 0 0 0 0 0 1 0 0 0 0 ]
    hotel [0 0 0 0 0 0 0 0 0 0 0 1 ]
  • If a user searches for "Seattle hotel", it would also be good to return "Seattle motel" results.
  • But there is no natural notion of similarity for one-hot-vectors!
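A quick sketch of why one-hot vectors carry no similarity information (the vocabulary size and index positions here are made up for illustration): the dot product of any two distinct one-hot vectors is always zero.

```python
import numpy as np

# Hypothetical 12-word vocabulary; "motel" and "hotel" each get one index.
vocab_size = 12
motel = np.zeros(vocab_size)
motel[7] = 1
hotel = np.zeros(vocab_size)
hotel[11] = 1

# Distinct one-hot vectors never overlap, so their dot product
# (and hence cosine similarity) is always 0 — "hotel" looks no
# more like "motel" than like any other word.
print(motel @ hotel)  # 0.0
```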

Representing words by their context

  • A word's meaning is given by the words that frequently appear close-by
  • When a word w appears in a text, its context is the set of words that appear nearby (within a fixed-size window)
  • Use the many contexts of w to build up a representation of w
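The fixed-size window idea above can be sketched with a small helper (the sentence and window size are just illustrative choices):

```python
def contexts(tokens, target, window=2):
    """Collect the words appearing within `window` positions of each
    occurrence of `target` — i.e. its contexts."""
    ctx = []
    for t, word in enumerate(tokens):
        if word == target:
            left = max(0, t - window)
            ctx.extend(tokens[left:t] + tokens[t + 1:t + 1 + window])
    return ctx

text = "government debt problems turning into banking crises as happened in 2009".split()
print(contexts(text, "banking"))  # ['turning', 'into', 'crises', 'as']
```

Collecting these context words over a large corpus is what gives each word its distributional representation.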

Word vectors

  • We will build a dense vector for each word, chosen so that it is similar to the vectors of words that appear in similar contexts
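"Similar" here usually means cosine similarity. A minimal sketch with made-up toy vectors (real word vectors are learned, but the similarity measure is the same):

```python
import numpy as np

# Toy 4-dimensional dense vectors, invented for illustration only.
hotel = np.array([0.2, 0.7, -0.1, 0.4])
motel = np.array([0.25, 0.6, -0.05, 0.5])
airplane = np.array([-0.6, 0.1, 0.8, -0.3])

def cosine(u, v):
    """Cosine of the angle between u and v: 1 = same direction, 0 = unrelated."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(hotel, motel))     # close to 1: similar contexts
print(cosine(hotel, airplane))  # much lower
```

Unlike one-hot vectors, dense vectors can be near each other, which is exactly what lets "Seattle motel" match a "Seattle hotel" query.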

Word2vec

  • Framework for learning word vectors
  • We have a large corpus of text
  • Every word in a fixed vocabulary is represented by a vector
  • Go through each position t in the text, which has a center word c and context (“outside”) words
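The last step above (walking each position t and pairing the center word with its outside words) can be sketched as follows; this is just the training-pair extraction, not the full Word2vec objective, and the sentence and window size are illustrative:

```python
def skipgram_pairs(tokens, window=2):
    """For each position t, pair the center word c with each context
    ("outside") word within the window — the (center, context) training
    pairs used by skip-gram Word2vec."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(max(0, t - window), min(len(tokens), t + window + 1)):
            if j != t:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("into problems turning".split(), window=1))
# [('into', 'problems'), ('problems', 'into'),
#  ('problems', 'turning'), ('turning', 'problems')]
```

Word2vec then adjusts the vectors so that a center word's vector predicts its context words well across the whole corpus.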

Interesting example

TODO:

  • Write the summary of calculations.