Word2Vec_theory (how Word2Vec works)

Word Embedding

What is word embedding? Before talking about Word2Vec, word embedding converts natural language into vectors that a computer can understand. It makes it 'look like' the computer understands natural language. It lets us measure the similarity between words, and vectorizing words makes them easier to handle. It also allows inferences through vector operations, because the meaning of each word is encoded numerically as a vector. Finally, it uses less memory than one-hot encoding.
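As a minimal sketch of these two properties (similarity and inference by vector arithmetic), here is a toy example with made-up 4-dimensional vectors; real Word2Vec embeddings usually have 100–300 dimensions and are learned from a corpus.

```python
import numpy as np

# Toy embeddings (made-up values, only to illustrate the idea).
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.9]),
    "queen": np.array([0.8, 0.6, 0.9, 0.9]),
    "man":   np.array([0.1, 0.2, 0.1, 0.8]),
    "woman": np.array([0.1, 0.2, 0.9, 0.8]),
}

def cosine_similarity(a, b):
    """Similarity between two word vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar words end up with similar vectors.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))

# Inference through vector arithmetic: king - man + woman lands near queen.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(cosine_similarity(target, embeddings["queen"]))
```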

Word2Vec

Word2Vec is a group of related models that are used to produce word embeddings. It vectorizes the meaning of a word so that similarities between words are reflected in the vectors.

There are two architectures for Word2Vec. One is CBOW (Continuous Bag of Words) and the other is Skip-gram.
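In practice you rarely implement either architecture from scratch; for instance, the gensim library exposes both behind a single class, with a flag choosing between them. The sketch below assumes gensim 4.x parameter names (`vector_size`, `sg`), which may differ in older versions.

```python
from gensim.models import Word2Vec

# Tokenized toy corpus.
sentences = [
    ["i", "want", "to", "go", "market", "because",
     "i", "want", "to", "eat", "fresh", "fruits"],
]

# sg=0 selects CBOW, sg=1 selects Skip-gram.
cbow_model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram_model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow_model.wv["market"].shape)             # (50,)
print(skipgram_model.wv.most_similar("fruits"))  # nearest neighbours by cosine similarity
```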

CBOW(Continuous Bag of Words)

CBOW predicts the center word from the context words.

We feed the one-hot encodings of the context words into the input layer of Word2Vec, and it outputs the one-hot encoding of the center word. One difference from Skip-gram is that CBOW averages the vectors after projection; Skip-gram has only one input, so it does not need to.

We can decide how many surrounding words to use. This range is called the 'window'. Once the window size is set, we can build a dataset by sliding the window so that the center word and context words change.

For example:

  • Sentence: "I want to go market because I want to eat fresh fruits."
  • Window size: 2

[Figure: sliding the window over the example sentence]

The picture above shows the full training dataset for CBOW.
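Here is a minimal sketch of building those (context, center) training pairs for the example sentence with a window size of 2; the function and variable names are my own.

```python
# Tokenize the example sentence.
sentence = "I want to go market because I want to eat fresh fruits".lower().split()
window = 2

def make_cbow_pairs(tokens, window):
    pairs = []
    for i, center in enumerate(tokens):
        # Context = up to `window` words on each side of the center word.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, center))
    return pairs

for context, center in make_cbow_pairs(sentence, window)[:4]:
    print(context, "->", center)
# ['want', 'to'] -> i
# ['i', 'to', 'go'] -> want
# ['i', 'want', 'go', 'market'] -> to
# ['want', 'to', 'market', 'because'] -> go
```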

[Figure: CBOW network architecture]

This picture shows the artificial neural network of CBOW. The context words go into the input layer and the center word comes out of the output layer, so Word2Vec needs the center word's one-hot vector as the training target.

CBOW averages the projected context vectors (each context word's one-hot vector multiplied by the weight matrix), because there are two or more inputs but only one output.
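The following rough sketch shows one CBOW forward pass with toy sizes; the weight matrices would normally be learned, and here they are random just to show the shapes and the averaging step.

```python
import numpy as np

vocab_size, embedding_dim = 12, 5
W_in = np.random.rand(vocab_size, embedding_dim)   # input -> projection weights
W_out = np.random.rand(embedding_dim, vocab_size)  # projection -> output weights

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

# Context word indices (e.g. the 2 words on each side of the center word).
context_indices = [0, 1, 3, 4]
context_vectors = [one_hot(i, vocab_size) @ W_in for i in context_indices]

# CBOW averages the projected context vectors: several inputs, one output.
hidden = np.mean(context_vectors, axis=0)

scores = hidden @ W_out
probs = np.exp(scores) / np.sum(np.exp(scores))    # softmax over the vocabulary
print(probs.argmax())                              # index of the predicted center word
```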

Skip-gram

Skip-gram predicts the context words from the center word.

We feed the one-hot encoding of the center word into the input layer of Word2Vec, and it outputs the one-hot encodings of the surrounding words.

[Figure: Skip-gram network architecture]

This picture shows the artificial neural network of Skip-gram. It has a single input, so there is no step that averages over inputs.
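For comparison with the CBOW sketch above, here is a rough sketch of one Skip-gram forward pass, again with random toy weight matrices that would be learned in practice.

```python
import numpy as np

vocab_size, embedding_dim = 12, 5
W_in = np.random.rand(vocab_size, embedding_dim)
W_out = np.random.rand(embedding_dim, vocab_size)

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

# Single input: the center word. No averaging step is needed.
center_index = 2
hidden = one_hot(center_index, vocab_size) @ W_in

# The same output distribution is compared against each context word's
# one-hot vector during training (one loss term per context word).
scores = hidden @ W_out
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs.argsort()[-4:])  # indices of the 4 most likely context words
```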

Skip-gram is known to perform better than CBOW.
