@mohdsanadzakirizvi
Created December 10, 2019 05:57
The developers of BERT defined a specific set of rules for representing the input text to the model. Many of these are deliberate design choices that make the model perform even better.
For starters, every input embedding is the sum of three embeddings (a minimal sketch combining them follows this list):
Position Embeddings: BERT learns and uses positional embeddings to express the position of each word in a sentence. These are added to overcome a limitation of the Transformer, which, unlike an RNN, cannot capture "sequence" or "order" information on its own.
Segment Embeddings: BERT can also take sentence pairs as input for tasks such as Question Answering. That's why it learns a unique embedding for the first and the second sentence, which helps the model distinguish between them. In the above example, all the tokens marked as EA belong to sentence A (and similarly for EB).
Token Embeddings: These are the embeddings learned for each specific token from the WordPiece token vocabulary.
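To make the sum of the three embeddings concrete, here is a minimal sketch in PyTorch (an assumption; the original BERT code is in TensorFlow), using sizes that match BERT-base: a 30,522-token WordPiece vocabulary, 512 positions, and a hidden size of 768. The class name BertInputEmbeddings is hypothetical, and the sketch omits the layer normalization and dropout that BERT applies to the summed embeddings.

```python
import torch
import torch.nn as nn

class BertInputEmbeddings(nn.Module):
    """Sketch of BERT-style input embeddings: token + segment + position."""

    def __init__(self, vocab_size=30522, max_len=512, hidden=768):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)    # WordPiece token embeddings
        self.segment = nn.Embedding(2, hidden)           # sentence A vs. sentence B
        self.position = nn.Embedding(max_len, hidden)    # learned positional embeddings

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len) integer tensors
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # Sum the three embeddings; position embeddings broadcast over the batch
        return (self.token(token_ids)
                + self.segment(segment_ids)
                + self.position(positions))

# Illustrative usage: a single 4-token sequence, all tokens in sentence A
emb = BertInputEmbeddings()
token_ids = torch.tensor([[101, 7592, 2088, 102]])   # illustrative WordPiece ids
segment_ids = torch.zeros_like(token_ids)            # all zeros = sentence A
print(emb(token_ids, segment_ids).shape)             # torch.Size([1, 4, 768])
```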