The developers behind BERT defined a specific set of rules for representing the input text to the model. Many of these are deliberate design choices that help the model perform better.
For starters, every input embedding is the sum of three embeddings:
Position Embeddings: BERT learns positional embeddings to express the position of each token in the input sequence. They compensate for a limitation of the Transformer, whose self-attention, unlike an RNN, does not by itself capture sequence or order information.
Segment Embeddings: BERT can also take sentence pairs as input for tasks such as question answering. It therefore learns a separate embedding for the first and the second sentence so that the model can tell them apart. In the above example, all the tokens marked as E_A belong to sentence A (and similarly for E_B).
Token Embeddings: These are the embeddings learned for each specific token in the WordPiece vocabulary.
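To make the sum concrete, here is a minimal sketch in plain Python. It is not BERT's actual implementation: the tiny vocabulary, the hidden size of 4, the random (untrained) tables, and the helper names `make_table` and `embed` are all assumptions chosen for illustration, and the example sentence with its `[CLS]` and `[SEP]` markers is an assumed input. The real model also applies layer normalization and dropout after the sum, which this sketch leaves out.

```python
import random

random.seed(0)

HIDDEN = 4           # toy hidden size (assumption); BERT-base uses 768
MAX_POSITIONS = 16   # toy maximum sequence length (assumption); BERT uses 512

# Toy WordPiece-style vocabulary, made up for this example.
VOCAB = ["[PAD]", "[CLS]", "[SEP]", "my", "dog", "is", "cute",
         "he", "likes", "play", "##ing"]


def make_table(rows, dim):
    """Build a lookup table of random vectors standing in for learned weights."""
    return [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


token_table = make_table(len(VOCAB), HIDDEN)        # one row per vocabulary entry
segment_table = make_table(2, HIDDEN)               # row 0: sentence A, row 1: sentence B
position_table = make_table(MAX_POSITIONS, HIDDEN)  # one row per position


def embed(tokens, segment_ids):
    """Return one vector per token: token + segment + position embedding."""
    if len(tokens) != len(segment_ids):
        raise ValueError("tokens and segment_ids must have the same length")
    vectors = []
    for position, (token, segment) in enumerate(zip(tokens, segment_ids)):
        t = token_table[VOCAB.index(token)]
        s = segment_table[segment]
        p = position_table[position]
        # Element-wise sum of the three embeddings.
        vectors.append([a + b + c for a, b, c in zip(t, s, p)])
    return vectors


tokens = ["[CLS]", "my", "dog", "is", "cute", "[SEP]",
          "he", "likes", "play", "##ing", "[SEP]"]
segment_ids = [0] * 6 + [1] * 5  # first six tokens are sentence A, the rest sentence B

vectors = embed(tokens, segment_ids)
print(len(vectors), len(vectors[0]))  # prints: 11 4
```

Note that the two `[SEP]` tokens share the same token embedding but end up with different input vectors, because their segment and position embeddings differ.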