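# requires the Hugging Face `tokenizers` package
from tokenizers import BertWordPieceTokenizer

# initialize the tokenizer from a local BERT vocabulary file
tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt")
# sample sentence
sentence = "Language is a thing of beauty. But mastering a new language from scratch is quite a daunting prospect."
# tokenize the sample sentence; the result is an Encoding exposing .tokens and .ids
encoded_output = tokenizer.encode(sentence)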
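
To give a feel for what the encode call does at the subword level, here is a minimal pure-Python sketch of greedy longest-match-first WordPiece splitting. It is not the library's implementation: `TOY_VOCAB` and `wordpiece_split` are hypothetical names made up for this illustration, the vocabulary is a tiny invented one rather than the contents of `bert-base-uncased-vocab.txt`, and normalization, punctuation splitting, and special tokens such as `[CLS]` and `[SEP]` are left out.

```python
# Hypothetical toy vocabulary (assumption): NOT the real BERT vocab file.
TOY_VOCAB = {"[UNK]", "master", "##ing", "language", "daunt", "a", "new"}


def wordpiece_split(word, vocab, unk_token="[UNK]"):
    """Split one lowercase word into subword pieces by greedy longest match."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position, so the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


print(wordpiece_split("mastering", TOY_VOCAB))  # ['master', '##ing']
print(wordpiece_split("language", TOY_VOCAB))   # ['language']
print(wordpiece_split("xyz", TOY_VOCAB))        # ['[UNK]']
```

With the real tokenizer, the comparable subword strings should be available as `encoded_output.tokens` and their vocabulary indices as `encoded_output.ids`.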