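# import the WordPiece tokenizer from the Hugging Face `tokenizers` package
from tokenizers import BertWordPieceTokenizer

# initialize the tokenizer from a local BERT vocabulary file
tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt")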
# sample sentence
sentence = "Language is a thing of beauty. But mastering a new language from scratch is quite a daunting prospect."
# tokenize the sample sentence; the result is an Encoding exposing .tokens and .ids
encoded_output = tokenizer.encode(sentence)