@prateekjoshi565
Created June 27, 2020 08:39
from tokenizers import BertWordPieceTokenizer

# initialize the tokenizer from a local copy of the BERT base uncased vocabulary file
tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt")
# sample sentence
sentence = "Language is a thing of beauty. But mastering a new language from scratch is quite a daunting prospect."
# tokenize the sample sentence; the result is an Encoding object
encoded_output = tokenizer.encode(sentence)
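To make the tokenization step more concrete, here is a minimal pure-Python sketch of the greedy longest-match-first splitting that WordPiece tokenizers apply to each word. It is not the library's implementation: the `wordpiece` helper and `toy_vocab` are hypothetical names invented for illustration, and the toy vocabulary is far smaller than the real `bert-base-uncased-vocab.txt`, so the splits shown here need not match what the real tokenizer produces.

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Split one lowercase word into sub-word pieces (illustrative sketch only)."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word maps to the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, assumed purely for this example.
toy_vocab = {"language", "master", "##ing", "daunt", "[UNK]"}

print(wordpiece("mastering", toy_vocab))  # ['master', '##ing']
print(wordpiece("daunting", toy_vocab))   # ['daunt', '##ing']
print(wordpiece("beauty", toy_vocab))     # ['[UNK]']
```

With the real tokenizer, the corresponding pieces and their vocabulary ids can be read from `encoded_output.tokens` and `encoded_output.ids`.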