@khuangaf
Created March 14, 2020 16:59
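# Map each original word to the index of its first sub-token.
# Assumes `words` (a list of words) and `tokenizer` (e.g. a BERT WordPiece
# tokenizer exposing `.tokenize(str) -> list[str]`) are defined elsewhere.
all_doc_tokens = ['[CLS]']  # leading special token: every index is offset by one
orig_to_tok_index = []
for (i, word) in enumerate(words):
    # position where this word's first sub-token will land
    orig_to_tok_index.append(len(all_doc_tokens))
    sub_tokens = tokenizer.tokenize(word)
    all_doc_tokens.extend(sub_tokens)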
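The loop records, for each original word, where its first sub-token lands in `all_doc_tokens`. The sketch below makes that bookkeeping runnable without downloading a model by swapping in a hypothetical `toy_tokenize` (fixed-width 4-character pieces with a `##` continuation prefix, loosely imitating WordPiece output); `align_words` is likewise a name introduced here for illustration. With a real tokenizer only the pieces change, not the index logic.

```python
from typing import List, Tuple


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a WordPiece tokenizer: splits a word into
    4-character pieces and marks continuation pieces with '##'."""
    if not word:
        return []
    pieces = [word[i:i + 4] for i in range(0, len(word), 4)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def align_words(words: List[str], cls_token: str = "[CLS]") -> Tuple[List[str], List[int]]:
    """Return the sub-token sequence and, for each word, the index of its first sub-token."""
    all_doc_tokens = [cls_token]  # leading special token shifts all indices by one
    orig_to_tok_index = []
    for word in words:
        # record where this word's first piece will be placed
        orig_to_tok_index.append(len(all_doc_tokens))
        all_doc_tokens.extend(toy_tokenize(word))
    return all_doc_tokens, orig_to_tok_index


words = ["Gist", "tokenization", "works"]
tokens, first_idx = align_words(words)
print(tokens)     # ['[CLS]', 'Gist', 'toke', '##niza', '##tion', 'work', '##s']
print(first_idx)  # [1, 2, 5]
# Pick the first sub-token of each word, e.g. to attach one label per word.
print([tokens[i] for i in first_idx])  # ['Gist', 'toke', 'work']
```

Indexing model outputs at `first_idx` in the same way yields one vector per original word, which is how BERT-style tagging setups typically use the first sub-token of each word.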