@prateekjoshi565
Created July 17, 2020 14:55
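# tokenizer, train_text, val_text and test_text are assumed to be defined
# in earlier steps (they are not created in this snippet).
# padding='max_length' replaces the deprecated pad_to_max_length=True flag.

# tokenize and encode sequences in the training set
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length=25,          # every sequence becomes exactly 25 tokens long
    padding='max_length',   # pad shorter sequences up to max_length
    truncation=True         # cut longer sequences down to max_length
)

# tokenize and encode sequences in the validation set
tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length=25,
    padding='max_length',
    truncation=True
)

# tokenize and encode sequences in the test set
tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length=25,
    padding='max_length',
    truncation=True
)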