Skip to content

Instantly share code, notes, and snippets.

@prateekjoshi565
Created July 17, 2020 14:55
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Embed
What would you like to do?
# tokenize and encode sequences in the training set
tokens_train = tokenizer.batch_encode_plus(
train_text.tolist(),
max_length = 25,
pad_to_max_length=True,
truncation=True
)
# tokenize and encode sequences in the validation set
tokens_val = tokenizer.batch_encode_plus(
val_text.tolist(),
max_length = 25,
pad_to_max_length=True,
truncation=True
)
# tokenize and encode sequences in the test set
tokens_test = tokenizer.batch_encode_plus(
test_text.tolist(),
max_length = 25,
pad_to_max_length=True,
truncation=True
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment