@radi-cho
Created February 18, 2023 17:01
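from transformers import DataCollatorForSeq2Seq

# Assumes `dataset` (a single datasets.Dataset split), `preprocess_function`,
# `tokenizer` and a TensorFlow seq2seq `model` are defined earlier.
train_dataset = dataset.map(preprocess_function, batched=True, desc="Running tokenizer")

# Pad each batch dynamically; pad_to_multiple_of=64 reduces the number of distinct
# batch shapes (helps XLA). Labels are padded with pad_token_id instead of the
# default -100, so padded label positions are NOT masked out of the built-in loss.
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=tokenizer.pad_token_id,
    pad_to_multiple_of=64,
    return_tensors="np")  # NumPy output for the tf.data pipeline

# Shuffled tf.data.Dataset with batches of 8 (drop_remainder defaults to the
# value of shuffle, so a final partial batch is dropped).
tf_train_dataset = model.prepare_tf_dataset(
    train_dataset,
    collate_fn=data_collator,
    batch_size=8,
    shuffle=True)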
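To make the padding and batching arithmetic concrete, here is a dependency-free sketch of roughly what the collator and `prepare_tf_dataset` do with these settings. It is an approximation under stated assumptions, not the library's implementation. Examples are hypothetical dicts of token-id lists, and padding is on the right. The sketch omits `decoder_input_ids` creation, which the real collator can do when `model` is passed. `drop_remainder=True` mirrors `prepare_tf_dataset`'s documented default of following `shuffle`. The names `padded_length`, `collate_seq2seq` and `shuffled_batches` are illustrative only.

```python
import functools
import math
import random
from typing import Callable, Dict, Iterator, List

# Hypothetical stand-ins: plain lists of token ids instead of tokenizer output.
Example = Dict[str, List[int]]
Batch = Dict[str, List[List[int]]]


def padded_length(max_len: int, multiple: int) -> int:
    """Round max_len up to the next multiple (the effect of pad_to_multiple_of)."""
    return math.ceil(max_len / multiple) * multiple


def collate_seq2seq(features: List[Example], pad_token_id: int,
                    label_pad_token_id: int, multiple: int = 64) -> Batch:
    """Right-pad inputs and labels, each to its own rounded-up per-batch length."""
    src_len = padded_length(max(len(f["input_ids"]) for f in features), multiple)
    tgt_len = padded_length(max(len(f["labels"]) for f in features), multiple)
    batch: Batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        ids, labels = f["input_ids"], f["labels"]
        n_pad = src_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        batch["labels"].append(labels + [label_pad_token_id] * (tgt_len - len(labels)))
    return batch


def shuffled_batches(data: List[Example], batch_size: int,
                     collate: Callable[[List[Example]], Batch],
                     seed: int = 0, drop_remainder: bool = True) -> Iterator[Batch]:
    """Shuffle once, then yield collated batches; optionally drop a partial tail."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    stop = len(order) - len(order) % batch_size if drop_remainder else len(order)
    for start in range(0, stop, batch_size):
        yield collate([data[i] for i in order[start:start + batch_size]])


# Usage: 20 toy examples, pad id 0 for both inputs and labels (as in the gist).
examples = [{"input_ids": [5] * n, "labels": [7] * (n // 2 + 1)} for n in range(1, 21)]
collate = functools.partial(collate_seq2seq, pad_token_id=0, label_pad_token_id=0)
batches = list(shuffled_batches(examples, batch_size=8, collate=collate))
print(len(batches))                     # 2
print(len(batches[0]["input_ids"][0]))  # 64
```

With 20 examples and `batch_size=8`, the sketch yields two full batches and drops the remaining four examples. Every padded row is 64 tokens long because no sequence here exceeds 64. Padding labels with the pad id (0 here) rather than -100 is what keeps those positions visible to a loss that masks only -100.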