@radi-cho
Created February 18, 2023 16:49
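from datasets import load_dataset

# `tokenizer` is not defined in this snippet; it is assumed to be a Hugging Face tokenizer created elsewhere.
# Load the CSV file (exposed as the "train" split) and shuffle it reproducibly.
dataset = load_dataset("csv", data_files="train.csv")
dataset = dataset["train"].shuffle(seed=42)

def preprocess_function(examples):
    padding = "max_length"
    max_length = 200
    # "Text" holds the source strings, "Expected" the target strings.
    inputs = list(examples["Text"])
    targets = list(examples["Expected"])
    # Tokenize inputs and targets, padding/truncating both to max_length.
    model_inputs = tokenizer(inputs, max_length=max_length, padding=padding, truncation=True)
    labels = tokenizer(targets, max_length=max_length, padding=padding, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs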
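Because the targets are padded to `max_length`, the label sequences contain padding token ids, and these would normally count toward the training loss. A common remedy, sketched below, is to replace padding ids in the labels with `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The helper name `mask_padding` and the pad id `0` in the example are illustrative assumptions and are not part of the original gist.

```python
# Hypothetical helper, not part of the original gist.
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding(label_ids, pad_token_id, ignore_index=IGNORE_INDEX):
    """Return a copy of label_ids with every pad token replaced by ignore_index."""
    return [
        [ignore_index if tok == pad_token_id else tok for tok in seq]
        for seq in label_ids
    ]


# Toy batch of two padded label sequences; pad id 0 is assumed for illustration.
masked = mask_padding([[37, 12, 1, 0, 0], [5, 1, 0, 0, 0]], pad_token_id=0)
print(masked)
```

Inside `preprocess_function`, this would presumably be applied as `model_inputs["labels"] = mask_padding(labels["input_ids"], tokenizer.pad_token_id)`, and the function itself is typically run over the dataset with `dataset.map(preprocess_function, batched=True)`.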