abacaj / train.py
Last active February 25, 2025 22:52
extending GRPOTrainer to run gsm8k eval during training
import tqdm
import numpy as np
import torch
import torch.distributed as dist
import transformers
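

def extract_xml_answer(text: str) -> str:
    # Keep what follows the last <final_answer> tag, then cut at the first closing tag.
    # If a tag is missing, split() leaves the text unchanged on that side.
    answer = text.split("<final_answer>")[-1]
    answer = answer.split("</final_answer>")[0]
    return answer.strip()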