

@sshleifer
Last active September 8, 2020 21:19
Fine-tuning google/pegasus-large on XSum — took ~25 hr on one GPU
python finetune.py \
--task summarization \
--learning_rate=3e-4 \
--do_train \
--do_predict \
--val_check_interval 0.25 --n_val 1000 \
--data_dir xsum \
--max_source_length 512 --max_target_length=56 \
--freeze_embeds \
--model_name_or_path google/pegasus-large \
--tokenizer_name google/pegasus-xsum \
--warmup_steps 500 \
--dropout 0.1 --attention_dropout 0.1 --label_smoothing 0.1 \
--train_batch_size=8 --eval_batch_size=8 --gradient_accumulation_steps=4 \
--logger_name wandb \
--sortish_sampler --gpus 1 \
--output_dir xsum_ft_ls_mask_fix --num_train_epochs 6 --adafactor
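
Once training finishes, the fine-tuned weights can be loaded with the regular transformers API. A minimal sketch, assuming the run saved a from_pretrained-loadable checkpoint somewhere under the output dir (the exact subdirectory, e.g. xsum_ft_ls_mask_fix/best_tfmr, is an assumption — point it at wherever your run saved its final model):

# Minimal inference sketch; the checkpoint path is an assumption, adjust to your run.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

ckpt = "xsum_ft_ls_mask_fix/best_tfmr"  # assumed save location inside --output_dir
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")  # same tokenizer as training
model = PegasusForConditionalGeneration.from_pretrained(ckpt)

article = "Some long news article to summarize ..."
batch = tokenizer(article, truncation=True, max_length=512, return_tensors="pt")
summary_ids = model.generate(
    **batch,
    num_beams=8,        # beam search; the notes below suggest --eval_beams=2 for speed
    max_length=56,      # matches --max_target_length above
    early_stopping=True,
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
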
@sshleifer (Author)

Final checkpoint

finetune: {"rouge1": 46.8367, "rouge2": 24.0701, "rougeL": 38.8658, "n_obs": 11333, "runtime": 3889.4276719093323, "seconds_per_sample": 0.3432}
  • This ran in ~25 hr on an NVIDIA RTX 24 GB GPU, on transformers master commit 9336086ab, with torch 1.5.1 + apex + examples/requirements.txt; current master should also work.
  • Since only --n_val 1000 validation examples are scored, the validation ROUGE will be lower than the final test ROUGE; mine only got to 22.5 (see the ROUGE sketch after this list).
  • Wandb logs here
  • You can pass --eval_beams=2 to speed up evaluation.
  • The paper recommends an effective batch size of 256, but this run (accidentally) only uses 8 * 4 = 32 (train_batch_size * gradient_accumulation_steps on a single GPU).
  • If you pass --gpus > 1, you cannot use --sortish_sampler.
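
The rouge1/rouge2/rougeL values above are F-measures. To my understanding the seq2seq examples' ROUGE helper wraps the rouge_score package; the sketch below is a simplified stand-in (it averages per-example F-measures and skips the bootstrap aggregation the real evaluation uses), so treat it as illustrative only:

# Illustrative ROUGE computation with the rouge_score package; the real eval
# aggregates differently, so numbers may differ slightly.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

predictions = ["the cat sat on the mat"]        # model summaries
references  = ["a cat was sitting on the mat"]  # gold summaries

totals = {k: 0.0 for k in ["rouge1", "rouge2", "rougeL"]}
for pred, ref in zip(predictions, references):
    scores = scorer.score(ref, pred)            # {metric: Score(precision, recall, fmeasure)}
    for k in totals:
        totals[k] += scores[k].fmeasure

n = len(predictions)
print({k: round(100.0 * v / n, 4) for k, v in totals.items()})
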
