@shahbazsyed
Forked from tomsherborne/bart-train-example.sh
Created February 17, 2021 14:26
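# Short BART-base denoising pretraining run on CPU (100 updates).
# qa_en_small-bin is the binarized data directory passed to fairseq-train.
fairseq-train qa_en_small-bin \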
--log-interval=10 \
--log-format=json \
--tensorboard-logdir=/users/tom/ed/sp/pretrain/tests/fairseq/bart_en_small/logs \
--seed=1 \
--cpu \
--min-loss-scale=0.0001 \
--model-parallel-size=1 \
--criterion=cross_entropy \
--optimizer=adam \
--lr-scheduler=polynomial_decay \
--task=denoising \
--num-workers=4 \
--skip-invalid-size-inputs-valid-test \
--max-tokens=8192 \
--required-batch-size-multiple=1 \
--train-subset=train \
--valid-subset=valid \
--validate-interval=1 \
--max-tokens-valid=8192 \
--bucket-cap-mb=25 \
--arch=bart_base \
--max-update=100 \
--clip-norm=0.1 \
--update-freq='[2]' \
--lr='[0.0006]' \
--min-lr=-1 \
--save-dir=/users/tom/ed/sp/pretrain/tests/fairseq/bart_en_small/logs \
--restore-file=checkpoint_last.pt \
--optimizer-overrides={} \
--save-interval=1 \
--save-interval-updates=50 \
--keep-interval-updates=-1 \
--keep-last-epochs=-1 \
--keep-best-checkpoints=-1 \
--no-epoch-checkpoints \
--best-checkpoint-metric=loss \
--patience=-1 \
--adam-betas="(0.9, 0.98)" \
--adam-eps=1e-06 \
--weight-decay=0.01 \
--warmup-updates=15 \
--power=1 \
--total-num-update=100 \
--tokens-per-sample=512 \
--sample-break-mode=complete_doc \
--mask=0.3 \
--mask-random=0.1 \
--poisson-lambda=3.5 \
--permute-sentences=1 \
--mask-length=span-poisson \
--replace-length=1 \
--max-source-positions=1024 \
--max-target-positions=1024 \
--dropout=0.1 \
--attention-dropout=0.1 \
--share-all-embeddings \
--layernorm-embedding \
--share-decoder-input-output-embed \
--encoder-embed-dim=768 \
--encoder-ffn-embed-dim=3072 \
--encoder-layers=6 \
--encoder-attention-heads=12 \
--decoder-layers=6 \
--decoder-attention-heads=12 \
--encoder-learned-pos \
--decoder-embed-dim=768 \
--decoder-ffn-embed-dim=3072 \
--decoder-learned-pos \
--decoder-output-dim=768 \
--no-scale-embedding \
--activation-fn=gelu \
--pooler-activation-fn=tanh
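
The learning-rate flags above (`--lr`, `--warmup-updates=15`, `--total-num-update=100`, `--power=1`) interact, so here is a rough sketch of the schedule they should produce. It assumes fairseq's `polynomial_decay` scheduler warms up linearly to the peak rate and then decays to an end learning rate of 0, which I believe is the default since no `--end-learning-rate` is passed. The helper name `lr_at` is hypothetical and not part of fairseq.

```shell
# Hypothetical helper: approximate learning rate at a given update number,
# using the values from the command above. The end rate of 0 is an assumption.
lr_at() {
  awk -v step="$1" -v peak=0.0006 -v warm=15 -v total=100 -v power=1 -v final=0 'BEGIN {
    if (step <= warm)
      lr = peak * step / warm            # linear warmup to the peak rate
    else if (step >= total)
      lr = final                         # schedule finished
    else
      lr = (peak - final) * (1 - (step - warm) / (total - warm)) ^ power + final
    printf "%.6f\n", lr
  }'
}

lr_at 5     # during warmup
lr_at 15    # peak, end of warmup
lr_at 100   # end of the schedule
```

Because `--max-update` and `--total-num-update` are both 100, the rate should reach its floor exactly when training stops.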