@taylanbil
Created July 22, 2020 21:30
gpu vs tpu comparison for kkissmart
On commit 1f8ccaaf71b15f22e447866233e7d5e395928cab
# GPU COMMAND - 8 gpus
```bash
python /home/taylanbil/kkissmart-fairseq/tpu_fairseq/train.py $FULLDATAPATH --encoder-normalize-before --decoder-normalize-before --arch mbart_base --layernorm-embedding --task multilingual_denoising --criterion cross_entropy --dataset-impl mmap --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' --lr-scheduler polynomial_decay --lr 1e-04 --min-lr -1 --warmup-updates 0 --total-num-update 500000 --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.0 --max-tokens 4104 --seed 2 --log-format simple --log-interval 100 --add-lang-token --no-whole-word-mask-langs IMG --mask 0.35 --permute-sentences 1.0 --mask-length span-poisson --replace-length 1 --rotate 0.0 --max-source-positions 1026 --max-target-positions 1026 --tokens-per-sample 1026 --sample-break-mode complete --save-interval-updates 500 --skip-invalid-size-inputs-valid-test --langs EN,IMG --no-bos --no-input-eos --multilang-sampling-alpha 0.5 --max-sentences 4 --no-save --fp16 --num-buckets 1
```
# TPU COMMAND - v3-8
```bash
python /home/taylanbil/kkissmart-fairseq/tpu_fairseq/train.py $FULLDATAPATH --encoder-normalize-before --decoder-normalize-before --arch mbart_base --layernorm-embedding --task multilingual_denoising --criterion cross_entropy --dataset-impl mmap --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' --lr-scheduler polynomial_decay --lr 1e-04 --min-lr -1 --warmup-updates 0 --total-num-update 500000 --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.0 --max-tokens 4104 --seed 2 --log-format simple --log-interval 100 --add-lang-token --no-whole-word-mask-langs IMG --mask 0.35 --permute-sentences 1.0 --mask-length span-poisson --replace-length 1 --rotate 0.0 --max-source-positions 1026 --max-target-positions 1026 --tokens-per-sample 1026 --sample-break-mode complete --save-interval-updates 500 --skip-invalid-size-inputs-valid-test --langs EN,IMG --no-bos --no-input-eos --multilang-sampling-alpha 0.5 --max-sentences 4 --no-save --tpu --num-buckets 1 --distributed-world-size 8
```
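A quick note on the batching flags shared by both commands above: `--max-tokens 4104` appears to be `--max-sentences 4` times `--tokens-per-sample 1026`. A minimal sanity check of that relationship (the variable names below are illustrative, not fairseq internals):

```python
# Sanity check of the batching flags used in the two commands above.
# Assumption: --max-tokens is sized to cover --max-sentences full-length samples.
tokens_per_sample = 1026  # --tokens-per-sample / --max-source-positions / --max-target-positions
max_sentences = 4         # --max-sentences
max_tokens = 4104         # --max-tokens

assert max_tokens == max_sentences * tokens_per_sample  # 4 * 1026 == 4104
```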
@kkissmart

GPU Command:

```bash
python train.py /data/image/image-text-data-bin --encoder-normalize-before --decoder-normalize-before --arch mbart_base --layernorm-embedding --task multilingual_denoising --criterion cross_entropy --dataset-impl mmap --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' --lr-scheduler polynomial_decay --lr 1e-04 --min-lr -1 --warmup-updates 0 --total-num-update 500000 --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.0 --max-tokens 1026 --seed 2 --log-format simple --log-interval 100 --add-lang-token --no-whole-word-mask-langs IMG --mask 0.35 --permute-sentences 1.0 --mask-length span-poisson --replace-length 1 --rotate 0.0 --max-source-positions 1026 --max-target-positions 1026 --tokens-per-sample 1026 --sample-break-mode complete --save-interval-updates 500 --skip-invalid-size-inputs-valid-test --langs EN,IMG --no-bos --no-input-eos --multilang-sampling-alpha 0.5 --max-sentences 4 --no-save --fp16 --num-buckets 1 --distributed-world-size 1 --save-dir dummy_2
```

TPU command:
```bash
python train.py $HOME/image-text-data-bin/ --encoder-normalize-before --decoder-normalize-before --arch mbart_base --layernorm-embedding --task multilingual_denoising --criterion cross_entropy --dataset-impl mmap --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' --lr-scheduler polynomial_decay --lr 1e-04 --min-lr -1 --warmup-updates 0 --total-num-update 500000 --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.0 --max-tokens 1026 --seed 2 --log-format simple --log-interval 100 --add-lang-token --no-whole-word-mask-langs IMG --mask 0.35 --permute-sentences 1.0 --mask-length span-poisson --replace-length 1 --rotate 0.0 --max-source-positions 1026 --max-target-positions 1026 --tokens-per-sample 1026 --sample-break-mode complete --save-interval-updates 500 --skip-invalid-size-inputs-valid-test --langs EN,IMG --no-bos --no-input-eos --multilang-sampling-alpha 0.5 --max-sentences 4 --no-save --tpu --num-buckets 1 --distributed-world-size 1 --save-dir dummy_2
```

@taylanbil (Author)

lgtm;

```
diff <( cat gpu|tr ' ' '\n' ) <(cat tpu| tr ' ' '\n' )
3c3
< /data/image/image-text-data-bin
---
> $HOME/image-text-data-bin/
79c79
< --fp16
---
> --tpu
```
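For context on the idiom: the process substitutions plus `tr ' ' '\n'` split each saved command line into one token per line, so `diff` reports only the tokens that differ between the two runs; here that is just the data path (line 3) and `--fp16` vs `--tpu` (line 79).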

@kkissmart

GPU:

2020-07-22 22:30:37 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 15.503 | ppl 46431.2 | wps 24473.9 | wpb 1015 | bsz 1 | num_updates 500

TPU:

2020-07-22 22:51:28 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 15.272 | ppl 39554.8 | wps 18371.1 | wpb 1015 | bsz 1 | num_updates 500

@kkissmart

GPU:

2020-07-22 21:50:41 | INFO | fairseq.optim.adam | using FusedAdam
2020-07-22 21:50:59 | INFO | train_inner | epoch 001: 100 / 5904173 loss=11.752, ppl=3448.72, wps=7067.4, ups=7.49, wpb=943.8, bsz=1, num_updates=100, lr=9.998e-05, gnorm=2.073, loss_scale=128, train_wall=14, wall=105
2020-07-22 21:51:12 | INFO | train_inner | epoch 001: 200 / 5904173 loss=10.529, ppl=1477.85, wps=7180.3, ups=7.64, wpb=940.3, bsz=1, num_updates=200, lr=9.996e-05, gnorm=1.817, loss_scale=128, train_wall=13, wall=118
2020-07-22 21:51:25 | INFO | train_inner | epoch 001: 300 / 5904173 loss=10.09, ppl=1090.27, wps=7049.6, ups=7.59, wpb=928.8, bsz=1, num_updates=300, lr=9.994e-05, gnorm=1.733, loss_scale=128, train_wall=13, wall=131
2020-07-22 21:51:38 | INFO | train_inner | epoch 001: 400 / 5904173 loss=9.999, ppl=1023.5, wps=7048.5, ups=7.54, wpb=934.8, bsz=1, num_updates=400, lr=9.992e-05, gnorm=1.746, loss_scale=128, train_wall=13, wall=145
2020-07-22 21:51:52 | INFO | train_inner | epoch 001: 500 / 5904173 loss=9.633, ppl=793.85, wps=6953.3, ups=7.51, wpb=925.6, bsz=1, num_updates=500, lr=9.99e-05, gnorm=1.718, loss_scale=128, train_wall=13, wall=158
2020-07-22 22:30:37 | INFO | fairseq.data.iterators | Data loading buffer is empty or nearly empty. This may indicate a data loading bottleneck, and increasing the number of workers (--num-workers) may help.
2020-07-22 22:30:37 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 15.503 | ppl 46431.2 | wps 24473.9 | wpb 1015 | bsz 1 | num_updates 500
2020-07-22 22:34:53 | INFO | valid_EN | epoch 001 | valid on 'valid_EN' subset | loss 9.565 | ppl 757.47 | wps 22327.5 | wpb 933.6 | bsz 1 | num_updates 500

TPU:

2020-07-22 21:57:18 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:57:36 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:57:55 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:58:20 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:58:30 | INFO | train_inner | epoch 001: 100 / 5904173 loss=11.444, ppl=2785.4, wps=0, ups=0, wpb=964, bsz=1, num_updates=100, lr=9.998e-05, gnorm=1.653, train_wall=81, wall=276
2020-07-22 21:58:30 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:58:50 | INFO | train_inner | epoch 001: 200 / 5904173 loss=10.332, ppl=1288.97, wps=50, ups=0.05, wpb=992, bsz=1, num_updates=200, lr=9.996e-05, gnorm=1.7, train_wall=11, wall=296
2020-07-22 21:58:50 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:59:10 | INFO | train_inner | epoch 001: 300 / 5904173 loss=10.399, ppl=1350.54, wps=46.1, ups=0.05, wpb=925, bsz=1, num_updates=300, lr=9.994e-05, gnorm=1.88, train_wall=12, wall=316
2020-07-22 21:59:30 | INFO | train_inner | epoch 001: 400 / 5904173 loss=10.276, ppl=1239.59, wps=46.2, ups=0.05, wpb=923, bsz=1, num_updates=400, lr=9.992e-05, gnorm=2.283, train_wall=12, wall=336
2020-07-22 21:59:50 | INFO | train_inner | epoch 001: 500 / 5904173 loss=9.604, ppl=778.11, wps=35.7, ups=0.05, wpb=718, bsz=1, num_updates=500, lr=9.99e-05, gnorm=1.836, train_wall=12, wall=356
/home/xwu/tpu_fairseq/fairseq/data/denoising_dataset.py:296: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
word_starts = is_word_start.nonzero()
2020-07-22 22:51:27 | INFO | fairseq.data.iterators | Data loading buffer is empty or nearly empty. This may indicate a data loading bottleneck, and increasing the number of workers (--num-workers) may help.
2020-07-22 22:51:28 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 15.272 | ppl 39554.8 | wps 18371.1 | wpb 1015 | bsz 1 | num_updates 500
/home/xwu/tpu_fairseq/fairseq/data/denoising_dataset.py:296: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
word_starts = is_word_start.nonzero()
2020-07-22 22:57:27 | INFO | fairseq.data.iterators | Data loading buffer is empty or nearly empty. This may indicate a data loading bottleneck, and increasing the number of workers (--num-workers) may help.
2020-07-22 22:57:28 | INFO | valid_EN | epoch 001 | valid on 'valid_EN' subset | loss 9.564 | ppl 756.97 | wps 15873.9 | wpb 933.6 | bsz 1 | num_updates 500
/home/xwu/tpu_fairseq/fairseq/data/denoising_dataset.py:296: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
word_starts = is_word_start.nonzero()
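The repeated UserWarning above comes from the `is_word_start.nonzero()` call in denoising_dataset.py. A hypothetical one-line fix (not part of this run) is to pass `as_tuple=False` explicitly, which keeps the old index-matrix behaviour and silences the warning; a standalone sketch:

```python
import torch

# Minimal illustration of the deprecation fix suggested by the warning above.
# This is a self-contained sketch, not the fairseq code itself.
is_word_start = torch.tensor([1, 0, 0, 1, 0], dtype=torch.bool)

# Same result as the old .nonzero(), but without the deprecation warning.
word_starts = is_word_start.nonzero(as_tuple=False)
print(word_starts)  # tensor([[0], [3]])
```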
