@xwang233
Created May 25, 2022 17:14
Some preliminary training results with CUDA graphs: ResNet50 trained for 10 epochs with AMP + channels-last.
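For context, the whole-network capture pattern that a `--cuda-graph` training flag typically wraps looks roughly like the sketch below. This is a generic sketch following PyTorch's `torch.cuda.CUDAGraph` docs, not the exact code behind the flag used here; the AMP runs above would additionally need GradScaler-aware capture.

```python
import torch

def graph_train_step(model, optimizer, static_input, static_target, warmup_iters=3):
    """Capture one fwd/bwd/optimizer step into a CUDA graph, then replay it
    with new data copied into the static tensors. Generic sketch only."""
    loss_fn = torch.nn.functional.cross_entropy

    # Warm up on a side stream so lazy init (allocator, cuDNN autotune,
    # DDP buckets) happens outside the capture.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(warmup_iters):
            optimizer.zero_grad(set_to_none=True)
            loss_fn(model(static_input), static_target).backward()
            optimizer.step()
    torch.cuda.current_stream().wait_stream(s)

    g = torch.cuda.CUDAGraph()
    # Grads set to None before capture so backward() allocates .grad
    # buffers from the graph's private memory pool.
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.graph(g):
        static_loss = loss_fn(model(static_input), static_target)
        static_loss.backward()
        optimizer.step()

    def step(inp, target):
        static_input.copy_(inp)      # feed the new batch into captured buffers
        static_target.copy_(target)
        g.replay()                   # replays fwd + bwd + optimizer step
        return static_loss
    return step
```

The key constraint is that shapes and addresses are frozen at capture time, which is why the training script fills fixed `static_*` tensors each iteration instead of passing fresh ones.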

  • V100x8 (bs=128 per GPU)

    • Eager
    torchrun --nproc_per_node 8 --max_restarts 0 train.py --model resnet50 -b 128 --epochs 10 --warmup-epochs 0 --cooldown-epochs 0 --amp --channels-last /data/imagenet/
    
    epoch train_loss eval_loss eval_top1 eval_top5
    0 6.208239749625877 5.4027525075722975 6.706291016931932 18.407894418980028
    1 5.615560814186379 4.84901764841102 11.784538088528254 28.218376642844003
    2 5.136796633402507 4.323157276166494 17.545118271890445 37.504807807146385
    3 4.7953448118986906 3.7626026542104376 24.498786026492866 47.963067016806974
    4 4.507016429194698 3.423314337484074 29.635703380733915 54.41063787462666
    5 4.289653071650752 3.1307527017626624 34.84173296848144 60.41882016578028
    6 4.11788868021082 3.0150971740606747 36.54279549924991 62.23346715083606
    7 4.026934923949065 2.9002563913796005 38.74725053606075 64.3413152978748
    8 3.939338940161246 2.751310221549797 42.14862436166036 67.83170869418694
    9 3.88635614183214 2.6796389289330875 43.1327976594622 68.64858529996367
    Train: 9 [1200/1299 ( 92%)]  Loss: 3.847 (3.89)  Time: 0.450s, 2274.66/s  (0.245s, 4182.47/s)  LR: 1.225e-03  Data: 0.317 (0.063)
    Train: 9 [1250/1299 ( 96%)]  Loss: 3.821 (3.89)  Time: 0.161s, 6350.47/s  (0.245s, 4182.58/s)  LR: 1.225e-03  Data: 0.010 (0.062)
    Train: 9 [1298/1299 (100%)]  Loss: 3.819 (3.89)  Time: 0.129s, 7928.78/s  (0.245s, 4183.51/s)  LR: 1.225e-03  Data: 0.000 (0.062)
    
    • CUDA graph
    torchrun --nproc_per_node 8 --max_restarts 0 train.py --model resnet50 -b 128 --epochs 10 --warmup-epochs 0 --cooldown-epochs 0 --amp --channels-last --cuda-graph /data/imagenet/
    
    epoch train_loss eval_loss eval_top1 eval_top5
    0 6.194712073714645 5.374842243803937 7.221778167744417 19.288399360103043
    1 5.599122329994485 4.854270178519917 11.482096924443994 27.745333420651164
    2 5.13392992372866 4.24286096871319 18.23278504153416 38.81072861970785
    3 4.806759851950186 3.8029978184571744 24.27544832368313 47.474248177392134
    4 4.519653938434742 3.5016416973665234 28.911752687646477 53.29424986269254
    5 4.3088924796492964 3.142594792693334 34.46597274027953 60.03780138908967
    6 4.135235415564643 2.996300570251088 37.239552033142694 62.895968051199624
    7 4.05128632651435 2.8545036201290896 39.98225618257566 65.62597658652584
    8 3.9748768188335277 2.792614549778841 41.3593926558704 66.97862328403016
    9 3.9006914827558727 2.710650674820909 42.663585658256714 68.15420743085355
    Train: 9 [1200/1299 ( 92%)]  Loss: 3.919 (3.90)  Time: 0.143s, 7166.69/s  (0.247s, 4138.47/s)  LR: 1.225e-03  Data: 0.017 (0.076)
    Train: 9 [1250/1299 ( 96%)]  Loss: 3.886 (3.90)  Time: 0.142s, 7202.44/s  (0.248s, 4137.12/s)  LR: 1.225e-03  Data: 0.019 (0.078)
    Train: 9 [1298/1299 (100%)]  Loss: 3.888 (3.90)  Time: 0.120s, 8501.35/s  (0.247s, 4141.28/s)  LR: 1.225e-03  Data: 0.000 (0.079)
    
    • CUDA graph + nvFuser
    torchrun --nproc_per_node 8 --max_restarts 0 train.py --model resnet50 -b 128 --epochs 10 --warmup-epochs 0 --cooldown-epochs 0 --amp --channels-last --cuda-graph --torchscript --fuser nvfuser /data/imagenet/
    
    epoch train_loss eval_loss eval_top1 eval_top5
    0 6.190722483175772 5.393998541130796 7.169117646646166 19.13484999611904
    1 5.6089883733678745 4.870650530211063 11.33252902567285 27.376108800142113
    2 5.142191533689146 4.20071156495649 18.489401789788353 39.299923072139464
    3 4.791466571666576 3.6729673677552346 25.908825933224 49.83405550614197
    4 4.50929613466616 3.5278229344455396 29.02368446274352 53.5518431937128
    5 4.306085250995777 3.2050700024339527 33.950635829565975 59.22911307560646
    6 4.127599504258898 3.005733451750643 37.03559580676622 62.60674835291645
    7 4.046973157812048 2.9310232160779104 38.84295596139683 64.33500504489702
    8 3.975538492202759 2.771382490790043 41.74912557920821 67.32456008411175
    9 3.8938007531342684 2.7126073681158203 42.740961321277304 68.26058018074869
    Train: 9 [1200/1299 ( 92%)]  Loss: 3.936 (3.90)  Time: 0.142s, 7197.36/s  (0.245s, 4178.45/s)  LR: 1.225e-03  Data: 0.012 (0.025)
    Train: 9 [1250/1299 ( 96%)]  Loss: 3.850 (3.90)  Time: 0.163s, 6294.34/s  (0.245s, 4176.74/s)  LR: 1.225e-03  Data: 0.018 (0.025)
    Train: 9 [1298/1299 (100%)]  Loss: 3.849 (3.89)  Time: 0.124s, 8227.80/s  (0.245s, 4180.31/s)  LR: 1.225e-03  Data: 0.000 (0.024)
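Comparing the end-of-epoch average throughputs above (the second, parenthesized imgs/s figures), CUDA graphs make essentially no difference at bs=128 on V100: with 128 images per GPU the kernels are large enough that launch overhead is already amortized. A quick check:

```python
# End-of-epoch average throughputs (imgs/s) copied from the V100 logs above.
eager, cuda_graph, graph_nvfuser = 4183.51, 4141.28, 4180.31

for name, tput in [("cuda graph", cuda_graph), ("cuda graph + nvfuser", graph_nvfuser)]:
    print(f"{name}: {tput / eager:.3f}x vs eager")
```

Both configurations land within about 1% of eager, so any gain here is below run-to-run noise.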
    
  • A100x8 (bs=32 per GPU)

    • Eager
    torchrun --nproc_per_node 8 --max_restarts 0 train.py --model resnet50 -b 32 --epochs 10 --warmup-epochs 0 --cooldown-epochs 0 --amp --channels-last /tmp/imagenet/
    
    epoch train_loss eval_loss eval_top1 eval_top5
    0 5.8870961946599625 4.6462725 12.932 30.648
    1 4.863906893075681 3.85366 22.836 45.822
    2 4.26885768478992 3.04082375 34.858 61.108
    3 3.90367591381073 2.7670975 40.652 66.7
    4 3.625600024765613 2.46657625 45.826 71.8
    5 3.436457358154596 2.262946875 50.286 75.416
    6 3.3094739773694206 2.041573125 54.092 78.136
    7 3.190333118625716 1.951306875 56.368 79.976
    8 3.0618435331419405 1.854063125 58.456 81.4
    9 3.0082468332028856 1.80154 59.296 82.124
    Train: 9 [4950/5004 ( 99%)]  Loss: 2.908 (3.01)  Time: 0.049s, 5236.64/s  (0.045s, 5669.49/s)  LR: 1.225e-03  Data: 0.002 (0.003)
    Train: 9 [5000/5004 (100%)]  Loss: 3.279 (3.01)  Time: 0.043s, 6002.48/s  (0.045s, 5664.61/s)  LR: 1.225e-03  Data: 0.002 (0.003)
    Train: 9 [5003/5004 (100%)]  Loss: 2.997 (3.01)  Time: 0.036s, 7206.90/s  (0.045s, 5665.04/s)  LR: 1.225e-03  Data: 0.000 (0.003)
    
    • CUDA graph
    torchrun --nproc_per_node 8 --max_restarts 0 train.py --model resnet50 -b 32 --epochs 10 --warmup-epochs 0 --cooldown-epochs 0 --amp --channels-last --cuda-graph /tmp/imagenet/
    
    epoch train_loss eval_loss eval_top1 eval_top5
    0 5.890629614100737 4.7512475 11.534 28.57
    1 4.871789721881642 3.75664875 24.584 48.088
    2 4.272617749139374 3.11374625 33.974 60.054
    3 3.9079988119648954 2.72658375 41.016 67.068
    4 3.630196543300853 2.4879925 46.01 71.786
    5 3.442710937238207 2.250521875 50.266 75.216
    6 3.30540766435511 2.066018125 53.372 77.684
    7 3.192443331082662 1.969358125 56.064 79.716
    8 3.07297398296057 1.847984375 58.37 81.576
    9 3.016620549501157 1.812573125 59.2 81.964
    Train: 9 [4950/5004 ( 99%)]  Loss: 2.900 (3.02)  Time: 0.027s, 9503.99/s  (0.039s, 6628.00/s)  LR: 1.225e-03  Data: 0.002 (0.003)
    Train: 9 [5000/5004 (100%)]  Loss: 3.191 (3.02)  Time: 0.026s, 9709.21/s  (0.039s, 6629.02/s)  LR: 1.225e-03  Data: 0.003 (0.003)
    Train: 9 [5003/5004 (100%)]  Loss: 2.920 (3.02)  Time: 0.024s, 10806.47/s  (0.039s, 6630.37/s)  LR: 1.225e-03  Data: 0.000 (0.003)
    
    • CUDA graph + nvFuser
    torchrun --nproc_per_node 8 --max_restarts 0 train.py --model resnet50 -b 32 --epochs 10 --warmup-epochs 0 --cooldown-epochs 0 --amp --channels-last --cuda-graph --torchscript --fuser nvfuser /tmp/imagenet/
    
    epoch train_loss eval_loss eval_top1 eval_top5
    0 5.8854453563690186 4.75567875 11.644 28.4
    1 4.874088768865548 3.69351 25.398 49.776
    2 4.281818831668181 3.09722875 34.388 60.428
    3 3.9115645581600713 2.747608125 40.552 67.044
    4 3.6315714003993014 2.51176 45.83 71.612
    5 3.4473784530864044 2.245996875 50.356 75.332
    6 3.3047901064741847 2.058786875 53.322 77.758
    7 3.1903992306952382 1.966245 56.096 79.876
    8 3.075165505502738 1.840585625 58.364 81.478
    9 3.016200960851183 1.805793125 59.274 81.924
    Train: 9 [4950/5004 ( 99%)]  Loss: 2.891 (3.02)  Time: 0.028s, 9106.68/s  (0.038s, 6670.38/s)  LR: 1.225e-03  Data: 0.004 (0.003)
    Train: 9 [5000/5004 (100%)]  Loss: 3.169 (3.02)  Time: 0.026s, 9730.15/s  (0.038s, 6671.21/s)  LR: 1.225e-03  Data: 0.003 (0.003)
    Train: 9 [5003/5004 (100%)]  Loss: 2.939 (3.02)  Time: 0.024s, 10746.45/s  (0.038s, 6672.33/s)  LR: 1.225e-03  Data: 0.000 (0.003)
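At bs=32 per GPU on A100 the picture changes: with many small kernel launches, launch overhead is significant, and CUDA graphs recover roughly 17% throughput. The averages can be pulled straight from the log lines; the regex below assumes the exact log format shown above, where the parenthesized pair is the running average.

```python
import re

# The running-average throughput (imgs/s) is the second number in the
# parenthesized pair, e.g. "(0.039s, 6630.37/s)".
LOG_RE = re.compile(r"\((\d+\.\d+)s, (\d+\.\d+)/s\)")

def avg_throughput(line: str) -> float:
    m = LOG_RE.search(line)
    if m is None:
        raise ValueError("no average throughput found in line")
    return float(m.group(2))

# Last-iteration log lines from the A100 eager and CUDA-graph runs above.
eager = avg_throughput(
    "Train: 9 [5003/5004 (100%)]  Loss: 2.997 (3.01)  "
    "Time: 0.036s, 7206.90/s  (0.045s, 5665.04/s)  LR: 1.225e-03")
graphed = avg_throughput(
    "Train: 9 [5003/5004 (100%)]  Loss: 2.920 (3.02)  "
    "Time: 0.024s, 10806.47/s  (0.039s, 6630.37/s)  LR: 1.225e-03")

print(f"A100 bs=32 speedup from CUDA graphs: {graphed / eager:.2f}x")
```

The nvFuser run lands at essentially the same average (6672 vs 6630 imgs/s), so at this batch size the win comes from graph replay rather than fusion.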
    