@xwang233
Created May 26, 2022 20:47

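Some preliminary training results with CUDA graphs: ResNet50 trained in FP32 with contiguous memory format for 10 epochs. Each heading below gives (average training throughput in samples/s, final top-1 accuracy in %), taken from the last Train and Test log lines of that run.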
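The two numbers in each heading can be recovered from the raw logs. The sketch below assumes the line layout seen in this gist, where the parenthesized values are running averages (this matches the heading numbers); the regular expressions and the `summarize` helper are illustrative, not part of the training script.

```python
import re

# Assumed layouts, modeled on the log lines in this gist.
# Train: "... Time: <step>s, <rate>/s  (<avg step>s, <avg rate>/s) ..."
TRAIN_RE = re.compile(
    r"Train: \d+ \[.*?\].*?Time: [\d.]+s,\s*[\d.]+/s\s+\([\d.]+s,\s*([\d.]+)/s\)"
)
# Test: "... Acc@1: <batch value> (<running average>) ..."
TEST_RE = re.compile(r"Test: \[.*?\].*?Acc@1:\s*[\d.]+\s*\(\s*([\d.]+)\)")


def summarize(log_lines):
    """Return (average throughput, average top-1) from the last matching lines."""
    throughput = top1 = None
    for line in log_lines:
        match = TRAIN_RE.search(line)
        if match:
            throughput = float(match.group(1))
        match = TEST_RE.search(line)
        if match:
            top1 = float(match.group(1))
    return throughput, top1


# Two lines copied from the A100x8 (bs=128) Eager run.
sample = [
    "00:34:42 Train: 9 [1250/1251 (100%)]  Loss: 3.985 (3.98)  Time: 0.148s, 6925.95/s  (0.158s, 6499.87/s)  LR: 1.225e-03  Data: 0.000 (0.008)",
    "00:34:51 Test: [  48/48]  Time: 0.038 (0.177)  Loss:  1.9639 (2.9433)  Acc@1: 62.3821 (38.2100)  Acc@5: 79.7170 (63.8240)",
]
print(summarize(sample))  # (6499.87, 38.21)
```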

  • A100x8 (bs=128)

    • Eager (6499.87/s, 38.21)
    00:34:26 Train: 9 [1150/1251 ( 92%)]  Loss: 3.974 (3.99)  Time: 0.156s, 6560.68/s  (0.158s, 6497.20/s)  LR: 1.225e-03  Data: 0.008 (0.008)
    00:34:34 Train: 9 [1200/1251 ( 96%)]  Loss: 3.920 (3.98)  Time: 0.156s, 6561.68/s  (0.158s, 6498.66/s)  LR: 1.225e-03  Data: 0.008 (0.008)
    00:34:42 Train: 9 [1250/1251 (100%)]  Loss: 3.985 (3.98)  Time: 0.148s, 6925.95/s  (0.158s, 6499.87/s)  LR: 1.225e-03  Data: 0.000 (0.008)
    00:34:42 Distributing BatchNorm running means and vars
    00:34:43 Test: [   0/48]  Time: 0.816 (0.816)  Loss:  2.0477 (2.0477)  Acc@1: 56.5430 (56.5430)  Acc@5: 81.6406 (81.6406)
    00:34:51 Test: [  48/48]  Time: 0.038 (0.177)  Loss:  1.9639 (2.9433)  Acc@1: 62.3821 (38.2100)  Acc@5: 79.7170 (63.8240)
    00:34:51 Current checkpoints:
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-9.pth.tar', 38.21000001342773)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-8.pth.tar', 37.536000040283206)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-7.pth.tar', 34.857999938964845)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-6.pth.tar', 32.905999956054686)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-5.pth.tar', 29.288000021972657)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-4.pth.tar', 24.774000018310545)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-3.pth.tar', 21.122)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-2.pth.tar', 15.742000043945312)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-1.pth.tar', 10.913999971923829)
    00:34:51  ('./output/train/20220526-033005-resnet50-224/checkpoint-0.pth.tar', 6.547999993896484)
    
    • CUDA graph (6608.18/s, 43.19)
    00:33:55 Train: 9 [1150/1251 ( 92%)]  Loss: 3.615 (3.61)  Time: 0.159s, 6444.43/s  (0.155s, 6605.29/s)  LR: 1.225e-03  Data: 0.008 (0.010)
    00:34:03 Train: 9 [1200/1251 ( 96%)]  Loss: 3.517 (3.61)  Time: 0.153s, 6673.32/s  (0.155s, 6606.20/s)  LR: 1.225e-03  Data: 0.009 (0.010)
    00:34:11 Train: 9 [1250/1251 (100%)]  Loss: 3.516 (3.61)  Time: 0.142s, 7218.84/s  (0.155s, 6608.18/s)  LR: 1.225e-03  Data: 0.000 (0.010)
    00:34:11 Distributing BatchNorm running means and vars
    00:34:12 Test: [   0/48]  Time: 0.826 (0.826)  Loss:  1.7451 (1.7451)  Acc@1: 62.6953 (62.6953)  Acc@5: 84.5703 (84.5703)
    00:34:20 Test: [  48/48]  Time: 0.039 (0.182)  Loss:  1.7677 (2.6343)  Acc@1: 64.1509 (43.1880)  Acc@5: 81.7217 (69.2660)
    00:34:20 Current checkpoints:
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-9.pth.tar', 43.18799994140625)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-8.pth.tar', 42.345999938964844)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-7.pth.tar', 37.76399999389648)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-6.pth.tar', 35.066000063476565)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-5.pth.tar', 32.66799999389649)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-4.pth.tar', 26.705999991455077)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-3.pth.tar', 22.881999982910155)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-2.pth.tar', 16.074000025634767)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-1.pth.tar', 12.040000034179688)
    00:34:20  ('./output/train/20220526-040504-resnet50-224/checkpoint-0.pth.tar', 6.153999999389648)
    
    • TorchScript + nvfuser (6453.39/s, 38.77)
    00:34:42 Train: 9 [1150/1251 ( 92%)]  Loss: 3.951 (3.97)  Time: 0.160s, 6390.97/s  (0.159s, 6449.55/s)  LR: 1.225e-03  Data: 0.008 (0.009)
    00:34:50 Train: 9 [1200/1251 ( 96%)]  Loss: 3.893 (3.97)  Time: 0.157s, 6529.63/s  (0.159s, 6451.75/s)  LR: 1.225e-03  Data: 0.009 (0.009)
    00:34:58 Train: 9 [1250/1251 (100%)]  Loss: 3.945 (3.96)  Time: 0.148s, 6907.22/s  (0.159s, 6453.39/s)  LR: 1.225e-03  Data: 0.000 (0.009)
    00:34:58 Distributing BatchNorm running means and vars
    00:34:59 Test: [   0/48]  Time: 0.823 (0.823)  Loss:  2.0276 (2.0276)  Acc@1: 57.4219 (57.4219)  Acc@5: 82.4219 (82.4219)
    00:35:07 Test: [  48/48]  Time: 0.037 (0.175)  Loss:  1.9413 (2.9267)  Acc@1: 61.9104 (38.7720)  Acc@5: 79.3632 (64.3180)
    00:35:07 Current checkpoints:
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-9.pth.tar', 38.77200000244141)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-8.pth.tar', 38.05999997802734)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-7.pth.tar', 35.45200008666992)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-6.pth.tar', 32.83400010131836)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-5.pth.tar', 29.510000010986328)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-4.pth.tar', 25.98999998779297)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-3.pth.tar', 20.665999975585937)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-2.pth.tar', 14.357999983520507)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-1.pth.tar', 11.40200004638672)
    00:35:07  ('./output/train/20220526-051328-resnet50-224/checkpoint-0.pth.tar', 6.669999987792969)
    
    • AOT_autograd + nvfuser (6887.96/s, 38.33)
    00:33:29 Train: 9 [1150/1251 ( 92%)]  Loss: 3.975 (3.98)  Time: 0.143s, 7150.03/s  (0.149s, 6881.95/s)  LR: 1.225e-03  Data: 0.008 (0.010)
    00:33:37 Train: 9 [1200/1251 ( 96%)]  Loss: 3.922 (3.98)  Time: 0.144s, 7131.44/s  (0.149s, 6885.85/s)  LR: 1.225e-03  Data: 0.007 (0.010)
    00:33:44 Train: 9 [1250/1251 (100%)]  Loss: 3.967 (3.98)  Time: 0.133s, 7699.01/s  (0.149s, 6887.96/s)  LR: 1.225e-03  Data: 0.000 (0.010)
    00:33:44 Distributing BatchNorm running means and vars
    00:33:45 Test: [   0/48]  Time: 0.810 (0.810)  Loss:  2.0058 (2.0058)  Acc@1: 57.5195 (57.5195)  Acc@5: 82.4219 (82.4219)
    00:33:53 Test: [  48/48]  Time: 0.028 (0.177)  Loss:  1.9564 (2.9429)  Acc@1: 61.3208 (38.3280)  Acc@5: 79.5991 (64.0720)
    00:33:53 Current checkpoints:
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-9.pth.tar', 38.32799994018555)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-8.pth.tar', 37.32000004516602)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-7.pth.tar', 35.39800006835937)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-6.pth.tar', 32.64800006958008)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-5.pth.tar', 29.128000045166015)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-4.pth.tar', 25.896000029296875)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-3.pth.tar', 21.98600001586914)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-2.pth.tar', 14.59600000732422)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-1.pth.tar', 10.74800001953125)
    00:33:53  ('./output/train/20220526-043928-resnet50-224/checkpoint-0.pth.tar', 6.654000009765625)
    
  • A100x8 (bs=32)

    • Eager (5130.81/s, 59.50)
    00:43:45 Train: 9 [4950/5004 ( 99%)]  Loss: 2.912 (3.01)  Time: 0.049s, 5185.56/s  (0.050s, 5133.92/s)  LR: 1.225e-03  Data: 0.002 (0.003)
    00:43:47 Train: 9 [5000/5004 (100%)]  Loss: 3.229 (3.02)  Time: 0.049s, 5235.87/s  (0.050s, 5130.71/s)  LR: 1.225e-03  Data: 0.002 (0.003)
    00:43:48 Train: 9 [5003/5004 (100%)]  Loss: 2.981 (3.02)  Time: 0.047s, 5491.47/s  (0.050s, 5130.81/s)  LR: 1.225e-03  Data: 0.000 (0.003)
    00:43:48 Distributing BatchNorm running means and vars
    00:43:48 Test: [   0/195]  Time: 0.270 (0.270)  Loss:  1.2109 (1.2109)  Acc@1: 73.0469 (73.0469)  Acc@5: 90.2344 (90.2344)
    00:43:50 Test: [  50/195]  Time: 0.016 (0.044)  Loss:  1.0599 (1.4276)  Acc@1: 76.5625 (66.4139)  Acc@5: 92.1875 (88.1510)
    00:43:52 Test: [ 100/195]  Time: 0.017 (0.045)  Loss:  2.5944 (1.5885)  Acc@1: 39.0625 (63.2851)  Acc@5: 69.5312 (85.7673)
    00:43:54 Test: [ 150/195]  Time: 0.041 (0.044)  Loss:  2.1700 (1.7593)  Acc@1: 60.1562 (60.3684)  Acc@5: 75.0000 (82.7685)
    00:43:56 Test: [ 195/195]  Time: 0.007 (0.043)  Loss:  2.8764 (1.8067)  Acc@1: 35.0000 (59.4980)  Acc@5: 66.2500 (82.0880)
    00:43:56 Current checkpoints:
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-9.pth.tar', 59.498)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-8.pth.tar', 58.134)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-7.pth.tar', 56.354)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-6.pth.tar', 53.758)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-5.pth.tar', 51.068)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-4.pth.tar', 46.434)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-3.pth.tar', 42.02)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-2.pth.tar', 34.928)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-1.pth.tar', 23.408)
    00:43:56  ('./output/train/20220526-003803-resnet50-224/checkpoint-0.pth.tar', 11.81)
    
    • CUDA graph (5833.69/s, 57.36)
    00:38:37 Train: 9 [4950/5004 ( 99%)]  Loss: 2.909 (3.01)  Time: 0.043s, 5991.50/s  (0.044s, 5833.30/s)  LR: 1.225e-03  Data: 0.002 (0.003)
    00:38:39 Train: 9 [5000/5004 (100%)]  Loss: 3.171 (3.01)  Time: 0.043s, 5973.89/s  (0.044s, 5833.54/s)  LR: 1.225e-03  Data: 0.002 (0.003)
    00:38:39 Train: 9 [5003/5004 (100%)]  Loss: 2.864 (3.01)  Time: 0.040s, 6353.20/s  (0.044s, 5833.69/s)  LR: 1.225e-03  Data: 0.000 (0.003)
    00:38:39 Distributing BatchNorm running means and vars
    00:38:39 Test: [   0/195]  Time: 0.305 (0.305)  Loss:  1.1444 (1.1444)  Acc@1: 70.7031 (70.7031)  Acc@5: 92.1875 (92.1875)
    00:38:41 Test: [  50/195]  Time: 0.016 (0.045)  Loss:  1.1188 (1.5279)  Acc@1: 76.1719 (63.4191)  Acc@5: 91.0156 (86.7264)
    00:38:43 Test: [ 100/195]  Time: 0.016 (0.045)  Loss:  2.3071 (1.6706)  Acc@1: 52.3438 (61.0767)  Acc@5: 74.2188 (84.5490)
    00:38:46 Test: [ 150/195]  Time: 0.018 (0.046)  Loss:  1.9372 (1.8478)  Acc@1: 65.6250 (58.1100)  Acc@5: 80.8594 (81.4156)
    00:38:48 Test: [ 195/195]  Time: 0.007 (0.045)  Loss:  2.9765 (1.8927)  Acc@1: 33.7500 (57.3600)  Acc@5: 62.5000 (80.6320)
    00:38:48 Current checkpoints:
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-9.pth.tar', 57.36)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-8.pth.tar', 57.068)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-7.pth.tar', 55.386)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-6.pth.tar', 51.982)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-5.pth.tar', 50.092)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-4.pth.tar', 46.658)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-3.pth.tar', 41.436)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-2.pth.tar', 34.518)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-1.pth.tar', 24.962)
    00:38:48  ('./output/train/20220526-012207-resnet50-224/checkpoint-0.pth.tar', 11.35)
    
    • TorchScript + nvfuser (4986.46/s, 59.50)
    00:44:45 Train: 9 [4950/5004 ( 99%)]  Loss: 2.912 (3.01)  Time: 0.052s, 4969.42/s  (0.051s, 4986.28/s)  LR: 1.225e-03  Data: 0.003 (0.003)
    00:44:48 Train: 9 [5000/5004 (100%)]  Loss: 3.229 (3.02)  Time: 0.050s, 5086.10/s  (0.051s, 4986.34/s)  LR: 1.225e-03  Data: 0.003 (0.003)
    00:44:48 Train: 9 [5003/5004 (100%)]  Loss: 2.981 (3.02)  Time: 0.047s, 5419.52/s  (0.051s, 4986.46/s)  LR: 1.225e-03  Data: 0.000 (0.003)
    00:44:48 Distributing BatchNorm running means and vars
    00:44:48 Test: [   0/195]  Time: 0.286 (0.286)  Loss:  1.2109 (1.2109)  Acc@1: 73.0469 (73.0469)  Acc@5: 90.2344 (90.2344)
    00:44:50 Test: [  50/195]  Time: 0.018 (0.045)  Loss:  1.0599 (1.4276)  Acc@1: 76.5625 (66.4139)  Acc@5: 92.1875 (88.1510)
    00:44:52 Test: [ 100/195]  Time: 0.093 (0.046)  Loss:  2.5944 (1.5885)  Acc@1: 39.0625 (63.2851)  Acc@5: 69.5312 (85.7673)
    00:44:54 Test: [ 150/195]  Time: 0.049 (0.045)  Loss:  2.1700 (1.7593)  Acc@1: 60.1562 (60.3684)  Acc@5: 75.0000 (82.7685)
    00:44:56 Test: [ 195/195]  Time: 0.006 (0.044)  Loss:  2.8763 (1.8067)  Acc@1: 35.0000 (59.4980)  Acc@5: 66.2500 (82.0880)
    00:44:57 Current checkpoints:
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-9.pth.tar', 59.498)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-8.pth.tar', 58.134)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-7.pth.tar', 56.354)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-6.pth.tar', 53.758)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-5.pth.tar', 51.068)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-4.pth.tar', 46.434)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-3.pth.tar', 42.02)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-2.pth.tar', 34.928)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-1.pth.tar', 23.408)
    00:44:57  ('./output/train/20220526-024501-resnet50-224/checkpoint-0.pth.tar', 11.81)
    
    • AOT_autograd + nvfuser (5228.49/s, 59.39)
    00:43:40 Train: 9 [4950/5004 ( 99%)]  Loss: 2.874 (3.01)  Time: 0.048s, 5374.41/s  (0.049s, 5227.73/s)  LR: 1.225e-03  Data: 0.003 (0.003)
    00:43:43 Train: 9 [5000/5004 (100%)]  Loss: 3.190 (3.01)  Time: 0.050s, 5129.23/s  (0.049s, 5228.31/s)  LR: 1.225e-03  Data: 0.003 (0.003)
    00:43:43 Train: 9 [5003/5004 (100%)]  Loss: 3.022 (3.01)  Time: 0.044s, 5769.52/s  (0.049s, 5228.49/s)  LR: 1.225e-03  Data: 0.000 (0.003)
    00:43:43 Distributing BatchNorm running means and vars
    00:43:43 Test: [   0/195]  Time: 0.280 (0.280)  Loss:  1.1498 (1.1498)  Acc@1: 75.7812 (75.7812)  Acc@5: 91.0156 (91.0156)
    00:43:45 Test: [  50/195]  Time: 0.034 (0.044)  Loss:  1.0834 (1.4250)  Acc@1: 75.3906 (66.1075)  Acc@5: 92.1875 (88.1357)
    00:43:48 Test: [ 100/195]  Time: 0.014 (0.048)  Loss:  2.6195 (1.5827)  Acc@1: 39.4531 (63.2348)  Acc@5: 69.1406 (85.8137)
    00:43:50 Test: [ 150/195]  Time: 0.016 (0.047)  Loss:  2.1513 (1.7496)  Acc@1: 59.7656 (60.2727)  Acc@5: 76.1719 (83.0220)
    00:43:52 Test: [ 195/195]  Time: 0.010 (0.046)  Loss:  3.1136 (1.7934)  Acc@1: 31.2500 (59.3920)  Acc@5: 60.0000 (82.2660)
    00:43:52 Current checkpoints:
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-9.pth.tar', 59.392)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-8.pth.tar', 58.622)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-7.pth.tar', 56.45)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-6.pth.tar', 54.104)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-5.pth.tar', 51.45)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-4.pth.tar', 46.402)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-3.pth.tar', 40.386)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-2.pth.tar', 34.326)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-1.pth.tar', 21.562)
    00:43:52  ('./output/train/20220526-020101-resnet50-224/checkpoint-0.pth.tar', 10.714)
    
  • V100x8 (bs=64)

    • Eager (2573.18/s, 51.29)
    01:25:18 Train: 9 [2450/2502 ( 98%)]  Loss: 3.451 (3.40)  Time: 0.198s, 2590.48/s  (0.199s, 2572.97/s)  LR: 1.225e-03  Data: 0.010 (0.009)
    01:25:28 Train: 9 [2500/2502 (100%)]  Loss: 3.427 (3.40)  Time: 0.196s, 2618.35/s  (0.199s, 2573.12/s)  LR: 1.225e-03  Data: 0.007 (0.009)
    01:25:28 Train: 9 [2501/2502 (100%)]  Loss: 3.276 (3.40)  Time: 0.187s, 2731.49/s  (0.199s, 2573.18/s)  LR: 1.225e-03  Data: 0.000 (0.009)
    01:25:28 Distributing BatchNorm running means and vars
    01:25:29 Test: [   0/97]  Time: 0.679 (0.679)  Loss:  1.4868 (1.4868)  Acc@1: 66.6016 (66.6016)  Acc@5: 89.2578 (89.2578)
    01:25:36 Test: [  50/97]  Time: 0.064 (0.146)  Loss:  2.7062 (1.9954)  Acc@1: 42.1875 (54.9288)  Acc@5: 68.9453 (80.1011)
    01:25:42 Test: [  97/97]  Time: 0.038 (0.134)  Loss:  1.8312 (2.2331)  Acc@1: 62.5000 (51.2920)  Acc@5: 81.2500 (75.7080)
    01:25:42 Current checkpoints:
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-9.pth.tar', 51.29199997436523)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-8.pth.tar', 50.051999975585936)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-7.pth.tar', 47.99999998657226)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-6.pth.tar', 45.50999999023438)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-5.pth.tar', 42.39999998779297)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-4.pth.tar', 38.24799999511719)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-3.pth.tar', 31.925999998779297)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-2.pth.tar', 23.880000003662108)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-1.pth.tar', 16.486000005493164)
    01:25:42  ('./output/train/20220526-045239-resnet50-224/checkpoint-0.pth.tar', 8.647999998474122)
    
    • CUDA graph (2653.16/s, 53.00)
    01:22:54 Train: 9 [2450/2502 ( 98%)]  Loss: 3.260 (3.25)  Time: 0.193s, 2651.92/s  (0.193s, 2653.07/s)  LR: 1.225e-03  Data: 0.008 (0.008)
    01:23:04 Train: 9 [2500/2502 (100%)]  Loss: 3.289 (3.25)  Time: 0.190s, 2696.94/s  (0.193s, 2653.10/s)  LR: 1.225e-03  Data: 0.007 (0.008)
    01:23:04 Train: 9 [2501/2502 (100%)]  Loss: 3.100 (3.25)  Time: 0.183s, 2793.42/s  (0.193s, 2653.16/s)  LR: 1.225e-03  Data: 0.000 (0.008)
    01:23:04 Distributing BatchNorm running means and vars
    01:23:05 Test: [   0/97]  Time: 0.705 (0.705)  Loss:  1.3437 (1.3437)  Acc@1: 68.7500 (68.7500)  Acc@5: 90.8203 (90.8203)
    01:23:12 Test: [  50/97]  Time: 0.063 (0.145)  Loss:  2.3391 (1.8248)  Acc@1: 50.7812 (57.0887)  Acc@5: 72.4609 (81.9164)
    01:23:17 Test: [  97/97]  Time: 0.039 (0.136)  Loss:  1.8000 (2.0862)  Acc@1: 60.1190 (53.0000)  Acc@5: 80.0595 (77.4340)
    01:23:18 Current checkpoints:
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-9.pth.tar', 52.99999998413086)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-8.pth.tar', 52.142000015869144)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-7.pth.tar', 48.62399998413086)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-5.pth.tar', 44.70999998413086)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-6.pth.tar', 44.62599999023438)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-4.pth.tar', 39.216000004882815)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-3.pth.tar', 31.48200001953125)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-2.pth.tar', 25.087999985351562)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-1.pth.tar', 17.496000009155274)
    01:23:18  ('./output/train/20220526-061824-resnet50-224/checkpoint-0.pth.tar', 8.164000000305176)
    
    • TorchScript + nvfuser (2581.52/s, 51.19)
    01:25:04 Train: 9 [2450/2502 ( 98%)]  Loss: 3.466 (3.40)  Time: 0.200s, 2559.94/s  (0.198s, 2581.35/s)  LR: 1.225e-03  Data: 0.009 (0.010)
    01:25:13 Train: 9 [2500/2502 (100%)]  Loss: 3.397 (3.40)  Time: 0.196s, 2608.01/s  (0.198s, 2581.47/s)  LR: 1.225e-03  Data: 0.007 (0.010)
    01:25:14 Train: 9 [2501/2502 (100%)]  Loss: 3.272 (3.40)  Time: 0.189s, 2715.12/s  (0.198s, 2581.52/s)  LR: 1.225e-03  Data: 0.000 (0.010)
    01:25:14 Distributing BatchNorm running means and vars
    01:25:14 Test: [   0/97]  Time: 0.712 (0.712)  Loss:  1.5368 (1.5368)  Acc@1: 65.2344 (65.2344)  Acc@5: 88.0859 (88.0859)
    01:25:21 Test: [  50/97]  Time: 0.064 (0.145)  Loss:  2.6455 (1.9945)  Acc@1: 43.9453 (54.8598)  Acc@5: 69.1406 (79.9709)
    01:25:27 Test: [  97/97]  Time: 0.038 (0.137)  Loss:  1.8332 (2.2296)  Acc@1: 62.5000 (51.1940)  Acc@5: 81.2500 (75.6200)
    01:25:28 Current checkpoints:
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-9.pth.tar', 51.193999974365234)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-8.pth.tar', 50.3519999987793)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-7.pth.tar', 48.171999978027344)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-6.pth.tar', 45.637999987792966)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-5.pth.tar', 42.591999981689455)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-4.pth.tar', 38.05400000732422)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-3.pth.tar', 32.55400001953125)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-2.pth.tar', 25.975999987792967)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-1.pth.tar', 16.724000008544923)
    01:25:28  ('./output/train/20220526-090506-resnet50-224/checkpoint-0.pth.tar', 6.382000001678467)
    
    • AOT_autograd + nvfuser (2687.08/s, 51.24)
    01:22:50 Train: 9 [2450/2502 ( 98%)]  Loss: 3.415 (3.39)  Time: 0.188s, 2725.32/s  (0.191s, 2686.74/s)  LR: 1.225e-03  Data: 0.008 (0.009)
    01:22:59 Train: 9 [2500/2502 (100%)]  Loss: 3.462 (3.40)  Time: 0.187s, 2738.70/s  (0.191s, 2687.01/s)  LR: 1.225e-03  Data: 0.007 (0.008)
    01:22:59 Train: 9 [2501/2502 (100%)]  Loss: 3.279 (3.39)  Time: 0.179s, 2866.71/s  (0.191s, 2687.08/s)  LR: 1.225e-03  Data: 0.000 (0.008)
    01:22:59 Distributing BatchNorm running means and vars
    01:23:00 Test: [   0/97]  Time: 0.688 (0.688)  Loss:  1.4918 (1.4918)  Acc@1: 68.1641 (68.1641)  Acc@5: 88.0859 (88.0859)
    01:23:07 Test: [  50/97]  Time: 0.060 (0.145)  Loss:  2.6738 (1.9861)  Acc@1: 43.9453 (55.0666)  Acc@5: 68.5547 (80.0743)
    01:23:13 Test: [  97/97]  Time: 0.033 (0.136)  Loss:  1.8924 (2.2179)  Acc@1: 61.3095 (51.2440)  Acc@5: 80.3571 (76.0020)
    01:23:13 Current checkpoints:
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-9.pth.tar', 51.243999979248045)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-8.pth.tar', 50.33599998779297)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-7.pth.tar', 48.427999990234376)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-6.pth.tar', 45.630000009765624)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-5.pth.tar', 42.21799999633789)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-4.pth.tar', 38.02800000366211)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-3.pth.tar', 32.29200001708984)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-2.pth.tar', 25.220000002441406)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-1.pth.tar', 16.86400001159668)
    01:23:13  ('./output/train/20220526-074145-resnet50-224/checkpoint-0.pth.tar', 8.24599999847412)
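
To compare the runs at a glance, here is a small sketch that computes throughput relative to Eager and the top-1 difference for each configuration. The numbers are copied from the headings above; the `RESULTS` layout and the `relative_to_eager` helper are made up for illustration.

```python
# (throughput in samples/s, final top-1 in %) copied from the headings above.
RESULTS = {
    "A100x8 (bs=128)": {
        "Eager": (6499.87, 38.21),
        "CUDA graph": (6608.18, 43.19),
        "TorchScript + nvfuser": (6453.39, 38.77),
        "AOT_autograd + nvfuser": (6887.96, 38.33),
    },
    "A100x8 (bs=32)": {
        "Eager": (5130.81, 59.50),
        "CUDA graph": (5833.69, 57.36),
        "TorchScript + nvfuser": (4986.46, 59.50),
        "AOT_autograd + nvfuser": (5228.49, 59.39),
    },
    "V100x8 (bs=64)": {
        "Eager": (2573.18, 51.29),
        "CUDA graph": (2653.16, 53.00),
        "TorchScript + nvfuser": (2581.52, 51.19),
        "AOT_autograd + nvfuser": (2687.08, 51.24),
    },
}


def relative_to_eager(runs):
    """Map each non-Eager run to (throughput ratio, top-1 difference) vs. Eager."""
    base_tput, base_acc = runs["Eager"]
    return {
        name: (tput / base_tput, acc - base_acc)
        for name, (tput, acc) in runs.items()
        if name != "Eager"
    }


for config, runs in RESULTS.items():
    print(config)
    for name, (speedup, acc_delta) in relative_to_eager(runs).items():
        print(f"  {name:<24} {speedup:.3f}x  {acc_delta:+.2f} top-1")
```

With these numbers, CUDA graph comes out at roughly 1.14x over Eager at bs=32 and under 1.04x in the other two configurations. The accuracy differences come from single 10-epoch runs, so they should be read as preliminary.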
    