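Some preliminary training results with CUDA graphs: ResNet50 trained for 10 epochs in FP32 with the contiguous memory format. Each entry below gives (average training throughput in samples/s, final top-1 accuracy in %), followed by the tail of its training log.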
-
A100x8 (bs=128)
- Eager (6499.87/s, 38.21)
  00:34:26 Train: 9 [1150/1251 ( 92%)] Loss: 3.974 (3.99) Time: 0.156s, 6560.68/s (0.158s, 6497.20/s) LR: 1.225e-03 Data: 0.008 (0.008)
  00:34:34 Train: 9 [1200/1251 ( 96%)] Loss: 3.920 (3.98) Time: 0.156s, 6561.68/s (0.158s, 6498.66/s) LR: 1.225e-03 Data: 0.008 (0.008)
  00:34:42 Train: 9 [1250/1251 (100%)] Loss: 3.985 (3.98) Time: 0.148s, 6925.95/s (0.158s, 6499.87/s) LR: 1.225e-03 Data: 0.000 (0.008)
  00:34:42 Distributing BatchNorm running means and vars
  00:34:43 Test: [ 0/48] Time: 0.816 (0.816) Loss: 2.0477 (2.0477) Acc@1: 56.5430 (56.5430) Acc@5: 81.6406 (81.6406)
  00:34:51 Test: [ 48/48] Time: 0.038 (0.177) Loss: 1.9639 (2.9433) Acc@1: 62.3821 (38.2100) Acc@5: 79.7170 (63.8240)
  00:34:51 Current checkpoints:
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-9.pth.tar', 38.21000001342773)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-8.pth.tar', 37.536000040283206)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-7.pth.tar', 34.857999938964845)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-6.pth.tar', 32.905999956054686)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-5.pth.tar', 29.288000021972657)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-4.pth.tar', 24.774000018310545)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-3.pth.tar', 21.122)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-2.pth.tar', 15.742000043945312)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-1.pth.tar', 10.913999971923829)
  00:34:51 ('./output/train/20220526-033005-resnet50-224/checkpoint-0.pth.tar', 6.547999993896484)
- CUDA graph (6608.18/s, 43.19)
  00:33:55 Train: 9 [1150/1251 ( 92%)] Loss: 3.615 (3.61) Time: 0.159s, 6444.43/s (0.155s, 6605.29/s) LR: 1.225e-03 Data: 0.008 (0.010)
  00:34:03 Train: 9 [1200/1251 ( 96%)] Loss: 3.517 (3.61) Time: 0.153s, 6673.32/s (0.155s, 6606.20/s) LR: 1.225e-03 Data: 0.009 (0.010)
  00:34:11 Train: 9 [1250/1251 (100%)] Loss: 3.516 (3.61) Time: 0.142s, 7218.84/s (0.155s, 6608.18/s) LR: 1.225e-03 Data: 0.000 (0.010)
  00:34:11 Distributing BatchNorm running means and vars
  00:34:12 Test: [ 0/48] Time: 0.826 (0.826) Loss: 1.7451 (1.7451) Acc@1: 62.6953 (62.6953) Acc@5: 84.5703 (84.5703)
  00:34:20 Test: [ 48/48] Time: 0.039 (0.182) Loss: 1.7677 (2.6343) Acc@1: 64.1509 (43.1880) Acc@5: 81.7217 (69.2660)
  00:34:20 Current checkpoints:
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-9.pth.tar', 43.18799994140625)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-8.pth.tar', 42.345999938964844)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-7.pth.tar', 37.76399999389648)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-6.pth.tar', 35.066000063476565)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-5.pth.tar', 32.66799999389649)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-4.pth.tar', 26.705999991455077)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-3.pth.tar', 22.881999982910155)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-2.pth.tar', 16.074000025634767)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-1.pth.tar', 12.040000034179688)
  00:34:20 ('./output/train/20220526-040504-resnet50-224/checkpoint-0.pth.tar', 6.153999999389648)
- TorchScript + nvfuser (6453.39/s, 38.77)
  00:34:42 Train: 9 [1150/1251 ( 92%)] Loss: 3.951 (3.97) Time: 0.160s, 6390.97/s (0.159s, 6449.55/s) LR: 1.225e-03 Data: 0.008 (0.009)
  00:34:50 Train: 9 [1200/1251 ( 96%)] Loss: 3.893 (3.97) Time: 0.157s, 6529.63/s (0.159s, 6451.75/s) LR: 1.225e-03 Data: 0.009 (0.009)
  00:34:58 Train: 9 [1250/1251 (100%)] Loss: 3.945 (3.96) Time: 0.148s, 6907.22/s (0.159s, 6453.39/s) LR: 1.225e-03 Data: 0.000 (0.009)
  00:34:58 Distributing BatchNorm running means and vars
  00:34:59 Test: [ 0/48] Time: 0.823 (0.823) Loss: 2.0276 (2.0276) Acc@1: 57.4219 (57.4219) Acc@5: 82.4219 (82.4219)
  00:35:07 Test: [ 48/48] Time: 0.037 (0.175) Loss: 1.9413 (2.9267) Acc@1: 61.9104 (38.7720) Acc@5: 79.3632 (64.3180)
  00:35:07 Current checkpoints:
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-9.pth.tar', 38.77200000244141)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-8.pth.tar', 38.05999997802734)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-7.pth.tar', 35.45200008666992)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-6.pth.tar', 32.83400010131836)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-5.pth.tar', 29.510000010986328)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-4.pth.tar', 25.98999998779297)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-3.pth.tar', 20.665999975585937)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-2.pth.tar', 14.357999983520507)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-1.pth.tar', 11.40200004638672)
  00:35:07 ('./output/train/20220526-051328-resnet50-224/checkpoint-0.pth.tar', 6.669999987792969)
- AOT_autograd + nvfuser (6887.96/s, 38.33)
  00:33:29 Train: 9 [1150/1251 ( 92%)] Loss: 3.975 (3.98) Time: 0.143s, 7150.03/s (0.149s, 6881.95/s) LR: 1.225e-03 Data: 0.008 (0.010)
  00:33:37 Train: 9 [1200/1251 ( 96%)] Loss: 3.922 (3.98) Time: 0.144s, 7131.44/s (0.149s, 6885.85/s) LR: 1.225e-03 Data: 0.007 (0.010)
  00:33:44 Train: 9 [1250/1251 (100%)] Loss: 3.967 (3.98) Time: 0.133s, 7699.01/s (0.149s, 6887.96/s) LR: 1.225e-03 Data: 0.000 (0.010)
  00:33:44 Distributing BatchNorm running means and vars
  00:33:45 Test: [ 0/48] Time: 0.810 (0.810) Loss: 2.0058 (2.0058) Acc@1: 57.5195 (57.5195) Acc@5: 82.4219 (82.4219)
  00:33:53 Test: [ 48/48] Time: 0.028 (0.177) Loss: 1.9564 (2.9429) Acc@1: 61.3208 (38.3280) Acc@5: 79.5991 (64.0720)
  00:33:53 Current checkpoints:
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-9.pth.tar', 38.32799994018555)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-8.pth.tar', 37.32000004516602)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-7.pth.tar', 35.39800006835937)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-6.pth.tar', 32.64800006958008)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-5.pth.tar', 29.128000045166015)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-4.pth.tar', 25.896000029296875)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-3.pth.tar', 21.98600001586914)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-2.pth.tar', 14.59600000732422)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-1.pth.tar', 10.74800001953125)
  00:33:53 ('./output/train/20220526-043928-resnet50-224/checkpoint-0.pth.tar', 6.654000009765625)
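
The pair of numbers in each bullet above appears to come straight from the log tail: the running-average throughput in the last `Train:` record and the running-average `Acc@1` in the last `Test:` record. As a rough sketch of how those could be pulled out automatically, the snippet below uses two regular expressions written against the log format shown here; the function name `summarize` and the patterns are illustrative assumptions, not part of the training script.

```python
import re

# Patterns written against the log records shown in this post (an assumption:
# other versions of the training script may format these lines differently).
TRAIN_RE = re.compile(
    r"Train: (\d+) \[\s*(\d+)/(\d+).*?"
    r"Time: [\d.]+s,\s*[\d.]+/s\s*\(([\d.]+)s,\s*([\d.]+)/s\)"
)
TEST_RE = re.compile(
    r"Test: \[\s*(\d+)/(\d+)\].*?Acc@1:\s*[\d.]+\s*\(\s*([\d.]+)\)"
)


def summarize(log_text):
    """Return (average throughput in samples/s, average top-1 in %).

    Uses the last Train record and the last Test record found in log_text.
    """
    train = list(TRAIN_RE.finditer(log_text))
    test = list(TEST_RE.finditer(log_text))
    if not train or not test:
        raise ValueError("no Train/Test records found")
    throughput = float(train[-1].group(5))  # value inside the parentheses
    top1 = float(test[-1].group(3))         # running average over the test set
    return throughput, top1


# Excerpt of the A100x8 (bs=128) eager log from this post.
SAMPLE = """
00:34:42 Train: 9 [1250/1251 (100%)] Loss: 3.985 (3.98) Time: 0.148s, 6925.95/s (0.158s, 6499.87/s) LR: 1.225e-03 Data: 0.000 (0.008)
00:34:43 Test: [ 0/48] Time: 0.816 (0.816) Loss: 2.0477 (2.0477) Acc@1: 56.5430 (56.5430) Acc@5: 81.6406 (81.6406)
00:34:51 Test: [ 48/48] Time: 0.038 (0.177) Loss: 1.9639 (2.9433) Acc@1: 62.3821 (38.2100) Acc@5: 79.7170 (63.8240)
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the excerpt above this yields 6499.87 samples/s and 38.21% top-1, matching the Eager bullet for A100x8 (bs=128).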
-
A100x8 (bs=32)
- Eager (5130.81/s, 59.50)
  00:43:45 Train: 9 [4950/5004 ( 99%)] Loss: 2.912 (3.01) Time: 0.049s, 5185.56/s (0.050s, 5133.92/s) LR: 1.225e-03 Data: 0.002 (0.003)
  00:43:47 Train: 9 [5000/5004 (100%)] Loss: 3.229 (3.02) Time: 0.049s, 5235.87/s (0.050s, 5130.71/s) LR: 1.225e-03 Data: 0.002 (0.003)
  00:43:48 Train: 9 [5003/5004 (100%)] Loss: 2.981 (3.02) Time: 0.047s, 5491.47/s (0.050s, 5130.81/s) LR: 1.225e-03 Data: 0.000 (0.003)
  00:43:48 Distributing BatchNorm running means and vars
  00:43:48 Test: [ 0/195] Time: 0.270 (0.270) Loss: 1.2109 (1.2109) Acc@1: 73.0469 (73.0469) Acc@5: 90.2344 (90.2344)
  00:43:50 Test: [ 50/195] Time: 0.016 (0.044) Loss: 1.0599 (1.4276) Acc@1: 76.5625 (66.4139) Acc@5: 92.1875 (88.1510)
  00:43:52 Test: [ 100/195] Time: 0.017 (0.045) Loss: 2.5944 (1.5885) Acc@1: 39.0625 (63.2851) Acc@5: 69.5312 (85.7673)
  00:43:54 Test: [ 150/195] Time: 0.041 (0.044) Loss: 2.1700 (1.7593) Acc@1: 60.1562 (60.3684) Acc@5: 75.0000 (82.7685)
  00:43:56 Test: [ 195/195] Time: 0.007 (0.043) Loss: 2.8764 (1.8067) Acc@1: 35.0000 (59.4980) Acc@5: 66.2500 (82.0880)
  00:43:56 Current checkpoints:
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-9.pth.tar', 59.498)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-8.pth.tar', 58.134)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-7.pth.tar', 56.354)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-6.pth.tar', 53.758)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-5.pth.tar', 51.068)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-4.pth.tar', 46.434)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-3.pth.tar', 42.02)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-2.pth.tar', 34.928)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-1.pth.tar', 23.408)
  00:43:56 ('./output/train/20220526-003803-resnet50-224/checkpoint-0.pth.tar', 11.81)
- CUDA graph (5833.69/s, 57.36)
  00:38:37 Train: 9 [4950/5004 ( 99%)] Loss: 2.909 (3.01) Time: 0.043s, 5991.50/s (0.044s, 5833.30/s) LR: 1.225e-03 Data: 0.002 (0.003)
  00:38:39 Train: 9 [5000/5004 (100%)] Loss: 3.171 (3.01) Time: 0.043s, 5973.89/s (0.044s, 5833.54/s) LR: 1.225e-03 Data: 0.002 (0.003)
  00:38:39 Train: 9 [5003/5004 (100%)] Loss: 2.864 (3.01) Time: 0.040s, 6353.20/s (0.044s, 5833.69/s) LR: 1.225e-03 Data: 0.000 (0.003)
  00:38:39 Distributing BatchNorm running means and vars
  00:38:39 Test: [ 0/195] Time: 0.305 (0.305) Loss: 1.1444 (1.1444) Acc@1: 70.7031 (70.7031) Acc@5: 92.1875 (92.1875)
  00:38:41 Test: [ 50/195] Time: 0.016 (0.045) Loss: 1.1188 (1.5279) Acc@1: 76.1719 (63.4191) Acc@5: 91.0156 (86.7264)
  00:38:43 Test: [ 100/195] Time: 0.016 (0.045) Loss: 2.3071 (1.6706) Acc@1: 52.3438 (61.0767) Acc@5: 74.2188 (84.5490)
  00:38:46 Test: [ 150/195] Time: 0.018 (0.046) Loss: 1.9372 (1.8478) Acc@1: 65.6250 (58.1100) Acc@5: 80.8594 (81.4156)
  00:38:48 Test: [ 195/195] Time: 0.007 (0.045) Loss: 2.9765 (1.8927) Acc@1: 33.7500 (57.3600) Acc@5: 62.5000 (80.6320)
  00:38:48 Current checkpoints:
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-9.pth.tar', 57.36)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-8.pth.tar', 57.068)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-7.pth.tar', 55.386)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-6.pth.tar', 51.982)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-5.pth.tar', 50.092)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-4.pth.tar', 46.658)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-3.pth.tar', 41.436)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-2.pth.tar', 34.518)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-1.pth.tar', 24.962)
  00:38:48 ('./output/train/20220526-012207-resnet50-224/checkpoint-0.pth.tar', 11.35)
- TorchScript + nvfuser (4986.46/s, 59.50)
  00:44:45 Train: 9 [4950/5004 ( 99%)] Loss: 2.912 (3.01) Time: 0.052s, 4969.42/s (0.051s, 4986.28/s) LR: 1.225e-03 Data: 0.003 (0.003)
  00:44:48 Train: 9 [5000/5004 (100%)] Loss: 3.229 (3.02) Time: 0.050s, 5086.10/s (0.051s, 4986.34/s) LR: 1.225e-03 Data: 0.003 (0.003)
  00:44:48 Train: 9 [5003/5004 (100%)] Loss: 2.981 (3.02) Time: 0.047s, 5419.52/s (0.051s, 4986.46/s) LR: 1.225e-03 Data: 0.000 (0.003)
  00:44:48 Distributing BatchNorm running means and vars
  00:44:48 Test: [ 0/195] Time: 0.286 (0.286) Loss: 1.2109 (1.2109) Acc@1: 73.0469 (73.0469) Acc@5: 90.2344 (90.2344)
  00:44:50 Test: [ 50/195] Time: 0.018 (0.045) Loss: 1.0599 (1.4276) Acc@1: 76.5625 (66.4139) Acc@5: 92.1875 (88.1510)
  00:44:52 Test: [ 100/195] Time: 0.093 (0.046) Loss: 2.5944 (1.5885) Acc@1: 39.0625 (63.2851) Acc@5: 69.5312 (85.7673)
  00:44:54 Test: [ 150/195] Time: 0.049 (0.045) Loss: 2.1700 (1.7593) Acc@1: 60.1562 (60.3684) Acc@5: 75.0000 (82.7685)
  00:44:56 Test: [ 195/195] Time: 0.006 (0.044) Loss: 2.8763 (1.8067) Acc@1: 35.0000 (59.4980) Acc@5: 66.2500 (82.0880)
  00:44:57 Current checkpoints:
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-9.pth.tar', 59.498)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-8.pth.tar', 58.134)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-7.pth.tar', 56.354)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-6.pth.tar', 53.758)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-5.pth.tar', 51.068)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-4.pth.tar', 46.434)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-3.pth.tar', 42.02)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-2.pth.tar', 34.928)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-1.pth.tar', 23.408)
  00:44:57 ('./output/train/20220526-024501-resnet50-224/checkpoint-0.pth.tar', 11.81)
- AOT_autograd + nvfuser (5228.49/s, 59.39)
  00:43:40 Train: 9 [4950/5004 ( 99%)] Loss: 2.874 (3.01) Time: 0.048s, 5374.41/s (0.049s, 5227.73/s) LR: 1.225e-03 Data: 0.003 (0.003)
  00:43:43 Train: 9 [5000/5004 (100%)] Loss: 3.190 (3.01) Time: 0.050s, 5129.23/s (0.049s, 5228.31/s) LR: 1.225e-03 Data: 0.003 (0.003)
  00:43:43 Train: 9 [5003/5004 (100%)] Loss: 3.022 (3.01) Time: 0.044s, 5769.52/s (0.049s, 5228.49/s) LR: 1.225e-03 Data: 0.000 (0.003)
  00:43:43 Distributing BatchNorm running means and vars
  00:43:43 Test: [ 0/195] Time: 0.280 (0.280) Loss: 1.1498 (1.1498) Acc@1: 75.7812 (75.7812) Acc@5: 91.0156 (91.0156)
  00:43:45 Test: [ 50/195] Time: 0.034 (0.044) Loss: 1.0834 (1.4250) Acc@1: 75.3906 (66.1075) Acc@5: 92.1875 (88.1357)
  00:43:48 Test: [ 100/195] Time: 0.014 (0.048) Loss: 2.6195 (1.5827) Acc@1: 39.4531 (63.2348) Acc@5: 69.1406 (85.8137)
  00:43:50 Test: [ 150/195] Time: 0.016 (0.047) Loss: 2.1513 (1.7496) Acc@1: 59.7656 (60.2727) Acc@5: 76.1719 (83.0220)
  00:43:52 Test: [ 195/195] Time: 0.010 (0.046) Loss: 3.1136 (1.7934) Acc@1: 31.2500 (59.3920) Acc@5: 60.0000 (82.2660)
  00:43:52 Current checkpoints:
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-9.pth.tar', 59.392)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-8.pth.tar', 58.622)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-7.pth.tar', 56.45)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-6.pth.tar', 54.104)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-5.pth.tar', 51.45)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-4.pth.tar', 46.402)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-3.pth.tar', 40.386)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-2.pth.tar', 34.326)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-1.pth.tar', 21.562)
  00:43:52 ('./output/train/20220526-020101-resnet50-224/checkpoint-0.pth.tar', 10.714)
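
The `bs` values in the group headings look like per-GPU batch sizes rather than global ones. A quick consistency check, assuming all 8 GPUs take part in data-parallel training: multiplying per-GPU batch size, GPU count, and iterations per epoch gives the same number of samples per epoch for every setup, and global batch divided by the logged average step time lands close to the logged throughput. The step times and iteration counts below are read from the logs in this post; the dictionary layout and helper names are only illustrative.

```python
NUM_GPUS = 8  # "x8" in the group headings

# Per-GPU batch size, iterations per epoch, logged average step time (s),
# and logged average throughput (samples/s) for the eager runs in this post.
EAGER_RUNS = {
    "A100x8 (bs=128)": {"bs": 128, "iters": 1251, "step_s": 0.158, "logged": 6499.87},
    "A100x8 (bs=32)": {"bs": 32, "iters": 5004, "step_s": 0.050, "logged": 5130.81},
    "V100x8 (bs=64)": {"bs": 64, "iters": 2502, "step_s": 0.199, "logged": 2573.18},
}


def samples_per_epoch(run):
    """Samples seen per epoch if `bs` is the per-GPU batch size."""
    return run["bs"] * NUM_GPUS * run["iters"]


def approx_throughput(run):
    """Global batch divided by the (rounded) logged average step time."""
    return run["bs"] * NUM_GPUS / run["step_s"]


if __name__ == "__main__":
    for name, run in EAGER_RUNS.items():
        print(f"{name}: {samples_per_epoch(run)} samples/epoch, "
              f"~{approx_throughput(run):.0f}/s vs logged {run['logged']}/s")
```

All three setups come out at 1,281,024 samples per epoch, which is consistent with an ImageNet-1k-sized training set, though the post does not name the dataset. The small gap between the estimated and logged throughput is expected because the step times in the log are rounded to three decimals.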
-
V100x8 (bs=64)
- Eager (2573.18/s, 51.29)
  01:25:18 Train: 9 [2450/2502 ( 98%)] Loss: 3.451 (3.40) Time: 0.198s, 2590.48/s (0.199s, 2572.97/s) LR: 1.225e-03 Data: 0.010 (0.009)
  01:25:28 Train: 9 [2500/2502 (100%)] Loss: 3.427 (3.40) Time: 0.196s, 2618.35/s (0.199s, 2573.12/s) LR: 1.225e-03 Data: 0.007 (0.009)
  01:25:28 Train: 9 [2501/2502 (100%)] Loss: 3.276 (3.40) Time: 0.187s, 2731.49/s (0.199s, 2573.18/s) LR: 1.225e-03 Data: 0.000 (0.009)
  01:25:28 Distributing BatchNorm running means and vars
  01:25:29 Test: [ 0/97] Time: 0.679 (0.679) Loss: 1.4868 (1.4868) Acc@1: 66.6016 (66.6016) Acc@5: 89.2578 (89.2578)
  01:25:36 Test: [ 50/97] Time: 0.064 (0.146) Loss: 2.7062 (1.9954) Acc@1: 42.1875 (54.9288) Acc@5: 68.9453 (80.1011)
  01:25:42 Test: [ 97/97] Time: 0.038 (0.134) Loss: 1.8312 (2.2331) Acc@1: 62.5000 (51.2920) Acc@5: 81.2500 (75.7080)
  01:25:42 Current checkpoints:
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-9.pth.tar', 51.29199997436523)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-8.pth.tar', 50.051999975585936)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-7.pth.tar', 47.99999998657226)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-6.pth.tar', 45.50999999023438)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-5.pth.tar', 42.39999998779297)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-4.pth.tar', 38.24799999511719)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-3.pth.tar', 31.925999998779297)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-2.pth.tar', 23.880000003662108)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-1.pth.tar', 16.486000005493164)
  01:25:42 ('./output/train/20220526-045239-resnet50-224/checkpoint-0.pth.tar', 8.647999998474122)
- CUDA graph (2653.16/s, 53.00)
  01:22:54 Train: 9 [2450/2502 ( 98%)] Loss: 3.260 (3.25) Time: 0.193s, 2651.92/s (0.193s, 2653.07/s) LR: 1.225e-03 Data: 0.008 (0.008)
  01:23:04 Train: 9 [2500/2502 (100%)] Loss: 3.289 (3.25) Time: 0.190s, 2696.94/s (0.193s, 2653.10/s) LR: 1.225e-03 Data: 0.007 (0.008)
  01:23:04 Train: 9 [2501/2502 (100%)] Loss: 3.100 (3.25) Time: 0.183s, 2793.42/s (0.193s, 2653.16/s) LR: 1.225e-03 Data: 0.000 (0.008)
  01:23:04 Distributing BatchNorm running means and vars
  01:23:05 Test: [ 0/97] Time: 0.705 (0.705) Loss: 1.3437 (1.3437) Acc@1: 68.7500 (68.7500) Acc@5: 90.8203 (90.8203)
  01:23:12 Test: [ 50/97] Time: 0.063 (0.145) Loss: 2.3391 (1.8248) Acc@1: 50.7812 (57.0887) Acc@5: 72.4609 (81.9164)
  01:23:17 Test: [ 97/97] Time: 0.039 (0.136) Loss: 1.8000 (2.0862) Acc@1: 60.1190 (53.0000) Acc@5: 80.0595 (77.4340)
  01:23:18 Current checkpoints:
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-9.pth.tar', 52.99999998413086)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-8.pth.tar', 52.142000015869144)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-7.pth.tar', 48.62399998413086)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-5.pth.tar', 44.70999998413086)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-6.pth.tar', 44.62599999023438)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-4.pth.tar', 39.216000004882815)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-3.pth.tar', 31.48200001953125)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-2.pth.tar', 25.087999985351562)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-1.pth.tar', 17.496000009155274)
  01:23:18 ('./output/train/20220526-061824-resnet50-224/checkpoint-0.pth.tar', 8.164000000305176)
- TorchScript + nvfuser (2581.52/s, 51.19)
  01:25:04 Train: 9 [2450/2502 ( 98%)] Loss: 3.466 (3.40) Time: 0.200s, 2559.94/s (0.198s, 2581.35/s) LR: 1.225e-03 Data: 0.009 (0.010)
  01:25:13 Train: 9 [2500/2502 (100%)] Loss: 3.397 (3.40) Time: 0.196s, 2608.01/s (0.198s, 2581.47/s) LR: 1.225e-03 Data: 0.007 (0.010)
  01:25:14 Train: 9 [2501/2502 (100%)] Loss: 3.272 (3.40) Time: 0.189s, 2715.12/s (0.198s, 2581.52/s) LR: 1.225e-03 Data: 0.000 (0.010)
  01:25:14 Distributing BatchNorm running means and vars
  01:25:14 Test: [ 0/97] Time: 0.712 (0.712) Loss: 1.5368 (1.5368) Acc@1: 65.2344 (65.2344) Acc@5: 88.0859 (88.0859)
  01:25:21 Test: [ 50/97] Time: 0.064 (0.145) Loss: 2.6455 (1.9945) Acc@1: 43.9453 (54.8598) Acc@5: 69.1406 (79.9709)
  01:25:27 Test: [ 97/97] Time: 0.038 (0.137) Loss: 1.8332 (2.2296) Acc@1: 62.5000 (51.1940) Acc@5: 81.2500 (75.6200)
  01:25:28 Current checkpoints:
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-9.pth.tar', 51.193999974365234)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-8.pth.tar', 50.3519999987793)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-7.pth.tar', 48.171999978027344)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-6.pth.tar', 45.637999987792966)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-5.pth.tar', 42.591999981689455)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-4.pth.tar', 38.05400000732422)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-3.pth.tar', 32.55400001953125)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-2.pth.tar', 25.975999987792967)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-1.pth.tar', 16.724000008544923)
  01:25:28 ('./output/train/20220526-090506-resnet50-224/checkpoint-0.pth.tar', 6.382000001678467)
- AOT_autograd + nvfuser (2687.08/s, 51.24)
  01:22:50 Train: 9 [2450/2502 ( 98%)] Loss: 3.415 (3.39) Time: 0.188s, 2725.32/s (0.191s, 2686.74/s) LR: 1.225e-03 Data: 0.008 (0.009)
  01:22:59 Train: 9 [2500/2502 (100%)] Loss: 3.462 (3.40) Time: 0.187s, 2738.70/s (0.191s, 2687.01/s) LR: 1.225e-03 Data: 0.007 (0.008)
  01:22:59 Train: 9 [2501/2502 (100%)] Loss: 3.279 (3.39) Time: 0.179s, 2866.71/s (0.191s, 2687.08/s) LR: 1.225e-03 Data: 0.000 (0.008)
  01:22:59 Distributing BatchNorm running means and vars
  01:23:00 Test: [ 0/97] Time: 0.688 (0.688) Loss: 1.4918 (1.4918) Acc@1: 68.1641 (68.1641) Acc@5: 88.0859 (88.0859)
  01:23:07 Test: [ 50/97] Time: 0.060 (0.145) Loss: 2.6738 (1.9861) Acc@1: 43.9453 (55.0666) Acc@5: 68.5547 (80.0743)
  01:23:13 Test: [ 97/97] Time: 0.033 (0.136) Loss: 1.8924 (2.2179) Acc@1: 61.3095 (51.2440) Acc@5: 80.3571 (76.0020)
  01:23:13 Current checkpoints:
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-9.pth.tar', 51.243999979248045)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-8.pth.tar', 50.33599998779297)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-7.pth.tar', 48.427999990234376)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-6.pth.tar', 45.630000009765624)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-5.pth.tar', 42.21799999633789)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-4.pth.tar', 38.02800000366211)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-3.pth.tar', 32.29200001708984)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-2.pth.tar', 25.220000002441406)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-1.pth.tar', 16.86400001159668)
  01:23:13 ('./output/train/20220526-074145-resnet50-224/checkpoint-0.pth.tar', 8.24599999847412)
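
To make the twelve results above easier to compare, here is a small sketch that expresses each mode relative to the eager baseline on the same hardware and batch size. The numbers are copied from the bullets in this post; the `RESULTS` layout and the `relative_to_eager` helper are illustrative names, not part of any library.

```python
# (throughput in samples/s, final top-1 in %) copied from the bullets above.
RESULTS = {
    "A100x8 (bs=128)": {
        "Eager": (6499.87, 38.21),
        "CUDA graph": (6608.18, 43.19),
        "TorchScript + nvfuser": (6453.39, 38.77),
        "AOT_autograd + nvfuser": (6887.96, 38.33),
    },
    "A100x8 (bs=32)": {
        "Eager": (5130.81, 59.50),
        "CUDA graph": (5833.69, 57.36),
        "TorchScript + nvfuser": (4986.46, 59.50),
        "AOT_autograd + nvfuser": (5228.49, 59.39),
    },
    "V100x8 (bs=64)": {
        "Eager": (2573.18, 51.29),
        "CUDA graph": (2653.16, 53.00),
        "TorchScript + nvfuser": (2581.52, 51.19),
        "AOT_autograd + nvfuser": (2687.08, 51.24),
    },
}


def relative_to_eager(group):
    """Map each non-eager mode to (throughput change in %, top-1 change in points)."""
    base_tput, base_acc = group["Eager"]
    return {
        name: (100.0 * (tput / base_tput - 1.0), acc - base_acc)
        for name, (tput, acc) in group.items()
        if name != "Eager"
    }


if __name__ == "__main__":
    for setup, group in RESULTS.items():
        print(setup)
        for name, (d_tput, d_acc) in relative_to_eager(group).items():
            print(f"  {name}: {d_tput:+.2f}% throughput, {d_acc:+.2f} top-1")
```

With these numbers, CUDA graph comes out at roughly +1.7%, +13.7%, and +3.1% throughput over eager for the three setups, and AOT_autograd + nvfuser at roughly +6.0%, +1.9%, and +4.4%. The accuracy deltas come from single 10-epoch runs, so they are best read as rough indications rather than as evidence of a real difference.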