This file has been truncated, but you can view the full file.
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 20:29:48 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 20:29:48 ] Completed importing Timer 0.020 ms, 0.00 s total | |
[ 2023-10-07 20:29:48 ] Completed importing everything else 645.807 ms, 0.65 s total | |
[ 2023-10-07 20:29:48 ] Completed defined other functions 0.021 ms, 0.65 s total | |
| distributed init (rank 5): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 0): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 20:29:51 ] Completed main preliminaries 2,672.082 ms, 3.32 s total | |
loading annotations into memory... | |
Done (t=12.39s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 20:30:06 ] Completed loading data 14,458.412 ms, 17.78 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 20:30:06 ] Completed creating data samplers 102.056 ms, 17.88 s total | |
[ 2023-10-07 20:30:06 ] Completed creating data loaders 0.209 ms, 17.88 s total | |
[ 2023-10-07 20:30:07 ] Completed creating model and .to(device) 1,784.297 ms, 19.66 s total | |
[ 2023-10-07 20:30:08 ] Completed preparing model for distributed training 391.627 ms, 20.05 s total | |
[ 2023-10-07 20:30:08 ] Completed optimizer and scaler 0.568 ms, 20.06 s total | |
[ 2023-10-07 20:30:08 ] Completed learning rate schedulers 0.123 ms, 20.06 s total | |
[ 2023-10-07 20:30:09 ] Completed init coco evaluator 966.697 ms, 21.02 s total | |
RESUMING FROM PREVIOUS JOB /mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc | |
[ 2023-10-07 20:30:10 ] Completed retrieving checkpoint 953.051 ms, 21.97 s total | |
EPOCH :: 13 | |
[ 2023-10-07 20:30:10 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 20:30:10 ] Completed training preliminaries 0.831 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 178 | |
[ 2023-10-07 20:30:10 ] Completed Epoch: 13 batch 178: moving batch data to device 263.850 ms, 0.26 s total | |
[ 2023-10-07 20:30:16 ] Completed Epoch: 13 batch 178: forward pass 5,668.097 ms, 5.93 s total | |
[ 2023-10-07 20:30:16 ] Completed Epoch: 13 batch 178: backward pass 320.484 ms, 6.25 s total | |
[ 2023-10-07 20:30:17 ] Completed Epoch: 13 batch 178: computing loss 919.551 ms, 7.17 s total | |
EPOCH: [13], BATCH: [178/889], loss: 0.353, loss_box_reg: 0.103, loss_classifier: 0.088, loss_mask: 0.126, loss_objectness: 0.012, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 178 | |
[ 2023-10-07 20:30:19 ] Completed saving temp checkpoint 1,665.854 ms, 8.84 s total | |
[ 2023-10-07 20:30:19 ] Completed replacing temp checkpoint with checkpoint 23.998 ms, 8.86 s total | |
[ 2023-10-07 20:30:19 ] Completed Epoch: 13 batch 179: moving batch data to device 18.899 ms, 8.88 s total | |
[ 2023-10-07 20:30:19 ] Completed Epoch: 13 batch 179: forward pass 306.148 ms, 9.19 s total | |
[ 2023-10-07 20:30:19 ] Completed Epoch: 13 batch 179: backward pass 149.391 ms, 9.34 s total | |
[ 2023-10-07 20:30:20 ] Completed Epoch: 13 batch 179: computing loss 1,204.551 ms, 10.54 s total | |
EPOCH: [13], BATCH: [179/889], loss: 0.363, loss_box_reg: 0.106, loss_classifier: 0.092, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 179 | |
[ 2023-10-07 20:30:21 ] Completed saving temp checkpoint 1,004.944 ms, 11.55 s total | |
[ 2023-10-07 20:30:21 ] Completed replacing temp checkpoint with checkpoint 65.612 ms, 11.61 s total | |
[ 2023-10-07 20:30:21 ] Completed Epoch: 13 batch 180: moving batch data to device 21.205 ms, 11.63 s total | |
[ 2023-10-07 20:30:22 ] Completed Epoch: 13 batch 180: forward pass 320.299 ms, 11.95 s total | |
[ 2023-10-07 20:30:22 ] Completed Epoch: 13 batch 180: backward pass 375.374 ms, 12.33 s total | |
[ 2023-10-07 20:30:23 ] Completed Epoch: 13 batch 180: computing loss 926.449 ms, 13.26 s total | |
EPOCH: [13], BATCH: [180/889], loss: 0.418, loss_box_reg: 0.119, loss_classifier: 0.102, loss_mask: 0.141, loss_objectness: 0.020, loss_rpn_box_reg: 0.036 | |
Saving checkpoint at epoch 13 train batch 180 | |
[ 2023-10-07 20:30:24 ] Completed saving temp checkpoint 1,044.228 ms, 14.30 s total | |
[ 2023-10-07 20:30:24 ] Completed replacing temp checkpoint with checkpoint 70.090 ms, 14.37 s total | |
[ 2023-10-07 20:30:24 ] Completed Epoch: 13 batch 181: moving batch data to device 20.761 ms, 14.39 s total | |
[ 2023-10-07 20:30:25 ] Completed Epoch: 13 batch 181: forward pass 333.843 ms, 14.72 s total | |
[ 2023-10-07 20:30:25 ] Completed Epoch: 13 batch 181: backward pass 81.624 ms, 14.81 s total | |
[ 2023-10-07 20:30:26 ] Completed Epoch: 13 batch 181: computing loss 1,816.710 ms, 16.62 s total | |
EPOCH: [13], BATCH: [181/889], loss: 0.402, loss_box_reg: 0.122, loss_classifier: 0.101, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 181 | |
[ 2023-10-07 20:30:28 ] Completed saving temp checkpoint 1,446.053 ms, 18.07 s total | |
[ 2023-10-07 20:30:28 ] Completed replacing temp checkpoint with checkpoint 69.901 ms, 18.14 s total | |
[ 2023-10-07 20:30:28 ] Completed Epoch: 13 batch 182: moving batch data to device 21.792 ms, 18.16 s total | |
[ 2023-10-07 20:30:28 ] Completed Epoch: 13 batch 182: forward pass 313.629 ms, 18.47 s total | |
[ 2023-10-07 20:30:29 ] Completed Epoch: 13 batch 182: backward pass 393.885 ms, 18.87 s total | |
[ 2023-10-07 20:30:30 ] Completed Epoch: 13 batch 182: computing loss 1,555.282 ms, 20.42 s total | |
EPOCH: [13], BATCH: [182/889], loss: 0.352, loss_box_reg: 0.104, loss_classifier: 0.086, loss_mask: 0.124, loss_objectness: 0.017, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 182 | |
[ 2023-10-07 20:30:31 ] Completed saving temp checkpoint 986.916 ms, 21.41 s total | |
[ 2023-10-07 20:30:31 ] Completed replacing temp checkpoint with checkpoint 67.232 ms, 21.48 s total | |
[ 2023-10-07 20:30:31 ] Completed Epoch: 13 batch 183: moving batch data to device 19.967 ms, 21.50 s total | |
[ 2023-10-07 20:30:32 ] Completed Epoch: 13 batch 183: forward pass 363.909 ms, 21.86 s total | |
[ 2023-10-07 20:30:32 ] Completed Epoch: 13 batch 183: backward pass 68.638 ms, 21.93 s total | |
[ 2023-10-07 20:30:33 ] Completed Epoch: 13 batch 183: computing loss 1,089.971 ms, 23.02 s total | |
EPOCH: [13], BATCH: [183/889], loss: 0.376, loss_box_reg: 0.113, loss_classifier: 0.088, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 183 | |
[ 2023-10-07 20:30:34 ] Completed saving temp checkpoint 1,205.219 ms, 24.23 s total | |
[ 2023-10-07 20:30:34 ] Completed replacing temp checkpoint with checkpoint 60.021 ms, 24.29 s total | |
[ 2023-10-07 20:30:34 ] Completed Epoch: 13 batch 184: moving batch data to device 22.305 ms, 24.31 s total | |
[ 2023-10-07 20:30:34 ] Completed Epoch: 13 batch 184: forward pass 327.497 ms, 24.64 s total | |
[ 2023-10-07 20:30:35 ] Completed Epoch: 13 batch 184: backward pass 87.571 ms, 24.72 s total | |
[ 2023-10-07 20:30:36 ] Completed Epoch: 13 batch 184: computing loss 1,211.946 ms, 25.93 s total | |
EPOCH: [13], BATCH: [184/889], loss: 0.419, loss_box_reg: 0.125, loss_classifier: 0.106, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 184 | |
[ 2023-10-07 20:30:37 ] Completed saving temp checkpoint 1,157.785 ms, 27.09 s total | |
[ 2023-10-07 20:30:37 ] Completed replacing temp checkpoint with checkpoint 42.763 ms, 27.14 s total | |
[ 2023-10-07 20:30:37 ] Completed Epoch: 13 batch 185: moving batch data to device 17.957 ms, 27.15 s total | |
[ 2023-10-07 20:30:37 ] Completed Epoch: 13 batch 185: forward pass 334.698 ms, 27.49 s total | |
[ 2023-10-07 20:30:37 ] Completed Epoch: 13 batch 185: backward pass 68.048 ms, 27.56 s total | |
[ 2023-10-07 20:30:39 ] Completed Epoch: 13 batch 185: computing loss 1,239.720 ms, 28.80 s total | |
EPOCH: [13], BATCH: [185/889], loss: 0.380, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.126, loss_objectness: 0.019, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 185 | |
[ 2023-10-07 20:30:39 ] Completed saving temp checkpoint 903.262 ms, 29.70 s total | |
[ 2023-10-07 20:30:40 ] Completed replacing temp checkpoint with checkpoint 58.779 ms, 29.76 s total | |
[ 2023-10-07 20:30:40 ] Completed Epoch: 13 batch 186: moving batch data to device 21.418 ms, 29.78 s total | |
[ 2023-10-07 20:30:40 ] Completed Epoch: 13 batch 186: forward pass 326.448 ms, 30.11 s total | |
[ 2023-10-07 20:30:40 ] Completed Epoch: 13 batch 186: backward pass 74.928 ms, 30.18 s total | |
[ 2023-10-07 20:30:41 ] Completed Epoch: 13 batch 186: computing loss 1,327.581 ms, 31.51 s total | |
EPOCH: [13], BATCH: [186/889], loss: 0.363, loss_box_reg: 0.110, loss_classifier: 0.088, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 186 | |
[ 2023-10-07 20:30:42 ] Completed saving temp checkpoint 1,050.988 ms, 32.56 s total | |
[ 2023-10-07 20:30:42 ] Completed replacing temp checkpoint with checkpoint 65.173 ms, 32.62 s total | |
[ 2023-10-07 20:30:42 ] Completed Epoch: 13 batch 187: moving batch data to device 24.124 ms, 32.65 s total | |
[ 2023-10-07 20:30:43 ] Completed Epoch: 13 batch 187: forward pass 326.560 ms, 32.97 s total | |
[ 2023-10-07 20:30:43 ] Completed Epoch: 13 batch 187: backward pass 42.582 ms, 33.02 s total | |
[ 2023-10-07 20:30:44 ] Completed Epoch: 13 batch 187: computing loss 1,377.614 ms, 34.39 s total | |
EPOCH: [13], BATCH: [187/889], loss: 0.379, loss_box_reg: 0.115, loss_classifier: 0.096, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 187 | |
[ 2023-10-07 20:30:46 ] Completed saving temp checkpoint 1,495.497 ms, 35.89 s total | |
[ 2023-10-07 20:30:46 ] Completed replacing temp checkpoint with checkpoint 73.748 ms, 35.96 s total | |
[ 2023-10-07 20:30:46 ] Completed Epoch: 13 batch 188: moving batch data to device 22.204 ms, 35.99 s total | |
[ 2023-10-07 20:30:46 ] Completed Epoch: 13 batch 188: forward pass 302.889 ms, 36.29 s total | |
[ 2023-10-07 20:30:46 ] Completed Epoch: 13 batch 188: backward pass 80.070 ms, 36.37 s total | |
[ 2023-10-07 20:30:47 ] Completed Epoch: 13 batch 188: computing loss 1,234.838 ms, 37.60 s total | |
EPOCH: [13], BATCH: [188/889], loss: 0.432, loss_box_reg: 0.129, loss_classifier: 0.109, loss_mask: 0.143, loss_objectness: 0.018, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 188 | |
[ 2023-10-07 20:30:49 ] Completed saving temp checkpoint 1,713.400 ms, 39.32 s total | |
[ 2023-10-07 20:30:49 ] Completed replacing temp checkpoint with checkpoint 69.660 ms, 39.39 s total | |
[ 2023-10-07 20:30:49 ] Completed Epoch: 13 batch 189: moving batch data to device 23.996 ms, 39.41 s total | |
[ 2023-10-07 20:30:49 ] Completed Epoch: 13 batch 189: forward pass 292.397 ms, 39.70 s total | |
[ 2023-10-07 20:30:50 ] Completed Epoch: 13 batch 189: backward pass 89.729 ms, 39.79 s total | |
[ 2023-10-07 20:30:51 ] Completed Epoch: 13 batch 189: computing loss 1,208.920 ms, 41.00 s total | |
EPOCH: [13], BATCH: [189/889], loss: 0.391, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 189 | |
[ 2023-10-07 20:30:52 ] Completed saving temp checkpoint 1,116.745 ms, 42.12 s total | |
[ 2023-10-07 20:30:52 ] Completed replacing temp checkpoint with checkpoint 48.592 ms, 42.17 s total | |
[ 2023-10-07 20:30:52 ] Completed Epoch: 13 batch 190: moving batch data to device 22.751 ms, 42.19 s total | |
[ 2023-10-07 20:30:52 ] Completed Epoch: 13 batch 190: forward pass 325.562 ms, 42.52 s total | |
[ 2023-10-07 20:30:52 ] Completed Epoch: 13 batch 190: backward pass 75.002 ms, 42.59 s total | |
[ 2023-10-07 20:30:54 ] Completed Epoch: 13 batch 190: computing loss 1,248.879 ms, 43.84 s total | |
EPOCH: [13], BATCH: [190/889], loss: 0.362, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.013, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 190 | |
[ 2023-10-07 20:30:55 ] Completed saving temp checkpoint 1,150.627 ms, 44.99 s total | |
[ 2023-10-07 20:30:55 ] Completed replacing temp checkpoint with checkpoint 43.850 ms, 45.03 s total | |
[ 2023-10-07 20:30:55 ] Completed Epoch: 13 batch 191: moving batch data to device 22.360 ms, 45.06 s total | |
[ 2023-10-07 20:30:55 ] Completed Epoch: 13 batch 191: forward pass 310.296 ms, 45.37 s total | |
[ 2023-10-07 20:30:55 ] Completed Epoch: 13 batch 191: backward pass 71.113 ms, 45.44 s total | |
[ 2023-10-07 20:30:57 ] Completed Epoch: 13 batch 191: computing loss 1,687.658 ms, 47.13 s total | |
EPOCH: [13], BATCH: [191/889], loss: 0.388, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 191 | |
[ 2023-10-07 20:30:59 ] Completed saving temp checkpoint 1,601.125 ms, 48.73 s total | |
[ 2023-10-07 20:30:59 ] Completed replacing temp checkpoint with checkpoint 88.879 ms, 48.82 s total | |
[ 2023-10-07 20:30:59 ] Completed Epoch: 13 batch 192: moving batch data to device 20.862 ms, 48.84 s total | |
[ 2023-10-07 20:30:59 ] Completed Epoch: 13 batch 192: forward pass 291.588 ms, 49.13 s total | |
[ 2023-10-07 20:30:59 ] Completed Epoch: 13 batch 192: backward pass 94.742 ms, 49.22 s total | |
[ 2023-10-07 20:31:00 ] Completed Epoch: 13 batch 192: computing loss 1,408.642 ms, 50.63 s total | |
EPOCH: [13], BATCH: [192/889], loss: 0.375, loss_box_reg: 0.111, loss_classifier: 0.096, loss_mask: 0.124, loss_objectness: 0.014, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 192 | |
[ 2023-10-07 20:31:01 ] Completed saving temp checkpoint 1,016.514 ms, 51.65 s total | |
[ 2023-10-07 20:31:01 ] Completed replacing temp checkpoint with checkpoint 63.104 ms, 51.71 s total | |
[ 2023-10-07 20:31:02 ] Completed Epoch: 13 batch 193: moving batch data to device 24.064 ms, 51.74 s total | |
[ 2023-10-07 20:31:02 ] Completed Epoch: 13 batch 193: forward pass 330.109 ms, 52.07 s total | |
[ 2023-10-07 20:31:02 ] Completed Epoch: 13 batch 193: backward pass 391.592 ms, 52.46 s total | |
[ 2023-10-07 20:31:03 ] Completed Epoch: 13 batch 193: computing loss 918.971 ms, 53.38 s total | |
EPOCH: [13], BATCH: [193/889], loss: 0.404, loss_box_reg: 0.125, loss_classifier: 0.099, loss_mask: 0.138, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 193 | |
[ 2023-10-07 20:31:04 ] Completed saving temp checkpoint 1,005.405 ms, 54.38 s total | |
[ 2023-10-07 20:31:04 ] Completed replacing temp checkpoint with checkpoint 58.940 ms, 54.44 s total | |
[ 2023-10-07 20:31:04 ] Completed Epoch: 13 batch 194: moving batch data to device 21.607 ms, 54.46 s total | |
[ 2023-10-07 20:31:05 ] Completed Epoch: 13 batch 194: forward pass 339.481 ms, 54.80 s total | |
[ 2023-10-07 20:31:05 ] Completed Epoch: 13 batch 194: backward pass 80.159 ms, 54.88 s total | |
[ 2023-10-07 20:31:06 ] Completed Epoch: 13 batch 194: computing loss 1,244.043 ms, 56.13 s total | |
EPOCH: [13], BATCH: [194/889], loss: 0.406, loss_box_reg: 0.117, loss_classifier: 0.107, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 194 | |
[ 2023-10-07 20:31:07 ] Completed saving temp checkpoint 1,189.988 ms, 57.32 s total | |
[ 2023-10-07 20:31:07 ] Completed replacing temp checkpoint with checkpoint 64.686 ms, 57.38 s total | |
[ 2023-10-07 20:31:07 ] Completed Epoch: 13 batch 195: moving batch data to device 22.344 ms, 57.40 s total | |
[ 2023-10-07 20:31:08 ] Completed Epoch: 13 batch 195: forward pass 318.383 ms, 57.72 s total | |
[ 2023-10-07 20:31:08 ] Completed Epoch: 13 batch 195: backward pass 55.502 ms, 57.78 s total | |
[ 2023-10-07 20:31:09 ] Completed Epoch: 13 batch 195: computing loss 1,191.752 ms, 58.97 s total | |
EPOCH: [13], BATCH: [195/889], loss: 0.402, loss_box_reg: 0.124, loss_classifier: 0.107, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 195 | |
[ 2023-10-07 20:31:10 ] Completed saving temp checkpoint 1,045.894 ms, 60.01 s total | |
[ 2023-10-07 20:31:10 ] Completed replacing temp checkpoint with checkpoint 38.715 ms, 60.05 s total | |
[ 2023-10-07 20:31:10 ] Completed Epoch: 13 batch 196: moving batch data to device 20.997 ms, 60.07 s total | |
[ 2023-10-07 20:31:10 ] Completed Epoch: 13 batch 196: forward pass 318.081 ms, 60.39 s total | |
[ 2023-10-07 20:31:10 ] Completed Epoch: 13 batch 196: backward pass 58.734 ms, 60.45 s total | |
[ 2023-10-07 20:31:12 ] Completed Epoch: 13 batch 196: computing loss 1,523.792 ms, 61.97 s total | |
EPOCH: [13], BATCH: [196/889], loss: 0.385, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 196 | |
[ 2023-10-07 20:31:13 ] Completed saving temp checkpoint 1,128.574 ms, 63.10 s total | |
[ 2023-10-07 20:31:13 ] Completed replacing temp checkpoint with checkpoint 67.397 ms, 63.17 s total | |
[ 2023-10-07 20:31:13 ] Completed Epoch: 13 batch 197: moving batch data to device 22.168 ms, 63.19 s total | |
[ 2023-10-07 20:31:13 ] Completed Epoch: 13 batch 197: forward pass 307.550 ms, 63.50 s total | |
[ 2023-10-07 20:31:13 ] Completed Epoch: 13 batch 197: backward pass 55.182 ms, 63.56 s total | |
[ 2023-10-07 20:31:15 ] Completed Epoch: 13 batch 197: computing loss 1,501.842 ms, 65.06 s total | |
EPOCH: [13], BATCH: [197/889], loss: 0.368, loss_box_reg: 0.106, loss_classifier: 0.089, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 197 | |
[ 2023-10-07 20:31:16 ] Completed saving temp checkpoint 1,358.905 ms, 66.42 s total | |
[ 2023-10-07 20:31:16 ] Completed replacing temp checkpoint with checkpoint 50.451 ms, 66.47 s total | |
[ 2023-10-07 20:31:16 ] Completed Epoch: 13 batch 198: moving batch data to device 22.912 ms, 66.49 s total | |
[ 2023-10-07 20:31:17 ] Completed Epoch: 13 batch 198: forward pass 361.130 ms, 66.85 s total | |
[ 2023-10-07 20:31:17 ] Completed Epoch: 13 batch 198: backward pass 107.381 ms, 66.96 s total | |
[ 2023-10-07 20:31:18 ] Completed Epoch: 13 batch 198: computing loss 1,314.248 ms, 68.27 s total | |
EPOCH: [13], BATCH: [198/889], loss: 0.396, loss_box_reg: 0.121, loss_classifier: 0.101, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 198 | |
[ 2023-10-07 20:31:19 ] Completed saving temp checkpoint 989.193 ms, 69.26 s total | |
[ 2023-10-07 20:31:19 ] Completed replacing temp checkpoint with checkpoint 50.916 ms, 69.31 s total | |
[ 2023-10-07 20:31:19 ] Completed Epoch: 13 batch 199: moving batch data to device 21.029 ms, 69.33 s total | |
[ 2023-10-07 20:31:19 ] Completed Epoch: 13 batch 199: forward pass 338.646 ms, 69.67 s total | |
[ 2023-10-07 20:31:20 ] Completed Epoch: 13 batch 199: backward pass 69.430 ms, 69.74 s total | |
[ 2023-10-07 20:31:21 ] Completed Epoch: 13 batch 199: computing loss 1,041.847 ms, 70.78 s total | |
EPOCH: [13], BATCH: [199/889], loss: 0.360, loss_box_reg: 0.106, loss_classifier: 0.088, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 199 | |
[ 2023-10-07 20:31:22 ] Completed saving temp checkpoint 1,154.441 ms, 71.94 s total | |
[ 2023-10-07 20:31:22 ] Completed replacing temp checkpoint with checkpoint 39.406 ms, 71.98 s total | |
[ 2023-10-07 20:31:22 ] Completed Epoch: 13 batch 200: moving batch data to device 21.565 ms, 72.00 s total | |
[ 2023-10-07 20:31:22 ] Completed Epoch: 13 batch 200: forward pass 313.716 ms, 72.31 s total | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 20:53:06 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 20:53:06 ] Completed importing Timer 0.027 ms, 0.00 s total | |
[ 2023-10-07 20:53:07 ] Completed importing everything else 707.623 ms, 0.71 s total | |
[ 2023-10-07 20:53:07 ] Completed defined other functions 0.026 ms, 0.71 s total | |
| distributed init (rank 1): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 0): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 20:53:10 ] Completed main preliminaries 2,901.928 ms, 3.61 s total | |
loading annotations into memory... | |
Done (t=11.30s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 20:53:23 ] Completed loading data 13,135.575 ms, 16.75 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 20:53:23 ] Completed creating data samplers 100.083 ms, 16.85 s total | |
[ 2023-10-07 20:53:23 ] Completed creating data loaders 0.231 ms, 16.85 s total | |
[ 2023-10-07 20:53:24 ] Completed creating model and .to(device) 664.227 ms, 17.51 s total | |
[ 2023-10-07 20:53:26 ] Completed preparing model for distributed training 2,263.756 ms, 19.77 s total | |
[ 2023-10-07 20:53:26 ] Completed optimizer and scaler 0.553 ms, 19.77 s total | |
[ 2023-10-07 20:53:26 ] Completed learning rate schedulers 0.148 ms, 19.77 s total | |
[ 2023-10-07 20:53:27 ] Completed init coco evaluator 986.679 ms, 20.76 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 20:53:28 ] Completed retrieving checkpoint 815.101 ms, 21.58 s total | |
EPOCH :: 13 | |
[ 2023-10-07 20:53:28 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 20:53:28 ] Completed training preliminaries 0.880 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 200 | |
[ 2023-10-07 20:53:28 ] Completed Epoch: 13 batch 200: moving batch data to device 291.686 ms, 0.29 s total | |
[ 2023-10-07 20:53:34 ] Completed Epoch: 13 batch 200: forward pass 6,033.523 ms, 6.33 s total | |
[ 2023-10-07 20:53:34 ] Completed Epoch: 13 batch 200: backward pass 145.206 ms, 6.47 s total | |
[ 2023-10-07 20:53:36 ] Completed Epoch: 13 batch 200: computing loss 1,173.047 ms, 7.64 s total | |
EPOCH: [13], BATCH: [200/889], loss: 0.415, loss_box_reg: 0.128, loss_classifier: 0.106, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 200 | |
[ 2023-10-07 20:53:39 ] Completed saving temp checkpoint 3,270.844 ms, 10.92 s total | |
[ 2023-10-07 20:53:39 ] Completed replacing temp checkpoint with checkpoint 153.115 ms, 11.07 s total | |
[ 2023-10-07 20:53:39 ] Completed Epoch: 13 batch 201: moving batch data to device 2.834 ms, 11.07 s total | |
[ 2023-10-07 20:53:39 ] Completed Epoch: 13 batch 201: forward pass 431.358 ms, 11.50 s total | |
[ 2023-10-07 20:53:39 ] Completed Epoch: 13 batch 201: backward pass 63.226 ms, 11.57 s total | |
[ 2023-10-07 20:53:42 ] Completed Epoch: 13 batch 201: computing loss 2,385.076 ms, 13.95 s total | |
EPOCH: [13], BATCH: [201/889], loss: 0.350, loss_box_reg: 0.106, loss_classifier: 0.087, loss_mask: 0.122, loss_objectness: 0.012, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 201 | |
[ 2023-10-07 20:53:44 ] Completed saving temp checkpoint 1,796.375 ms, 15.75 s total | |
[ 2023-10-07 20:53:44 ] Completed replacing temp checkpoint with checkpoint 843.611 ms, 16.59 s total | |
[ 2023-10-07 20:53:45 ] Completed Epoch: 13 batch 202: moving batch data to device 63.811 ms, 16.65 s total | |
[ 2023-10-07 20:53:45 ] Completed Epoch: 13 batch 202: forward pass 458.349 ms, 17.11 s total | |
[ 2023-10-07 20:53:45 ] Completed Epoch: 13 batch 202: backward pass 356.659 ms, 17.47 s total | |
[ 2023-10-07 20:53:47 ] Completed Epoch: 13 batch 202: computing loss 1,526.293 ms, 19.00 s total | |
EPOCH: [13], BATCH: [202/889], loss: 0.393, loss_box_reg: 0.122, loss_classifier: 0.097, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 202 | |
[ 2023-10-07 20:53:49 ] Completed saving temp checkpoint 2,271.994 ms, 21.27 s total | |
[ 2023-10-07 20:53:49 ] Completed replacing temp checkpoint with checkpoint 54.271 ms, 21.32 s total | |
[ 2023-10-07 20:53:49 ] Completed Epoch: 13 batch 203: moving batch data to device 4.006 ms, 21.33 s total | |
[ 2023-10-07 20:53:50 ] Completed Epoch: 13 batch 203: forward pass 448.070 ms, 21.77 s total | |
[ 2023-10-07 20:53:50 ] Completed Epoch: 13 batch 203: backward pass 395.800 ms, 22.17 s total | |
[ 2023-10-07 20:53:52 ] Completed Epoch: 13 batch 203: computing loss 1,767.707 ms, 23.94 s total | |
EPOCH: [13], BATCH: [203/889], loss: 0.370, loss_box_reg: 0.115, loss_classifier: 0.089, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 203 | |
[ 2023-10-07 20:53:54 ] Completed saving temp checkpoint 2,079.378 ms, 26.02 s total | |
[ 2023-10-07 20:53:54 ] Completed replacing temp checkpoint with checkpoint 81.680 ms, 26.10 s total | |
[ 2023-10-07 20:53:54 ] Completed Epoch: 13 batch 204: moving batch data to device 4.535 ms, 26.10 s total | |
[ 2023-10-07 20:53:54 ] Completed Epoch: 13 batch 204: forward pass 432.856 ms, 26.54 s total | |
[ 2023-10-07 20:53:55 ] Completed Epoch: 13 batch 204: backward pass 73.450 ms, 26.61 s total | |
[ 2023-10-07 20:53:57 ] Completed Epoch: 13 batch 204: computing loss 2,022.677 ms, 28.63 s total | |
EPOCH: [13], BATCH: [204/889], loss: 0.399, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 204 | |
[ 2023-10-07 20:53:59 ] Completed saving temp checkpoint 2,925.630 ms, 31.56 s total | |
[ 2023-10-07 20:54:01 ] Completed replacing temp checkpoint with checkpoint 1,262.232 ms, 32.82 s total | |
[ 2023-10-07 20:54:01 ] Completed Epoch: 13 batch 205: moving batch data to device 3.743 ms, 32.82 s total | |
[ 2023-10-07 20:54:01 ] Completed Epoch: 13 batch 205: forward pass 444.860 ms, 33.27 s total | |
[ 2023-10-07 20:54:01 ] Completed Epoch: 13 batch 205: backward pass 79.133 ms, 33.35 s total | |
[ 2023-10-07 20:54:03 ] Completed Epoch: 13 batch 205: computing loss 2,026.958 ms, 35.37 s total | |
EPOCH: [13], BATCH: [205/889], loss: 0.387, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 205 | |
[ 2023-10-07 20:54:05 ] Completed saving temp checkpoint 1,500.724 ms, 36.88 s total | |
[ 2023-10-07 20:54:05 ] Completed replacing temp checkpoint with checkpoint 601.770 ms, 37.48 s total | |
[ 2023-10-07 20:54:05 ] Completed Epoch: 13 batch 206: moving batch data to device 4.351 ms, 37.48 s total | |
[ 2023-10-07 20:54:06 ] Completed Epoch: 13 batch 206: forward pass 442.955 ms, 37.92 s total | |
[ 2023-10-07 20:54:06 ] Completed Epoch: 13 batch 206: backward pass 66.843 ms, 37.99 s total | |
[ 2023-10-07 20:54:08 ] Completed Epoch: 13 batch 206: computing loss 2,003.969 ms, 40.00 s total | |
EPOCH: [13], BATCH: [206/889], loss: 0.384, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 206 | |
[ 2023-10-07 20:54:09 ] Completed saving temp checkpoint 1,223.861 ms, 41.22 s total | |
[ 2023-10-07 20:54:10 ] Completed replacing temp checkpoint with checkpoint 599.129 ms, 41.82 s total | |
[ 2023-10-07 20:54:10 ] Completed Epoch: 13 batch 207: moving batch data to device 10.081 ms, 41.83 s total | |
[ 2023-10-07 20:54:10 ] Completed Epoch: 13 batch 207: forward pass 427.359 ms, 42.26 s total | |
[ 2023-10-07 20:54:10 ] Completed Epoch: 13 batch 207: backward pass 87.673 ms, 42.34 s total | |
[ 2023-10-07 20:54:12 ] Completed Epoch: 13 batch 207: computing loss 1,680.482 ms, 44.02 s total | |
EPOCH: [13], BATCH: [207/889], loss: 0.385, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 207 | |
[ 2023-10-07 20:54:14 ] Completed saving temp checkpoint 2,095.066 ms, 46.12 s total | |
[ 2023-10-07 20:54:14 ] Completed replacing temp checkpoint with checkpoint 39.957 ms, 46.16 s total | |
[ 2023-10-07 20:54:14 ] Completed Epoch: 13 batch 208: moving batch data to device 9.064 ms, 46.17 s total | |
[ 2023-10-07 20:54:15 ] Completed Epoch: 13 batch 208: forward pass 457.951 ms, 46.63 s total | |
[ 2023-10-07 20:54:15 ] Completed Epoch: 13 batch 208: backward pass 43.512 ms, 46.67 s total | |
[ 2023-10-07 20:54:17 ] Completed Epoch: 13 batch 208: computing loss 2,385.377 ms, 49.05 s total | |
EPOCH: [13], BATCH: [208/889], loss: 0.406, loss_box_reg: 0.126, loss_classifier: 0.105, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 208 | |
[ 2023-10-07 20:54:19 ] Completed saving temp checkpoint 2,141.603 ms, 51.20 s total | |
[ 2023-10-07 20:54:20 ] Completed replacing temp checkpoint with checkpoint 1,334.263 ms, 52.53 s total | |
[ 2023-10-07 20:54:20 ] Completed Epoch: 13 batch 209: moving batch data to device 9.119 ms, 52.54 s total | |
[ 2023-10-07 20:54:21 ] Completed Epoch: 13 batch 209: forward pass 378.244 ms, 52.92 s total | |
[ 2023-10-07 20:54:21 ] Completed Epoch: 13 batch 209: backward pass 121.692 ms, 53.04 s total | |
[ 2023-10-07 20:54:21 ] Completed Epoch: 13 batch 209: computing loss 300.367 ms, 53.34 s total | |
EPOCH: [13], BATCH: [209/889], loss: 0.393, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 209 | |
[ 2023-10-07 20:54:23 ] Completed saving temp checkpoint 1,595.424 ms, 54.94 s total | |
[ 2023-10-07 20:54:23 ] Completed replacing temp checkpoint with checkpoint 55.819 ms, 54.99 s total | |
[ 2023-10-07 20:54:23 ] Completed Epoch: 13 batch 210: moving batch data to device 6.597 ms, 55.00 s total | |
[ 2023-10-07 20:54:23 ] Completed Epoch: 13 batch 210: forward pass 432.733 ms, 55.43 s total | |
[ 2023-10-07 20:54:23 ] Completed Epoch: 13 batch 210: backward pass 68.437 ms, 55.50 s total | |
[ 2023-10-07 20:54:25 ] Completed Epoch: 13 batch 210: computing loss 1,396.526 ms, 56.90 s total | |
EPOCH: [13], BATCH: [210/889], loss: 0.393, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 210 | |
[ 2023-10-07 20:54:27 ] Completed saving temp checkpoint 1,792.026 ms, 58.69 s total | |
[ 2023-10-07 20:54:27 ] Completed replacing temp checkpoint with checkpoint 114.719 ms, 58.80 s total | |
[ 2023-10-07 20:54:27 ] Completed Epoch: 13 batch 211: moving batch data to device 5.653 ms, 58.81 s total | |
[ 2023-10-07 20:54:27 ] Completed Epoch: 13 batch 211: forward pass 137.900 ms, 58.95 s total | |
[ 2023-10-07 20:54:27 ] Completed Epoch: 13 batch 211: backward pass 90.613 ms, 59.04 s total | |
[ 2023-10-07 20:54:27 ] Completed Epoch: 13 batch 211: computing loss 137.834 ms, 59.17 s total | |
EPOCH: [13], BATCH: [211/889], loss: 0.390, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 211 | |
[ 2023-10-07 20:54:28 ] Completed saving temp checkpoint 1,198.300 ms, 60.37 s total | |
[ 2023-10-07 20:54:28 ] Completed replacing temp checkpoint with checkpoint 50.485 ms, 60.42 s total | |
[ 2023-10-07 20:54:28 ] Completed Epoch: 13 batch 212: moving batch data to device 4.321 ms, 60.43 s total | |
[ 2023-10-07 20:54:28 ] Completed Epoch: 13 batch 212: forward pass 112.222 ms, 60.54 s total | |
[ 2023-10-07 20:54:29 ] Completed Epoch: 13 batch 212: backward pass 120.474 ms, 60.66 s total | |
[ 2023-10-07 20:54:29 ] Completed Epoch: 13 batch 212: computing loss 86.290 ms, 60.75 s total | |
EPOCH: [13], BATCH: [212/889], loss: 0.384, loss_box_reg: 0.115, loss_classifier: 0.091, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 212 | |
[ 2023-10-07 20:54:30 ] Completed saving temp checkpoint 1,270.790 ms, 62.02 s total | |
[ 2023-10-07 20:54:30 ] Completed replacing temp checkpoint with checkpoint 64.948 ms, 62.08 s total | |
[ 2023-10-07 20:54:30 ] Completed Epoch: 13 batch 213: moving batch data to device 4.548 ms, 62.09 s total | |
[ 2023-10-07 20:54:30 ] Completed Epoch: 13 batch 213: forward pass 107.804 ms, 62.19 s total | |
[ 2023-10-07 20:54:30 ] Completed Epoch: 13 batch 213: backward pass 78.285 ms, 62.27 s total | |
[ 2023-10-07 20:54:30 ] Completed Epoch: 13 batch 213: computing loss 117.143 ms, 62.39 s total | |
EPOCH: [13], BATCH: [213/889], loss: 0.396, loss_box_reg: 0.124, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 213 | |
[ 2023-10-07 20:54:31 ] Completed saving temp checkpoint 1,135.718 ms, 63.53 s total | |
[ 2023-10-07 20:54:31 ] Completed replacing temp checkpoint with checkpoint 54.082 ms, 63.58 s total | |
[ 2023-10-07 20:54:31 ] Completed Epoch: 13 batch 214: moving batch data to device 6.147 ms, 63.59 s total | |
[ 2023-10-07 20:54:32 ] Completed Epoch: 13 batch 214: forward pass 112.651 ms, 63.70 s total | |
[ 2023-10-07 20:54:32 ] Completed Epoch: 13 batch 214: backward pass 78.181 ms, 63.78 s total | |
[ 2023-10-07 20:54:32 ] Completed Epoch: 13 batch 214: computing loss 116.033 ms, 63.89 s total | |
EPOCH: [13], BATCH: [214/889], loss: 0.406, loss_box_reg: 0.121, loss_classifier: 0.102, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 214 | |
[ 2023-10-07 20:54:33 ] Completed saving temp checkpoint 1,172.984 ms, 65.07 s total | |
[ 2023-10-07 20:54:33 ] Completed replacing temp checkpoint with checkpoint 53.118 ms, 65.12 s total | |
[ 2023-10-07 20:54:33 ] Completed Epoch: 13 batch 215: moving batch data to device 7.978 ms, 65.13 s total | |
[ 2023-10-07 20:54:33 ] Completed Epoch: 13 batch 215: forward pass 113.263 ms, 65.24 s total | |
[ 2023-10-07 20:54:33 ] Completed Epoch: 13 batch 215: backward pass 118.975 ms, 65.36 s total | |
[ 2023-10-07 20:54:33 ] Completed Epoch: 13 batch 215: computing loss 84.587 ms, 65.44 s total | |
EPOCH: [13], BATCH: [215/889], loss: 0.421, loss_box_reg: 0.127, loss_classifier: 0.103, loss_mask: 0.142, loss_objectness: 0.018, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 215 | |
[ 2023-10-07 20:54:34 ] Completed saving temp checkpoint 1,120.460 ms, 66.56 s total | |
[ 2023-10-07 20:54:35 ] Completed replacing temp checkpoint with checkpoint 61.744 ms, 66.63 s total | |
[ 2023-10-07 20:54:35 ] Completed Epoch: 13 batch 216: moving batch data to device 7.009 ms, 66.63 s total | |
[ 2023-10-07 20:54:35 ] Completed Epoch: 13 batch 216: forward pass 403.194 ms, 67.04 s total | |
[ 2023-10-07 20:54:35 ] Completed Epoch: 13 batch 216: backward pass 113.931 ms, 67.15 s total | |
[ 2023-10-07 20:54:36 ] Completed Epoch: 13 batch 216: computing loss 958.208 ms, 68.11 s total | |
EPOCH: [13], BATCH: [216/889], loss: 0.406, loss_box_reg: 0.121, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 216 | |
[ 2023-10-07 20:54:37 ] Completed saving temp checkpoint 1,165.457 ms, 69.27 s total | |
[ 2023-10-07 20:54:37 ] Completed replacing temp checkpoint with checkpoint 66.807 ms, 69.34 s total | |
[ 2023-10-07 20:54:37 ] Completed Epoch: 13 batch 217: moving batch data to device 10.224 ms, 69.35 s total | |
[ 2023-10-07 20:54:37 ] Completed Epoch: 13 batch 217: forward pass 107.063 ms, 69.46 s total | |
[ 2023-10-07 20:54:37 ] Completed Epoch: 13 batch 217: backward pass 82.125 ms, 69.54 s total | |
[ 2023-10-07 20:54:38 ] Completed Epoch: 13 batch 217: computing loss 99.647 ms, 69.64 s total | |
EPOCH: [13], BATCH: [217/889], loss: 0.404, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.138, loss_objectness: 0.020, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 217 | |
[ 2023-10-07 20:54:39 ] Completed saving temp checkpoint 1,103.708 ms, 70.74 s total | |
[ 2023-10-07 20:54:39 ] Completed replacing temp checkpoint with checkpoint 59.976 ms, 70.80 s total | |
[ 2023-10-07 20:54:39 ] Completed Epoch: 13 batch 218: moving batch data to device 5.522 ms, 70.81 s total | |
[ 2023-10-07 20:54:39 ] Completed Epoch: 13 batch 218: forward pass 115.590 ms, 70.92 s total | |
[ 2023-10-07 20:54:39 ] Completed Epoch: 13 batch 218: backward pass 96.852 ms, 71.02 s total | |
[ 2023-10-07 20:54:39 ] Completed Epoch: 13 batch 218: computing loss 132.900 ms, 71.15 s total | |
EPOCH: [13], BATCH: [218/889], loss: 0.411, loss_box_reg: 0.123, loss_classifier: 0.106, loss_mask: 0.132, loss_objectness: 0.020, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 218 | |
[ 2023-10-07 20:54:40 ] Completed saving temp checkpoint 1,339.643 ms, 72.49 s total | |
[ 2023-10-07 20:54:40 ] Completed replacing temp checkpoint with checkpoint 62.063 ms, 72.56 s total | |
[ 2023-10-07 20:54:40 ] Completed Epoch: 13 batch 219: moving batch data to device 23.736 ms, 72.58 s total | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 21:21:31 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 21:21:31 ] Completed importing Timer 0.026 ms, 0.00 s total | |
[ 2023-10-07 21:21:32 ] Completed importing everything else 510.624 ms, 0.51 s total | |
[ 2023-10-07 21:21:32 ] Completed defined other functions 0.025 ms, 0.51 s total | |
| distributed init (rank 4): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 5): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 21:21:35 ] Completed main preliminaries 3,051.202 ms, 3.56 s total | |
loading annotations into memory... | |
Done (t=12.83s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 21:21:50 ] Completed loading data 14,825.410 ms, 18.39 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 21:21:50 ] Completed creating data samplers 107.380 ms, 18.49 s total | |
[ 2023-10-07 21:21:50 ] Completed creating data loaders 0.240 ms, 18.49 s total | |
[ 2023-10-07 21:21:50 ] Completed creating model and .to(device) 731.898 ms, 19.23 s total | |
[ 2023-10-07 21:21:52 ] Completed preparing model for distributed training 1,281.266 ms, 20.51 s total | |
[ 2023-10-07 21:21:52 ] Completed optimizer and scaler 0.557 ms, 20.51 s total | |
[ 2023-10-07 21:21:52 ] Completed learning rate schedulers 0.211 ms, 20.51 s total | |
[ 2023-10-07 21:21:53 ] Completed init coco evaluator 967.945 ms, 21.48 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 21:21:54 ] Completed retrieving checkpoint 974.301 ms, 22.45 s total | |
EPOCH :: 13 | |
[ 2023-10-07 21:21:54 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 21:21:54 ] Completed training preliminaries 0.876 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 219 | |
[ 2023-10-07 21:21:54 ] Completed Epoch: 13 batch 219: moving batch data to device 243.872 ms, 0.24 s total | |
[ 2023-10-07 21:21:59 ] Completed Epoch: 13 batch 219: forward pass 5,249.355 ms, 5.49 s total | |
[ 2023-10-07 21:21:59 ] Completed Epoch: 13 batch 219: backward pass 266.134 ms, 5.76 s total | |
[ 2023-10-07 21:22:00 ] Completed Epoch: 13 batch 219: computing loss 1,030.547 ms, 6.79 s total | |
EPOCH: [13], BATCH: [219/889], loss: 0.361, loss_box_reg: 0.106, loss_classifier: 0.082, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 219 | |
[ 2023-10-07 21:22:02 ] Completed saving temp checkpoint 2,039.547 ms, 8.83 s total | |
[ 2023-10-07 21:22:03 ] Completed replacing temp checkpoint with checkpoint 160.846 ms, 8.99 s total | |
[ 2023-10-07 21:22:03 ] Completed Epoch: 13 batch 220: moving batch data to device 63.593 ms, 9.05 s total | |
[ 2023-10-07 21:22:03 ] Completed Epoch: 13 batch 220: forward pass 342.548 ms, 9.40 s total | |
[ 2023-10-07 21:22:03 ] Completed Epoch: 13 batch 220: backward pass 412.471 ms, 9.81 s total | |
[ 2023-10-07 21:22:05 ] Completed Epoch: 13 batch 220: computing loss 1,678.773 ms, 11.49 s total | |
EPOCH: [13], BATCH: [220/889], loss: 0.377, loss_box_reg: 0.119, loss_classifier: 0.096, loss_mask: 0.124, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 220 | |
[ 2023-10-07 21:22:06 ] Completed saving temp checkpoint 1,401.175 ms, 12.89 s total | |
[ 2023-10-07 21:22:07 ] Completed replacing temp checkpoint with checkpoint 45.700 ms, 12.94 s total | |
[ 2023-10-07 21:22:07 ] Completed Epoch: 13 batch 221: moving batch data to device 19.078 ms, 12.95 s total | |
[ 2023-10-07 21:22:07 ] Completed Epoch: 13 batch 221: forward pass 326.889 ms, 13.28 s total | |
[ 2023-10-07 21:22:07 ] Completed Epoch: 13 batch 221: backward pass 40.970 ms, 13.32 s total | |
[ 2023-10-07 21:22:08 ] Completed Epoch: 13 batch 221: computing loss 1,274.695 ms, 14.60 s total | |
EPOCH: [13], BATCH: [221/889], loss: 0.378, loss_box_reg: 0.109, loss_classifier: 0.099, loss_mask: 0.135, loss_objectness: 0.016, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 221 | |
[ 2023-10-07 21:22:09 ] Completed saving temp checkpoint 1,171.356 ms, 15.77 s total | |
[ 2023-10-07 21:22:09 ] Completed replacing temp checkpoint with checkpoint 77.042 ms, 15.85 s total | |
[ 2023-10-07 21:22:09 ] Completed Epoch: 13 batch 222: moving batch data to device 24.459 ms, 15.87 s total | |
[ 2023-10-07 21:22:10 ] Completed Epoch: 13 batch 222: forward pass 349.117 ms, 16.22 s total | |
[ 2023-10-07 21:22:10 ] Completed Epoch: 13 batch 222: backward pass 50.014 ms, 16.27 s total | |
[ 2023-10-07 21:22:11 ] Completed Epoch: 13 batch 222: computing loss 1,275.117 ms, 17.54 s total | |
EPOCH: [13], BATCH: [222/889], loss: 0.389, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 222 | |
[ 2023-10-07 21:22:13 ] Completed saving temp checkpoint 1,806.208 ms, 19.35 s total | |
[ 2023-10-07 21:22:13 ] Completed replacing temp checkpoint with checkpoint 61.136 ms, 19.41 s total | |
[ 2023-10-07 21:22:13 ] Completed Epoch: 13 batch 223: moving batch data to device 3.906 ms, 19.42 s total | |
[ 2023-10-07 21:22:13 ] Completed Epoch: 13 batch 223: forward pass 441.284 ms, 19.86 s total | |
[ 2023-10-07 21:22:14 ] Completed Epoch: 13 batch 223: backward pass 376.843 ms, 20.23 s total | |
[ 2023-10-07 21:22:15 ] Completed Epoch: 13 batch 223: computing loss 1,012.753 ms, 21.25 s total | |
EPOCH: [13], BATCH: [223/889], loss: 0.404, loss_box_reg: 0.123, loss_classifier: 0.109, loss_mask: 0.131, loss_objectness: 0.018, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 223 | |
[ 2023-10-07 21:22:16 ] Completed saving temp checkpoint 1,175.613 ms, 22.42 s total | |
[ 2023-10-07 21:22:16 ] Completed replacing temp checkpoint with checkpoint 36.986 ms, 22.46 s total | |
[ 2023-10-07 21:22:16 ] Completed Epoch: 13 batch 224: moving batch data to device 21.800 ms, 22.48 s total | |
[ 2023-10-07 21:22:16 ] Completed Epoch: 13 batch 224: forward pass 304.285 ms, 22.78 s total | |
[ 2023-10-07 21:22:17 ] Completed Epoch: 13 batch 224: backward pass 638.819 ms, 23.42 s total | |
[ 2023-10-07 21:22:18 ] Completed Epoch: 13 batch 224: computing loss 831.048 ms, 24.25 s total | |
EPOCH: [13], BATCH: [224/889], loss: 0.403, loss_box_reg: 0.122, loss_classifier: 0.106, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 224 | |
[ 2023-10-07 21:22:19 ] Completed saving temp checkpoint 1,075.172 ms, 25.33 s total | |
[ 2023-10-07 21:22:19 ] Completed replacing temp checkpoint with checkpoint 41.491 ms, 25.37 s total | |
[ 2023-10-07 21:22:19 ] Completed Epoch: 13 batch 225: moving batch data to device 22.336 ms, 25.39 s total | |
[ 2023-10-07 21:22:19 ] Completed Epoch: 13 batch 225: forward pass 324.066 ms, 25.72 s total | |
[ 2023-10-07 21:22:19 ] Completed Epoch: 13 batch 225: backward pass 39.983 ms, 25.76 s total | |
[ 2023-10-07 21:22:21 ] Completed Epoch: 13 batch 225: computing loss 1,470.817 ms, 27.23 s total | |
EPOCH: [13], BATCH: [225/889], loss: 0.389, loss_box_reg: 0.116, loss_classifier: 0.098, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 225 | |
[ 2023-10-07 21:22:22 ] Completed saving temp checkpoint 1,114.838 ms, 28.34 s total | |
[ 2023-10-07 21:22:22 ] Completed replacing temp checkpoint with checkpoint 67.471 ms, 28.41 s total | |
[ 2023-10-07 21:22:22 ] Completed Epoch: 13 batch 226: moving batch data to device 24.899 ms, 28.44 s total | |
[ 2023-10-07 21:22:22 ] Completed Epoch: 13 batch 226: forward pass 337.503 ms, 28.77 s total | |
[ 2023-10-07 21:22:22 ] Completed Epoch: 13 batch 226: backward pass 34.536 ms, 28.81 s total | |
[ 2023-10-07 21:22:24 ] Completed Epoch: 13 batch 226: computing loss 1,211.138 ms, 30.02 s total | |
EPOCH: [13], BATCH: [226/889], loss: 0.422, loss_box_reg: 0.130, loss_classifier: 0.109, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 226 | |
[ 2023-10-07 21:22:25 ] Completed saving temp checkpoint 1,183.239 ms, 31.20 s total | |
[ 2023-10-07 21:22:25 ] Completed replacing temp checkpoint with checkpoint 67.894 ms, 31.27 s total | |
[ 2023-10-07 21:22:25 ] Completed Epoch: 13 batch 227: moving batch data to device 20.568 ms, 31.29 s total | |
[ 2023-10-07 21:22:25 ] Completed Epoch: 13 batch 227: forward pass 338.908 ms, 31.63 s total | |
[ 2023-10-07 21:22:25 ] Completed Epoch: 13 batch 227: backward pass 82.642 ms, 31.71 s total | |
[ 2023-10-07 21:22:27 ] Completed Epoch: 13 batch 227: computing loss 1,236.324 ms, 32.95 s total | |
EPOCH: [13], BATCH: [227/889], loss: 0.405, loss_box_reg: 0.127, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 227 | |
[ 2023-10-07 21:22:28 ] Completed saving temp checkpoint 1,170.192 ms, 34.12 s total | |
[ 2023-10-07 21:22:28 ] Completed replacing temp checkpoint with checkpoint 44.143 ms, 34.16 s total | |
[ 2023-10-07 21:22:28 ] Completed Epoch: 13 batch 228: moving batch data to device 21.484 ms, 34.18 s total | |
[ 2023-10-07 21:22:28 ] Completed Epoch: 13 batch 228: forward pass 336.720 ms, 34.52 s total | |
[ 2023-10-07 21:22:29 ] Completed Epoch: 13 batch 228: backward pass 396.681 ms, 34.92 s total | |
[ 2023-10-07 21:22:29 ] Completed Epoch: 13 batch 228: computing loss 853.897 ms, 35.77 s total | |
EPOCH: [13], BATCH: [228/889], loss: 0.397, loss_box_reg: 0.122, loss_classifier: 0.110, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 228 | |
[ 2023-10-07 21:22:30 ] Completed saving temp checkpoint 902.539 ms, 36.67 s total | |
[ 2023-10-07 21:22:30 ] Completed replacing temp checkpoint with checkpoint 71.010 ms, 36.75 s total | |
[ 2023-10-07 21:22:30 ] Completed Epoch: 13 batch 229: moving batch data to device 24.275 ms, 36.77 s total | |
[ 2023-10-07 21:22:31 ] Completed Epoch: 13 batch 229: forward pass 348.430 ms, 37.12 s total | |
[ 2023-10-07 21:22:31 ] Completed Epoch: 13 batch 229: backward pass 126.232 ms, 37.24 s total | |
[ 2023-10-07 21:22:32 ] Completed Epoch: 13 batch 229: computing loss 1,565.254 ms, 38.81 s total | |
EPOCH: [13], BATCH: [229/889], loss: 0.355, loss_box_reg: 0.105, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 229 | |
[ 2023-10-07 21:22:34 ] Completed saving temp checkpoint 1,458.709 ms, 40.27 s total | |
[ 2023-10-07 21:22:34 ] Completed replacing temp checkpoint with checkpoint 53.474 ms, 40.32 s total | |
[ 2023-10-07 21:22:34 ] Completed Epoch: 13 batch 230: moving batch data to device 21.517 ms, 40.34 s total | |
[ 2023-10-07 21:22:34 ] Completed Epoch: 13 batch 230: forward pass 317.831 ms, 40.66 s total | |
[ 2023-10-07 21:22:34 ] Completed Epoch: 13 batch 230: backward pass 50.135 ms, 40.71 s total | |
[ 2023-10-07 21:22:36 ] Completed Epoch: 13 batch 230: computing loss 1,932.606 ms, 42.64 s total | |
EPOCH: [13], BATCH: [230/889], loss: 0.379, loss_box_reg: 0.112, loss_classifier: 0.089, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 230 | |
[ 2023-10-07 21:22:38 ] Completed saving temp checkpoint 1,453.577 ms, 44.10 s total | |
[ 2023-10-07 21:22:38 ] Completed replacing temp checkpoint with checkpoint 56.552 ms, 44.15 s total | |
[ 2023-10-07 21:22:38 ] Completed Epoch: 13 batch 231: moving batch data to device 23.541 ms, 44.18 s total | |
[ 2023-10-07 21:22:38 ] Completed Epoch: 13 batch 231: forward pass 329.859 ms, 44.51 s total | |
[ 2023-10-07 21:22:38 ] Completed Epoch: 13 batch 231: backward pass 73.592 ms, 44.58 s total | |
[ 2023-10-07 21:22:39 ] Completed Epoch: 13 batch 231: computing loss 1,238.040 ms, 45.82 s total | |
EPOCH: [13], BATCH: [231/889], loss: 0.370, loss_box_reg: 0.108, loss_classifier: 0.097, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 231 | |
[ 2023-10-07 21:22:40 ] Completed saving temp checkpoint 1,086.622 ms, 46.91 s total | |
[ 2023-10-07 21:22:41 ] Completed replacing temp checkpoint with checkpoint 55.825 ms, 46.96 s total | |
[ 2023-10-07 21:22:41 ] Completed Epoch: 13 batch 232: moving batch data to device 23.577 ms, 46.98 s total | |
[ 2023-10-07 21:22:41 ] Completed Epoch: 13 batch 232: forward pass 339.513 ms, 47.32 s total | |
[ 2023-10-07 21:22:41 ] Completed Epoch: 13 batch 232: backward pass 90.031 ms, 47.41 s total | |
[ 2023-10-07 21:22:42 ] Completed Epoch: 13 batch 232: computing loss 1,225.779 ms, 48.64 s total | |
EPOCH: [13], BATCH: [232/889], loss: 0.372, loss_box_reg: 0.106, loss_classifier: 0.099, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 232 | |
[ 2023-10-07 21:22:44 ] Completed saving temp checkpoint 1,457.633 ms, 50.10 s total | |
[ 2023-10-07 21:22:44 ] Completed replacing temp checkpoint with checkpoint 43.304 ms, 50.14 s total | |
[ 2023-10-07 21:22:44 ] Completed Epoch: 13 batch 233: moving batch data to device 21.574 ms, 50.16 s total | |
[ 2023-10-07 21:22:44 ] Completed Epoch: 13 batch 233: forward pass 330.986 ms, 50.49 s total | |
[ 2023-10-07 21:22:44 ] Completed Epoch: 13 batch 233: backward pass 71.645 ms, 50.57 s total | |
[ 2023-10-07 21:22:46 ] Completed Epoch: 13 batch 233: computing loss 1,834.122 ms, 52.40 s total | |
EPOCH: [13], BATCH: [233/889], loss: 0.382, loss_box_reg: 0.112, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 233 | |
[ 2023-10-07 21:22:48 ] Completed saving temp checkpoint 1,799.946 ms, 54.20 s total | |
[ 2023-10-07 21:22:48 ] Completed replacing temp checkpoint with checkpoint 59.325 ms, 54.26 s total | |
[ 2023-10-07 21:22:48 ] Completed Epoch: 13 batch 234: moving batch data to device 23.381 ms, 54.28 s total | |
[ 2023-10-07 21:22:48 ] Completed Epoch: 13 batch 234: forward pass 405.964 ms, 54.69 s total | |
[ 2023-10-07 21:22:48 ] Completed Epoch: 13 batch 234: backward pass 41.439 ms, 54.73 s total | |
[ 2023-10-07 21:22:49 ] Completed Epoch: 13 batch 234: computing loss 1,020.559 ms, 55.75 s total | |
EPOCH: [13], BATCH: [234/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 234 | |
[ 2023-10-07 21:22:50 ] Completed saving temp checkpoint 1,025.129 ms, 56.78 s total | |
[ 2023-10-07 21:22:50 ] Completed replacing temp checkpoint with checkpoint 58.446 ms, 56.83 s total | |
[ 2023-10-07 21:22:50 ] Completed Epoch: 13 batch 235: moving batch data to device 22.947 ms, 56.86 s total | |
[ 2023-10-07 21:22:51 ] Completed Epoch: 13 batch 235: forward pass 325.226 ms, 57.18 s total | |
[ 2023-10-07 21:22:51 ] Completed Epoch: 13 batch 235: backward pass 49.066 ms, 57.23 s total | |
[ 2023-10-07 21:22:52 ] Completed Epoch: 13 batch 235: computing loss 1,449.461 ms, 58.68 s total | |
EPOCH: [13], BATCH: [235/889], loss: 0.402, loss_box_reg: 0.117, loss_classifier: 0.109, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 235 | |
[ 2023-10-07 21:22:53 ] Completed saving temp checkpoint 1,067.417 ms, 59.75 s total | |
[ 2023-10-07 21:22:53 ] Completed replacing temp checkpoint with checkpoint 55.752 ms, 59.80 s total | |
[ 2023-10-07 21:22:53 ] Completed Epoch: 13 batch 236: moving batch data to device 22.620 ms, 59.83 s total | |
[ 2023-10-07 21:22:54 ] Completed Epoch: 13 batch 236: forward pass 331.063 ms, 60.16 s total | |
[ 2023-10-07 21:22:54 ] Completed Epoch: 13 batch 236: backward pass 85.726 ms, 60.24 s total | |
[ 2023-10-07 21:22:55 ] Completed Epoch: 13 batch 236: computing loss 1,595.707 ms, 61.84 s total | |
EPOCH: [13], BATCH: [236/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.103, loss_mask: 0.131, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 236 | |
[ 2023-10-07 21:22:57 ] Completed saving temp checkpoint 1,369.757 ms, 63.21 s total | |
[ 2023-10-07 21:22:57 ] Completed replacing temp checkpoint with checkpoint 66.424 ms, 63.27 s total | |
[ 2023-10-07 21:22:57 ] Completed Epoch: 13 batch 237: moving batch data to device 25.064 ms, 63.30 s total | |
[ 2023-10-07 21:22:57 ] Completed Epoch: 13 batch 237: forward pass 311.921 ms, 63.61 s total | |
[ 2023-10-07 21:22:57 ] Completed Epoch: 13 batch 237: backward pass 41.054 ms, 63.65 s total | |
[ 2023-10-07 21:22:59 ] Completed Epoch: 13 batch 237: computing loss 1,385.378 ms, 65.04 s total | |
EPOCH: [13], BATCH: [237/889], loss: 0.409, loss_box_reg: 0.120, loss_classifier: 0.115, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 237 | |
[ 2023-10-07 21:23:00 ] Completed saving temp checkpoint 1,781.589 ms, 66.82 s total | |
[ 2023-10-07 21:23:00 ] Completed replacing temp checkpoint with checkpoint 59.599 ms, 66.88 s total | |
[ 2023-10-07 21:23:00 ] Completed Epoch: 13 batch 238: moving batch data to device 6.168 ms, 66.89 s total | |
[ 2023-10-07 21:23:01 ] Completed Epoch: 13 batch 238: forward pass 447.173 ms, 67.33 s total | |
[ 2023-10-07 21:23:01 ] Completed Epoch: 13 batch 238: backward pass 67.778 ms, 67.40 s total | |
[ 2023-10-07 21:23:02 ] Completed Epoch: 13 batch 238: computing loss 857.329 ms, 68.26 s total | |
EPOCH: [13], BATCH: [238/889], loss: 0.406, loss_box_reg: 0.118, loss_classifier: 0.106, loss_mask: 0.135, loss_objectness: 0.019, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 238 | |
[ 2023-10-07 21:23:03 ] Completed saving temp checkpoint 1,210.376 ms, 69.47 s total | |
[ 2023-10-07 21:23:03 ] Completed replacing temp checkpoint with checkpoint 72.455 ms, 69.54 s total | |
[ 2023-10-07 21:23:03 ] Completed Epoch: 13 batch 239: moving batch data to device 21.724 ms, 69.56 s total | |
[ 2023-10-07 21:23:03 ] Completed Epoch: 13 batch 239: forward pass 322.209 ms, 69.88 s total | |
[ 2023-10-07 21:23:04 ] Completed Epoch: 13 batch 239: backward pass 67.942 ms, 69.95 s total | |
[ 2023-10-07 21:23:05 ] Completed Epoch: 13 batch 239: computing loss 1,338.141 ms, 71.29 s total | |
EPOCH: [13], BATCH: [239/889], loss: 0.386, loss_box_reg: 0.116, loss_classifier: 0.103, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 239 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 21:42:05 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 21:42:05 ] Completed importing Timer 0.022 ms, 0.00 s total | |
[ 2023-10-07 21:42:05 ] Completed importing everything else 715.647 ms, 0.72 s total | |
[ 2023-10-07 21:42:05 ] Completed defined other functions 0.022 ms, 0.72 s total | |
| distributed init (rank 2): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 3): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 21:42:13 ] Completed main preliminaries 7,812.464 ms, 8.53 s total | |
loading annotations into memory... | |
Done (t=11.64s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 21:42:27 ] Completed loading data 13,569.845 ms, 22.10 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 21:42:27 ] Completed creating data samplers 108.759 ms, 22.21 s total | |
[ 2023-10-07 21:42:27 ] Completed creating data loaders 0.222 ms, 22.21 s total | |
[ 2023-10-07 21:42:28 ] Completed creating model and .to(device) 667.254 ms, 22.87 s total | |
[ 2023-10-07 21:42:30 ] Completed preparing model for distributed training 2,210.429 ms, 25.08 s total | |
[ 2023-10-07 21:42:30 ] Completed optimizer and scaler 0.542 ms, 25.09 s total | |
[ 2023-10-07 21:42:30 ] Completed learning rate schedulers 0.132 ms, 25.09 s total | |
[ 2023-10-07 21:42:31 ] Completed init coco evaluator 977.855 ms, 26.06 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 21:42:32 ] Completed retrieving checkpoint 890.640 ms, 26.95 s total | |
EPOCH :: 13 | |
[ 2023-10-07 21:42:32 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 21:42:32 ] Completed training preliminaries 0.861 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 239 | |
[ 2023-10-07 21:42:32 ] Completed Epoch: 13 batch 239: moving batch data to device 262.362 ms, 0.26 s total | |
[ 2023-10-07 21:42:37 ] Completed Epoch: 13 batch 239: forward pass 5,453.160 ms, 5.72 s total | |
[ 2023-10-07 21:42:38 ] Completed Epoch: 13 batch 239: backward pass 218.153 ms, 5.93 s total | |
[ 2023-10-07 21:42:39 ] Completed Epoch: 13 batch 239: computing loss 1,031.126 ms, 6.97 s total | |
EPOCH: [13], BATCH: [239/889], loss: 0.386, loss_box_reg: 0.116, loss_classifier: 0.102, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 239 | |
[ 2023-10-07 21:42:41 ] Completed saving temp checkpoint 2,624.210 ms, 9.59 s total | |
[ 2023-10-07 21:42:41 ] Completed replacing temp checkpoint with checkpoint 182.548 ms, 9.77 s total | |
[ 2023-10-07 21:42:42 ] Completed Epoch: 13 batch 240: moving batch data to device 61.757 ms, 9.83 s total | |
[ 2023-10-07 21:42:42 ] Completed Epoch: 13 batch 240: forward pass 377.813 ms, 10.21 s total | |
[ 2023-10-07 21:42:42 ] Completed Epoch: 13 batch 240: backward pass 72.774 ms, 10.28 s total | |
[ 2023-10-07 21:42:44 ] Completed Epoch: 13 batch 240: computing loss 1,736.049 ms, 12.02 s total | |
EPOCH: [13], BATCH: [240/889], loss: 0.362, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 240 | |
[ 2023-10-07 21:42:46 ] Completed saving temp checkpoint 2,136.967 ms, 14.16 s total | |
[ 2023-10-07 21:42:46 ] Completed replacing temp checkpoint with checkpoint 98.776 ms, 14.26 s total | |
[ 2023-10-07 21:42:46 ] Completed Epoch: 13 batch 241: moving batch data to device 19.954 ms, 14.28 s total | |
[ 2023-10-07 21:42:46 ] Completed Epoch: 13 batch 241: forward pass 325.517 ms, 14.60 s total | |
[ 2023-10-07 21:42:47 ] Completed Epoch: 13 batch 241: backward pass 365.055 ms, 14.97 s total | |
[ 2023-10-07 21:42:48 ] Completed Epoch: 13 batch 241: computing loss 1,432.836 ms, 16.40 s total | |
EPOCH: [13], BATCH: [241/889], loss: 0.410, loss_box_reg: 0.123, loss_classifier: 0.103, loss_mask: 0.135, loss_objectness: 0.018, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 241 | |
[ 2023-10-07 21:42:50 ] Completed saving temp checkpoint 1,773.419 ms, 18.17 s total | |
[ 2023-10-07 21:42:50 ] Completed replacing temp checkpoint with checkpoint 38.498 ms, 18.21 s total | |
[ 2023-10-07 21:42:50 ] Completed Epoch: 13 batch 242: moving batch data to device 19.921 ms, 18.23 s total | |
[ 2023-10-07 21:42:50 ] Completed Epoch: 13 batch 242: forward pass 378.703 ms, 18.61 s total | |
[ 2023-10-07 21:42:50 ] Completed Epoch: 13 batch 242: backward pass 85.965 ms, 18.70 s total | |
[ 2023-10-07 21:42:52 ] Completed Epoch: 13 batch 242: computing loss 1,625.480 ms, 20.32 s total | |
EPOCH: [13], BATCH: [242/889], loss: 0.378, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 242 | |
[ 2023-10-07 21:42:54 ] Completed saving temp checkpoint 2,089.049 ms, 22.41 s total | |
[ 2023-10-07 21:42:54 ] Completed replacing temp checkpoint with checkpoint 65.335 ms, 22.48 s total | |
[ 2023-10-07 21:42:54 ] Completed Epoch: 13 batch 243: moving batch data to device 20.021 ms, 22.50 s total | |
[ 2023-10-07 21:42:55 ] Completed Epoch: 13 batch 243: forward pass 399.342 ms, 22.90 s total | |
[ 2023-10-07 21:42:55 ] Completed Epoch: 13 batch 243: backward pass 66.536 ms, 22.96 s total | |
[ 2023-10-07 21:42:57 ] Completed Epoch: 13 batch 243: computing loss 1,854.960 ms, 24.82 s total | |
EPOCH: [13], BATCH: [243/889], loss: 0.413, loss_box_reg: 0.129, loss_classifier: 0.110, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 243 | |
[ 2023-10-07 21:42:59 ] Completed saving temp checkpoint 1,999.818 ms, 26.82 s total | |
[ 2023-10-07 21:42:59 ] Completed replacing temp checkpoint with checkpoint 81.612 ms, 26.90 s total | |
[ 2023-10-07 21:42:59 ] Completed Epoch: 13 batch 244: moving batch data to device 24.608 ms, 26.92 s total | |
[ 2023-10-07 21:42:59 ] Completed Epoch: 13 batch 244: forward pass 317.442 ms, 27.24 s total | |
[ 2023-10-07 21:42:59 ] Completed Epoch: 13 batch 244: backward pass 185.203 ms, 27.43 s total | |
[ 2023-10-07 21:43:01 ] Completed Epoch: 13 batch 244: computing loss 1,756.140 ms, 29.18 s total | |
EPOCH: [13], BATCH: [244/889], loss: 0.422, loss_box_reg: 0.131, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.035 | |
Saving checkpoint at epoch 13 train batch 244 | |
[ 2023-10-07 21:43:02 ] Completed saving temp checkpoint 1,431.293 ms, 30.61 s total | |
[ 2023-10-07 21:43:02 ] Completed replacing temp checkpoint with checkpoint 53.942 ms, 30.67 s total | |
[ 2023-10-07 21:43:02 ] Completed Epoch: 13 batch 245: moving batch data to device 19.163 ms, 30.69 s total | |
[ 2023-10-07 21:43:03 ] Completed Epoch: 13 batch 245: forward pass 325.542 ms, 31.01 s total | |
[ 2023-10-07 21:43:03 ] Completed Epoch: 13 batch 245: backward pass 113.673 ms, 31.13 s total | |
[ 2023-10-07 21:43:05 ] Completed Epoch: 13 batch 245: computing loss 2,261.558 ms, 33.39 s total | |
EPOCH: [13], BATCH: [245/889], loss: 0.403, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 245 | |
[ 2023-10-07 21:43:07 ] Completed saving temp checkpoint 1,857.590 ms, 35.24 s total | |
[ 2023-10-07 21:43:07 ] Completed replacing temp checkpoint with checkpoint 62.223 ms, 35.31 s total | |
[ 2023-10-07 21:43:07 ] Completed Epoch: 13 batch 246: moving batch data to device 21.174 ms, 35.33 s total | |
[ 2023-10-07 21:43:07 ] Completed Epoch: 13 batch 246: forward pass 335.024 ms, 35.66 s total | |
[ 2023-10-07 21:43:07 ] Completed Epoch: 13 batch 246: backward pass 75.994 ms, 35.74 s total | |
[ 2023-10-07 21:43:09 ] Completed Epoch: 13 batch 246: computing loss 1,774.734 ms, 37.51 s total | |
EPOCH: [13], BATCH: [246/889], loss: 0.395, loss_box_reg: 0.118, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 246 | |
[ 2023-10-07 21:43:11 ] Completed saving temp checkpoint 1,970.926 ms, 39.48 s total | |
[ 2023-10-07 21:43:11 ] Completed replacing temp checkpoint with checkpoint 57.848 ms, 39.54 s total | |
[ 2023-10-07 21:43:11 ] Completed Epoch: 13 batch 247: moving batch data to device 2.924 ms, 39.55 s total | |
[ 2023-10-07 21:43:12 ] Completed Epoch: 13 batch 247: forward pass 445.254 ms, 39.99 s total | |
[ 2023-10-07 21:43:12 ] Completed Epoch: 13 batch 247: backward pass 65.274 ms, 40.06 s total | |
[ 2023-10-07 21:43:13 ] Completed Epoch: 13 batch 247: computing loss 1,226.314 ms, 41.28 s total | |
EPOCH: [13], BATCH: [247/889], loss: 0.419, loss_box_reg: 0.124, loss_classifier: 0.109, loss_mask: 0.137, loss_objectness: 0.016, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 247 | |
[ 2023-10-07 21:43:14 ] Completed saving temp checkpoint 1,399.100 ms, 42.68 s total | |
[ 2023-10-07 21:43:14 ] Completed replacing temp checkpoint with checkpoint 35.921 ms, 42.72 s total | |
[ 2023-10-07 21:43:14 ] Completed Epoch: 13 batch 248: moving batch data to device 20.987 ms, 42.74 s total | |
[ 2023-10-07 21:43:15 ] Completed Epoch: 13 batch 248: forward pass 327.202 ms, 43.07 s total | |
[ 2023-10-07 21:43:15 ] Completed Epoch: 13 batch 248: backward pass 357.232 ms, 43.42 s total | |
[ 2023-10-07 21:43:17 ] Completed Epoch: 13 batch 248: computing loss 1,445.853 ms, 44.87 s total | |
EPOCH: [13], BATCH: [248/889], loss: 0.378, loss_box_reg: 0.108, loss_classifier: 0.095, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 248 | |
[ 2023-10-07 21:43:19 ] Completed saving temp checkpoint 1,989.372 ms, 46.86 s total | |
[ 2023-10-07 21:43:19 ] Completed replacing temp checkpoint with checkpoint 75.888 ms, 46.93 s total | |
[ 2023-10-07 21:43:19 ] Completed Epoch: 13 batch 249: moving batch data to device 21.782 ms, 46.96 s total | |
[ 2023-10-07 21:43:19 ] Completed Epoch: 13 batch 249: forward pass 310.365 ms, 47.27 s total | |
[ 2023-10-07 21:43:19 ] Completed Epoch: 13 batch 249: backward pass 45.444 ms, 47.31 s total | |
[ 2023-10-07 21:43:21 ] Completed Epoch: 13 batch 249: computing loss 2,213.336 ms, 49.52 s total | |
EPOCH: [13], BATCH: [249/889], loss: 0.384, loss_box_reg: 0.113, loss_classifier: 0.098, loss_mask: 0.134, loss_objectness: 0.013, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 249 | |
[ 2023-10-07 21:43:25 ] Completed saving temp checkpoint 3,907.823 ms, 53.43 s total | |
[ 2023-10-07 21:43:25 ] Completed replacing temp checkpoint with checkpoint 100.264 ms, 53.53 s total | |
[ 2023-10-07 21:43:25 ] Completed Epoch: 13 batch 250: moving batch data to device 6.145 ms, 53.54 s total | |
[ 2023-10-07 21:43:26 ] Completed Epoch: 13 batch 250: forward pass 434.978 ms, 53.97 s total | |
[ 2023-10-07 21:43:26 ] Completed Epoch: 13 batch 250: backward pass 77.360 ms, 54.05 s total | |
[ 2023-10-07 21:43:28 ] Completed Epoch: 13 batch 250: computing loss 1,765.851 ms, 55.82 s total | |
EPOCH: [13], BATCH: [250/889], loss: 0.395, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 250 | |
[ 2023-10-07 21:43:33 ] Completed saving temp checkpoint 5,878.672 ms, 61.70 s total | |
[ 2023-10-07 21:43:33 ] Completed replacing temp checkpoint with checkpoint 68.456 ms, 61.76 s total | |
[ 2023-10-07 21:43:33 ] Completed Epoch: 13 batch 251: moving batch data to device 8.696 ms, 61.77 s total | |
[ 2023-10-07 21:43:34 ] Completed Epoch: 13 batch 251: forward pass 421.775 ms, 62.19 s total | |
[ 2023-10-07 21:43:34 ] Completed Epoch: 13 batch 251: backward pass 75.791 ms, 62.27 s total | |
[ 2023-10-07 21:43:36 ] Completed Epoch: 13 batch 251: computing loss 2,340.480 ms, 64.61 s total | |
EPOCH: [13], BATCH: [251/889], loss: 0.405, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 251 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 22:01:27 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 22:01:27 ] Completed importing Timer 0.023 ms, 0.00 s total | |
[ 2023-10-07 22:01:28 ] Completed importing everything else 551.466 ms, 0.55 s total | |
[ 2023-10-07 22:01:28 ] Completed defined other functions 0.025 ms, 0.55 s total | |
| distributed init (rank 5): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 0): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 22:01:36 ] Completed main preliminaries 8,221.088 ms, 8.77 s total | |
loading annotations into memory... | |
Done (t=11.85s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.29s) | |
creating index... | |
index created! | |
[ 2023-10-07 22:01:50 ] Completed loading data 14,007.493 ms, 22.78 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 22:01:50 ] Completed creating data samplers 128.170 ms, 22.91 s total | |
[ 2023-10-07 22:01:50 ] Completed creating data loaders 0.241 ms, 22.91 s total | |
[ 2023-10-07 22:01:51 ] Completed creating model and .to(device) 679.890 ms, 23.59 s total | |
[ 2023-10-07 22:01:53 ] Completed preparing model for distributed training 1,982.271 ms, 25.57 s total | |
[ 2023-10-07 22:01:53 ] Completed optimizer and scaler 0.611 ms, 25.57 s total | |
[ 2023-10-07 22:01:53 ] Completed learning rate schedulers 0.161 ms, 25.57 s total | |
[ 2023-10-07 22:01:54 ] Completed init coco evaluator 1,045.979 ms, 26.62 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 22:01:55 ] Completed retrieving checkpoint 807.166 ms, 27.42 s total | |
EPOCH :: 13 | |
[ 2023-10-07 22:01:55 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 22:01:55 ] Completed training preliminaries 0.870 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 251 | |
[ 2023-10-07 22:01:55 ] Completed Epoch: 13 batch 251: moving batch data to device 369.538 ms, 0.37 s total | |
[ 2023-10-07 22:02:01 ] Completed Epoch: 13 batch 251: forward pass 5,466.479 ms, 5.84 s total | |
[ 2023-10-07 22:02:01 ] Completed Epoch: 13 batch 251: backward pass 265.855 ms, 6.10 s total | |
[ 2023-10-07 22:02:02 ] Completed Epoch: 13 batch 251: computing loss 992.731 ms, 7.10 s total | |
EPOCH: [13], BATCH: [251/889], loss: 0.406, loss_box_reg: 0.124, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 251 | |
[ 2023-10-07 22:02:04 ] Completed saving temp checkpoint 1,786.217 ms, 8.88 s total | |
[ 2023-10-07 22:02:04 ] Completed replacing temp checkpoint with checkpoint 166.190 ms, 9.05 s total | |
[ 2023-10-07 22:02:04 ] Completed Epoch: 13 batch 252: moving batch data to device 19.190 ms, 9.07 s total | |
[ 2023-10-07 22:02:04 ] Completed Epoch: 13 batch 252: forward pass 321.590 ms, 9.39 s total | |
[ 2023-10-07 22:02:05 ] Completed Epoch: 13 batch 252: backward pass 387.156 ms, 9.78 s total | |
[ 2023-10-07 22:02:06 ] Completed Epoch: 13 batch 252: computing loss 1,541.009 ms, 11.32 s total | |
EPOCH: [13], BATCH: [252/889], loss: 0.392, loss_box_reg: 0.115, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 252 | |
[ 2023-10-07 22:02:08 ] Completed saving temp checkpoint 1,723.546 ms, 13.04 s total | |
[ 2023-10-07 22:02:08 ] Completed replacing temp checkpoint with checkpoint 52.678 ms, 13.09 s total | |
[ 2023-10-07 22:02:08 ] Completed Epoch: 13 batch 253: moving batch data to device 18.802 ms, 13.11 s total | |
[ 2023-10-07 22:02:08 ] Completed Epoch: 13 batch 253: forward pass 317.156 ms, 13.43 s total | |
[ 2023-10-07 22:02:08 ] Completed Epoch: 13 batch 253: backward pass 78.653 ms, 13.51 s total | |
[ 2023-10-07 22:02:10 ] Completed Epoch: 13 batch 253: computing loss 1,946.918 ms, 15.45 s total | |
EPOCH: [13], BATCH: [253/889], loss: 0.402, loss_box_reg: 0.120, loss_classifier: 0.107, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 253 | |
[ 2023-10-07 22:02:12 ] Completed saving temp checkpoint 1,527.802 ms, 16.98 s total | |
[ 2023-10-07 22:02:12 ] Completed replacing temp checkpoint with checkpoint 44.723 ms, 17.03 s total | |
[ 2023-10-07 22:02:12 ] Completed Epoch: 13 batch 254: moving batch data to device 9.931 ms, 17.04 s total | |
[ 2023-10-07 22:02:12 ] Completed Epoch: 13 batch 254: forward pass 380.784 ms, 17.42 s total | |
[ 2023-10-07 22:02:12 ] Completed Epoch: 13 batch 254: backward pass 70.222 ms, 17.49 s total | |
[ 2023-10-07 22:02:13 ] Completed Epoch: 13 batch 254: computing loss 1,239.490 ms, 18.73 s total | |
EPOCH: [13], BATCH: [254/889], loss: 0.406, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.036 | |
Saving checkpoint at epoch 13 train batch 254 | |
[ 2023-10-07 22:02:15 ] Completed saving temp checkpoint 1,230.818 ms, 19.96 s total | |
[ 2023-10-07 22:02:15 ] Completed replacing temp checkpoint with checkpoint 48.286 ms, 20.01 s total | |
[ 2023-10-07 22:02:15 ] Completed Epoch: 13 batch 255: moving batch data to device 27.356 ms, 20.03 s total | |
[ 2023-10-07 22:02:15 ] Completed Epoch: 13 batch 255: forward pass 314.359 ms, 20.35 s total | |
[ 2023-10-07 22:02:15 ] Completed Epoch: 13 batch 255: backward pass 71.528 ms, 20.42 s total | |
[ 2023-10-07 22:02:17 ] Completed Epoch: 13 batch 255: computing loss 1,848.888 ms, 22.27 s total | |
EPOCH: [13], BATCH: [255/889], loss: 0.380, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.131, loss_objectness: 0.018, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 255 | |
[ 2023-10-07 22:02:19 ] Completed saving temp checkpoint 1,828.894 ms, 24.10 s total | |
[ 2023-10-07 22:02:19 ] Completed replacing temp checkpoint with checkpoint 89.585 ms, 24.19 s total | |
[ 2023-10-07 22:02:19 ] Completed Epoch: 13 batch 256: moving batch data to device 19.999 ms, 24.21 s total | |
[ 2023-10-07 22:02:19 ] Completed Epoch: 13 batch 256: forward pass 323.841 ms, 24.53 s total | |
[ 2023-10-07 22:02:19 ] Completed Epoch: 13 batch 256: backward pass 74.606 ms, 24.61 s total | |
[ 2023-10-07 22:02:21 ] Completed Epoch: 13 batch 256: computing loss 1,540.043 ms, 26.15 s total | |
EPOCH: [13], BATCH: [256/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.095, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 256 | |
[ 2023-10-07 22:02:23 ] Completed saving temp checkpoint 1,861.753 ms, 28.01 s total | |
[ 2023-10-07 22:02:23 ] Completed replacing temp checkpoint with checkpoint 63.815 ms, 28.07 s total | |
[ 2023-10-07 22:02:23 ] Completed Epoch: 13 batch 257: moving batch data to device 3.793 ms, 28.08 s total | |
[ 2023-10-07 22:02:23 ] Completed Epoch: 13 batch 257: forward pass 436.565 ms, 28.51 s total | |
[ 2023-10-07 22:02:23 ] Completed Epoch: 13 batch 257: backward pass 68.704 ms, 28.58 s total | |
[ 2023-10-07 22:02:25 ] Completed Epoch: 13 batch 257: computing loss 1,472.676 ms, 30.05 s total | |
EPOCH: [13], BATCH: [257/889], loss: 0.399, loss_box_reg: 0.114, loss_classifier: 0.098, loss_mask: 0.141, loss_objectness: 0.019, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 257 | |
[ 2023-10-07 22:02:26 ] Completed saving temp checkpoint 1,298.303 ms, 31.35 s total | |
[ 2023-10-07 22:02:26 ] Completed replacing temp checkpoint with checkpoint 57.357 ms, 31.41 s total | |
[ 2023-10-07 22:02:26 ] Completed Epoch: 13 batch 258: moving batch data to device 23.130 ms, 31.43 s total | |
[ 2023-10-07 22:02:26 ] Completed Epoch: 13 batch 258: forward pass 316.621 ms, 31.75 s total | |
[ 2023-10-07 22:02:27 ] Completed Epoch: 13 batch 258: backward pass 109.746 ms, 31.86 s total | |
[ 2023-10-07 22:02:28 ] Completed Epoch: 13 batch 258: computing loss 1,677.634 ms, 33.54 s total | |
EPOCH: [13], BATCH: [258/889], loss: 0.377, loss_box_reg: 0.111, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 258 | |
[ 2023-10-07 22:02:30 ] Completed saving temp checkpoint 1,626.398 ms, 35.16 s total | |
[ 2023-10-07 22:02:30 ] Completed replacing temp checkpoint with checkpoint 62.180 ms, 35.22 s total | |
[ 2023-10-07 22:02:30 ] Completed Epoch: 13 batch 259: moving batch data to device 22.629 ms, 35.25 s total | |
[ 2023-10-07 22:02:30 ] Completed Epoch: 13 batch 259: forward pass 307.273 ms, 35.55 s total | |
[ 2023-10-07 22:02:30 ] Completed Epoch: 13 batch 259: backward pass 68.927 ms, 35.62 s total | |
[ 2023-10-07 22:02:32 ] Completed Epoch: 13 batch 259: computing loss 1,799.651 ms, 37.42 s total | |
EPOCH: [13], BATCH: [259/889], loss: 0.402, loss_box_reg: 0.118, loss_classifier: 0.106, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 259 | |
[ 2023-10-07 22:02:34 ] Completed saving temp checkpoint 2,124.627 ms, 39.55 s total | |
[ 2023-10-07 22:02:34 ] Completed replacing temp checkpoint with checkpoint 81.331 ms, 39.63 s total | |
[ 2023-10-07 22:02:34 ] Completed Epoch: 13 batch 260: moving batch data to device 8.684 ms, 39.64 s total | |
[ 2023-10-07 22:02:35 ] Completed Epoch: 13 batch 260: forward pass 376.512 ms, 40.01 s total | |
[ 2023-10-07 22:02:35 ] Completed Epoch: 13 batch 260: backward pass 69.445 ms, 40.08 s total | |
[ 2023-10-07 22:02:36 ] Completed Epoch: 13 batch 260: computing loss 1,206.590 ms, 41.29 s total | |
EPOCH: [13], BATCH: [260/889], loss: 0.405, loss_box_reg: 0.124, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 260 | |
[ 2023-10-07 22:02:37 ] Completed saving temp checkpoint 1,188.984 ms, 42.48 s total | |
[ 2023-10-07 22:02:37 ] Completed replacing temp checkpoint with checkpoint 35.389 ms, 42.51 s total | |
[ 2023-10-07 22:02:37 ] Completed Epoch: 13 batch 261: moving batch data to device 20.541 ms, 42.53 s total | |
[ 2023-10-07 22:02:38 ] Completed Epoch: 13 batch 261: forward pass 310.483 ms, 42.85 s total | |
[ 2023-10-07 22:02:38 ] Completed Epoch: 13 batch 261: backward pass 79.204 ms, 42.92 s total | |
[ 2023-10-07 22:02:39 ] Completed Epoch: 13 batch 261: computing loss 1,483.295 ms, 44.41 s total | |
EPOCH: [13], BATCH: [261/889], loss: 0.383, loss_box_reg: 0.116, loss_classifier: 0.094, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 261 | |
[ 2023-10-07 22:02:41 ] Completed saving temp checkpoint 2,135.781 ms, 46.54 s total | |
[ 2023-10-07 22:02:41 ] Completed replacing temp checkpoint with checkpoint 54.366 ms, 46.60 s total | |
[ 2023-10-07 22:02:41 ] Completed Epoch: 13 batch 262: moving batch data to device 21.314 ms, 46.62 s total | |
[ 2023-10-07 22:02:42 ] Completed Epoch: 13 batch 262: forward pass 400.913 ms, 47.02 s total | |
[ 2023-10-07 22:02:42 ] Completed Epoch: 13 batch 262: backward pass 56.845 ms, 47.08 s total | |
[ 2023-10-07 22:02:43 ] Completed Epoch: 13 batch 262: computing loss 1,405.219 ms, 48.48 s total | |
EPOCH: [13], BATCH: [262/889], loss: 0.387, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 262 | |
[ 2023-10-07 22:02:45 ] Completed saving temp checkpoint 2,000.498 ms, 50.48 s total | |
[ 2023-10-07 22:02:45 ] Completed replacing temp checkpoint with checkpoint 44.713 ms, 50.53 s total | |
[ 2023-10-07 22:02:45 ] Completed Epoch: 13 batch 263: moving batch data to device 22.140 ms, 50.55 s total | |
[ 2023-10-07 22:02:46 ] Completed Epoch: 13 batch 263: forward pass 298.351 ms, 50.85 s total | |
[ 2023-10-07 22:02:46 ] Completed Epoch: 13 batch 263: backward pass 74.333 ms, 50.92 s total | |
[ 2023-10-07 22:02:47 ] Completed Epoch: 13 batch 263: computing loss 1,807.585 ms, 52.73 s total | |
EPOCH: [13], BATCH: [263/889], loss: 0.390, loss_box_reg: 0.118, loss_classifier: 0.102, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 263 | |
[ 2023-10-07 22:02:49 ] Completed saving temp checkpoint 1,776.992 ms, 54.51 s total | |
[ 2023-10-07 22:02:49 ] Completed replacing temp checkpoint with checkpoint 66.338 ms, 54.57 s total | |
[ 2023-10-07 22:02:49 ] Completed Epoch: 13 batch 264: moving batch data to device 25.119 ms, 54.60 s total | |
[ 2023-10-07 22:02:50 ] Completed Epoch: 13 batch 264: forward pass 401.078 ms, 55.00 s total | |
[ 2023-10-07 22:02:50 ] Completed Epoch: 13 batch 264: backward pass 44.110 ms, 55.04 s total | |
[ 2023-10-07 22:02:51 ] Completed Epoch: 13 batch 264: computing loss 1,322.211 ms, 56.37 s total | |
EPOCH: [13], BATCH: [264/889], loss: 0.363, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 264 | |
[ 2023-10-07 22:02:53 ] Completed saving temp checkpoint 1,936.250 ms, 58.30 s total | |
[ 2023-10-07 22:02:53 ] Completed replacing temp checkpoint with checkpoint 79.650 ms, 58.38 s total | |
[ 2023-10-07 22:02:53 ] Completed Epoch: 13 batch 265: moving batch data to device 23.717 ms, 58.41 s total | |
[ 2023-10-07 22:02:54 ] Completed Epoch: 13 batch 265: forward pass 396.701 ms, 58.80 s total | |
[ 2023-10-07 22:02:54 ] Completed Epoch: 13 batch 265: backward pass 104.881 ms, 58.91 s total | |
[ 2023-10-07 22:02:55 ] Completed Epoch: 13 batch 265: computing loss 1,531.652 ms, 60.44 s total | |
EPOCH: [13], BATCH: [265/889], loss: 0.380, loss_box_reg: 0.118, loss_classifier: 0.099, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 265 | |
[ 2023-10-07 22:02:57 ] Completed saving temp checkpoint 1,563.136 ms, 62.00 s total | |
[ 2023-10-07 22:02:57 ] Completed replacing temp checkpoint with checkpoint 41.972 ms, 62.04 s total | |
[ 2023-10-07 22:02:57 ] Completed Epoch: 13 batch 266: moving batch data to device 21.430 ms, 62.07 s total | |
[ 2023-10-07 22:02:57 ] Completed Epoch: 13 batch 266: forward pass 307.133 ms, 62.37 s total | |
[ 2023-10-07 22:02:57 ] Completed Epoch: 13 batch 266: backward pass 34.837 ms, 62.41 s total | |
[ 2023-10-07 22:02:59 ] Completed Epoch: 13 batch 266: computing loss 2,026.554 ms, 64.43 s total | |
EPOCH: [13], BATCH: [266/889], loss: 0.375, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 266 | |
[ 2023-10-07 22:03:01 ] Completed saving temp checkpoint 1,754.032 ms, 66.19 s total | |
[ 2023-10-07 22:03:01 ] Completed replacing temp checkpoint with checkpoint 40.575 ms, 66.23 s total | |
[ 2023-10-07 22:03:01 ] Completed Epoch: 13 batch 267: moving batch data to device 23.131 ms, 66.25 s total | |
[ 2023-10-07 22:03:01 ] Completed Epoch: 13 batch 267: forward pass 420.030 ms, 66.67 s total | |
[ 2023-10-07 22:03:01 ] Completed Epoch: 13 batch 267: backward pass 89.221 ms, 66.76 s total | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 22:19:24 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 22:19:24 ] Completed importing Timer 0.026 ms, 0.00 s total | |
[ 2023-10-07 22:19:25 ] Completed importing everything else 466.542 ms, 0.47 s total | |
[ 2023-10-07 22:19:25 ] Completed defined other functions 0.024 ms, 0.47 s total | |
| distributed init (rank 2): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 3): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 22:19:31 ] Completed main preliminaries 5,583.766 ms, 6.05 s total | |
loading annotations into memory... | |
Done (t=12.21s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 22:19:45 ] Completed loading data 14,122.596 ms, 20.17 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 22:19:45 ] Completed creating data samplers 101.259 ms, 20.27 s total | |
[ 2023-10-07 22:19:45 ] Completed creating data loaders 0.205 ms, 20.27 s total | |
[ 2023-10-07 22:19:46 ] Completed creating model and .to(device) 1,652.175 ms, 21.93 s total | |
[ 2023-10-07 22:19:54 ] Completed preparing model for distributed training 7,941.974 ms, 29.87 s total | |
[ 2023-10-07 22:19:54 ] Completed optimizer and scaler 0.584 ms, 29.87 s total | |
[ 2023-10-07 22:19:54 ] Completed learning rate schedulers 0.222 ms, 29.87 s total | |
[ 2023-10-07 22:19:55 ] Completed init coco evaluator 1,041.918 ms, 30.91 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 22:19:56 ] Completed retrieving checkpoint 775.672 ms, 31.69 s total | |
EPOCH :: 13 | |
[ 2023-10-07 22:19:56 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 22:19:56 ] Completed training preliminaries 0.882 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 267 | |
[ 2023-10-07 22:19:56 ] Completed Epoch: 13 batch 267: moving batch data to device 205.722 ms, 0.21 s total | |
[ 2023-10-07 22:20:08 ] Completed Epoch: 13 batch 267: forward pass 11,558.690 ms, 11.77 s total | |
[ 2023-10-07 22:20:08 ] Completed Epoch: 13 batch 267: backward pass 152.627 ms, 11.92 s total | |
[ 2023-10-07 22:20:10 ] Completed Epoch: 13 batch 267: computing loss 1,967.682 ms, 13.89 s total | |
EPOCH: [13], BATCH: [267/889], loss: 0.369, loss_box_reg: 0.111, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 267 | |
[ 2023-10-07 22:20:11 ] Completed saving temp checkpoint 970.952 ms, 14.86 s total | |
[ 2023-10-07 22:20:11 ] Completed replacing temp checkpoint with checkpoint 169.831 ms, 15.03 s total | |
[ 2023-10-07 22:20:11 ] Completed Epoch: 13 batch 268: moving batch data to device 78.087 ms, 15.10 s total | |
[ 2023-10-07 22:20:12 ] Completed Epoch: 13 batch 268: forward pass 737.190 ms, 15.84 s total | |
[ 2023-10-07 22:20:13 ] Completed Epoch: 13 batch 268: backward pass 632.964 ms, 16.47 s total | |
[ 2023-10-07 22:20:15 ] Completed Epoch: 13 batch 268: computing loss 2,812.743 ms, 19.29 s total | |
EPOCH: [13], BATCH: [268/889], loss: 0.385, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 268 | |
[ 2023-10-07 22:20:16 ] Completed saving temp checkpoint 1,041.909 ms, 20.33 s total | |
[ 2023-10-07 22:20:17 ] Completed replacing temp checkpoint with checkpoint 75.034 ms, 20.40 s total | |
[ 2023-10-07 22:20:17 ] Completed Epoch: 13 batch 269: moving batch data to device 19.674 ms, 20.42 s total | |
[ 2023-10-07 22:20:17 ] Completed Epoch: 13 batch 269: forward pass 747.699 ms, 21.17 s total | |
[ 2023-10-07 22:20:18 ] Completed Epoch: 13 batch 269: backward pass 1,111.140 ms, 22.28 s total | |
[ 2023-10-07 22:20:21 ] Completed Epoch: 13 batch 269: computing loss 2,381.261 ms, 24.66 s total | |
EPOCH: [13], BATCH: [269/889], loss: 0.387, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 269 | |
[ 2023-10-07 22:20:22 ] Completed saving temp checkpoint 1,040.453 ms, 25.70 s total | |
[ 2023-10-07 22:20:22 ] Completed replacing temp checkpoint with checkpoint 60.887 ms, 25.77 s total | |
[ 2023-10-07 22:20:22 ] Completed Epoch: 13 batch 270: moving batch data to device 20.226 ms, 25.79 s total | |
[ 2023-10-07 22:20:23 ] Completed Epoch: 13 batch 270: forward pass 783.929 ms, 26.57 s total | |
[ 2023-10-07 22:20:23 ] Completed Epoch: 13 batch 270: backward pass 79.342 ms, 26.65 s total | |
[ 2023-10-07 22:20:26 ] Completed Epoch: 13 batch 270: computing loss 3,394.605 ms, 30.04 s total | |
EPOCH: [13], BATCH: [270/889], loss: 0.379, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 270 | |
[ 2023-10-07 22:20:27 ] Completed saving temp checkpoint 1,160.817 ms, 31.20 s total | |
[ 2023-10-07 22:20:27 ] Completed replacing temp checkpoint with checkpoint 61.697 ms, 31.27 s total | |
[ 2023-10-07 22:20:27 ] Completed Epoch: 13 batch 271: moving batch data to device 20.911 ms, 31.29 s total | |
[ 2023-10-07 22:20:28 ] Completed Epoch: 13 batch 271: forward pass 736.234 ms, 32.02 s total | |
[ 2023-10-07 22:20:28 ] Completed Epoch: 13 batch 271: backward pass 104.959 ms, 32.13 s total | |
[ 2023-10-07 22:20:32 ] Completed Epoch: 13 batch 271: computing loss 3,363.156 ms, 35.49 s total | |
EPOCH: [13], BATCH: [271/889], loss: 0.361, loss_box_reg: 0.108, loss_classifier: 0.089, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 271 | |
[ 2023-10-07 22:20:33 ] Completed saving temp checkpoint 1,760.888 ms, 37.25 s total | |
[ 2023-10-07 22:20:33 ] Completed replacing temp checkpoint with checkpoint 91.717 ms, 37.34 s total | |
[ 2023-10-07 22:20:34 ] Completed Epoch: 13 batch 272: moving batch data to device 26.558 ms, 37.37 s total | |
[ 2023-10-07 22:20:34 ] Completed Epoch: 13 batch 272: forward pass 724.626 ms, 38.10 s total | |
[ 2023-10-07 22:20:35 ] Completed Epoch: 13 batch 272: backward pass 545.061 ms, 38.64 s total | |
[ 2023-10-07 22:20:38 ] Completed Epoch: 13 batch 272: computing loss 2,895.725 ms, 41.54 s total | |
EPOCH: [13], BATCH: [272/889], loss: 0.372, loss_box_reg: 0.112, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 272 | |
[ 2023-10-07 22:20:39 ] Completed saving temp checkpoint 995.225 ms, 42.53 s total | |
[ 2023-10-07 22:20:39 ] Completed replacing temp checkpoint with checkpoint 51.020 ms, 42.58 s total | |
[ 2023-10-07 22:20:39 ] Completed Epoch: 13 batch 273: moving batch data to device 18.453 ms, 42.60 s total | |
[ 2023-10-07 22:20:39 ] Completed Epoch: 13 batch 273: forward pass 737.988 ms, 43.34 s total | |
[ 2023-10-07 22:20:40 ] Completed Epoch: 13 batch 273: backward pass 611.119 ms, 43.95 s total | |
[ 2023-10-07 22:20:43 ] Completed Epoch: 13 batch 273: computing loss 2,845.232 ms, 46.79 s total | |
EPOCH: [13], BATCH: [273/889], loss: 0.370, loss_box_reg: 0.107, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 273 | |
[ 2023-10-07 22:20:44 ] Completed saving temp checkpoint 1,214.512 ms, 48.01 s total | |
[ 2023-10-07 22:20:44 ] Completed replacing temp checkpoint with checkpoint 64.791 ms, 48.07 s total | |
[ 2023-10-07 22:20:44 ] Completed Epoch: 13 batch 274: moving batch data to device 24.920 ms, 48.10 s total | |
[ 2023-10-07 22:20:45 ] Completed Epoch: 13 batch 274: forward pass 740.762 ms, 48.84 s total | |
[ 2023-10-07 22:20:45 ] Completed Epoch: 13 batch 274: backward pass 65.252 ms, 48.91 s total | |
[ 2023-10-07 22:20:48 ] Completed Epoch: 13 batch 274: computing loss 3,373.270 ms, 52.28 s total | |
EPOCH: [13], BATCH: [274/889], loss: 0.401, loss_box_reg: 0.121, loss_classifier: 0.099, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 274 | |
[ 2023-10-07 22:20:49 ] Completed saving temp checkpoint 1,038.440 ms, 53.32 s total | |
[ 2023-10-07 22:20:50 ] Completed replacing temp checkpoint with checkpoint 72.382 ms, 53.39 s total | |
[ 2023-10-07 22:20:50 ] Completed Epoch: 13 batch 275: moving batch data to device 22.142 ms, 53.41 s total | |
[ 2023-10-07 22:20:50 ] Completed Epoch: 13 batch 275: forward pass 743.514 ms, 54.15 s total | |
[ 2023-10-07 22:20:50 ] Completed Epoch: 13 batch 275: backward pass 61.960 ms, 54.22 s total | |
[ 2023-10-07 22:20:54 ] Completed Epoch: 13 batch 275: computing loss 3,437.587 ms, 57.65 s total | |
EPOCH: [13], BATCH: [275/889], loss: 0.366, loss_box_reg: 0.109, loss_classifier: 0.089, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 275 | |
[ 2023-10-07 22:20:55 ] Completed saving temp checkpoint 970.030 ms, 58.62 s total | |
[ 2023-10-07 22:20:55 ] Completed replacing temp checkpoint with checkpoint 49.988 ms, 58.67 s total | |
[ 2023-10-07 22:20:55 ] Completed Epoch: 13 batch 276: moving batch data to device 22.796 ms, 58.70 s total | |
[ 2023-10-07 22:20:56 ] Completed Epoch: 13 batch 276: forward pass 732.531 ms, 59.43 s total | |
[ 2023-10-07 22:20:56 ] Completed Epoch: 13 batch 276: backward pass 72.573 ms, 59.50 s total | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 22:34:12 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 22:34:12 ] Completed importing Timer 0.022 ms, 0.00 s total | |
[ 2023-10-07 22:34:12 ] Completed importing everything else 460.062 ms, 0.46 s total | |
[ 2023-10-07 22:34:12 ] Completed defined other functions 0.026 ms, 0.46 s total | |
| distributed init (rank 5): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 4): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 22:34:22 ] Completed main preliminaries 10,129.197 ms, 10.59 s total | |
loading annotations into memory... | |
Done (t=12.35s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 22:34:37 ] Completed loading data 14,289.510 ms, 24.88 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 22:34:37 ] Completed creating data samplers 102.005 ms, 24.98 s total | |
[ 2023-10-07 22:34:37 ] Completed creating data loaders 0.199 ms, 24.98 s total | |
[ 2023-10-07 22:34:38 ] Completed creating model and .to(device) 920.146 ms, 25.90 s total | |
[ 2023-10-07 22:34:45 ] Completed preparing model for distributed training 7,700.027 ms, 33.60 s total | |
[ 2023-10-07 22:34:45 ] Completed optimizer and scaler 0.566 ms, 33.60 s total | |
[ 2023-10-07 22:34:45 ] Completed learning rate schedulers 0.217 ms, 33.60 s total | |
[ 2023-10-07 22:34:46 ] Completed init coco evaluator 963.893 ms, 34.57 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 22:34:47 ] Completed retrieving checkpoint 992.096 ms, 35.56 s total | |
EPOCH :: 13 | |
[ 2023-10-07 22:34:47 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 22:34:47 ] Completed training preliminaries 0.844 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 276 | |
[ 2023-10-07 22:34:48 ] Completed Epoch: 13 batch 276: moving batch data to device 253.728 ms, 0.25 s total | |
[ 2023-10-07 22:34:59 ] Completed Epoch: 13 batch 276: forward pass 11,418.600 ms, 11.67 s total | |
[ 2023-10-07 22:34:59 ] Completed Epoch: 13 batch 276: backward pass 194.640 ms, 11.87 s total | |
[ 2023-10-07 22:35:01 ] Completed Epoch: 13 batch 276: computing loss 1,958.372 ms, 13.83 s total | |
EPOCH: [13], BATCH: [276/889], loss: 0.401, loss_box_reg: 0.122, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 276 | |
[ 2023-10-07 22:35:02 ] Completed saving temp checkpoint 1,081.217 ms, 14.91 s total | |
[ 2023-10-07 22:35:03 ] Completed replacing temp checkpoint with checkpoint 192.169 ms, 15.10 s total | |
[ 2023-10-07 22:35:03 ] Completed Epoch: 13 batch 277: moving batch data to device 20.577 ms, 15.12 s total | |
[ 2023-10-07 22:35:03 ] Completed Epoch: 13 batch 277: forward pass 739.676 ms, 15.86 s total | |
[ 2023-10-07 22:35:03 ] Completed Epoch: 13 batch 277: backward pass 70.292 ms, 15.93 s total | |
[ 2023-10-07 22:35:07 ] Completed Epoch: 13 batch 277: computing loss 3,430.871 ms, 19.36 s total | |
EPOCH: [13], BATCH: [277/889], loss: 0.386, loss_box_reg: 0.116, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 277 | |
[ 2023-10-07 22:35:08 ] Completed saving temp checkpoint 1,111.717 ms, 20.47 s total | |
[ 2023-10-07 22:35:08 ] Completed replacing temp checkpoint with checkpoint 52.866 ms, 20.53 s total | |
[ 2023-10-07 22:35:08 ] Completed Epoch: 13 batch 278: moving batch data to device 20.167 ms, 20.55 s total | |
[ 2023-10-07 22:35:09 ] Completed Epoch: 13 batch 278: forward pass 754.879 ms, 21.30 s total | |
[ 2023-10-07 22:35:09 ] Completed Epoch: 13 batch 278: backward pass 260.566 ms, 21.56 s total | |
[ 2023-10-07 22:35:12 ] Completed Epoch: 13 batch 278: computing loss 3,320.664 ms, 24.88 s total | |
EPOCH: [13], BATCH: [278/889], loss: 0.376, loss_box_reg: 0.114, loss_classifier: 0.090, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 278 | |
[ 2023-10-07 22:35:14 ] Completed saving temp checkpoint 1,234.766 ms, 26.12 s total | |
[ 2023-10-07 22:35:14 ] Completed replacing temp checkpoint with checkpoint 69.776 ms, 26.19 s total | |
[ 2023-10-07 22:35:14 ] Completed Epoch: 13 batch 279: moving batch data to device 18.645 ms, 26.21 s total | |
[ 2023-10-07 22:35:14 ] Completed Epoch: 13 batch 279: forward pass 756.969 ms, 26.96 s total | |
[ 2023-10-07 22:35:15 ] Completed Epoch: 13 batch 279: backward pass 572.414 ms, 27.53 s total | |
[ 2023-10-07 22:35:18 ] Completed Epoch: 13 batch 279: computing loss 2,913.303 ms, 30.45 s total | |
EPOCH: [13], BATCH: [279/889], loss: 0.385, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 279 | |
[ 2023-10-07 22:35:19 ] Completed saving temp checkpoint 1,052.600 ms, 31.50 s total | |
[ 2023-10-07 22:35:19 ] Completed replacing temp checkpoint with checkpoint 64.452 ms, 31.56 s total | |
[ 2023-10-07 22:35:19 ] Completed Epoch: 13 batch 280: moving batch data to device 27.875 ms, 31.59 s total | |
[ 2023-10-07 22:35:20 ] Completed Epoch: 13 batch 280: forward pass 729.686 ms, 32.32 s total | |
[ 2023-10-07 22:35:20 ] Completed Epoch: 13 batch 280: backward pass 96.007 ms, 32.42 s total | |
[ 2023-10-07 22:35:23 ] Completed Epoch: 13 batch 280: computing loss 3,351.986 ms, 35.77 s total | |
EPOCH: [13], BATCH: [280/889], loss: 0.358, loss_box_reg: 0.102, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 280 | |
[ 2023-10-07 22:35:24 ] Completed saving temp checkpoint 1,048.768 ms, 36.82 s total | |
[ 2023-10-07 22:35:24 ] Completed replacing temp checkpoint with checkpoint 71.099 ms, 36.89 s total | |
[ 2023-10-07 22:35:24 ] Completed Epoch: 13 batch 281: moving batch data to device 19.592 ms, 36.91 s total | |
[ 2023-10-07 22:35:25 ] Completed Epoch: 13 batch 281: forward pass 730.608 ms, 37.64 s total | |
[ 2023-10-07 22:35:25 ] Completed Epoch: 13 batch 281: backward pass 132.007 ms, 37.77 s total | |
[ 2023-10-07 22:35:28 ] Completed Epoch: 13 batch 281: computing loss 3,182.632 ms, 40.96 s total | |
EPOCH: [13], BATCH: [281/889], loss: 0.393, loss_box_reg: 0.120, loss_classifier: 0.104, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 281 | |
[ 2023-10-07 22:35:29 ] Completed saving temp checkpoint 1,117.755 ms, 42.07 s total | |
[ 2023-10-07 22:35:30 ] Completed replacing temp checkpoint with checkpoint 70.138 ms, 42.14 s total | |
[ 2023-10-07 22:35:30 ] Completed Epoch: 13 batch 282: moving batch data to device 22.294 ms, 42.17 s total | |
[ 2023-10-07 22:35:30 ] Completed Epoch: 13 batch 282: forward pass 753.400 ms, 42.92 s total | |
[ 2023-10-07 22:35:30 ] Completed Epoch: 13 batch 282: backward pass 71.580 ms, 42.99 s total | |
[ 2023-10-07 22:35:34 ] Completed Epoch: 13 batch 282: computing loss 3,320.048 ms, 46.31 s total | |
EPOCH: [13], BATCH: [282/889], loss: 0.378, loss_box_reg: 0.119, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 282 | |
[ 2023-10-07 22:35:35 ] Completed saving temp checkpoint 882.555 ms, 47.19 s total | |
[ 2023-10-07 22:35:35 ] Completed replacing temp checkpoint with checkpoint 43.350 ms, 47.24 s total | |
[ 2023-10-07 22:35:35 ] Completed Epoch: 13 batch 283: moving batch data to device 19.797 ms, 47.26 s total | |
[ 2023-10-07 22:35:35 ] Completed Epoch: 13 batch 283: forward pass 749.036 ms, 48.00 s total | |
[ 2023-10-07 22:35:35 ] Completed Epoch: 13 batch 283: backward pass 84.646 ms, 48.09 s total | |
[ 2023-10-07 22:35:39 ] Completed Epoch: 13 batch 283: computing loss 3,356.819 ms, 51.45 s total | |
EPOCH: [13], BATCH: [283/889], loss: 0.396, loss_box_reg: 0.118, loss_classifier: 0.103, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 283 | |
[ 2023-10-07 22:35:41 ] Completed saving temp checkpoint 1,648.975 ms, 53.10 s total | |
[ 2023-10-07 22:35:41 ] Completed replacing temp checkpoint with checkpoint 102.755 ms, 53.20 s total | |
[ 2023-10-07 22:35:41 ] Completed Epoch: 13 batch 284: moving batch data to device 24.429 ms, 53.22 s total | |
[ 2023-10-07 22:35:41 ] Completed Epoch: 13 batch 284: forward pass 754.868 ms, 53.98 s total | |
[ 2023-10-07 22:35:41 ] Completed Epoch: 13 batch 284: backward pass 74.388 ms, 54.05 s total | |
[ 2023-10-07 22:35:45 ] Completed Epoch: 13 batch 284: computing loss 3,281.365 ms, 57.33 s total | |
EPOCH: [13], BATCH: [284/889], loss: 0.372, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.124, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 284 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 22:48:58 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 22:48:58 ] Completed importing Timer 0.021 ms, 0.00 s total | |
[ 2023-10-07 22:48:59 ] Completed importing everything else 710.724 ms, 0.71 s total | |
[ 2023-10-07 22:48:59 ] Completed defined other functions 0.028 ms, 0.71 s total | |
| distributed init (rank 5): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 2): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 22:49:02 ] Completed main preliminaries 3,165.165 ms, 3.88 s total | |
loading annotations into memory... | |
Done (t=11.57s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 22:49:16 ] Completed loading data 13,583.008 ms, 17.46 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 22:49:16 ] Completed creating data samplers 130.644 ms, 17.59 s total | |
[ 2023-10-07 22:49:16 ] Completed creating data loaders 0.251 ms, 17.59 s total | |
[ 2023-10-07 22:49:16 ] Completed creating model and .to(device) 679.967 ms, 18.27 s total | |
[ 2023-10-07 22:49:19 ] Completed preparing model for distributed training 2,499.317 ms, 20.77 s total | |
[ 2023-10-07 22:49:19 ] Completed optimizer and scaler 0.532 ms, 20.77 s total | |
[ 2023-10-07 22:49:19 ] Completed learning rate schedulers 0.120 ms, 20.77 s total | |
[ 2023-10-07 22:49:20 ] Completed init coco evaluator 973.400 ms, 21.74 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 22:49:21 ] Completed retrieving checkpoint 1,052.511 ms, 22.80 s total | |
EPOCH :: 13 | |
[ 2023-10-07 22:49:21 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 22:49:21 ] Completed training preliminaries 0.838 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 284 | |
[ 2023-10-07 22:49:21 ] Completed Epoch: 13 batch 284: moving batch data to device 243.604 ms, 0.24 s total | |
[ 2023-10-07 22:49:27 ] Completed Epoch: 13 batch 284: forward pass 5,332.136 ms, 5.58 s total | |
[ 2023-10-07 22:49:27 ] Completed Epoch: 13 batch 284: backward pass 165.398 ms, 5.74 s total | |
[ 2023-10-07 22:49:28 ] Completed Epoch: 13 batch 284: computing loss 1,159.899 ms, 6.90 s total | |
EPOCH: [13], BATCH: [284/889], loss: 0.373, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 284 | |
[ 2023-10-07 22:49:29 ] Completed saving temp checkpoint 1,355.206 ms, 8.26 s total | |
[ 2023-10-07 22:49:29 ] Completed replacing temp checkpoint with checkpoint 147.577 ms, 8.40 s total | |
[ 2023-10-07 22:49:29 ] Completed Epoch: 13 batch 285: moving batch data to device 23.661 ms, 8.43 s total | |
[ 2023-10-07 22:49:30 ] Completed Epoch: 13 batch 285: forward pass 321.395 ms, 8.75 s total | |
[ 2023-10-07 22:49:30 ] Completed Epoch: 13 batch 285: backward pass 73.272 ms, 8.82 s total | |
[ 2023-10-07 22:49:31 ] Completed Epoch: 13 batch 285: computing loss 1,535.616 ms, 10.36 s total | |
EPOCH: [13], BATCH: [285/889], loss: 0.399, loss_box_reg: 0.126, loss_classifier: 0.102, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 285 | |
[ 2023-10-07 22:49:33 ] Completed saving temp checkpoint 1,456.228 ms, 11.81 s total | |
[ 2023-10-07 22:49:33 ] Completed replacing temp checkpoint with checkpoint 51.519 ms, 11.87 s total | |
[ 2023-10-07 22:49:33 ] Completed Epoch: 13 batch 286: moving batch data to device 21.001 ms, 11.89 s total | |
[ 2023-10-07 22:49:33 ] Completed Epoch: 13 batch 286: forward pass 311.570 ms, 12.20 s total | |
[ 2023-10-07 22:49:33 ] Completed Epoch: 13 batch 286: backward pass 143.135 ms, 12.34 s total | |
[ 2023-10-07 22:49:35 ] Completed Epoch: 13 batch 286: computing loss 1,815.399 ms, 14.16 s total | |
EPOCH: [13], BATCH: [286/889], loss: 0.396, loss_box_reg: 0.119, loss_classifier: 0.102, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 286 | |
[ 2023-10-07 22:49:37 ] Completed saving temp checkpoint 1,744.326 ms, 15.90 s total | |
[ 2023-10-07 22:49:37 ] Completed replacing temp checkpoint with checkpoint 58.950 ms, 15.96 s total | |
[ 2023-10-07 22:49:37 ] Completed Epoch: 13 batch 287: moving batch data to device 22.263 ms, 15.98 s total | |
[ 2023-10-07 22:49:37 ] Completed Epoch: 13 batch 287: forward pass 332.324 ms, 16.32 s total | |
[ 2023-10-07 22:49:38 ] Completed Epoch: 13 batch 287: backward pass 392.631 ms, 16.71 s total | |
[ 2023-10-07 22:49:39 ] Completed Epoch: 13 batch 287: computing loss 1,137.273 ms, 17.85 s total | |
EPOCH: [13], BATCH: [287/889], loss: 0.388, loss_box_reg: 0.116, loss_classifier: 0.100, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 287 | |
[ 2023-10-07 22:49:40 ] Completed saving temp checkpoint 1,266.818 ms, 19.11 s total | |
[ 2023-10-07 22:49:40 ] Completed replacing temp checkpoint with checkpoint 50.186 ms, 19.16 s total | |
[ 2023-10-07 22:49:40 ] Completed Epoch: 13 batch 288: moving batch data to device 73.031 ms, 19.24 s total | |
[ 2023-10-07 22:49:41 ] Completed Epoch: 13 batch 288: forward pass 331.716 ms, 19.57 s total | |
[ 2023-10-07 22:49:41 ] Completed Epoch: 13 batch 288: backward pass 88.984 ms, 19.66 s total | |
[ 2023-10-07 22:49:42 ] Completed Epoch: 13 batch 288: computing loss 1,841.053 ms, 21.50 s total | |
EPOCH: [13], BATCH: [288/889], loss: 0.398, loss_box_reg: 0.122, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 288 | |
[ 2023-10-07 22:49:44 ] Completed saving temp checkpoint 1,816.615 ms, 23.31 s total | |
[ 2023-10-07 22:49:44 ] Completed replacing temp checkpoint with checkpoint 82.812 ms, 23.40 s total | |
[ 2023-10-07 22:49:44 ] Completed Epoch: 13 batch 289: moving batch data to device 20.154 ms, 23.42 s total | |
[ 2023-10-07 22:49:45 ] Completed Epoch: 13 batch 289: forward pass 325.805 ms, 23.74 s total | |
[ 2023-10-07 22:49:45 ] Completed Epoch: 13 batch 289: backward pass 87.435 ms, 23.83 s total | |
[ 2023-10-07 22:49:47 ] Completed Epoch: 13 batch 289: computing loss 2,014.451 ms, 25.84 s total | |
EPOCH: [13], BATCH: [289/889], loss: 0.381, loss_box_reg: 0.113, loss_classifier: 0.092, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 289 | |
[ 2023-10-07 22:49:49 ] Completed saving temp checkpoint 2,047.516 ms, 27.89 s total | |
[ 2023-10-07 22:49:49 ] Completed replacing temp checkpoint with checkpoint 58.873 ms, 27.95 s total | |
[ 2023-10-07 22:49:49 ] Completed Epoch: 13 batch 290: moving batch data to device 2.852 ms, 27.95 s total | |
[ 2023-10-07 22:49:49 ] Completed Epoch: 13 batch 290: forward pass 435.818 ms, 28.39 s total | |
[ 2023-10-07 22:49:50 ] Completed Epoch: 13 batch 290: backward pass 370.142 ms, 28.76 s total | |
[ 2023-10-07 22:49:51 ] Completed Epoch: 13 batch 290: computing loss 1,599.513 ms, 30.36 s total | |
EPOCH: [13], BATCH: [290/889], loss: 0.391, loss_box_reg: 0.117, loss_classifier: 0.099, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 290 | |
[ 2023-10-07 22:49:53 ] Completed saving temp checkpoint 1,676.741 ms, 32.04 s total | |
[ 2023-10-07 22:49:53 ] Completed replacing temp checkpoint with checkpoint 65.710 ms, 32.10 s total | |
[ 2023-10-07 22:49:53 ] Completed Epoch: 13 batch 291: moving batch data to device 7.515 ms, 32.11 s total | |
[ 2023-10-07 22:49:54 ] Completed Epoch: 13 batch 291: forward pass 446.162 ms, 32.56 s total | |
[ 2023-10-07 22:49:54 ] Completed Epoch: 13 batch 291: backward pass 38.502 ms, 32.59 s total | |
[ 2023-10-07 22:49:55 ] Completed Epoch: 13 batch 291: computing loss 1,809.728 ms, 34.40 s total | |
EPOCH: [13], BATCH: [291/889], loss: 0.397, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.127, loss_objectness: 0.019, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 291 | |
[ 2023-10-07 22:49:57 ] Completed saving temp checkpoint 1,856.743 ms, 36.26 s total | |
[ 2023-10-07 22:49:57 ] Completed replacing temp checkpoint with checkpoint 54.842 ms, 36.31 s total | |
[ 2023-10-07 22:49:57 ] Completed Epoch: 13 batch 292: moving batch data to device 21.504 ms, 36.34 s total | |
[ 2023-10-07 22:49:58 ] Completed Epoch: 13 batch 292: forward pass 324.782 ms, 36.66 s total | |
[ 2023-10-07 22:49:58 ] Completed Epoch: 13 batch 292: backward pass 75.960 ms, 36.74 s total | |
[ 2023-10-07 22:49:59 ] Completed Epoch: 13 batch 292: computing loss 1,532.465 ms, 38.27 s total | |
EPOCH: [13], BATCH: [292/889], loss: 0.367, loss_box_reg: 0.112, loss_classifier: 0.091, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 292 | |
[ 2023-10-07 22:50:01 ] Completed saving temp checkpoint 1,371.852 ms, 39.64 s total | |
[ 2023-10-07 22:50:01 ] Completed replacing temp checkpoint with checkpoint 52.632 ms, 39.69 s total | |
[ 2023-10-07 22:50:01 ] Completed Epoch: 13 batch 293: moving batch data to device 21.904 ms, 39.72 s total | |
[ 2023-10-07 22:50:01 ] Completed Epoch: 13 batch 293: forward pass 315.612 ms, 40.03 s total | |
[ 2023-10-07 22:50:01 ] Completed Epoch: 13 batch 293: backward pass 74.773 ms, 40.11 s total | |
[ 2023-10-07 22:50:03 ] Completed Epoch: 13 batch 293: computing loss 1,460.755 ms, 41.57 s total | |
EPOCH: [13], BATCH: [293/889], loss: 0.400, loss_box_reg: 0.122, loss_classifier: 0.095, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 293 | |
[ 2023-10-07 22:50:04 ] Completed saving temp checkpoint 1,289.994 ms, 42.86 s total | |
[ 2023-10-07 22:50:04 ] Completed replacing temp checkpoint with checkpoint 56.116 ms, 42.91 s total | |
[ 2023-10-07 22:50:04 ] Completed Epoch: 13 batch 294: moving batch data to device 21.162 ms, 42.93 s total | |
[ 2023-10-07 22:50:04 ] Completed Epoch: 13 batch 294: forward pass 313.039 ms, 43.25 s total | |
[ 2023-10-07 22:50:05 ] Completed Epoch: 13 batch 294: backward pass 342.197 ms, 43.59 s total | |
[ 2023-10-07 22:50:06 ] Completed Epoch: 13 batch 294: computing loss 1,168.170 ms, 44.76 s total | |
EPOCH: [13], BATCH: [294/889], loss: 0.418, loss_box_reg: 0.127, loss_classifier: 0.110, loss_mask: 0.137, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 294 | |
[ 2023-10-07 22:50:07 ] Completed saving temp checkpoint 1,449.425 ms, 46.21 s total | |
[ 2023-10-07 22:50:07 ] Completed replacing temp checkpoint with checkpoint 65.273 ms, 46.27 s total | |
[ 2023-10-07 22:50:07 ] Completed Epoch: 13 batch 295: moving batch data to device 23.513 ms, 46.30 s total | |
[ 2023-10-07 22:50:08 ] Completed Epoch: 13 batch 295: forward pass 353.318 ms, 46.65 s total | |
[ 2023-10-07 22:50:08 ] Completed Epoch: 13 batch 295: backward pass 68.594 ms, 46.72 s total | |
[ 2023-10-07 22:50:09 ] Completed Epoch: 13 batch 295: computing loss 1,655.603 ms, 48.37 s total | |
EPOCH: [13], BATCH: [295/889], loss: 0.397, loss_box_reg: 0.117, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.019, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 295 | |
[ 2023-10-07 22:50:11 ] Completed saving temp checkpoint 1,946.601 ms, 50.32 s total | |
[ 2023-10-07 22:50:11 ] Completed replacing temp checkpoint with checkpoint 65.313 ms, 50.39 s total | |
[ 2023-10-07 22:50:11 ] Completed Epoch: 13 batch 296: moving batch data to device 21.644 ms, 50.41 s total | |
[ 2023-10-07 22:50:12 ] Completed Epoch: 13 batch 296: forward pass 296.284 ms, 50.70 s total | |
[ 2023-10-07 22:50:12 ] Completed Epoch: 13 batch 296: backward pass 74.890 ms, 50.78 s total | |
[ 2023-10-07 22:50:13 ] Completed Epoch: 13 batch 296: computing loss 1,594.515 ms, 52.37 s total | |
EPOCH: [13], BATCH: [296/889], loss: 0.375, loss_box_reg: 0.112, loss_classifier: 0.099, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 296 | |
[ 2023-10-07 22:50:15 ] Completed saving temp checkpoint 1,265.598 ms, 53.64 s total | |
[ 2023-10-07 22:50:15 ] Completed replacing temp checkpoint with checkpoint 37.035 ms, 53.68 s total | |
[ 2023-10-07 22:50:15 ] Completed Epoch: 13 batch 297: moving batch data to device 21.791 ms, 53.70 s total | |
[ 2023-10-07 22:50:15 ] Completed Epoch: 13 batch 297: forward pass 383.074 ms, 54.08 s total | |
[ 2023-10-07 22:50:15 ] Completed Epoch: 13 batch 297: backward pass 74.060 ms, 54.15 s total | |
[ 2023-10-07 22:50:17 ] Completed Epoch: 13 batch 297: computing loss 1,389.431 ms, 55.54 s total | |
EPOCH: [13], BATCH: [297/889], loss: 0.360, loss_box_reg: 0.107, loss_classifier: 0.090, loss_mask: 0.126, loss_objectness: 0.014, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 297 | |
[ 2023-10-07 22:50:18 ] Completed saving temp checkpoint 1,305.610 ms, 56.85 s total | |
[ 2023-10-07 22:50:18 ] Completed replacing temp checkpoint with checkpoint 52.837 ms, 56.90 s total | |
[ 2023-10-07 22:50:18 ] Completed Epoch: 13 batch 298: moving batch data to device 22.614 ms, 56.92 s total | |
[ 2023-10-07 22:50:18 ] Completed Epoch: 13 batch 298: forward pass 306.823 ms, 57.23 s total | |
[ 2023-10-07 22:50:18 ] Completed Epoch: 13 batch 298: backward pass 39.933 ms, 57.27 s total | |
[ 2023-10-07 22:50:20 ] Completed Epoch: 13 batch 298: computing loss 1,498.558 ms, 58.77 s total | |
EPOCH: [13], BATCH: [298/889], loss: 0.362, loss_box_reg: 0.112, loss_classifier: 0.092, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 298 | |
[ 2023-10-07 22:50:21 ] Completed saving temp checkpoint 1,257.317 ms, 60.03 s total | |
[ 2023-10-07 22:50:21 ] Completed replacing temp checkpoint with checkpoint 70.707 ms, 60.10 s total | |
[ 2023-10-07 22:50:21 ] Completed Epoch: 13 batch 299: moving batch data to device 23.388 ms, 60.12 s total | |
[ 2023-10-07 22:50:21 ] Completed Epoch: 13 batch 299: forward pass 345.682 ms, 60.47 s total | |
[ 2023-10-07 22:50:22 ] Completed Epoch: 13 batch 299: backward pass 86.590 ms, 60.55 s total | |
[ 2023-10-07 22:50:23 ] Completed Epoch: 13 batch 299: computing loss 1,395.030 ms, 61.95 s total | |
EPOCH: [13], BATCH: [299/889], loss: 0.389, loss_box_reg: 0.114, loss_classifier: 0.096, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 299 | |
[ 2023-10-07 22:50:24 ] Completed saving temp checkpoint 1,305.594 ms, 63.25 s total | |
[ 2023-10-07 22:50:24 ] Completed replacing temp checkpoint with checkpoint 56.161 ms, 63.31 s total | |
[ 2023-10-07 22:50:24 ] Completed Epoch: 13 batch 300: moving batch data to device 22.869 ms, 63.33 s total | |
[ 2023-10-07 22:50:25 ] Completed Epoch: 13 batch 300: forward pass 316.257 ms, 63.65 s total | |
[ 2023-10-07 22:50:25 ] Completed Epoch: 13 batch 300: backward pass 45.363 ms, 63.70 s total | |
[ 2023-10-07 22:50:27 ] Completed Epoch: 13 batch 300: computing loss 1,841.517 ms, 65.54 s total | |
EPOCH: [13], BATCH: [300/889], loss: 0.388, loss_box_reg: 0.122, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 300 | |
[ 2023-10-07 22:50:28 ] Completed saving temp checkpoint 1,659.750 ms, 67.20 s total | |
[ 2023-10-07 22:50:28 ] Completed replacing temp checkpoint with checkpoint 73.653 ms, 67.27 s total | |
[ 2023-10-07 22:50:28 ] Completed Epoch: 13 batch 301: moving batch data to device 23.098 ms, 67.29 s total | |
[ 2023-10-07 22:50:29 ] Completed Epoch: 13 batch 301: forward pass 338.769 ms, 67.63 s total | |
[ 2023-10-07 22:50:29 ] Completed Epoch: 13 batch 301: backward pass 67.281 ms, 67.70 s total | |
[ 2023-10-07 22:50:31 ] Completed Epoch: 13 batch 301: computing loss 2,225.979 ms, 69.93 s total | |
EPOCH: [13], BATCH: [301/889], loss: 0.373, loss_box_reg: 0.108, loss_classifier: 0.093, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 301 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 23:03:42 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 23:03:42 ] Completed importing Timer 0.029 ms, 0.00 s total | |
[ 2023-10-07 23:03:43 ] Completed importing everything else 538.647 ms, 0.54 s total | |
[ 2023-10-07 23:03:43 ] Completed defined other functions 0.026 ms, 0.54 s total | |
| distributed init (rank 2): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 4): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 23:03:46 ] Completed main preliminaries 3,430.887 ms, 3.97 s total | |
loading annotations into memory... | |
Done (t=11.59s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 23:04:00 ] Completed loading data 13,493.033 ms, 17.46 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 23:04:00 ] Completed creating data samplers 103.517 ms, 17.57 s total | |
[ 2023-10-07 23:04:00 ] Completed creating data loaders 0.220 ms, 17.57 s total | |
[ 2023-10-07 23:04:01 ] Completed creating model and .to(device) 662.858 ms, 18.23 s total | |
[ 2023-10-07 23:04:03 ] Completed preparing model for distributed training 2,522.539 ms, 20.75 s total | |
[ 2023-10-07 23:04:03 ] Completed optimizer and scaler 0.548 ms, 20.75 s total | |
[ 2023-10-07 23:04:03 ] Completed learning rate schedulers 0.128 ms, 20.75 s total | |
[ 2023-10-07 23:04:04 ] Completed init coco evaluator 983.124 ms, 21.74 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 23:04:05 ] Completed retrieving checkpoint 880.634 ms, 22.62 s total | |
EPOCH :: 13 | |
[ 2023-10-07 23:04:05 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 23:04:05 ] Completed training preliminaries 0.878 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 301 | |
[ 2023-10-07 23:04:05 ] Completed Epoch: 13 batch 301: moving batch data to device 243.322 ms, 0.24 s total | |
[ 2023-10-07 23:04:11 ] Completed Epoch: 13 batch 301: forward pass 5,355.716 ms, 5.60 s total | |
[ 2023-10-07 23:04:11 ] Completed Epoch: 13 batch 301: backward pass 147.753 ms, 5.75 s total | |
[ 2023-10-07 23:04:12 ] Completed Epoch: 13 batch 301: computing loss 1,092.340 ms, 6.84 s total | |
EPOCH: [13], BATCH: [301/889], loss: 0.372, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 301 | |
[ 2023-10-07 23:04:13 ] Completed saving temp checkpoint 1,430.508 ms, 8.27 s total | |
[ 2023-10-07 23:04:14 ] Completed replacing temp checkpoint with checkpoint 136.463 ms, 8.41 s total | |
[ 2023-10-07 23:04:14 ] Completed Epoch: 13 batch 302: moving batch data to device 51.146 ms, 8.46 s total | |
[ 2023-10-07 23:04:14 ] Completed Epoch: 13 batch 302: forward pass 435.122 ms, 8.89 s total | |
[ 2023-10-07 23:04:14 ] Completed Epoch: 13 batch 302: backward pass 166.494 ms, 9.06 s total | |
[ 2023-10-07 23:04:16 ] Completed Epoch: 13 batch 302: computing loss 1,657.215 ms, 10.72 s total | |
EPOCH: [13], BATCH: [302/889], loss: 0.400, loss_box_reg: 0.125, loss_classifier: 0.105, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 302 | |
[ 2023-10-07 23:04:17 ] Completed saving temp checkpoint 1,168.966 ms, 11.89 s total | |
[ 2023-10-07 23:04:17 ] Completed replacing temp checkpoint with checkpoint 70.602 ms, 11.96 s total | |
[ 2023-10-07 23:04:17 ] Completed Epoch: 13 batch 303: moving batch data to device 20.686 ms, 11.98 s total | |
[ 2023-10-07 23:04:17 ] Completed Epoch: 13 batch 303: forward pass 323.360 ms, 12.30 s total | |
[ 2023-10-07 23:04:18 ] Completed Epoch: 13 batch 303: backward pass 419.196 ms, 12.72 s total | |
[ 2023-10-07 23:04:19 ] Completed Epoch: 13 batch 303: computing loss 957.482 ms, 13.68 s total | |
EPOCH: [13], BATCH: [303/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.099, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 303 | |
[ 2023-10-07 23:04:20 ] Completed saving temp checkpoint 1,339.188 ms, 15.02 s total | |
[ 2023-10-07 23:04:20 ] Completed replacing temp checkpoint with checkpoint 80.619 ms, 15.10 s total | |
[ 2023-10-07 23:04:20 ] Completed Epoch: 13 batch 304: moving batch data to device 27.539 ms, 15.12 s total | |
[ 2023-10-07 23:04:21 ] Completed Epoch: 13 batch 304: forward pass 332.471 ms, 15.46 s total | |
[ 2023-10-07 23:04:21 ] Completed Epoch: 13 batch 304: backward pass 73.261 ms, 15.53 s total | |
[ 2023-10-07 23:04:22 ] Completed Epoch: 13 batch 304: computing loss 1,294.268 ms, 16.82 s total | |
EPOCH: [13], BATCH: [304/889], loss: 0.408, loss_box_reg: 0.127, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 304 | |
[ 2023-10-07 23:04:23 ] Completed saving temp checkpoint 1,386.639 ms, 18.21 s total | |
[ 2023-10-07 23:04:23 ] Completed replacing temp checkpoint with checkpoint 69.203 ms, 18.28 s total | |
[ 2023-10-07 23:04:23 ] Completed Epoch: 13 batch 305: moving batch data to device 19.032 ms, 18.30 s total | |
[ 2023-10-07 23:04:24 ] Completed Epoch: 13 batch 305: forward pass 330.573 ms, 18.63 s total | |
[ 2023-10-07 23:04:24 ] Completed Epoch: 13 batch 305: backward pass 67.202 ms, 18.70 s total | |
[ 2023-10-07 23:04:25 ] Completed Epoch: 13 batch 305: computing loss 1,334.916 ms, 20.03 s total | |
EPOCH: [13], BATCH: [305/889], loss: 0.390, loss_box_reg: 0.119, loss_classifier: 0.103, loss_mask: 0.126, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 305 | |
[ 2023-10-07 23:04:26 ] Completed saving temp checkpoint 1,338.655 ms, 21.37 s total | |
[ 2023-10-07 23:04:27 ] Completed replacing temp checkpoint with checkpoint 51.345 ms, 21.42 s total | |
[ 2023-10-07 23:04:27 ] Completed Epoch: 13 batch 306: moving batch data to device 22.069 ms, 21.44 s total | |
[ 2023-10-07 23:04:27 ] Completed Epoch: 13 batch 306: forward pass 365.246 ms, 21.81 s total | |
[ 2023-10-07 23:04:27 ] Completed Epoch: 13 batch 306: backward pass 94.330 ms, 21.90 s total | |
[ 2023-10-07 23:04:29 ] Completed Epoch: 13 batch 306: computing loss 1,636.522 ms, 23.54 s total | |
EPOCH: [13], BATCH: [306/889], loss: 0.380, loss_box_reg: 0.114, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.017, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 306 | |
[ 2023-10-07 23:04:30 ] Completed saving temp checkpoint 1,665.230 ms, 25.21 s total | |
[ 2023-10-07 23:04:30 ] Completed replacing temp checkpoint with checkpoint 105.027 ms, 25.31 s total | |
[ 2023-10-07 23:04:30 ] Completed Epoch: 13 batch 307: moving batch data to device 21.304 ms, 25.33 s total | |
[ 2023-10-07 23:04:31 ] Completed Epoch: 13 batch 307: forward pass 313.372 ms, 25.65 s total | |
[ 2023-10-07 23:04:31 ] Completed Epoch: 13 batch 307: backward pass 69.406 ms, 25.71 s total | |
[ 2023-10-07 23:04:32 ] Completed Epoch: 13 batch 307: computing loss 1,407.106 ms, 27.12 s total | |
EPOCH: [13], BATCH: [307/889], loss: 0.382, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.020, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 307 | |
[ 2023-10-07 23:04:33 ] Completed saving temp checkpoint 1,112.016 ms, 28.23 s total | |
[ 2023-10-07 23:04:33 ] Completed replacing temp checkpoint with checkpoint 48.038 ms, 28.28 s total | |
[ 2023-10-07 23:04:33 ] Completed Epoch: 13 batch 308: moving batch data to device 24.594 ms, 28.31 s total | |
[ 2023-10-07 23:04:34 ] Completed Epoch: 13 batch 308: forward pass 310.615 ms, 28.62 s total | |
[ 2023-10-07 23:04:34 ] Completed Epoch: 13 batch 308: backward pass 78.591 ms, 28.70 s total | |
[ 2023-10-07 23:04:35 ] Completed Epoch: 13 batch 308: computing loss 1,394.855 ms, 30.09 s total | |
EPOCH: [13], BATCH: [308/889], loss: 0.410, loss_box_reg: 0.130, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 308 | |
[ 2023-10-07 23:04:37 ] Completed saving temp checkpoint 1,699.131 ms, 31.79 s total | |
[ 2023-10-07 23:04:37 ] Completed replacing temp checkpoint with checkpoint 74.590 ms, 31.86 s total | |
[ 2023-10-07 23:04:37 ] Completed Epoch: 13 batch 309: moving batch data to device 21.640 ms, 31.89 s total | |
[ 2023-10-07 23:04:37 ] Completed Epoch: 13 batch 309: forward pass 304.554 ms, 32.19 s total | |
[ 2023-10-07 23:04:37 ] Completed Epoch: 13 batch 309: backward pass 74.502 ms, 32.26 s total | |
[ 2023-10-07 23:04:39 ] Completed Epoch: 13 batch 309: computing loss 1,677.974 ms, 33.94 s total | |
EPOCH: [13], BATCH: [309/889], loss: 0.386, loss_box_reg: 0.113, loss_classifier: 0.102, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 309 | |
[ 2023-10-07 23:04:40 ] Completed saving temp checkpoint 1,324.402 ms, 35.27 s total | |
[ 2023-10-07 23:04:40 ] Completed replacing temp checkpoint with checkpoint 33.534 ms, 35.30 s total | |
[ 2023-10-07 23:04:40 ] Completed Epoch: 13 batch 310: moving batch data to device 21.241 ms, 35.32 s total | |
[ 2023-10-07 23:04:41 ] Completed Epoch: 13 batch 310: forward pass 400.110 ms, 35.72 s total | |
[ 2023-10-07 23:04:41 ] Completed Epoch: 13 batch 310: backward pass 38.415 ms, 35.76 s total | |
[ 2023-10-07 23:04:42 ] Completed Epoch: 13 batch 310: computing loss 1,153.256 ms, 36.91 s total | |
EPOCH: [13], BATCH: [310/889], loss: 0.436, loss_box_reg: 0.136, loss_classifier: 0.116, loss_mask: 0.133, loss_objectness: 0.020, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 310 | |
[ 2023-10-07 23:04:43 ] Completed saving temp checkpoint 1,155.327 ms, 38.07 s total | |
[ 2023-10-07 23:04:43 ] Completed replacing temp checkpoint with checkpoint 72.886 ms, 38.14 s total | |
[ 2023-10-07 23:04:43 ] Completed Epoch: 13 batch 311: moving batch data to device 24.239 ms, 38.17 s total | |
[ 2023-10-07 23:04:44 ] Completed Epoch: 13 batch 311: forward pass 326.253 ms, 38.49 s total | |
[ 2023-10-07 23:04:44 ] Completed Epoch: 13 batch 311: backward pass 57.155 ms, 38.55 s total | |
[ 2023-10-07 23:04:45 ] Completed Epoch: 13 batch 311: computing loss 1,431.964 ms, 39.98 s total | |
EPOCH: [13], BATCH: [311/889], loss: 0.413, loss_box_reg: 0.126, loss_classifier: 0.109, loss_mask: 0.128, loss_objectness: 0.018, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 311 | |
[ 2023-10-07 23:04:46 ] Completed saving temp checkpoint 1,126.873 ms, 41.11 s total | |
[ 2023-10-07 23:04:46 ] Completed replacing temp checkpoint with checkpoint 82.451 ms, 41.19 s total | |
[ 2023-10-07 23:04:46 ] Completed Epoch: 13 batch 312: moving batch data to device 24.011 ms, 41.21 s total | |
[ 2023-10-07 23:04:47 ] Completed Epoch: 13 batch 312: forward pass 328.376 ms, 41.54 s total | |
[ 2023-10-07 23:04:47 ] Completed Epoch: 13 batch 312: backward pass 74.964 ms, 41.62 s total | |
[ 2023-10-07 23:04:48 ] Completed Epoch: 13 batch 312: computing loss 1,542.955 ms, 43.16 s total | |
EPOCH: [13], BATCH: [312/889], loss: 0.407, loss_box_reg: 0.122, loss_classifier: 0.107, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 312 | |
[ 2023-10-07 23:04:50 ] Completed saving temp checkpoint 1,723.195 ms, 44.88 s total | |
[ 2023-10-07 23:04:50 ] Completed replacing temp checkpoint with checkpoint 90.924 ms, 44.98 s total | |
[ 2023-10-07 23:04:50 ] Completed Epoch: 13 batch 313: moving batch data to device 25.602 ms, 45.00 s total | |
[ 2023-10-07 23:04:51 ] Completed Epoch: 13 batch 313: forward pass 424.119 ms, 45.43 s total | |
[ 2023-10-07 23:04:51 ] Completed Epoch: 13 batch 313: backward pass 120.875 ms, 45.55 s total | |
[ 2023-10-07 23:04:52 ] Completed Epoch: 13 batch 313: computing loss 1,276.906 ms, 46.82 s total | |
EPOCH: [13], BATCH: [313/889], loss: 0.375, loss_box_reg: 0.113, loss_classifier: 0.092, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 313 | |
[ 2023-10-07 23:04:53 ] Completed saving temp checkpoint 1,257.744 ms, 48.08 s total | |
[ 2023-10-07 23:04:53 ] Completed replacing temp checkpoint with checkpoint 78.780 ms, 48.16 s total | |
[ 2023-10-07 23:04:53 ] Completed Epoch: 13 batch 314: moving batch data to device 25.905 ms, 48.19 s total | |
[ 2023-10-07 23:04:54 ] Completed Epoch: 13 batch 314: forward pass 310.520 ms, 48.50 s total | |
[ 2023-10-07 23:04:54 ] Completed Epoch: 13 batch 314: backward pass 97.505 ms, 48.59 s total | |
[ 2023-10-07 23:04:55 ] Completed Epoch: 13 batch 314: computing loss 1,381.827 ms, 49.98 s total | |
EPOCH: [13], BATCH: [314/889], loss: 0.359, loss_box_reg: 0.109, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 314 | |
[ 2023-10-07 23:04:57 ] Completed saving temp checkpoint 1,761.358 ms, 51.74 s total | |
[ 2023-10-07 23:04:57 ] Completed replacing temp checkpoint with checkpoint 57.995 ms, 51.79 s total | |
[ 2023-10-07 23:04:57 ] Completed Epoch: 13 batch 315: moving batch data to device 5.549 ms, 51.80 s total | |
[ 2023-10-07 23:04:57 ] Completed Epoch: 13 batch 315: forward pass 445.665 ms, 52.25 s total | |
[ 2023-10-07 23:04:57 ] Completed Epoch: 13 batch 315: backward pass 78.487 ms, 52.32 s total | |
[ 2023-10-07 23:04:59 ] Completed Epoch: 13 batch 315: computing loss 1,574.620 ms, 53.90 s total | |
EPOCH: [13], BATCH: [315/889], loss: 0.345, loss_box_reg: 0.103, loss_classifier: 0.080, loss_mask: 0.125, loss_objectness: 0.012, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 315 | |
[ 2023-10-07 23:05:00 ] Completed saving temp checkpoint 1,191.869 ms, 55.09 s total | |
[ 2023-10-07 23:05:00 ] Completed replacing temp checkpoint with checkpoint 59.433 ms, 55.15 s total | |
[ 2023-10-07 23:05:00 ] Completed Epoch: 13 batch 316: moving batch data to device 22.522 ms, 55.17 s total | |
[ 2023-10-07 23:05:01 ] Completed Epoch: 13 batch 316: forward pass 332.471 ms, 55.51 s total | |
[ 2023-10-07 23:05:01 ] Completed Epoch: 13 batch 316: backward pass 75.105 ms, 55.58 s total | |
[ 2023-10-07 23:05:02 ] Completed Epoch: 13 batch 316: computing loss 1,329.994 ms, 56.91 s total | |
EPOCH: [13], BATCH: [316/889], loss: 0.401, loss_box_reg: 0.121, loss_classifier: 0.097, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 316 | |
[ 2023-10-07 23:05:03 ] Completed saving temp checkpoint 1,077.108 ms, 57.99 s total | |
[ 2023-10-07 23:05:03 ] Completed replacing temp checkpoint with checkpoint 50.702 ms, 58.04 s total | |
[ 2023-10-07 23:05:03 ] Completed Epoch: 13 batch 317: moving batch data to device 20.158 ms, 58.06 s total | |
[ 2023-10-07 23:05:03 ] Completed Epoch: 13 batch 317: forward pass 315.937 ms, 58.37 s total | |
[ 2023-10-07 23:05:04 ] Completed Epoch: 13 batch 317: backward pass 35.615 ms, 58.41 s total | |
[ 2023-10-07 23:05:05 ] Completed Epoch: 13 batch 317: computing loss 1,420.966 ms, 59.83 s total | |
EPOCH: [13], BATCH: [317/889], loss: 0.382, loss_box_reg: 0.109, loss_classifier: 0.098, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 317 | |
[ 2023-10-07 23:05:06 ] Completed saving temp checkpoint 1,197.669 ms, 61.03 s total | |
[ 2023-10-07 23:05:06 ] Completed replacing temp checkpoint with checkpoint 77.506 ms, 61.11 s total | |
[ 2023-10-07 23:05:06 ] Completed Epoch: 13 batch 318: moving batch data to device 21.772 ms, 61.13 s total | |
[ 2023-10-07 23:05:07 ] Completed Epoch: 13 batch 318: forward pass 310.432 ms, 61.44 s total | |
[ 2023-10-07 23:05:07 ] Completed Epoch: 13 batch 318: backward pass 72.765 ms, 61.51 s total | |
[ 2023-10-07 23:05:08 ] Completed Epoch: 13 batch 318: computing loss 1,486.382 ms, 63.00 s total | |
EPOCH: [13], BATCH: [318/889], loss: 0.375, loss_box_reg: 0.108, loss_classifier: 0.093, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 318 | |
[ 2023-10-07 23:05:10 ] Completed saving temp checkpoint 1,562.765 ms, 64.56 s total | |
[ 2023-10-07 23:05:10 ] Completed replacing temp checkpoint with checkpoint 41.529 ms, 64.60 s total | |
[ 2023-10-07 23:05:10 ] Completed Epoch: 13 batch 319: moving batch data to device 7.443 ms, 64.61 s total | |
[ 2023-10-07 23:05:10 ] Completed Epoch: 13 batch 319: forward pass 431.437 ms, 65.04 s total | |
[ 2023-10-07 23:05:10 ] Completed Epoch: 13 batch 319: backward pass 97.587 ms, 65.14 s total | |
[ 2023-10-07 23:05:11 ] Completed Epoch: 13 batch 319: computing loss 1,235.868 ms, 66.37 s total | |
EPOCH: [13], BATCH: [319/889], loss: 0.341, loss_box_reg: 0.104, loss_classifier: 0.084, loss_mask: 0.125, loss_objectness: 0.012, loss_rpn_box_reg: 0.018 | |
Saving checkpoint at epoch 13 train batch 319 | |
[ 2023-10-07 23:05:13 ] Completed saving temp checkpoint 1,263.550 ms, 67.64 s total | |
[ 2023-10-07 23:05:13 ] Completed replacing temp checkpoint with checkpoint 66.271 ms, 67.70 s total | |
[ 2023-10-07 23:05:13 ] Completed Epoch: 13 batch 320: moving batch data to device 22.925 ms, 67.73 s total | |
[ 2023-10-07 23:05:13 ] Completed Epoch: 13 batch 320: forward pass 312.307 ms, 68.04 s total | |
[ 2023-10-07 23:05:13 ] Completed Epoch: 13 batch 320: backward pass 52.473 ms, 68.09 s total | |
[ 2023-10-07 23:05:15 ] Completed Epoch: 13 batch 320: computing loss 1,307.788 ms, 69.40 s total | |
EPOCH: [13], BATCH: [320/889], loss: 0.378, loss_box_reg: 0.109, loss_classifier: 0.094, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 320 | |
[ 2023-10-07 23:05:16 ] Completed saving temp checkpoint 1,235.148 ms, 70.63 s total | |
[ 2023-10-07 23:05:16 ] Completed replacing temp checkpoint with checkpoint 81.179 ms, 70.72 s total | |
[ 2023-10-07 23:05:16 ] Completed Epoch: 13 batch 321: moving batch data to device 21.750 ms, 70.74 s total | |
[ 2023-10-07 23:05:16 ] Completed Epoch: 13 batch 321: forward pass 322.865 ms, 71.06 s total | |
[ 2023-10-07 23:05:16 ] Completed Epoch: 13 batch 321: backward pass 68.438 ms, 71.13 s total | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 23:18:15 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 23:18:15 ] Completed importing Timer 0.022 ms, 0.00 s total | |
[ 2023-10-07 23:18:16 ] Completed importing everything else 641.526 ms, 0.64 s total | |
[ 2023-10-07 23:18:16 ] Completed defined other functions 0.025 ms, 0.64 s total | |
| distributed init (rank 0): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 1): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 23:18:19 ] Completed main preliminaries 2,914.096 ms, 3.56 s total | |
loading annotations into memory... | |
Done (t=12.76s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.30s) | |
creating index... | |
index created! | |
[ 2023-10-07 23:18:34 ] Completed loading data 14,805.068 ms, 18.36 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 23:18:34 ] Completed creating data samplers 110.107 ms, 18.47 s total | |
[ 2023-10-07 23:18:34 ] Completed creating data loaders 0.219 ms, 18.47 s total | |
[ 2023-10-07 23:18:35 ] Completed creating model and .to(device) 1,675.373 ms, 20.15 s total | |
[ 2023-10-07 23:18:36 ] Completed preparing model for distributed training 564.833 ms, 20.71 s total | |
[ 2023-10-07 23:18:36 ] Completed optimizer and scaler 0.585 ms, 20.71 s total | |
[ 2023-10-07 23:18:36 ] Completed learning rate schedulers 0.132 ms, 20.71 s total | |
[ 2023-10-07 23:18:37 ] Completed init coco evaluator 1,013.679 ms, 21.73 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 23:18:38 ] Completed retrieving checkpoint 824.760 ms, 22.55 s total | |
EPOCH :: 13 | |
[ 2023-10-07 23:18:38 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 23:18:38 ] Completed training preliminaries 0.870 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 321 | |
[ 2023-10-07 23:18:38 ] Completed Epoch: 13 batch 321: moving batch data to device 278.177 ms, 0.28 s total | |
[ 2023-10-07 23:18:44 ] Completed Epoch: 13 batch 321: forward pass 5,785.119 ms, 6.06 s total | |
[ 2023-10-07 23:18:44 ] Completed Epoch: 13 batch 321: backward pass 404.636 ms, 6.47 s total | |
[ 2023-10-07 23:18:45 ] Completed Epoch: 13 batch 321: computing loss 740.755 ms, 7.21 s total | |
EPOCH: [13], BATCH: [321/889], loss: 0.409, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 321 | |
[ 2023-10-07 23:18:46 ] Completed saving temp checkpoint 1,503.950 ms, 8.71 s total | |
[ 2023-10-07 23:18:47 ] Completed replacing temp checkpoint with checkpoint 157.773 ms, 8.87 s total | |
[ 2023-10-07 23:18:47 ] Completed Epoch: 13 batch 322: moving batch data to device 19.801 ms, 8.89 s total | |
[ 2023-10-07 23:18:47 ] Completed Epoch: 13 batch 322: forward pass 319.160 ms, 9.21 s total | |
[ 2023-10-07 23:18:47 ] Completed Epoch: 13 batch 322: backward pass 403.427 ms, 9.61 s total | |
[ 2023-10-07 23:18:48 ] Completed Epoch: 13 batch 322: computing loss 1,027.970 ms, 10.64 s total | |
EPOCH: [13], BATCH: [322/889], loss: 0.375, loss_box_reg: 0.109, loss_classifier: 0.089, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 322 | |
[ 2023-10-07 23:18:50 ] Completed saving temp checkpoint 1,332.080 ms, 11.97 s total | |
[ 2023-10-07 23:18:50 ] Completed replacing temp checkpoint with checkpoint 75.118 ms, 12.05 s total | |
[ 2023-10-07 23:18:50 ] Completed Epoch: 13 batch 323: moving batch data to device 20.907 ms, 12.07 s total | |
[ 2023-10-07 23:18:50 ] Completed Epoch: 13 batch 323: forward pass 316.478 ms, 12.39 s total | |
[ 2023-10-07 23:18:50 ] Completed Epoch: 13 batch 323: backward pass 68.078 ms, 12.45 s total | |
[ 2023-10-07 23:18:52 ] Completed Epoch: 13 batch 323: computing loss 1,531.039 ms, 13.99 s total | |
EPOCH: [13], BATCH: [323/889], loss: 0.413, loss_box_reg: 0.118, loss_classifier: 0.111, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 323 | |
[ 2023-10-07 23:18:53 ] Completed saving temp checkpoint 1,413.841 ms, 15.40 s total | |
[ 2023-10-07 23:18:53 ] Completed replacing temp checkpoint with checkpoint 88.066 ms, 15.49 s total | |
[ 2023-10-07 23:18:53 ] Completed Epoch: 13 batch 324: moving batch data to device 29.615 ms, 15.52 s total | |
[ 2023-10-07 23:18:54 ] Completed Epoch: 13 batch 324: forward pass 317.897 ms, 15.83 s total | |
[ 2023-10-07 23:18:54 ] Completed Epoch: 13 batch 324: backward pass 396.467 ms, 16.23 s total | |
[ 2023-10-07 23:18:55 ] Completed Epoch: 13 batch 324: computing loss 1,177.481 ms, 17.41 s total | |
EPOCH: [13], BATCH: [324/889], loss: 0.387, loss_box_reg: 0.113, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 324 | |
[ 2023-10-07 23:18:56 ] Completed saving temp checkpoint 1,129.828 ms, 18.54 s total | |
[ 2023-10-07 23:18:56 ] Completed replacing temp checkpoint with checkpoint 57.139 ms, 18.60 s total | |
[ 2023-10-07 23:18:56 ] Completed Epoch: 13 batch 325: moving batch data to device 21.841 ms, 18.62 s total | |
[ 2023-10-07 23:18:57 ] Completed Epoch: 13 batch 325: forward pass 326.708 ms, 18.94 s total | |
[ 2023-10-07 23:18:57 ] Completed Epoch: 13 batch 325: backward pass 91.485 ms, 19.04 s total | |
[ 2023-10-07 23:18:58 ] Completed Epoch: 13 batch 325: computing loss 1,578.181 ms, 20.61 s total | |
EPOCH: [13], BATCH: [325/889], loss: 0.382, loss_box_reg: 0.116, loss_classifier: 0.095, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 325 | |
[ 2023-10-07 23:19:00 ] Completed saving temp checkpoint 1,204.481 ms, 21.82 s total | |
[ 2023-10-07 23:19:00 ] Completed replacing temp checkpoint with checkpoint 50.852 ms, 21.87 s total | |
[ 2023-10-07 23:19:00 ] Completed Epoch: 13 batch 326: moving batch data to device 20.596 ms, 21.89 s total | |
[ 2023-10-07 23:19:00 ] Completed Epoch: 13 batch 326: forward pass 334.277 ms, 22.22 s total | |
[ 2023-10-07 23:19:00 ] Completed Epoch: 13 batch 326: backward pass 71.318 ms, 22.30 s total | |
[ 2023-10-07 23:19:01 ] Completed Epoch: 13 batch 326: computing loss 1,330.727 ms, 23.63 s total | |
EPOCH: [13], BATCH: [326/889], loss: 0.365, loss_box_reg: 0.109, loss_classifier: 0.091, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 326 | |
[ 2023-10-07 23:19:03 ] Completed saving temp checkpoint 1,310.637 ms, 24.94 s total | |
[ 2023-10-07 23:19:03 ] Completed replacing temp checkpoint with checkpoint 76.460 ms, 25.01 s total | |
[ 2023-10-07 23:19:03 ] Completed Epoch: 13 batch 327: moving batch data to device 19.581 ms, 25.03 s total | |
[ 2023-10-07 23:19:03 ] Completed Epoch: 13 batch 327: forward pass 322.169 ms, 25.35 s total | |
[ 2023-10-07 23:19:03 ] Completed Epoch: 13 batch 327: backward pass 69.885 ms, 25.42 s total | |
[ 2023-10-07 23:19:05 ] Completed Epoch: 13 batch 327: computing loss 1,410.949 ms, 26.84 s total | |
EPOCH: [13], BATCH: [327/889], loss: 0.338, loss_box_reg: 0.105, loss_classifier: 0.083, loss_mask: 0.125, loss_objectness: 0.012, loss_rpn_box_reg: 0.014 | |
Saving checkpoint at epoch 13 train batch 327 | |
[ 2023-10-07 23:19:06 ] Completed saving temp checkpoint 1,484.391 ms, 28.32 s total | |
[ 2023-10-07 23:19:06 ] Completed replacing temp checkpoint with checkpoint 41.104 ms, 28.36 s total | |
[ 2023-10-07 23:19:06 ] Completed Epoch: 13 batch 328: moving batch data to device 5.811 ms, 28.37 s total | |
[ 2023-10-07 23:19:07 ] Completed Epoch: 13 batch 328: forward pass 448.668 ms, 28.82 s total | |
[ 2023-10-07 23:19:07 ] Completed Epoch: 13 batch 328: backward pass 84.196 ms, 28.90 s total | |
[ 2023-10-07 23:19:07 ] Completed Epoch: 13 batch 328: computing loss 869.091 ms, 29.77 s total | |
EPOCH: [13], BATCH: [328/889], loss: 0.379, loss_box_reg: 0.120, loss_classifier: 0.095, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 328 | |
[ 2023-10-07 23:19:09 ] Completed saving temp checkpoint 1,176.187 ms, 30.95 s total | |
[ 2023-10-07 23:19:09 ] Completed replacing temp checkpoint with checkpoint 72.518 ms, 31.02 s total | |
[ 2023-10-07 23:19:09 ] Completed Epoch: 13 batch 329: moving batch data to device 20.823 ms, 31.04 s total | |
[ 2023-10-07 23:19:09 ] Completed Epoch: 13 batch 329: forward pass 256.583 ms, 31.30 s total | |
[ 2023-10-07 23:19:09 ] Completed Epoch: 13 batch 329: backward pass 44.435 ms, 31.34 s total | |
[ 2023-10-07 23:19:09 ] Completed Epoch: 13 batch 329: computing loss 288.484 ms, 31.63 s total | |
EPOCH: [13], BATCH: [329/889], loss: 0.370, loss_box_reg: 0.110, loss_classifier: 0.099, loss_mask: 0.122, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 329 | |
[ 2023-10-07 23:19:12 ] Completed saving temp checkpoint 2,651.412 ms, 34.28 s total | |
[ 2023-10-07 23:19:12 ] Completed replacing temp checkpoint with checkpoint 52.028 ms, 34.33 s total | |
[ 2023-10-07 23:19:12 ] Completed Epoch: 13 batch 330: moving batch data to device 5.928 ms, 34.34 s total | |
[ 2023-10-07 23:19:12 ] Completed Epoch: 13 batch 330: forward pass 434.143 ms, 34.77 s total | |
[ 2023-10-07 23:19:13 ] Completed Epoch: 13 batch 330: backward pass 48.449 ms, 34.82 s total | |
[ 2023-10-07 23:19:13 ] Completed Epoch: 13 batch 330: computing loss 421.426 ms, 35.24 s total | |
EPOCH: [13], BATCH: [330/889], loss: 0.384, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 330 | |
[ 2023-10-07 23:19:14 ] Completed saving temp checkpoint 1,347.189 ms, 36.59 s total | |
[ 2023-10-07 23:19:15 ] Completed replacing temp checkpoint with checkpoint 476.493 ms, 37.07 s total | |
[ 2023-10-07 23:19:15 ] Completed Epoch: 13 batch 331: moving batch data to device 7.056 ms, 37.07 s total | |
[ 2023-10-07 23:19:15 ] Completed Epoch: 13 batch 331: forward pass 182.579 ms, 37.25 s total | |
[ 2023-10-07 23:19:15 ] Completed Epoch: 13 batch 331: backward pass 85.177 ms, 37.34 s total | |
[ 2023-10-07 23:19:15 ] Completed Epoch: 13 batch 331: computing loss 404.887 ms, 37.74 s total | |
EPOCH: [13], BATCH: [331/889], loss: 0.408, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 331 | |
[ 2023-10-07 23:19:17 ] Completed saving temp checkpoint 1,932.763 ms, 39.68 s total | |
[ 2023-10-07 23:19:18 ] Completed replacing temp checkpoint with checkpoint 610.041 ms, 40.29 s total | |
[ 2023-10-07 23:19:18 ] Completed Epoch: 13 batch 332: moving batch data to device 4.958 ms, 40.29 s total | |
[ 2023-10-07 23:19:18 ] Completed Epoch: 13 batch 332: forward pass 163.774 ms, 40.46 s total | |
[ 2023-10-07 23:19:18 ] Completed Epoch: 13 batch 332: backward pass 37.137 ms, 40.49 s total | |
[ 2023-10-07 23:19:19 ] Completed Epoch: 13 batch 332: computing loss 495.396 ms, 40.99 s total | |
EPOCH: [13], BATCH: [332/889], loss: 0.387, loss_box_reg: 0.115, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 332 | |
[ 2023-10-07 23:19:21 ] Completed saving temp checkpoint 2,056.961 ms, 43.05 s total | |
[ 2023-10-07 23:19:21 ] Completed replacing temp checkpoint with checkpoint 61.231 ms, 43.11 s total | |
[ 2023-10-07 23:19:21 ] Completed Epoch: 13 batch 333: moving batch data to device 8.198 ms, 43.12 s total | |
[ 2023-10-07 23:19:21 ] Completed Epoch: 13 batch 333: forward pass 164.047 ms, 43.28 s total | |
[ 2023-10-07 23:19:21 ] Completed Epoch: 13 batch 333: backward pass 153.606 ms, 43.43 s total | |
[ 2023-10-07 23:19:22 ] Completed Epoch: 13 batch 333: computing loss 440.974 ms, 43.87 s total | |
EPOCH: [13], BATCH: [333/889], loss: 0.413, loss_box_reg: 0.127, loss_classifier: 0.104, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 333 | |
[ 2023-10-07 23:19:24 ] Completed saving temp checkpoint 2,351.858 ms, 46.23 s total | |
[ 2023-10-07 23:19:24 ] Completed replacing temp checkpoint with checkpoint 89.276 ms, 46.32 s total | |
[ 2023-10-07 23:19:24 ] Completed Epoch: 13 batch 334: moving batch data to device 7.956 ms, 46.32 s total | |
[ 2023-10-07 23:19:24 ] Completed Epoch: 13 batch 334: forward pass 175.482 ms, 46.50 s total | |
[ 2023-10-07 23:19:24 ] Completed Epoch: 13 batch 334: backward pass 79.918 ms, 46.58 s total | |
[ 2023-10-07 23:19:25 ] Completed Epoch: 13 batch 334: computing loss 357.521 ms, 46.94 s total | |
EPOCH: [13], BATCH: [334/889], loss: 0.407, loss_box_reg: 0.127, loss_classifier: 0.104, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 334 | |
[ 2023-10-07 23:19:27 ] Completed saving temp checkpoint 1,894.754 ms, 48.83 s total | |
[ 2023-10-07 23:19:28 ] Completed replacing temp checkpoint with checkpoint 982.677 ms, 49.81 s total | |
[ 2023-10-07 23:19:28 ] Completed Epoch: 13 batch 335: moving batch data to device 5.759 ms, 49.82 s total | |
[ 2023-10-07 23:19:28 ] Completed Epoch: 13 batch 335: forward pass 159.817 ms, 49.98 s total | |
[ 2023-10-07 23:19:28 ] Completed Epoch: 13 batch 335: backward pass 61.291 ms, 50.04 s total | |
[ 2023-10-07 23:19:28 ] Completed Epoch: 13 batch 335: computing loss 349.832 ms, 50.39 s total | |
EPOCH: [13], BATCH: [335/889], loss: 0.350, loss_box_reg: 0.104, loss_classifier: 0.087, loss_mask: 0.123, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 335 | |
[ 2023-10-07 23:19:30 ] Completed saving temp checkpoint 1,816.666 ms, 52.21 s total | |
[ 2023-10-07 23:19:30 ] Completed replacing temp checkpoint with checkpoint 55.035 ms, 52.26 s total | |
[ 2023-10-07 23:19:30 ] Completed Epoch: 13 batch 336: moving batch data to device 8.858 ms, 52.27 s total | |
[ 2023-10-07 23:19:30 ] Completed Epoch: 13 batch 336: forward pass 157.371 ms, 52.43 s total | |
[ 2023-10-07 23:19:30 ] Completed Epoch: 13 batch 336: backward pass 154.859 ms, 52.58 s total | |
[ 2023-10-07 23:19:31 ] Completed Epoch: 13 batch 336: computing loss 331.870 ms, 52.91 s total | |
EPOCH: [13], BATCH: [336/889], loss: 0.358, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 336 | |
[ 2023-10-07 23:19:32 ] Completed saving temp checkpoint 1,762.175 ms, 54.68 s total | |
[ 2023-10-07 23:19:33 ] Completed replacing temp checkpoint with checkpoint 620.930 ms, 55.30 s total | |
[ 2023-10-07 23:19:33 ] Completed Epoch: 13 batch 337: moving batch data to device 6.934 ms, 55.30 s total | |
[ 2023-10-07 23:19:33 ] Completed Epoch: 13 batch 337: forward pass 165.216 ms, 55.47 s total | |
[ 2023-10-07 23:19:33 ] Completed Epoch: 13 batch 337: backward pass 73.175 ms, 55.54 s total | |
[ 2023-10-07 23:19:34 ] Completed Epoch: 13 batch 337: computing loss 373.905 ms, 55.92 s total | |
EPOCH: [13], BATCH: [337/889], loss: 0.360, loss_box_reg: 0.106, loss_classifier: 0.087, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 337 | |
[ 2023-10-07 23:19:35 ] Completed saving temp checkpoint 1,486.968 ms, 57.40 s total | |
[ 2023-10-07 23:19:35 ] Completed replacing temp checkpoint with checkpoint 96.844 ms, 57.50 s total | |
[ 2023-10-07 23:19:35 ] Completed Epoch: 13 batch 338: moving batch data to device 5.998 ms, 57.51 s total | |
[ 2023-10-07 23:19:35 ] Completed Epoch: 13 batch 338: forward pass 164.582 ms, 57.67 s total | |
[ 2023-10-07 23:19:35 ] Completed Epoch: 13 batch 338: backward pass 64.191 ms, 57.74 s total | |
[ 2023-10-07 23:19:36 ] Completed Epoch: 13 batch 338: computing loss 375.411 ms, 58.11 s total | |
EPOCH: [13], BATCH: [338/889], loss: 0.396, loss_box_reg: 0.118, loss_classifier: 0.103, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 338 | |
[ 2023-10-07 23:19:38 ] Completed saving temp checkpoint 1,887.530 ms, 60.00 s total | |
[ 2023-10-07 23:19:38 ] Completed replacing temp checkpoint with checkpoint 92.703 ms, 60.09 s total | |
[ 2023-10-07 23:19:38 ] Completed Epoch: 13 batch 339: moving batch data to device 5.161 ms, 60.10 s total | |
[ 2023-10-07 23:19:38 ] Completed Epoch: 13 batch 339: forward pass 163.000 ms, 60.26 s total | |
[ 2023-10-07 23:19:38 ] Completed Epoch: 13 batch 339: backward pass 64.061 ms, 60.32 s total | |
[ 2023-10-07 23:19:39 ] Completed Epoch: 13 batch 339: computing loss 479.046 ms, 60.80 s total | |
EPOCH: [13], BATCH: [339/889], loss: 0.341, loss_box_reg: 0.100, loss_classifier: 0.085, loss_mask: 0.122, loss_objectness: 0.012, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 339 | |
[ 2023-10-07 23:19:41 ] Completed saving temp checkpoint 2,113.673 ms, 62.92 s total | |
[ 2023-10-07 23:19:42 ] Completed replacing temp checkpoint with checkpoint 920.496 ms, 63.84 s total | |
[ 2023-10-07 23:19:42 ] Completed Epoch: 13 batch 340: moving batch data to device 8.945 ms, 63.85 s total | |
[ 2023-10-07 23:19:42 ] Completed Epoch: 13 batch 340: forward pass 156.035 ms, 64.00 s total | |
[ 2023-10-07 23:19:42 ] Completed Epoch: 13 batch 340: backward pass 83.486 ms, 64.09 s total | |
[ 2023-10-07 23:19:42 ] Completed Epoch: 13 batch 340: computing loss 330.806 ms, 64.42 s total | |
EPOCH: [13], BATCH: [340/889], loss: 0.382, loss_box_reg: 0.114, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 340 | |
[ 2023-10-07 23:19:45 ] Completed saving temp checkpoint 2,637.513 ms, 67.05 s total | |
[ 2023-10-07 23:19:45 ] Completed replacing temp checkpoint with checkpoint 66.049 ms, 67.12 s total | |
[ 2023-10-07 23:19:45 ] Completed Epoch: 13 batch 341: moving batch data to device 8.325 ms, 67.13 s total | |
[ 2023-10-07 23:19:45 ] Completed Epoch: 13 batch 341: forward pass 166.794 ms, 67.29 s total | |
[ 2023-10-07 23:19:45 ] Completed Epoch: 13 batch 341: backward pass 76.817 ms, 67.37 s total | |
[ 2023-10-07 23:19:46 ] Completed Epoch: 13 batch 341: computing loss 511.830 ms, 67.88 s total | |
EPOCH: [13], BATCH: [341/889], loss: 0.402, loss_box_reg: 0.123, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 341 | |
[ 2023-10-07 23:19:47 ] Completed saving temp checkpoint 1,557.509 ms, 69.44 s total | |
[ 2023-10-07 23:19:48 ] Completed replacing temp checkpoint with checkpoint 491.105 ms, 69.93 s total | |
[ 2023-10-07 23:19:48 ] Completed Epoch: 13 batch 342: moving batch data to device 8.116 ms, 69.94 s total | |
[ 2023-10-07 23:19:48 ] Completed Epoch: 13 batch 342: forward pass 156.847 ms, 70.10 s total | |
[ 2023-10-07 23:19:48 ] Completed Epoch: 13 batch 342: backward pass 74.633 ms, 70.17 s total | |
[ 2023-10-07 23:19:48 ] Completed Epoch: 13 batch 342: computing loss 495.586 ms, 70.67 s total | |
EPOCH: [13], BATCH: [342/889], loss: 0.380, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 342 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 23:32:47 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 23:32:47 ] Completed importing Timer 0.031 ms, 0.00 s total | |
[ 2023-10-07 23:32:48 ] Completed importing everything else 494.857 ms, 0.49 s total | |
[ 2023-10-07 23:32:48 ] Completed defined other functions 0.023 ms, 0.49 s total | |
| distributed init (rank 0): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 5): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 23:32:51 ] Completed main preliminaries 2,950.913 ms, 3.45 s total | |
loading annotations into memory... | |
Done (t=12.48s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-07 23:33:05 ] Completed loading data 14,440.212 ms, 17.89 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 23:33:05 ] Completed creating data samplers 98.215 ms, 17.98 s total | |
[ 2023-10-07 23:33:05 ] Completed creating data loaders 0.206 ms, 17.98 s total | |
[ 2023-10-07 23:33:06 ] Completed creating model and .to(device) 734.790 ms, 18.72 s total | |
[ 2023-10-07 23:33:08 ] Completed preparing model for distributed training 1,503.595 ms, 20.22 s total | |
[ 2023-10-07 23:33:08 ] Completed optimizer and scaler 0.568 ms, 20.22 s total | |
[ 2023-10-07 23:33:08 ] Completed learning rate schedulers 0.249 ms, 20.22 s total | |
[ 2023-10-07 23:33:09 ] Completed init coco evaluator 976.995 ms, 21.20 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 23:33:10 ] Completed retrieving checkpoint 915.699 ms, 22.12 s total | |
EPOCH :: 13 | |
[ 2023-10-07 23:33:10 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 23:33:10 ] Completed training preliminaries 0.894 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 342 | |
[ 2023-10-07 23:33:10 ] Completed Epoch: 13 batch 342: moving batch data to device 302.752 ms, 0.30 s total | |
[ 2023-10-07 23:33:15 ] Completed Epoch: 13 batch 342: forward pass 5,517.893 ms, 5.82 s total | |
[ 2023-10-07 23:33:16 ] Completed Epoch: 13 batch 342: backward pass 262.761 ms, 6.08 s total | |
[ 2023-10-07 23:33:17 ] Completed Epoch: 13 batch 342: computing loss 991.985 ms, 7.08 s total | |
EPOCH: [13], BATCH: [342/889], loss: 0.383, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 342 | |
[ 2023-10-07 23:33:18 ] Completed saving temp checkpoint 1,198.801 ms, 8.28 s total | |
[ 2023-10-07 23:33:18 ] Completed replacing temp checkpoint with checkpoint 177.211 ms, 8.45 s total | |
[ 2023-10-07 23:33:18 ] Completed Epoch: 13 batch 343: moving batch data to device 20.730 ms, 8.47 s total | |
[ 2023-10-07 23:33:18 ] Completed Epoch: 13 batch 343: forward pass 319.295 ms, 8.79 s total | |
[ 2023-10-07 23:33:19 ] Completed Epoch: 13 batch 343: backward pass 414.537 ms, 9.21 s total | |
[ 2023-10-07 23:33:20 ] Completed Epoch: 13 batch 343: computing loss 943.688 ms, 10.15 s total | |
EPOCH: [13], BATCH: [343/889], loss: 0.361, loss_box_reg: 0.103, loss_classifier: 0.093, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 343 | |
[ 2023-10-07 23:33:21 ] Completed saving temp checkpoint 1,683.340 ms, 11.83 s total | |
[ 2023-10-07 23:33:21 ] Completed replacing temp checkpoint with checkpoint 77.303 ms, 11.91 s total | |
[ 2023-10-07 23:33:21 ] Completed Epoch: 13 batch 344: moving batch data to device 20.894 ms, 11.93 s total | |
[ 2023-10-07 23:33:22 ] Completed Epoch: 13 batch 344: forward pass 312.740 ms, 12.24 s total | |
[ 2023-10-07 23:33:22 ] Completed Epoch: 13 batch 344: backward pass 83.460 ms, 12.33 s total | |
[ 2023-10-07 23:33:23 ] Completed Epoch: 13 batch 344: computing loss 1,361.220 ms, 13.69 s total | |
EPOCH: [13], BATCH: [344/889], loss: 0.365, loss_box_reg: 0.107, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 344 | |
[ 2023-10-07 23:33:25 ] Completed saving temp checkpoint 1,322.759 ms, 15.01 s total | |
[ 2023-10-07 23:33:25 ] Completed replacing temp checkpoint with checkpoint 70.827 ms, 15.08 s total | |
[ 2023-10-07 23:33:25 ] Completed Epoch: 13 batch 345: moving batch data to device 19.250 ms, 15.10 s total | |
[ 2023-10-07 23:33:25 ] Completed Epoch: 13 batch 345: forward pass 321.074 ms, 15.42 s total | |
[ 2023-10-07 23:33:25 ] Completed Epoch: 13 batch 345: backward pass 67.750 ms, 15.49 s total | |
[ 2023-10-07 23:33:27 ] Completed Epoch: 13 batch 345: computing loss 2,071.751 ms, 17.56 s total | |
EPOCH: [13], BATCH: [345/889], loss: 0.367, loss_box_reg: 0.105, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 345 | |
[ 2023-10-07 23:33:28 ] Completed saving temp checkpoint 1,114.762 ms, 18.68 s total | |
[ 2023-10-07 23:33:28 ] Completed replacing temp checkpoint with checkpoint 61.837 ms, 18.74 s total | |
[ 2023-10-07 23:33:28 ] Completed Epoch: 13 batch 346: moving batch data to device 26.316 ms, 18.77 s total | |
[ 2023-10-07 23:33:29 ] Completed Epoch: 13 batch 346: forward pass 321.675 ms, 19.09 s total | |
[ 2023-10-07 23:33:29 ] Completed Epoch: 13 batch 346: backward pass 81.498 ms, 19.17 s total | |
[ 2023-10-07 23:33:31 ] Completed Epoch: 13 batch 346: computing loss 1,852.726 ms, 21.02 s total | |
EPOCH: [13], BATCH: [346/889], loss: 0.413, loss_box_reg: 0.123, loss_classifier: 0.109, loss_mask: 0.133, loss_objectness: 0.019, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 346 | |
[ 2023-10-07 23:33:32 ] Completed saving temp checkpoint 1,634.594 ms, 22.66 s total | |
[ 2023-10-07 23:33:32 ] Completed replacing temp checkpoint with checkpoint 45.039 ms, 22.70 s total | |
[ 2023-10-07 23:33:32 ] Completed Epoch: 13 batch 347: moving batch data to device 18.529 ms, 22.72 s total | |
[ 2023-10-07 23:33:33 ] Completed Epoch: 13 batch 347: forward pass 305.981 ms, 23.03 s total | |
[ 2023-10-07 23:33:33 ] Completed Epoch: 13 batch 347: backward pass 70.307 ms, 23.10 s total | |
[ 2023-10-07 23:33:34 ] Completed Epoch: 13 batch 347: computing loss 1,507.570 ms, 24.60 s total | |
EPOCH: [13], BATCH: [347/889], loss: 0.413, loss_box_reg: 0.129, loss_classifier: 0.109, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 347 | |
[ 2023-10-07 23:33:35 ] Completed saving temp checkpoint 1,226.077 ms, 25.83 s total | |
[ 2023-10-07 23:33:35 ] Completed replacing temp checkpoint with checkpoint 55.635 ms, 25.89 s total | |
[ 2023-10-07 23:33:35 ] Completed Epoch: 13 batch 348: moving batch data to device 18.613 ms, 25.90 s total | |
[ 2023-10-07 23:33:36 ] Completed Epoch: 13 batch 348: forward pass 383.111 ms, 26.29 s total | |
[ 2023-10-07 23:33:36 ] Completed Epoch: 13 batch 348: backward pass 55.111 ms, 26.34 s total | |
[ 2023-10-07 23:33:37 ] Completed Epoch: 13 batch 348: computing loss 1,142.869 ms, 27.49 s total | |
EPOCH: [13], BATCH: [348/889], loss: 0.388, loss_box_reg: 0.117, loss_classifier: 0.101, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 348 | |
[ 2023-10-07 23:33:38 ] Completed saving temp checkpoint 1,174.962 ms, 28.66 s total | |
[ 2023-10-07 23:33:38 ] Completed replacing temp checkpoint with checkpoint 47.639 ms, 28.71 s total | |
[ 2023-10-07 23:33:38 ] Completed Epoch: 13 batch 349: moving batch data to device 20.743 ms, 28.73 s total | |
[ 2023-10-07 23:33:39 ] Completed Epoch: 13 batch 349: forward pass 306.627 ms, 29.04 s total | |
[ 2023-10-07 23:33:39 ] Completed Epoch: 13 batch 349: backward pass 71.743 ms, 29.11 s total | |
[ 2023-10-07 23:33:40 ] Completed Epoch: 13 batch 349: computing loss 1,246.800 ms, 30.35 s total | |
EPOCH: [13], BATCH: [349/889], loss: 0.401, loss_box_reg: 0.125, loss_classifier: 0.102, loss_mask: 0.135, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 349 | |
[ 2023-10-07 23:33:41 ] Completed saving temp checkpoint 959.687 ms, 31.31 s total | |
[ 2023-10-07 23:33:41 ] Completed replacing temp checkpoint with checkpoint 55.856 ms, 31.37 s total | |
[ 2023-10-07 23:33:41 ] Completed Epoch: 13 batch 350: moving batch data to device 20.964 ms, 31.39 s total | |
[ 2023-10-07 23:33:41 ] Completed Epoch: 13 batch 350: forward pass 328.210 ms, 31.72 s total | |
[ 2023-10-07 23:33:41 ] Completed Epoch: 13 batch 350: backward pass 39.729 ms, 31.76 s total | |
[ 2023-10-07 23:33:43 ] Completed Epoch: 13 batch 350: computing loss 1,392.202 ms, 33.15 s total | |
EPOCH: [13], BATCH: [350/889], loss: 0.387, loss_box_reg: 0.121, loss_classifier: 0.095, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 350 | |
[ 2023-10-07 23:33:44 ] Completed saving temp checkpoint 1,023.027 ms, 34.17 s total | |
[ 2023-10-07 23:33:44 ] Completed replacing temp checkpoint with checkpoint 56.690 ms, 34.23 s total | |
[ 2023-10-07 23:33:44 ] Completed Epoch: 13 batch 351: moving batch data to device 21.409 ms, 34.25 s total | |
[ 2023-10-07 23:33:44 ] Completed Epoch: 13 batch 351: forward pass 325.704 ms, 34.58 s total | |
[ 2023-10-07 23:33:44 ] Completed Epoch: 13 batch 351: backward pass 46.213 ms, 34.62 s total | |
[ 2023-10-07 23:33:46 ] Completed Epoch: 13 batch 351: computing loss 1,642.433 ms, 36.27 s total | |
EPOCH: [13], BATCH: [351/889], loss: 0.444, loss_box_reg: 0.135, loss_classifier: 0.116, loss_mask: 0.142, loss_objectness: 0.017, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 351 | |
[ 2023-10-07 23:33:47 ] Completed saving temp checkpoint 1,061.084 ms, 37.33 s total | |
[ 2023-10-07 23:33:47 ] Completed replacing temp checkpoint with checkpoint 72.389 ms, 37.40 s total | |
[ 2023-10-07 23:33:47 ] Completed Epoch: 13 batch 352: moving batch data to device 23.281 ms, 37.42 s total | |
[ 2023-10-07 23:33:47 ] Completed Epoch: 13 batch 352: forward pass 329.441 ms, 37.75 s total | |
[ 2023-10-07 23:33:47 ] Completed Epoch: 13 batch 352: backward pass 113.344 ms, 37.87 s total | |
[ 2023-10-07 23:33:49 ] Completed Epoch: 13 batch 352: computing loss 1,419.980 ms, 39.29 s total | |
EPOCH: [13], BATCH: [352/889], loss: 0.400, loss_box_reg: 0.121, loss_classifier: 0.103, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 352 | |
[ 2023-10-07 23:33:50 ] Completed saving temp checkpoint 1,275.732 ms, 40.56 s total | |
[ 2023-10-07 23:33:50 ] Completed replacing temp checkpoint with checkpoint 47.744 ms, 40.61 s total | |
[ 2023-10-07 23:33:50 ] Completed Epoch: 13 batch 353: moving batch data to device 23.691 ms, 40.63 s total | |
[ 2023-10-07 23:33:50 ] Completed Epoch: 13 batch 353: forward pass 330.090 ms, 40.96 s total | |
[ 2023-10-07 23:33:51 ] Completed Epoch: 13 batch 353: backward pass 75.218 ms, 41.04 s total | |
[ 2023-10-07 23:33:52 ] Completed Epoch: 13 batch 353: computing loss 1,348.106 ms, 42.39 s total | |
EPOCH: [13], BATCH: [353/889], loss: 0.373, loss_box_reg: 0.112, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 353 | |
[ 2023-10-07 23:33:54 ] Completed saving temp checkpoint 1,920.569 ms, 44.31 s total | |
[ 2023-10-07 23:33:54 ] Completed replacing temp checkpoint with checkpoint 79.363 ms, 44.39 s total | |
[ 2023-10-07 23:33:54 ] Completed Epoch: 13 batch 354: moving batch data to device 22.917 ms, 44.41 s total | |
[ 2023-10-07 23:33:54 ] Completed Epoch: 13 batch 354: forward pass 364.436 ms, 44.77 s total | |
[ 2023-10-07 23:33:54 ] Completed Epoch: 13 batch 354: backward pass 89.313 ms, 44.86 s total | |
[ 2023-10-07 23:33:56 ] Completed Epoch: 13 batch 354: computing loss 1,315.139 ms, 46.18 s total | |
EPOCH: [13], BATCH: [354/889], loss: 0.402, loss_box_reg: 0.117, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.020, loss_rpn_box_reg: 0.036 | |
Saving checkpoint at epoch 13 train batch 354 | |
[ 2023-10-07 23:33:57 ] Completed saving temp checkpoint 1,360.318 ms, 47.54 s total | |
[ 2023-10-07 23:33:57 ] Completed replacing temp checkpoint with checkpoint 38.209 ms, 47.58 s total | |
[ 2023-10-07 23:33:57 ] Completed Epoch: 13 batch 355: moving batch data to device 21.975 ms, 47.60 s total | |
[ 2023-10-07 23:33:58 ] Completed Epoch: 13 batch 355: forward pass 384.107 ms, 47.98 s total | |
[ 2023-10-07 23:33:58 ] Completed Epoch: 13 batch 355: backward pass 64.964 ms, 48.05 s total | |
[ 2023-10-07 23:34:00 ] Completed Epoch: 13 batch 355: computing loss 2,054.364 ms, 50.10 s total | |
EPOCH: [13], BATCH: [355/889], loss: 0.385, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 355 | |
[ 2023-10-07 23:34:01 ] Completed saving temp checkpoint 1,339.994 ms, 51.44 s total | |
[ 2023-10-07 23:34:01 ] Completed replacing temp checkpoint with checkpoint 68.263 ms, 51.51 s total | |
[ 2023-10-07 23:34:01 ] Completed Epoch: 13 batch 356: moving batch data to device 24.354 ms, 51.53 s total | |
[ 2023-10-07 23:34:01 ] Completed Epoch: 13 batch 356: forward pass 404.173 ms, 51.94 s total | |
[ 2023-10-07 23:34:02 ] Completed Epoch: 13 batch 356: backward pass 74.507 ms, 52.01 s total | |
[ 2023-10-07 23:34:03 ] Completed Epoch: 13 batch 356: computing loss 1,342.839 ms, 53.36 s total | |
EPOCH: [13], BATCH: [356/889], loss: 0.338, loss_box_reg: 0.101, loss_classifier: 0.086, loss_mask: 0.118, loss_objectness: 0.012, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 356 | |
[ 2023-10-07 23:34:05 ] Completed saving temp checkpoint 1,723.358 ms, 55.08 s total | |
[ 2023-10-07 23:34:05 ] Completed replacing temp checkpoint with checkpoint 74.056 ms, 55.15 s total | |
[ 2023-10-07 23:34:05 ] Completed Epoch: 13 batch 357: moving batch data to device 12.970 ms, 55.17 s total | |
[ 2023-10-07 23:34:05 ] Completed Epoch: 13 batch 357: forward pass 373.815 ms, 55.54 s total | |
[ 2023-10-07 23:34:05 ] Completed Epoch: 13 batch 357: backward pass 71.087 ms, 55.61 s total | |
[ 2023-10-07 23:34:06 ] Completed Epoch: 13 batch 357: computing loss 1,151.223 ms, 56.76 s total | |
EPOCH: [13], BATCH: [357/889], loss: 0.386, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.131, loss_objectness: 0.013, loss_rpn_box_reg: 0.019 | |
Saving checkpoint at epoch 13 train batch 357 | |
[ 2023-10-07 23:34:07 ] Completed saving temp checkpoint 1,146.887 ms, 57.91 s total | |
[ 2023-10-07 23:34:07 ] Completed replacing temp checkpoint with checkpoint 35.700 ms, 57.94 s total | |
[ 2023-10-07 23:34:07 ] Completed Epoch: 13 batch 358: moving batch data to device 22.465 ms, 57.97 s total | |
[ 2023-10-07 23:34:08 ] Completed Epoch: 13 batch 358: forward pass 325.907 ms, 58.29 s total | |
[ 2023-10-07 23:34:08 ] Completed Epoch: 13 batch 358: backward pass 75.351 ms, 58.37 s total | |
[ 2023-10-07 23:34:09 ] Completed Epoch: 13 batch 358: computing loss 1,243.902 ms, 59.61 s total | |
EPOCH: [13], BATCH: [358/889], loss: 0.369, loss_box_reg: 0.111, loss_classifier: 0.089, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 358 | |
[ 2023-10-07 23:34:10 ] Completed saving temp checkpoint 1,272.832 ms, 60.89 s total | |
[ 2023-10-07 23:34:10 ] Completed replacing temp checkpoint with checkpoint 59.639 ms, 60.94 s total | |
[ 2023-10-07 23:34:10 ] Completed Epoch: 13 batch 359: moving batch data to device 21.379 ms, 60.97 s total | |
[ 2023-10-07 23:34:11 ] Completed Epoch: 13 batch 359: forward pass 333.195 ms, 61.30 s total | |
[ 2023-10-07 23:34:11 ] Completed Epoch: 13 batch 359: backward pass 67.879 ms, 61.37 s total | |
[ 2023-10-07 23:34:12 ] Completed Epoch: 13 batch 359: computing loss 1,261.087 ms, 62.63 s total | |
EPOCH: [13], BATCH: [359/889], loss: 0.402, loss_box_reg: 0.121, loss_classifier: 0.105, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 359 | |
[ 2023-10-07 23:34:13 ] Completed saving temp checkpoint 1,088.679 ms, 63.72 s total | |
[ 2023-10-07 23:34:13 ] Completed replacing temp checkpoint with checkpoint 58.503 ms, 63.78 s total | |
[ 2023-10-07 23:34:13 ] Completed Epoch: 13 batch 360: moving batch data to device 22.417 ms, 63.80 s total | |
[ 2023-10-07 23:34:14 ] Completed Epoch: 13 batch 360: forward pass 328.434 ms, 64.13 s total | |
[ 2023-10-07 23:34:14 ] Completed Epoch: 13 batch 360: backward pass 37.743 ms, 64.16 s total | |
[ 2023-10-07 23:34:15 ] Completed Epoch: 13 batch 360: computing loss 1,255.995 ms, 65.42 s total | |
EPOCH: [13], BATCH: [360/889], loss: 0.402, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 360 | |
[ 2023-10-07 23:34:16 ] Completed saving temp checkpoint 956.155 ms, 66.38 s total | |
[ 2023-10-07 23:34:16 ] Completed replacing temp checkpoint with checkpoint 59.376 ms, 66.44 s total | |
[ 2023-10-07 23:34:16 ] Completed Epoch: 13 batch 361: moving batch data to device 21.456 ms, 66.46 s total | |
[ 2023-10-07 23:34:16 ] Completed Epoch: 13 batch 361: forward pass 330.692 ms, 66.79 s total | |
[ 2023-10-07 23:34:16 ] Completed Epoch: 13 batch 361: backward pass 53.448 ms, 66.84 s total | |
[ 2023-10-07 23:34:18 ] Completed Epoch: 13 batch 361: computing loss 1,737.514 ms, 68.58 s total | |
EPOCH: [13], BATCH: [361/889], loss: 0.378, loss_box_reg: 0.108, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 361 | |
[ 2023-10-07 23:34:19 ] Completed saving temp checkpoint 1,093.964 ms, 69.67 s total | |
[ 2023-10-07 23:34:19 ] Completed replacing temp checkpoint with checkpoint 68.408 ms, 69.74 s total | |
[ 2023-10-07 23:34:19 ] Completed Epoch: 13 batch 362: moving batch data to device 23.580 ms, 69.76 s total | |
[ 2023-10-07 23:34:20 ] Completed Epoch: 13 batch 362: forward pass 333.161 ms, 70.10 s total | |
[ 2023-10-07 23:34:20 ] Completed Epoch: 13 batch 362: backward pass 38.347 ms, 70.14 s total | |
[ 2023-10-07 23:34:21 ] Completed Epoch: 13 batch 362: computing loss 1,685.245 ms, 71.82 s total | |
EPOCH: [13], BATCH: [362/889], loss: 0.408, loss_box_reg: 0.123, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 362 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-07 23:47:24 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 23:47:24 ] Completed importing Timer 0.023 ms, 0.00 s total | |
[ 2023-10-07 23:47:24 ] Completed importing everything else 515.048 ms, 0.52 s total | |
[ 2023-10-07 23:47:24 ] Completed defined other functions 0.022 ms, 0.52 s total | |
| distributed init (rank 0): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 2): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-07 23:47:33 ] Completed main preliminaries 8,253.865 ms, 8.77 s total | |
loading annotations into memory... | |
Done (t=10.81s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.27s) | |
creating index... | |
index created! | |
[ 2023-10-07 23:47:45 ] Completed loading data 12,592.059 ms, 21.36 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-07 23:47:45 ] Completed creating data samplers 97.896 ms, 21.46 s total | |
[ 2023-10-07 23:47:45 ] Completed creating data loaders 0.201 ms, 21.46 s total | |
[ 2023-10-07 23:47:46 ] Completed creating model and .to(device) 644.435 ms, 22.10 s total | |
[ 2023-10-07 23:47:48 ] Completed preparing model for distributed training 2,345.585 ms, 24.45 s total | |
[ 2023-10-07 23:47:48 ] Completed optimizer and scaler 0.612 ms, 24.45 s total | |
[ 2023-10-07 23:47:48 ] Completed learning rate schedulers 0.234 ms, 24.45 s total | |
[ 2023-10-07 23:47:49 ] Completed init coco evaluator 970.920 ms, 25.42 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-07 23:47:50 ] Completed retrieving checkpoint 876.522 ms, 26.30 s total | |
EPOCH :: 13 | |
[ 2023-10-07 23:47:50 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-07 23:47:50 ] Completed training preliminaries 0.846 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 362 | |
[ 2023-10-07 23:47:51 ] Completed Epoch: 13 batch 362: moving batch data to device 493.186 ms, 0.49 s total | |
[ 2023-10-07 23:47:52 ] Completed Epoch: 13 batch 362: forward pass 1,225.079 ms, 1.72 s total | |
[ 2023-10-07 23:47:52 ] Completed Epoch: 13 batch 362: backward pass 162.234 ms, 1.88 s total | |
[ 2023-10-07 23:47:53 ] Completed Epoch: 13 batch 362: computing loss 563.206 ms, 2.44 s total | |
EPOCH: [13], BATCH: [362/889], loss: 0.409, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.138, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 362 | |
[ 2023-10-07 23:47:54 ] Completed saving temp checkpoint 1,027.034 ms, 3.47 s total | |
[ 2023-10-07 23:47:54 ] Completed replacing temp checkpoint with checkpoint 148.988 ms, 3.62 s total | |
[ 2023-10-07 23:47:54 ] Completed Epoch: 13 batch 363: moving batch data to device 4.902 ms, 3.63 s total | |
[ 2023-10-07 23:47:54 ] Completed Epoch: 13 batch 363: forward pass 211.946 ms, 3.84 s total | |
[ 2023-10-07 23:47:54 ] Completed Epoch: 13 batch 363: backward pass 251.583 ms, 4.09 s total | |
[ 2023-10-07 23:47:54 ] Completed Epoch: 13 batch 363: computing loss 96.260 ms, 4.19 s total | |
EPOCH: [13], BATCH: [363/889], loss: 0.370, loss_box_reg: 0.107, loss_classifier: 0.094, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 363 | |
[ 2023-10-07 23:47:55 ] Completed saving temp checkpoint 967.078 ms, 5.15 s total | |
[ 2023-10-07 23:47:55 ] Completed replacing temp checkpoint with checkpoint 66.191 ms, 5.22 s total | |
[ 2023-10-07 23:47:55 ] Completed Epoch: 13 batch 364: moving batch data to device 2.361 ms, 5.22 s total | |
[ 2023-10-07 23:47:56 ] Completed Epoch: 13 batch 364: forward pass 110.610 ms, 5.33 s total | |
[ 2023-10-07 23:47:56 ] Completed Epoch: 13 batch 364: backward pass 119.743 ms, 5.45 s total | |
[ 2023-10-07 23:47:56 ] Completed Epoch: 13 batch 364: computing loss 101.803 ms, 5.55 s total | |
EPOCH: [13], BATCH: [364/889], loss: 0.375, loss_box_reg: 0.114, loss_classifier: 0.088, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 364 | |
[ 2023-10-07 23:47:57 ] Completed saving temp checkpoint 1,091.830 ms, 6.64 s total | |
[ 2023-10-07 23:47:57 ] Completed replacing temp checkpoint with checkpoint 58.145 ms, 6.70 s total | |
[ 2023-10-07 23:47:57 ] Completed Epoch: 13 batch 365: moving batch data to device 62.499 ms, 6.77 s total | |
[ 2023-10-07 23:47:57 ] Completed Epoch: 13 batch 365: forward pass 106.019 ms, 6.87 s total | |
[ 2023-10-07 23:47:57 ] Completed Epoch: 13 batch 365: backward pass 82.787 ms, 6.95 s total | |
[ 2023-10-07 23:47:57 ] Completed Epoch: 13 batch 365: computing loss 122.762 ms, 7.08 s total | |
EPOCH: [13], BATCH: [365/889], loss: 0.373, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 365 | |
[ 2023-10-07 23:47:59 ] Completed saving temp checkpoint 1,534.573 ms, 8.61 s total | |
[ 2023-10-07 23:47:59 ] Completed replacing temp checkpoint with checkpoint 27.969 ms, 8.64 s total | |
[ 2023-10-07 23:47:59 ] Completed Epoch: 13 batch 366: moving batch data to device 2.295 ms, 8.64 s total | |
[ 2023-10-07 23:47:59 ] Completed Epoch: 13 batch 366: forward pass 106.308 ms, 8.75 s total | |
[ 2023-10-07 23:47:59 ] Completed Epoch: 13 batch 366: backward pass 33.797 ms, 8.78 s total | |
[ 2023-10-07 23:47:59 ] Completed Epoch: 13 batch 366: computing loss 179.739 ms, 8.96 s total | |
EPOCH: [13], BATCH: [366/889], loss: 0.382, loss_box_reg: 0.113, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 366 | |
[ 2023-10-07 23:48:01 ] Completed saving temp checkpoint 1,949.756 ms, 10.91 s total | |
[ 2023-10-07 23:48:01 ] Completed replacing temp checkpoint with checkpoint 66.658 ms, 10.98 s total | |
[ 2023-10-07 23:48:01 ] Completed Epoch: 13 batch 367: moving batch data to device 4.493 ms, 10.98 s total | |
[ 2023-10-07 23:48:01 ] Completed Epoch: 13 batch 367: forward pass 104.361 ms, 11.09 s total | |
[ 2023-10-07 23:48:01 ] Completed Epoch: 13 batch 367: backward pass 79.752 ms, 11.17 s total | |
[ 2023-10-07 23:48:02 ] Completed Epoch: 13 batch 367: computing loss 124.332 ms, 11.29 s total | |
EPOCH: [13], BATCH: [367/889], loss: 0.382, loss_box_reg: 0.118, loss_classifier: 0.097, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 367 | |
[ 2023-10-07 23:48:03 ] Completed saving temp checkpoint 1,309.782 ms, 12.60 s total | |
[ 2023-10-07 23:48:03 ] Completed replacing temp checkpoint with checkpoint 48.290 ms, 12.65 s total | |
[ 2023-10-07 23:48:03 ] Completed Epoch: 13 batch 368: moving batch data to device 3.990 ms, 12.65 s total | |
[ 2023-10-07 23:48:03 ] Completed Epoch: 13 batch 368: forward pass 111.543 ms, 12.76 s total | |
[ 2023-10-07 23:48:03 ] Completed Epoch: 13 batch 368: backward pass 81.619 ms, 12.85 s total | |
[ 2023-10-07 23:48:03 ] Completed Epoch: 13 batch 368: computing loss 124.674 ms, 12.97 s total | |
EPOCH: [13], BATCH: [368/889], loss: 0.388, loss_box_reg: 0.115, loss_classifier: 0.098, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 368 | |
[ 2023-10-07 23:48:05 ] Completed saving temp checkpoint 1,522.662 ms, 14.49 s total | |
[ 2023-10-07 23:48:05 ] Completed replacing temp checkpoint with checkpoint 100.141 ms, 14.59 s total | |
[ 2023-10-07 23:48:05 ] Completed Epoch: 13 batch 369: moving batch data to device 7.041 ms, 14.60 s total | |
[ 2023-10-07 23:48:05 ] Completed Epoch: 13 batch 369: forward pass 108.958 ms, 14.71 s total | |
[ 2023-10-07 23:48:05 ] Completed Epoch: 13 batch 369: backward pass 69.310 ms, 14.78 s total | |
[ 2023-10-07 23:48:05 ] Completed Epoch: 13 batch 369: computing loss 124.962 ms, 14.90 s total | |
EPOCH: [13], BATCH: [369/889], loss: 0.375, loss_box_reg: 0.112, loss_classifier: 0.093, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 369 | |
[ 2023-10-07 23:48:07 ] Completed saving temp checkpoint 1,372.162 ms, 16.28 s total | |
[ 2023-10-07 23:48:07 ] Completed replacing temp checkpoint with checkpoint 92.533 ms, 16.37 s total | |
[ 2023-10-07 23:48:07 ] Completed Epoch: 13 batch 370: moving batch data to device 3.018 ms, 16.37 s total | |
[ 2023-10-07 23:48:07 ] Completed Epoch: 13 batch 370: forward pass 169.624 ms, 16.54 s total | |
[ 2023-10-07 23:48:07 ] Completed Epoch: 13 batch 370: backward pass 67.342 ms, 16.61 s total | |
[ 2023-10-07 23:48:07 ] Completed Epoch: 13 batch 370: computing loss 145.552 ms, 16.75 s total | |
EPOCH: [13], BATCH: [370/889], loss: 0.346, loss_box_reg: 0.107, loss_classifier: 0.085, loss_mask: 0.122, loss_objectness: 0.013, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 370 | |
[ 2023-10-07 23:48:09 ] Completed saving temp checkpoint 1,651.696 ms, 18.41 s total | |
[ 2023-10-07 23:48:09 ] Completed replacing temp checkpoint with checkpoint 100.076 ms, 18.51 s total | |
[ 2023-10-07 23:48:09 ] Completed Epoch: 13 batch 371: moving batch data to device 9.638 ms, 18.52 s total | |
[ 2023-10-07 23:48:09 ] Completed Epoch: 13 batch 371: forward pass 108.913 ms, 18.62 s total | |
[ 2023-10-07 23:48:09 ] Completed Epoch: 13 batch 371: backward pass 77.859 ms, 18.70 s total | |
[ 2023-10-07 23:48:09 ] Completed Epoch: 13 batch 371: computing loss 110.183 ms, 18.81 s total | |
EPOCH: [13], BATCH: [371/889], loss: 0.354, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.121, loss_objectness: 0.013, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 371 | |
[ 2023-10-07 23:48:11 ] Completed saving temp checkpoint 1,673.481 ms, 20.49 s total | |
[ 2023-10-07 23:48:11 ] Completed replacing temp checkpoint with checkpoint 91.700 ms, 20.58 s total | |
[ 2023-10-07 23:48:11 ] Completed Epoch: 13 batch 372: moving batch data to device 7.860 ms, 20.59 s total | |
[ 2023-10-07 23:48:11 ] Completed Epoch: 13 batch 372: forward pass 112.286 ms, 20.70 s total | |
[ 2023-10-07 23:48:11 ] Completed Epoch: 13 batch 372: backward pass 57.925 ms, 20.76 s total | |
[ 2023-10-07 23:48:11 ] Completed Epoch: 13 batch 372: computing loss 134.790 ms, 20.89 s total | |
EPOCH: [13], BATCH: [372/889], loss: 0.407, loss_box_reg: 0.125, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.020, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 372 | |
[ 2023-10-07 23:48:13 ] Completed saving temp checkpoint 1,913.322 ms, 22.80 s total | |
[ 2023-10-07 23:48:13 ] Completed replacing temp checkpoint with checkpoint 61.025 ms, 22.87 s total | |
[ 2023-10-07 23:48:13 ] Completed Epoch: 13 batch 373: moving batch data to device 5.316 ms, 22.87 s total | |
[ 2023-10-07 23:48:13 ] Completed Epoch: 13 batch 373: forward pass 104.895 ms, 22.98 s total | |
[ 2023-10-07 23:48:13 ] Completed Epoch: 13 batch 373: backward pass 88.657 ms, 23.06 s total | |
[ 2023-10-07 23:48:13 ] Completed Epoch: 13 batch 373: computing loss 104.703 ms, 23.17 s total | |
EPOCH: [13], BATCH: [373/889], loss: 0.370, loss_box_reg: 0.112, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.019 | |
Saving checkpoint at epoch 13 train batch 373 | |
[ 2023-10-07 23:48:15 ] Completed saving temp checkpoint 1,665.656 ms, 24.83 s total | |
[ 2023-10-07 23:48:15 ] Completed replacing temp checkpoint with checkpoint 43.104 ms, 24.88 s total | |
[ 2023-10-07 23:48:15 ] Completed Epoch: 13 batch 374: moving batch data to device 4.945 ms, 24.88 s total | |
[ 2023-10-07 23:48:15 ] Completed Epoch: 13 batch 374: forward pass 107.574 ms, 24.99 s total | |
[ 2023-10-07 23:48:15 ] Completed Epoch: 13 batch 374: backward pass 78.182 ms, 25.07 s total | |
[ 2023-10-07 23:48:15 ] Completed Epoch: 13 batch 374: computing loss 111.556 ms, 25.18 s total | |
EPOCH: [13], BATCH: [374/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 374 | |
[ 2023-10-07 23:48:17 ] Completed saving temp checkpoint 1,866.382 ms, 27.05 s total | |
[ 2023-10-07 23:48:17 ] Completed replacing temp checkpoint with checkpoint 57.519 ms, 27.10 s total | |
[ 2023-10-07 23:48:17 ] Completed Epoch: 13 batch 375: moving batch data to device 8.798 ms, 27.11 s total | |
[ 2023-10-07 23:48:17 ] Completed Epoch: 13 batch 375: forward pass 105.348 ms, 27.22 s total | |
[ 2023-10-07 23:48:18 ] Completed Epoch: 13 batch 375: backward pass 73.175 ms, 27.29 s total | |
[ 2023-10-07 23:48:18 ] Completed Epoch: 13 batch 375: computing loss 124.368 ms, 27.42 s total | |
EPOCH: [13], BATCH: [375/889], loss: 0.400, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.134, loss_objectness: 0.019, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 375 | |
[ 2023-10-07 23:48:19 ] Completed saving temp checkpoint 1,751.293 ms, 29.17 s total | |
[ 2023-10-07 23:48:19 ] Completed replacing temp checkpoint with checkpoint 65.426 ms, 29.23 s total | |
[ 2023-10-07 23:48:19 ] Completed Epoch: 13 batch 376: moving batch data to device 7.237 ms, 29.24 s total | |
[ 2023-10-07 23:48:20 ] Completed Epoch: 13 batch 376: forward pass 109.062 ms, 29.35 s total | |
[ 2023-10-07 23:48:20 ] Completed Epoch: 13 batch 376: backward pass 80.903 ms, 29.43 s total | |
[ 2023-10-07 23:48:20 ] Completed Epoch: 13 batch 376: computing loss 110.329 ms, 29.54 s total | |
EPOCH: [13], BATCH: [376/889], loss: 0.387, loss_box_reg: 0.120, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 376 | |
[ 2023-10-07 23:48:22 ] Completed saving temp checkpoint 2,177.750 ms, 31.72 s total | |
[ 2023-10-07 23:48:22 ] Completed replacing temp checkpoint with checkpoint 74.440 ms, 31.79 s total | |
[ 2023-10-07 23:48:22 ] Completed Epoch: 13 batch 377: moving batch data to device 5.085 ms, 31.80 s total | |
[ 2023-10-07 23:48:22 ] Completed Epoch: 13 batch 377: forward pass 106.509 ms, 31.90 s total | |
[ 2023-10-07 23:48:22 ] Completed Epoch: 13 batch 377: backward pass 73.767 ms, 31.98 s total | |
[ 2023-10-07 23:48:22 ] Completed Epoch: 13 batch 377: computing loss 126.992 ms, 32.10 s total | |
EPOCH: [13], BATCH: [377/889], loss: 0.413, loss_box_reg: 0.127, loss_classifier: 0.103, loss_mask: 0.134, loss_objectness: 0.019, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 377 | |
[ 2023-10-07 23:48:24 ] Completed saving temp checkpoint 1,570.130 ms, 33.67 s total | |
[ 2023-10-07 23:48:24 ] Completed replacing temp checkpoint with checkpoint 37.146 ms, 33.71 s total | |
[ 2023-10-07 23:48:24 ] Completed Epoch: 13 batch 378: moving batch data to device 6.902 ms, 33.72 s total | |
[ 2023-10-07 23:48:24 ] Completed Epoch: 13 batch 378: forward pass 107.489 ms, 33.83 s total | |
[ 2023-10-07 23:48:24 ] Completed Epoch: 13 batch 378: backward pass 80.317 ms, 33.91 s total | |
[ 2023-10-07 23:48:24 ] Completed Epoch: 13 batch 378: computing loss 115.563 ms, 34.02 s total | |
EPOCH: [13], BATCH: [378/889], loss: 0.366, loss_box_reg: 0.105, loss_classifier: 0.084, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 378 | |
[ 2023-10-07 23:48:26 ] Completed saving temp checkpoint 1,783.015 ms, 35.80 s total | |
[ 2023-10-07 23:48:26 ] Completed replacing temp checkpoint with checkpoint 73.394 ms, 35.88 s total | |
[ 2023-10-07 23:48:26 ] Completed Epoch: 13 batch 379: moving batch data to device 6.756 ms, 35.88 s total | |
[ 2023-10-07 23:48:26 ] Completed Epoch: 13 batch 379: forward pass 105.477 ms, 35.99 s total | |
[ 2023-10-07 23:48:26 ] Completed Epoch: 13 batch 379: backward pass 72.866 ms, 36.06 s total | |
[ 2023-10-07 23:48:26 ] Completed Epoch: 13 batch 379: computing loss 112.501 ms, 36.18 s total | |
EPOCH: [13], BATCH: [379/889], loss: 0.377, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 379 | |
[ 2023-10-07 23:48:28 ] Completed saving temp checkpoint 1,740.085 ms, 37.92 s total | |
[ 2023-10-07 23:48:28 ] Completed replacing temp checkpoint with checkpoint 51.078 ms, 37.97 s total | |
[ 2023-10-07 23:48:28 ] Completed Epoch: 13 batch 380: moving batch data to device 4.776 ms, 37.97 s total | |
[ 2023-10-07 23:48:28 ] Completed Epoch: 13 batch 380: forward pass 103.542 ms, 38.08 s total | |
[ 2023-10-07 23:48:28 ] Completed Epoch: 13 batch 380: backward pass 71.719 ms, 38.15 s total | |
[ 2023-10-07 23:48:29 ] Completed Epoch: 13 batch 380: computing loss 203.538 ms, 38.35 s total | |
EPOCH: [13], BATCH: [380/889], loss: 0.386, loss_box_reg: 0.115, loss_classifier: 0.096, loss_mask: 0.133, loss_objectness: 0.013, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 380 | |
[ 2023-10-07 23:48:30 ] Completed saving temp checkpoint 1,626.568 ms, 39.98 s total | |
[ 2023-10-07 23:48:30 ] Completed replacing temp checkpoint with checkpoint 71.523 ms, 40.05 s total | |
[ 2023-10-07 23:48:30 ] Completed Epoch: 13 batch 381: moving batch data to device 7.019 ms, 40.06 s total | |
[ 2023-10-07 23:48:30 ] Completed Epoch: 13 batch 381: forward pass 102.579 ms, 40.16 s total | |
[ 2023-10-07 23:48:30 ] Completed Epoch: 13 batch 381: backward pass 33.896 ms, 40.19 s total | |
[ 2023-10-07 23:48:31 ] Completed Epoch: 13 batch 381: computing loss 155.006 ms, 40.35 s total | |
EPOCH: [13], BATCH: [381/889], loss: 0.371, loss_box_reg: 0.112, loss_classifier: 0.087, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 381 | |
[ 2023-10-07 23:48:32 ] Completed saving temp checkpoint 1,488.113 ms, 41.84 s total | |
[ 2023-10-07 23:48:32 ] Completed replacing temp checkpoint with checkpoint 46.609 ms, 41.88 s total | |
[ 2023-10-07 23:48:32 ] Completed Epoch: 13 batch 382: moving batch data to device 5.370 ms, 41.89 s total | |
[ 2023-10-07 23:48:32 ] Completed Epoch: 13 batch 382: forward pass 106.807 ms, 41.99 s total | |
[ 2023-10-07 23:48:32 ] Completed Epoch: 13 batch 382: backward pass 38.508 ms, 42.03 s total | |
[ 2023-10-07 23:48:32 ] Completed Epoch: 13 batch 382: computing loss 151.446 ms, 42.18 s total | |
EPOCH: [13], BATCH: [382/889], loss: 0.389, loss_box_reg: 0.118, loss_classifier: 0.095, loss_mask: 0.136, loss_objectness: 0.013, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 382 | |
[ 2023-10-07 23:48:35 ] Completed saving temp checkpoint 2,117.420 ms, 44.30 s total | |
[ 2023-10-07 23:48:35 ] Completed replacing temp checkpoint with checkpoint 31.635 ms, 44.33 s total | |
[ 2023-10-07 23:48:35 ] Completed Epoch: 13 batch 383: moving batch data to device 8.080 ms, 44.34 s total | |
[ 2023-10-07 23:48:35 ] Completed Epoch: 13 batch 383: forward pass 104.409 ms, 44.45 s total | |
[ 2023-10-07 23:48:35 ] Completed Epoch: 13 batch 383: backward pass 56.629 ms, 44.50 s total | |
[ 2023-10-07 23:48:35 ] Completed Epoch: 13 batch 383: computing loss 142.091 ms, 44.64 s total | |
EPOCH: [13], BATCH: [383/889], loss: 0.404, loss_box_reg: 0.119, loss_classifier: 0.103, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 383 | |
[ 2023-10-07 23:48:36 ] Completed saving temp checkpoint 1,296.659 ms, 45.94 s total | |
[ 2023-10-07 23:48:36 ] Completed replacing temp checkpoint with checkpoint 68.046 ms, 46.01 s total | |
[ 2023-10-07 23:48:36 ] Completed Epoch: 13 batch 384: moving batch data to device 8.051 ms, 46.02 s total | |
[ 2023-10-07 23:48:36 ] Completed Epoch: 13 batch 384: forward pass 103.593 ms, 46.12 s total | |
[ 2023-10-07 23:48:36 ] Completed Epoch: 13 batch 384: backward pass 78.081 ms, 46.20 s total | |
[ 2023-10-07 23:48:37 ] Completed Epoch: 13 batch 384: computing loss 117.721 ms, 46.32 s total | |
EPOCH: [13], BATCH: [384/889], loss: 0.376, loss_box_reg: 0.107, loss_classifier: 0.094, loss_mask: 0.123, loss_objectness: 0.024, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 384 | |
[ 2023-10-07 23:48:38 ] Completed saving temp checkpoint 1,350.469 ms, 47.67 s total | |
[ 2023-10-07 23:48:38 ] Completed replacing temp checkpoint with checkpoint 50.772 ms, 47.72 s total | |
[ 2023-10-07 23:48:38 ] Completed Epoch: 13 batch 385: moving batch data to device 7.990 ms, 47.73 s total | |
[ 2023-10-07 23:48:38 ] Completed Epoch: 13 batch 385: forward pass 105.723 ms, 47.83 s total | |
[ 2023-10-07 23:48:38 ] Completed Epoch: 13 batch 385: backward pass 31.400 ms, 47.86 s total | |
[ 2023-10-07 23:48:38 ] Completed Epoch: 13 batch 385: computing loss 138.899 ms, 48.00 s total | |
EPOCH: [13], BATCH: [385/889], loss: 0.396, loss_box_reg: 0.118, loss_classifier: 0.095, loss_mask: 0.144, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 385 | |
[ 2023-10-07 23:48:39 ] Completed saving temp checkpoint 1,215.342 ms, 49.22 s total | |
[ 2023-10-07 23:48:40 ] Completed replacing temp checkpoint with checkpoint 83.291 ms, 49.30 s total | |
[ 2023-10-07 23:48:40 ] Completed Epoch: 13 batch 386: moving batch data to device 9.583 ms, 49.31 s total | |
[ 2023-10-07 23:48:40 ] Completed Epoch: 13 batch 386: forward pass 104.164 ms, 49.41 s total | |
[ 2023-10-07 23:48:40 ] Completed Epoch: 13 batch 386: backward pass 38.731 ms, 49.45 s total | |
[ 2023-10-07 23:48:40 ] Completed Epoch: 13 batch 386: computing loss 155.418 ms, 49.61 s total | |
EPOCH: [13], BATCH: [386/889], loss: 0.393, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 386 | |
[ 2023-10-07 23:48:41 ] Completed saving temp checkpoint 1,297.559 ms, 50.91 s total | |
[ 2023-10-07 23:48:41 ] Completed replacing temp checkpoint with checkpoint 67.528 ms, 50.97 s total | |
[ 2023-10-07 23:48:41 ] Completed Epoch: 13 batch 387: moving batch data to device 6.555 ms, 50.98 s total | |
[ 2023-10-07 23:48:41 ] Completed Epoch: 13 batch 387: forward pass 110.481 ms, 51.09 s total | |
[ 2023-10-07 23:48:41 ] Completed Epoch: 13 batch 387: backward pass 68.880 ms, 51.16 s total | |
[ 2023-10-07 23:48:42 ] Completed Epoch: 13 batch 387: computing loss 122.511 ms, 51.28 s total | |
EPOCH: [13], BATCH: [387/889], loss: 0.389, loss_box_reg: 0.117, loss_classifier: 0.097, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 387 | |
[ 2023-10-07 23:48:43 ] Completed saving temp checkpoint 1,133.769 ms, 52.42 s total | |
[ 2023-10-07 23:48:43 ] Completed replacing temp checkpoint with checkpoint 77.255 ms, 52.49 s total | |
[ 2023-10-07 23:48:43 ] Completed Epoch: 13 batch 388: moving batch data to device 7.039 ms, 52.50 s total | |
[ 2023-10-07 23:48:43 ] Completed Epoch: 13 batch 388: forward pass 103.424 ms, 52.60 s total | |
[ 2023-10-07 23:48:43 ] Completed Epoch: 13 batch 388: backward pass 72.432 ms, 52.68 s total | |
[ 2023-10-07 23:48:43 ] Completed Epoch: 13 batch 388: computing loss 123.787 ms, 52.80 s total | |
EPOCH: [13], BATCH: [388/889], loss: 0.397, loss_box_reg: 0.114, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 388 | |
[ 2023-10-07 23:48:44 ] Completed saving temp checkpoint 1,237.926 ms, 54.04 s total | |
[ 2023-10-07 23:48:44 ] Completed replacing temp checkpoint with checkpoint 56.034 ms, 54.09 s total | |
[ 2023-10-07 23:48:44 ] Completed Epoch: 13 batch 389: moving batch data to device 4.636 ms, 54.10 s total | |
[ 2023-10-07 23:48:44 ] Completed Epoch: 13 batch 389: forward pass 100.810 ms, 54.20 s total | |
[ 2023-10-07 23:48:44 ] Completed Epoch: 13 batch 389: backward pass 47.355 ms, 54.25 s total | |
[ 2023-10-07 23:48:45 ] Completed Epoch: 13 batch 389: computing loss 122.380 ms, 54.37 s total | |
EPOCH: [13], BATCH: [389/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.035 | |
Saving checkpoint at epoch 13 train batch 389 | |
[ 2023-10-07 23:48:46 ] Completed saving temp checkpoint 1,151.704 ms, 55.52 s total | |
[ 2023-10-07 23:48:46 ] Completed replacing temp checkpoint with checkpoint 64.797 ms, 55.58 s total | |
[ 2023-10-07 23:48:46 ] Completed Epoch: 13 batch 390: moving batch data to device 4.861 ms, 55.59 s total | |
[ 2023-10-07 23:48:46 ] Completed Epoch: 13 batch 390: forward pass 104.399 ms, 55.69 s total | |
[ 2023-10-07 23:48:46 ] Completed Epoch: 13 batch 390: backward pass 34.300 ms, 55.73 s total | |
[ 2023-10-07 23:48:46 ] Completed Epoch: 13 batch 390: computing loss 158.703 ms, 55.89 s total | |
EPOCH: [13], BATCH: [390/889], loss: 0.409, loss_box_reg: 0.130, loss_classifier: 0.102, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 390 | |
[ 2023-10-07 23:48:48 ] Completed saving temp checkpoint 1,404.513 ms, 57.29 s total | |
[ 2023-10-07 23:48:48 ] Completed replacing temp checkpoint with checkpoint 80.087 ms, 57.37 s total | |
[ 2023-10-07 23:48:48 ] Completed Epoch: 13 batch 391: moving batch data to device 6.115 ms, 57.38 s total | |
[ 2023-10-07 23:48:48 ] Completed Epoch: 13 batch 391: forward pass 102.127 ms, 57.48 s total | |
[ 2023-10-07 23:48:48 ] Completed Epoch: 13 batch 391: backward pass 47.419 ms, 57.53 s total | |
[ 2023-10-07 23:48:48 ] Completed Epoch: 13 batch 391: computing loss 145.631 ms, 57.67 s total | |
EPOCH: [13], BATCH: [391/889], loss: 0.394, loss_box_reg: 0.125, loss_classifier: 0.098, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 391 | |
[ 2023-10-07 23:48:49 ] Completed saving temp checkpoint 1,117.501 ms, 58.79 s total | |
[ 2023-10-07 23:48:49 ] Completed replacing temp checkpoint with checkpoint 40.987 ms, 58.83 s total | |
[ 2023-10-07 23:48:49 ] Completed Epoch: 13 batch 392: moving batch data to device 4.396 ms, 58.84 s total | |
[ 2023-10-07 23:48:49 ] Completed Epoch: 13 batch 392: forward pass 106.345 ms, 58.94 s total | |
[ 2023-10-07 23:48:49 ] Completed Epoch: 13 batch 392: backward pass 36.088 ms, 58.98 s total | |
[ 2023-10-07 23:48:49 ] Completed Epoch: 13 batch 392: computing loss 155.103 ms, 59.13 s total | |
EPOCH: [13], BATCH: [392/889], loss: 0.362, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 392 | |
[ 2023-10-07 23:48:51 ] Completed saving temp checkpoint 1,975.229 ms, 61.11 s total | |
[ 2023-10-07 23:48:51 ] Completed replacing temp checkpoint with checkpoint 82.394 ms, 61.19 s total | |
[ 2023-10-07 23:48:51 ] Completed Epoch: 13 batch 393: moving batch data to device 6.410 ms, 61.20 s total | |
[ 2023-10-07 23:48:52 ] Completed Epoch: 13 batch 393: forward pass 111.458 ms, 61.31 s total | |
[ 2023-10-07 23:48:52 ] Completed Epoch: 13 batch 393: backward pass 73.696 ms, 61.38 s total | |
[ 2023-10-07 23:48:52 ] Completed Epoch: 13 batch 393: computing loss 114.262 ms, 61.50 s total | |
EPOCH: [13], BATCH: [393/889], loss: 0.390, loss_box_reg: 0.120, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 393 | |
[ 2023-10-07 23:48:53 ] Completed saving temp checkpoint 1,220.980 ms, 62.72 s total | |
[ 2023-10-07 23:48:53 ] Completed replacing temp checkpoint with checkpoint 41.441 ms, 62.76 s total | |
[ 2023-10-07 23:48:53 ] Completed Epoch: 13 batch 394: moving batch data to device 4.497 ms, 62.76 s total | |
[ 2023-10-07 23:48:53 ] Completed Epoch: 13 batch 394: forward pass 101.206 ms, 62.86 s total | |
[ 2023-10-07 23:48:53 ] Completed Epoch: 13 batch 394: backward pass 75.207 ms, 62.94 s total | |
[ 2023-10-07 23:48:53 ] Completed Epoch: 13 batch 394: computing loss 97.040 ms, 63.04 s total | |
EPOCH: [13], BATCH: [394/889], loss: 0.383, loss_box_reg: 0.118, loss_classifier: 0.097, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 394 | |
[ 2023-10-07 23:48:55 ] Completed saving temp checkpoint 1,625.280 ms, 64.66 s total | |
[ 2023-10-07 23:48:55 ] Completed replacing temp checkpoint with checkpoint 44.880 ms, 64.71 s total | |
[ 2023-10-07 23:48:55 ] Completed Epoch: 13 batch 395: moving batch data to device 4.977 ms, 64.71 s total | |
[ 2023-10-07 23:48:55 ] Completed Epoch: 13 batch 395: forward pass 107.839 ms, 64.82 s total | |
[ 2023-10-07 23:48:55 ] Completed Epoch: 13 batch 395: backward pass 70.974 ms, 64.89 s total | |
[ 2023-10-07 23:48:55 ] Completed Epoch: 13 batch 395: computing loss 111.584 ms, 65.00 s total | |
EPOCH: [13], BATCH: [395/889], loss: 0.368, loss_box_reg: 0.112, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 395 | |
[ 2023-10-07 23:48:57 ] Completed saving temp checkpoint 1,782.452 ms, 66.79 s total | |
[ 2023-10-07 23:48:57 ] Completed replacing temp checkpoint with checkpoint 43.479 ms, 66.83 s total | |
[ 2023-10-07 23:48:57 ] Completed Epoch: 13 batch 396: moving batch data to device 5.201 ms, 66.83 s total | |
[ 2023-10-07 23:48:57 ] Completed Epoch: 13 batch 396: forward pass 104.990 ms, 66.94 s total | |
[ 2023-10-07 23:48:57 ] Completed Epoch: 13 batch 396: backward pass 43.994 ms, 66.98 s total | |
[ 2023-10-07 23:48:57 ] Completed Epoch: 13 batch 396: computing loss 148.510 ms, 67.13 s total | |
EPOCH: [13], BATCH: [396/889], loss: 0.400, loss_box_reg: 0.124, loss_classifier: 0.101, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 396 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 00:09:11 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 00:09:11 ] Completed importing Timer 0.019 ms, 0.00 s total | |
[ 2023-10-08 00:09:12 ] Completed importing everything else 568.003 ms, 0.57 s total | |
[ 2023-10-08 00:09:12 ] Completed defined other functions 0.021 ms, 0.57 s total | |
| distributed init (rank 4): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 1): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 00:09:20 ] Completed main preliminaries 7,607.749 ms, 8.18 s total | |
loading annotations into memory... | |
Done (t=10.76s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.27s) | |
creating index... | |
index created! | |
[ 2023-10-08 00:09:32 ] Completed loading data 12,547.218 ms, 20.72 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 00:09:32 ] Completed creating data samplers 98.207 ms, 20.82 s total | |
[ 2023-10-08 00:09:32 ] Completed creating data loaders 0.194 ms, 20.82 s total | |
[ 2023-10-08 00:09:33 ] Completed creating model and .to(device) 639.926 ms, 21.46 s total | |
[ 2023-10-08 00:09:35 ] Completed preparing model for distributed training 1,730.414 ms, 23.19 s total | |
[ 2023-10-08 00:09:35 ] Completed optimizer and scaler 0.636 ms, 23.19 s total | |
[ 2023-10-08 00:09:35 ] Completed learning rate schedulers 0.265 ms, 23.19 s total | |
[ 2023-10-08 00:09:36 ] Completed init coco evaluator 959.684 ms, 24.15 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 00:09:36 ] Completed retrieving checkpoint 841.078 ms, 24.99 s total | |
EPOCH :: 13 | |
[ 2023-10-08 00:09:36 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 00:09:36 ] Completed training preliminaries 0.875 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 396 | |
[ 2023-10-08 00:09:37 ] Completed Epoch: 13 batch 396: moving batch data to device 464.246 ms, 0.47 s total | |
[ 2023-10-08 00:09:38 ] Completed Epoch: 13 batch 396: forward pass 1,334.083 ms, 1.80 s total | |
[ 2023-10-08 00:09:38 ] Completed Epoch: 13 batch 396: backward pass 171.036 ms, 1.97 s total | |
[ 2023-10-08 00:09:39 ] Completed Epoch: 13 batch 396: computing loss 182.057 ms, 2.15 s total | |
EPOCH: [13], BATCH: [396/889], loss: 0.402, loss_box_reg: 0.123, loss_classifier: 0.103, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 396 | |
[ 2023-10-08 00:09:40 ] Completed saving temp checkpoint 968.499 ms, 3.12 s total | |
[ 2023-10-08 00:09:40 ] Completed replacing temp checkpoint with checkpoint 180.485 ms, 3.30 s total | |
[ 2023-10-08 00:09:40 ] Completed Epoch: 13 batch 397: moving batch data to device 59.216 ms, 3.36 s total | |
[ 2023-10-08 00:09:40 ] Completed Epoch: 13 batch 397: forward pass 113.800 ms, 3.47 s total | |
[ 2023-10-08 00:09:40 ] Completed Epoch: 13 batch 397: backward pass 77.970 ms, 3.55 s total | |
[ 2023-10-08 00:09:40 ] Completed Epoch: 13 batch 397: computing loss 140.631 ms, 3.69 s total | |
EPOCH: [13], BATCH: [397/889], loss: 0.382, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.126, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 397 | |
[ 2023-10-08 00:09:41 ] Completed saving temp checkpoint 1,077.484 ms, 4.77 s total | |
[ 2023-10-08 00:09:41 ] Completed replacing temp checkpoint with checkpoint 34.308 ms, 4.80 s total | |
[ 2023-10-08 00:09:41 ] Completed Epoch: 13 batch 398: moving batch data to device 3.354 ms, 4.81 s total | |
[ 2023-10-08 00:09:41 ] Completed Epoch: 13 batch 398: forward pass 112.446 ms, 4.92 s total | |
[ 2023-10-08 00:09:41 ] Completed Epoch: 13 batch 398: backward pass 84.266 ms, 5.00 s total | |
[ 2023-10-08 00:09:42 ] Completed Epoch: 13 batch 398: computing loss 133.887 ms, 5.14 s total | |
EPOCH: [13], BATCH: [398/889], loss: 0.377, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.125, loss_objectness: 0.017, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 398 | |
[ 2023-10-08 00:09:42 ] Completed saving temp checkpoint 967.043 ms, 6.11 s total | |
[ 2023-10-08 00:09:43 ] Completed replacing temp checkpoint with checkpoint 69.118 ms, 6.17 s total | |
[ 2023-10-08 00:09:43 ] Completed Epoch: 13 batch 399: moving batch data to device 20.891 ms, 6.20 s total | |
[ 2023-10-08 00:09:43 ] Completed Epoch: 13 batch 399: forward pass 209.693 ms, 6.41 s total | |
[ 2023-10-08 00:09:43 ] Completed Epoch: 13 batch 399: backward pass 78.746 ms, 6.48 s total | |
[ 2023-10-08 00:09:43 ] Completed Epoch: 13 batch 399: computing loss 139.851 ms, 6.62 s total | |
EPOCH: [13], BATCH: [399/889], loss: 0.356, loss_box_reg: 0.105, loss_classifier: 0.089, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 399 | |
[ 2023-10-08 00:09:44 ] Completed saving temp checkpoint 1,216.887 ms, 7.84 s total | |
[ 2023-10-08 00:09:44 ] Completed replacing temp checkpoint with checkpoint 62.667 ms, 7.90 s total | |
[ 2023-10-08 00:09:44 ] Completed Epoch: 13 batch 400: moving batch data to device 5.584 ms, 7.91 s total | |
[ 2023-10-08 00:09:45 ] Completed Epoch: 13 batch 400: forward pass 223.588 ms, 8.13 s total | |
[ 2023-10-08 00:09:45 ] Completed Epoch: 13 batch 400: backward pass 36.227 ms, 8.17 s total | |
[ 2023-10-08 00:09:45 ] Completed Epoch: 13 batch 400: computing loss 166.526 ms, 8.34 s total | |
EPOCH: [13], BATCH: [400/889], loss: 0.358, loss_box_reg: 0.107, loss_classifier: 0.087, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 400 | |
[ 2023-10-08 00:09:46 ] Completed saving temp checkpoint 953.912 ms, 9.29 s total | |
[ 2023-10-08 00:09:46 ] Completed replacing temp checkpoint with checkpoint 54.316 ms, 9.34 s total | |
[ 2023-10-08 00:09:46 ] Completed Epoch: 13 batch 401: moving batch data to device 2.499 ms, 9.35 s total | |
[ 2023-10-08 00:09:46 ] Completed Epoch: 13 batch 401: forward pass 110.621 ms, 9.46 s total | |
[ 2023-10-08 00:09:46 ] Completed Epoch: 13 batch 401: backward pass 81.333 ms, 9.54 s total | |
[ 2023-10-08 00:09:46 ] Completed Epoch: 13 batch 401: computing loss 125.349 ms, 9.66 s total | |
EPOCH: [13], BATCH: [401/889], loss: 0.396, loss_box_reg: 0.121, loss_classifier: 0.103, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 401 | |
[ 2023-10-08 00:09:48 ] Completed saving temp checkpoint 1,522.659 ms, 11.19 s total | |
[ 2023-10-08 00:09:48 ] Completed replacing temp checkpoint with checkpoint 83.748 ms, 11.27 s total | |
[ 2023-10-08 00:09:48 ] Completed Epoch: 13 batch 402: moving batch data to device 3.752 ms, 11.27 s total | |
[ 2023-10-08 00:09:48 ] Completed Epoch: 13 batch 402: forward pass 108.143 ms, 11.38 s total | |
[ 2023-10-08 00:09:48 ] Completed Epoch: 13 batch 402: backward pass 97.312 ms, 11.48 s total | |
[ 2023-10-08 00:09:48 ] Completed Epoch: 13 batch 402: computing loss 115.408 ms, 11.59 s total | |
EPOCH: [13], BATCH: [402/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.109, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 402 | |
[ 2023-10-08 00:09:49 ] Completed saving temp checkpoint 958.594 ms, 12.55 s total | |
[ 2023-10-08 00:09:49 ] Completed replacing temp checkpoint with checkpoint 45.184 ms, 12.60 s total | |
[ 2023-10-08 00:09:49 ] Completed Epoch: 13 batch 403: moving batch data to device 5.178 ms, 12.60 s total | |
[ 2023-10-08 00:09:49 ] Completed Epoch: 13 batch 403: forward pass 104.928 ms, 12.71 s total | |
[ 2023-10-08 00:09:49 ] Completed Epoch: 13 batch 403: backward pass 124.811 ms, 12.83 s total | |
[ 2023-10-08 00:09:49 ] Completed Epoch: 13 batch 403: computing loss 83.152 ms, 12.92 s total | |
EPOCH: [13], BATCH: [403/889], loss: 0.378, loss_box_reg: 0.115, loss_classifier: 0.100, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 403 | |
[ 2023-10-08 00:09:50 ] Completed saving temp checkpoint 1,071.455 ms, 13.99 s total | |
[ 2023-10-08 00:09:50 ] Completed replacing temp checkpoint with checkpoint 68.797 ms, 14.06 s total | |
[ 2023-10-08 00:09:50 ] Completed Epoch: 13 batch 404: moving batch data to device 3.360 ms, 14.06 s total | |
[ 2023-10-08 00:09:51 ] Completed Epoch: 13 batch 404: forward pass 106.990 ms, 14.17 s total | |
[ 2023-10-08 00:09:51 ] Completed Epoch: 13 batch 404: backward pass 69.227 ms, 14.24 s total | |
[ 2023-10-08 00:09:51 ] Completed Epoch: 13 batch 404: computing loss 124.393 ms, 14.36 s total | |
EPOCH: [13], BATCH: [404/889], loss: 0.400, loss_box_reg: 0.121, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 404 | |
[ 2023-10-08 00:09:52 ] Completed saving temp checkpoint 984.532 ms, 15.35 s total | |
[ 2023-10-08 00:09:52 ] Completed replacing temp checkpoint with checkpoint 71.505 ms, 15.42 s total | |
[ 2023-10-08 00:09:52 ] Completed Epoch: 13 batch 405: moving batch data to device 9.926 ms, 15.43 s total | |
[ 2023-10-08 00:09:52 ] Completed Epoch: 13 batch 405: forward pass 105.710 ms, 15.53 s total | |
[ 2023-10-08 00:09:52 ] Completed Epoch: 13 batch 405: backward pass 40.474 ms, 15.57 s total | |
[ 2023-10-08 00:09:52 ] Completed Epoch: 13 batch 405: computing loss 152.714 ms, 15.73 s total | |
EPOCH: [13], BATCH: [405/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 405 | |
[ 2023-10-08 00:09:53 ] Completed saving temp checkpoint 1,131.871 ms, 16.86 s total | |
[ 2023-10-08 00:09:53 ] Completed replacing temp checkpoint with checkpoint 66.503 ms, 16.92 s total | |
[ 2023-10-08 00:09:53 ] Completed Epoch: 13 batch 406: moving batch data to device 9.081 ms, 16.93 s total | |
[ 2023-10-08 00:09:53 ] Completed Epoch: 13 batch 406: forward pass 107.662 ms, 17.04 s total | |
[ 2023-10-08 00:09:54 ] Completed Epoch: 13 batch 406: backward pass 83.092 ms, 17.12 s total | |
[ 2023-10-08 00:09:54 ] Completed Epoch: 13 batch 406: computing loss 117.158 ms, 17.24 s total | |
EPOCH: [13], BATCH: [406/889], loss: 0.427, loss_box_reg: 0.132, loss_classifier: 0.112, loss_mask: 0.129, loss_objectness: 0.021, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 406 | |
[ 2023-10-08 00:09:55 ] Completed saving temp checkpoint 998.686 ms, 18.24 s total | |
[ 2023-10-08 00:09:55 ] Completed replacing temp checkpoint with checkpoint 69.365 ms, 18.31 s total | |
[ 2023-10-08 00:09:55 ] Completed Epoch: 13 batch 407: moving batch data to device 9.616 ms, 18.32 s total | |
[ 2023-10-08 00:09:55 ] Completed Epoch: 13 batch 407: forward pass 107.858 ms, 18.43 s total | |
[ 2023-10-08 00:09:55 ] Completed Epoch: 13 batch 407: backward pass 39.585 ms, 18.47 s total | |
[ 2023-10-08 00:09:55 ] Completed Epoch: 13 batch 407: computing loss 155.974 ms, 18.62 s total | |
EPOCH: [13], BATCH: [407/889], loss: 0.346, loss_box_reg: 0.100, loss_classifier: 0.082, loss_mask: 0.124, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 407 | |
[ 2023-10-08 00:09:56 ] Completed saving temp checkpoint 1,156.981 ms, 19.78 s total | |
[ 2023-10-08 00:09:56 ] Completed replacing temp checkpoint with checkpoint 59.010 ms, 19.84 s total | |
[ 2023-10-08 00:09:56 ] Completed Epoch: 13 batch 408: moving batch data to device 8.004 ms, 19.85 s total | |
[ 2023-10-08 00:09:56 ] Completed Epoch: 13 batch 408: forward pass 105.285 ms, 19.95 s total | |
[ 2023-10-08 00:09:56 ] Completed Epoch: 13 batch 408: backward pass 72.263 ms, 20.02 s total | |
[ 2023-10-08 00:09:57 ] Completed Epoch: 13 batch 408: computing loss 123.374 ms, 20.15 s total | |
EPOCH: [13], BATCH: [408/889], loss: 0.358, loss_box_reg: 0.102, loss_classifier: 0.089, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 408 | |
[ 2023-10-08 00:09:58 ] Completed saving temp checkpoint 993.402 ms, 21.14 s total | |
[ 2023-10-08 00:09:58 ] Completed replacing temp checkpoint with checkpoint 71.679 ms, 21.21 s total | |
[ 2023-10-08 00:09:58 ] Completed Epoch: 13 batch 409: moving batch data to device 5.925 ms, 21.22 s total | |
[ 2023-10-08 00:09:58 ] Completed Epoch: 13 batch 409: forward pass 106.876 ms, 21.32 s total | |
[ 2023-10-08 00:09:58 ] Completed Epoch: 13 batch 409: backward pass 31.971 ms, 21.36 s total | |
[ 2023-10-08 00:09:58 ] Completed Epoch: 13 batch 409: computing loss 164.053 ms, 21.52 s total | |
EPOCH: [13], BATCH: [409/889], loss: 0.410, loss_box_reg: 0.122, loss_classifier: 0.102, loss_mask: 0.139, loss_objectness: 0.016, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 409 | |
[ 2023-10-08 00:09:59 ] Completed saving temp checkpoint 1,539.490 ms, 23.06 s total | |
[ 2023-10-08 00:10:00 ] Completed replacing temp checkpoint with checkpoint 76.489 ms, 23.14 s total | |
[ 2023-10-08 00:10:00 ] Completed Epoch: 13 batch 410: moving batch data to device 6.481 ms, 23.14 s total | |
[ 2023-10-08 00:10:00 ] Completed Epoch: 13 batch 410: forward pass 104.378 ms, 23.25 s total | |
[ 2023-10-08 00:10:00 ] Completed Epoch: 13 batch 410: backward pass 73.428 ms, 23.32 s total | |
[ 2023-10-08 00:10:00 ] Completed Epoch: 13 batch 410: computing loss 124.198 ms, 23.45 s total | |
EPOCH: [13], BATCH: [410/889], loss: 0.354, loss_box_reg: 0.106, loss_classifier: 0.082, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 410 | |
[ 2023-10-08 00:10:02 ] Completed saving temp checkpoint 1,690.232 ms, 25.14 s total | |
[ 2023-10-08 00:10:02 ] Completed replacing temp checkpoint with checkpoint 79.079 ms, 25.21 s total | |
[ 2023-10-08 00:10:02 ] Completed Epoch: 13 batch 411: moving batch data to device 7.765 ms, 25.22 s total | |
[ 2023-10-08 00:10:02 ] Completed Epoch: 13 batch 411: forward pass 103.767 ms, 25.33 s total | |
[ 2023-10-08 00:10:02 ] Completed Epoch: 13 batch 411: backward pass 72.064 ms, 25.40 s total | |
[ 2023-10-08 00:10:02 ] Completed Epoch: 13 batch 411: computing loss 124.269 ms, 25.52 s total | |
EPOCH: [13], BATCH: [411/889], loss: 0.401, loss_box_reg: 0.123, loss_classifier: 0.105, loss_mask: 0.133, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 411 | |
[ 2023-10-08 00:10:03 ] Completed saving temp checkpoint 1,225.109 ms, 26.75 s total | |
[ 2023-10-08 00:10:03 ] Completed replacing temp checkpoint with checkpoint 42.839 ms, 26.79 s total | |
[ 2023-10-08 00:10:03 ] Completed Epoch: 13 batch 412: moving batch data to device 5.001 ms, 26.80 s total | |
[ 2023-10-08 00:10:03 ] Completed Epoch: 13 batch 412: forward pass 105.178 ms, 26.90 s total | |
[ 2023-10-08 00:10:03 ] Completed Epoch: 13 batch 412: backward pass 71.978 ms, 26.97 s total | |
[ 2023-10-08 00:10:03 ] Completed Epoch: 13 batch 412: computing loss 120.017 ms, 27.09 s total | |
EPOCH: [13], BATCH: [412/889], loss: 0.372, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 412 | |
[ 2023-10-08 00:10:05 ] Completed saving temp checkpoint 1,602.358 ms, 28.69 s total | |
[ 2023-10-08 00:10:05 ] Completed replacing temp checkpoint with checkpoint 105.759 ms, 28.80 s total | |
[ 2023-10-08 00:10:05 ] Completed Epoch: 13 batch 413: moving batch data to device 8.318 ms, 28.81 s total | |
[ 2023-10-08 00:10:05 ] Completed Epoch: 13 batch 413: forward pass 106.159 ms, 28.92 s total | |
[ 2023-10-08 00:10:05 ] Completed Epoch: 13 batch 413: backward pass 46.327 ms, 28.96 s total | |
[ 2023-10-08 00:10:05 ] Completed Epoch: 13 batch 413: computing loss 148.327 ms, 29.11 s total | |
EPOCH: [13], BATCH: [413/889], loss: 0.384, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.127, loss_objectness: 0.020, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 413 | |
[ 2023-10-08 00:10:07 ] Completed saving temp checkpoint 1,715.622 ms, 30.83 s total | |
[ 2023-10-08 00:10:07 ] Completed replacing temp checkpoint with checkpoint 65.504 ms, 30.89 s total | |
[ 2023-10-08 00:10:07 ] Completed Epoch: 13 batch 414: moving batch data to device 7.711 ms, 30.90 s total | |
[ 2023-10-08 00:10:07 ] Completed Epoch: 13 batch 414: forward pass 109.577 ms, 31.01 s total | |
[ 2023-10-08 00:10:07 ] Completed Epoch: 13 batch 414: backward pass 80.269 ms, 31.09 s total | |
[ 2023-10-08 00:10:08 ] Completed Epoch: 13 batch 414: computing loss 112.593 ms, 31.20 s total | |
EPOCH: [13], BATCH: [414/889], loss: 0.345, loss_box_reg: 0.101, loss_classifier: 0.083, loss_mask: 0.123, loss_objectness: 0.017, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 414 | |
[ 2023-10-08 00:10:09 ] Completed saving temp checkpoint 996.668 ms, 32.20 s total | |
[ 2023-10-08 00:10:09 ] Completed replacing temp checkpoint with checkpoint 69.654 ms, 32.27 s total | |
[ 2023-10-08 00:10:09 ] Completed Epoch: 13 batch 415: moving batch data to device 7.258 ms, 32.27 s total | |
[ 2023-10-08 00:10:09 ] Completed Epoch: 13 batch 415: forward pass 107.855 ms, 32.38 s total | |
[ 2023-10-08 00:10:09 ] Completed Epoch: 13 batch 415: backward pass 78.175 ms, 32.46 s total | |
[ 2023-10-08 00:10:09 ] Completed Epoch: 13 batch 415: computing loss 113.852 ms, 32.57 s total | |
EPOCH: [13], BATCH: [415/889], loss: 0.354, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 415 | |
[ 2023-10-08 00:10:10 ] Completed saving temp checkpoint 988.153 ms, 33.56 s total | |
[ 2023-10-08 00:10:10 ] Completed replacing temp checkpoint with checkpoint 70.198 ms, 33.63 s total | |
[ 2023-10-08 00:10:10 ] Completed Epoch: 13 batch 416: moving batch data to device 8.468 ms, 33.64 s total | |
[ 2023-10-08 00:10:10 ] Completed Epoch: 13 batch 416: forward pass 106.398 ms, 33.75 s total | |
[ 2023-10-08 00:10:10 ] Completed Epoch: 13 batch 416: backward pass 42.163 ms, 33.79 s total | |
[ 2023-10-08 00:10:10 ] Completed Epoch: 13 batch 416: computing loss 140.533 ms, 33.93 s total | |
EPOCH: [13], BATCH: [416/889], loss: 0.375, loss_box_reg: 0.116, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 416 | |
[ 2023-10-08 00:10:11 ] Completed saving temp checkpoint 959.335 ms, 34.89 s total | |
[ 2023-10-08 00:10:11 ] Completed replacing temp checkpoint with checkpoint 51.264 ms, 34.94 s total | |
[ 2023-10-08 00:10:11 ] Completed Epoch: 13 batch 417: moving batch data to device 8.485 ms, 34.95 s total | |
[ 2023-10-08 00:10:11 ] Completed Epoch: 13 batch 417: forward pass 108.838 ms, 35.06 s total | |
[ 2023-10-08 00:10:12 ] Completed Epoch: 13 batch 417: backward pass 76.837 ms, 35.14 s total | |
[ 2023-10-08 00:10:12 ] Completed Epoch: 13 batch 417: computing loss 119.379 ms, 35.25 s total | |
EPOCH: [13], BATCH: [417/889], loss: 0.362, loss_box_reg: 0.110, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 417 | |
[ 2023-10-08 00:10:13 ] Completed saving temp checkpoint 1,181.724 ms, 36.44 s total | |
[ 2023-10-08 00:10:13 ] Completed replacing temp checkpoint with checkpoint 78.737 ms, 36.51 s total | |
[ 2023-10-08 00:10:13 ] Completed Epoch: 13 batch 418: moving batch data to device 7.097 ms, 36.52 s total | |
[ 2023-10-08 00:10:13 ] Completed Epoch: 13 batch 418: forward pass 107.082 ms, 36.63 s total | |
[ 2023-10-08 00:10:13 ] Completed Epoch: 13 batch 418: backward pass 75.453 ms, 36.70 s total | |
[ 2023-10-08 00:10:13 ] Completed Epoch: 13 batch 418: computing loss 117.590 ms, 36.82 s total | |
EPOCH: [13], BATCH: [418/889], loss: 0.372, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 418 | |
[ 2023-10-08 00:10:14 ] Completed saving temp checkpoint 972.739 ms, 37.79 s total | |
[ 2023-10-08 00:10:14 ] Completed replacing temp checkpoint with checkpoint 26.786 ms, 37.82 s total | |
[ 2023-10-08 00:10:14 ] Completed Epoch: 13 batch 419: moving batch data to device 5.971 ms, 37.83 s total | |
[ 2023-10-08 00:10:14 ] Completed Epoch: 13 batch 419: forward pass 104.280 ms, 37.93 s total | |
[ 2023-10-08 00:10:14 ] Completed Epoch: 13 batch 419: backward pass 78.307 ms, 38.01 s total | |
[ 2023-10-08 00:10:15 ] Completed Epoch: 13 batch 419: computing loss 106.303 ms, 38.12 s total | |
EPOCH: [13], BATCH: [419/889], loss: 0.348, loss_box_reg: 0.104, loss_classifier: 0.082, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 419 | |
[ 2023-10-08 00:10:16 ] Completed saving temp checkpoint 1,090.195 ms, 39.21 s total | |
[ 2023-10-08 00:10:16 ] Completed replacing temp checkpoint with checkpoint 73.979 ms, 39.28 s total | |
[ 2023-10-08 00:10:16 ] Completed Epoch: 13 batch 420: moving batch data to device 8.612 ms, 39.29 s total | |
[ 2023-10-08 00:10:16 ] Completed Epoch: 13 batch 420: forward pass 105.264 ms, 39.39 s total | |
[ 2023-10-08 00:10:16 ] Completed Epoch: 13 batch 420: backward pass 73.417 ms, 39.47 s total | |
[ 2023-10-08 00:10:16 ] Completed Epoch: 13 batch 420: computing loss 122.085 ms, 39.59 s total | |
EPOCH: [13], BATCH: [420/889], loss: 0.380, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 420 | |
[ 2023-10-08 00:10:17 ] Completed saving temp checkpoint 1,051.361 ms, 40.64 s total | |
[ 2023-10-08 00:10:17 ] Completed replacing temp checkpoint with checkpoint 79.444 ms, 40.72 s total | |
[ 2023-10-08 00:10:17 ] Completed Epoch: 13 batch 421: moving batch data to device 6.457 ms, 40.73 s total | |
[ 2023-10-08 00:10:17 ] Completed Epoch: 13 batch 421: forward pass 107.342 ms, 40.83 s total | |
[ 2023-10-08 00:10:17 ] Completed Epoch: 13 batch 421: backward pass 35.905 ms, 40.87 s total | |
[ 2023-10-08 00:10:17 ] Completed Epoch: 13 batch 421: computing loss 134.440 ms, 41.01 s total | |
EPOCH: [13], BATCH: [421/889], loss: 0.370, loss_box_reg: 0.104, loss_classifier: 0.091, loss_mask: 0.126, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 421 | |
[ 2023-10-08 00:10:19 ] Completed saving temp checkpoint 1,425.464 ms, 42.43 s total | |
[ 2023-10-08 00:10:19 ] Completed replacing temp checkpoint with checkpoint 68.341 ms, 42.50 s total | |
[ 2023-10-08 00:10:19 ] Completed Epoch: 13 batch 422: moving batch data to device 8.662 ms, 42.51 s total | |
[ 2023-10-08 00:10:19 ] Completed Epoch: 13 batch 422: forward pass 110.967 ms, 42.62 s total | |
[ 2023-10-08 00:10:19 ] Completed Epoch: 13 batch 422: backward pass 69.929 ms, 42.69 s total | |
[ 2023-10-08 00:10:19 ] Completed Epoch: 13 batch 422: computing loss 129.456 ms, 42.82 s total | |
EPOCH: [13], BATCH: [422/889], loss: 0.383, loss_box_reg: 0.122, loss_classifier: 0.090, loss_mask: 0.132, loss_objectness: 0.013, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 422 | |
[ 2023-10-08 00:10:21 ] Completed saving temp checkpoint 1,396.568 ms, 44.21 s total | |
[ 2023-10-08 00:10:21 ] Completed replacing temp checkpoint with checkpoint 80.047 ms, 44.29 s total | |
[ 2023-10-08 00:10:21 ] Completed Epoch: 13 batch 423: moving batch data to device 6.923 ms, 44.30 s total | |
[ 2023-10-08 00:10:21 ] Completed Epoch: 13 batch 423: forward pass 110.219 ms, 44.41 s total | |
[ 2023-10-08 00:10:21 ] Completed Epoch: 13 batch 423: backward pass 76.882 ms, 44.49 s total | |
[ 2023-10-08 00:10:21 ] Completed Epoch: 13 batch 423: computing loss 112.401 ms, 44.60 s total | |
EPOCH: [13], BATCH: [423/889], loss: 0.377, loss_box_reg: 0.110, loss_classifier: 0.093, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 423 | |
[ 2023-10-08 00:10:22 ] Completed saving temp checkpoint 1,252.446 ms, 45.85 s total | |
[ 2023-10-08 00:10:22 ] Completed replacing temp checkpoint with checkpoint 50.373 ms, 45.90 s total | |
[ 2023-10-08 00:10:22 ] Completed Epoch: 13 batch 424: moving batch data to device 7.364 ms, 45.91 s total | |
[ 2023-10-08 00:10:22 ] Completed Epoch: 13 batch 424: forward pass 109.785 ms, 46.02 s total | |
[ 2023-10-08 00:10:22 ] Completed Epoch: 13 batch 424: backward pass 72.388 ms, 46.09 s total | |
[ 2023-10-08 00:10:23 ] Completed Epoch: 13 batch 424: computing loss 102.446 ms, 46.20 s total | |
EPOCH: [13], BATCH: [424/889], loss: 0.389, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 424 | |
[ 2023-10-08 00:10:24 ] Completed saving temp checkpoint 1,183.574 ms, 47.38 s total | |
[ 2023-10-08 00:10:24 ] Completed replacing temp checkpoint with checkpoint 89.904 ms, 47.47 s total | |
[ 2023-10-08 00:10:24 ] Completed Epoch: 13 batch 425: moving batch data to device 7.415 ms, 47.48 s total | |
[ 2023-10-08 00:10:24 ] Completed Epoch: 13 batch 425: forward pass 105.981 ms, 47.58 s total | |
[ 2023-10-08 00:10:24 ] Completed Epoch: 13 batch 425: backward pass 59.368 ms, 47.64 s total | |
[ 2023-10-08 00:10:24 ] Completed Epoch: 13 batch 425: computing loss 134.380 ms, 47.78 s total | |
EPOCH: [13], BATCH: [425/889], loss: 0.377, loss_box_reg: 0.116, loss_classifier: 0.095, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 425 | |
[ 2023-10-08 00:10:26 ] Completed saving temp checkpoint 1,943.491 ms, 49.72 s total | |
[ 2023-10-08 00:10:26 ] Completed replacing temp checkpoint with checkpoint 108.407 ms, 49.83 s total | |
[ 2023-10-08 00:10:26 ] Completed Epoch: 13 batch 426: moving batch data to device 9.293 ms, 49.84 s total | |
[ 2023-10-08 00:10:26 ] Completed Epoch: 13 batch 426: forward pass 101.954 ms, 49.94 s total | |
[ 2023-10-08 00:10:26 ] Completed Epoch: 13 batch 426: backward pass 41.970 ms, 49.98 s total | |
[ 2023-10-08 00:10:27 ] Completed Epoch: 13 batch 426: computing loss 142.402 ms, 50.12 s total | |
EPOCH: [13], BATCH: [426/889], loss: 0.382, loss_box_reg: 0.115, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 426 | |
[ 2023-10-08 00:10:28 ] Completed saving temp checkpoint 1,306.116 ms, 51.43 s total | |
[ 2023-10-08 00:10:28 ] Completed replacing temp checkpoint with checkpoint 88.309 ms, 51.52 s total | |
[ 2023-10-08 00:10:28 ] Completed Epoch: 13 batch 427: moving batch data to device 7.107 ms, 51.53 s total | |
[ 2023-10-08 00:10:28 ] Completed Epoch: 13 batch 427: forward pass 104.430 ms, 51.63 s total | |
[ 2023-10-08 00:10:28 ] Completed Epoch: 13 batch 427: backward pass 33.783 ms, 51.66 s total | |
[ 2023-10-08 00:10:28 ] Completed Epoch: 13 batch 427: computing loss 154.129 ms, 51.82 s total | |
EPOCH: [13], BATCH: [427/889], loss: 0.365, loss_box_reg: 0.111, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 427 | |
[ 2023-10-08 00:10:29 ] Completed saving temp checkpoint 1,201.799 ms, 53.02 s total | |
[ 2023-10-08 00:10:29 ] Completed replacing temp checkpoint with checkpoint 63.996 ms, 53.08 s total | |
[ 2023-10-08 00:10:29 ] Completed Epoch: 13 batch 428: moving batch data to device 8.677 ms, 53.09 s total | |
[ 2023-10-08 00:10:30 ] Completed Epoch: 13 batch 428: forward pass 104.601 ms, 53.20 s total | |
[ 2023-10-08 00:10:30 ] Completed Epoch: 13 batch 428: backward pass 71.831 ms, 53.27 s total | |
[ 2023-10-08 00:10:30 ] Completed Epoch: 13 batch 428: computing loss 124.018 ms, 53.39 s total | |
EPOCH: [13], BATCH: [428/889], loss: 0.348, loss_box_reg: 0.107, loss_classifier: 0.082, loss_mask: 0.127, loss_objectness: 0.012, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 428 | |
[ 2023-10-08 00:10:31 ] Completed saving temp checkpoint 954.882 ms, 54.35 s total | |
[ 2023-10-08 00:10:31 ] Completed replacing temp checkpoint with checkpoint 75.514 ms, 54.42 s total | |
[ 2023-10-08 00:10:31 ] Completed Epoch: 13 batch 429: moving batch data to device 7.001 ms, 54.43 s total | |
[ 2023-10-08 00:10:31 ] Completed Epoch: 13 batch 429: forward pass 105.870 ms, 54.54 s total | |
[ 2023-10-08 00:10:31 ] Completed Epoch: 13 batch 429: backward pass 74.038 ms, 54.61 s total | |
[ 2023-10-08 00:10:31 ] Completed Epoch: 13 batch 429: computing loss 120.416 ms, 54.73 s total | |
EPOCH: [13], BATCH: [429/889], loss: 0.395, loss_box_reg: 0.122, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 429 | |
[ 2023-10-08 00:10:32 ] Completed saving temp checkpoint 945.189 ms, 55.68 s total | |
[ 2023-10-08 00:10:32 ] Completed replacing temp checkpoint with checkpoint 72.102 ms, 55.75 s total | |
[ 2023-10-08 00:10:32 ] Completed Epoch: 13 batch 430: moving batch data to device 8.776 ms, 55.76 s total | |
[ 2023-10-08 00:10:32 ] Completed Epoch: 13 batch 430: forward pass 101.526 ms, 55.86 s total | |
[ 2023-10-08 00:10:32 ] Completed Epoch: 13 batch 430: backward pass 76.065 ms, 55.93 s total | |
[ 2023-10-08 00:10:32 ] Completed Epoch: 13 batch 430: computing loss 95.837 ms, 56.03 s total | |
EPOCH: [13], BATCH: [430/889], loss: 0.400, loss_box_reg: 0.115, loss_classifier: 0.109, loss_mask: 0.126, loss_objectness: 0.017, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 430 | |
[ 2023-10-08 00:10:33 ] Completed saving temp checkpoint 951.885 ms, 56.98 s total | |
[ 2023-10-08 00:10:33 ] Completed replacing temp checkpoint with checkpoint 63.036 ms, 57.04 s total | |
[ 2023-10-08 00:10:33 ] Completed Epoch: 13 batch 431: moving batch data to device 7.279 ms, 57.05 s total | |
[ 2023-10-08 00:10:34 ] Completed Epoch: 13 batch 431: forward pass 108.168 ms, 57.16 s total | |
[ 2023-10-08 00:10:34 ] Completed Epoch: 13 batch 431: backward pass 71.093 ms, 57.23 s total | |
[ 2023-10-08 00:10:34 ] Completed Epoch: 13 batch 431: computing loss 123.645 ms, 57.36 s total | |
EPOCH: [13], BATCH: [431/889], loss: 0.412, loss_box_reg: 0.129, loss_classifier: 0.103, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 431 | |
[ 2023-10-08 00:10:35 ] Completed saving temp checkpoint 1,067.476 ms, 58.42 s total | |
[ 2023-10-08 00:10:35 ] Completed replacing temp checkpoint with checkpoint 69.122 ms, 58.49 s total | |
[ 2023-10-08 00:10:35 ] Completed Epoch: 13 batch 432: moving batch data to device 5.125 ms, 58.50 s total | |
[ 2023-10-08 00:10:35 ] Completed Epoch: 13 batch 432: forward pass 110.949 ms, 58.61 s total | |
[ 2023-10-08 00:10:35 ] Completed Epoch: 13 batch 432: backward pass 79.536 ms, 58.69 s total | |
[ 2023-10-08 00:10:35 ] Completed Epoch: 13 batch 432: computing loss 119.756 ms, 58.81 s total | |
EPOCH: [13], BATCH: [432/889], loss: 0.371, loss_box_reg: 0.107, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 432 | |
[ 2023-10-08 00:10:37 ] Completed saving temp checkpoint 1,321.100 ms, 60.13 s total | |
[ 2023-10-08 00:10:37 ] Completed replacing temp checkpoint with checkpoint 68.820 ms, 60.20 s total | |
[ 2023-10-08 00:10:37 ] Completed Epoch: 13 batch 433: moving batch data to device 8.519 ms, 60.21 s total | |
[ 2023-10-08 00:10:37 ] Completed Epoch: 13 batch 433: forward pass 106.738 ms, 60.31 s total | |
[ 2023-10-08 00:10:37 ] Completed Epoch: 13 batch 433: backward pass 49.450 ms, 60.36 s total | |
[ 2023-10-08 00:10:37 ] Completed Epoch: 13 batch 433: computing loss 124.465 ms, 60.49 s total | |
EPOCH: [13], BATCH: [433/889], loss: 0.404, loss_box_reg: 0.117, loss_classifier: 0.104, loss_mask: 0.139, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 433 | |
[ 2023-10-08 00:10:38 ] Completed saving temp checkpoint 1,624.244 ms, 62.11 s total | |
[ 2023-10-08 00:10:39 ] Completed replacing temp checkpoint with checkpoint 81.543 ms, 62.19 s total | |
[ 2023-10-08 00:10:39 ] Completed Epoch: 13 batch 434: moving batch data to device 9.682 ms, 62.20 s total | |
[ 2023-10-08 00:10:39 ] Completed Epoch: 13 batch 434: forward pass 109.819 ms, 62.31 s total | |
[ 2023-10-08 00:10:39 ] Completed Epoch: 13 batch 434: backward pass 79.493 ms, 62.39 s total | |
[ 2023-10-08 00:10:39 ] Completed Epoch: 13 batch 434: computing loss 118.891 ms, 62.51 s total | |
EPOCH: [13], BATCH: [434/889], loss: 0.408, loss_box_reg: 0.128, loss_classifier: 0.104, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 434 | |
[ 2023-10-08 00:10:40 ] Completed saving temp checkpoint 1,449.955 ms, 63.96 s total | |
[ 2023-10-08 00:10:40 ] Completed replacing temp checkpoint with checkpoint 65.405 ms, 64.03 s total | |
[ 2023-10-08 00:10:40 ] Completed Epoch: 13 batch 435: moving batch data to device 6.607 ms, 64.03 s total | |
[ 2023-10-08 00:10:41 ] Completed Epoch: 13 batch 435: forward pass 109.764 ms, 64.14 s total | |
[ 2023-10-08 00:10:41 ] Completed Epoch: 13 batch 435: backward pass 71.658 ms, 64.21 s total | |
[ 2023-10-08 00:10:41 ] Completed Epoch: 13 batch 435: computing loss 120.854 ms, 64.33 s total | |
EPOCH: [13], BATCH: [435/889], loss: 0.385, loss_box_reg: 0.115, loss_classifier: 0.092, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 435 | |
[ 2023-10-08 00:10:42 ] Completed saving temp checkpoint 1,630.021 ms, 65.96 s total | |
[ 2023-10-08 00:10:42 ] Completed replacing temp checkpoint with checkpoint 58.745 ms, 66.02 s total | |
[ 2023-10-08 00:10:42 ] Completed Epoch: 13 batch 436: moving batch data to device 6.784 ms, 66.03 s total | |
[ 2023-10-08 00:10:43 ] Completed Epoch: 13 batch 436: forward pass 101.636 ms, 66.13 s total | |
[ 2023-10-08 00:10:43 ] Completed Epoch: 13 batch 436: backward pass 79.814 ms, 66.21 s total | |
[ 2023-10-08 00:10:43 ] Completed Epoch: 13 batch 436: computing loss 114.576 ms, 66.33 s total | |
EPOCH: [13], BATCH: [436/889], loss: 0.360, loss_box_reg: 0.106, loss_classifier: 0.091, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 436 | |
[ 2023-10-08 00:10:44 ] Completed saving temp checkpoint 1,023.315 ms, 67.35 s total | |
[ 2023-10-08 00:10:44 ] Completed replacing temp checkpoint with checkpoint 56.390 ms, 67.41 s total | |
[ 2023-10-08 00:10:44 ] Completed Epoch: 13 batch 437: moving batch data to device 4.311 ms, 67.41 s total | |
[ 2023-10-08 00:10:44 ] Completed Epoch: 13 batch 437: forward pass 106.662 ms, 67.52 s total | |
[ 2023-10-08 00:10:44 ] Completed Epoch: 13 batch 437: backward pass 34.966 ms, 67.55 s total | |
[ 2023-10-08 00:10:44 ] Completed Epoch: 13 batch 437: computing loss 131.644 ms, 67.68 s total | |
EPOCH: [13], BATCH: [437/889], loss: 0.392, loss_box_reg: 0.117, loss_classifier: 0.095, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 437 | |
[ 2023-10-08 00:10:45 ] Completed saving temp checkpoint 1,135.008 ms, 68.82 s total | |
[ 2023-10-08 00:10:45 ] Completed replacing temp checkpoint with checkpoint 61.065 ms, 68.88 s total | |
[ 2023-10-08 00:10:45 ] Completed Epoch: 13 batch 438: moving batch data to device 7.632 ms, 68.89 s total | |
[ 2023-10-08 00:10:45 ] Completed Epoch: 13 batch 438: forward pass 107.459 ms, 68.99 s total | |
[ 2023-10-08 00:10:45 ] Completed Epoch: 13 batch 438: backward pass 71.343 ms, 69.07 s total | |
[ 2023-10-08 00:10:46 ] Completed Epoch: 13 batch 438: computing loss 105.443 ms, 69.17 s total | |
EPOCH: [13], BATCH: [438/889], loss: 0.398, loss_box_reg: 0.126, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 438 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 00:26:47 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 00:26:47 ] Completed importing Timer 0.026 ms, 0.00 s total | |
[ 2023-10-08 00:26:48 ] Completed importing everything else 561.263 ms, 0.56 s total | |
[ 2023-10-08 00:26:48 ] Completed defined other functions 0.024 ms, 0.56 s total | |
| distributed init (rank 1): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 3): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 00:26:51 ] Completed main preliminaries 3,091.399 ms, 3.65 s total | |
loading annotations into memory... | |
Done (t=10.26s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.27s) | |
creating index... | |
index created! | |
[ 2023-10-08 00:27:03 ] Completed loading data 11,979.255 ms, 15.63 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 00:27:03 ] Completed creating data samplers 93.937 ms, 15.73 s total | |
[ 2023-10-08 00:27:03 ] Completed creating data loaders 0.194 ms, 15.73 s total | |
[ 2023-10-08 00:27:03 ] Completed creating model and .to(device) 647.964 ms, 16.37 s total | |
[ 2023-10-08 00:27:06 ] Completed preparing model for distributed training 2,742.250 ms, 19.12 s total | |
[ 2023-10-08 00:27:06 ] Completed optimizer and scaler 0.568 ms, 19.12 s total | |
[ 2023-10-08 00:27:06 ] Completed learning rate schedulers 0.206 ms, 19.12 s total | |
[ 2023-10-08 00:27:07 ] Completed init coco evaluator 954.332 ms, 20.07 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 00:27:08 ] Completed retrieving checkpoint 897.476 ms, 20.97 s total | |
EPOCH :: 13 | |
[ 2023-10-08 00:27:08 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 00:27:08 ] Completed training preliminaries 0.929 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 438 | |
[ 2023-10-08 00:27:09 ] Completed Epoch: 13 batch 438: moving batch data to device 466.706 ms, 0.47 s total | |
[ 2023-10-08 00:27:10 ] Completed Epoch: 13 batch 438: forward pass 1,012.193 ms, 1.48 s total | |
[ 2023-10-08 00:27:10 ] Completed Epoch: 13 batch 438: backward pass 185.818 ms, 1.67 s total | |
[ 2023-10-08 00:27:10 ] Completed Epoch: 13 batch 438: computing loss 472.224 ms, 2.14 s total | |
EPOCH: [13], BATCH: [438/889], loss: 0.404, loss_box_reg: 0.129, loss_classifier: 0.102, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 438 | |
[ 2023-10-08 00:27:11 ] Completed saving temp checkpoint 1,046.752 ms, 3.18 s total | |
[ 2023-10-08 00:27:11 ] Completed replacing temp checkpoint with checkpoint 146.926 ms, 3.33 s total | |
[ 2023-10-08 00:27:11 ] Completed Epoch: 13 batch 439: moving batch data to device 2.879 ms, 3.33 s total | |
[ 2023-10-08 00:27:11 ] Completed Epoch: 13 batch 439: forward pass 108.444 ms, 3.44 s total | |
[ 2023-10-08 00:27:12 ] Completed Epoch: 13 batch 439: backward pass 81.614 ms, 3.52 s total | |
[ 2023-10-08 00:27:12 ] Completed Epoch: 13 batch 439: computing loss 138.446 ms, 3.66 s total | |
EPOCH: [13], BATCH: [439/889], loss: 0.377, loss_box_reg: 0.115, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 439 | |
[ 2023-10-08 00:27:13 ] Completed saving temp checkpoint 930.346 ms, 4.59 s total | |
[ 2023-10-08 00:27:13 ] Completed replacing temp checkpoint with checkpoint 54.635 ms, 4.65 s total | |
[ 2023-10-08 00:27:13 ] Completed Epoch: 13 batch 440: moving batch data to device 4.822 ms, 4.65 s total | |
[ 2023-10-08 00:27:13 ] Completed Epoch: 13 batch 440: forward pass 110.622 ms, 4.76 s total | |
[ 2023-10-08 00:27:13 ] Completed Epoch: 13 batch 440: backward pass 89.442 ms, 4.85 s total | |
[ 2023-10-08 00:27:13 ] Completed Epoch: 13 batch 440: computing loss 127.149 ms, 4.98 s total | |
EPOCH: [13], BATCH: [440/889], loss: 0.368, loss_box_reg: 0.109, loss_classifier: 0.097, loss_mask: 0.121, loss_objectness: 0.017, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 440 | |
[ 2023-10-08 00:27:14 ] Completed saving temp checkpoint 1,006.189 ms, 5.99 s total | |
[ 2023-10-08 00:27:14 ] Completed replacing temp checkpoint with checkpoint 62.406 ms, 6.05 s total | |
[ 2023-10-08 00:27:14 ] Completed Epoch: 13 batch 441: moving batch data to device 60.081 ms, 6.11 s total | |
[ 2023-10-08 00:27:14 ] Completed Epoch: 13 batch 441: forward pass 106.278 ms, 6.21 s total | |
[ 2023-10-08 00:27:14 ] Completed Epoch: 13 batch 441: backward pass 39.206 ms, 6.25 s total | |
[ 2023-10-08 00:27:14 ] Completed Epoch: 13 batch 441: computing loss 170.487 ms, 6.42 s total | |
EPOCH: [13], BATCH: [441/889], loss: 0.381, loss_box_reg: 0.113, loss_classifier: 0.092, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.035 | |
Saving checkpoint at epoch 13 train batch 441 | |
[ 2023-10-08 00:27:15 ] Completed saving temp checkpoint 983.290 ms, 7.41 s total | |
[ 2023-10-08 00:27:16 ] Completed replacing temp checkpoint with checkpoint 67.286 ms, 7.48 s total | |
[ 2023-10-08 00:27:16 ] Completed Epoch: 13 batch 442: moving batch data to device 4.200 ms, 7.48 s total | |
[ 2023-10-08 00:27:16 ] Completed Epoch: 13 batch 442: forward pass 111.924 ms, 7.59 s total | |
[ 2023-10-08 00:27:16 ] Completed Epoch: 13 batch 442: backward pass 80.811 ms, 7.67 s total | |
[ 2023-10-08 00:27:16 ] Completed Epoch: 13 batch 442: computing loss 128.474 ms, 7.80 s total | |
EPOCH: [13], BATCH: [442/889], loss: 0.407, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 442 | |
[ 2023-10-08 00:27:17 ] Completed saving temp checkpoint 1,034.166 ms, 8.83 s total | |
[ 2023-10-08 00:27:17 ] Completed replacing temp checkpoint with checkpoint 71.574 ms, 8.91 s total | |
[ 2023-10-08 00:27:17 ] Completed Epoch: 13 batch 443: moving batch data to device 3.767 ms, 8.91 s total | |
[ 2023-10-08 00:27:17 ] Completed Epoch: 13 batch 443: forward pass 107.761 ms, 9.02 s total | |
[ 2023-10-08 00:27:17 ] Completed Epoch: 13 batch 443: backward pass 73.438 ms, 9.09 s total | |
[ 2023-10-08 00:27:17 ] Completed Epoch: 13 batch 443: computing loss 138.865 ms, 9.23 s total | |
EPOCH: [13], BATCH: [443/889], loss: 0.396, loss_box_reg: 0.119, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 443 | |
[ 2023-10-08 00:27:18 ] Completed saving temp checkpoint 908.331 ms, 10.14 s total | |
[ 2023-10-08 00:27:18 ] Completed replacing temp checkpoint with checkpoint 45.678 ms, 10.18 s total | |
[ 2023-10-08 00:27:18 ] Completed Epoch: 13 batch 444: moving batch data to device 3.848 ms, 10.19 s total | |
[ 2023-10-08 00:27:18 ] Completed Epoch: 13 batch 444: forward pass 185.089 ms, 10.37 s total | |
[ 2023-10-08 00:27:18 ] Completed Epoch: 13 batch 444: backward pass 54.344 ms, 10.43 s total | |
[ 2023-10-08 00:27:19 ] Completed Epoch: 13 batch 444: computing loss 251.128 ms, 10.68 s total | |
EPOCH: [13], BATCH: [444/889], loss: 0.410, loss_box_reg: 0.126, loss_classifier: 0.103, loss_mask: 0.135, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 444 | |
[ 2023-10-08 00:27:20 ] Completed saving temp checkpoint 1,024.248 ms, 11.70 s total | |
[ 2023-10-08 00:27:20 ] Completed replacing temp checkpoint with checkpoint 72.850 ms, 11.78 s total | |
[ 2023-10-08 00:27:20 ] Completed Epoch: 13 batch 445: moving batch data to device 7.246 ms, 11.78 s total | |
[ 2023-10-08 00:27:20 ] Completed Epoch: 13 batch 445: forward pass 108.208 ms, 11.89 s total | |
[ 2023-10-08 00:27:20 ] Completed Epoch: 13 batch 445: backward pass 74.781 ms, 11.97 s total | |
[ 2023-10-08 00:27:20 ] Completed Epoch: 13 batch 445: computing loss 117.956 ms, 12.08 s total | |
EPOCH: [13], BATCH: [445/889], loss: 0.382, loss_box_reg: 0.110, loss_classifier: 0.096, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 445 | |
[ 2023-10-08 00:27:21 ] Completed saving temp checkpoint 794.574 ms, 12.88 s total | |
[ 2023-10-08 00:27:21 ] Completed replacing temp checkpoint with checkpoint 60.623 ms, 12.94 s total | |
[ 2023-10-08 00:27:21 ] Completed Epoch: 13 batch 446: moving batch data to device 4.231 ms, 12.94 s total | |
[ 2023-10-08 00:27:21 ] Completed Epoch: 13 batch 446: forward pass 114.677 ms, 13.06 s total | |
[ 2023-10-08 00:27:21 ] Completed Epoch: 13 batch 446: backward pass 74.685 ms, 13.13 s total | |
[ 2023-10-08 00:27:21 ] Completed Epoch: 13 batch 446: computing loss 120.701 ms, 13.25 s total | |
EPOCH: [13], BATCH: [446/889], loss: 0.391, loss_box_reg: 0.119, loss_classifier: 0.096, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 446 | |
[ 2023-10-08 00:27:23 ] Completed saving temp checkpoint 1,246.891 ms, 14.50 s total | |
[ 2023-10-08 00:27:23 ] Completed replacing temp checkpoint with checkpoint 69.354 ms, 14.57 s total | |
[ 2023-10-08 00:27:23 ] Completed Epoch: 13 batch 447: moving batch data to device 6.937 ms, 14.58 s total | |
[ 2023-10-08 00:27:23 ] Completed Epoch: 13 batch 447: forward pass 106.765 ms, 14.68 s total | |
[ 2023-10-08 00:27:23 ] Completed Epoch: 13 batch 447: backward pass 39.750 ms, 14.72 s total | |
[ 2023-10-08 00:27:23 ] Completed Epoch: 13 batch 447: computing loss 150.955 ms, 14.87 s total | |
EPOCH: [13], BATCH: [447/889], loss: 0.382, loss_box_reg: 0.109, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 447 | |
[ 2023-10-08 00:27:24 ] Completed saving temp checkpoint 894.944 ms, 15.77 s total | |
[ 2023-10-08 00:27:24 ] Completed replacing temp checkpoint with checkpoint 56.535 ms, 15.83 s total | |
[ 2023-10-08 00:27:24 ] Completed Epoch: 13 batch 448: moving batch data to device 4.997 ms, 15.83 s total | |
[ 2023-10-08 00:27:24 ] Completed Epoch: 13 batch 448: forward pass 105.805 ms, 15.94 s total | |
[ 2023-10-08 00:27:24 ] Completed Epoch: 13 batch 448: backward pass 80.304 ms, 16.02 s total | |
[ 2023-10-08 00:27:24 ] Completed Epoch: 13 batch 448: computing loss 109.684 ms, 16.13 s total | |
EPOCH: [13], BATCH: [448/889], loss: 0.390, loss_box_reg: 0.112, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 448 | |
[ 2023-10-08 00:27:26 ] Completed saving temp checkpoint 1,769.599 ms, 17.90 s total | |
[ 2023-10-08 00:27:26 ] Completed replacing temp checkpoint with checkpoint 72.178 ms, 17.97 s total | |
[ 2023-10-08 00:27:26 ] Completed Epoch: 13 batch 449: moving batch data to device 5.390 ms, 17.97 s total | |
[ 2023-10-08 00:27:26 ] Completed Epoch: 13 batch 449: forward pass 106.190 ms, 18.08 s total | |
[ 2023-10-08 00:27:26 ] Completed Epoch: 13 batch 449: backward pass 73.987 ms, 18.15 s total | |
[ 2023-10-08 00:27:26 ] Completed Epoch: 13 batch 449: computing loss 115.214 ms, 18.27 s total | |
EPOCH: [13], BATCH: [449/889], loss: 0.406, loss_box_reg: 0.121, loss_classifier: 0.102, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 449 | |
[ 2023-10-08 00:27:27 ] Completed saving temp checkpoint 1,042.814 ms, 19.31 s total | |
[ 2023-10-08 00:27:27 ] Completed replacing temp checkpoint with checkpoint 48.396 ms, 19.36 s total | |
[ 2023-10-08 00:27:27 ] Completed Epoch: 13 batch 450: moving batch data to device 4.678 ms, 19.36 s total | |
[ 2023-10-08 00:27:28 ] Completed Epoch: 13 batch 450: forward pass 107.892 ms, 19.47 s total | |
[ 2023-10-08 00:27:28 ] Completed Epoch: 13 batch 450: backward pass 50.137 ms, 19.52 s total | |
[ 2023-10-08 00:27:28 ] Completed Epoch: 13 batch 450: computing loss 143.301 ms, 19.67 s total | |
EPOCH: [13], BATCH: [450/889], loss: 0.412, loss_box_reg: 0.126, loss_classifier: 0.106, loss_mask: 0.132, loss_objectness: 0.020, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 450 | |
[ 2023-10-08 00:27:30 ] Completed saving temp checkpoint 2,050.121 ms, 21.72 s total | |
[ 2023-10-08 00:27:30 ] Completed replacing temp checkpoint with checkpoint 64.998 ms, 21.78 s total | |
[ 2023-10-08 00:27:30 ] Completed Epoch: 13 batch 451: moving batch data to device 4.809 ms, 21.79 s total | |
[ 2023-10-08 00:27:30 ] Completed Epoch: 13 batch 451: forward pass 102.926 ms, 21.89 s total | |
[ 2023-10-08 00:27:30 ] Completed Epoch: 13 batch 451: backward pass 81.644 ms, 21.97 s total | |
[ 2023-10-08 00:27:30 ] Completed Epoch: 13 batch 451: computing loss 115.423 ms, 22.09 s total | |
EPOCH: [13], BATCH: [451/889], loss: 0.418, loss_box_reg: 0.125, loss_classifier: 0.105, loss_mask: 0.136, loss_objectness: 0.019, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 451 | |
[ 2023-10-08 00:27:31 ] Completed saving temp checkpoint 965.228 ms, 23.05 s total | |
[ 2023-10-08 00:27:31 ] Completed replacing temp checkpoint with checkpoint 68.568 ms, 23.12 s total | |
[ 2023-10-08 00:27:31 ] Completed Epoch: 13 batch 452: moving batch data to device 7.143 ms, 23.13 s total | |
[ 2023-10-08 00:27:31 ] Completed Epoch: 13 batch 452: forward pass 103.507 ms, 23.23 s total | |
[ 2023-10-08 00:27:31 ] Completed Epoch: 13 batch 452: backward pass 51.675 ms, 23.28 s total | |
[ 2023-10-08 00:27:31 ] Completed Epoch: 13 batch 452: computing loss 149.339 ms, 23.43 s total | |
EPOCH: [13], BATCH: [452/889], loss: 0.370, loss_box_reg: 0.109, loss_classifier: 0.090, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 452 | |
[ 2023-10-08 00:27:33 ] Completed saving temp checkpoint 1,104.294 ms, 24.54 s total | |
[ 2023-10-08 00:27:33 ] Completed replacing temp checkpoint with checkpoint 73.873 ms, 24.61 s total | |
[ 2023-10-08 00:27:33 ] Completed Epoch: 13 batch 453: moving batch data to device 7.403 ms, 24.62 s total | |
[ 2023-10-08 00:27:33 ] Completed Epoch: 13 batch 453: forward pass 107.599 ms, 24.72 s total | |
[ 2023-10-08 00:27:33 ] Completed Epoch: 13 batch 453: backward pass 71.845 ms, 24.80 s total | |
[ 2023-10-08 00:27:33 ] Completed Epoch: 13 batch 453: computing loss 117.709 ms, 24.91 s total | |
EPOCH: [13], BATCH: [453/889], loss: 0.393, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 453 | |
[ 2023-10-08 00:27:34 ] Completed saving temp checkpoint 948.605 ms, 25.86 s total | |
[ 2023-10-08 00:27:34 ] Completed replacing temp checkpoint with checkpoint 67.362 ms, 25.93 s total | |
[ 2023-10-08 00:27:34 ] Completed Epoch: 13 batch 454: moving batch data to device 6.855 ms, 25.94 s total | |
[ 2023-10-08 00:27:34 ] Completed Epoch: 13 batch 454: forward pass 109.587 ms, 26.05 s total | |
[ 2023-10-08 00:27:34 ] Completed Epoch: 13 batch 454: backward pass 49.519 ms, 26.10 s total | |
[ 2023-10-08 00:27:34 ] Completed Epoch: 13 batch 454: computing loss 142.130 ms, 26.24 s total | |
EPOCH: [13], BATCH: [454/889], loss: 0.381, loss_box_reg: 0.117, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 454 | |
[ 2023-10-08 00:27:35 ] Completed saving temp checkpoint 1,201.299 ms, 27.44 s total | |
[ 2023-10-08 00:27:36 ] Completed replacing temp checkpoint with checkpoint 74.214 ms, 27.51 s total | |
[ 2023-10-08 00:27:36 ] Completed Epoch: 13 batch 455: moving batch data to device 8.654 ms, 27.52 s total | |
[ 2023-10-08 00:27:36 ] Completed Epoch: 13 batch 455: forward pass 106.475 ms, 27.63 s total | |
[ 2023-10-08 00:27:36 ] Completed Epoch: 13 batch 455: backward pass 75.284 ms, 27.70 s total | |
[ 2023-10-08 00:27:36 ] Completed Epoch: 13 batch 455: computing loss 119.655 ms, 27.82 s total | |
EPOCH: [13], BATCH: [455/889], loss: 0.391, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 455 | |
[ 2023-10-08 00:27:37 ] Completed saving temp checkpoint 1,010.907 ms, 28.83 s total | |
[ 2023-10-08 00:27:37 ] Completed replacing temp checkpoint with checkpoint 72.131 ms, 28.91 s total | |
[ 2023-10-08 00:27:37 ] Completed Epoch: 13 batch 456: moving batch data to device 8.182 ms, 28.92 s total | |
[ 2023-10-08 00:27:37 ] Completed Epoch: 13 batch 456: forward pass 100.909 ms, 29.02 s total | |
[ 2023-10-08 00:27:37 ] Completed Epoch: 13 batch 456: backward pass 49.347 ms, 29.07 s total | |
[ 2023-10-08 00:27:37 ] Completed Epoch: 13 batch 456: computing loss 142.818 ms, 29.21 s total | |
EPOCH: [13], BATCH: [456/889], loss: 0.364, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 456 | |
[ 2023-10-08 00:27:38 ] Completed saving temp checkpoint 1,039.352 ms, 30.25 s total | |
[ 2023-10-08 00:27:38 ] Completed replacing temp checkpoint with checkpoint 48.820 ms, 30.30 s total | |
[ 2023-10-08 00:27:38 ] Completed Epoch: 13 batch 457: moving batch data to device 7.210 ms, 30.30 s total | |
[ 2023-10-08 00:27:38 ] Completed Epoch: 13 batch 457: forward pass 104.599 ms, 30.41 s total | |
[ 2023-10-08 00:27:38 ] Completed Epoch: 13 batch 457: backward pass 38.459 ms, 30.45 s total | |
[ 2023-10-08 00:27:39 ] Completed Epoch: 13 batch 457: computing loss 157.175 ms, 30.60 s total | |
EPOCH: [13], BATCH: [457/889], loss: 0.380, loss_box_reg: 0.111, loss_classifier: 0.100, loss_mask: 0.124, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 457 | |
[ 2023-10-08 00:27:40 ] Completed saving temp checkpoint 978.910 ms, 31.58 s total | |
[ 2023-10-08 00:27:40 ] Completed replacing temp checkpoint with checkpoint 50.745 ms, 31.63 s total | |
[ 2023-10-08 00:27:40 ] Completed Epoch: 13 batch 458: moving batch data to device 5.231 ms, 31.64 s total | |
[ 2023-10-08 00:27:40 ] Completed Epoch: 13 batch 458: forward pass 103.627 ms, 31.74 s total | |
[ 2023-10-08 00:27:40 ] Completed Epoch: 13 batch 458: backward pass 34.002 ms, 31.78 s total | |
[ 2023-10-08 00:27:40 ] Completed Epoch: 13 batch 458: computing loss 157.623 ms, 31.93 s total | |
EPOCH: [13], BATCH: [458/889], loss: 0.378, loss_box_reg: 0.118, loss_classifier: 0.093, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 458 | |
[ 2023-10-08 00:27:41 ] Completed saving temp checkpoint 1,209.694 ms, 33.14 s total | |
[ 2023-10-08 00:27:41 ] Completed replacing temp checkpoint with checkpoint 72.741 ms, 33.22 s total | |
[ 2023-10-08 00:27:41 ] Completed Epoch: 13 batch 459: moving batch data to device 7.548 ms, 33.22 s total | |
[ 2023-10-08 00:27:41 ] Completed Epoch: 13 batch 459: forward pass 108.812 ms, 33.33 s total | |
[ 2023-10-08 00:27:41 ] Completed Epoch: 13 batch 459: backward pass 81.325 ms, 33.41 s total | |
[ 2023-10-08 00:27:42 ] Completed Epoch: 13 batch 459: computing loss 107.860 ms, 33.52 s total | |
EPOCH: [13], BATCH: [459/889], loss: 0.412, loss_box_reg: 0.122, loss_classifier: 0.101, loss_mask: 0.139, loss_objectness: 0.016, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 459 | |
[ 2023-10-08 00:27:43 ] Completed saving temp checkpoint 1,216.412 ms, 34.74 s total | |
[ 2023-10-08 00:27:43 ] Completed replacing temp checkpoint with checkpoint 65.863 ms, 34.80 s total | |
[ 2023-10-08 00:27:43 ] Completed Epoch: 13 batch 460: moving batch data to device 4.960 ms, 34.81 s total | |
[ 2023-10-08 00:27:43 ] Completed Epoch: 13 batch 460: forward pass 105.510 ms, 34.91 s total | |
[ 2023-10-08 00:27:43 ] Completed Epoch: 13 batch 460: backward pass 45.692 ms, 34.96 s total | |
[ 2023-10-08 00:27:43 ] Completed Epoch: 13 batch 460: computing loss 143.720 ms, 35.10 s total | |
EPOCH: [13], BATCH: [460/889], loss: 0.346, loss_box_reg: 0.102, loss_classifier: 0.083, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 460 | |
[ 2023-10-08 00:27:45 ] Completed saving temp checkpoint 1,617.656 ms, 36.72 s total | |
[ 2023-10-08 00:27:45 ] Completed replacing temp checkpoint with checkpoint 89.987 ms, 36.81 s total | |
[ 2023-10-08 00:27:45 ] Completed Epoch: 13 batch 461: moving batch data to device 8.021 ms, 36.82 s total | |
[ 2023-10-08 00:27:45 ] Completed Epoch: 13 batch 461: forward pass 104.611 ms, 36.92 s total | |
[ 2023-10-08 00:27:45 ] Completed Epoch: 13 batch 461: backward pass 42.164 ms, 36.97 s total | |
[ 2023-10-08 00:27:45 ] Completed Epoch: 13 batch 461: computing loss 141.510 ms, 37.11 s total | |
EPOCH: [13], BATCH: [461/889], loss: 0.407, loss_box_reg: 0.124, loss_classifier: 0.107, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 461 | |
[ 2023-10-08 00:27:46 ] Completed saving temp checkpoint 993.900 ms, 38.10 s total | |
[ 2023-10-08 00:27:46 ] Completed replacing temp checkpoint with checkpoint 71.706 ms, 38.17 s total | |
[ 2023-10-08 00:27:46 ] Completed Epoch: 13 batch 462: moving batch data to device 6.764 ms, 38.18 s total | |
[ 2023-10-08 00:27:46 ] Completed Epoch: 13 batch 462: forward pass 116.752 ms, 38.30 s total | |
[ 2023-10-08 00:27:46 ] Completed Epoch: 13 batch 462: backward pass 84.333 ms, 38.38 s total | |
[ 2023-10-08 00:27:47 ] Completed Epoch: 13 batch 462: computing loss 103.025 ms, 38.48 s total | |
EPOCH: [13], BATCH: [462/889], loss: 0.382, loss_box_reg: 0.114, loss_classifier: 0.101, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 462 | |
[ 2023-10-08 00:27:48 ] Completed saving temp checkpoint 1,728.020 ms, 40.21 s total | |
[ 2023-10-08 00:27:48 ] Completed replacing temp checkpoint with checkpoint 86.761 ms, 40.30 s total | |
[ 2023-10-08 00:27:48 ] Completed Epoch: 13 batch 463: moving batch data to device 9.055 ms, 40.31 s total | |
[ 2023-10-08 00:27:48 ] Completed Epoch: 13 batch 463: forward pass 105.415 ms, 40.41 s total | |
[ 2023-10-08 00:27:49 ] Completed Epoch: 13 batch 463: backward pass 81.602 ms, 40.50 s total | |
[ 2023-10-08 00:27:49 ] Completed Epoch: 13 batch 463: computing loss 108.276 ms, 40.60 s total | |
EPOCH: [13], BATCH: [463/889], loss: 0.415, loss_box_reg: 0.129, loss_classifier: 0.108, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 463 | |
[ 2023-10-08 00:27:50 ] Completed saving temp checkpoint 1,031.141 ms, 41.63 s total | |
[ 2023-10-08 00:27:50 ] Completed replacing temp checkpoint with checkpoint 70.420 ms, 41.71 s total | |
[ 2023-10-08 00:27:50 ] Completed Epoch: 13 batch 464: moving batch data to device 8.894 ms, 41.71 s total | |
[ 2023-10-08 00:27:50 ] Completed Epoch: 13 batch 464: forward pass 108.671 ms, 41.82 s total | |
[ 2023-10-08 00:27:50 ] Completed Epoch: 13 batch 464: backward pass 74.876 ms, 41.90 s total | |
[ 2023-10-08 00:27:50 ] Completed Epoch: 13 batch 464: computing loss 119.630 ms, 42.02 s total | |
EPOCH: [13], BATCH: [464/889], loss: 0.385, loss_box_reg: 0.111, loss_classifier: 0.093, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 464 | |
[ 2023-10-08 00:27:51 ] Completed saving temp checkpoint 1,139.162 ms, 43.16 s total | |
[ 2023-10-08 00:27:51 ] Completed replacing temp checkpoint with checkpoint 63.691 ms, 43.22 s total | |
[ 2023-10-08 00:27:51 ] Completed Epoch: 13 batch 465: moving batch data to device 7.099 ms, 43.23 s total | |
[ 2023-10-08 00:27:51 ] Completed Epoch: 13 batch 465: forward pass 106.853 ms, 43.33 s total | |
[ 2023-10-08 00:27:51 ] Completed Epoch: 13 batch 465: backward pass 34.916 ms, 43.37 s total | |
[ 2023-10-08 00:27:52 ] Completed Epoch: 13 batch 465: computing loss 370.792 ms, 43.74 s total | |
EPOCH: [13], BATCH: [465/889], loss: 0.384, loss_box_reg: 0.112, loss_classifier: 0.099, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 465 | |
[ 2023-10-08 00:27:53 ] Completed saving temp checkpoint 1,049.448 ms, 44.79 s total | |
[ 2023-10-08 00:27:53 ] Completed replacing temp checkpoint with checkpoint 72.676 ms, 44.86 s total | |
[ 2023-10-08 00:27:53 ] Completed Epoch: 13 batch 466: moving batch data to device 8.762 ms, 44.87 s total | |
[ 2023-10-08 00:27:53 ] Completed Epoch: 13 batch 466: forward pass 106.274 ms, 44.98 s total | |
[ 2023-10-08 00:27:53 ] Completed Epoch: 13 batch 466: backward pass 41.865 ms, 45.02 s total | |
[ 2023-10-08 00:27:53 ] Completed Epoch: 13 batch 466: computing loss 152.258 ms, 45.17 s total | |
EPOCH: [13], BATCH: [466/889], loss: 0.400, loss_box_reg: 0.116, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 466 | |
[ 2023-10-08 00:27:54 ] Completed saving temp checkpoint 1,093.764 ms, 46.26 s total | |
[ 2023-10-08 00:27:54 ] Completed replacing temp checkpoint with checkpoint 60.484 ms, 46.33 s total | |
[ 2023-10-08 00:27:54 ] Completed Epoch: 13 batch 467: moving batch data to device 6.267 ms, 46.33 s total | |
[ 2023-10-08 00:27:54 ] Completed Epoch: 13 batch 467: forward pass 104.425 ms, 46.44 s total | |
[ 2023-10-08 00:27:55 ] Completed Epoch: 13 batch 467: backward pass 79.081 ms, 46.51 s total | |
[ 2023-10-08 00:27:55 ] Completed Epoch: 13 batch 467: computing loss 113.438 ms, 46.63 s total | |
EPOCH: [13], BATCH: [467/889], loss: 0.354, loss_box_reg: 0.106, loss_classifier: 0.086, loss_mask: 0.123, loss_objectness: 0.017, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 467 | |
[ 2023-10-08 00:27:56 ] Completed saving temp checkpoint 1,087.961 ms, 47.72 s total | |
[ 2023-10-08 00:27:56 ] Completed replacing temp checkpoint with checkpoint 73.579 ms, 47.79 s total | |
[ 2023-10-08 00:27:56 ] Completed Epoch: 13 batch 468: moving batch data to device 8.476 ms, 47.80 s total | |
[ 2023-10-08 00:27:56 ] Completed Epoch: 13 batch 468: forward pass 103.971 ms, 47.90 s total | |
[ 2023-10-08 00:27:56 ] Completed Epoch: 13 batch 468: backward pass 73.076 ms, 47.98 s total | |
[ 2023-10-08 00:27:56 ] Completed Epoch: 13 batch 468: computing loss 120.382 ms, 48.10 s total | |
EPOCH: [13], BATCH: [468/889], loss: 0.400, loss_box_reg: 0.119, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.019, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 468 | |
[ 2023-10-08 00:27:57 ] Completed saving temp checkpoint 1,105.642 ms, 49.20 s total | |
[ 2023-10-08 00:27:57 ] Completed replacing temp checkpoint with checkpoint 48.036 ms, 49.25 s total | |
[ 2023-10-08 00:27:57 ] Completed Epoch: 13 batch 469: moving batch data to device 7.256 ms, 49.26 s total | |
[ 2023-10-08 00:27:57 ] Completed Epoch: 13 batch 469: forward pass 103.210 ms, 49.36 s total | |
[ 2023-10-08 00:27:57 ] Completed Epoch: 13 batch 469: backward pass 70.191 ms, 49.43 s total | |
[ 2023-10-08 00:27:58 ] Completed Epoch: 13 batch 469: computing loss 123.173 ms, 49.55 s total | |
EPOCH: [13], BATCH: [469/889], loss: 0.352, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.019 | |
Saving checkpoint at epoch 13 train batch 469 | |
[ 2023-10-08 00:27:59 ] Completed saving temp checkpoint 1,032.172 ms, 50.59 s total | |
[ 2023-10-08 00:27:59 ] Completed replacing temp checkpoint with checkpoint 46.422 ms, 50.63 s total | |
[ 2023-10-08 00:27:59 ] Completed Epoch: 13 batch 470: moving batch data to device 8.820 ms, 50.64 s total | |
[ 2023-10-08 00:27:59 ] Completed Epoch: 13 batch 470: forward pass 104.609 ms, 50.75 s total | |
[ 2023-10-08 00:27:59 ] Completed Epoch: 13 batch 470: backward pass 82.510 ms, 50.83 s total | |
[ 2023-10-08 00:27:59 ] Completed Epoch: 13 batch 470: computing loss 88.752 ms, 50.92 s total | |
EPOCH: [13], BATCH: [470/889], loss: 0.374, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 470 | |
[ 2023-10-08 00:28:00 ] Completed saving temp checkpoint 1,137.938 ms, 52.05 s total | |
[ 2023-10-08 00:28:00 ] Completed replacing temp checkpoint with checkpoint 60.209 ms, 52.11 s total | |
[ 2023-10-08 00:28:00 ] Completed Epoch: 13 batch 471: moving batch data to device 5.849 ms, 52.12 s total | |
[ 2023-10-08 00:28:00 ] Completed Epoch: 13 batch 471: forward pass 99.723 ms, 52.22 s total | |
[ 2023-10-08 00:28:00 ] Completed Epoch: 13 batch 471: backward pass 77.125 ms, 52.30 s total | |
[ 2023-10-08 00:28:00 ] Completed Epoch: 13 batch 471: computing loss 116.637 ms, 52.41 s total | |
EPOCH: [13], BATCH: [471/889], loss: 0.371, loss_box_reg: 0.113, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 471 | |
[ 2023-10-08 00:28:02 ] Completed saving temp checkpoint 1,306.294 ms, 53.72 s total | |
[ 2023-10-08 00:28:02 ] Completed replacing temp checkpoint with checkpoint 81.563 ms, 53.80 s total | |
[ 2023-10-08 00:28:02 ] Completed Epoch: 13 batch 472: moving batch data to device 7.002 ms, 53.81 s total | |
[ 2023-10-08 00:28:02 ] Completed Epoch: 13 batch 472: forward pass 108.315 ms, 53.92 s total | |
[ 2023-10-08 00:28:02 ] Completed Epoch: 13 batch 472: backward pass 81.526 ms, 54.00 s total | |
[ 2023-10-08 00:28:02 ] Completed Epoch: 13 batch 472: computing loss 89.539 ms, 54.09 s total | |
EPOCH: [13], BATCH: [472/889], loss: 0.401, loss_box_reg: 0.120, loss_classifier: 0.108, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 472 | |
[ 2023-10-08 00:28:04 ] Completed saving temp checkpoint 1,516.185 ms, 55.60 s total | |
[ 2023-10-08 00:28:04 ] Completed replacing temp checkpoint with checkpoint 89.486 ms, 55.69 s total | |
[ 2023-10-08 00:28:04 ] Completed Epoch: 13 batch 473: moving batch data to device 6.292 ms, 55.70 s total | |
[ 2023-10-08 00:28:04 ] Completed Epoch: 13 batch 473: forward pass 109.600 ms, 55.81 s total | |
[ 2023-10-08 00:28:04 ] Completed Epoch: 13 batch 473: backward pass 73.331 ms, 55.88 s total | |
[ 2023-10-08 00:28:04 ] Completed Epoch: 13 batch 473: computing loss 107.421 ms, 55.99 s total | |
EPOCH: [13], BATCH: [473/889], loss: 0.364, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 473 | |
[ 2023-10-08 00:28:05 ] Completed saving temp checkpoint 1,373.161 ms, 57.36 s total | |
[ 2023-10-08 00:28:05 ] Completed replacing temp checkpoint with checkpoint 66.297 ms, 57.43 s total | |
[ 2023-10-08 00:28:05 ] Completed Epoch: 13 batch 474: moving batch data to device 7.843 ms, 57.44 s total | |
[ 2023-10-08 00:28:06 ] Completed Epoch: 13 batch 474: forward pass 106.236 ms, 57.54 s total | |
[ 2023-10-08 00:28:06 ] Completed Epoch: 13 batch 474: backward pass 69.873 ms, 57.61 s total | |
[ 2023-10-08 00:28:06 ] Completed Epoch: 13 batch 474: computing loss 109.866 ms, 57.72 s total | |
EPOCH: [13], BATCH: [474/889], loss: 0.388, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.128, loss_objectness: 0.018, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 474 | |
[ 2023-10-08 00:28:07 ] Completed saving temp checkpoint 1,647.687 ms, 59.37 s total | |
[ 2023-10-08 00:28:07 ] Completed replacing temp checkpoint with checkpoint 44.047 ms, 59.42 s total | |
[ 2023-10-08 00:28:07 ] Completed Epoch: 13 batch 475: moving batch data to device 6.069 ms, 59.42 s total | |
[ 2023-10-08 00:28:08 ] Completed Epoch: 13 batch 475: forward pass 109.540 ms, 59.53 s total | |
[ 2023-10-08 00:28:08 ] Completed Epoch: 13 batch 475: backward pass 77.556 ms, 59.61 s total | |
[ 2023-10-08 00:28:08 ] Completed Epoch: 13 batch 475: computing loss 110.889 ms, 59.72 s total | |
EPOCH: [13], BATCH: [475/889], loss: 0.406, loss_box_reg: 0.127, loss_classifier: 0.104, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 475 | |
[ 2023-10-08 00:28:09 ] Completed saving temp checkpoint 1,068.043 ms, 60.79 s total | |
[ 2023-10-08 00:28:09 ] Completed replacing temp checkpoint with checkpoint 69.553 ms, 60.86 s total | |
[ 2023-10-08 00:28:09 ] Completed Epoch: 13 batch 476: moving batch data to device 10.060 ms, 60.87 s total | |
[ 2023-10-08 00:28:09 ] Completed Epoch: 13 batch 476: forward pass 104.652 ms, 60.97 s total | |
[ 2023-10-08 00:28:09 ] Completed Epoch: 13 batch 476: backward pass 76.827 ms, 61.05 s total | |
[ 2023-10-08 00:28:09 ] Completed Epoch: 13 batch 476: computing loss 109.068 ms, 61.16 s total | |
EPOCH: [13], BATCH: [476/889], loss: 0.400, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 476 | |
[ 2023-10-08 00:28:10 ] Completed saving temp checkpoint 1,149.827 ms, 62.31 s total | |
[ 2023-10-08 00:28:10 ] Completed replacing temp checkpoint with checkpoint 71.717 ms, 62.38 s total | |
[ 2023-10-08 00:28:10 ] Completed Epoch: 13 batch 477: moving batch data to device 9.432 ms, 62.39 s total | |
[ 2023-10-08 00:28:11 ] Completed Epoch: 13 batch 477: forward pass 108.758 ms, 62.50 s total | |
[ 2023-10-08 00:28:11 ] Completed Epoch: 13 batch 477: backward pass 36.685 ms, 62.53 s total | |
[ 2023-10-08 00:28:11 ] Completed Epoch: 13 batch 477: computing loss 160.386 ms, 62.69 s total | |
EPOCH: [13], BATCH: [477/889], loss: 0.400, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 477 | |
[ 2023-10-08 00:28:12 ] Completed saving temp checkpoint 971.962 ms, 63.67 s total | |
[ 2023-10-08 00:28:12 ] Completed replacing temp checkpoint with checkpoint 68.238 ms, 63.73 s total | |
[ 2023-10-08 00:28:12 ] Completed Epoch: 13 batch 478: moving batch data to device 10.215 ms, 63.75 s total | |
[ 2023-10-08 00:28:12 ] Completed Epoch: 13 batch 478: forward pass 104.188 ms, 63.85 s total | |
[ 2023-10-08 00:28:12 ] Completed Epoch: 13 batch 478: backward pass 56.562 ms, 63.91 s total | |
[ 2023-10-08 00:28:12 ] Completed Epoch: 13 batch 478: computing loss 138.515 ms, 64.04 s total | |
EPOCH: [13], BATCH: [478/889], loss: 0.396, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.131, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 478 | |
[ 2023-10-08 00:28:13 ] Completed saving temp checkpoint 1,137.559 ms, 65.18 s total | |
[ 2023-10-08 00:28:13 ] Completed replacing temp checkpoint with checkpoint 84.973 ms, 65.27 s total | |
[ 2023-10-08 00:28:13 ] Completed Epoch: 13 batch 479: moving batch data to device 6.846 ms, 65.27 s total | |
[ 2023-10-08 00:28:13 ] Completed Epoch: 13 batch 479: forward pass 102.797 ms, 65.38 s total | |
[ 2023-10-08 00:28:13 ] Completed Epoch: 13 batch 479: backward pass 70.725 ms, 65.45 s total | |
[ 2023-10-08 00:28:14 ] Completed Epoch: 13 batch 479: computing loss 118.000 ms, 65.57 s total | |
EPOCH: [13], BATCH: [479/889], loss: 0.426, loss_box_reg: 0.127, loss_classifier: 0.109, loss_mask: 0.138, loss_objectness: 0.022, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 479 | |
[ 2023-10-08 00:28:15 ] Completed saving temp checkpoint 1,011.411 ms, 66.58 s total | |
[ 2023-10-08 00:28:15 ] Completed replacing temp checkpoint with checkpoint 68.859 ms, 66.65 s total | |
[ 2023-10-08 00:28:15 ] Completed Epoch: 13 batch 480: moving batch data to device 7.093 ms, 66.65 s total | |
[ 2023-10-08 00:28:15 ] Completed Epoch: 13 batch 480: forward pass 105.908 ms, 66.76 s total | |
[ 2023-10-08 00:28:15 ] Completed Epoch: 13 batch 480: backward pass 70.688 ms, 66.83 s total | |
[ 2023-10-08 00:28:15 ] Completed Epoch: 13 batch 480: computing loss 120.359 ms, 66.95 s total | |
EPOCH: [13], BATCH: [480/889], loss: 0.381, loss_box_reg: 0.112, loss_classifier: 0.099, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 480 | |
[ 2023-10-08 00:28:16 ] Completed saving temp checkpoint 1,151.187 ms, 68.10 s total | |
[ 2023-10-08 00:28:16 ] Completed replacing temp checkpoint with checkpoint 80.634 ms, 68.18 s total | |
[ 2023-10-08 00:28:16 ] Completed Epoch: 13 batch 481: moving batch data to device 9.749 ms, 68.19 s total | |
[ 2023-10-08 00:28:16 ] Completed Epoch: 13 batch 481: forward pass 102.241 ms, 68.29 s total | |
[ 2023-10-08 00:28:16 ] Completed Epoch: 13 batch 481: backward pass 40.300 ms, 68.33 s total | |
[ 2023-10-08 00:28:17 ] Completed Epoch: 13 batch 481: computing loss 141.875 ms, 68.48 s total | |
EPOCH: [13], BATCH: [481/889], loss: 0.392, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.036 | |
Saving checkpoint at epoch 13 train batch 481 | |
[ 2023-10-08 00:28:18 ] Completed saving temp checkpoint 1,022.004 ms, 69.50 s total | |
[ 2023-10-08 00:28:18 ] Completed replacing temp checkpoint with checkpoint 67.224 ms, 69.56 s total | |
[ 2023-10-08 00:28:18 ] Completed Epoch: 13 batch 482: moving batch data to device 9.173 ms, 69.57 s total | |
[ 2023-10-08 00:28:18 ] Completed Epoch: 13 batch 482: forward pass 105.433 ms, 69.68 s total | |
[ 2023-10-08 00:28:18 ] Completed Epoch: 13 batch 482: backward pass 78.447 ms, 69.76 s total | |
[ 2023-10-08 00:28:18 ] Completed Epoch: 13 batch 482: computing loss 116.493 ms, 69.87 s total | |
EPOCH: [13], BATCH: [482/889], loss: 0.363, loss_box_reg: 0.109, loss_classifier: 0.089, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 482 | |
[ 2023-10-08 00:28:19 ] Completed saving temp checkpoint 1,174.242 ms, 71.05 s total | |
[ 2023-10-08 00:28:19 ] Completed replacing temp checkpoint with checkpoint 29.608 ms, 71.08 s total | |
[ 2023-10-08 00:28:19 ] Completed Epoch: 13 batch 483: moving batch data to device 4.394 ms, 71.08 s total | |
[ 2023-10-08 00:28:19 ] Completed Epoch: 13 batch 483: forward pass 104.354 ms, 71.19 s total | |
[ 2023-10-08 00:28:19 ] Completed Epoch: 13 batch 483: backward pass 71.039 ms, 71.26 s total | |
[ 2023-10-08 00:28:19 ] Completed Epoch: 13 batch 483: computing loss 113.142 ms, 71.37 s total | |
EPOCH: [13], BATCH: [483/889], loss: 0.410, loss_box_reg: 0.123, loss_classifier: 0.107, loss_mask: 0.136, loss_objectness: 0.020, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 483 | |
[ 2023-10-08 00:28:21 ] Completed saving temp checkpoint 1,342.829 ms, 72.71 s total | |
[ 2023-10-08 00:28:21 ] Completed replacing temp checkpoint with checkpoint 80.046 ms, 72.79 s total | |
[ 2023-10-08 00:28:21 ] Completed Epoch: 13 batch 484: moving batch data to device 6.781 ms, 72.80 s total | |
[ 2023-10-08 00:28:21 ] Completed Epoch: 13 batch 484: forward pass 104.698 ms, 72.91 s total | |
[ 2023-10-08 00:28:21 ] Completed Epoch: 13 batch 484: backward pass 69.686 ms, 72.98 s total | |
[ 2023-10-08 00:28:21 ] Completed Epoch: 13 batch 484: computing loss 123.560 ms, 73.10 s total | |
EPOCH: [13], BATCH: [484/889], loss: 0.394, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 484 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 00:41:36 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 00:41:36 ] Completed importing Timer 0.065 ms, 0.00 s total | |
[ 2023-10-08 00:41:37 ] Completed importing everything else 466.263 ms, 0.47 s total | |
[ 2023-10-08 00:41:37 ] Completed defined other functions 0.026 ms, 0.47 s total | |
| distributed init (rank 5): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 0): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 00:41:40 ] Completed main preliminaries 2,871.601 ms, 3.34 s total | |
loading annotations into memory... | |
Done (t=11.32s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-08 00:41:53 ] Completed loading data 13,267.920 ms, 16.61 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 00:41:53 ] Completed creating data samplers 104.993 ms, 16.71 s total | |
[ 2023-10-08 00:41:53 ] Completed creating data loaders 0.252 ms, 16.71 s total | |
[ 2023-10-08 00:41:54 ] Completed creating model and .to(device) 651.308 ms, 17.36 s total | |
[ 2023-10-08 00:41:55 ] Completed preparing model for distributed training 976.222 ms, 18.34 s total | |
[ 2023-10-08 00:41:55 ] Completed optimizer and scaler 0.597 ms, 18.34 s total | |
[ 2023-10-08 00:41:55 ] Completed learning rate schedulers 0.241 ms, 18.34 s total | |
[ 2023-10-08 00:41:56 ] Completed init coco evaluator 971.876 ms, 19.31 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 00:41:56 ] Completed retrieving checkpoint 854.506 ms, 20.17 s total | |
EPOCH :: 13 | |
[ 2023-10-08 00:41:56 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 00:41:56 ] Completed training preliminaries 0.881 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 484 | |
[ 2023-10-08 00:41:57 ] Completed Epoch: 13 batch 484: moving batch data to device 571.843 ms, 0.57 s total | |
[ 2023-10-08 00:41:58 ] Completed Epoch: 13 batch 484: forward pass 755.045 ms, 1.33 s total | |
[ 2023-10-08 00:41:58 ] Completed Epoch: 13 batch 484: backward pass 173.060 ms, 1.50 s total | |
[ 2023-10-08 00:41:58 ] Completed Epoch: 13 batch 484: computing loss 505.550 ms, 2.01 s total | |
EPOCH: [13], BATCH: [484/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.093, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 484 | |
[ 2023-10-08 00:41:59 ] Completed saving temp checkpoint 1,063.900 ms, 3.07 s total | |
[ 2023-10-08 00:42:00 ] Completed replacing temp checkpoint with checkpoint 185.103 ms, 3.26 s total | |
[ 2023-10-08 00:42:00 ] Completed Epoch: 13 batch 485: moving batch data to device 3.709 ms, 3.26 s total | |
[ 2023-10-08 00:42:00 ] Completed Epoch: 13 batch 485: forward pass 111.152 ms, 3.37 s total | |
[ 2023-10-08 00:42:00 ] Completed Epoch: 13 batch 485: backward pass 82.170 ms, 3.45 s total | |
[ 2023-10-08 00:42:00 ] Completed Epoch: 13 batch 485: computing loss 134.573 ms, 3.59 s total | |
EPOCH: [13], BATCH: [485/889], loss: 0.391, loss_box_reg: 0.115, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 485 | |
[ 2023-10-08 00:42:01 ] Completed saving temp checkpoint 960.725 ms, 4.55 s total | |
[ 2023-10-08 00:42:01 ] Completed replacing temp checkpoint with checkpoint 46.214 ms, 4.59 s total | |
[ 2023-10-08 00:42:01 ] Completed Epoch: 13 batch 486: moving batch data to device 3.038 ms, 4.60 s total | |
[ 2023-10-08 00:42:01 ] Completed Epoch: 13 batch 486: forward pass 109.465 ms, 4.71 s total | |
[ 2023-10-08 00:42:01 ] Completed Epoch: 13 batch 486: backward pass 75.852 ms, 4.78 s total | |
[ 2023-10-08 00:42:01 ] Completed Epoch: 13 batch 486: computing loss 144.669 ms, 4.93 s total | |
EPOCH: [13], BATCH: [486/889], loss: 0.414, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.138, loss_objectness: 0.021, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 486 | |
[ 2023-10-08 00:42:02 ] Completed saving temp checkpoint 1,103.491 ms, 6.03 s total | |
[ 2023-10-08 00:42:02 ] Completed replacing temp checkpoint with checkpoint 72.578 ms, 6.10 s total | |
[ 2023-10-08 00:42:02 ] Completed Epoch: 13 batch 487: moving batch data to device 3.517 ms, 6.11 s total | |
[ 2023-10-08 00:42:03 ] Completed Epoch: 13 batch 487: forward pass 109.319 ms, 6.22 s total | |
[ 2023-10-08 00:42:03 ] Completed Epoch: 13 batch 487: backward pass 90.213 ms, 6.31 s total | |
[ 2023-10-08 00:42:03 ] Completed Epoch: 13 batch 487: computing loss 128.069 ms, 6.43 s total | |
EPOCH: [13], BATCH: [487/889], loss: 0.403, loss_box_reg: 0.129, loss_classifier: 0.102, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 487 | |
[ 2023-10-08 00:42:04 ] Completed saving temp checkpoint 925.122 ms, 7.36 s total | |
[ 2023-10-08 00:42:04 ] Completed replacing temp checkpoint with checkpoint 67.649 ms, 7.43 s total | |
[ 2023-10-08 00:42:04 ] Completed Epoch: 13 batch 488: moving batch data to device 10.500 ms, 7.44 s total | |
[ 2023-10-08 00:42:04 ] Completed Epoch: 13 batch 488: forward pass 224.915 ms, 7.66 s total | |
[ 2023-10-08 00:42:04 ] Completed Epoch: 13 batch 488: backward pass 32.195 ms, 7.69 s total | |
[ 2023-10-08 00:42:04 ] Completed Epoch: 13 batch 488: computing loss 186.831 ms, 7.88 s total | |
EPOCH: [13], BATCH: [488/889], loss: 0.415, loss_box_reg: 0.125, loss_classifier: 0.112, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 488 | |
[ 2023-10-08 00:42:06 ] Completed saving temp checkpoint 1,295.748 ms, 9.18 s total | |
[ 2023-10-08 00:42:06 ] Completed replacing temp checkpoint with checkpoint 75.964 ms, 9.25 s total | |
[ 2023-10-08 00:42:06 ] Completed Epoch: 13 batch 489: moving batch data to device 3.773 ms, 9.26 s total | |
[ 2023-10-08 00:42:06 ] Completed Epoch: 13 batch 489: forward pass 110.258 ms, 9.37 s total | |
[ 2023-10-08 00:42:06 ] Completed Epoch: 13 batch 489: backward pass 92.444 ms, 9.46 s total | |
[ 2023-10-08 00:42:06 ] Completed Epoch: 13 batch 489: computing loss 125.618 ms, 9.59 s total | |
EPOCH: [13], BATCH: [489/889], loss: 0.363, loss_box_reg: 0.109, loss_classifier: 0.095, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 489 | |
[ 2023-10-08 00:42:07 ] Completed saving temp checkpoint 1,471.975 ms, 11.06 s total | |
[ 2023-10-08 00:42:07 ] Completed replacing temp checkpoint with checkpoint 76.089 ms, 11.13 s total | |
[ 2023-10-08 00:42:08 ] Completed Epoch: 13 batch 490: moving batch data to device 7.613 ms, 11.14 s total | |
[ 2023-10-08 00:42:08 ] Completed Epoch: 13 batch 490: forward pass 107.760 ms, 11.25 s total | |
[ 2023-10-08 00:42:08 ] Completed Epoch: 13 batch 490: backward pass 82.193 ms, 11.33 s total | |
[ 2023-10-08 00:42:08 ] Completed Epoch: 13 batch 490: computing loss 208.685 ms, 11.54 s total | |
EPOCH: [13], BATCH: [490/889], loss: 0.395, loss_box_reg: 0.117, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 490 | |
[ 2023-10-08 00:42:10 ] Completed saving temp checkpoint 1,755.074 ms, 13.29 s total | |
[ 2023-10-08 00:42:10 ] Completed replacing temp checkpoint with checkpoint 85.178 ms, 13.38 s total | |
[ 2023-10-08 00:42:10 ] Completed Epoch: 13 batch 491: moving batch data to device 3.177 ms, 13.38 s total | |
[ 2023-10-08 00:42:10 ] Completed Epoch: 13 batch 491: forward pass 107.116 ms, 13.49 s total | |
[ 2023-10-08 00:42:10 ] Completed Epoch: 13 batch 491: backward pass 75.718 ms, 13.57 s total | |
[ 2023-10-08 00:42:10 ] Completed Epoch: 13 batch 491: computing loss 120.070 ms, 13.69 s total | |
EPOCH: [13], BATCH: [491/889], loss: 0.390, loss_box_reg: 0.117, loss_classifier: 0.101, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 491 | |
[ 2023-10-08 00:42:11 ] Completed saving temp checkpoint 1,024.426 ms, 14.71 s total | |
[ 2023-10-08 00:42:11 ] Completed replacing temp checkpoint with checkpoint 69.638 ms, 14.78 s total | |
[ 2023-10-08 00:42:11 ] Completed Epoch: 13 batch 492: moving batch data to device 6.741 ms, 14.79 s total | |
[ 2023-10-08 00:42:11 ] Completed Epoch: 13 batch 492: forward pass 101.586 ms, 14.89 s total | |
[ 2023-10-08 00:42:11 ] Completed Epoch: 13 batch 492: backward pass 69.618 ms, 14.96 s total | |
[ 2023-10-08 00:42:11 ] Completed Epoch: 13 batch 492: computing loss 170.824 ms, 15.13 s total | |
EPOCH: [13], BATCH: [492/889], loss: 0.364, loss_box_reg: 0.110, loss_classifier: 0.095, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 492 | |
[ 2023-10-08 00:42:13 ] Completed saving temp checkpoint 1,682.376 ms, 16.81 s total | |
[ 2023-10-08 00:42:13 ] Completed replacing temp checkpoint with checkpoint 61.357 ms, 16.87 s total | |
[ 2023-10-08 00:42:13 ] Completed Epoch: 13 batch 493: moving batch data to device 7.431 ms, 16.88 s total | |
[ 2023-10-08 00:42:13 ] Completed Epoch: 13 batch 493: forward pass 103.966 ms, 16.98 s total | |
[ 2023-10-08 00:42:13 ] Completed Epoch: 13 batch 493: backward pass 76.157 ms, 17.06 s total | |
[ 2023-10-08 00:42:14 ] Completed Epoch: 13 batch 493: computing loss 120.258 ms, 17.18 s total | |
EPOCH: [13], BATCH: [493/889], loss: 0.376, loss_box_reg: 0.111, loss_classifier: 0.089, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 493 | |
[ 2023-10-08 00:42:15 ] Completed saving temp checkpoint 1,127.607 ms, 18.31 s total | |
[ 2023-10-08 00:42:15 ] Completed replacing temp checkpoint with checkpoint 82.293 ms, 18.39 s total | |
[ 2023-10-08 00:42:15 ] Completed Epoch: 13 batch 494: moving batch data to device 8.214 ms, 18.40 s total | |
[ 2023-10-08 00:42:15 ] Completed Epoch: 13 batch 494: forward pass 102.970 ms, 18.50 s total | |
[ 2023-10-08 00:42:15 ] Completed Epoch: 13 batch 494: backward pass 52.584 ms, 18.55 s total | |
[ 2023-10-08 00:42:15 ] Completed Epoch: 13 batch 494: computing loss 144.354 ms, 18.70 s total | |
EPOCH: [13], BATCH: [494/889], loss: 0.392, loss_box_reg: 0.119, loss_classifier: 0.098, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 494 | |
[ 2023-10-08 00:42:16 ] Completed saving temp checkpoint 1,241.753 ms, 19.94 s total | |
[ 2023-10-08 00:42:16 ] Completed replacing temp checkpoint with checkpoint 63.271 ms, 20.00 s total | |
[ 2023-10-08 00:42:16 ] Completed Epoch: 13 batch 495: moving batch data to device 9.260 ms, 20.01 s total | |
[ 2023-10-08 00:42:16 ] Completed Epoch: 13 batch 495: forward pass 112.474 ms, 20.12 s total | |
[ 2023-10-08 00:42:17 ] Completed Epoch: 13 batch 495: backward pass 73.834 ms, 20.20 s total | |
[ 2023-10-08 00:42:17 ] Completed Epoch: 13 batch 495: computing loss 126.773 ms, 20.33 s total | |
EPOCH: [13], BATCH: [495/889], loss: 0.431, loss_box_reg: 0.130, loss_classifier: 0.117, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 495 | |
[ 2023-10-08 00:42:18 ] Completed saving temp checkpoint 1,171.203 ms, 21.50 s total | |
[ 2023-10-08 00:42:18 ] Completed replacing temp checkpoint with checkpoint 79.220 ms, 21.58 s total | |
[ 2023-10-08 00:42:18 ] Completed Epoch: 13 batch 496: moving batch data to device 7.304 ms, 21.58 s total | |
[ 2023-10-08 00:42:18 ] Completed Epoch: 13 batch 496: forward pass 113.993 ms, 21.70 s total | |
[ 2023-10-08 00:42:18 ] Completed Epoch: 13 batch 496: backward pass 79.571 ms, 21.78 s total | |
[ 2023-10-08 00:42:18 ] Completed Epoch: 13 batch 496: computing loss 121.078 ms, 21.90 s total | |
EPOCH: [13], BATCH: [496/889], loss: 0.384, loss_box_reg: 0.120, loss_classifier: 0.098, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 496 | |
[ 2023-10-08 00:42:20 ] Completed saving temp checkpoint 1,282.007 ms, 23.18 s total | |
[ 2023-10-08 00:42:20 ] Completed replacing temp checkpoint with checkpoint 77.568 ms, 23.26 s total | |
[ 2023-10-08 00:42:20 ] Completed Epoch: 13 batch 497: moving batch data to device 7.264 ms, 23.26 s total | |
[ 2023-10-08 00:42:20 ] Completed Epoch: 13 batch 497: forward pass 103.161 ms, 23.37 s total | |
[ 2023-10-08 00:42:20 ] Completed Epoch: 13 batch 497: backward pass 74.265 ms, 23.44 s total | |
[ 2023-10-08 00:42:20 ] Completed Epoch: 13 batch 497: computing loss 117.014 ms, 23.56 s total | |
EPOCH: [13], BATCH: [497/889], loss: 0.349, loss_box_reg: 0.103, loss_classifier: 0.090, loss_mask: 0.120, loss_objectness: 0.014, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 497 | |
[ 2023-10-08 00:42:21 ] Completed saving temp checkpoint 1,187.427 ms, 24.75 s total | |
[ 2023-10-08 00:42:21 ] Completed replacing temp checkpoint with checkpoint 59.840 ms, 24.81 s total | |
[ 2023-10-08 00:42:21 ] Completed Epoch: 13 batch 498: moving batch data to device 5.428 ms, 24.81 s total | |
[ 2023-10-08 00:42:21 ] Completed Epoch: 13 batch 498: forward pass 105.443 ms, 24.92 s total | |
[ 2023-10-08 00:42:21 ] Completed Epoch: 13 batch 498: backward pass 35.913 ms, 24.95 s total | |
[ 2023-10-08 00:42:21 ] Completed Epoch: 13 batch 498: computing loss 161.613 ms, 25.11 s total | |
EPOCH: [13], BATCH: [498/889], loss: 0.387, loss_box_reg: 0.116, loss_classifier: 0.103, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 498 | |
[ 2023-10-08 00:42:23 ] Completed saving temp checkpoint 1,282.745 ms, 26.40 s total | |
[ 2023-10-08 00:42:23 ] Completed replacing temp checkpoint with checkpoint 83.641 ms, 26.48 s total | |
[ 2023-10-08 00:42:23 ] Completed Epoch: 13 batch 499: moving batch data to device 8.540 ms, 26.49 s total | |
[ 2023-10-08 00:42:23 ] Completed Epoch: 13 batch 499: forward pass 105.100 ms, 26.59 s total | |
[ 2023-10-08 00:42:23 ] Completed Epoch: 13 batch 499: backward pass 59.232 ms, 26.65 s total | |
[ 2023-10-08 00:42:23 ] Completed Epoch: 13 batch 499: computing loss 136.919 ms, 26.79 s total | |
EPOCH: [13], BATCH: [499/889], loss: 0.355, loss_box_reg: 0.106, loss_classifier: 0.089, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 499 | |
[ 2023-10-08 00:42:24 ] Completed saving temp checkpoint 1,162.383 ms, 27.95 s total | |
[ 2023-10-08 00:42:24 ] Completed replacing temp checkpoint with checkpoint 61.545 ms, 28.01 s total | |
[ 2023-10-08 00:42:24 ] Completed Epoch: 13 batch 500: moving batch data to device 6.397 ms, 28.02 s total | |
[ 2023-10-08 00:42:24 ] Completed Epoch: 13 batch 500: forward pass 108.993 ms, 28.13 s total | |
[ 2023-10-08 00:42:25 ] Completed Epoch: 13 batch 500: backward pass 47.472 ms, 28.18 s total | |
[ 2023-10-08 00:42:25 ] Completed Epoch: 13 batch 500: computing loss 140.351 ms, 28.32 s total | |
EPOCH: [13], BATCH: [500/889], loss: 0.378, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.019 | |
Saving checkpoint at epoch 13 train batch 500 | |
[ 2023-10-08 00:42:26 ] Completed saving temp checkpoint 1,277.705 ms, 29.60 s total | |
[ 2023-10-08 00:42:26 ] Completed replacing temp checkpoint with checkpoint 85.256 ms, 29.68 s total | |
[ 2023-10-08 00:42:26 ] Completed Epoch: 13 batch 501: moving batch data to device 9.302 ms, 29.69 s total | |
[ 2023-10-08 00:42:26 ] Completed Epoch: 13 batch 501: forward pass 106.812 ms, 29.80 s total | |
[ 2023-10-08 00:42:26 ] Completed Epoch: 13 batch 501: backward pass 78.537 ms, 29.88 s total | |
[ 2023-10-08 00:42:26 ] Completed Epoch: 13 batch 501: computing loss 116.626 ms, 29.99 s total | |
EPOCH: [13], BATCH: [501/889], loss: 0.422, loss_box_reg: 0.130, loss_classifier: 0.111, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 501 | |
[ 2023-10-08 00:42:28 ] Completed saving temp checkpoint 1,676.636 ms, 31.67 s total | |
[ 2023-10-08 00:42:28 ] Completed replacing temp checkpoint with checkpoint 118.064 ms, 31.79 s total | |
[ 2023-10-08 00:42:28 ] Completed Epoch: 13 batch 502: moving batch data to device 8.152 ms, 31.80 s total | |
[ 2023-10-08 00:42:28 ] Completed Epoch: 13 batch 502: forward pass 105.995 ms, 31.90 s total | |
[ 2023-10-08 00:42:28 ] Completed Epoch: 13 batch 502: backward pass 78.327 ms, 31.98 s total | |
[ 2023-10-08 00:42:28 ] Completed Epoch: 13 batch 502: computing loss 118.244 ms, 32.10 s total | |
EPOCH: [13], BATCH: [502/889], loss: 0.396, loss_box_reg: 0.115, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 502 | |
[ 2023-10-08 00:42:30 ] Completed saving temp checkpoint 1,592.308 ms, 33.69 s total | |
[ 2023-10-08 00:42:30 ] Completed replacing temp checkpoint with checkpoint 99.895 ms, 33.79 s total | |
[ 2023-10-08 00:42:30 ] Completed Epoch: 13 batch 503: moving batch data to device 8.597 ms, 33.80 s total | |
[ 2023-10-08 00:42:30 ] Completed Epoch: 13 batch 503: forward pass 103.171 ms, 33.90 s total | |
[ 2023-10-08 00:42:30 ] Completed Epoch: 13 batch 503: backward pass 73.960 ms, 33.98 s total | |
[ 2023-10-08 00:42:30 ] Completed Epoch: 13 batch 503: computing loss 113.244 ms, 34.09 s total | |
EPOCH: [13], BATCH: [503/889], loss: 0.372, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 503 | |
[ 2023-10-08 00:42:32 ] Completed saving temp checkpoint 1,233.337 ms, 35.32 s total | |
[ 2023-10-08 00:42:32 ] Completed replacing temp checkpoint with checkpoint 81.680 ms, 35.40 s total | |
[ 2023-10-08 00:42:32 ] Completed Epoch: 13 batch 504: moving batch data to device 7.782 ms, 35.41 s total | |
[ 2023-10-08 00:42:32 ] Completed Epoch: 13 batch 504: forward pass 103.011 ms, 35.51 s total | |
[ 2023-10-08 00:42:32 ] Completed Epoch: 13 batch 504: backward pass 30.512 ms, 35.55 s total | |
[ 2023-10-08 00:42:32 ] Completed Epoch: 13 batch 504: computing loss 140.288 ms, 35.69 s total | |
EPOCH: [13], BATCH: [504/889], loss: 0.400, loss_box_reg: 0.125, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 504 | |
[ 2023-10-08 00:42:33 ] Completed saving temp checkpoint 1,325.884 ms, 37.01 s total | |
[ 2023-10-08 00:42:33 ] Completed replacing temp checkpoint with checkpoint 71.478 ms, 37.08 s total | |
[ 2023-10-08 00:42:33 ] Completed Epoch: 13 batch 505: moving batch data to device 7.846 ms, 37.09 s total | |
[ 2023-10-08 00:42:34 ] Completed Epoch: 13 batch 505: forward pass 103.516 ms, 37.19 s total | |
[ 2023-10-08 00:42:34 ] Completed Epoch: 13 batch 505: backward pass 75.698 ms, 37.27 s total | |
[ 2023-10-08 00:42:34 ] Completed Epoch: 13 batch 505: computing loss 122.202 ms, 37.39 s total | |
EPOCH: [13], BATCH: [505/889], loss: 0.413, loss_box_reg: 0.125, loss_classifier: 0.109, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 505 | |
[ 2023-10-08 00:42:35 ] Completed saving temp checkpoint 1,227.207 ms, 38.62 s total | |
[ 2023-10-08 00:42:35 ] Completed replacing temp checkpoint with checkpoint 83.592 ms, 38.70 s total | |
[ 2023-10-08 00:42:35 ] Completed Epoch: 13 batch 506: moving batch data to device 6.786 ms, 38.71 s total | |
[ 2023-10-08 00:42:35 ] Completed Epoch: 13 batch 506: forward pass 109.051 ms, 38.82 s total | |
[ 2023-10-08 00:42:35 ] Completed Epoch: 13 batch 506: backward pass 46.630 ms, 38.87 s total | |
[ 2023-10-08 00:42:35 ] Completed Epoch: 13 batch 506: computing loss 153.740 ms, 39.02 s total | |
EPOCH: [13], BATCH: [506/889], loss: 0.377, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 506 | |
[ 2023-10-08 00:42:37 ] Completed saving temp checkpoint 1,482.970 ms, 40.50 s total | |
[ 2023-10-08 00:42:37 ] Completed replacing temp checkpoint with checkpoint 35.826 ms, 40.54 s total | |
[ 2023-10-08 00:42:37 ] Completed Epoch: 13 batch 507: moving batch data to device 4.616 ms, 40.54 s total | |
[ 2023-10-08 00:42:37 ] Completed Epoch: 13 batch 507: forward pass 103.299 ms, 40.65 s total | |
[ 2023-10-08 00:42:37 ] Completed Epoch: 13 batch 507: backward pass 81.398 ms, 40.73 s total | |
[ 2023-10-08 00:42:37 ] Completed Epoch: 13 batch 507: computing loss 113.998 ms, 40.84 s total | |
EPOCH: [13], BATCH: [507/889], loss: 0.439, loss_box_reg: 0.139, loss_classifier: 0.114, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 507 | |
[ 2023-10-08 00:42:38 ] Completed saving temp checkpoint 1,058.163 ms, 41.90 s total | |
[ 2023-10-08 00:42:38 ] Completed replacing temp checkpoint with checkpoint 58.492 ms, 41.96 s total | |
[ 2023-10-08 00:42:38 ] Completed Epoch: 13 batch 508: moving batch data to device 6.707 ms, 41.96 s total | |
[ 2023-10-08 00:42:38 ] Completed Epoch: 13 batch 508: forward pass 106.511 ms, 42.07 s total | |
[ 2023-10-08 00:42:38 ] Completed Epoch: 13 batch 508: backward pass 67.376 ms, 42.14 s total | |
[ 2023-10-08 00:42:39 ] Completed Epoch: 13 batch 508: computing loss 125.996 ms, 42.26 s total | |
EPOCH: [13], BATCH: [508/889], loss: 0.419, loss_box_reg: 0.125, loss_classifier: 0.104, loss_mask: 0.140, loss_objectness: 0.017, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 508 | |
[ 2023-10-08 00:42:40 ] Completed saving temp checkpoint 1,225.608 ms, 43.49 s total | |
[ 2023-10-08 00:42:40 ] Completed replacing temp checkpoint with checkpoint 82.058 ms, 43.57 s total | |
[ 2023-10-08 00:42:40 ] Completed Epoch: 13 batch 509: moving batch data to device 10.219 ms, 43.58 s total | |
[ 2023-10-08 00:42:40 ] Completed Epoch: 13 batch 509: forward pass 109.128 ms, 43.69 s total | |
[ 2023-10-08 00:42:40 ] Completed Epoch: 13 batch 509: backward pass 72.945 ms, 43.76 s total | |
[ 2023-10-08 00:42:40 ] Completed Epoch: 13 batch 509: computing loss 127.390 ms, 43.89 s total | |
EPOCH: [13], BATCH: [509/889], loss: 0.374, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 509 | |
[ 2023-10-08 00:42:41 ] Completed saving temp checkpoint 1,115.261 ms, 45.01 s total | |
[ 2023-10-08 00:42:41 ] Completed replacing temp checkpoint with checkpoint 63.046 ms, 45.07 s total | |
[ 2023-10-08 00:42:41 ] Completed Epoch: 13 batch 510: moving batch data to device 6.402 ms, 45.08 s total | |
[ 2023-10-08 00:42:42 ] Completed Epoch: 13 batch 510: forward pass 101.612 ms, 45.18 s total | |
[ 2023-10-08 00:42:42 ] Completed Epoch: 13 batch 510: backward pass 52.817 ms, 45.23 s total | |
[ 2023-10-08 00:42:42 ] Completed Epoch: 13 batch 510: computing loss 136.727 ms, 45.37 s total | |
EPOCH: [13], BATCH: [510/889], loss: 0.388, loss_box_reg: 0.115, loss_classifier: 0.103, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 510 | |
[ 2023-10-08 00:42:43 ] Completed saving temp checkpoint 1,195.784 ms, 46.56 s total | |
[ 2023-10-08 00:42:43 ] Completed replacing temp checkpoint with checkpoint 85.613 ms, 46.65 s total | |
[ 2023-10-08 00:42:43 ] Completed Epoch: 13 batch 511: moving batch data to device 8.121 ms, 46.66 s total | |
[ 2023-10-08 00:42:43 ] Completed Epoch: 13 batch 511: forward pass 104.055 ms, 46.76 s total | |
[ 2023-10-08 00:42:43 ] Completed Epoch: 13 batch 511: backward pass 47.682 ms, 46.81 s total | |
[ 2023-10-08 00:42:43 ] Completed Epoch: 13 batch 511: computing loss 147.441 ms, 46.96 s total | |
EPOCH: [13], BATCH: [511/889], loss: 0.377, loss_box_reg: 0.112, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 511 | |
[ 2023-10-08 00:42:44 ] Completed saving temp checkpoint 1,083.064 ms, 48.04 s total | |
[ 2023-10-08 00:42:44 ] Completed replacing temp checkpoint with checkpoint 65.687 ms, 48.11 s total | |
[ 2023-10-08 00:42:44 ] Completed Epoch: 13 batch 512: moving batch data to device 7.559 ms, 48.11 s total | |
[ 2023-10-08 00:42:45 ] Completed Epoch: 13 batch 512: forward pass 105.019 ms, 48.22 s total | |
[ 2023-10-08 00:42:45 ] Completed Epoch: 13 batch 512: backward pass 39.690 ms, 48.26 s total | |
[ 2023-10-08 00:42:45 ] Completed Epoch: 13 batch 512: computing loss 153.467 ms, 48.41 s total | |
EPOCH: [13], BATCH: [512/889], loss: 0.396, loss_box_reg: 0.114, loss_classifier: 0.096, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 512 | |
[ 2023-10-08 00:42:46 ] Completed saving temp checkpoint 1,222.682 ms, 49.63 s total | |
[ 2023-10-08 00:42:46 ] Completed replacing temp checkpoint with checkpoint 82.054 ms, 49.72 s total | |
[ 2023-10-08 00:42:46 ] Completed Epoch: 13 batch 513: moving batch data to device 8.454 ms, 49.72 s total | |
[ 2023-10-08 00:42:46 ] Completed Epoch: 13 batch 513: forward pass 108.324 ms, 49.83 s total | |
[ 2023-10-08 00:42:46 ] Completed Epoch: 13 batch 513: backward pass 77.072 ms, 49.91 s total | |
[ 2023-10-08 00:42:46 ] Completed Epoch: 13 batch 513: computing loss 92.908 ms, 50.00 s total | |
EPOCH: [13], BATCH: [513/889], loss: 0.401, loss_box_reg: 0.123, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 513 | |
[ 2023-10-08 00:42:48 ] Completed saving temp checkpoint 1,316.103 ms, 51.32 s total | |
[ 2023-10-08 00:42:48 ] Completed replacing temp checkpoint with checkpoint 50.884 ms, 51.37 s total | |
[ 2023-10-08 00:42:48 ] Completed Epoch: 13 batch 514: moving batch data to device 8.273 ms, 51.38 s total | |
[ 2023-10-08 00:42:48 ] Completed Epoch: 13 batch 514: forward pass 106.417 ms, 51.48 s total | |
[ 2023-10-08 00:42:48 ] Completed Epoch: 13 batch 514: backward pass 63.146 ms, 51.55 s total | |
[ 2023-10-08 00:42:48 ] Completed Epoch: 13 batch 514: computing loss 130.583 ms, 51.68 s total | |
EPOCH: [13], BATCH: [514/889], loss: 0.398, loss_box_reg: 0.123, loss_classifier: 0.109, loss_mask: 0.126, loss_objectness: 0.017, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 514 | |
[ 2023-10-08 00:42:50 ] Completed saving temp checkpoint 1,904.603 ms, 53.58 s total | |
[ 2023-10-08 00:42:50 ] Completed replacing temp checkpoint with checkpoint 98.843 ms, 53.68 s total | |
[ 2023-10-08 00:42:50 ] Completed Epoch: 13 batch 515: moving batch data to device 7.151 ms, 53.69 s total | |
[ 2023-10-08 00:42:50 ] Completed Epoch: 13 batch 515: forward pass 108.963 ms, 53.80 s total | |
[ 2023-10-08 00:42:50 ] Completed Epoch: 13 batch 515: backward pass 74.205 ms, 53.87 s total | |
[ 2023-10-08 00:42:50 ] Completed Epoch: 13 batch 515: computing loss 111.873 ms, 53.98 s total | |
EPOCH: [13], BATCH: [515/889], loss: 0.365, loss_box_reg: 0.109, loss_classifier: 0.089, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 515 | |
[ 2023-10-08 00:42:52 ] Completed saving temp checkpoint 1,215.095 ms, 55.20 s total | |
[ 2023-10-08 00:42:52 ] Completed replacing temp checkpoint with checkpoint 54.084 ms, 55.25 s total | |
[ 2023-10-08 00:42:52 ] Completed Epoch: 13 batch 516: moving batch data to device 4.724 ms, 55.26 s total | |
[ 2023-10-08 00:42:52 ] Completed Epoch: 13 batch 516: forward pass 101.875 ms, 55.36 s total | |
[ 2023-10-08 00:42:52 ] Completed Epoch: 13 batch 516: backward pass 48.223 ms, 55.41 s total | |
[ 2023-10-08 00:42:52 ] Completed Epoch: 13 batch 516: computing loss 136.150 ms, 55.54 s total | |
EPOCH: [13], BATCH: [516/889], loss: 0.346, loss_box_reg: 0.102, loss_classifier: 0.083, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 516 | |
[ 2023-10-08 00:42:53 ] Completed saving temp checkpoint 1,364.489 ms, 56.91 s total | |
[ 2023-10-08 00:42:53 ] Completed replacing temp checkpoint with checkpoint 63.930 ms, 56.97 s total | |
[ 2023-10-08 00:42:53 ] Completed Epoch: 13 batch 517: moving batch data to device 10.240 ms, 56.98 s total | |
[ 2023-10-08 00:42:53 ] Completed Epoch: 13 batch 517: forward pass 107.529 ms, 57.09 s total | |
[ 2023-10-08 00:42:54 ] Completed Epoch: 13 batch 517: backward pass 75.998 ms, 57.17 s total | |
[ 2023-10-08 00:42:54 ] Completed Epoch: 13 batch 517: computing loss 118.099 ms, 57.28 s total | |
EPOCH: [13], BATCH: [517/889], loss: 0.380, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 517 | |
[ 2023-10-08 00:42:55 ] Completed saving temp checkpoint 1,031.929 ms, 58.32 s total | |
[ 2023-10-08 00:42:55 ] Completed replacing temp checkpoint with checkpoint 51.735 ms, 58.37 s total | |
[ 2023-10-08 00:42:55 ] Completed Epoch: 13 batch 518: moving batch data to device 4.605 ms, 58.37 s total | |
[ 2023-10-08 00:42:55 ] Completed Epoch: 13 batch 518: forward pass 101.053 ms, 58.47 s total | |
[ 2023-10-08 00:42:55 ] Completed Epoch: 13 batch 518: backward pass 35.259 ms, 58.51 s total | |
[ 2023-10-08 00:42:55 ] Completed Epoch: 13 batch 518: computing loss 165.083 ms, 58.67 s total | |
EPOCH: [13], BATCH: [518/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 518 | |
[ 2023-10-08 00:42:57 ] Completed saving temp checkpoint 1,608.112 ms, 60.28 s total | |
[ 2023-10-08 00:42:57 ] Completed replacing temp checkpoint with checkpoint 62.119 ms, 60.34 s total | |
[ 2023-10-08 00:42:57 ] Completed Epoch: 13 batch 519: moving batch data to device 8.083 ms, 60.35 s total | |
[ 2023-10-08 00:42:57 ] Completed Epoch: 13 batch 519: forward pass 106.191 ms, 60.46 s total | |
[ 2023-10-08 00:42:57 ] Completed Epoch: 13 batch 519: backward pass 36.280 ms, 60.49 s total | |
[ 2023-10-08 00:42:57 ] Completed Epoch: 13 batch 519: computing loss 160.835 ms, 60.66 s total | |
EPOCH: [13], BATCH: [519/889], loss: 0.381, loss_box_reg: 0.111, loss_classifier: 0.094, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 519 | |
[ 2023-10-08 00:42:58 ] Completed saving temp checkpoint 970.089 ms, 61.63 s total | |
[ 2023-10-08 00:42:58 ] Completed replacing temp checkpoint with checkpoint 47.517 ms, 61.67 s total | |
[ 2023-10-08 00:42:58 ] Completed Epoch: 13 batch 520: moving batch data to device 4.525 ms, 61.68 s total | |
[ 2023-10-08 00:42:58 ] Completed Epoch: 13 batch 520: forward pass 101.770 ms, 61.78 s total | |
[ 2023-10-08 00:42:58 ] Completed Epoch: 13 batch 520: backward pass 55.008 ms, 61.83 s total | |
[ 2023-10-08 00:42:58 ] Completed Epoch: 13 batch 520: computing loss 130.433 ms, 61.96 s total | |
EPOCH: [13], BATCH: [520/889], loss: 0.367, loss_box_reg: 0.110, loss_classifier: 0.092, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 520 | |
[ 2023-10-08 00:42:59 ] Completed saving temp checkpoint 1,147.475 ms, 63.11 s total | |
[ 2023-10-08 00:43:00 ] Completed replacing temp checkpoint with checkpoint 69.396 ms, 63.18 s total | |
[ 2023-10-08 00:43:00 ] Completed Epoch: 13 batch 521: moving batch data to device 8.175 ms, 63.19 s total | |
[ 2023-10-08 00:43:00 ] Completed Epoch: 13 batch 521: forward pass 107.647 ms, 63.30 s total | |
[ 2023-10-08 00:43:00 ] Completed Epoch: 13 batch 521: backward pass 45.757 ms, 63.34 s total | |
[ 2023-10-08 00:43:00 ] Completed Epoch: 13 batch 521: computing loss 123.815 ms, 63.47 s total | |
EPOCH: [13], BATCH: [521/889], loss: 0.362, loss_box_reg: 0.109, loss_classifier: 0.086, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 521 | |
[ 2023-10-08 00:43:01 ] Completed saving temp checkpoint 1,045.285 ms, 64.51 s total | |
[ 2023-10-08 00:43:01 ] Completed replacing temp checkpoint with checkpoint 72.868 ms, 64.59 s total | |
[ 2023-10-08 00:43:01 ] Completed Epoch: 13 batch 522: moving batch data to device 6.849 ms, 64.59 s total | |
[ 2023-10-08 00:43:01 ] Completed Epoch: 13 batch 522: forward pass 106.163 ms, 64.70 s total | |
[ 2023-10-08 00:43:01 ] Completed Epoch: 13 batch 522: backward pass 73.907 ms, 64.77 s total | |
[ 2023-10-08 00:43:01 ] Completed Epoch: 13 batch 522: computing loss 94.983 ms, 64.87 s total | |
EPOCH: [13], BATCH: [522/889], loss: 0.415, loss_box_reg: 0.127, loss_classifier: 0.101, loss_mask: 0.142, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 522 | |
[ 2023-10-08 00:43:02 ] Completed saving temp checkpoint 1,176.786 ms, 66.04 s total | |
[ 2023-10-08 00:43:02 ] Completed replacing temp checkpoint with checkpoint 69.862 ms, 66.11 s total | |
[ 2023-10-08 00:43:02 ] Completed Epoch: 13 batch 523: moving batch data to device 7.914 ms, 66.12 s total | |
[ 2023-10-08 00:43:03 ] Completed Epoch: 13 batch 523: forward pass 102.742 ms, 66.22 s total | |
[ 2023-10-08 00:43:03 ] Completed Epoch: 13 batch 523: backward pass 42.184 ms, 66.27 s total | |
[ 2023-10-08 00:43:03 ] Completed Epoch: 13 batch 523: computing loss 156.693 ms, 66.42 s total | |
EPOCH: [13], BATCH: [523/889], loss: 0.386, loss_box_reg: 0.117, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 523 | |
[ 2023-10-08 00:43:04 ] Completed saving temp checkpoint 1,071.206 ms, 67.49 s total | |
[ 2023-10-08 00:43:04 ] Completed replacing temp checkpoint with checkpoint 75.062 ms, 67.57 s total | |
[ 2023-10-08 00:43:04 ] Completed Epoch: 13 batch 524: moving batch data to device 8.157 ms, 67.58 s total | |
[ 2023-10-08 00:43:04 ] Completed Epoch: 13 batch 524: forward pass 104.333 ms, 67.68 s total | |
[ 2023-10-08 00:43:04 ] Completed Epoch: 13 batch 524: backward pass 38.717 ms, 67.72 s total | |
[ 2023-10-08 00:43:04 ] Completed Epoch: 13 batch 524: computing loss 145.239 ms, 67.87 s total | |
EPOCH: [13], BATCH: [524/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 524 | |
[ 2023-10-08 00:43:05 ] Completed saving temp checkpoint 1,186.759 ms, 69.05 s total | |
[ 2023-10-08 00:43:05 ] Completed replacing temp checkpoint with checkpoint 48.991 ms, 69.10 s total | |
[ 2023-10-08 00:43:05 ] Completed Epoch: 13 batch 525: moving batch data to device 5.626 ms, 69.11 s total | |
[ 2023-10-08 00:43:06 ] Completed Epoch: 13 batch 525: forward pass 104.628 ms, 69.21 s total | |
[ 2023-10-08 00:43:06 ] Completed Epoch: 13 batch 525: backward pass 80.315 ms, 69.29 s total | |
[ 2023-10-08 00:43:06 ] Completed Epoch: 13 batch 525: computing loss 92.816 ms, 69.38 s total | |
EPOCH: [13], BATCH: [525/889], loss: 0.381, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 525 | |
[ 2023-10-08 00:43:07 ] Completed saving temp checkpoint 1,009.221 ms, 70.39 s total | |
[ 2023-10-08 00:43:07 ] Completed replacing temp checkpoint with checkpoint 80.929 ms, 70.48 s total | |
[ 2023-10-08 00:43:07 ] Completed Epoch: 13 batch 526: moving batch data to device 6.484 ms, 70.48 s total | |
[ 2023-10-08 00:43:07 ] Completed Epoch: 13 batch 526: forward pass 107.301 ms, 70.59 s total | |
[ 2023-10-08 00:43:07 ] Completed Epoch: 13 batch 526: backward pass 74.561 ms, 70.66 s total | |
[ 2023-10-08 00:43:07 ] Completed Epoch: 13 batch 526: computing loss 96.025 ms, 70.76 s total | |
EPOCH: [13], BATCH: [526/889], loss: 0.404, loss_box_reg: 0.126, loss_classifier: 0.102, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 526 | |
[ 2023-10-08 00:43:09 ] Completed saving temp checkpoint 1,672.849 ms, 72.43 s total | |
[ 2023-10-08 00:43:09 ] Completed replacing temp checkpoint with checkpoint 93.058 ms, 72.53 s total | |
[ 2023-10-08 00:43:09 ] Completed Epoch: 13 batch 527: moving batch data to device 8.239 ms, 72.53 s total | |
[ 2023-10-08 00:43:09 ] Completed Epoch: 13 batch 527: forward pass 103.693 ms, 72.64 s total | |
[ 2023-10-08 00:43:09 ] Completed Epoch: 13 batch 527: backward pass 43.616 ms, 72.68 s total | |
[ 2023-10-08 00:43:09 ] Completed Epoch: 13 batch 527: computing loss 148.418 ms, 72.83 s total | |
EPOCH: [13], BATCH: [527/889], loss: 0.384, loss_box_reg: 0.116, loss_classifier: 0.094, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 527 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 00:56:24 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 00:56:24 ] Completed importing Timer 0.022 ms, 0.00 s total | |
[ 2023-10-08 00:56:25 ] Completed importing everything else 526.947 ms, 0.53 s total | |
[ 2023-10-08 00:56:25 ] Completed defined other functions 0.023 ms, 0.53 s total | |
| distributed init (rank 5): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 2): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 00:56:32 ] Completed main preliminaries 7,586.647 ms, 8.11 s total | |
loading annotations into memory... | |
Done (t=10.50s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.27s) | |
creating index... | |
index created! | |
[ 2023-10-08 00:56:44 ] Completed loading data 12,201.541 ms, 20.32 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 00:56:44 ] Completed creating data samplers 90.325 ms, 20.41 s total | |
[ 2023-10-08 00:56:44 ] Completed creating data loaders 0.196 ms, 20.41 s total | |
[ 2023-10-08 00:56:45 ] Completed creating model and .to(device) 650.232 ms, 21.06 s total | |
[ 2023-10-08 00:56:47 ] Completed preparing model for distributed training 2,093.232 ms, 23.15 s total | |
[ 2023-10-08 00:56:47 ] Completed optimizer and scaler 0.610 ms, 23.15 s total | |
[ 2023-10-08 00:56:47 ] Completed learning rate schedulers 0.250 ms, 23.15 s total | |
[ 2023-10-08 00:56:48 ] Completed init coco evaluator 936.205 ms, 24.09 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 00:56:49 ] Completed retrieving checkpoint 850.377 ms, 24.94 s total | |
EPOCH :: 13 | |
[ 2023-10-08 00:56:49 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 00:56:49 ] Completed training preliminaries 1.227 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 527 | |
[ 2023-10-08 00:56:49 ] Completed Epoch: 13 batch 527: moving batch data to device 520.832 ms, 0.52 s total | |
[ 2023-10-08 00:56:50 ] Completed Epoch: 13 batch 527: forward pass 915.714 ms, 1.44 s total | |
[ 2023-10-08 00:56:51 ] Completed Epoch: 13 batch 527: backward pass 178.502 ms, 1.62 s total | |
[ 2023-10-08 00:56:51 ] Completed Epoch: 13 batch 527: computing loss 533.326 ms, 2.15 s total | |
EPOCH: [13], BATCH: [527/889], loss: 0.385, loss_box_reg: 0.116, loss_classifier: 0.094, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 527 | |
[ 2023-10-08 00:56:52 ] Completed saving temp checkpoint 888.979 ms, 3.04 s total | |
[ 2023-10-08 00:56:52 ] Completed replacing temp checkpoint with checkpoint 148.237 ms, 3.19 s total | |
[ 2023-10-08 00:56:52 ] Completed Epoch: 13 batch 528: moving batch data to device 4.580 ms, 3.19 s total | |
[ 2023-10-08 00:56:52 ] Completed Epoch: 13 batch 528: forward pass 111.189 ms, 3.30 s total | |
[ 2023-10-08 00:56:52 ] Completed Epoch: 13 batch 528: backward pass 121.660 ms, 3.42 s total | |
[ 2023-10-08 00:56:52 ] Completed Epoch: 13 batch 528: computing loss 102.731 ms, 3.53 s total | |
EPOCH: [13], BATCH: [528/889], loss: 0.373, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 528 | |
[ 2023-10-08 00:56:54 ] Completed saving temp checkpoint 1,102.679 ms, 4.63 s total | |
[ 2023-10-08 00:56:54 ] Completed replacing temp checkpoint with checkpoint 58.591 ms, 4.69 s total | |
[ 2023-10-08 00:56:54 ] Completed Epoch: 13 batch 529: moving batch data to device 4.364 ms, 4.69 s total | |
[ 2023-10-08 00:56:54 ] Completed Epoch: 13 batch 529: forward pass 108.761 ms, 4.80 s total | |
[ 2023-10-08 00:56:54 ] Completed Epoch: 13 batch 529: backward pass 43.333 ms, 4.84 s total | |
[ 2023-10-08 00:56:54 ] Completed Epoch: 13 batch 529: computing loss 177.924 ms, 5.02 s total | |
EPOCH: [13], BATCH: [529/889], loss: 0.386, loss_box_reg: 0.118, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 529 | |
[ 2023-10-08 00:56:55 ] Completed saving temp checkpoint 1,014.975 ms, 6.04 s total | |
[ 2023-10-08 00:56:55 ] Completed replacing temp checkpoint with checkpoint 62.629 ms, 6.10 s total | |
[ 2023-10-08 00:56:55 ] Completed Epoch: 13 batch 530: moving batch data to device 12.550 ms, 6.11 s total | |
[ 2023-10-08 00:56:55 ] Completed Epoch: 13 batch 530: forward pass 104.637 ms, 6.22 s total | |
[ 2023-10-08 00:56:55 ] Completed Epoch: 13 batch 530: backward pass 123.873 ms, 6.34 s total | |
[ 2023-10-08 00:56:55 ] Completed Epoch: 13 batch 530: computing loss 91.857 ms, 6.43 s total | |
EPOCH: [13], BATCH: [530/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 530 | |
[ 2023-10-08 00:56:56 ] Completed saving temp checkpoint 1,071.772 ms, 7.50 s total | |
[ 2023-10-08 00:56:57 ] Completed replacing temp checkpoint with checkpoint 52.733 ms, 7.56 s total | |
[ 2023-10-08 00:56:57 ] Completed Epoch: 13 batch 531: moving batch data to device 3.571 ms, 7.56 s total | |
[ 2023-10-08 00:56:57 ] Completed Epoch: 13 batch 531: forward pass 108.553 ms, 7.67 s total | |
[ 2023-10-08 00:56:57 ] Completed Epoch: 13 batch 531: backward pass 111.824 ms, 7.78 s total | |
[ 2023-10-08 00:56:57 ] Completed Epoch: 13 batch 531: computing loss 98.785 ms, 7.88 s total | |
EPOCH: [13], BATCH: [531/889], loss: 0.371, loss_box_reg: 0.114, loss_classifier: 0.093, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 531 | |
[ 2023-10-08 00:56:58 ] Completed saving temp checkpoint 872.243 ms, 8.75 s total | |
[ 2023-10-08 00:56:58 ] Completed replacing temp checkpoint with checkpoint 65.080 ms, 8.82 s total | |
[ 2023-10-08 00:56:58 ] Completed Epoch: 13 batch 532: moving batch data to device 6.035 ms, 8.82 s total | |
[ 2023-10-08 00:56:58 ] Completed Epoch: 13 batch 532: forward pass 108.768 ms, 8.93 s total | |
[ 2023-10-08 00:56:58 ] Completed Epoch: 13 batch 532: backward pass 78.861 ms, 9.01 s total | |
[ 2023-10-08 00:56:58 ] Completed Epoch: 13 batch 532: computing loss 125.088 ms, 9.14 s total | |
EPOCH: [13], BATCH: [532/889], loss: 0.384, loss_box_reg: 0.115, loss_classifier: 0.097, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 532 | |
[ 2023-10-08 00:56:59 ] Completed saving temp checkpoint 991.307 ms, 10.13 s total | |
[ 2023-10-08 00:56:59 ] Completed replacing temp checkpoint with checkpoint 36.727 ms, 10.16 s total | |
[ 2023-10-08 00:56:59 ] Completed Epoch: 13 batch 533: moving batch data to device 2.470 ms, 10.17 s total | |
[ 2023-10-08 00:56:59 ] Completed Epoch: 13 batch 533: forward pass 104.850 ms, 10.27 s total | |
[ 2023-10-08 00:56:59 ] Completed Epoch: 13 batch 533: backward pass 73.387 ms, 10.35 s total | |
[ 2023-10-08 00:56:59 ] Completed Epoch: 13 batch 533: computing loss 142.174 ms, 10.49 s total | |
EPOCH: [13], BATCH: [533/889], loss: 0.387, loss_box_reg: 0.115, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 533 | |
[ 2023-10-08 00:57:00 ] Completed saving temp checkpoint 765.547 ms, 11.25 s total | |
[ 2023-10-08 00:57:00 ] Completed replacing temp checkpoint with checkpoint 66.476 ms, 11.32 s total | |
[ 2023-10-08 00:57:00 ] Completed Epoch: 13 batch 534: moving batch data to device 11.938 ms, 11.33 s total | |
[ 2023-10-08 00:57:00 ] Completed Epoch: 13 batch 534: forward pass 182.389 ms, 11.51 s total | |
[ 2023-10-08 00:57:01 ] Completed Epoch: 13 batch 534: backward pass 79.228 ms, 11.59 s total | |
[ 2023-10-08 00:57:01 ] Completed Epoch: 13 batch 534: computing loss 132.497 ms, 11.73 s total | |
EPOCH: [13], BATCH: [534/889], loss: 0.429, loss_box_reg: 0.132, loss_classifier: 0.112, loss_mask: 0.135, loss_objectness: 0.016, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 534 | |
[ 2023-10-08 00:57:02 ] Completed saving temp checkpoint 1,008.773 ms, 12.73 s total | |
[ 2023-10-08 00:57:02 ] Completed replacing temp checkpoint with checkpoint 70.881 ms, 12.81 s total | |
[ 2023-10-08 00:57:02 ] Completed Epoch: 13 batch 535: moving batch data to device 3.944 ms, 12.81 s total | |
[ 2023-10-08 00:57:02 ] Completed Epoch: 13 batch 535: forward pass 107.682 ms, 12.92 s total | |
[ 2023-10-08 00:57:02 ] Completed Epoch: 13 batch 535: backward pass 81.121 ms, 13.00 s total | |
[ 2023-10-08 00:57:02 ] Completed Epoch: 13 batch 535: computing loss 113.579 ms, 13.11 s total | |
EPOCH: [13], BATCH: [535/889], loss: 0.355, loss_box_reg: 0.106, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.013, loss_rpn_box_reg: 0.018 | |
Saving checkpoint at epoch 13 train batch 535 | |
[ 2023-10-08 00:57:03 ] Completed saving temp checkpoint 1,044.111 ms, 14.16 s total | |
[ 2023-10-08 00:57:03 ] Completed replacing temp checkpoint with checkpoint 39.285 ms, 14.19 s total | |
[ 2023-10-08 00:57:03 ] Completed Epoch: 13 batch 536: moving batch data to device 5.304 ms, 14.20 s total | |
[ 2023-10-08 00:57:03 ] Completed Epoch: 13 batch 536: forward pass 108.829 ms, 14.31 s total | |
[ 2023-10-08 00:57:03 ] Completed Epoch: 13 batch 536: backward pass 49.005 ms, 14.36 s total | |
[ 2023-10-08 00:57:03 ] Completed Epoch: 13 batch 536: computing loss 143.953 ms, 14.50 s total | |
EPOCH: [13], BATCH: [536/889], loss: 0.383, loss_box_reg: 0.117, loss_classifier: 0.098, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 536 | |
[ 2023-10-08 00:57:05 ] Completed saving temp checkpoint 1,358.229 ms, 15.86 s total | |
[ 2023-10-08 00:57:05 ] Completed replacing temp checkpoint with checkpoint 90.491 ms, 15.95 s total | |
[ 2023-10-08 00:57:05 ] Completed Epoch: 13 batch 537: moving batch data to device 6.905 ms, 15.96 s total | |
[ 2023-10-08 00:57:05 ] Completed Epoch: 13 batch 537: forward pass 107.992 ms, 16.07 s total | |
[ 2023-10-08 00:57:05 ] Completed Epoch: 13 batch 537: backward pass 77.719 ms, 16.14 s total | |
[ 2023-10-08 00:57:05 ] Completed Epoch: 13 batch 537: computing loss 115.172 ms, 16.26 s total | |
EPOCH: [13], BATCH: [537/889], loss: 0.357, loss_box_reg: 0.106, loss_classifier: 0.083, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 537 | |
[ 2023-10-08 00:57:07 ] Completed saving temp checkpoint 1,461.423 ms, 17.72 s total | |
[ 2023-10-08 00:57:07 ] Completed replacing temp checkpoint with checkpoint 87.624 ms, 17.81 s total | |
[ 2023-10-08 00:57:07 ] Completed Epoch: 13 batch 538: moving batch data to device 6.634 ms, 17.81 s total | |
[ 2023-10-08 00:57:07 ] Completed Epoch: 13 batch 538: forward pass 109.444 ms, 17.92 s total | |
[ 2023-10-08 00:57:07 ] Completed Epoch: 13 batch 538: backward pass 47.971 ms, 17.97 s total | |
[ 2023-10-08 00:57:07 ] Completed Epoch: 13 batch 538: computing loss 150.146 ms, 18.12 s total | |
EPOCH: [13], BATCH: [538/889], loss: 0.375, loss_box_reg: 0.113, loss_classifier: 0.090, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 538 | |
[ 2023-10-08 00:57:09 ] Completed saving temp checkpoint 1,717.518 ms, 19.84 s total | |
[ 2023-10-08 00:57:09 ] Completed replacing temp checkpoint with checkpoint 73.948 ms, 19.91 s total | |
[ 2023-10-08 00:57:09 ] Completed Epoch: 13 batch 539: moving batch data to device 8.644 ms, 19.92 s total | |
[ 2023-10-08 00:57:09 ] Completed Epoch: 13 batch 539: forward pass 107.199 ms, 20.03 s total | |
[ 2023-10-08 00:57:09 ] Completed Epoch: 13 batch 539: backward pass 40.085 ms, 20.07 s total | |
[ 2023-10-08 00:57:09 ] Completed Epoch: 13 batch 539: computing loss 145.964 ms, 20.22 s total | |
EPOCH: [13], BATCH: [539/889], loss: 0.385, loss_box_reg: 0.117, loss_classifier: 0.096, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 539 | |
[ 2023-10-08 00:57:10 ] Completed saving temp checkpoint 1,021.601 ms, 21.24 s total | |
[ 2023-10-08 00:57:10 ] Completed replacing temp checkpoint with checkpoint 61.268 ms, 21.30 s total | |
[ 2023-10-08 00:57:10 ] Completed Epoch: 13 batch 540: moving batch data to device 6.949 ms, 21.30 s total | |
[ 2023-10-08 00:57:10 ] Completed Epoch: 13 batch 540: forward pass 104.998 ms, 21.41 s total | |
[ 2023-10-08 00:57:10 ] Completed Epoch: 13 batch 540: backward pass 46.221 ms, 21.46 s total | |
[ 2023-10-08 00:57:11 ] Completed Epoch: 13 batch 540: computing loss 140.599 ms, 21.60 s total | |
EPOCH: [13], BATCH: [540/889], loss: 0.377, loss_box_reg: 0.119, loss_classifier: 0.098, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.019 | |
Saving checkpoint at epoch 13 train batch 540 | |
[ 2023-10-08 00:57:12 ] Completed saving temp checkpoint 1,501.016 ms, 23.10 s total | |
[ 2023-10-08 00:57:12 ] Completed replacing temp checkpoint with checkpoint 47.843 ms, 23.15 s total | |
[ 2023-10-08 00:57:12 ] Completed Epoch: 13 batch 541: moving batch data to device 4.997 ms, 23.15 s total | |
[ 2023-10-08 00:57:12 ] Completed Epoch: 13 batch 541: forward pass 103.622 ms, 23.25 s total | |
[ 2023-10-08 00:57:12 ] Completed Epoch: 13 batch 541: backward pass 73.379 ms, 23.33 s total | |
[ 2023-10-08 00:57:12 ] Completed Epoch: 13 batch 541: computing loss 117.636 ms, 23.45 s total | |
EPOCH: [13], BATCH: [541/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.123, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 541 | |
[ 2023-10-08 00:57:14 ] Completed saving temp checkpoint 1,564.629 ms, 25.01 s total | |
[ 2023-10-08 00:57:14 ] Completed replacing temp checkpoint with checkpoint 82.025 ms, 25.09 s total | |
[ 2023-10-08 00:57:14 ] Completed Epoch: 13 batch 542: moving batch data to device 7.626 ms, 25.10 s total | |
[ 2023-10-08 00:57:14 ] Completed Epoch: 13 batch 542: forward pass 104.113 ms, 25.20 s total | |
[ 2023-10-08 00:57:14 ] Completed Epoch: 13 batch 542: backward pass 53.004 ms, 25.26 s total | |
[ 2023-10-08 00:57:14 ] Completed Epoch: 13 batch 542: computing loss 134.150 ms, 25.39 s total | |
EPOCH: [13], BATCH: [542/889], loss: 0.381, loss_box_reg: 0.115, loss_classifier: 0.089, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 542 | |
[ 2023-10-08 00:57:16 ] Completed saving temp checkpoint 1,387.018 ms, 26.78 s total | |
[ 2023-10-08 00:57:16 ] Completed replacing temp checkpoint with checkpoint 86.421 ms, 26.86 s total | |
[ 2023-10-08 00:57:16 ] Completed Epoch: 13 batch 543: moving batch data to device 7.004 ms, 26.87 s total | |
[ 2023-10-08 00:57:16 ] Completed Epoch: 13 batch 543: forward pass 108.692 ms, 26.98 s total | |
[ 2023-10-08 00:57:16 ] Completed Epoch: 13 batch 543: backward pass 74.890 ms, 27.05 s total | |
[ 2023-10-08 00:57:16 ] Completed Epoch: 13 batch 543: computing loss 109.897 ms, 27.16 s total | |
EPOCH: [13], BATCH: [543/889], loss: 0.391, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 543 | |
[ 2023-10-08 00:57:17 ] Completed saving temp checkpoint 1,010.338 ms, 28.17 s total | |
[ 2023-10-08 00:57:17 ] Completed replacing temp checkpoint with checkpoint 53.616 ms, 28.23 s total | |
[ 2023-10-08 00:57:17 ] Completed Epoch: 13 batch 544: moving batch data to device 8.224 ms, 28.24 s total | |
[ 2023-10-08 00:57:17 ] Completed Epoch: 13 batch 544: forward pass 102.136 ms, 28.34 s total | |
[ 2023-10-08 00:57:17 ] Completed Epoch: 13 batch 544: backward pass 46.314 ms, 28.39 s total | |
[ 2023-10-08 00:57:17 ] Completed Epoch: 13 batch 544: computing loss 145.723 ms, 28.53 s total | |
EPOCH: [13], BATCH: [544/889], loss: 0.392, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 544 | |
[ 2023-10-08 00:57:19 ] Completed saving temp checkpoint 1,207.610 ms, 29.74 s total | |
[ 2023-10-08 00:57:19 ] Completed replacing temp checkpoint with checkpoint 79.166 ms, 29.82 s total | |
[ 2023-10-08 00:57:19 ] Completed Epoch: 13 batch 545: moving batch data to device 5.891 ms, 29.82 s total | |
[ 2023-10-08 00:57:19 ] Completed Epoch: 13 batch 545: forward pass 104.219 ms, 29.93 s total | |
[ 2023-10-08 00:57:19 ] Completed Epoch: 13 batch 545: backward pass 79.617 ms, 30.01 s total | |
[ 2023-10-08 00:57:19 ] Completed Epoch: 13 batch 545: computing loss 120.084 ms, 30.13 s total | |
EPOCH: [13], BATCH: [545/889], loss: 0.406, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.138, loss_objectness: 0.019, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 545 | |
[ 2023-10-08 00:57:20 ] Completed saving temp checkpoint 957.788 ms, 31.09 s total | |
[ 2023-10-08 00:57:20 ] Completed replacing temp checkpoint with checkpoint 48.076 ms, 31.13 s total | |
[ 2023-10-08 00:57:20 ] Completed Epoch: 13 batch 546: moving batch data to device 6.862 ms, 31.14 s total | |
[ 2023-10-08 00:57:20 ] Completed Epoch: 13 batch 546: forward pass 161.391 ms, 31.30 s total | |
[ 2023-10-08 00:57:20 ] Completed Epoch: 13 batch 546: backward pass 34.958 ms, 31.34 s total | |
[ 2023-10-08 00:57:20 ] Completed Epoch: 13 batch 546: computing loss 149.884 ms, 31.49 s total | |
EPOCH: [13], BATCH: [546/889], loss: 0.390, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 546 | |
[ 2023-10-08 00:57:22 ] Completed saving temp checkpoint 1,060.030 ms, 32.55 s total | |
[ 2023-10-08 00:57:22 ] Completed replacing temp checkpoint with checkpoint 71.367 ms, 32.62 s total | |
[ 2023-10-08 00:57:22 ] Completed Epoch: 13 batch 547: moving batch data to device 8.914 ms, 32.63 s total | |
[ 2023-10-08 00:57:22 ] Completed Epoch: 13 batch 547: forward pass 105.323 ms, 32.73 s total | |
[ 2023-10-08 00:57:22 ] Completed Epoch: 13 batch 547: backward pass 78.143 ms, 32.81 s total | |
[ 2023-10-08 00:57:22 ] Completed Epoch: 13 batch 547: computing loss 111.627 ms, 32.92 s total | |
EPOCH: [13], BATCH: [547/889], loss: 0.413, loss_box_reg: 0.129, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 547 | |
[ 2023-10-08 00:57:23 ] Completed saving temp checkpoint 961.736 ms, 33.88 s total | |
[ 2023-10-08 00:57:23 ] Completed replacing temp checkpoint with checkpoint 68.827 ms, 33.95 s total | |
[ 2023-10-08 00:57:23 ] Completed Epoch: 13 batch 548: moving batch data to device 8.212 ms, 33.96 s total | |
[ 2023-10-08 00:57:23 ] Completed Epoch: 13 batch 548: forward pass 107.932 ms, 34.07 s total | |
[ 2023-10-08 00:57:23 ] Completed Epoch: 13 batch 548: backward pass 73.142 ms, 34.14 s total | |
[ 2023-10-08 00:57:23 ] Completed Epoch: 13 batch 548: computing loss 97.996 ms, 34.24 s total | |
EPOCH: [13], BATCH: [548/889], loss: 0.391, loss_box_reg: 0.124, loss_classifier: 0.094, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 548 | |
[ 2023-10-08 00:57:24 ] Completed saving temp checkpoint 1,197.097 ms, 35.44 s total | |
[ 2023-10-08 00:57:24 ] Completed replacing temp checkpoint with checkpoint 70.085 ms, 35.51 s total | |
[ 2023-10-08 00:57:24 ] Completed Epoch: 13 batch 549: moving batch data to device 7.266 ms, 35.51 s total | |
[ 2023-10-08 00:57:25 ] Completed Epoch: 13 batch 549: forward pass 113.086 ms, 35.63 s total | |
[ 2023-10-08 00:57:25 ] Completed Epoch: 13 batch 549: backward pass 79.516 ms, 35.71 s total | |
[ 2023-10-08 00:57:25 ] Completed Epoch: 13 batch 549: computing loss 118.107 ms, 35.82 s total | |
EPOCH: [13], BATCH: [549/889], loss: 0.404, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 549 | |
[ 2023-10-08 00:57:26 ] Completed saving temp checkpoint 1,013.694 ms, 36.84 s total | |
[ 2023-10-08 00:57:26 ] Completed replacing temp checkpoint with checkpoint 71.777 ms, 36.91 s total | |
[ 2023-10-08 00:57:26 ] Completed Epoch: 13 batch 550: moving batch data to device 8.742 ms, 36.92 s total | |
[ 2023-10-08 00:57:26 ] Completed Epoch: 13 batch 550: forward pass 101.601 ms, 37.02 s total | |
[ 2023-10-08 00:57:26 ] Completed Epoch: 13 batch 550: backward pass 46.305 ms, 37.07 s total | |
[ 2023-10-08 00:57:26 ] Completed Epoch: 13 batch 550: computing loss 122.384 ms, 37.19 s total | |
EPOCH: [13], BATCH: [550/889], loss: 0.363, loss_box_reg: 0.109, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 550 | |
[ 2023-10-08 00:57:27 ] Completed saving temp checkpoint 1,150.058 ms, 38.34 s total | |
[ 2023-10-08 00:57:27 ] Completed replacing temp checkpoint with checkpoint 65.687 ms, 38.41 s total | |
[ 2023-10-08 00:57:27 ] Completed Epoch: 13 batch 551: moving batch data to device 6.235 ms, 38.41 s total | |
[ 2023-10-08 00:57:27 ] Completed Epoch: 13 batch 551: forward pass 105.059 ms, 38.52 s total | |
[ 2023-10-08 00:57:28 ] Completed Epoch: 13 batch 551: backward pass 52.921 ms, 38.57 s total | |
[ 2023-10-08 00:57:28 ] Completed Epoch: 13 batch 551: computing loss 144.337 ms, 38.71 s total | |
EPOCH: [13], BATCH: [551/889], loss: 0.381, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 551 | |
[ 2023-10-08 00:57:29 ] Completed saving temp checkpoint 1,007.947 ms, 39.72 s total | |
[ 2023-10-08 00:57:29 ] Completed replacing temp checkpoint with checkpoint 70.279 ms, 39.79 s total | |
[ 2023-10-08 00:57:29 ] Completed Epoch: 13 batch 552: moving batch data to device 7.367 ms, 39.80 s total | |
[ 2023-10-08 00:57:29 ] Completed Epoch: 13 batch 552: forward pass 103.372 ms, 39.90 s total | |
[ 2023-10-08 00:57:29 ] Completed Epoch: 13 batch 552: backward pass 76.847 ms, 39.98 s total | |
[ 2023-10-08 00:57:29 ] Completed Epoch: 13 batch 552: computing loss 93.294 ms, 40.07 s total | |
EPOCH: [13], BATCH: [552/889], loss: 0.402, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.036 | |
Saving checkpoint at epoch 13 train batch 552 | |
[ 2023-10-08 00:57:30 ] Completed saving temp checkpoint 1,431.680 ms, 41.50 s total | |
[ 2023-10-08 00:57:31 ] Completed replacing temp checkpoint with checkpoint 79.337 ms, 41.58 s total | |
[ 2023-10-08 00:57:31 ] Completed Epoch: 13 batch 553: moving batch data to device 8.929 ms, 41.59 s total | |
[ 2023-10-08 00:57:31 ] Completed Epoch: 13 batch 553: forward pass 105.606 ms, 41.70 s total | |
[ 2023-10-08 00:57:31 ] Completed Epoch: 13 batch 553: backward pass 53.821 ms, 41.75 s total | |
[ 2023-10-08 00:57:31 ] Completed Epoch: 13 batch 553: computing loss 145.301 ms, 41.90 s total | |
EPOCH: [13], BATCH: [553/889], loss: 0.372, loss_box_reg: 0.117, loss_classifier: 0.087, loss_mask: 0.128, loss_objectness: 0.012, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 553 | |
[ 2023-10-08 00:57:32 ] Completed saving temp checkpoint 1,531.088 ms, 43.43 s total | |
[ 2023-10-08 00:57:32 ] Completed replacing temp checkpoint with checkpoint 74.524 ms, 43.50 s total | |
[ 2023-10-08 00:57:32 ] Completed Epoch: 13 batch 554: moving batch data to device 6.400 ms, 43.51 s total | |
[ 2023-10-08 00:57:33 ] Completed Epoch: 13 batch 554: forward pass 104.496 ms, 43.61 s total | |
[ 2023-10-08 00:57:33 ] Completed Epoch: 13 batch 554: backward pass 69.084 ms, 43.68 s total | |
[ 2023-10-08 00:57:33 ] Completed Epoch: 13 batch 554: computing loss 122.880 ms, 43.81 s total | |
EPOCH: [13], BATCH: [554/889], loss: 0.364, loss_box_reg: 0.109, loss_classifier: 0.088, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 554 | |
[ 2023-10-08 00:57:34 ] Completed saving temp checkpoint 1,600.222 ms, 45.41 s total | |
[ 2023-10-08 00:57:34 ] Completed replacing temp checkpoint with checkpoint 75.219 ms, 45.48 s total | |
[ 2023-10-08 00:57:34 ] Completed Epoch: 13 batch 555: moving batch data to device 6.236 ms, 45.49 s total | |
[ 2023-10-08 00:57:35 ] Completed Epoch: 13 batch 555: forward pass 106.072 ms, 45.59 s total | |
[ 2023-10-08 00:57:35 ] Completed Epoch: 13 batch 555: backward pass 78.416 ms, 45.67 s total | |
[ 2023-10-08 00:57:35 ] Completed Epoch: 13 batch 555: computing loss 94.276 ms, 45.77 s total | |
EPOCH: [13], BATCH: [555/889], loss: 0.387, loss_box_reg: 0.116, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 555 | |
[ 2023-10-08 00:57:36 ] Completed saving temp checkpoint 1,273.717 ms, 47.04 s total | |
[ 2023-10-08 00:57:36 ] Completed replacing temp checkpoint with checkpoint 54.922 ms, 47.10 s total | |
[ 2023-10-08 00:57:36 ] Completed Epoch: 13 batch 556: moving batch data to device 7.510 ms, 47.10 s total | |
[ 2023-10-08 00:57:36 ] Completed Epoch: 13 batch 556: forward pass 105.232 ms, 47.21 s total | |
[ 2023-10-08 00:57:36 ] Completed Epoch: 13 batch 556: backward pass 48.763 ms, 47.26 s total | |
[ 2023-10-08 00:57:36 ] Completed Epoch: 13 batch 556: computing loss 145.268 ms, 47.40 s total | |
EPOCH: [13], BATCH: [556/889], loss: 0.388, loss_box_reg: 0.119, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 556 | |
[ 2023-10-08 00:57:38 ] Completed saving temp checkpoint 2,038.376 ms, 49.44 s total | |
[ 2023-10-08 00:57:38 ] Completed replacing temp checkpoint with checkpoint 76.533 ms, 49.52 s total | |
[ 2023-10-08 00:57:38 ] Completed Epoch: 13 batch 557: moving batch data to device 5.730 ms, 49.52 s total | |
[ 2023-10-08 00:57:39 ] Completed Epoch: 13 batch 557: forward pass 105.052 ms, 49.63 s total | |
[ 2023-10-08 00:57:39 ] Completed Epoch: 13 batch 557: backward pass 60.647 ms, 49.69 s total | |
[ 2023-10-08 00:57:39 ] Completed Epoch: 13 batch 557: computing loss 136.984 ms, 49.83 s total | |
EPOCH: [13], BATCH: [557/889], loss: 0.403, loss_box_reg: 0.126, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 557 | |
[ 2023-10-08 00:57:40 ] Completed saving temp checkpoint 1,156.396 ms, 50.98 s total | |
[ 2023-10-08 00:57:40 ] Completed replacing temp checkpoint with checkpoint 59.172 ms, 51.04 s total | |
[ 2023-10-08 00:57:40 ] Completed Epoch: 13 batch 558: moving batch data to device 8.366 ms, 51.05 s total | |
[ 2023-10-08 00:57:40 ] Completed Epoch: 13 batch 558: forward pass 104.926 ms, 51.15 s total | |
[ 2023-10-08 00:57:40 ] Completed Epoch: 13 batch 558: backward pass 73.769 ms, 51.23 s total | |
[ 2023-10-08 00:57:40 ] Completed Epoch: 13 batch 558: computing loss 116.492 ms, 51.34 s total | |
EPOCH: [13], BATCH: [558/889], loss: 0.392, loss_box_reg: 0.116, loss_classifier: 0.105, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 558 | |
[ 2023-10-08 00:57:42 ] Completed saving temp checkpoint 1,252.143 ms, 52.60 s total | |
[ 2023-10-08 00:57:42 ] Completed replacing temp checkpoint with checkpoint 76.524 ms, 52.67 s total | |
[ 2023-10-08 00:57:42 ] Completed Epoch: 13 batch 559: moving batch data to device 6.568 ms, 52.68 s total | |
[ 2023-10-08 00:57:42 ] Completed Epoch: 13 batch 559: forward pass 102.443 ms, 52.78 s total | |
[ 2023-10-08 00:57:42 ] Completed Epoch: 13 batch 559: backward pass 67.602 ms, 52.85 s total | |
[ 2023-10-08 00:57:42 ] Completed Epoch: 13 batch 559: computing loss 119.724 ms, 52.97 s total | |
EPOCH: [13], BATCH: [559/889], loss: 0.378, loss_box_reg: 0.113, loss_classifier: 0.098, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 559 | |
[ 2023-10-08 00:57:43 ] Completed saving temp checkpoint 1,103.174 ms, 54.07 s total | |
[ 2023-10-08 00:57:43 ] Completed replacing temp checkpoint with checkpoint 58.563 ms, 54.13 s total | |
[ 2023-10-08 00:57:43 ] Completed Epoch: 13 batch 560: moving batch data to device 5.359 ms, 54.14 s total | |
[ 2023-10-08 00:57:43 ] Completed Epoch: 13 batch 560: forward pass 106.597 ms, 54.24 s total | |
[ 2023-10-08 00:57:43 ] Completed Epoch: 13 batch 560: backward pass 34.179 ms, 54.28 s total | |
[ 2023-10-08 00:57:43 ] Completed Epoch: 13 batch 560: computing loss 133.553 ms, 54.41 s total | |
EPOCH: [13], BATCH: [560/889], loss: 0.354, loss_box_reg: 0.103, loss_classifier: 0.089, loss_mask: 0.121, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 560 | |
[ 2023-10-08 00:57:45 ] Completed saving temp checkpoint 1,212.498 ms, 55.62 s total | |
[ 2023-10-08 00:57:45 ] Completed replacing temp checkpoint with checkpoint 78.619 ms, 55.70 s total | |
[ 2023-10-08 00:57:45 ] Completed Epoch: 13 batch 561: moving batch data to device 6.335 ms, 55.71 s total | |
[ 2023-10-08 00:57:45 ] Completed Epoch: 13 batch 561: forward pass 103.083 ms, 55.81 s total | |
[ 2023-10-08 00:57:45 ] Completed Epoch: 13 batch 561: backward pass 47.007 ms, 55.86 s total | |
[ 2023-10-08 00:57:45 ] Completed Epoch: 13 batch 561: computing loss 121.612 ms, 55.98 s total | |
EPOCH: [13], BATCH: [561/889], loss: 0.389, loss_box_reg: 0.112, loss_classifier: 0.097, loss_mask: 0.137, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 561 | |
[ 2023-10-08 00:57:46 ] Completed saving temp checkpoint 1,094.193 ms, 57.07 s total | |
[ 2023-10-08 00:57:46 ] Completed replacing temp checkpoint with checkpoint 82.993 ms, 57.16 s total | |
[ 2023-10-08 00:57:46 ] Completed Epoch: 13 batch 562: moving batch data to device 8.324 ms, 57.17 s total | |
[ 2023-10-08 00:57:46 ] Completed Epoch: 13 batch 562: forward pass 104.639 ms, 57.27 s total | |
[ 2023-10-08 00:57:46 ] Completed Epoch: 13 batch 562: backward pass 75.239 ms, 57.35 s total | |
[ 2023-10-08 00:57:46 ] Completed Epoch: 13 batch 562: computing loss 115.966 ms, 57.46 s total | |
EPOCH: [13], BATCH: [562/889], loss: 0.386, loss_box_reg: 0.114, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.017, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 562 | |
[ 2023-10-08 00:57:48 ] Completed saving temp checkpoint 1,239.831 ms, 58.70 s total | |
[ 2023-10-08 00:57:48 ] Completed replacing temp checkpoint with checkpoint 68.087 ms, 58.77 s total | |
[ 2023-10-08 00:57:48 ] Completed Epoch: 13 batch 563: moving batch data to device 4.628 ms, 58.77 s total | |
[ 2023-10-08 00:57:48 ] Completed Epoch: 13 batch 563: forward pass 104.559 ms, 58.88 s total | |
[ 2023-10-08 00:57:48 ] Completed Epoch: 13 batch 563: backward pass 80.304 ms, 58.96 s total | |
[ 2023-10-08 00:57:48 ] Completed Epoch: 13 batch 563: computing loss 115.155 ms, 59.07 s total | |
EPOCH: [13], BATCH: [563/889], loss: 0.377, loss_box_reg: 0.117, loss_classifier: 0.094, loss_mask: 0.123, loss_objectness: 0.018, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 563 | |
[ 2023-10-08 00:57:49 ] Completed saving temp checkpoint 1,110.223 ms, 60.18 s total | |
[ 2023-10-08 00:57:49 ] Completed replacing temp checkpoint with checkpoint 47.891 ms, 60.23 s total | |
[ 2023-10-08 00:57:49 ] Completed Epoch: 13 batch 564: moving batch data to device 6.752 ms, 60.24 s total | |
[ 2023-10-08 00:57:49 ] Completed Epoch: 13 batch 564: forward pass 113.005 ms, 60.35 s total | |
[ 2023-10-08 00:57:49 ] Completed Epoch: 13 batch 564: backward pass 70.761 ms, 60.42 s total | |
[ 2023-10-08 00:57:49 ] Completed Epoch: 13 batch 564: computing loss 103.301 ms, 60.53 s total | |
EPOCH: [13], BATCH: [564/889], loss: 0.359, loss_box_reg: 0.107, loss_classifier: 0.087, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 564 | |
[ 2023-10-08 00:57:51 ] Completed saving temp checkpoint 1,274.329 ms, 61.80 s total | |
[ 2023-10-08 00:57:51 ] Completed replacing temp checkpoint with checkpoint 89.751 ms, 61.89 s total | |
[ 2023-10-08 00:57:51 ] Completed Epoch: 13 batch 565: moving batch data to device 7.436 ms, 61.90 s total | |
[ 2023-10-08 00:57:51 ] Completed Epoch: 13 batch 565: forward pass 104.396 ms, 62.00 s total | |
[ 2023-10-08 00:57:51 ] Completed Epoch: 13 batch 565: backward pass 75.271 ms, 62.08 s total | |
[ 2023-10-08 00:57:51 ] Completed Epoch: 13 batch 565: computing loss 122.076 ms, 62.20 s total | |
EPOCH: [13], BATCH: [565/889], loss: 0.391, loss_box_reg: 0.122, loss_classifier: 0.099, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 565 | |
[ 2023-10-08 00:57:53 ] Completed saving temp checkpoint 1,628.517 ms, 63.83 s total | |
[ 2023-10-08 00:57:53 ] Completed replacing temp checkpoint with checkpoint 87.532 ms, 63.92 s total | |
[ 2023-10-08 00:57:53 ] Completed Epoch: 13 batch 566: moving batch data to device 8.162 ms, 63.92 s total | |
[ 2023-10-08 00:57:53 ] Completed Epoch: 13 batch 566: forward pass 103.842 ms, 64.03 s total | |
[ 2023-10-08 00:57:53 ] Completed Epoch: 13 batch 566: backward pass 75.459 ms, 64.10 s total | |
[ 2023-10-08 00:57:53 ] Completed Epoch: 13 batch 566: computing loss 92.419 ms, 64.19 s total | |
EPOCH: [13], BATCH: [566/889], loss: 0.385, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 566 | |
[ 2023-10-08 00:57:55 ] Completed saving temp checkpoint 1,939.362 ms, 66.13 s total | |
[ 2023-10-08 00:57:55 ] Completed replacing temp checkpoint with checkpoint 94.713 ms, 66.23 s total | |
[ 2023-10-08 00:57:55 ] Completed Epoch: 13 batch 567: moving batch data to device 8.366 ms, 66.24 s total | |
[ 2023-10-08 00:57:55 ] Completed Epoch: 13 batch 567: forward pass 107.416 ms, 66.34 s total | |
[ 2023-10-08 00:57:55 ] Completed Epoch: 13 batch 567: backward pass 73.205 ms, 66.42 s total | |
[ 2023-10-08 00:57:55 ] Completed Epoch: 13 batch 567: computing loss 113.341 ms, 66.53 s total | |
EPOCH: [13], BATCH: [567/889], loss: 0.412, loss_box_reg: 0.124, loss_classifier: 0.105, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 567 | |
[ 2023-10-08 00:57:57 ] Completed saving temp checkpoint 1,456.273 ms, 67.99 s total | |
[ 2023-10-08 00:57:57 ] Completed replacing temp checkpoint with checkpoint 97.677 ms, 68.09 s total | |
[ 2023-10-08 00:57:57 ] Completed Epoch: 13 batch 568: moving batch data to device 8.245 ms, 68.09 s total | |
[ 2023-10-08 00:57:57 ] Completed Epoch: 13 batch 568: forward pass 104.925 ms, 68.20 s total | |
[ 2023-10-08 00:57:57 ] Completed Epoch: 13 batch 568: backward pass 70.493 ms, 68.27 s total | |
[ 2023-10-08 00:57:57 ] Completed Epoch: 13 batch 568: computing loss 126.705 ms, 68.40 s total | |
EPOCH: [13], BATCH: [568/889], loss: 0.375, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.131, loss_objectness: 0.013, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 568 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 01:11:11 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 01:11:11 ] Completed importing Timer 0.021 ms, 0.00 s total | |
[ 2023-10-08 01:11:12 ] Completed importing everything else 534.423 ms, 0.53 s total | |
[ 2023-10-08 01:11:12 ] Completed defined other functions 0.021 ms, 0.53 s total | |
| distributed init (rank 1): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 4): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 01:11:14 ] Completed main preliminaries 2,915.570 ms, 3.45 s total | |
loading annotations into memory... | |
Done (t=11.25s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-08 01:11:28 ] Completed loading data 13,181.690 ms, 16.63 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 01:11:28 ] Completed creating data samplers 110.381 ms, 16.74 s total | |
[ 2023-10-08 01:11:28 ] Completed creating data loaders 0.228 ms, 16.74 s total | |
[ 2023-10-08 01:11:28 ] Completed creating model and .to(device) 667.427 ms, 17.41 s total | |
[ 2023-10-08 01:11:30 ] Completed preparing model for distributed training 1,531.996 ms, 18.94 s total | |
[ 2023-10-08 01:11:30 ] Completed optimizer and scaler 0.627 ms, 18.94 s total | |
[ 2023-10-08 01:11:30 ] Completed learning rate schedulers 0.279 ms, 18.94 s total | |
[ 2023-10-08 01:11:31 ] Completed init coco evaluator 972.331 ms, 19.91 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 01:11:32 ] Completed retrieving checkpoint 817.981 ms, 20.73 s total | |
EPOCH :: 13 | |
[ 2023-10-08 01:11:32 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 01:11:32 ] Completed training preliminaries 0.880 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 568 | |
[ 2023-10-08 01:11:32 ] Completed Epoch: 13 batch 568: moving batch data to device 557.138 ms, 0.56 s total | |
[ 2023-10-08 01:11:33 ] Completed Epoch: 13 batch 568: forward pass 1,197.166 ms, 1.76 s total | |
[ 2023-10-08 01:11:34 ] Completed Epoch: 13 batch 568: backward pass 181.581 ms, 1.94 s total | |
[ 2023-10-08 01:11:34 ] Completed Epoch: 13 batch 568: computing loss 160.286 ms, 2.10 s total | |
EPOCH: [13], BATCH: [568/889], loss: 0.374, loss_box_reg: 0.112, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 568 | |
[ 2023-10-08 01:11:35 ] Completed saving temp checkpoint 1,027.241 ms, 3.12 s total | |
[ 2023-10-08 01:11:35 ] Completed replacing temp checkpoint with checkpoint 150.079 ms, 3.27 s total | |
[ 2023-10-08 01:11:35 ] Completed Epoch: 13 batch 569: moving batch data to device 3.561 ms, 3.28 s total | |
[ 2023-10-08 01:11:35 ] Completed Epoch: 13 batch 569: forward pass 108.663 ms, 3.39 s total | |
[ 2023-10-08 01:11:35 ] Completed Epoch: 13 batch 569: backward pass 123.645 ms, 3.51 s total | |
[ 2023-10-08 01:11:35 ] Completed Epoch: 13 batch 569: computing loss 99.372 ms, 3.61 s total | |
EPOCH: [13], BATCH: [569/889], loss: 0.380, loss_box_reg: 0.116, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 569 | |
[ 2023-10-08 01:11:36 ] Completed saving temp checkpoint 1,039.206 ms, 4.65 s total | |
[ 2023-10-08 01:11:36 ] Completed replacing temp checkpoint with checkpoint 70.962 ms, 4.72 s total | |
[ 2023-10-08 01:11:36 ] Completed Epoch: 13 batch 570: moving batch data to device 3.317 ms, 4.72 s total | |
[ 2023-10-08 01:11:37 ] Completed Epoch: 13 batch 570: forward pass 109.464 ms, 4.83 s total | |
[ 2023-10-08 01:11:37 ] Completed Epoch: 13 batch 570: backward pass 82.495 ms, 4.92 s total | |
[ 2023-10-08 01:11:37 ] Completed Epoch: 13 batch 570: computing loss 138.979 ms, 5.05 s total | |
EPOCH: [13], BATCH: [570/889], loss: 0.415, loss_box_reg: 0.127, loss_classifier: 0.099, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 570 | |
[ 2023-10-08 01:11:38 ] Completed saving temp checkpoint 1,118.625 ms, 6.17 s total | |
[ 2023-10-08 01:11:38 ] Completed replacing temp checkpoint with checkpoint 47.203 ms, 6.22 s total | |
[ 2023-10-08 01:11:38 ] Completed Epoch: 13 batch 571: moving batch data to device 7.374 ms, 6.23 s total | |
[ 2023-10-08 01:11:38 ] Completed Epoch: 13 batch 571: forward pass 103.848 ms, 6.33 s total | |
[ 2023-10-08 01:11:38 ] Completed Epoch: 13 batch 571: backward pass 49.906 ms, 6.38 s total | |
[ 2023-10-08 01:11:38 ] Completed Epoch: 13 batch 571: computing loss 201.311 ms, 6.58 s total | |
EPOCH: [13], BATCH: [571/889], loss: 0.364, loss_box_reg: 0.105, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.012, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 571 | |
[ 2023-10-08 01:11:39 ] Completed saving temp checkpoint 934.523 ms, 7.52 s total | |
[ 2023-10-08 01:11:39 ] Completed replacing temp checkpoint with checkpoint 66.131 ms, 7.58 s total | |
[ 2023-10-08 01:11:39 ] Completed Epoch: 13 batch 572: moving batch data to device 11.880 ms, 7.59 s total | |
[ 2023-10-08 01:11:39 ] Completed Epoch: 13 batch 572: forward pass 114.110 ms, 7.71 s total | |
[ 2023-10-08 01:11:40 ] Completed Epoch: 13 batch 572: backward pass 76.141 ms, 7.79 s total | |
[ 2023-10-08 01:11:40 ] Completed Epoch: 13 batch 572: computing loss 142.399 ms, 7.93 s total | |
EPOCH: [13], BATCH: [572/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.094, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 572 | |
[ 2023-10-08 01:11:41 ] Completed saving temp checkpoint 1,104.875 ms, 9.03 s total | |
[ 2023-10-08 01:11:41 ] Completed replacing temp checkpoint with checkpoint 75.413 ms, 9.11 s total | |
[ 2023-10-08 01:11:41 ] Completed Epoch: 13 batch 573: moving batch data to device 10.309 ms, 9.12 s total | |
[ 2023-10-08 01:11:41 ] Completed Epoch: 13 batch 573: forward pass 108.839 ms, 9.23 s total | |
[ 2023-10-08 01:11:41 ] Completed Epoch: 13 batch 573: backward pass 79.659 ms, 9.31 s total | |
[ 2023-10-08 01:11:41 ] Completed Epoch: 13 batch 573: computing loss 115.833 ms, 9.42 s total | |
EPOCH: [13], BATCH: [573/889], loss: 0.402, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.129, loss_objectness: 0.020, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 573 | |
[ 2023-10-08 01:11:42 ] Completed saving temp checkpoint 867.507 ms, 10.29 s total | |
[ 2023-10-08 01:11:42 ] Completed replacing temp checkpoint with checkpoint 61.220 ms, 10.35 s total | |
[ 2023-10-08 01:11:42 ] Completed Epoch: 13 batch 574: moving batch data to device 4.876 ms, 10.36 s total | |
[ 2023-10-08 01:11:42 ] Completed Epoch: 13 batch 574: forward pass 105.876 ms, 10.46 s total | |
[ 2023-10-08 01:11:42 ] Completed Epoch: 13 batch 574: backward pass 81.307 ms, 10.54 s total | |
[ 2023-10-08 01:11:42 ] Completed Epoch: 13 batch 574: computing loss 130.309 ms, 10.67 s total | |
EPOCH: [13], BATCH: [574/889], loss: 0.394, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.014, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 574 | |
[ 2023-10-08 01:11:43 ] Completed saving temp checkpoint 990.162 ms, 11.66 s total | |
[ 2023-10-08 01:11:43 ] Completed replacing temp checkpoint with checkpoint 69.417 ms, 11.73 s total | |
[ 2023-10-08 01:11:43 ] Completed Epoch: 13 batch 575: moving batch data to device 9.214 ms, 11.74 s total | |
[ 2023-10-08 01:11:44 ] Completed Epoch: 13 batch 575: forward pass 107.722 ms, 11.85 s total | |
[ 2023-10-08 01:11:44 ] Completed Epoch: 13 batch 575: backward pass 77.984 ms, 11.93 s total | |
[ 2023-10-08 01:11:44 ] Completed Epoch: 13 batch 575: computing loss 117.707 ms, 12.05 s total | |
EPOCH: [13], BATCH: [575/889], loss: 0.346, loss_box_reg: 0.102, loss_classifier: 0.090, loss_mask: 0.120, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 575 | |
[ 2023-10-08 01:11:45 ] Completed saving temp checkpoint 750.570 ms, 12.80 s total | |
[ 2023-10-08 01:11:45 ] Completed replacing temp checkpoint with checkpoint 58.775 ms, 12.86 s total | |
[ 2023-10-08 01:11:45 ] Completed Epoch: 13 batch 576: moving batch data to device 3.474 ms, 12.86 s total | |
[ 2023-10-08 01:11:45 ] Completed Epoch: 13 batch 576: forward pass 105.797 ms, 12.96 s total | |
[ 2023-10-08 01:11:45 ] Completed Epoch: 13 batch 576: backward pass 48.613 ms, 13.01 s total | |
[ 2023-10-08 01:11:45 ] Completed Epoch: 13 batch 576: computing loss 148.538 ms, 13.16 s total | |
EPOCH: [13], BATCH: [576/889], loss: 0.419, loss_box_reg: 0.128, loss_classifier: 0.109, loss_mask: 0.141, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 576 | |
[ 2023-10-08 01:11:46 ] Completed saving temp checkpoint 1,010.494 ms, 14.17 s total | |
[ 2023-10-08 01:11:46 ] Completed replacing temp checkpoint with checkpoint 26.290 ms, 14.20 s total | |
[ 2023-10-08 01:11:46 ] Completed Epoch: 13 batch 577: moving batch data to device 5.020 ms, 14.20 s total | |
[ 2023-10-08 01:11:46 ] Completed Epoch: 13 batch 577: forward pass 104.438 ms, 14.31 s total | |
[ 2023-10-08 01:11:46 ] Completed Epoch: 13 batch 577: backward pass 32.365 ms, 14.34 s total | |
[ 2023-10-08 01:11:46 ] Completed Epoch: 13 batch 577: computing loss 177.551 ms, 14.52 s total | |
EPOCH: [13], BATCH: [577/889], loss: 0.380, loss_box_reg: 0.112, loss_classifier: 0.093, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 577 | |
[ 2023-10-08 01:11:47 ] Completed saving temp checkpoint 766.758 ms, 15.28 s total | |
[ 2023-10-08 01:11:47 ] Completed replacing temp checkpoint with checkpoint 67.515 ms, 15.35 s total | |
[ 2023-10-08 01:11:47 ] Completed Epoch: 13 batch 578: moving batch data to device 9.923 ms, 15.36 s total | |
[ 2023-10-08 01:11:47 ] Completed Epoch: 13 batch 578: forward pass 107.061 ms, 15.47 s total | |
[ 2023-10-08 01:11:47 ] Completed Epoch: 13 batch 578: backward pass 80.133 ms, 15.55 s total | |
[ 2023-10-08 01:11:47 ] Completed Epoch: 13 batch 578: computing loss 112.669 ms, 15.66 s total | |
EPOCH: [13], BATCH: [578/889], loss: 0.394, loss_box_reg: 0.121, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 578 | |
[ 2023-10-08 01:11:49 ] Completed saving temp checkpoint 1,131.025 ms, 16.79 s total | |
[ 2023-10-08 01:11:49 ] Completed replacing temp checkpoint with checkpoint 78.804 ms, 16.87 s total | |
[ 2023-10-08 01:11:49 ] Completed Epoch: 13 batch 579: moving batch data to device 8.305 ms, 16.88 s total | |
[ 2023-10-08 01:11:49 ] Completed Epoch: 13 batch 579: forward pass 104.908 ms, 16.98 s total | |
[ 2023-10-08 01:11:49 ] Completed Epoch: 13 batch 579: backward pass 84.302 ms, 17.07 s total | |
[ 2023-10-08 01:11:49 ] Completed Epoch: 13 batch 579: computing loss 113.865 ms, 17.18 s total | |
EPOCH: [13], BATCH: [579/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.112, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 579 | |
[ 2023-10-08 01:11:51 ] Completed saving temp checkpoint 1,611.714 ms, 18.79 s total | |
[ 2023-10-08 01:11:51 ] Completed replacing temp checkpoint with checkpoint 90.917 ms, 18.89 s total | |
[ 2023-10-08 01:11:51 ] Completed Epoch: 13 batch 580: moving batch data to device 10.034 ms, 18.90 s total | |
[ 2023-10-08 01:11:51 ] Completed Epoch: 13 batch 580: forward pass 107.275 ms, 19.00 s total | |
[ 2023-10-08 01:11:51 ] Completed Epoch: 13 batch 580: backward pass 59.885 ms, 19.06 s total | |
[ 2023-10-08 01:11:51 ] Completed Epoch: 13 batch 580: computing loss 132.287 ms, 19.20 s total | |
EPOCH: [13], BATCH: [580/889], loss: 0.389, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 580 | |
[ 2023-10-08 01:11:53 ] Completed saving temp checkpoint 1,644.729 ms, 20.84 s total | |
[ 2023-10-08 01:11:53 ] Completed replacing temp checkpoint with checkpoint 93.452 ms, 20.93 s total | |
[ 2023-10-08 01:11:53 ] Completed Epoch: 13 batch 581: moving batch data to device 7.687 ms, 20.94 s total | |
[ 2023-10-08 01:11:53 ] Completed Epoch: 13 batch 581: forward pass 111.275 ms, 21.05 s total | |
[ 2023-10-08 01:11:53 ] Completed Epoch: 13 batch 581: backward pass 77.631 ms, 21.13 s total | |
[ 2023-10-08 01:11:53 ] Completed Epoch: 13 batch 581: computing loss 122.912 ms, 21.25 s total | |
EPOCH: [13], BATCH: [581/889], loss: 0.387, loss_box_reg: 0.114, loss_classifier: 0.101, loss_mask: 0.127, loss_objectness: 0.019, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 581 | |
[ 2023-10-08 01:11:55 ] Completed saving temp checkpoint 1,673.379 ms, 22.93 s total | |
[ 2023-10-08 01:11:55 ] Completed replacing temp checkpoint with checkpoint 94.777 ms, 23.02 s total | |
[ 2023-10-08 01:11:55 ] Completed Epoch: 13 batch 582: moving batch data to device 8.137 ms, 23.03 s total | |
[ 2023-10-08 01:11:55 ] Completed Epoch: 13 batch 582: forward pass 105.868 ms, 23.13 s total | |
[ 2023-10-08 01:11:55 ] Completed Epoch: 13 batch 582: backward pass 75.883 ms, 23.21 s total | |
[ 2023-10-08 01:11:55 ] Completed Epoch: 13 batch 582: computing loss 113.943 ms, 23.32 s total | |
EPOCH: [13], BATCH: [582/889], loss: 0.377, loss_box_reg: 0.115, loss_classifier: 0.095, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 582 | |
[ 2023-10-08 01:11:57 ] Completed saving temp checkpoint 1,728.639 ms, 25.05 s total | |
[ 2023-10-08 01:11:57 ] Completed replacing temp checkpoint with checkpoint 110.835 ms, 25.16 s total | |
[ 2023-10-08 01:11:57 ] Completed Epoch: 13 batch 583: moving batch data to device 6.112 ms, 25.17 s total | |
[ 2023-10-08 01:11:57 ] Completed Epoch: 13 batch 583: forward pass 102.162 ms, 25.27 s total | |
[ 2023-10-08 01:11:57 ] Completed Epoch: 13 batch 583: backward pass 45.952 ms, 25.32 s total | |
[ 2023-10-08 01:11:57 ] Completed Epoch: 13 batch 583: computing loss 127.159 ms, 25.45 s total | |
EPOCH: [13], BATCH: [583/889], loss: 0.389, loss_box_reg: 0.119, loss_classifier: 0.102, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 583 | |
[ 2023-10-08 01:11:58 ] Completed saving temp checkpoint 1,247.320 ms, 26.69 s total | |
[ 2023-10-08 01:11:58 ] Completed replacing temp checkpoint with checkpoint 77.278 ms, 26.77 s total | |
[ 2023-10-08 01:11:59 ] Completed Epoch: 13 batch 584: moving batch data to device 9.156 ms, 26.78 s total | |
[ 2023-10-08 01:11:59 ] Completed Epoch: 13 batch 584: forward pass 105.135 ms, 26.88 s total | |
[ 2023-10-08 01:11:59 ] Completed Epoch: 13 batch 584: backward pass 76.140 ms, 26.96 s total | |
[ 2023-10-08 01:11:59 ] Completed Epoch: 13 batch 584: computing loss 120.720 ms, 27.08 s total | |
EPOCH: [13], BATCH: [584/889], loss: 0.361, loss_box_reg: 0.111, loss_classifier: 0.089, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 584 | |
[ 2023-10-08 01:12:00 ] Completed saving temp checkpoint 1,316.857 ms, 28.40 s total | |
[ 2023-10-08 01:12:00 ] Completed replacing temp checkpoint with checkpoint 67.737 ms, 28.47 s total | |
[ 2023-10-08 01:12:00 ] Completed Epoch: 13 batch 585: moving batch data to device 9.579 ms, 28.48 s total | |
[ 2023-10-08 01:12:00 ] Completed Epoch: 13 batch 585: forward pass 107.895 ms, 28.58 s total | |
[ 2023-10-08 01:12:00 ] Completed Epoch: 13 batch 585: backward pass 78.504 ms, 28.66 s total | |
[ 2023-10-08 01:12:01 ] Completed Epoch: 13 batch 585: computing loss 113.883 ms, 28.78 s total | |
EPOCH: [13], BATCH: [585/889], loss: 0.357, loss_box_reg: 0.106, loss_classifier: 0.087, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 585 | |
[ 2023-10-08 01:12:02 ] Completed saving temp checkpoint 1,357.861 ms, 30.13 s total | |
[ 2023-10-08 01:12:02 ] Completed replacing temp checkpoint with checkpoint 90.047 ms, 30.22 s total | |
[ 2023-10-08 01:12:02 ] Completed Epoch: 13 batch 586: moving batch data to device 10.242 ms, 30.23 s total | |
[ 2023-10-08 01:12:02 ] Completed Epoch: 13 batch 586: forward pass 110.906 ms, 30.34 s total | |
[ 2023-10-08 01:12:02 ] Completed Epoch: 13 batch 586: backward pass 76.100 ms, 30.42 s total | |
[ 2023-10-08 01:12:02 ] Completed Epoch: 13 batch 586: computing loss 120.519 ms, 30.54 s total | |
EPOCH: [13], BATCH: [586/889], loss: 0.417, loss_box_reg: 0.125, loss_classifier: 0.106, loss_mask: 0.139, loss_objectness: 0.019, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 586 | |
[ 2023-10-08 01:12:04 ] Completed saving temp checkpoint 1,891.706 ms, 32.43 s total | |
[ 2023-10-08 01:12:04 ] Completed replacing temp checkpoint with checkpoint 81.538 ms, 32.51 s total | |
[ 2023-10-08 01:12:04 ] Completed Epoch: 13 batch 587: moving batch data to device 8.589 ms, 32.52 s total | |
[ 2023-10-08 01:12:04 ] Completed Epoch: 13 batch 587: forward pass 110.865 ms, 32.63 s total | |
[ 2023-10-08 01:12:04 ] Completed Epoch: 13 batch 587: backward pass 37.667 ms, 32.67 s total | |
[ 2023-10-08 01:12:05 ] Completed Epoch: 13 batch 587: computing loss 152.509 ms, 32.82 s total | |
EPOCH: [13], BATCH: [587/889], loss: 0.374, loss_box_reg: 0.105, loss_classifier: 0.090, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 587 | |
[ 2023-10-08 01:12:06 ] Completed saving temp checkpoint 1,229.002 ms, 34.05 s total | |
[ 2023-10-08 01:12:06 ] Completed replacing temp checkpoint with checkpoint 84.910 ms, 34.14 s total | |
[ 2023-10-08 01:12:06 ] Completed Epoch: 13 batch 588: moving batch data to device 8.266 ms, 34.15 s total | |
[ 2023-10-08 01:12:06 ] Completed Epoch: 13 batch 588: forward pass 109.315 ms, 34.26 s total | |
[ 2023-10-08 01:12:06 ] Completed Epoch: 13 batch 588: backward pass 91.337 ms, 34.35 s total | |
[ 2023-10-08 01:12:06 ] Completed Epoch: 13 batch 588: computing loss 109.022 ms, 34.46 s total | |
EPOCH: [13], BATCH: [588/889], loss: 0.357, loss_box_reg: 0.102, loss_classifier: 0.090, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 588 | |
[ 2023-10-08 01:12:08 ] Completed saving temp checkpoint 1,324.829 ms, 35.78 s total | |
[ 2023-10-08 01:12:08 ] Completed replacing temp checkpoint with checkpoint 82.453 ms, 35.86 s total | |
[ 2023-10-08 01:12:08 ] Completed Epoch: 13 batch 589: moving batch data to device 8.088 ms, 35.87 s total | |
[ 2023-10-08 01:12:08 ] Completed Epoch: 13 batch 589: forward pass 103.242 ms, 35.97 s total | |
[ 2023-10-08 01:12:08 ] Completed Epoch: 13 batch 589: backward pass 75.214 ms, 36.05 s total | |
[ 2023-10-08 01:12:08 ] Completed Epoch: 13 batch 589: computing loss 121.111 ms, 36.17 s total | |
EPOCH: [13], BATCH: [589/889], loss: 0.359, loss_box_reg: 0.103, loss_classifier: 0.089, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 589 | |
[ 2023-10-08 01:12:09 ] Completed saving temp checkpoint 1,186.674 ms, 37.36 s total | |
[ 2023-10-08 01:12:09 ] Completed replacing temp checkpoint with checkpoint 59.718 ms, 37.42 s total | |
[ 2023-10-08 01:12:09 ] Completed Epoch: 13 batch 590: moving batch data to device 7.829 ms, 37.43 s total | |
[ 2023-10-08 01:12:09 ] Completed Epoch: 13 batch 590: forward pass 109.138 ms, 37.53 s total | |
[ 2023-10-08 01:12:09 ] Completed Epoch: 13 batch 590: backward pass 40.407 ms, 37.57 s total | |
[ 2023-10-08 01:12:09 ] Completed Epoch: 13 batch 590: computing loss 154.567 ms, 37.73 s total | |
EPOCH: [13], BATCH: [590/889], loss: 0.408, loss_box_reg: 0.122, loss_classifier: 0.107, loss_mask: 0.135, loss_objectness: 0.019, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 590 | |
[ 2023-10-08 01:12:11 ] Completed saving temp checkpoint 1,306.854 ms, 39.04 s total | |
[ 2023-10-08 01:12:11 ] Completed replacing temp checkpoint with checkpoint 87.718 ms, 39.12 s total | |
[ 2023-10-08 01:12:11 ] Completed Epoch: 13 batch 591: moving batch data to device 7.465 ms, 39.13 s total | |
[ 2023-10-08 01:12:11 ] Completed Epoch: 13 batch 591: forward pass 101.864 ms, 39.23 s total | |
[ 2023-10-08 01:12:11 ] Completed Epoch: 13 batch 591: backward pass 70.738 ms, 39.30 s total | |
[ 2023-10-08 01:12:11 ] Completed Epoch: 13 batch 591: computing loss 127.863 ms, 39.43 s total | |
EPOCH: [13], BATCH: [591/889], loss: 0.354, loss_box_reg: 0.106, loss_classifier: 0.088, loss_mask: 0.122, loss_objectness: 0.017, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 591 | |
[ 2023-10-08 01:12:12 ] Completed saving temp checkpoint 1,182.973 ms, 40.61 s total | |
[ 2023-10-08 01:12:12 ] Completed replacing temp checkpoint with checkpoint 73.187 ms, 40.69 s total | |
[ 2023-10-08 01:12:12 ] Completed Epoch: 13 batch 592: moving batch data to device 11.136 ms, 40.70 s total | |
[ 2023-10-08 01:12:13 ] Completed Epoch: 13 batch 592: forward pass 107.166 ms, 40.81 s total | |
[ 2023-10-08 01:12:13 ] Completed Epoch: 13 batch 592: backward pass 48.256 ms, 40.85 s total | |
[ 2023-10-08 01:12:13 ] Completed Epoch: 13 batch 592: computing loss 122.000 ms, 40.98 s total | |
EPOCH: [13], BATCH: [592/889], loss: 0.393, loss_box_reg: 0.115, loss_classifier: 0.094, loss_mask: 0.138, loss_objectness: 0.013, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 592 | |
[ 2023-10-08 01:12:14 ] Completed saving temp checkpoint 1,339.093 ms, 42.32 s total | |
[ 2023-10-08 01:12:14 ] Completed replacing temp checkpoint with checkpoint 63.247 ms, 42.38 s total | |
[ 2023-10-08 01:12:14 ] Completed Epoch: 13 batch 593: moving batch data to device 7.692 ms, 42.39 s total | |
[ 2023-10-08 01:12:14 ] Completed Epoch: 13 batch 593: forward pass 105.662 ms, 42.49 s total | |
[ 2023-10-08 01:12:14 ] Completed Epoch: 13 batch 593: backward pass 70.429 ms, 42.56 s total | |
[ 2023-10-08 01:12:14 ] Completed Epoch: 13 batch 593: computing loss 123.564 ms, 42.69 s total | |
EPOCH: [13], BATCH: [593/889], loss: 0.377, loss_box_reg: 0.111, loss_classifier: 0.096, loss_mask: 0.124, loss_objectness: 0.017, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 593 | |
[ 2023-10-08 01:12:16 ] Completed saving temp checkpoint 1,306.865 ms, 43.99 s total | |
[ 2023-10-08 01:12:16 ] Completed replacing temp checkpoint with checkpoint 88.008 ms, 44.08 s total | |
[ 2023-10-08 01:12:16 ] Completed Epoch: 13 batch 594: moving batch data to device 6.826 ms, 44.09 s total | |
[ 2023-10-08 01:12:16 ] Completed Epoch: 13 batch 594: forward pass 112.707 ms, 44.20 s total | |
[ 2023-10-08 01:12:16 ] Completed Epoch: 13 batch 594: backward pass 81.069 ms, 44.28 s total | |
[ 2023-10-08 01:12:16 ] Completed Epoch: 13 batch 594: computing loss 107.658 ms, 44.39 s total | |
EPOCH: [13], BATCH: [594/889], loss: 0.400, loss_box_reg: 0.118, loss_classifier: 0.101, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 594 | |
[ 2023-10-08 01:12:18 ] Completed saving temp checkpoint 1,589.597 ms, 45.98 s total | |
[ 2023-10-08 01:12:18 ] Completed replacing temp checkpoint with checkpoint 74.895 ms, 46.05 s total | |
[ 2023-10-08 01:12:18 ] Completed Epoch: 13 batch 595: moving batch data to device 8.955 ms, 46.06 s total | |
[ 2023-10-08 01:12:18 ] Completed Epoch: 13 batch 595: forward pass 102.294 ms, 46.17 s total | |
[ 2023-10-08 01:12:18 ] Completed Epoch: 13 batch 595: backward pass 53.456 ms, 46.22 s total | |
[ 2023-10-08 01:12:18 ] Completed Epoch: 13 batch 595: computing loss 136.591 ms, 46.36 s total | |
EPOCH: [13], BATCH: [595/889], loss: 0.378, loss_box_reg: 0.112, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 595 | |
[ 2023-10-08 01:12:20 ] Completed saving temp checkpoint 1,665.882 ms, 48.02 s total | |
[ 2023-10-08 01:12:20 ] Completed replacing temp checkpoint with checkpoint 94.410 ms, 48.12 s total | |
[ 2023-10-08 01:12:20 ] Completed Epoch: 13 batch 596: moving batch data to device 8.423 ms, 48.12 s total | |
[ 2023-10-08 01:12:20 ] Completed Epoch: 13 batch 596: forward pass 105.381 ms, 48.23 s total | |
[ 2023-10-08 01:12:20 ] Completed Epoch: 13 batch 596: backward pass 82.793 ms, 48.31 s total | |
[ 2023-10-08 01:12:20 ] Completed Epoch: 13 batch 596: computing loss 107.747 ms, 48.42 s total | |
EPOCH: [13], BATCH: [596/889], loss: 0.391, loss_box_reg: 0.116, loss_classifier: 0.100, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 596 | |
[ 2023-10-08 01:12:21 ] Completed saving temp checkpoint 1,299.089 ms, 49.72 s total | |
[ 2023-10-08 01:12:22 ] Completed replacing temp checkpoint with checkpoint 58.940 ms, 49.78 s total | |
[ 2023-10-08 01:12:22 ] Completed Epoch: 13 batch 597: moving batch data to device 5.034 ms, 49.78 s total | |
[ 2023-10-08 01:12:22 ] Completed Epoch: 13 batch 597: forward pass 101.933 ms, 49.88 s total | |
[ 2023-10-08 01:12:22 ] Completed Epoch: 13 batch 597: backward pass 73.037 ms, 49.96 s total | |
[ 2023-10-08 01:12:22 ] Completed Epoch: 13 batch 597: computing loss 122.322 ms, 50.08 s total | |
EPOCH: [13], BATCH: [597/889], loss: 0.353, loss_box_reg: 0.106, loss_classifier: 0.083, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 597 | |
[ 2023-10-08 01:12:23 ] Completed saving temp checkpoint 1,098.803 ms, 51.18 s total | |
[ 2023-10-08 01:12:23 ] Completed replacing temp checkpoint with checkpoint 55.744 ms, 51.23 s total | |
[ 2023-10-08 01:12:23 ] Completed Epoch: 13 batch 598: moving batch data to device 6.926 ms, 51.24 s total | |
[ 2023-10-08 01:12:23 ] Completed Epoch: 13 batch 598: forward pass 105.390 ms, 51.35 s total | |
[ 2023-10-08 01:12:23 ] Completed Epoch: 13 batch 598: backward pass 70.342 ms, 51.42 s total | |
[ 2023-10-08 01:12:23 ] Completed Epoch: 13 batch 598: computing loss 123.955 ms, 51.54 s total | |
EPOCH: [13], BATCH: [598/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 598 | |
[ 2023-10-08 01:12:25 ] Completed saving temp checkpoint 1,438.667 ms, 52.98 s total | |
[ 2023-10-08 01:12:25 ] Completed replacing temp checkpoint with checkpoint 73.674 ms, 53.05 s total | |
[ 2023-10-08 01:12:25 ] Completed Epoch: 13 batch 599: moving batch data to device 5.434 ms, 53.06 s total | |
[ 2023-10-08 01:12:25 ] Completed Epoch: 13 batch 599: forward pass 109.976 ms, 53.17 s total | |
[ 2023-10-08 01:12:25 ] Completed Epoch: 13 batch 599: backward pass 38.510 ms, 53.21 s total | |
[ 2023-10-08 01:12:25 ] Completed Epoch: 13 batch 599: computing loss 156.151 ms, 53.36 s total | |
EPOCH: [13], BATCH: [599/889], loss: 0.394, loss_box_reg: 0.120, loss_classifier: 0.096, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 599 | |
[ 2023-10-08 01:12:26 ] Completed saving temp checkpoint 1,381.586 ms, 54.75 s total | |
[ 2023-10-08 01:12:27 ] Completed replacing temp checkpoint with checkpoint 82.685 ms, 54.83 s total | |
[ 2023-10-08 01:12:27 ] Completed Epoch: 13 batch 600: moving batch data to device 7.578 ms, 54.84 s total | |
[ 2023-10-08 01:12:27 ] Completed Epoch: 13 batch 600: forward pass 107.572 ms, 54.94 s total | |
[ 2023-10-08 01:12:27 ] Completed Epoch: 13 batch 600: backward pass 32.713 ms, 54.98 s total | |
[ 2023-10-08 01:12:27 ] Completed Epoch: 13 batch 600: computing loss 161.054 ms, 55.14 s total | |
EPOCH: [13], BATCH: [600/889], loss: 0.376, loss_box_reg: 0.114, loss_classifier: 0.090, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 600 | |
[ 2023-10-08 01:12:28 ] Completed saving temp checkpoint 1,225.152 ms, 56.36 s total | |
[ 2023-10-08 01:12:28 ] Completed replacing temp checkpoint with checkpoint 52.844 ms, 56.41 s total | |
[ 2023-10-08 01:12:28 ] Completed Epoch: 13 batch 601: moving batch data to device 5.269 ms, 56.42 s total | |
[ 2023-10-08 01:12:28 ] Completed Epoch: 13 batch 601: forward pass 105.407 ms, 56.53 s total | |
[ 2023-10-08 01:12:28 ] Completed Epoch: 13 batch 601: backward pass 71.415 ms, 56.60 s total | |
[ 2023-10-08 01:12:28 ] Completed Epoch: 13 batch 601: computing loss 119.052 ms, 56.72 s total | |
EPOCH: [13], BATCH: [601/889], loss: 0.390, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 601 | |
[ 2023-10-08 01:12:29 ] Completed saving temp checkpoint 1,030.369 ms, 57.75 s total | |
[ 2023-10-08 01:12:30 ] Completed replacing temp checkpoint with checkpoint 57.513 ms, 57.80 s total | |
[ 2023-10-08 01:12:30 ] Completed Epoch: 13 batch 602: moving batch data to device 5.043 ms, 57.81 s total | |
[ 2023-10-08 01:12:30 ] Completed Epoch: 13 batch 602: forward pass 104.305 ms, 57.91 s total | |
[ 2023-10-08 01:12:30 ] Completed Epoch: 13 batch 602: backward pass 36.314 ms, 57.95 s total | |
[ 2023-10-08 01:12:30 ] Completed Epoch: 13 batch 602: computing loss 158.053 ms, 58.11 s total | |
EPOCH: [13], BATCH: [602/889], loss: 0.389, loss_box_reg: 0.126, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 602 | |
[ 2023-10-08 01:12:31 ] Completed saving temp checkpoint 1,102.465 ms, 59.21 s total | |
[ 2023-10-08 01:12:31 ] Completed replacing temp checkpoint with checkpoint 47.501 ms, 59.26 s total | |
[ 2023-10-08 01:12:31 ] Completed Epoch: 13 batch 603: moving batch data to device 10.306 ms, 59.27 s total | |
[ 2023-10-08 01:12:31 ] Completed Epoch: 13 batch 603: forward pass 99.718 ms, 59.37 s total | |
[ 2023-10-08 01:12:31 ] Completed Epoch: 13 batch 603: backward pass 57.775 ms, 59.43 s total | |
[ 2023-10-08 01:12:31 ] Completed Epoch: 13 batch 603: computing loss 137.764 ms, 59.56 s total | |
EPOCH: [13], BATCH: [603/889], loss: 0.384, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 603 | |
[ 2023-10-08 01:12:32 ] Completed saving temp checkpoint 1,024.033 ms, 60.59 s total | |
[ 2023-10-08 01:12:32 ] Completed replacing temp checkpoint with checkpoint 78.165 ms, 60.67 s total | |
[ 2023-10-08 01:12:32 ] Completed Epoch: 13 batch 604: moving batch data to device 6.676 ms, 60.67 s total | |
[ 2023-10-08 01:12:33 ] Completed Epoch: 13 batch 604: forward pass 104.553 ms, 60.78 s total | |
[ 2023-10-08 01:12:33 ] Completed Epoch: 13 batch 604: backward pass 46.072 ms, 60.82 s total | |
[ 2023-10-08 01:12:33 ] Completed Epoch: 13 batch 604: computing loss 144.366 ms, 60.97 s total | |
EPOCH: [13], BATCH: [604/889], loss: 0.412, loss_box_reg: 0.124, loss_classifier: 0.106, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 604 | |
[ 2023-10-08 01:12:34 ] Completed saving temp checkpoint 1,196.065 ms, 62.16 s total | |
[ 2023-10-08 01:12:34 ] Completed replacing temp checkpoint with checkpoint 88.165 ms, 62.25 s total | |
[ 2023-10-08 01:12:34 ] Completed Epoch: 13 batch 605: moving batch data to device 6.524 ms, 62.26 s total | |
[ 2023-10-08 01:12:34 ] Completed Epoch: 13 batch 605: forward pass 108.501 ms, 62.37 s total | |
[ 2023-10-08 01:12:34 ] Completed Epoch: 13 batch 605: backward pass 77.017 ms, 62.44 s total | |
[ 2023-10-08 01:12:34 ] Completed Epoch: 13 batch 605: computing loss 115.618 ms, 62.56 s total | |
EPOCH: [13], BATCH: [605/889], loss: 0.369, loss_box_reg: 0.107, loss_classifier: 0.093, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 605 | |
[ 2023-10-08 01:12:35 ] Completed saving temp checkpoint 1,067.041 ms, 63.63 s total | |
[ 2023-10-08 01:12:35 ] Completed replacing temp checkpoint with checkpoint 73.829 ms, 63.70 s total | |
[ 2023-10-08 01:12:35 ] Completed Epoch: 13 batch 606: moving batch data to device 7.558 ms, 63.71 s total | |
[ 2023-10-08 01:12:36 ] Completed Epoch: 13 batch 606: forward pass 111.592 ms, 63.82 s total | |
[ 2023-10-08 01:12:36 ] Completed Epoch: 13 batch 606: backward pass 73.864 ms, 63.89 s total | |
[ 2023-10-08 01:12:36 ] Completed Epoch: 13 batch 606: computing loss 114.185 ms, 64.01 s total | |
EPOCH: [13], BATCH: [606/889], loss: 0.413, loss_box_reg: 0.124, loss_classifier: 0.107, loss_mask: 0.131, loss_objectness: 0.019, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 606 | |
[ 2023-10-08 01:12:37 ] Completed saving temp checkpoint 1,599.594 ms, 65.61 s total | |
[ 2023-10-08 01:12:37 ] Completed replacing temp checkpoint with checkpoint 76.106 ms, 65.68 s total | |
[ 2023-10-08 01:12:37 ] Completed Epoch: 13 batch 607: moving batch data to device 6.000 ms, 65.69 s total | |
[ 2023-10-08 01:12:38 ] Completed Epoch: 13 batch 607: forward pass 110.164 ms, 65.80 s total | |
[ 2023-10-08 01:12:38 ] Completed Epoch: 13 batch 607: backward pass 45.679 ms, 65.84 s total | |
[ 2023-10-08 01:12:38 ] Completed Epoch: 13 batch 607: computing loss 152.217 ms, 66.00 s total | |
EPOCH: [13], BATCH: [607/889], loss: 0.422, loss_box_reg: 0.131, loss_classifier: 0.108, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 607 | |
[ 2023-10-08 01:12:39 ] Completed saving temp checkpoint 1,673.684 ms, 67.67 s total | |
[ 2023-10-08 01:12:39 ] Completed replacing temp checkpoint with checkpoint 57.685 ms, 67.73 s total | |
[ 2023-10-08 01:12:39 ] Completed Epoch: 13 batch 608: moving batch data to device 4.807 ms, 67.73 s total | |
[ 2023-10-08 01:12:40 ] Completed Epoch: 13 batch 608: forward pass 98.680 ms, 67.83 s total | |
[ 2023-10-08 01:12:40 ] Completed Epoch: 13 batch 608: backward pass 33.910 ms, 67.87 s total | |
[ 2023-10-08 01:12:40 ] Completed Epoch: 13 batch 608: computing loss 150.059 ms, 68.02 s total | |
EPOCH: [13], BATCH: [608/889], loss: 0.365, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 608 | |
[ 2023-10-08 01:12:41 ] Completed saving temp checkpoint 1,314.711 ms, 69.33 s total | |
[ 2023-10-08 01:12:41 ] Completed replacing temp checkpoint with checkpoint 53.455 ms, 69.38 s total | |
[ 2023-10-08 01:12:41 ] Completed Epoch: 13 batch 609: moving batch data to device 5.362 ms, 69.39 s total | |
[ 2023-10-08 01:12:41 ] Completed Epoch: 13 batch 609: forward pass 103.191 ms, 69.49 s total | |
[ 2023-10-08 01:12:41 ] Completed Epoch: 13 batch 609: backward pass 50.375 ms, 69.54 s total | |
[ 2023-10-08 01:12:41 ] Completed Epoch: 13 batch 609: computing loss 121.235 ms, 69.66 s total | |
EPOCH: [13], BATCH: [609/889], loss: 0.360, loss_box_reg: 0.111, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 609 | |
[ 2023-10-08 01:12:42 ] Completed saving temp checkpoint 1,054.073 ms, 70.72 s total | |
[ 2023-10-08 01:12:43 ] Completed replacing temp checkpoint with checkpoint 65.187 ms, 70.78 s total | |
[ 2023-10-08 01:12:43 ] Completed Epoch: 13 batch 610: moving batch data to device 5.177 ms, 70.79 s total | |
[ 2023-10-08 01:12:43 ] Completed Epoch: 13 batch 610: forward pass 112.462 ms, 70.90 s total | |
[ 2023-10-08 01:12:43 ] Completed Epoch: 13 batch 610: backward pass 76.176 ms, 70.98 s total | |
[ 2023-10-08 01:12:43 ] Completed Epoch: 13 batch 610: computing loss 117.511 ms, 71.09 s total | |
EPOCH: [13], BATCH: [610/889], loss: 0.373, loss_box_reg: 0.111, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 610 | |
[ 2023-10-08 01:12:44 ] Completed saving temp checkpoint 1,277.562 ms, 72.37 s total | |
[ 2023-10-08 01:12:44 ] Completed replacing temp checkpoint with checkpoint 56.640 ms, 72.43 s total | |
[ 2023-10-08 01:12:44 ] Completed Epoch: 13 batch 611: moving batch data to device 7.401 ms, 72.44 s total | |
[ 2023-10-08 01:12:44 ] Completed Epoch: 13 batch 611: forward pass 104.191 ms, 72.54 s total | |
[ 2023-10-08 01:12:44 ] Completed Epoch: 13 batch 611: backward pass 81.124 ms, 72.62 s total | |
[ 2023-10-08 01:12:44 ] Completed Epoch: 13 batch 611: computing loss 115.952 ms, 72.74 s total | |
EPOCH: [13], BATCH: [611/889], loss: 0.403, loss_box_reg: 0.123, loss_classifier: 0.102, loss_mask: 0.138, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 611 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 01:25:59 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 01:25:59 ] Completed importing Timer 0.022 ms, 0.00 s total | |
[ 2023-10-08 01:25:59 ] Completed importing everything else 509.578 ms, 0.51 s total | |
[ 2023-10-08 01:25:59 ] Completed defined other functions 0.023 ms, 0.51 s total | |
| distributed init (rank 1): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 5): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 01:26:02 ] Completed main preliminaries 3,005.805 ms, 3.52 s total | |
loading annotations into memory... | |
Done (t=11.26s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-08 01:26:15 ] Completed loading data 13,090.801 ms, 16.61 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 01:26:15 ] Completed creating data samplers 109.291 ms, 16.72 s total | |
[ 2023-10-08 01:26:15 ] Completed creating data loaders 0.221 ms, 16.72 s total | |
[ 2023-10-08 01:26:16 ] Completed creating model and .to(device) 652.185 ms, 17.37 s total | |
[ 2023-10-08 01:26:17 ] Completed preparing model for distributed training 1,183.692 ms, 18.55 s total | |
[ 2023-10-08 01:26:17 ] Completed optimizer and scaler 0.556 ms, 18.55 s total | |
[ 2023-10-08 01:26:17 ] Completed learning rate schedulers 0.219 ms, 18.55 s total | |
[ 2023-10-08 01:26:18 ] Completed init coco evaluator 956.387 ms, 19.51 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 01:26:19 ] Completed retrieving checkpoint 897.609 ms, 20.41 s total | |
EPOCH :: 13 | |
[ 2023-10-08 01:26:19 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 01:26:19 ] Completed training preliminaries 1.134 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 611 | |
[ 2023-10-08 01:26:20 ] Completed Epoch: 13 batch 611: moving batch data to device 497.458 ms, 0.50 s total | |
[ 2023-10-08 01:26:21 ] Completed Epoch: 13 batch 611: forward pass 1,049.936 ms, 1.55 s total | |
[ 2023-10-08 01:26:21 ] Completed Epoch: 13 batch 611: backward pass 181.788 ms, 1.73 s total | |
[ 2023-10-08 01:26:21 ] Completed Epoch: 13 batch 611: computing loss 521.949 ms, 2.25 s total | |
EPOCH: [13], BATCH: [611/889], loss: 0.401, loss_box_reg: 0.122, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 611 | |
[ 2023-10-08 01:26:22 ] Completed saving temp checkpoint 1,040.024 ms, 3.29 s total | |
[ 2023-10-08 01:26:23 ] Completed replacing temp checkpoint with checkpoint 185.870 ms, 3.48 s total | |
[ 2023-10-08 01:26:23 ] Completed Epoch: 13 batch 612: moving batch data to device 7.712 ms, 3.49 s total | |
[ 2023-10-08 01:26:23 ] Completed Epoch: 13 batch 612: forward pass 115.920 ms, 3.60 s total | |
[ 2023-10-08 01:26:23 ] Completed Epoch: 13 batch 612: backward pass 99.098 ms, 3.70 s total | |
[ 2023-10-08 01:26:23 ] Completed Epoch: 13 batch 612: computing loss 125.932 ms, 3.83 s total | |
EPOCH: [13], BATCH: [612/889], loss: 0.409, loss_box_reg: 0.124, loss_classifier: 0.104, loss_mask: 0.138, loss_objectness: 0.014, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 612 | |
[ 2023-10-08 01:26:24 ] Completed saving temp checkpoint 926.548 ms, 4.75 s total | |
[ 2023-10-08 01:26:24 ] Completed replacing temp checkpoint with checkpoint 45.651 ms, 4.80 s total | |
[ 2023-10-08 01:26:24 ] Completed Epoch: 13 batch 613: moving batch data to device 9.551 ms, 4.81 s total | |
[ 2023-10-08 01:26:24 ] Completed Epoch: 13 batch 613: forward pass 112.936 ms, 4.92 s total | |
[ 2023-10-08 01:26:24 ] Completed Epoch: 13 batch 613: backward pass 125.017 ms, 5.05 s total | |
[ 2023-10-08 01:26:24 ] Completed Epoch: 13 batch 613: computing loss 91.651 ms, 5.14 s total | |
EPOCH: [13], BATCH: [613/889], loss: 0.408, loss_box_reg: 0.125, loss_classifier: 0.099, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 613 | |
[ 2023-10-08 01:26:26 ] Completed saving temp checkpoint 1,321.734 ms, 6.46 s total | |
[ 2023-10-08 01:26:26 ] Completed replacing temp checkpoint with checkpoint 95.027 ms, 6.55 s total | |
[ 2023-10-08 01:26:26 ] Completed Epoch: 13 batch 614: moving batch data to device 4.308 ms, 6.56 s total | |
[ 2023-10-08 01:26:26 ] Completed Epoch: 13 batch 614: forward pass 196.002 ms, 6.76 s total | |
[ 2023-10-08 01:26:26 ] Completed Epoch: 13 batch 614: backward pass 79.354 ms, 6.83 s total | |
[ 2023-10-08 01:26:26 ] Completed Epoch: 13 batch 614: computing loss 183.138 ms, 7.02 s total | |
EPOCH: [13], BATCH: [614/889], loss: 0.358, loss_box_reg: 0.107, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 614 | |
[ 2023-10-08 01:26:28 ] Completed saving temp checkpoint 1,595.555 ms, 8.61 s total | |
[ 2023-10-08 01:26:28 ] Completed replacing temp checkpoint with checkpoint 108.370 ms, 8.72 s total | |
[ 2023-10-08 01:26:28 ] Completed Epoch: 13 batch 615: moving batch data to device 64.252 ms, 8.79 s total | |
[ 2023-10-08 01:26:28 ] Completed Epoch: 13 batch 615: forward pass 110.577 ms, 8.90 s total | |
[ 2023-10-08 01:26:28 ] Completed Epoch: 13 batch 615: backward pass 82.601 ms, 8.98 s total | |
[ 2023-10-08 01:26:28 ] Completed Epoch: 13 batch 615: computing loss 134.310 ms, 9.11 s total | |
EPOCH: [13], BATCH: [615/889], loss: 0.392, loss_box_reg: 0.115, loss_classifier: 0.096, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 615 | |
[ 2023-10-08 01:26:30 ] Completed saving temp checkpoint 1,661.904 ms, 10.78 s total | |
[ 2023-10-08 01:26:30 ] Completed replacing temp checkpoint with checkpoint 52.645 ms, 10.83 s total | |
[ 2023-10-08 01:26:30 ] Completed Epoch: 13 batch 616: moving batch data to device 4.553 ms, 10.83 s total | |
[ 2023-10-08 01:26:30 ] Completed Epoch: 13 batch 616: forward pass 113.120 ms, 10.95 s total | |
[ 2023-10-08 01:26:30 ] Completed Epoch: 13 batch 616: backward pass 83.281 ms, 11.03 s total | |
[ 2023-10-08 01:26:30 ] Completed Epoch: 13 batch 616: computing loss 126.068 ms, 11.15 s total | |
EPOCH: [13], BATCH: [616/889], loss: 0.350, loss_box_reg: 0.101, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.012, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 616 | |
[ 2023-10-08 01:26:31 ] Completed saving temp checkpoint 1,155.799 ms, 12.31 s total | |
[ 2023-10-08 01:26:31 ] Completed replacing temp checkpoint with checkpoint 79.360 ms, 12.39 s total | |
[ 2023-10-08 01:26:31 ] Completed Epoch: 13 batch 617: moving batch data to device 5.408 ms, 12.40 s total | |
[ 2023-10-08 01:26:32 ] Completed Epoch: 13 batch 617: forward pass 109.354 ms, 12.50 s total | |
[ 2023-10-08 01:26:32 ] Completed Epoch: 13 batch 617: backward pass 76.533 ms, 12.58 s total | |
[ 2023-10-08 01:26:32 ] Completed Epoch: 13 batch 617: computing loss 117.623 ms, 12.70 s total | |
EPOCH: [13], BATCH: [617/889], loss: 0.365, loss_box_reg: 0.109, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 617 | |
[ 2023-10-08 01:26:33 ] Completed saving temp checkpoint 1,271.557 ms, 13.97 s total | |
[ 2023-10-08 01:26:33 ] Completed replacing temp checkpoint with checkpoint 72.455 ms, 14.04 s total | |
[ 2023-10-08 01:26:33 ] Completed Epoch: 13 batch 618: moving batch data to device 3.491 ms, 14.05 s total | |
[ 2023-10-08 01:26:33 ] Completed Epoch: 13 batch 618: forward pass 110.942 ms, 14.16 s total | |
[ 2023-10-08 01:26:33 ] Completed Epoch: 13 batch 618: backward pass 80.904 ms, 14.24 s total | |
[ 2023-10-08 01:26:33 ] Completed Epoch: 13 batch 618: computing loss 136.099 ms, 14.37 s total | |
EPOCH: [13], BATCH: [618/889], loss: 0.395, loss_box_reg: 0.121, loss_classifier: 0.099, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 618 | |
[ 2023-10-08 01:26:35 ] Completed saving temp checkpoint 1,150.054 ms, 15.52 s total | |
[ 2023-10-08 01:26:35 ] Completed replacing temp checkpoint with checkpoint 70.008 ms, 15.59 s total | |
[ 2023-10-08 01:26:35 ] Completed Epoch: 13 batch 619: moving batch data to device 4.430 ms, 15.60 s total | |
[ 2023-10-08 01:26:35 ] Completed Epoch: 13 batch 619: forward pass 100.687 ms, 15.70 s total | |
[ 2023-10-08 01:26:35 ] Completed Epoch: 13 batch 619: backward pass 78.697 ms, 15.78 s total | |
[ 2023-10-08 01:26:35 ] Completed Epoch: 13 batch 619: computing loss 110.857 ms, 15.89 s total | |
EPOCH: [13], BATCH: [619/889], loss: 0.386, loss_box_reg: 0.113, loss_classifier: 0.097, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 619 | |
[ 2023-10-08 01:26:36 ] Completed saving temp checkpoint 1,282.181 ms, 17.17 s total | |
[ 2023-10-08 01:26:36 ] Completed replacing temp checkpoint with checkpoint 57.692 ms, 17.23 s total | |
[ 2023-10-08 01:26:36 ] Completed Epoch: 13 batch 620: moving batch data to device 5.290 ms, 17.23 s total | |
[ 2023-10-08 01:26:36 ] Completed Epoch: 13 batch 620: forward pass 102.107 ms, 17.34 s total | |
[ 2023-10-08 01:26:36 ] Completed Epoch: 13 batch 620: backward pass 80.741 ms, 17.42 s total | |
[ 2023-10-08 01:26:37 ] Completed Epoch: 13 batch 620: computing loss 102.601 ms, 17.52 s total | |
EPOCH: [13], BATCH: [620/889], loss: 0.379, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 620 | |
[ 2023-10-08 01:26:38 ] Completed saving temp checkpoint 1,150.917 ms, 18.67 s total | |
[ 2023-10-08 01:26:38 ] Completed replacing temp checkpoint with checkpoint 51.878 ms, 18.72 s total | |
[ 2023-10-08 01:26:38 ] Completed Epoch: 13 batch 621: moving batch data to device 7.596 ms, 18.73 s total | |
[ 2023-10-08 01:26:38 ] Completed Epoch: 13 batch 621: forward pass 105.605 ms, 18.84 s total | |
[ 2023-10-08 01:26:38 ] Completed Epoch: 13 batch 621: backward pass 38.539 ms, 18.87 s total | |
[ 2023-10-08 01:26:38 ] Completed Epoch: 13 batch 621: computing loss 160.603 ms, 19.03 s total | |
EPOCH: [13], BATCH: [621/889], loss: 0.372, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 621 | |
[ 2023-10-08 01:26:39 ] Completed saving temp checkpoint 1,271.449 ms, 20.31 s total | |
[ 2023-10-08 01:26:39 ] Completed replacing temp checkpoint with checkpoint 66.628 ms, 20.37 s total | |
[ 2023-10-08 01:26:39 ] Completed Epoch: 13 batch 622: moving batch data to device 6.467 ms, 20.38 s total | |
[ 2023-10-08 01:26:40 ] Completed Epoch: 13 batch 622: forward pass 105.812 ms, 20.49 s total | |
[ 2023-10-08 01:26:40 ] Completed Epoch: 13 batch 622: backward pass 80.841 ms, 20.57 s total | |
[ 2023-10-08 01:26:40 ] Completed Epoch: 13 batch 622: computing loss 118.994 ms, 20.69 s total | |
EPOCH: [13], BATCH: [622/889], loss: 0.399, loss_box_reg: 0.122, loss_classifier: 0.098, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 622 | |
[ 2023-10-08 01:26:41 ] Completed saving temp checkpoint 1,686.178 ms, 22.37 s total | |
[ 2023-10-08 01:26:41 ] Completed replacing temp checkpoint with checkpoint 69.133 ms, 22.44 s total | |
[ 2023-10-08 01:26:41 ] Completed Epoch: 13 batch 623: moving batch data to device 5.748 ms, 22.45 s total | |
[ 2023-10-08 01:26:42 ] Completed Epoch: 13 batch 623: forward pass 105.687 ms, 22.55 s total | |
[ 2023-10-08 01:26:42 ] Completed Epoch: 13 batch 623: backward pass 40.227 ms, 22.59 s total | |
[ 2023-10-08 01:26:42 ] Completed Epoch: 13 batch 623: computing loss 323.023 ms, 22.92 s total | |
EPOCH: [13], BATCH: [623/889], loss: 0.373, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 623 | |
[ 2023-10-08 01:26:44 ] Completed saving temp checkpoint 1,631.117 ms, 24.55 s total | |
[ 2023-10-08 01:26:44 ] Completed replacing temp checkpoint with checkpoint 80.850 ms, 24.63 s total | |
[ 2023-10-08 01:26:44 ] Completed Epoch: 13 batch 624: moving batch data to device 4.774 ms, 24.63 s total | |
[ 2023-10-08 01:26:44 ] Completed Epoch: 13 batch 624: forward pass 101.878 ms, 24.73 s total | |
[ 2023-10-08 01:26:44 ] Completed Epoch: 13 batch 624: backward pass 81.224 ms, 24.82 s total | |
[ 2023-10-08 01:26:44 ] Completed Epoch: 13 batch 624: computing loss 112.844 ms, 24.93 s total | |
EPOCH: [13], BATCH: [624/889], loss: 0.394, loss_box_reg: 0.119, loss_classifier: 0.096, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 624 | |
[ 2023-10-08 01:26:45 ] Completed saving temp checkpoint 1,170.873 ms, 26.10 s total | |
[ 2023-10-08 01:26:45 ] Completed replacing temp checkpoint with checkpoint 73.072 ms, 26.17 s total | |
[ 2023-10-08 01:26:45 ] Completed Epoch: 13 batch 625: moving batch data to device 9.069 ms, 26.18 s total | |
[ 2023-10-08 01:26:45 ] Completed Epoch: 13 batch 625: forward pass 102.342 ms, 26.28 s total | |
[ 2023-10-08 01:26:45 ] Completed Epoch: 13 batch 625: backward pass 83.173 ms, 26.37 s total | |
[ 2023-10-08 01:26:46 ] Completed Epoch: 13 batch 625: computing loss 115.859 ms, 26.48 s total | |
EPOCH: [13], BATCH: [625/889], loss: 0.425, loss_box_reg: 0.134, loss_classifier: 0.107, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 625 | |
[ 2023-10-08 01:26:47 ] Completed saving temp checkpoint 1,303.777 ms, 27.79 s total | |
[ 2023-10-08 01:26:47 ] Completed replacing temp checkpoint with checkpoint 59.016 ms, 27.85 s total | |
[ 2023-10-08 01:26:47 ] Completed Epoch: 13 batch 626: moving batch data to device 7.311 ms, 27.85 s total | |
[ 2023-10-08 01:26:47 ] Completed Epoch: 13 batch 626: forward pass 106.887 ms, 27.96 s total | |
[ 2023-10-08 01:26:47 ] Completed Epoch: 13 batch 626: backward pass 76.280 ms, 28.04 s total | |
[ 2023-10-08 01:26:47 ] Completed Epoch: 13 batch 626: computing loss 124.348 ms, 28.16 s total | |
EPOCH: [13], BATCH: [626/889], loss: 0.414, loss_box_reg: 0.127, loss_classifier: 0.105, loss_mask: 0.137, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 626 | |
[ 2023-10-08 01:26:49 ] Completed saving temp checkpoint 1,338.932 ms, 29.50 s total | |
[ 2023-10-08 01:26:49 ] Completed replacing temp checkpoint with checkpoint 70.294 ms, 29.57 s total | |
[ 2023-10-08 01:26:49 ] Completed Epoch: 13 batch 627: moving batch data to device 4.636 ms, 29.57 s total | |
[ 2023-10-08 01:26:49 ] Completed Epoch: 13 batch 627: forward pass 106.219 ms, 29.68 s total | |
[ 2023-10-08 01:26:49 ] Completed Epoch: 13 batch 627: backward pass 78.222 ms, 29.76 s total | |
[ 2023-10-08 01:26:49 ] Completed Epoch: 13 batch 627: computing loss 116.318 ms, 29.87 s total | |
EPOCH: [13], BATCH: [627/889], loss: 0.388, loss_box_reg: 0.116, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 627 | |
[ 2023-10-08 01:26:51 ] Completed saving temp checkpoint 2,039.982 ms, 31.91 s total | |
[ 2023-10-08 01:26:51 ] Completed replacing temp checkpoint with checkpoint 89.102 ms, 32.00 s total | |
[ 2023-10-08 01:26:51 ] Completed Epoch: 13 batch 628: moving batch data to device 6.281 ms, 32.01 s total | |
[ 2023-10-08 01:26:51 ] Completed Epoch: 13 batch 628: forward pass 103.716 ms, 32.11 s total | |
[ 2023-10-08 01:26:51 ] Completed Epoch: 13 batch 628: backward pass 68.459 ms, 32.18 s total | |
[ 2023-10-08 01:26:51 ] Completed Epoch: 13 batch 628: computing loss 125.045 ms, 32.31 s total | |
EPOCH: [13], BATCH: [628/889], loss: 0.418, loss_box_reg: 0.127, loss_classifier: 0.109, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 628 | |
[ 2023-10-08 01:26:53 ] Completed saving temp checkpoint 1,203.892 ms, 33.51 s total | |
[ 2023-10-08 01:26:53 ] Completed replacing temp checkpoint with checkpoint 58.238 ms, 33.57 s total | |
[ 2023-10-08 01:26:53 ] Completed Epoch: 13 batch 629: moving batch data to device 5.605 ms, 33.57 s total | |
[ 2023-10-08 01:26:53 ] Completed Epoch: 13 batch 629: forward pass 103.654 ms, 33.68 s total | |
[ 2023-10-08 01:26:53 ] Completed Epoch: 13 batch 629: backward pass 73.198 ms, 33.75 s total | |
[ 2023-10-08 01:26:53 ] Completed Epoch: 13 batch 629: computing loss 123.245 ms, 33.87 s total | |
EPOCH: [13], BATCH: [629/889], loss: 0.381, loss_box_reg: 0.115, loss_classifier: 0.094, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 629 | |
[ 2023-10-08 01:26:54 ] Completed saving temp checkpoint 1,206.513 ms, 35.08 s total | |
[ 2023-10-08 01:26:54 ] Completed replacing temp checkpoint with checkpoint 31.893 ms, 35.11 s total | |
[ 2023-10-08 01:26:54 ] Completed Epoch: 13 batch 630: moving batch data to device 5.955 ms, 35.12 s total | |
[ 2023-10-08 01:26:54 ] Completed Epoch: 13 batch 630: forward pass 107.414 ms, 35.23 s total | |
[ 2023-10-08 01:26:54 ] Completed Epoch: 13 batch 630: backward pass 74.254 ms, 35.30 s total | |
[ 2023-10-08 01:26:54 ] Completed Epoch: 13 batch 630: computing loss 128.171 ms, 35.43 s total | |
EPOCH: [13], BATCH: [630/889], loss: 0.417, loss_box_reg: 0.132, loss_classifier: 0.105, loss_mask: 0.135, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 630 | |
[ 2023-10-08 01:26:56 ] Completed saving temp checkpoint 1,114.046 ms, 36.54 s total | |
[ 2023-10-08 01:26:56 ] Completed replacing temp checkpoint with checkpoint 56.312 ms, 36.60 s total | |
[ 2023-10-08 01:26:56 ] Completed Epoch: 13 batch 631: moving batch data to device 5.980 ms, 36.61 s total | |
[ 2023-10-08 01:26:56 ] Completed Epoch: 13 batch 631: forward pass 99.688 ms, 36.71 s total | |
[ 2023-10-08 01:26:56 ] Completed Epoch: 13 batch 631: backward pass 75.215 ms, 36.78 s total | |
[ 2023-10-08 01:26:56 ] Completed Epoch: 13 batch 631: computing loss 96.900 ms, 36.88 s total | |
EPOCH: [13], BATCH: [631/889], loss: 0.409, loss_box_reg: 0.121, loss_classifier: 0.107, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 631 | |
[ 2023-10-08 01:26:57 ] Completed saving temp checkpoint 1,219.583 ms, 38.10 s total | |
[ 2023-10-08 01:26:57 ] Completed replacing temp checkpoint with checkpoint 73.627 ms, 38.17 s total | |
[ 2023-10-08 01:26:57 ] Completed Epoch: 13 batch 632: moving batch data to device 8.713 ms, 38.18 s total | |
[ 2023-10-08 01:26:57 ] Completed Epoch: 13 batch 632: forward pass 101.811 ms, 38.28 s total | |
[ 2023-10-08 01:26:57 ] Completed Epoch: 13 batch 632: backward pass 55.110 ms, 38.34 s total | |
[ 2023-10-08 01:26:58 ] Completed Epoch: 13 batch 632: computing loss 143.280 ms, 38.48 s total | |
EPOCH: [13], BATCH: [632/889], loss: 0.373, loss_box_reg: 0.109, loss_classifier: 0.093, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 632 | |
[ 2023-10-08 01:26:59 ] Completed saving temp checkpoint 1,032.813 ms, 39.51 s total | |
[ 2023-10-08 01:26:59 ] Completed replacing temp checkpoint with checkpoint 73.521 ms, 39.59 s total | |
[ 2023-10-08 01:26:59 ] Completed Epoch: 13 batch 633: moving batch data to device 7.703 ms, 39.59 s total | |
[ 2023-10-08 01:26:59 ] Completed Epoch: 13 batch 633: forward pass 105.863 ms, 39.70 s total | |
[ 2023-10-08 01:26:59 ] Completed Epoch: 13 batch 633: backward pass 74.569 ms, 39.77 s total | |
[ 2023-10-08 01:26:59 ] Completed Epoch: 13 batch 633: computing loss 119.626 ms, 39.89 s total | |
EPOCH: [13], BATCH: [633/889], loss: 0.398, loss_box_reg: 0.117, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 633 | |
[ 2023-10-08 01:27:00 ] Completed saving temp checkpoint 1,122.214 ms, 41.02 s total | |
[ 2023-10-08 01:27:00 ] Completed replacing temp checkpoint with checkpoint 65.341 ms, 41.08 s total | |
[ 2023-10-08 01:27:00 ] Completed Epoch: 13 batch 634: moving batch data to device 5.258 ms, 41.09 s total | |
[ 2023-10-08 01:27:00 ] Completed Epoch: 13 batch 634: forward pass 106.438 ms, 41.19 s total | |
[ 2023-10-08 01:27:00 ] Completed Epoch: 13 batch 634: backward pass 78.260 ms, 41.27 s total | |
[ 2023-10-08 01:27:00 ] Completed Epoch: 13 batch 634: computing loss 111.216 ms, 41.38 s total | |
EPOCH: [13], BATCH: [634/889], loss: 0.356, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.127, loss_objectness: 0.013, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 634 | |
[ 2023-10-08 01:27:02 ] Completed saving temp checkpoint 1,449.220 ms, 42.83 s total | |
[ 2023-10-08 01:27:02 ] Completed replacing temp checkpoint with checkpoint 58.707 ms, 42.89 s total | |
[ 2023-10-08 01:27:02 ] Completed Epoch: 13 batch 635: moving batch data to device 7.167 ms, 42.90 s total | |
[ 2023-10-08 01:27:02 ] Completed Epoch: 13 batch 635: forward pass 108.423 ms, 43.01 s total | |
[ 2023-10-08 01:27:02 ] Completed Epoch: 13 batch 635: backward pass 70.360 ms, 43.08 s total | |
[ 2023-10-08 01:27:02 ] Completed Epoch: 13 batch 635: computing loss 123.413 ms, 43.20 s total | |
EPOCH: [13], BATCH: [635/889], loss: 0.375, loss_box_reg: 0.117, loss_classifier: 0.092, loss_mask: 0.133, loss_objectness: 0.014, loss_rpn_box_reg: 0.019 | |
Saving checkpoint at epoch 13 train batch 635 | |
[ 2023-10-08 01:27:04 ] Completed saving temp checkpoint 1,978.731 ms, 45.18 s total | |
[ 2023-10-08 01:27:04 ] Completed replacing temp checkpoint with checkpoint 47.950 ms, 45.23 s total | |
[ 2023-10-08 01:27:04 ] Completed Epoch: 13 batch 636: moving batch data to device 6.907 ms, 45.23 s total | |
[ 2023-10-08 01:27:04 ] Completed Epoch: 13 batch 636: forward pass 107.836 ms, 45.34 s total | |
[ 2023-10-08 01:27:04 ] Completed Epoch: 13 batch 636: backward pass 79.521 ms, 45.42 s total | |
[ 2023-10-08 01:27:05 ] Completed Epoch: 13 batch 636: computing loss 119.533 ms, 45.54 s total | |
EPOCH: [13], BATCH: [636/889], loss: 0.394, loss_box_reg: 0.121, loss_classifier: 0.098, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 636 | |
[ 2023-10-08 01:27:06 ] Completed saving temp checkpoint 1,089.308 ms, 46.63 s total | |
[ 2023-10-08 01:27:06 ] Completed replacing temp checkpoint with checkpoint 60.625 ms, 46.69 s total | |
[ 2023-10-08 01:27:06 ] Completed Epoch: 13 batch 637: moving batch data to device 5.020 ms, 46.69 s total | |
[ 2023-10-08 01:27:06 ] Completed Epoch: 13 batch 637: forward pass 105.400 ms, 46.80 s total | |
[ 2023-10-08 01:27:06 ] Completed Epoch: 13 batch 637: backward pass 81.529 ms, 46.88 s total | |
[ 2023-10-08 01:27:06 ] Completed Epoch: 13 batch 637: computing loss 114.036 ms, 47.00 s total | |
EPOCH: [13], BATCH: [637/889], loss: 0.372, loss_box_reg: 0.114, loss_classifier: 0.093, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 637 | |
[ 2023-10-08 01:27:07 ] Completed saving temp checkpoint 1,155.159 ms, 48.15 s total | |
[ 2023-10-08 01:27:07 ] Completed replacing temp checkpoint with checkpoint 59.118 ms, 48.21 s total | |
[ 2023-10-08 01:27:07 ] Completed Epoch: 13 batch 638: moving batch data to device 4.769 ms, 48.21 s total | |
[ 2023-10-08 01:27:07 ] Completed Epoch: 13 batch 638: forward pass 103.508 ms, 48.32 s total | |
[ 2023-10-08 01:27:07 ] Completed Epoch: 13 batch 638: backward pass 45.875 ms, 48.36 s total | |
[ 2023-10-08 01:27:08 ] Completed Epoch: 13 batch 638: computing loss 153.351 ms, 48.52 s total | |
EPOCH: [13], BATCH: [638/889], loss: 0.361, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.011, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 638 | |
[ 2023-10-08 01:27:09 ] Completed saving temp checkpoint 1,189.140 ms, 49.71 s total | |
[ 2023-10-08 01:27:09 ] Completed replacing temp checkpoint with checkpoint 62.906 ms, 49.77 s total | |
[ 2023-10-08 01:27:09 ] Completed Epoch: 13 batch 639: moving batch data to device 6.781 ms, 49.78 s total | |
[ 2023-10-08 01:27:09 ] Completed Epoch: 13 batch 639: forward pass 103.470 ms, 49.88 s total | |
[ 2023-10-08 01:27:09 ] Completed Epoch: 13 batch 639: backward pass 76.585 ms, 49.96 s total | |
[ 2023-10-08 01:27:09 ] Completed Epoch: 13 batch 639: computing loss 115.785 ms, 50.07 s total | |
EPOCH: [13], BATCH: [639/889], loss: 0.401, loss_box_reg: 0.121, loss_classifier: 0.100, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 639 | |
[ 2023-10-08 01:27:11 ] Completed saving temp checkpoint 1,532.919 ms, 51.61 s total | |
[ 2023-10-08 01:27:11 ] Completed replacing temp checkpoint with checkpoint 83.585 ms, 51.69 s total | |
[ 2023-10-08 01:27:11 ] Completed Epoch: 13 batch 640: moving batch data to device 5.001 ms, 51.69 s total | |
[ 2023-10-08 01:27:11 ] Completed Epoch: 13 batch 640: forward pass 101.052 ms, 51.79 s total | |
[ 2023-10-08 01:27:11 ] Completed Epoch: 13 batch 640: backward pass 74.435 ms, 51.87 s total | |
[ 2023-10-08 01:27:11 ] Completed Epoch: 13 batch 640: computing loss 126.832 ms, 52.00 s total | |
EPOCH: [13], BATCH: [640/889], loss: 0.393, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 640 | |
[ 2023-10-08 01:27:13 ] Completed saving temp checkpoint 1,591.709 ms, 53.59 s total | |
[ 2023-10-08 01:27:13 ] Completed replacing temp checkpoint with checkpoint 101.947 ms, 53.69 s total | |
[ 2023-10-08 01:27:13 ] Completed Epoch: 13 batch 641: moving batch data to device 6.958 ms, 53.70 s total | |
[ 2023-10-08 01:27:13 ] Completed Epoch: 13 batch 641: forward pass 103.299 ms, 53.80 s total | |
[ 2023-10-08 01:27:13 ] Completed Epoch: 13 batch 641: backward pass 42.747 ms, 53.84 s total | |
[ 2023-10-08 01:27:13 ] Completed Epoch: 13 batch 641: computing loss 151.900 ms, 53.99 s total | |
EPOCH: [13], BATCH: [641/889], loss: 0.382, loss_box_reg: 0.117, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 641 | |
[ 2023-10-08 01:27:14 ] Completed saving temp checkpoint 1,218.723 ms, 55.21 s total | |
[ 2023-10-08 01:27:14 ] Completed replacing temp checkpoint with checkpoint 79.785 ms, 55.29 s total | |
[ 2023-10-08 01:27:14 ] Completed Epoch: 13 batch 642: moving batch data to device 8.187 ms, 55.30 s total | |
[ 2023-10-08 01:27:14 ] Completed Epoch: 13 batch 642: forward pass 108.115 ms, 55.41 s total | |
[ 2023-10-08 01:27:15 ] Completed Epoch: 13 batch 642: backward pass 49.136 ms, 55.46 s total | |
[ 2023-10-08 01:27:15 ] Completed Epoch: 13 batch 642: computing loss 123.960 ms, 55.58 s total | |
EPOCH: [13], BATCH: [642/889], loss: 0.435, loss_box_reg: 0.133, loss_classifier: 0.110, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.036 | |
Saving checkpoint at epoch 13 train batch 642 | |
[ 2023-10-08 01:27:16 ] Completed saving temp checkpoint 982.046 ms, 56.56 s total | |
[ 2023-10-08 01:27:16 ] Completed replacing temp checkpoint with checkpoint 70.208 ms, 56.63 s total | |
[ 2023-10-08 01:27:16 ] Completed Epoch: 13 batch 643: moving batch data to device 8.227 ms, 56.64 s total | |
[ 2023-10-08 01:27:16 ] Completed Epoch: 13 batch 643: forward pass 107.122 ms, 56.75 s total | |
[ 2023-10-08 01:27:16 ] Completed Epoch: 13 batch 643: backward pass 71.203 ms, 56.82 s total | |
[ 2023-10-08 01:27:16 ] Completed Epoch: 13 batch 643: computing loss 125.222 ms, 56.95 s total | |
EPOCH: [13], BATCH: [643/889], loss: 0.409, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.141, loss_objectness: 0.015, loss_rpn_box_reg: 0.035 | |
Saving checkpoint at epoch 13 train batch 643 | |
[ 2023-10-08 01:27:17 ] Completed saving temp checkpoint 1,126.647 ms, 58.07 s total | |
[ 2023-10-08 01:27:17 ] Completed replacing temp checkpoint with checkpoint 75.256 ms, 58.15 s total | |
[ 2023-10-08 01:27:17 ] Completed Epoch: 13 batch 644: moving batch data to device 8.355 ms, 58.16 s total | |
[ 2023-10-08 01:27:17 ] Completed Epoch: 13 batch 644: forward pass 109.206 ms, 58.27 s total | |
[ 2023-10-08 01:27:17 ] Completed Epoch: 13 batch 644: backward pass 39.945 ms, 58.31 s total | |
[ 2023-10-08 01:27:18 ] Completed Epoch: 13 batch 644: computing loss 149.148 ms, 58.46 s total | |
EPOCH: [13], BATCH: [644/889], loss: 0.394, loss_box_reg: 0.118, loss_classifier: 0.103, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 644 | |
[ 2023-10-08 01:27:18 ] Completed saving temp checkpoint 970.100 ms, 59.43 s total | |
[ 2023-10-08 01:27:19 ] Completed replacing temp checkpoint with checkpoint 66.525 ms, 59.49 s total | |
[ 2023-10-08 01:27:19 ] Completed Epoch: 13 batch 645: moving batch data to device 7.772 ms, 59.50 s total | |
[ 2023-10-08 01:27:19 ] Completed Epoch: 13 batch 645: forward pass 106.133 ms, 59.61 s total | |
[ 2023-10-08 01:27:19 ] Completed Epoch: 13 batch 645: backward pass 76.767 ms, 59.68 s total | |
[ 2023-10-08 01:27:19 ] Completed Epoch: 13 batch 645: computing loss 114.438 ms, 59.80 s total | |
EPOCH: [13], BATCH: [645/889], loss: 0.386, loss_box_reg: 0.118, loss_classifier: 0.092, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 645 | |
[ 2023-10-08 01:27:20 ] Completed saving temp checkpoint 1,147.398 ms, 60.94 s total | |
[ 2023-10-08 01:27:20 ] Completed replacing temp checkpoint with checkpoint 54.441 ms, 61.00 s total | |
[ 2023-10-08 01:27:20 ] Completed Epoch: 13 batch 646: moving batch data to device 6.480 ms, 61.01 s total | |
[ 2023-10-08 01:27:20 ] Completed Epoch: 13 batch 646: forward pass 103.225 ms, 61.11 s total | |
[ 2023-10-08 01:27:20 ] Completed Epoch: 13 batch 646: backward pass 40.453 ms, 61.15 s total | |
[ 2023-10-08 01:27:20 ] Completed Epoch: 13 batch 646: computing loss 128.768 ms, 61.28 s total | |
EPOCH: [13], BATCH: [646/889], loss: 0.387, loss_box_reg: 0.115, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 646 | |
[ 2023-10-08 01:27:21 ] Completed saving temp checkpoint 1,164.935 ms, 62.44 s total | |
[ 2023-10-08 01:27:22 ] Completed replacing temp checkpoint with checkpoint 81.469 ms, 62.52 s total | |
[ 2023-10-08 01:27:22 ] Completed Epoch: 13 batch 647: moving batch data to device 6.675 ms, 62.53 s total | |
[ 2023-10-08 01:27:22 ] Completed Epoch: 13 batch 647: forward pass 112.499 ms, 62.64 s total | |
[ 2023-10-08 01:27:22 ] Completed Epoch: 13 batch 647: backward pass 72.278 ms, 62.72 s total | |
[ 2023-10-08 01:27:22 ] Completed Epoch: 13 batch 647: computing loss 125.817 ms, 62.84 s total | |
EPOCH: [13], BATCH: [647/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.097, loss_mask: 0.130, loss_objectness: 0.012, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 647 | |
[ 2023-10-08 01:27:23 ] Completed saving temp checkpoint 1,533.710 ms, 64.38 s total | |
[ 2023-10-08 01:27:24 ] Completed replacing temp checkpoint with checkpoint 74.761 ms, 64.45 s total | |
[ 2023-10-08 01:27:24 ] Completed Epoch: 13 batch 648: moving batch data to device 5.846 ms, 64.46 s total | |
[ 2023-10-08 01:27:24 ] Completed Epoch: 13 batch 648: forward pass 104.873 ms, 64.56 s total | |
[ 2023-10-08 01:27:24 ] Completed Epoch: 13 batch 648: backward pass 73.055 ms, 64.63 s total | |
[ 2023-10-08 01:27:24 ] Completed Epoch: 13 batch 648: computing loss 123.586 ms, 64.76 s total | |
EPOCH: [13], BATCH: [648/889], loss: 0.376, loss_box_reg: 0.111, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.019, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 648 | |
[ 2023-10-08 01:27:26 ] Completed saving temp checkpoint 1,706.145 ms, 66.46 s total | |
[ 2023-10-08 01:27:26 ] Completed replacing temp checkpoint with checkpoint 92.157 ms, 66.56 s total | |
[ 2023-10-08 01:27:26 ] Completed Epoch: 13 batch 649: moving batch data to device 11.081 ms, 66.57 s total | |
[ 2023-10-08 01:27:26 ] Completed Epoch: 13 batch 649: forward pass 103.347 ms, 66.67 s total | |
[ 2023-10-08 01:27:26 ] Completed Epoch: 13 batch 649: backward pass 71.364 ms, 66.74 s total | |
[ 2023-10-08 01:27:26 ] Completed Epoch: 13 batch 649: computing loss 166.295 ms, 66.91 s total | |
EPOCH: [13], BATCH: [649/889], loss: 0.393, loss_box_reg: 0.113, loss_classifier: 0.096, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.037 | |
Saving checkpoint at epoch 13 train batch 649 | |
[ 2023-10-08 01:27:27 ] Completed saving temp checkpoint 1,181.257 ms, 68.09 s total | |
[ 2023-10-08 01:27:27 ] Completed replacing temp checkpoint with checkpoint 62.903 ms, 68.15 s total | |
[ 2023-10-08 01:27:27 ] Completed Epoch: 13 batch 650: moving batch data to device 5.299 ms, 68.16 s total | |
[ 2023-10-08 01:27:27 ] Completed Epoch: 13 batch 650: forward pass 103.136 ms, 68.26 s total | |
[ 2023-10-08 01:27:27 ] Completed Epoch: 13 batch 650: backward pass 34.361 ms, 68.29 s total | |
[ 2023-10-08 01:27:28 ] Completed Epoch: 13 batch 650: computing loss 165.205 ms, 68.46 s total | |
EPOCH: [13], BATCH: [650/889], loss: 0.372, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.020, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 650 | |
[ 2023-10-08 01:27:28 ] Completed saving temp checkpoint 976.753 ms, 69.44 s total | |
[ 2023-10-08 01:27:29 ] Completed replacing temp checkpoint with checkpoint 71.827 ms, 69.51 s total | |
[ 2023-10-08 01:27:29 ] Completed Epoch: 13 batch 651: moving batch data to device 7.175 ms, 69.52 s total | |
[ 2023-10-08 01:27:29 ] Completed Epoch: 13 batch 651: forward pass 102.954 ms, 69.62 s total | |
[ 2023-10-08 01:27:29 ] Completed Epoch: 13 batch 651: backward pass 34.690 ms, 69.65 s total | |
[ 2023-10-08 01:27:29 ] Completed Epoch: 13 batch 651: computing loss 168.277 ms, 69.82 s total | |
EPOCH: [13], BATCH: [651/889], loss: 0.405, loss_box_reg: 0.124, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 651 | |
[ 2023-10-08 01:27:30 ] Completed saving temp checkpoint 1,567.003 ms, 71.39 s total | |
[ 2023-10-08 01:27:31 ] Completed replacing temp checkpoint with checkpoint 97.423 ms, 71.49 s total | |
[ 2023-10-08 01:27:31 ] Completed Epoch: 13 batch 652: moving batch data to device 6.931 ms, 71.49 s total | |
[ 2023-10-08 01:27:31 ] Completed Epoch: 13 batch 652: forward pass 108.909 ms, 71.60 s total | |
[ 2023-10-08 01:27:31 ] Completed Epoch: 13 batch 652: backward pass 47.988 ms, 71.65 s total | |
[ 2023-10-08 01:27:31 ] Completed Epoch: 13 batch 652: computing loss 147.720 ms, 71.80 s total | |
EPOCH: [13], BATCH: [652/889], loss: 0.382, loss_box_reg: 0.115, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 652 | |
[ 2023-10-08 01:27:32 ] Completed saving temp checkpoint 1,358.632 ms, 73.16 s total | |
[ 2023-10-08 01:27:32 ] Completed replacing temp checkpoint with checkpoint 75.351 ms, 73.23 s total | |
[ 2023-10-08 01:27:32 ] Completed Epoch: 13 batch 653: moving batch data to device 7.899 ms, 73.24 s total | |
[ 2023-10-08 01:27:32 ] Completed Epoch: 13 batch 653: forward pass 106.045 ms, 73.35 s total | |
[ 2023-10-08 01:27:32 ] Completed Epoch: 13 batch 653: backward pass 76.226 ms, 73.42 s total | |
[ 2023-10-08 01:27:33 ] Completed Epoch: 13 batch 653: computing loss 116.567 ms, 73.54 s total | |
EPOCH: [13], BATCH: [653/889], loss: 0.398, loss_box_reg: 0.121, loss_classifier: 0.100, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 653 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 01:40:57 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 01:40:57 ] Completed importing Timer 0.021 ms, 0.00 s total | |
[ 2023-10-08 01:40:58 ] Completed importing everything else 554.587 ms, 0.55 s total | |
[ 2023-10-08 01:40:58 ] Completed defined other functions 0.021 ms, 0.55 s total | |
| distributed init (rank 2): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 4): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 01:41:05 ] Completed main preliminaries 7,827.309 ms, 8.38 s total | |
loading annotations into memory... | |
Done (t=10.77s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-08 01:41:18 ] Completed loading data 12,577.176 ms, 20.96 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 01:41:18 ] Completed creating data samplers 105.694 ms, 21.06 s total | |
[ 2023-10-08 01:41:18 ] Completed creating data loaders 0.222 ms, 21.07 s total | |
[ 2023-10-08 01:41:19 ] Completed creating model and .to(device) 652.205 ms, 21.72 s total | |
[ 2023-10-08 01:41:20 ] Completed preparing model for distributed training 1,680.751 ms, 23.40 s total | |
[ 2023-10-08 01:41:20 ] Completed optimizer and scaler 0.622 ms, 23.40 s total | |
[ 2023-10-08 01:41:20 ] Completed learning rate schedulers 0.247 ms, 23.40 s total | |
[ 2023-10-08 01:41:21 ] Completed init coco evaluator 960.240 ms, 24.36 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 01:41:22 ] Completed retrieving checkpoint 875.605 ms, 25.23 s total | |
EPOCH :: 13 | |
[ 2023-10-08 01:41:22 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 01:41:22 ] Completed training preliminaries 0.851 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 653 | |
[ 2023-10-08 01:41:23 ] Completed Epoch: 13 batch 653: moving batch data to device 444.854 ms, 0.45 s total | |
[ 2023-10-08 01:41:24 ] Completed Epoch: 13 batch 653: forward pass 1,278.660 ms, 1.72 s total | |
[ 2023-10-08 01:41:24 ] Completed Epoch: 13 batch 653: backward pass 172.484 ms, 1.90 s total | |
[ 2023-10-08 01:41:25 ] Completed Epoch: 13 batch 653: computing loss 440.910 ms, 2.34 s total | |
EPOCH: [13], BATCH: [653/889], loss: 0.396, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 653 | |
[ 2023-10-08 01:41:25 ] Completed saving temp checkpoint 947.700 ms, 3.29 s total | |
[ 2023-10-08 01:41:26 ] Completed replacing temp checkpoint with checkpoint 153.532 ms, 3.44 s total | |
[ 2023-10-08 01:41:26 ] Completed Epoch: 13 batch 654: moving batch data to device 3.544 ms, 3.44 s total | |
[ 2023-10-08 01:41:26 ] Completed Epoch: 13 batch 654: forward pass 174.938 ms, 3.62 s total | |
[ 2023-10-08 01:41:26 ] Completed Epoch: 13 batch 654: backward pass 69.700 ms, 3.69 s total | |
[ 2023-10-08 01:41:26 ] Completed Epoch: 13 batch 654: computing loss 217.201 ms, 3.90 s total | |
EPOCH: [13], BATCH: [654/889], loss: 0.375, loss_box_reg: 0.111, loss_classifier: 0.100, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 654 | |
[ 2023-10-08 01:41:27 ] Completed saving temp checkpoint 1,100.004 ms, 5.00 s total | |
[ 2023-10-08 01:41:27 ] Completed replacing temp checkpoint with checkpoint 74.076 ms, 5.08 s total | |
[ 2023-10-08 01:41:27 ] Completed Epoch: 13 batch 655: moving batch data to device 49.885 ms, 5.13 s total | |
[ 2023-10-08 01:41:27 ] Completed Epoch: 13 batch 655: forward pass 122.197 ms, 5.25 s total | |
[ 2023-10-08 01:41:28 ] Completed Epoch: 13 batch 655: backward pass 92.052 ms, 5.34 s total | |
[ 2023-10-08 01:41:28 ] Completed Epoch: 13 batch 655: computing loss 128.587 ms, 5.47 s total | |
EPOCH: [13], BATCH: [655/889], loss: 0.358, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.120, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 655 | |
[ 2023-10-08 01:41:29 ] Completed saving temp checkpoint 971.711 ms, 6.44 s total | |
[ 2023-10-08 01:41:29 ] Completed replacing temp checkpoint with checkpoint 34.083 ms, 6.48 s total | |
[ 2023-10-08 01:41:29 ] Completed Epoch: 13 batch 656: moving batch data to device 2.167 ms, 6.48 s total | |
[ 2023-10-08 01:41:29 ] Completed Epoch: 13 batch 656: forward pass 210.397 ms, 6.69 s total | |
[ 2023-10-08 01:41:29 ] Completed Epoch: 13 batch 656: backward pass 79.622 ms, 6.77 s total | |
[ 2023-10-08 01:41:29 ] Completed Epoch: 13 batch 656: computing loss 133.401 ms, 6.90 s total | |
EPOCH: [13], BATCH: [656/889], loss: 0.400, loss_box_reg: 0.119, loss_classifier: 0.106, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 656 | |
[ 2023-10-08 01:41:30 ] Completed saving temp checkpoint 942.621 ms, 7.85 s total | |
[ 2023-10-08 01:41:30 ] Completed replacing temp checkpoint with checkpoint 45.975 ms, 7.89 s total | |
[ 2023-10-08 01:41:30 ] Completed Epoch: 13 batch 657: moving batch data to device 8.210 ms, 7.90 s total | |
[ 2023-10-08 01:41:30 ] Completed Epoch: 13 batch 657: forward pass 104.455 ms, 8.00 s total | |
[ 2023-10-08 01:41:30 ] Completed Epoch: 13 batch 657: backward pass 38.325 ms, 8.04 s total | |
[ 2023-10-08 01:41:30 ] Completed Epoch: 13 batch 657: computing loss 177.051 ms, 8.22 s total | |
EPOCH: [13], BATCH: [657/889], loss: 0.380, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 657 | |
[ 2023-10-08 01:41:31 ] Completed saving temp checkpoint 784.786 ms, 9.00 s total | |
[ 2023-10-08 01:41:31 ] Completed replacing temp checkpoint with checkpoint 68.837 ms, 9.07 s total | |
[ 2023-10-08 01:41:31 ] Completed Epoch: 13 batch 658: moving batch data to device 4.648 ms, 9.08 s total | |
[ 2023-10-08 01:41:31 ] Completed Epoch: 13 batch 658: forward pass 111.186 ms, 9.19 s total | |
[ 2023-10-08 01:41:31 ] Completed Epoch: 13 batch 658: backward pass 82.817 ms, 9.27 s total | |
[ 2023-10-08 01:41:32 ] Completed Epoch: 13 batch 658: computing loss 240.436 ms, 9.51 s total | |
EPOCH: [13], BATCH: [658/889], loss: 0.402, loss_box_reg: 0.122, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 658 | |
[ 2023-10-08 01:41:33 ] Completed saving temp checkpoint 818.963 ms, 10.33 s total | |
[ 2023-10-08 01:41:33 ] Completed replacing temp checkpoint with checkpoint 60.273 ms, 10.39 s total | |
[ 2023-10-08 01:41:33 ] Completed Epoch: 13 batch 659: moving batch data to device 2.434 ms, 10.39 s total | |
[ 2023-10-08 01:41:33 ] Completed Epoch: 13 batch 659: forward pass 100.733 ms, 10.49 s total | |
[ 2023-10-08 01:41:33 ] Completed Epoch: 13 batch 659: backward pass 47.044 ms, 10.54 s total | |
[ 2023-10-08 01:41:33 ] Completed Epoch: 13 batch 659: computing loss 160.422 ms, 10.70 s total | |
EPOCH: [13], BATCH: [659/889], loss: 0.399, loss_box_reg: 0.120, loss_classifier: 0.096, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 659 | |
[ 2023-10-08 01:41:34 ] Completed saving temp checkpoint 1,434.919 ms, 12.14 s total | |
[ 2023-10-08 01:41:34 ] Completed replacing temp checkpoint with checkpoint 97.156 ms, 12.23 s total | |
[ 2023-10-08 01:41:34 ] Completed Epoch: 13 batch 660: moving batch data to device 10.033 ms, 12.24 s total | |
[ 2023-10-08 01:41:35 ] Completed Epoch: 13 batch 660: forward pass 111.820 ms, 12.36 s total | |
[ 2023-10-08 01:41:35 ] Completed Epoch: 13 batch 660: backward pass 73.202 ms, 12.43 s total | |
[ 2023-10-08 01:41:35 ] Completed Epoch: 13 batch 660: computing loss 124.009 ms, 12.55 s total | |
EPOCH: [13], BATCH: [660/889], loss: 0.393, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 660 | |
[ 2023-10-08 01:41:37 ] Completed saving temp checkpoint 1,827.780 ms, 14.38 s total | |
[ 2023-10-08 01:41:37 ] Completed replacing temp checkpoint with checkpoint 99.834 ms, 14.48 s total | |
[ 2023-10-08 01:41:37 ] Completed Epoch: 13 batch 661: moving batch data to device 2.971 ms, 14.48 s total | |
[ 2023-10-08 01:41:37 ] Completed Epoch: 13 batch 661: forward pass 104.069 ms, 14.59 s total | |
[ 2023-10-08 01:41:37 ] Completed Epoch: 13 batch 661: backward pass 84.118 ms, 14.67 s total | |
[ 2023-10-08 01:41:37 ] Completed Epoch: 13 batch 661: computing loss 108.580 ms, 14.78 s total | |
EPOCH: [13], BATCH: [661/889], loss: 0.398, loss_box_reg: 0.119, loss_classifier: 0.104, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 661 | |
[ 2023-10-08 01:41:39 ] Completed saving temp checkpoint 1,712.442 ms, 16.49 s total | |
[ 2023-10-08 01:41:39 ] Completed replacing temp checkpoint with checkpoint 59.641 ms, 16.55 s total | |
[ 2023-10-08 01:41:39 ] Completed Epoch: 13 batch 662: moving batch data to device 6.984 ms, 16.56 s total | |
[ 2023-10-08 01:41:39 ] Completed Epoch: 13 batch 662: forward pass 106.922 ms, 16.67 s total | |
[ 2023-10-08 01:41:39 ] Completed Epoch: 13 batch 662: backward pass 69.469 ms, 16.74 s total | |
[ 2023-10-08 01:41:39 ] Completed Epoch: 13 batch 662: computing loss 119.709 ms, 16.86 s total | |
EPOCH: [13], BATCH: [662/889], loss: 0.383, loss_box_reg: 0.110, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 662 | |
[ 2023-10-08 01:41:41 ] Completed saving temp checkpoint 1,648.998 ms, 18.50 s total | |
[ 2023-10-08 01:41:41 ] Completed replacing temp checkpoint with checkpoint 68.605 ms, 18.57 s total | |
[ 2023-10-08 01:41:41 ] Completed Epoch: 13 batch 663: moving batch data to device 8.329 ms, 18.58 s total | |
[ 2023-10-08 01:41:41 ] Completed Epoch: 13 batch 663: forward pass 109.012 ms, 18.69 s total | |
[ 2023-10-08 01:41:41 ] Completed Epoch: 13 batch 663: backward pass 38.515 ms, 18.73 s total | |
[ 2023-10-08 01:41:41 ] Completed Epoch: 13 batch 663: computing loss 160.636 ms, 18.89 s total | |
EPOCH: [13], BATCH: [663/889], loss: 0.387, loss_box_reg: 0.119, loss_classifier: 0.101, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 663 | |
[ 2023-10-08 01:41:42 ] Completed saving temp checkpoint 1,071.451 ms, 19.96 s total | |
[ 2023-10-08 01:41:42 ] Completed replacing temp checkpoint with checkpoint 71.871 ms, 20.03 s total | |
[ 2023-10-08 01:41:42 ] Completed Epoch: 13 batch 664: moving batch data to device 8.385 ms, 20.04 s total | |
[ 2023-10-08 01:41:42 ] Completed Epoch: 13 batch 664: forward pass 105.134 ms, 20.15 s total | |
[ 2023-10-08 01:41:42 ] Completed Epoch: 13 batch 664: backward pass 74.512 ms, 20.22 s total | |
[ 2023-10-08 01:41:43 ] Completed Epoch: 13 batch 664: computing loss 122.502 ms, 20.34 s total | |
EPOCH: [13], BATCH: [664/889], loss: 0.356, loss_box_reg: 0.105, loss_classifier: 0.093, loss_mask: 0.128, loss_objectness: 0.011, loss_rpn_box_reg: 0.019 | |
Saving checkpoint at epoch 13 train batch 664 | |
[ 2023-10-08 01:41:44 ] Completed saving temp checkpoint 1,198.784 ms, 21.54 s total | |
[ 2023-10-08 01:41:44 ] Completed replacing temp checkpoint with checkpoint 46.166 ms, 21.59 s total | |
[ 2023-10-08 01:41:44 ] Completed Epoch: 13 batch 665: moving batch data to device 5.238 ms, 21.59 s total | |
[ 2023-10-08 01:41:44 ] Completed Epoch: 13 batch 665: forward pass 106.567 ms, 21.70 s total | |
[ 2023-10-08 01:41:44 ] Completed Epoch: 13 batch 665: backward pass 78.298 ms, 21.78 s total | |
[ 2023-10-08 01:41:44 ] Completed Epoch: 13 batch 665: computing loss 121.125 ms, 21.90 s total | |
EPOCH: [13], BATCH: [665/889], loss: 0.392, loss_box_reg: 0.116, loss_classifier: 0.101, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 665 | |
[ 2023-10-08 01:41:45 ] Completed saving temp checkpoint 1,030.479 ms, 22.93 s total | |
[ 2023-10-08 01:41:45 ] Completed replacing temp checkpoint with checkpoint 72.631 ms, 23.00 s total | |
[ 2023-10-08 01:41:45 ] Completed Epoch: 13 batch 666: moving batch data to device 9.490 ms, 23.01 s total | |
[ 2023-10-08 01:41:45 ] Completed Epoch: 13 batch 666: forward pass 107.482 ms, 23.12 s total | |
[ 2023-10-08 01:41:45 ] Completed Epoch: 13 batch 666: backward pass 72.195 ms, 23.19 s total | |
[ 2023-10-08 01:41:46 ] Completed Epoch: 13 batch 666: computing loss 299.857 ms, 23.49 s total | |
EPOCH: [13], BATCH: [666/889], loss: 0.433, loss_box_reg: 0.134, loss_classifier: 0.117, loss_mask: 0.139, loss_objectness: 0.019, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 666 | |
[ 2023-10-08 01:41:47 ] Completed saving temp checkpoint 1,111.747 ms, 24.60 s total | |
[ 2023-10-08 01:41:47 ] Completed replacing temp checkpoint with checkpoint 63.096 ms, 24.67 s total | |
[ 2023-10-08 01:41:47 ] Completed Epoch: 13 batch 667: moving batch data to device 7.458 ms, 24.67 s total | |
[ 2023-10-08 01:41:47 ] Completed Epoch: 13 batch 667: forward pass 108.927 ms, 24.78 s total | |
[ 2023-10-08 01:41:47 ] Completed Epoch: 13 batch 667: backward pass 32.682 ms, 24.82 s total | |
[ 2023-10-08 01:41:47 ] Completed Epoch: 13 batch 667: computing loss 158.988 ms, 24.97 s total | |
EPOCH: [13], BATCH: [667/889], loss: 0.401, loss_box_reg: 0.120, loss_classifier: 0.098, loss_mask: 0.135, loss_objectness: 0.019, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 667 | |
[ 2023-10-08 01:41:48 ] Completed saving temp checkpoint 1,016.217 ms, 25.99 s total | |
[ 2023-10-08 01:41:48 ] Completed replacing temp checkpoint with checkpoint 70.311 ms, 26.06 s total | |
[ 2023-10-08 01:41:48 ] Completed Epoch: 13 batch 668: moving batch data to device 7.867 ms, 26.07 s total | |
[ 2023-10-08 01:41:48 ] Completed Epoch: 13 batch 668: forward pass 102.700 ms, 26.17 s total | |
[ 2023-10-08 01:41:48 ] Completed Epoch: 13 batch 668: backward pass 78.047 ms, 26.25 s total | |
[ 2023-10-08 01:41:49 ] Completed Epoch: 13 batch 668: computing loss 128.564 ms, 26.38 s total | |
EPOCH: [13], BATCH: [668/889], loss: 0.362, loss_box_reg: 0.104, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 668 | |
[ 2023-10-08 01:41:50 ] Completed saving temp checkpoint 1,069.003 ms, 27.45 s total | |
[ 2023-10-08 01:41:50 ] Completed replacing temp checkpoint with checkpoint 43.315 ms, 27.49 s total | |
[ 2023-10-08 01:41:50 ] Completed Epoch: 13 batch 669: moving batch data to device 8.262 ms, 27.50 s total | |
[ 2023-10-08 01:41:50 ] Completed Epoch: 13 batch 669: forward pass 108.305 ms, 27.61 s total | |
[ 2023-10-08 01:41:50 ] Completed Epoch: 13 batch 669: backward pass 78.412 ms, 27.69 s total | |
[ 2023-10-08 01:41:50 ] Completed Epoch: 13 batch 669: computing loss 116.174 ms, 27.80 s total | |
EPOCH: [13], BATCH: [669/889], loss: 0.393, loss_box_reg: 0.121, loss_classifier: 0.102, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 669 | |
[ 2023-10-08 01:41:51 ] Completed saving temp checkpoint 995.723 ms, 28.80 s total | |
[ 2023-10-08 01:41:51 ] Completed replacing temp checkpoint with checkpoint 46.502 ms, 28.84 s total | |
[ 2023-10-08 01:41:51 ] Completed Epoch: 13 batch 670: moving batch data to device 7.742 ms, 28.85 s total | |
[ 2023-10-08 01:41:51 ] Completed Epoch: 13 batch 670: forward pass 123.522 ms, 28.98 s total | |
[ 2023-10-08 01:41:51 ] Completed Epoch: 13 batch 670: backward pass 47.861 ms, 29.02 s total | |
[ 2023-10-08 01:41:51 ] Completed Epoch: 13 batch 670: computing loss 146.909 ms, 29.17 s total | |
EPOCH: [13], BATCH: [670/889], loss: 0.357, loss_box_reg: 0.107, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 670 | |
[ 2023-10-08 01:41:52 ] Completed saving temp checkpoint 1,040.632 ms, 30.21 s total | |
[ 2023-10-08 01:41:52 ] Completed replacing temp checkpoint with checkpoint 61.298 ms, 30.27 s total | |
[ 2023-10-08 01:41:52 ] Completed Epoch: 13 batch 671: moving batch data to device 7.500 ms, 30.28 s total | |
[ 2023-10-08 01:41:53 ] Completed Epoch: 13 batch 671: forward pass 105.598 ms, 30.39 s total | |
[ 2023-10-08 01:41:53 ] Completed Epoch: 13 batch 671: backward pass 74.517 ms, 30.46 s total | |
[ 2023-10-08 01:41:53 ] Completed Epoch: 13 batch 671: computing loss 112.008 ms, 30.57 s total | |
EPOCH: [13], BATCH: [671/889], loss: 0.390, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 671 | |
[ 2023-10-08 01:41:54 ] Completed saving temp checkpoint 973.233 ms, 31.54 s total | |
[ 2023-10-08 01:41:54 ] Completed replacing temp checkpoint with checkpoint 47.194 ms, 31.59 s total | |
[ 2023-10-08 01:41:54 ] Completed Epoch: 13 batch 672: moving batch data to device 7.101 ms, 31.60 s total | |
[ 2023-10-08 01:41:54 ] Completed Epoch: 13 batch 672: forward pass 105.263 ms, 31.70 s total | |
[ 2023-10-08 01:41:54 ] Completed Epoch: 13 batch 672: backward pass 33.876 ms, 31.74 s total | |
[ 2023-10-08 01:41:54 ] Completed Epoch: 13 batch 672: computing loss 136.171 ms, 31.87 s total | |
EPOCH: [13], BATCH: [672/889], loss: 0.391, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 672 | |
[ 2023-10-08 01:41:56 ] Completed saving temp checkpoint 1,654.808 ms, 33.53 s total | |
[ 2023-10-08 01:41:56 ] Completed replacing temp checkpoint with checkpoint 45.881 ms, 33.58 s total | |
[ 2023-10-08 01:41:56 ] Completed Epoch: 13 batch 673: moving batch data to device 7.221 ms, 33.58 s total | |
[ 2023-10-08 01:41:56 ] Completed Epoch: 13 batch 673: forward pass 110.193 ms, 33.69 s total | |
[ 2023-10-08 01:41:56 ] Completed Epoch: 13 batch 673: backward pass 80.402 ms, 33.77 s total | |
[ 2023-10-08 01:41:56 ] Completed Epoch: 13 batch 673: computing loss 110.067 ms, 33.88 s total | |
EPOCH: [13], BATCH: [673/889], loss: 0.409, loss_box_reg: 0.123, loss_classifier: 0.105, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 673 | |
[ 2023-10-08 01:41:57 ] Completed saving temp checkpoint 1,313.587 ms, 35.20 s total | |
[ 2023-10-08 01:41:57 ] Completed replacing temp checkpoint with checkpoint 87.593 ms, 35.28 s total | |
[ 2023-10-08 01:41:57 ] Completed Epoch: 13 batch 674: moving batch data to device 6.822 ms, 35.29 s total | |
[ 2023-10-08 01:41:58 ] Completed Epoch: 13 batch 674: forward pass 108.130 ms, 35.40 s total | |
[ 2023-10-08 01:41:58 ] Completed Epoch: 13 batch 674: backward pass 69.549 ms, 35.47 s total | |
[ 2023-10-08 01:41:58 ] Completed Epoch: 13 batch 674: computing loss 124.153 ms, 35.59 s total | |
EPOCH: [13], BATCH: [674/889], loss: 0.392, loss_box_reg: 0.117, loss_classifier: 0.102, loss_mask: 0.137, loss_objectness: 0.016, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 674 | |
[ 2023-10-08 01:41:59 ] Completed saving temp checkpoint 1,167.439 ms, 36.76 s total | |
[ 2023-10-08 01:41:59 ] Completed replacing temp checkpoint with checkpoint 81.407 ms, 36.84 s total | |
[ 2023-10-08 01:41:59 ] Completed Epoch: 13 batch 675: moving batch data to device 6.320 ms, 36.85 s total | |
[ 2023-10-08 01:41:59 ] Completed Epoch: 13 batch 675: forward pass 100.317 ms, 36.95 s total | |
[ 2023-10-08 01:41:59 ] Completed Epoch: 13 batch 675: backward pass 73.816 ms, 37.02 s total | |
[ 2023-10-08 01:41:59 ] Completed Epoch: 13 batch 675: computing loss 108.512 ms, 37.13 s total | |
EPOCH: [13], BATCH: [675/889], loss: 0.386, loss_box_reg: 0.116, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 675 | |
[ 2023-10-08 01:42:00 ] Completed saving temp checkpoint 995.105 ms, 38.13 s total | |
[ 2023-10-08 01:42:00 ] Completed replacing temp checkpoint with checkpoint 51.829 ms, 38.18 s total | |
[ 2023-10-08 01:42:00 ] Completed Epoch: 13 batch 676: moving batch data to device 6.515 ms, 38.18 s total | |
[ 2023-10-08 01:42:00 ] Completed Epoch: 13 batch 676: forward pass 103.576 ms, 38.29 s total | |
[ 2023-10-08 01:42:01 ] Completed Epoch: 13 batch 676: backward pass 79.647 ms, 38.37 s total | |
[ 2023-10-08 01:42:01 ] Completed Epoch: 13 batch 676: computing loss 112.609 ms, 38.48 s total | |
EPOCH: [13], BATCH: [676/889], loss: 0.409, loss_box_reg: 0.126, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 676 | |
[ 2023-10-08 01:42:02 ] Completed saving temp checkpoint 1,182.308 ms, 39.66 s total | |
[ 2023-10-08 01:42:02 ] Completed replacing temp checkpoint with checkpoint 74.461 ms, 39.74 s total | |
[ 2023-10-08 01:42:02 ] Completed Epoch: 13 batch 677: moving batch data to device 6.410 ms, 39.74 s total | |
[ 2023-10-08 01:42:02 ] Completed Epoch: 13 batch 677: forward pass 106.038 ms, 39.85 s total | |
[ 2023-10-08 01:42:02 ] Completed Epoch: 13 batch 677: backward pass 77.005 ms, 39.93 s total | |
[ 2023-10-08 01:42:02 ] Completed Epoch: 13 batch 677: computing loss 98.853 ms, 40.02 s total | |
EPOCH: [13], BATCH: [677/889], loss: 0.346, loss_box_reg: 0.098, loss_classifier: 0.085, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 677 | |
[ 2023-10-08 01:42:03 ] Completed saving temp checkpoint 952.381 ms, 40.98 s total | |
[ 2023-10-08 01:42:03 ] Completed replacing temp checkpoint with checkpoint 68.040 ms, 41.05 s total | |
[ 2023-10-08 01:42:03 ] Completed Epoch: 13 batch 678: moving batch data to device 6.858 ms, 41.05 s total | |
[ 2023-10-08 01:42:03 ] Completed Epoch: 13 batch 678: forward pass 101.482 ms, 41.15 s total | |
[ 2023-10-08 01:42:03 ] Completed Epoch: 13 batch 678: backward pass 78.074 ms, 41.23 s total | |
[ 2023-10-08 01:42:04 ] Completed Epoch: 13 batch 678: computing loss 117.767 ms, 41.35 s total | |
EPOCH: [13], BATCH: [678/889], loss: 0.393, loss_box_reg: 0.113, loss_classifier: 0.098, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 678 | |
[ 2023-10-08 01:42:05 ] Completed saving temp checkpoint 1,109.014 ms, 42.46 s total | |
[ 2023-10-08 01:42:05 ] Completed replacing temp checkpoint with checkpoint 30.677 ms, 42.49 s total | |
[ 2023-10-08 01:42:05 ] Completed Epoch: 13 batch 679: moving batch data to device 5.161 ms, 42.49 s total | |
[ 2023-10-08 01:42:05 ] Completed Epoch: 13 batch 679: forward pass 106.157 ms, 42.60 s total | |
[ 2023-10-08 01:42:05 ] Completed Epoch: 13 batch 679: backward pass 41.448 ms, 42.64 s total | |
[ 2023-10-08 01:42:05 ] Completed Epoch: 13 batch 679: computing loss 156.883 ms, 42.80 s total | |
EPOCH: [13], BATCH: [679/889], loss: 0.372, loss_box_reg: 0.107, loss_classifier: 0.098, loss_mask: 0.125, loss_objectness: 0.018, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 679 | |
[ 2023-10-08 01:42:07 ] Completed saving temp checkpoint 1,687.780 ms, 44.49 s total | |
[ 2023-10-08 01:42:07 ] Completed replacing temp checkpoint with checkpoint 40.496 ms, 44.53 s total | |
[ 2023-10-08 01:42:07 ] Completed Epoch: 13 batch 680: moving batch data to device 6.126 ms, 44.53 s total | |
[ 2023-10-08 01:42:07 ] Completed Epoch: 13 batch 680: forward pass 105.650 ms, 44.64 s total | |
[ 2023-10-08 01:42:07 ] Completed Epoch: 13 batch 680: backward pass 79.108 ms, 44.72 s total | |
[ 2023-10-08 01:42:07 ] Completed Epoch: 13 batch 680: computing loss 90.343 ms, 44.81 s total | |
EPOCH: [13], BATCH: [680/889], loss: 0.378, loss_box_reg: 0.113, loss_classifier: 0.097, loss_mask: 0.124, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 680 | |
[ 2023-10-08 01:42:09 ] Completed saving temp checkpoint 1,771.296 ms, 46.58 s total | |
[ 2023-10-08 01:42:09 ] Completed replacing temp checkpoint with checkpoint 43.076 ms, 46.62 s total | |
[ 2023-10-08 01:42:09 ] Completed Epoch: 13 batch 681: moving batch data to device 6.770 ms, 46.63 s total | |
[ 2023-10-08 01:42:09 ] Completed Epoch: 13 batch 681: forward pass 107.463 ms, 46.74 s total | |
[ 2023-10-08 01:42:09 ] Completed Epoch: 13 batch 681: backward pass 94.715 ms, 46.83 s total | |
[ 2023-10-08 01:42:09 ] Completed Epoch: 13 batch 681: computing loss 109.641 ms, 46.94 s total | |
EPOCH: [13], BATCH: [681/889], loss: 0.399, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 681 | |
[ 2023-10-08 01:42:10 ] Completed saving temp checkpoint 1,016.200 ms, 47.96 s total | |
[ 2023-10-08 01:42:10 ] Completed replacing temp checkpoint with checkpoint 72.297 ms, 48.03 s total | |
[ 2023-10-08 01:42:10 ] Completed Epoch: 13 batch 682: moving batch data to device 6.808 ms, 48.04 s total | |
[ 2023-10-08 01:42:10 ] Completed Epoch: 13 batch 682: forward pass 107.287 ms, 48.14 s total | |
[ 2023-10-08 01:42:10 ] Completed Epoch: 13 batch 682: backward pass 77.950 ms, 48.22 s total | |
[ 2023-10-08 01:42:11 ] Completed Epoch: 13 batch 682: computing loss 106.716 ms, 48.33 s total | |
EPOCH: [13], BATCH: [682/889], loss: 0.448, loss_box_reg: 0.142, loss_classifier: 0.116, loss_mask: 0.140, loss_objectness: 0.017, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 682 | |
[ 2023-10-08 01:42:12 ] Completed saving temp checkpoint 1,075.568 ms, 49.40 s total | |
[ 2023-10-08 01:42:12 ] Completed replacing temp checkpoint with checkpoint 53.984 ms, 49.46 s total | |
[ 2023-10-08 01:42:12 ] Completed Epoch: 13 batch 683: moving batch data to device 5.938 ms, 49.46 s total | |
[ 2023-10-08 01:42:12 ] Completed Epoch: 13 batch 683: forward pass 101.217 ms, 49.57 s total | |
[ 2023-10-08 01:42:12 ] Completed Epoch: 13 batch 683: backward pass 53.267 ms, 49.62 s total | |
[ 2023-10-08 01:42:12 ] Completed Epoch: 13 batch 683: computing loss 139.362 ms, 49.76 s total | |
EPOCH: [13], BATCH: [683/889], loss: 0.380, loss_box_reg: 0.116, loss_classifier: 0.102, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 683 | |
[ 2023-10-08 01:42:13 ] Completed saving temp checkpoint 1,195.542 ms, 50.95 s total | |
[ 2023-10-08 01:42:13 ] Completed replacing temp checkpoint with checkpoint 80.424 ms, 51.03 s total | |
[ 2023-10-08 01:42:13 ] Completed Epoch: 13 batch 684: moving batch data to device 5.885 ms, 51.04 s total | |
[ 2023-10-08 01:42:13 ] Completed Epoch: 13 batch 684: forward pass 109.662 ms, 51.15 s total | |
[ 2023-10-08 01:42:13 ] Completed Epoch: 13 batch 684: backward pass 80.762 ms, 51.23 s total | |
[ 2023-10-08 01:42:14 ] Completed Epoch: 13 batch 684: computing loss 88.349 ms, 51.32 s total | |
EPOCH: [13], BATCH: [684/889], loss: 0.381, loss_box_reg: 0.114, loss_classifier: 0.094, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 684 | |
[ 2023-10-08 01:42:15 ] Completed saving temp checkpoint 1,083.350 ms, 52.40 s total | |
[ 2023-10-08 01:42:15 ] Completed replacing temp checkpoint with checkpoint 78.441 ms, 52.48 s total | |
[ 2023-10-08 01:42:15 ] Completed Epoch: 13 batch 685: moving batch data to device 7.356 ms, 52.49 s total | |
[ 2023-10-08 01:42:15 ] Completed Epoch: 13 batch 685: forward pass 100.686 ms, 52.59 s total | |
[ 2023-10-08 01:42:15 ] Completed Epoch: 13 batch 685: backward pass 55.602 ms, 52.64 s total | |
[ 2023-10-08 01:42:15 ] Completed Epoch: 13 batch 685: computing loss 134.624 ms, 52.78 s total | |
EPOCH: [13], BATCH: [685/889], loss: 0.414, loss_box_reg: 0.118, loss_classifier: 0.113, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 685 | |
[ 2023-10-08 01:42:16 ] Completed saving temp checkpoint 1,007.810 ms, 53.79 s total | |
[ 2023-10-08 01:42:16 ] Completed replacing temp checkpoint with checkpoint 74.274 ms, 53.86 s total | |
[ 2023-10-08 01:42:16 ] Completed Epoch: 13 batch 686: moving batch data to device 6.450 ms, 53.87 s total | |
[ 2023-10-08 01:42:16 ] Completed Epoch: 13 batch 686: forward pass 104.544 ms, 53.97 s total | |
[ 2023-10-08 01:42:16 ] Completed Epoch: 13 batch 686: backward pass 65.891 ms, 54.04 s total | |
[ 2023-10-08 01:42:16 ] Completed Epoch: 13 batch 686: computing loss 131.755 ms, 54.17 s total | |
EPOCH: [13], BATCH: [686/889], loss: 0.409, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 686 | |
[ 2023-10-08 01:42:18 ] Completed saving temp checkpoint 1,589.437 ms, 55.76 s total | |
[ 2023-10-08 01:42:18 ] Completed replacing temp checkpoint with checkpoint 77.717 ms, 55.84 s total | |
[ 2023-10-08 01:42:18 ] Completed Epoch: 13 batch 687: moving batch data to device 7.200 ms, 55.84 s total | |
[ 2023-10-08 01:42:18 ] Completed Epoch: 13 batch 687: forward pass 108.212 ms, 55.95 s total | |
[ 2023-10-08 01:42:18 ] Completed Epoch: 13 batch 687: backward pass 46.510 ms, 56.00 s total | |
[ 2023-10-08 01:42:18 ] Completed Epoch: 13 batch 687: computing loss 146.865 ms, 56.15 s total | |
EPOCH: [13], BATCH: [687/889], loss: 0.353, loss_box_reg: 0.102, loss_classifier: 0.089, loss_mask: 0.127, loss_objectness: 0.013, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 687 | |
[ 2023-10-08 01:42:20 ] Completed saving temp checkpoint 1,399.759 ms, 57.55 s total | |
[ 2023-10-08 01:42:20 ] Completed replacing temp checkpoint with checkpoint 40.991 ms, 57.59 s total | |
[ 2023-10-08 01:42:20 ] Completed Epoch: 13 batch 688: moving batch data to device 6.492 ms, 57.59 s total | |
[ 2023-10-08 01:42:20 ] Completed Epoch: 13 batch 688: forward pass 102.481 ms, 57.70 s total | |
[ 2023-10-08 01:42:20 ] Completed Epoch: 13 batch 688: backward pass 56.902 ms, 57.75 s total | |
[ 2023-10-08 01:42:20 ] Completed Epoch: 13 batch 688: computing loss 134.132 ms, 57.89 s total | |
EPOCH: [13], BATCH: [688/889], loss: 0.403, loss_box_reg: 0.120, loss_classifier: 0.103, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 688 | |
[ 2023-10-08 01:42:21 ] Completed saving temp checkpoint 1,358.930 ms, 59.25 s total | |
[ 2023-10-08 01:42:22 ] Completed replacing temp checkpoint with checkpoint 58.851 ms, 59.30 s total | |
[ 2023-10-08 01:42:22 ] Completed Epoch: 13 batch 689: moving batch data to device 8.436 ms, 59.31 s total | |
[ 2023-10-08 01:42:22 ] Completed Epoch: 13 batch 689: forward pass 106.338 ms, 59.42 s total | |
[ 2023-10-08 01:42:22 ] Completed Epoch: 13 batch 689: backward pass 72.708 ms, 59.49 s total | |
[ 2023-10-08 01:42:22 ] Completed Epoch: 13 batch 689: computing loss 119.460 ms, 59.61 s total | |
EPOCH: [13], BATCH: [689/889], loss: 0.404, loss_box_reg: 0.120, loss_classifier: 0.104, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 689 | |
[ 2023-10-08 01:42:23 ] Completed saving temp checkpoint 1,015.835 ms, 60.63 s total | |
[ 2023-10-08 01:42:23 ] Completed replacing temp checkpoint with checkpoint 46.606 ms, 60.67 s total | |
[ 2023-10-08 01:42:23 ] Completed Epoch: 13 batch 690: moving batch data to device 6.203 ms, 60.68 s total | |
[ 2023-10-08 01:42:23 ] Completed Epoch: 13 batch 690: forward pass 106.271 ms, 60.79 s total | |
[ 2023-10-08 01:42:23 ] Completed Epoch: 13 batch 690: backward pass 47.428 ms, 60.83 s total | |
[ 2023-10-08 01:42:23 ] Completed Epoch: 13 batch 690: computing loss 144.883 ms, 60.98 s total | |
EPOCH: [13], BATCH: [690/889], loss: 0.373, loss_box_reg: 0.107, loss_classifier: 0.095, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 690 | |
[ 2023-10-08 01:42:24 ] Completed saving temp checkpoint 1,158.842 ms, 62.14 s total | |
[ 2023-10-08 01:42:24 ] Completed replacing temp checkpoint with checkpoint 81.188 ms, 62.22 s total | |
[ 2023-10-08 01:42:24 ] Completed Epoch: 13 batch 691: moving batch data to device 8.954 ms, 62.23 s total | |
[ 2023-10-08 01:42:25 ] Completed Epoch: 13 batch 691: forward pass 110.753 ms, 62.34 s total | |
[ 2023-10-08 01:42:25 ] Completed Epoch: 13 batch 691: backward pass 80.227 ms, 62.42 s total | |
[ 2023-10-08 01:42:25 ] Completed Epoch: 13 batch 691: computing loss 114.722 ms, 62.53 s total | |
EPOCH: [13], BATCH: [691/889], loss: 0.359, loss_box_reg: 0.108, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 691 | |
[ 2023-10-08 01:42:26 ] Completed saving temp checkpoint 1,018.327 ms, 63.55 s total | |
[ 2023-10-08 01:42:26 ] Completed replacing temp checkpoint with checkpoint 58.493 ms, 63.61 s total | |
[ 2023-10-08 01:42:26 ] Completed Epoch: 13 batch 692: moving batch data to device 6.287 ms, 63.62 s total | |
[ 2023-10-08 01:42:26 ] Completed Epoch: 13 batch 692: forward pass 104.703 ms, 63.72 s total | |
[ 2023-10-08 01:42:26 ] Completed Epoch: 13 batch 692: backward pass 76.836 ms, 63.80 s total | |
[ 2023-10-08 01:42:26 ] Completed Epoch: 13 batch 692: computing loss 118.377 ms, 63.92 s total | |
EPOCH: [13], BATCH: [692/889], loss: 0.370, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 692 | |
[ 2023-10-08 01:42:27 ] Completed saving temp checkpoint 1,162.709 ms, 65.08 s total | |
[ 2023-10-08 01:42:27 ] Completed replacing temp checkpoint with checkpoint 77.880 ms, 65.16 s total | |
[ 2023-10-08 01:42:27 ] Completed Epoch: 13 batch 693: moving batch data to device 6.269 ms, 65.16 s total | |
[ 2023-10-08 01:42:27 ] Completed Epoch: 13 batch 693: forward pass 110.648 ms, 65.27 s total | |
[ 2023-10-08 01:42:28 ] Completed Epoch: 13 batch 693: backward pass 69.785 ms, 65.34 s total | |
[ 2023-10-08 01:42:28 ] Completed Epoch: 13 batch 693: computing loss 113.928 ms, 65.46 s total | |
EPOCH: [13], BATCH: [693/889], loss: 0.415, loss_box_reg: 0.124, loss_classifier: 0.105, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 693 | |
[ 2023-10-08 01:42:29 ] Completed saving temp checkpoint 1,014.684 ms, 66.47 s total | |
[ 2023-10-08 01:42:29 ] Completed replacing temp checkpoint with checkpoint 52.195 ms, 66.52 s total | |
[ 2023-10-08 01:42:29 ] Completed Epoch: 13 batch 694: moving batch data to device 7.269 ms, 66.53 s total | |
[ 2023-10-08 01:42:29 ] Completed Epoch: 13 batch 694: forward pass 101.947 ms, 66.63 s total | |
[ 2023-10-08 01:42:29 ] Completed Epoch: 13 batch 694: backward pass 33.888 ms, 66.67 s total | |
[ 2023-10-08 01:42:29 ] Completed Epoch: 13 batch 694: computing loss 136.936 ms, 66.80 s total | |
EPOCH: [13], BATCH: [694/889], loss: 0.435, loss_box_reg: 0.133, loss_classifier: 0.110, loss_mask: 0.138, loss_objectness: 0.020, loss_rpn_box_reg: 0.035 | |
Saving checkpoint at epoch 13 train batch 694 | |
[ 2023-10-08 01:42:30 ] Completed saving temp checkpoint 1,192.971 ms, 68.00 s total | |
[ 2023-10-08 01:42:30 ] Completed replacing temp checkpoint with checkpoint 58.126 ms, 68.05 s total | |
[ 2023-10-08 01:42:30 ] Completed Epoch: 13 batch 695: moving batch data to device 5.855 ms, 68.06 s total | |
[ 2023-10-08 01:42:30 ] Completed Epoch: 13 batch 695: forward pass 102.710 ms, 68.16 s total | |
[ 2023-10-08 01:42:30 ] Completed Epoch: 13 batch 695: backward pass 81.669 ms, 68.25 s total | |
[ 2023-10-08 01:42:31 ] Completed Epoch: 13 batch 695: computing loss 121.380 ms, 68.37 s total | |
EPOCH: [13], BATCH: [695/889], loss: 0.406, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 695 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 01:55:45 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 01:55:45 ] Completed importing Timer 0.021 ms, 0.00 s total | |
[ 2023-10-08 01:55:45 ] Completed importing everything else 526.177 ms, 0.53 s total | |
[ 2023-10-08 01:55:45 ] Completed defined other functions 0.023 ms, 0.53 s total | |
| distributed init (rank 4): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 1): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 01:55:48 ] Completed main preliminaries 2,670.097 ms, 3.20 s total | |
loading annotations into memory... | |
Done (t=11.33s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-08 01:56:01 ] Completed loading data 13,273.341 ms, 16.47 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 01:56:01 ] Completed creating data samplers 105.959 ms, 16.58 s total | |
[ 2023-10-08 01:56:01 ] Completed creating data loaders 0.215 ms, 16.58 s total | |
[ 2023-10-08 01:56:02 ] Completed creating model and .to(device) 638.304 ms, 17.21 s total | |
[ 2023-10-08 01:56:03 ] Completed preparing model for distributed training 1,416.382 ms, 18.63 s total | |
[ 2023-10-08 01:56:03 ] Completed optimizer and scaler 0.626 ms, 18.63 s total | |
[ 2023-10-08 01:56:03 ] Completed learning rate schedulers 0.258 ms, 18.63 s total | |
[ 2023-10-08 01:56:04 ] Completed init coco evaluator 951.953 ms, 19.58 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 01:56:05 ] Completed retrieving checkpoint 929.446 ms, 20.51 s total | |
EPOCH :: 13 | |
[ 2023-10-08 01:56:05 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 01:56:05 ] Completed training preliminaries 0.913 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 695 | |
[ 2023-10-08 01:56:06 ] Completed Epoch: 13 batch 695: moving batch data to device 522.244 ms, 0.52 s total | |
[ 2023-10-08 01:56:07 ] Completed Epoch: 13 batch 695: forward pass 846.108 ms, 1.37 s total | |
[ 2023-10-08 01:56:07 ] Completed Epoch: 13 batch 695: backward pass 177.229 ms, 1.55 s total | |
[ 2023-10-08 01:56:07 ] Completed Epoch: 13 batch 695: computing loss 527.307 ms, 2.07 s total | |
EPOCH: [13], BATCH: [695/889], loss: 0.404, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 695 | |
[ 2023-10-08 01:56:08 ] Completed saving temp checkpoint 826.990 ms, 2.90 s total | |
[ 2023-10-08 01:56:08 ] Completed replacing temp checkpoint with checkpoint 178.703 ms, 3.08 s total | |
[ 2023-10-08 01:56:08 ] Completed Epoch: 13 batch 696: moving batch data to device 3.936 ms, 3.08 s total | |
[ 2023-10-08 01:56:08 ] Completed Epoch: 13 batch 696: forward pass 109.345 ms, 3.19 s total | |
[ 2023-10-08 01:56:08 ] Completed Epoch: 13 batch 696: backward pass 128.837 ms, 3.32 s total | |
[ 2023-10-08 01:56:09 ] Completed Epoch: 13 batch 696: computing loss 97.569 ms, 3.42 s total | |
EPOCH: [13], BATCH: [696/889], loss: 0.364, loss_box_reg: 0.109, loss_classifier: 0.086, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 696 | |
[ 2023-10-08 01:56:10 ] Completed saving temp checkpoint 1,047.460 ms, 4.47 s total | |
[ 2023-10-08 01:56:10 ] Completed replacing temp checkpoint with checkpoint 52.011 ms, 4.52 s total | |
[ 2023-10-08 01:56:10 ] Completed Epoch: 13 batch 697: moving batch data to device 3.586 ms, 4.52 s total | |
[ 2023-10-08 01:56:10 ] Completed Epoch: 13 batch 697: forward pass 110.296 ms, 4.63 s total | |
[ 2023-10-08 01:56:10 ] Completed Epoch: 13 batch 697: backward pass 70.152 ms, 4.70 s total | |
[ 2023-10-08 01:56:10 ] Completed Epoch: 13 batch 697: computing loss 146.967 ms, 4.85 s total | |
EPOCH: [13], BATCH: [697/889], loss: 0.394, loss_box_reg: 0.118, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 697 | |
[ 2023-10-08 01:56:11 ] Completed saving temp checkpoint 1,014.375 ms, 5.86 s total | |
[ 2023-10-08 01:56:11 ] Completed replacing temp checkpoint with checkpoint 73.752 ms, 5.94 s total | |
[ 2023-10-08 01:56:11 ] Completed Epoch: 13 batch 698: moving batch data to device 13.309 ms, 5.95 s total | |
[ 2023-10-08 01:56:11 ] Completed Epoch: 13 batch 698: forward pass 107.339 ms, 6.06 s total | |
[ 2023-10-08 01:56:11 ] Completed Epoch: 13 batch 698: backward pass 81.096 ms, 6.14 s total | |
[ 2023-10-08 01:56:11 ] Completed Epoch: 13 batch 698: computing loss 136.091 ms, 6.28 s total | |
EPOCH: [13], BATCH: [698/889], loss: 0.394, loss_box_reg: 0.120, loss_classifier: 0.094, loss_mask: 0.139, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 698 | |
[ 2023-10-08 01:56:12 ] Completed saving temp checkpoint 997.170 ms, 7.27 s total | |
[ 2023-10-08 01:56:12 ] Completed replacing temp checkpoint with checkpoint 70.209 ms, 7.34 s total | |
[ 2023-10-08 01:56:12 ] Completed Epoch: 13 batch 699: moving batch data to device 2.890 ms, 7.35 s total | |
[ 2023-10-08 01:56:13 ] Completed Epoch: 13 batch 699: forward pass 113.591 ms, 7.46 s total | |
[ 2023-10-08 01:56:13 ] Completed Epoch: 13 batch 699: backward pass 78.322 ms, 7.54 s total | |
[ 2023-10-08 01:56:13 ] Completed Epoch: 13 batch 699: computing loss 134.492 ms, 7.67 s total | |
EPOCH: [13], BATCH: [699/889], loss: 0.380, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 699 | |
[ 2023-10-08 01:56:14 ] Completed saving temp checkpoint 861.805 ms, 8.53 s total | |
[ 2023-10-08 01:56:14 ] Completed replacing temp checkpoint with checkpoint 63.080 ms, 8.60 s total | |
[ 2023-10-08 01:56:14 ] Completed Epoch: 13 batch 700: moving batch data to device 4.526 ms, 8.60 s total | |
[ 2023-10-08 01:56:14 ] Completed Epoch: 13 batch 700: forward pass 107.501 ms, 8.71 s total | |
[ 2023-10-08 01:56:14 ] Completed Epoch: 13 batch 700: backward pass 79.823 ms, 8.79 s total | |
[ 2023-10-08 01:56:14 ] Completed Epoch: 13 batch 700: computing loss 130.156 ms, 8.92 s total | |
EPOCH: [13], BATCH: [700/889], loss: 0.380, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 700 | |
[ 2023-10-08 01:56:15 ] Completed saving temp checkpoint 1,366.562 ms, 10.29 s total | |
[ 2023-10-08 01:56:16 ] Completed replacing temp checkpoint with checkpoint 89.388 ms, 10.38 s total | |
[ 2023-10-08 01:56:16 ] Completed Epoch: 13 batch 701: moving batch data to device 8.327 ms, 10.38 s total | |
[ 2023-10-08 01:56:16 ] Completed Epoch: 13 batch 701: forward pass 107.247 ms, 10.49 s total | |
[ 2023-10-08 01:56:16 ] Completed Epoch: 13 batch 701: backward pass 44.721 ms, 10.54 s total | |
[ 2023-10-08 01:56:16 ] Completed Epoch: 13 batch 701: computing loss 168.888 ms, 10.70 s total | |
EPOCH: [13], BATCH: [701/889], loss: 0.351, loss_box_reg: 0.107, loss_classifier: 0.087, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 701 | |
[ 2023-10-08 01:56:17 ] Completed saving temp checkpoint 1,253.711 ms, 11.96 s total | |
[ 2023-10-08 01:56:17 ] Completed replacing temp checkpoint with checkpoint 86.242 ms, 12.04 s total | |
[ 2023-10-08 01:56:17 ] Completed Epoch: 13 batch 702: moving batch data to device 13.674 ms, 12.06 s total | |
[ 2023-10-08 01:56:17 ] Completed Epoch: 13 batch 702: forward pass 104.892 ms, 12.16 s total | |
[ 2023-10-08 01:56:17 ] Completed Epoch: 13 batch 702: backward pass 79.969 ms, 12.24 s total | |
[ 2023-10-08 01:56:17 ] Completed Epoch: 13 batch 702: computing loss 114.798 ms, 12.36 s total | |
EPOCH: [13], BATCH: [702/889], loss: 0.386, loss_box_reg: 0.108, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 702 | |
[ 2023-10-08 01:56:19 ] Completed saving temp checkpoint 1,793.261 ms, 14.15 s total | |
[ 2023-10-08 01:56:19 ] Completed replacing temp checkpoint with checkpoint 83.514 ms, 14.23 s total | |
[ 2023-10-08 01:56:19 ] Completed Epoch: 13 batch 703: moving batch data to device 2.646 ms, 14.24 s total | |
[ 2023-10-08 01:56:19 ] Completed Epoch: 13 batch 703: forward pass 107.618 ms, 14.34 s total | |
[ 2023-10-08 01:56:20 ] Completed Epoch: 13 batch 703: backward pass 69.917 ms, 14.41 s total | |
[ 2023-10-08 01:56:20 ] Completed Epoch: 13 batch 703: computing loss 129.319 ms, 14.54 s total | |
EPOCH: [13], BATCH: [703/889], loss: 0.405, loss_box_reg: 0.122, loss_classifier: 0.104, loss_mask: 0.129, loss_objectness: 0.018, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 703 | |
[ 2023-10-08 01:56:21 ] Completed saving temp checkpoint 1,582.468 ms, 16.13 s total | |
[ 2023-10-08 01:56:21 ] Completed replacing temp checkpoint with checkpoint 95.820 ms, 16.22 s total | |
[ 2023-10-08 01:56:21 ] Completed Epoch: 13 batch 704: moving batch data to device 8.976 ms, 16.23 s total | |
[ 2023-10-08 01:56:21 ] Completed Epoch: 13 batch 704: forward pass 105.875 ms, 16.34 s total | |
[ 2023-10-08 01:56:22 ] Completed Epoch: 13 batch 704: backward pass 47.861 ms, 16.38 s total | |
[ 2023-10-08 01:56:22 ] Completed Epoch: 13 batch 704: computing loss 194.741 ms, 16.58 s total | |
EPOCH: [13], BATCH: [704/889], loss: 0.375, loss_box_reg: 0.115, loss_classifier: 0.094, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 704 | |
[ 2023-10-08 01:56:23 ] Completed saving temp checkpoint 1,493.570 ms, 18.07 s total | |
[ 2023-10-08 01:56:23 ] Completed replacing temp checkpoint with checkpoint 93.600 ms, 18.17 s total | |
[ 2023-10-08 01:56:23 ] Completed Epoch: 13 batch 705: moving batch data to device 9.541 ms, 18.18 s total | |
[ 2023-10-08 01:56:23 ] Completed Epoch: 13 batch 705: forward pass 108.814 ms, 18.29 s total | |
[ 2023-10-08 01:56:24 ] Completed Epoch: 13 batch 705: backward pass 79.995 ms, 18.37 s total | |
[ 2023-10-08 01:56:24 ] Completed Epoch: 13 batch 705: computing loss 130.809 ms, 18.50 s total | |
EPOCH: [13], BATCH: [705/889], loss: 0.374, loss_box_reg: 0.108, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 705 | |
[ 2023-10-08 01:56:25 ] Completed saving temp checkpoint 1,073.757 ms, 19.57 s total | |
[ 2023-10-08 01:56:25 ] Completed replacing temp checkpoint with checkpoint 59.390 ms, 19.63 s total | |
[ 2023-10-08 01:56:25 ] Completed Epoch: 13 batch 706: moving batch data to device 6.408 ms, 19.64 s total | |
[ 2023-10-08 01:56:25 ] Completed Epoch: 13 batch 706: forward pass 109.973 ms, 19.75 s total | |
[ 2023-10-08 01:56:25 ] Completed Epoch: 13 batch 706: backward pass 79.279 ms, 19.82 s total | |
[ 2023-10-08 01:56:25 ] Completed Epoch: 13 batch 706: computing loss 118.458 ms, 19.94 s total | |
EPOCH: [13], BATCH: [706/889], loss: 0.366, loss_box_reg: 0.111, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.019 | |
Saving checkpoint at epoch 13 train batch 706 | |
[ 2023-10-08 01:56:26 ] Completed saving temp checkpoint 1,119.954 ms, 21.06 s total | |
[ 2023-10-08 01:56:26 ] Completed replacing temp checkpoint with checkpoint 78.490 ms, 21.14 s total | |
[ 2023-10-08 01:56:26 ] Completed Epoch: 13 batch 707: moving batch data to device 7.441 ms, 21.15 s total | |
[ 2023-10-08 01:56:26 ] Completed Epoch: 13 batch 707: forward pass 107.054 ms, 21.26 s total | |
[ 2023-10-08 01:56:26 ] Completed Epoch: 13 batch 707: backward pass 42.669 ms, 21.30 s total | |
[ 2023-10-08 01:56:27 ] Completed Epoch: 13 batch 707: computing loss 209.375 ms, 21.51 s total | |
EPOCH: [13], BATCH: [707/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 707 | |
[ 2023-10-08 01:56:28 ] Completed saving temp checkpoint 980.745 ms, 22.49 s total | |
[ 2023-10-08 01:56:28 ] Completed replacing temp checkpoint with checkpoint 57.256 ms, 22.55 s total | |
[ 2023-10-08 01:56:28 ] Completed Epoch: 13 batch 708: moving batch data to device 8.483 ms, 22.55 s total | |
[ 2023-10-08 01:56:28 ] Completed Epoch: 13 batch 708: forward pass 106.493 ms, 22.66 s total | |
[ 2023-10-08 01:56:28 ] Completed Epoch: 13 batch 708: backward pass 50.786 ms, 22.71 s total | |
[ 2023-10-08 01:56:28 ] Completed Epoch: 13 batch 708: computing loss 141.346 ms, 22.85 s total | |
EPOCH: [13], BATCH: [708/889], loss: 0.375, loss_box_reg: 0.116, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 708 | |
[ 2023-10-08 01:56:29 ] Completed saving temp checkpoint 1,077.884 ms, 23.93 s total | |
[ 2023-10-08 01:56:29 ] Completed replacing temp checkpoint with checkpoint 76.120 ms, 24.01 s total | |
[ 2023-10-08 01:56:29 ] Completed Epoch: 13 batch 709: moving batch data to device 8.650 ms, 24.02 s total | |
[ 2023-10-08 01:56:29 ] Completed Epoch: 13 batch 709: forward pass 106.965 ms, 24.12 s total | |
[ 2023-10-08 01:56:29 ] Completed Epoch: 13 batch 709: backward pass 75.958 ms, 24.20 s total | |
[ 2023-10-08 01:56:29 ] Completed Epoch: 13 batch 709: computing loss 120.747 ms, 24.32 s total | |
EPOCH: [13], BATCH: [709/889], loss: 0.390, loss_box_reg: 0.120, loss_classifier: 0.092, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 709 | |
[ 2023-10-08 01:56:31 ] Completed saving temp checkpoint 1,060.664 ms, 25.38 s total | |
[ 2023-10-08 01:56:31 ] Completed replacing temp checkpoint with checkpoint 46.704 ms, 25.43 s total | |
[ 2023-10-08 01:56:31 ] Completed Epoch: 13 batch 710: moving batch data to device 8.214 ms, 25.44 s total | |
[ 2023-10-08 01:56:31 ] Completed Epoch: 13 batch 710: forward pass 106.622 ms, 25.54 s total | |
[ 2023-10-08 01:56:31 ] Completed Epoch: 13 batch 710: backward pass 77.176 ms, 25.62 s total | |
[ 2023-10-08 01:56:31 ] Completed Epoch: 13 batch 710: computing loss 114.604 ms, 25.73 s total | |
EPOCH: [13], BATCH: [710/889], loss: 0.420, loss_box_reg: 0.130, loss_classifier: 0.106, loss_mask: 0.142, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 710 | |
[ 2023-10-08 01:56:32 ] Completed saving temp checkpoint 1,171.822 ms, 26.91 s total | |
[ 2023-10-08 01:56:32 ] Completed replacing temp checkpoint with checkpoint 64.597 ms, 26.97 s total | |
[ 2023-10-08 01:56:32 ] Completed Epoch: 13 batch 711: moving batch data to device 4.608 ms, 26.97 s total | |
[ 2023-10-08 01:56:32 ] Completed Epoch: 13 batch 711: forward pass 108.088 ms, 27.08 s total | |
[ 2023-10-08 01:56:32 ] Completed Epoch: 13 batch 711: backward pass 43.021 ms, 27.13 s total | |
[ 2023-10-08 01:56:32 ] Completed Epoch: 13 batch 711: computing loss 155.182 ms, 27.28 s total | |
EPOCH: [13], BATCH: [711/889], loss: 0.373, loss_box_reg: 0.114, loss_classifier: 0.097, loss_mask: 0.128, loss_objectness: 0.012, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 711 | |
[ 2023-10-08 01:56:33 ] Completed saving temp checkpoint 1,049.312 ms, 28.33 s total | |
[ 2023-10-08 01:56:34 ] Completed replacing temp checkpoint with checkpoint 73.100 ms, 28.40 s total | |
[ 2023-10-08 01:56:34 ] Completed Epoch: 13 batch 712: moving batch data to device 6.781 ms, 28.41 s total | |
[ 2023-10-08 01:56:34 ] Completed Epoch: 13 batch 712: forward pass 107.216 ms, 28.52 s total | |
[ 2023-10-08 01:56:34 ] Completed Epoch: 13 batch 712: backward pass 69.358 ms, 28.59 s total | |
[ 2023-10-08 01:56:34 ] Completed Epoch: 13 batch 712: computing loss 304.600 ms, 28.89 s total | |
EPOCH: [13], BATCH: [712/889], loss: 0.403, loss_box_reg: 0.125, loss_classifier: 0.099, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 712 | |
[ 2023-10-08 01:56:35 ] Completed saving temp checkpoint 1,144.952 ms, 30.04 s total | |
[ 2023-10-08 01:56:35 ] Completed replacing temp checkpoint with checkpoint 56.406 ms, 30.09 s total | |
[ 2023-10-08 01:56:35 ] Completed Epoch: 13 batch 713: moving batch data to device 4.452 ms, 30.10 s total | |
[ 2023-10-08 01:56:35 ] Completed Epoch: 13 batch 713: forward pass 105.055 ms, 30.20 s total | |
[ 2023-10-08 01:56:35 ] Completed Epoch: 13 batch 713: backward pass 69.900 ms, 30.27 s total | |
[ 2023-10-08 01:56:36 ] Completed Epoch: 13 batch 713: computing loss 127.278 ms, 30.40 s total | |
EPOCH: [13], BATCH: [713/889], loss: 0.341, loss_box_reg: 0.100, loss_classifier: 0.081, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 713 | |
[ 2023-10-08 01:56:37 ] Completed saving temp checkpoint 1,167.554 ms, 31.57 s total | |
[ 2023-10-08 01:56:37 ] Completed replacing temp checkpoint with checkpoint 66.526 ms, 31.63 s total | |
[ 2023-10-08 01:56:37 ] Completed Epoch: 13 batch 714: moving batch data to device 7.025 ms, 31.64 s total | |
[ 2023-10-08 01:56:37 ] Completed Epoch: 13 batch 714: forward pass 103.546 ms, 31.74 s total | |
[ 2023-10-08 01:56:37 ] Completed Epoch: 13 batch 714: backward pass 34.969 ms, 31.78 s total | |
[ 2023-10-08 01:56:37 ] Completed Epoch: 13 batch 714: computing loss 135.131 ms, 31.91 s total | |
EPOCH: [13], BATCH: [714/889], loss: 0.372, loss_box_reg: 0.118, loss_classifier: 0.092, loss_mask: 0.124, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 714 | |
[ 2023-10-08 01:56:39 ] Completed saving temp checkpoint 1,806.228 ms, 33.72 s total | |
[ 2023-10-08 01:56:39 ] Completed replacing temp checkpoint with checkpoint 80.123 ms, 33.80 s total | |
[ 2023-10-08 01:56:39 ] Completed Epoch: 13 batch 715: moving batch data to device 8.613 ms, 33.81 s total | |
[ 2023-10-08 01:56:39 ] Completed Epoch: 13 batch 715: forward pass 105.220 ms, 33.91 s total | |
[ 2023-10-08 01:56:39 ] Completed Epoch: 13 batch 715: backward pass 42.116 ms, 33.96 s total | |
[ 2023-10-08 01:56:39 ] Completed Epoch: 13 batch 715: computing loss 146.858 ms, 34.10 s total | |
EPOCH: [13], BATCH: [715/889], loss: 0.346, loss_box_reg: 0.105, loss_classifier: 0.089, loss_mask: 0.117, loss_objectness: 0.013, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 715 | |
[ 2023-10-08 01:56:41 ] Completed saving temp checkpoint 1,519.652 ms, 35.62 s total | |
[ 2023-10-08 01:56:41 ] Completed replacing temp checkpoint with checkpoint 68.269 ms, 35.69 s total | |
[ 2023-10-08 01:56:41 ] Completed Epoch: 13 batch 716: moving batch data to device 7.116 ms, 35.70 s total | |
[ 2023-10-08 01:56:41 ] Completed Epoch: 13 batch 716: forward pass 106.004 ms, 35.80 s total | |
[ 2023-10-08 01:56:41 ] Completed Epoch: 13 batch 716: backward pass 76.106 ms, 35.88 s total | |
[ 2023-10-08 01:56:41 ] Completed Epoch: 13 batch 716: computing loss 111.928 ms, 35.99 s total | |
EPOCH: [13], BATCH: [716/889], loss: 0.362, loss_box_reg: 0.104, loss_classifier: 0.098, loss_mask: 0.120, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 716 | |
[ 2023-10-08 01:56:42 ] Completed saving temp checkpoint 1,241.774 ms, 37.23 s total | |
[ 2023-10-08 01:56:42 ] Completed replacing temp checkpoint with checkpoint 103.407 ms, 37.34 s total | |
[ 2023-10-08 01:56:42 ] Completed Epoch: 13 batch 717: moving batch data to device 7.517 ms, 37.35 s total | |
[ 2023-10-08 01:56:43 ] Completed Epoch: 13 batch 717: forward pass 108.986 ms, 37.45 s total | |
[ 2023-10-08 01:56:43 ] Completed Epoch: 13 batch 717: backward pass 44.657 ms, 37.50 s total | |
[ 2023-10-08 01:56:43 ] Completed Epoch: 13 batch 717: computing loss 150.311 ms, 37.65 s total | |
EPOCH: [13], BATCH: [717/889], loss: 0.371, loss_box_reg: 0.114, loss_classifier: 0.095, loss_mask: 0.123, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 717 | |
[ 2023-10-08 01:56:44 ] Completed saving temp checkpoint 1,026.821 ms, 38.68 s total | |
[ 2023-10-08 01:56:44 ] Completed replacing temp checkpoint with checkpoint 66.710 ms, 38.74 s total | |
[ 2023-10-08 01:56:44 ] Completed Epoch: 13 batch 718: moving batch data to device 7.947 ms, 38.75 s total | |
[ 2023-10-08 01:56:44 ] Completed Epoch: 13 batch 718: forward pass 111.484 ms, 38.86 s total | |
[ 2023-10-08 01:56:44 ] Completed Epoch: 13 batch 718: backward pass 72.845 ms, 38.93 s total | |
[ 2023-10-08 01:56:44 ] Completed Epoch: 13 batch 718: computing loss 122.257 ms, 39.06 s total | |
EPOCH: [13], BATCH: [718/889], loss: 0.428, loss_box_reg: 0.131, loss_classifier: 0.112, loss_mask: 0.133, loss_objectness: 0.019, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 718 | |
[ 2023-10-08 01:56:45 ] Completed saving temp checkpoint 1,077.112 ms, 40.13 s total | |
[ 2023-10-08 01:56:45 ] Completed replacing temp checkpoint with checkpoint 87.054 ms, 40.22 s total | |
[ 2023-10-08 01:56:45 ] Completed Epoch: 13 batch 719: moving batch data to device 7.410 ms, 40.23 s total | |
[ 2023-10-08 01:56:45 ] Completed Epoch: 13 batch 719: forward pass 108.744 ms, 40.34 s total | |
[ 2023-10-08 01:56:46 ] Completed Epoch: 13 batch 719: backward pass 72.093 ms, 40.41 s total | |
[ 2023-10-08 01:56:46 ] Completed Epoch: 13 batch 719: computing loss 123.600 ms, 40.53 s total | |
EPOCH: [13], BATCH: [719/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 719 | |
[ 2023-10-08 01:56:47 ] Completed saving temp checkpoint 1,011.889 ms, 41.54 s total | |
[ 2023-10-08 01:56:47 ] Completed replacing temp checkpoint with checkpoint 53.091 ms, 41.60 s total | |
[ 2023-10-08 01:56:47 ] Completed Epoch: 13 batch 720: moving batch data to device 7.950 ms, 41.61 s total | |
[ 2023-10-08 01:56:47 ] Completed Epoch: 13 batch 720: forward pass 101.506 ms, 41.71 s total | |
[ 2023-10-08 01:56:47 ] Completed Epoch: 13 batch 720: backward pass 66.120 ms, 41.77 s total | |
[ 2023-10-08 01:56:47 ] Completed Epoch: 13 batch 720: computing loss 222.902 ms, 42.00 s total | |
EPOCH: [13], BATCH: [720/889], loss: 0.368, loss_box_reg: 0.105, loss_classifier: 0.092, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 720 | |
[ 2023-10-08 01:56:48 ] Completed saving temp checkpoint 1,100.939 ms, 43.10 s total | |
[ 2023-10-08 01:56:48 ] Completed replacing temp checkpoint with checkpoint 78.717 ms, 43.18 s total | |
[ 2023-10-08 01:56:48 ] Completed Epoch: 13 batch 721: moving batch data to device 10.851 ms, 43.19 s total | |
[ 2023-10-08 01:56:48 ] Completed Epoch: 13 batch 721: forward pass 103.992 ms, 43.29 s total | |
[ 2023-10-08 01:56:49 ] Completed Epoch: 13 batch 721: backward pass 72.754 ms, 43.36 s total | |
[ 2023-10-08 01:56:49 ] Completed Epoch: 13 batch 721: computing loss 123.795 ms, 43.49 s total | |
EPOCH: [13], BATCH: [721/889], loss: 0.374, loss_box_reg: 0.110, loss_classifier: 0.093, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 721 | |
[ 2023-10-08 01:56:50 ] Completed saving temp checkpoint 1,393.385 ms, 44.88 s total | |
[ 2023-10-08 01:56:50 ] Completed replacing temp checkpoint with checkpoint 76.370 ms, 44.96 s total | |
[ 2023-10-08 01:56:50 ] Completed Epoch: 13 batch 722: moving batch data to device 6.245 ms, 44.96 s total | |
[ 2023-10-08 01:56:50 ] Completed Epoch: 13 batch 722: forward pass 105.886 ms, 45.07 s total | |
[ 2023-10-08 01:56:50 ] Completed Epoch: 13 batch 722: backward pass 80.388 ms, 45.15 s total | |
[ 2023-10-08 01:56:50 ] Completed Epoch: 13 batch 722: computing loss 90.171 ms, 45.24 s total | |
EPOCH: [13], BATCH: [722/889], loss: 0.372, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 722 | |
[ 2023-10-08 01:56:52 ] Completed saving temp checkpoint 1,972.839 ms, 47.21 s total | |
[ 2023-10-08 01:56:52 ] Completed replacing temp checkpoint with checkpoint 92.623 ms, 47.31 s total | |
[ 2023-10-08 01:56:52 ] Completed Epoch: 13 batch 723: moving batch data to device 6.275 ms, 47.31 s total | |
[ 2023-10-08 01:56:53 ] Completed Epoch: 13 batch 723: forward pass 104.387 ms, 47.42 s total | |
[ 2023-10-08 01:56:53 ] Completed Epoch: 13 batch 723: backward pass 82.491 ms, 47.50 s total | |
[ 2023-10-08 01:56:53 ] Completed Epoch: 13 batch 723: computing loss 84.799 ms, 47.58 s total | |
EPOCH: [13], BATCH: [723/889], loss: 0.347, loss_box_reg: 0.107, loss_classifier: 0.088, loss_mask: 0.121, loss_objectness: 0.011, loss_rpn_box_reg: 0.018 | |
Saving checkpoint at epoch 13 train batch 723 | |
[ 2023-10-08 01:56:54 ] Completed saving temp checkpoint 1,421.752 ms, 49.01 s total | |
[ 2023-10-08 01:56:54 ] Completed replacing temp checkpoint with checkpoint 64.960 ms, 49.07 s total | |
[ 2023-10-08 01:56:54 ] Completed Epoch: 13 batch 724: moving batch data to device 4.719 ms, 49.07 s total | |
[ 2023-10-08 01:56:54 ] Completed Epoch: 13 batch 724: forward pass 102.074 ms, 49.18 s total | |
[ 2023-10-08 01:56:54 ] Completed Epoch: 13 batch 724: backward pass 68.540 ms, 49.25 s total | |
[ 2023-10-08 01:56:55 ] Completed Epoch: 13 batch 724: computing loss 327.864 ms, 49.57 s total | |
EPOCH: [13], BATCH: [724/889], loss: 0.396, loss_box_reg: 0.119, loss_classifier: 0.094, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 724 | |
[ 2023-10-08 01:56:56 ] Completed saving temp checkpoint 1,108.011 ms, 50.68 s total | |
[ 2023-10-08 01:56:56 ] Completed replacing temp checkpoint with checkpoint 62.951 ms, 50.74 s total | |
[ 2023-10-08 01:56:56 ] Completed Epoch: 13 batch 725: moving batch data to device 5.875 ms, 50.75 s total | |
[ 2023-10-08 01:56:56 ] Completed Epoch: 13 batch 725: forward pass 104.483 ms, 50.85 s total | |
[ 2023-10-08 01:56:56 ] Completed Epoch: 13 batch 725: backward pass 72.883 ms, 50.93 s total | |
[ 2023-10-08 01:56:56 ] Completed Epoch: 13 batch 725: computing loss 121.549 ms, 51.05 s total | |
EPOCH: [13], BATCH: [725/889], loss: 0.431, loss_box_reg: 0.131, loss_classifier: 0.110, loss_mask: 0.139, loss_objectness: 0.018, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 725 | |
[ 2023-10-08 01:56:57 ] Completed saving temp checkpoint 949.175 ms, 52.00 s total | |
[ 2023-10-08 01:56:57 ] Completed replacing temp checkpoint with checkpoint 44.939 ms, 52.04 s total | |
[ 2023-10-08 01:56:57 ] Completed Epoch: 13 batch 726: moving batch data to device 5.582 ms, 52.05 s total | |
[ 2023-10-08 01:56:57 ] Completed Epoch: 13 batch 726: forward pass 104.306 ms, 52.15 s total | |
[ 2023-10-08 01:56:57 ] Completed Epoch: 13 batch 726: backward pass 73.143 ms, 52.23 s total | |
[ 2023-10-08 01:56:57 ] Completed Epoch: 13 batch 726: computing loss 116.470 ms, 52.34 s total | |
EPOCH: [13], BATCH: [726/889], loss: 0.409, loss_box_reg: 0.130, loss_classifier: 0.105, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 726 | |
[ 2023-10-08 01:56:59 ] Completed saving temp checkpoint 1,427.820 ms, 53.77 s total | |
[ 2023-10-08 01:56:59 ] Completed replacing temp checkpoint with checkpoint 75.814 ms, 53.85 s total | |
[ 2023-10-08 01:56:59 ] Completed Epoch: 13 batch 727: moving batch data to device 5.159 ms, 53.85 s total | |
[ 2023-10-08 01:56:59 ] Completed Epoch: 13 batch 727: forward pass 105.025 ms, 53.96 s total | |
[ 2023-10-08 01:56:59 ] Completed Epoch: 13 batch 727: backward pass 42.042 ms, 54.00 s total | |
[ 2023-10-08 01:56:59 ] Completed Epoch: 13 batch 727: computing loss 149.142 ms, 54.15 s total | |
EPOCH: [13], BATCH: [727/889], loss: 0.349, loss_box_reg: 0.103, loss_classifier: 0.086, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 727 | |
[ 2023-10-08 01:57:00 ] Completed saving temp checkpoint 1,065.151 ms, 55.21 s total | |
[ 2023-10-08 01:57:00 ] Completed replacing temp checkpoint with checkpoint 59.142 ms, 55.27 s total | |
[ 2023-10-08 01:57:00 ] Completed Epoch: 13 batch 728: moving batch data to device 4.628 ms, 55.28 s total | |
[ 2023-10-08 01:57:01 ] Completed Epoch: 13 batch 728: forward pass 108.572 ms, 55.39 s total | |
[ 2023-10-08 01:57:01 ] Completed Epoch: 13 batch 728: backward pass 77.236 ms, 55.46 s total | |
[ 2023-10-08 01:57:01 ] Completed Epoch: 13 batch 728: computing loss 95.404 ms, 55.56 s total | |
EPOCH: [13], BATCH: [728/889], loss: 0.390, loss_box_reg: 0.114, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 728 | |
[ 2023-10-08 01:57:02 ] Completed saving temp checkpoint 1,260.546 ms, 56.82 s total | |
[ 2023-10-08 01:57:02 ] Completed replacing temp checkpoint with checkpoint 80.611 ms, 56.90 s total | |
[ 2023-10-08 01:57:02 ] Completed Epoch: 13 batch 729: moving batch data to device 7.019 ms, 56.91 s total | |
[ 2023-10-08 01:57:02 ] Completed Epoch: 13 batch 729: forward pass 105.489 ms, 57.01 s total | |
[ 2023-10-08 01:57:02 ] Completed Epoch: 13 batch 729: backward pass 73.379 ms, 57.08 s total | |
[ 2023-10-08 01:57:02 ] Completed Epoch: 13 batch 729: computing loss 95.125 ms, 57.18 s total | |
EPOCH: [13], BATCH: [729/889], loss: 0.392, loss_box_reg: 0.116, loss_classifier: 0.093, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.036 | |
Saving checkpoint at epoch 13 train batch 729 | |
[ 2023-10-08 01:57:03 ] Completed saving temp checkpoint 953.255 ms, 58.13 s total | |
[ 2023-10-08 01:57:03 ] Completed replacing temp checkpoint with checkpoint 70.097 ms, 58.20 s total | |
[ 2023-10-08 01:57:03 ] Completed Epoch: 13 batch 730: moving batch data to device 9.129 ms, 58.21 s total | |
[ 2023-10-08 01:57:03 ] Completed Epoch: 13 batch 730: forward pass 106.613 ms, 58.32 s total | |
[ 2023-10-08 01:57:04 ] Completed Epoch: 13 batch 730: backward pass 76.437 ms, 58.40 s total | |
[ 2023-10-08 01:57:04 ] Completed Epoch: 13 batch 730: computing loss 116.800 ms, 58.51 s total | |
EPOCH: [13], BATCH: [730/889], loss: 0.388, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 730 | |
[ 2023-10-08 01:57:05 ] Completed saving temp checkpoint 1,123.110 ms, 59.64 s total | |
[ 2023-10-08 01:57:05 ] Completed replacing temp checkpoint with checkpoint 65.979 ms, 59.70 s total | |
[ 2023-10-08 01:57:05 ] Completed Epoch: 13 batch 731: moving batch data to device 7.638 ms, 59.71 s total | |
[ 2023-10-08 01:57:05 ] Completed Epoch: 13 batch 731: forward pass 106.618 ms, 59.82 s total | |
[ 2023-10-08 01:57:05 ] Completed Epoch: 13 batch 731: backward pass 73.572 ms, 59.89 s total | |
[ 2023-10-08 01:57:05 ] Completed Epoch: 13 batch 731: computing loss 116.736 ms, 60.01 s total | |
EPOCH: [13], BATCH: [731/889], loss: 0.385, loss_box_reg: 0.115, loss_classifier: 0.096, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 731 | |
[ 2023-10-08 01:57:06 ] Completed saving temp checkpoint 967.538 ms, 60.97 s total | |
[ 2023-10-08 01:57:06 ] Completed replacing temp checkpoint with checkpoint 71.989 ms, 61.05 s total | |
[ 2023-10-08 01:57:06 ] Completed Epoch: 13 batch 732: moving batch data to device 6.385 ms, 61.05 s total | |
[ 2023-10-08 01:57:06 ] Completed Epoch: 13 batch 732: forward pass 106.508 ms, 61.16 s total | |
[ 2023-10-08 01:57:06 ] Completed Epoch: 13 batch 732: backward pass 36.381 ms, 61.19 s total | |
[ 2023-10-08 01:57:06 ] Completed Epoch: 13 batch 732: computing loss 162.416 ms, 61.36 s total | |
EPOCH: [13], BATCH: [732/889], loss: 0.396, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 732 | |
[ 2023-10-08 01:57:08 ] Completed saving temp checkpoint 1,226.487 ms, 62.58 s total | |
[ 2023-10-08 01:57:08 ] Completed replacing temp checkpoint with checkpoint 39.046 ms, 62.62 s total | |
[ 2023-10-08 01:57:08 ] Completed Epoch: 13 batch 733: moving batch data to device 5.045 ms, 62.63 s total | |
[ 2023-10-08 01:57:08 ] Completed Epoch: 13 batch 733: forward pass 104.037 ms, 62.73 s total | |
[ 2023-10-08 01:57:08 ] Completed Epoch: 13 batch 733: backward pass 65.985 ms, 62.80 s total | |
[ 2023-10-08 01:57:08 ] Completed Epoch: 13 batch 733: computing loss 104.399 ms, 62.90 s total | |
EPOCH: [13], BATCH: [733/889], loss: 0.357, loss_box_reg: 0.105, loss_classifier: 0.090, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 733 | |
[ 2023-10-08 01:57:10 ] Completed saving temp checkpoint 1,685.018 ms, 64.59 s total | |
[ 2023-10-08 01:57:10 ] Completed replacing temp checkpoint with checkpoint 64.342 ms, 64.65 s total | |
[ 2023-10-08 01:57:10 ] Completed Epoch: 13 batch 734: moving batch data to device 7.246 ms, 64.66 s total | |
[ 2023-10-08 01:57:10 ] Completed Epoch: 13 batch 734: forward pass 106.073 ms, 64.76 s total | |
[ 2023-10-08 01:57:10 ] Completed Epoch: 13 batch 734: backward pass 83.349 ms, 64.85 s total | |
[ 2023-10-08 01:57:10 ] Completed Epoch: 13 batch 734: computing loss 115.051 ms, 64.96 s total | |
EPOCH: [13], BATCH: [734/889], loss: 0.399, loss_box_reg: 0.118, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 734 | |
[ 2023-10-08 01:57:12 ] Completed saving temp checkpoint 2,007.961 ms, 66.97 s total | |
[ 2023-10-08 01:57:12 ] Completed replacing temp checkpoint with checkpoint 76.782 ms, 67.05 s total | |
[ 2023-10-08 01:57:12 ] Completed Epoch: 13 batch 735: moving batch data to device 6.620 ms, 67.05 s total | |
[ 2023-10-08 01:57:12 ] Completed Epoch: 13 batch 735: forward pass 99.990 ms, 67.15 s total | |
[ 2023-10-08 01:57:12 ] Completed Epoch: 13 batch 735: backward pass 70.176 ms, 67.22 s total | |
[ 2023-10-08 01:57:12 ] Completed Epoch: 13 batch 735: computing loss 128.016 ms, 67.35 s total | |
EPOCH: [13], BATCH: [735/889], loss: 0.380, loss_box_reg: 0.122, loss_classifier: 0.091, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 735 | |
[ 2023-10-08 01:57:14 ] Completed saving temp checkpoint 1,195.007 ms, 68.55 s total | |
[ 2023-10-08 01:57:14 ] Completed replacing temp checkpoint with checkpoint 53.772 ms, 68.60 s total | |
[ 2023-10-08 01:57:14 ] Completed Epoch: 13 batch 736: moving batch data to device 8.065 ms, 68.61 s total | |
[ 2023-10-08 01:57:14 ] Completed Epoch: 13 batch 736: forward pass 105.259 ms, 68.71 s total | |
[ 2023-10-08 01:57:14 ] Completed Epoch: 13 batch 736: backward pass 83.512 ms, 68.80 s total | |
[ 2023-10-08 01:57:14 ] Completed Epoch: 13 batch 736: computing loss 88.254 ms, 68.89 s total | |
EPOCH: [13], BATCH: [736/889], loss: 0.390, loss_box_reg: 0.120, loss_classifier: 0.098, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 736 | |
[ 2023-10-08 01:57:15 ] Completed saving temp checkpoint 1,035.521 ms, 69.92 s total | |
[ 2023-10-08 01:57:15 ] Completed replacing temp checkpoint with checkpoint 51.668 ms, 69.97 s total | |
[ 2023-10-08 01:57:15 ] Completed Epoch: 13 batch 737: moving batch data to device 4.510 ms, 69.98 s total | |
[ 2023-10-08 01:57:15 ] Completed Epoch: 13 batch 737: forward pass 99.815 ms, 70.08 s total | |
[ 2023-10-08 01:57:15 ] Completed Epoch: 13 batch 737: backward pass 69.466 ms, 70.15 s total | |
[ 2023-10-08 01:57:15 ] Completed Epoch: 13 batch 737: computing loss 98.629 ms, 70.25 s total | |
EPOCH: [13], BATCH: [737/889], loss: 0.368, loss_box_reg: 0.115, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.012, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 737 | |
[ 2023-10-08 01:57:16 ] Completed saving temp checkpoint 934.065 ms, 71.18 s total | |
[ 2023-10-08 01:57:16 ] Completed replacing temp checkpoint with checkpoint 48.182 ms, 71.23 s total | |
[ 2023-10-08 01:57:16 ] Completed Epoch: 13 batch 738: moving batch data to device 5.838 ms, 71.23 s total | |
[ 2023-10-08 01:57:16 ] Completed Epoch: 13 batch 738: forward pass 103.776 ms, 71.34 s total | |
[ 2023-10-08 01:57:17 ] Completed Epoch: 13 batch 738: backward pass 68.464 ms, 71.41 s total | |
[ 2023-10-08 01:57:17 ] Completed Epoch: 13 batch 738: computing loss 115.601 ms, 71.52 s total | |
EPOCH: [13], BATCH: [738/889], loss: 0.361, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 738 | |
[ 2023-10-08 01:57:18 ] Completed saving temp checkpoint 1,074.715 ms, 72.60 s total | |
[ 2023-10-08 01:57:18 ] Completed replacing temp checkpoint with checkpoint 63.398 ms, 72.66 s total | |
[ 2023-10-08 01:57:18 ] Completed Epoch: 13 batch 739: moving batch data to device 7.530 ms, 72.67 s total | |
[ 2023-10-08 01:57:18 ] Completed Epoch: 13 batch 739: forward pass 102.255 ms, 72.77 s total | |
[ 2023-10-08 01:57:18 ] Completed Epoch: 13 batch 739: backward pass 51.637 ms, 72.82 s total | |
[ 2023-10-08 01:57:18 ] Completed Epoch: 13 batch 739: computing loss 148.229 ms, 72.97 s total | |
EPOCH: [13], BATCH: [739/889], loss: 0.368, loss_box_reg: 0.106, loss_classifier: 0.089, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 739 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 02:10:34 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:10:34 ] Completed importing Timer 0.021 ms, 0.00 s total | |
[ 2023-10-08 02:10:35 ] Completed importing everything else 583.042 ms, 0.58 s total | |
[ 2023-10-08 02:10:35 ] Completed defined other functions 0.026 ms, 0.58 s total | |
| distributed init (rank 2): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 1): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 02:10:43 ] Completed main preliminaries 7,432.592 ms, 8.02 s total | |
loading annotations into memory... | |
Done (t=11.23s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-08 02:10:56 ] Completed loading data 13,053.319 ms, 21.07 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 02:10:56 ] Completed creating data samplers 100.610 ms, 21.17 s total | |
[ 2023-10-08 02:10:56 ] Completed creating data loaders 0.210 ms, 21.17 s total | |
[ 2023-10-08 02:10:56 ] Completed creating model and .to(device) 668.945 ms, 21.84 s total | |
[ 2023-10-08 02:10:58 ] Completed preparing model for distributed training 1,621.409 ms, 23.46 s total | |
[ 2023-10-08 02:10:58 ] Completed optimizer and scaler 0.603 ms, 23.46 s total | |
[ 2023-10-08 02:10:58 ] Completed learning rate schedulers 0.223 ms, 23.46 s total | |
[ 2023-10-08 02:10:59 ] Completed init coco evaluator 947.531 ms, 24.41 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 02:11:00 ] Completed retrieving checkpoint 861.206 ms, 25.27 s total | |
EPOCH :: 13 | |
[ 2023-10-08 02:11:00 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:11:00 ] Completed training preliminaries 0.873 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 739 | |
[ 2023-10-08 02:11:00 ] Completed Epoch: 13 batch 739: moving batch data to device 517.862 ms, 0.52 s total | |
[ 2023-10-08 02:11:01 ] Completed Epoch: 13 batch 739: forward pass 1,073.185 ms, 1.59 s total | |
[ 2023-10-08 02:11:02 ] Completed Epoch: 13 batch 739: backward pass 157.575 ms, 1.75 s total | |
[ 2023-10-08 02:11:02 ] Completed Epoch: 13 batch 739: computing loss 183.840 ms, 1.93 s total | |
EPOCH: [13], BATCH: [739/889], loss: 0.368, loss_box_reg: 0.108, loss_classifier: 0.088, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 739 | |
[ 2023-10-08 02:11:03 ] Completed saving temp checkpoint 1,048.269 ms, 2.98 s total | |
[ 2023-10-08 02:11:03 ] Completed replacing temp checkpoint with checkpoint 172.257 ms, 3.15 s total | |
[ 2023-10-08 02:11:03 ] Completed Epoch: 13 batch 740: moving batch data to device 6.275 ms, 3.16 s total | |
[ 2023-10-08 02:11:03 ] Completed Epoch: 13 batch 740: forward pass 108.404 ms, 3.27 s total | |
[ 2023-10-08 02:11:03 ] Completed Epoch: 13 batch 740: backward pass 92.023 ms, 3.36 s total | |
[ 2023-10-08 02:11:03 ] Completed Epoch: 13 batch 740: computing loss 137.039 ms, 3.50 s total | |
EPOCH: [13], BATCH: [740/889], loss: 0.361, loss_box_reg: 0.106, loss_classifier: 0.088, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 740 | |
[ 2023-10-08 02:11:05 ] Completed saving temp checkpoint 1,340.469 ms, 4.84 s total | |
[ 2023-10-08 02:11:05 ] Completed replacing temp checkpoint with checkpoint 74.974 ms, 4.91 s total | |
[ 2023-10-08 02:11:05 ] Completed Epoch: 13 batch 741: moving batch data to device 3.898 ms, 4.92 s total | |
[ 2023-10-08 02:11:05 ] Completed Epoch: 13 batch 741: forward pass 109.704 ms, 5.03 s total | |
[ 2023-10-08 02:11:05 ] Completed Epoch: 13 batch 741: backward pass 79.400 ms, 5.11 s total | |
[ 2023-10-08 02:11:05 ] Completed Epoch: 13 batch 741: computing loss 212.634 ms, 5.32 s total | |
EPOCH: [13], BATCH: [741/889], loss: 0.382, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 741 | |
[ 2023-10-08 02:11:06 ] Completed saving temp checkpoint 1,056.608 ms, 6.38 s total | |
[ 2023-10-08 02:11:06 ] Completed replacing temp checkpoint with checkpoint 75.945 ms, 6.45 s total | |
[ 2023-10-08 02:11:06 ] Completed Epoch: 13 batch 742: moving batch data to device 7.203 ms, 6.46 s total | |
[ 2023-10-08 02:11:06 ] Completed Epoch: 13 batch 742: forward pass 111.016 ms, 6.57 s total | |
[ 2023-10-08 02:11:06 ] Completed Epoch: 13 batch 742: backward pass 80.709 ms, 6.65 s total | |
[ 2023-10-08 02:11:07 ] Completed Epoch: 13 batch 742: computing loss 134.858 ms, 6.79 s total | |
EPOCH: [13], BATCH: [742/889], loss: 0.373, loss_box_reg: 0.114, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 742 | |
[ 2023-10-08 02:11:08 ] Completed saving temp checkpoint 1,101.089 ms, 7.89 s total | |
[ 2023-10-08 02:11:08 ] Completed replacing temp checkpoint with checkpoint 81.003 ms, 7.97 s total | |
[ 2023-10-08 02:11:08 ] Completed Epoch: 13 batch 743: moving batch data to device 39.215 ms, 8.01 s total | |
[ 2023-10-08 02:11:08 ] Completed Epoch: 13 batch 743: forward pass 113.162 ms, 8.12 s total | |
[ 2023-10-08 02:11:08 ] Completed Epoch: 13 batch 743: backward pass 79.536 ms, 8.20 s total | |
[ 2023-10-08 02:11:08 ] Completed Epoch: 13 batch 743: computing loss 205.348 ms, 8.40 s total | |
EPOCH: [13], BATCH: [743/889], loss: 0.355, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.125, loss_objectness: 0.011, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 743 | |
[ 2023-10-08 02:11:10 ] Completed saving temp checkpoint 1,653.818 ms, 10.06 s total | |
[ 2023-10-08 02:11:10 ] Completed replacing temp checkpoint with checkpoint 60.459 ms, 10.12 s total | |
[ 2023-10-08 02:11:10 ] Completed Epoch: 13 batch 744: moving batch data to device 2.432 ms, 10.12 s total | |
[ 2023-10-08 02:11:10 ] Completed Epoch: 13 batch 744: forward pass 180.409 ms, 10.30 s total | |
[ 2023-10-08 02:11:10 ] Completed Epoch: 13 batch 744: backward pass 70.928 ms, 10.37 s total | |
[ 2023-10-08 02:11:10 ] Completed Epoch: 13 batch 744: computing loss 231.364 ms, 10.60 s total | |
EPOCH: [13], BATCH: [744/889], loss: 0.409, loss_box_reg: 0.125, loss_classifier: 0.102, loss_mask: 0.139, loss_objectness: 0.014, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 744 | |
[ 2023-10-08 02:11:12 ] Completed saving temp checkpoint 1,266.988 ms, 11.87 s total | |
[ 2023-10-08 02:11:12 ] Completed replacing temp checkpoint with checkpoint 69.078 ms, 11.94 s total | |
[ 2023-10-08 02:11:12 ] Completed Epoch: 13 batch 745: moving batch data to device 5.022 ms, 11.94 s total | |
[ 2023-10-08 02:11:12 ] Completed Epoch: 13 batch 745: forward pass 105.329 ms, 12.05 s total | |
[ 2023-10-08 02:11:12 ] Completed Epoch: 13 batch 745: backward pass 73.584 ms, 12.12 s total | |
[ 2023-10-08 02:11:12 ] Completed Epoch: 13 batch 745: computing loss 129.046 ms, 12.25 s total | |
EPOCH: [13], BATCH: [745/889], loss: 0.403, loss_box_reg: 0.124, loss_classifier: 0.104, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 745 | |
[ 2023-10-08 02:11:13 ] Completed saving temp checkpoint 984.171 ms, 13.24 s total | |
[ 2023-10-08 02:11:13 ] Completed replacing temp checkpoint with checkpoint 67.019 ms, 13.30 s total | |
[ 2023-10-08 02:11:13 ] Completed Epoch: 13 batch 746: moving batch data to device 10.130 ms, 13.31 s total | |
[ 2023-10-08 02:11:13 ] Completed Epoch: 13 batch 746: forward pass 109.449 ms, 13.42 s total | |
[ 2023-10-08 02:11:13 ] Completed Epoch: 13 batch 746: backward pass 34.267 ms, 13.46 s total | |
[ 2023-10-08 02:11:13 ] Completed Epoch: 13 batch 746: computing loss 171.769 ms, 13.63 s total | |
EPOCH: [13], BATCH: [746/889], loss: 0.402, loss_box_reg: 0.123, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 746 | |
[ 2023-10-08 02:11:15 ] Completed saving temp checkpoint 1,160.410 ms, 14.79 s total | |
[ 2023-10-08 02:11:15 ] Completed replacing temp checkpoint with checkpoint 68.228 ms, 14.86 s total | |
[ 2023-10-08 02:11:15 ] Completed Epoch: 13 batch 747: moving batch data to device 5.502 ms, 14.86 s total | |
[ 2023-10-08 02:11:15 ] Completed Epoch: 13 batch 747: forward pass 116.338 ms, 14.98 s total | |
[ 2023-10-08 02:11:15 ] Completed Epoch: 13 batch 747: backward pass 78.229 ms, 15.06 s total | |
[ 2023-10-08 02:11:15 ] Completed Epoch: 13 batch 747: computing loss 130.884 ms, 15.19 s total | |
EPOCH: [13], BATCH: [747/889], loss: 0.391, loss_box_reg: 0.121, loss_classifier: 0.098, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 747 | |
[ 2023-10-08 02:11:16 ] Completed saving temp checkpoint 1,026.293 ms, 16.22 s total | |
[ 2023-10-08 02:11:16 ] Completed replacing temp checkpoint with checkpoint 76.588 ms, 16.29 s total | |
[ 2023-10-08 02:11:16 ] Completed Epoch: 13 batch 748: moving batch data to device 9.335 ms, 16.30 s total | |
[ 2023-10-08 02:11:16 ] Completed Epoch: 13 batch 748: forward pass 105.762 ms, 16.41 s total | |
[ 2023-10-08 02:11:16 ] Completed Epoch: 13 batch 748: backward pass 74.824 ms, 16.48 s total | |
[ 2023-10-08 02:11:16 ] Completed Epoch: 13 batch 748: computing loss 118.114 ms, 16.60 s total | |
EPOCH: [13], BATCH: [748/889], loss: 0.368, loss_box_reg: 0.109, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 748 | |
[ 2023-10-08 02:11:18 ] Completed saving temp checkpoint 1,199.234 ms, 17.80 s total | |
[ 2023-10-08 02:11:18 ] Completed replacing temp checkpoint with checkpoint 49.044 ms, 17.85 s total | |
[ 2023-10-08 02:11:18 ] Completed Epoch: 13 batch 749: moving batch data to device 5.344 ms, 17.85 s total | |
[ 2023-10-08 02:11:18 ] Completed Epoch: 13 batch 749: forward pass 101.536 ms, 17.96 s total | |
[ 2023-10-08 02:11:18 ] Completed Epoch: 13 batch 749: backward pass 83.114 ms, 18.04 s total | |
[ 2023-10-08 02:11:18 ] Completed Epoch: 13 batch 749: computing loss 207.099 ms, 18.25 s total | |
EPOCH: [13], BATCH: [749/889], loss: 0.378, loss_box_reg: 0.113, loss_classifier: 0.095, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 749 | |
[ 2023-10-08 02:11:19 ] Completed saving temp checkpoint 1,061.857 ms, 19.31 s total | |
[ 2023-10-08 02:11:19 ] Completed replacing temp checkpoint with checkpoint 72.088 ms, 19.38 s total | |
[ 2023-10-08 02:11:19 ] Completed Epoch: 13 batch 750: moving batch data to device 7.396 ms, 19.39 s total | |
[ 2023-10-08 02:11:19 ] Completed Epoch: 13 batch 750: forward pass 105.763 ms, 19.49 s total | |
[ 2023-10-08 02:11:19 ] Completed Epoch: 13 batch 750: backward pass 37.947 ms, 19.53 s total | |
[ 2023-10-08 02:11:19 ] Completed Epoch: 13 batch 750: computing loss 158.970 ms, 19.69 s total | |
EPOCH: [13], BATCH: [750/889], loss: 0.394, loss_box_reg: 0.115, loss_classifier: 0.104, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 750 | |
[ 2023-10-08 02:11:21 ] Completed saving temp checkpoint 1,170.520 ms, 20.86 s total | |
[ 2023-10-08 02:11:21 ] Completed replacing temp checkpoint with checkpoint 65.625 ms, 20.93 s total | |
[ 2023-10-08 02:11:21 ] Completed Epoch: 13 batch 751: moving batch data to device 11.498 ms, 20.94 s total | |
[ 2023-10-08 02:11:21 ] Completed Epoch: 13 batch 751: forward pass 104.664 ms, 21.04 s total | |
[ 2023-10-08 02:11:21 ] Completed Epoch: 13 batch 751: backward pass 78.937 ms, 21.12 s total | |
[ 2023-10-08 02:11:21 ] Completed Epoch: 13 batch 751: computing loss 112.970 ms, 21.23 s total | |
EPOCH: [13], BATCH: [751/889], loss: 0.392, loss_box_reg: 0.118, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 751 | |
[ 2023-10-08 02:11:22 ] Completed saving temp checkpoint 1,065.798 ms, 22.30 s total | |
[ 2023-10-08 02:11:22 ] Completed replacing temp checkpoint with checkpoint 75.845 ms, 22.38 s total | |
[ 2023-10-08 02:11:22 ] Completed Epoch: 13 batch 752: moving batch data to device 6.684 ms, 22.38 s total | |
[ 2023-10-08 02:11:22 ] Completed Epoch: 13 batch 752: forward pass 108.230 ms, 22.49 s total | |
[ 2023-10-08 02:11:22 ] Completed Epoch: 13 batch 752: backward pass 66.332 ms, 22.56 s total | |
[ 2023-10-08 02:11:23 ] Completed Epoch: 13 batch 752: computing loss 194.185 ms, 22.75 s total | |
EPOCH: [13], BATCH: [752/889], loss: 0.381, loss_box_reg: 0.119, loss_classifier: 0.096, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 752 | |
[ 2023-10-08 02:11:24 ] Completed saving temp checkpoint 1,327.478 ms, 24.08 s total | |
[ 2023-10-08 02:11:24 ] Completed replacing temp checkpoint with checkpoint 86.188 ms, 24.16 s total | |
[ 2023-10-08 02:11:24 ] Completed Epoch: 13 batch 753: moving batch data to device 6.546 ms, 24.17 s total | |
[ 2023-10-08 02:11:24 ] Completed Epoch: 13 batch 753: forward pass 111.039 ms, 24.28 s total | |
[ 2023-10-08 02:11:24 ] Completed Epoch: 13 batch 753: backward pass 73.249 ms, 24.36 s total | |
[ 2023-10-08 02:11:24 ] Completed Epoch: 13 batch 753: computing loss 112.179 ms, 24.47 s total | |
EPOCH: [13], BATCH: [753/889], loss: 0.347, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.124, loss_objectness: 0.011, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 753 | |
[ 2023-10-08 02:11:26 ] Completed saving temp checkpoint 1,403.432 ms, 25.87 s total | |
[ 2023-10-08 02:11:26 ] Completed replacing temp checkpoint with checkpoint 66.197 ms, 25.94 s total | |
[ 2023-10-08 02:11:26 ] Completed Epoch: 13 batch 754: moving batch data to device 8.064 ms, 25.95 s total | |
[ 2023-10-08 02:11:26 ] Completed Epoch: 13 batch 754: forward pass 105.835 ms, 26.05 s total | |
[ 2023-10-08 02:11:26 ] Completed Epoch: 13 batch 754: backward pass 81.318 ms, 26.13 s total | |
[ 2023-10-08 02:11:26 ] Completed Epoch: 13 batch 754: computing loss 107.383 ms, 26.24 s total | |
EPOCH: [13], BATCH: [754/889], loss: 0.403, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 754 | |
[ 2023-10-08 02:11:28 ] Completed saving temp checkpoint 1,824.914 ms, 28.06 s total | |
[ 2023-10-08 02:11:28 ] Completed replacing temp checkpoint with checkpoint 80.522 ms, 28.15 s total | |
[ 2023-10-08 02:11:28 ] Completed Epoch: 13 batch 755: moving batch data to device 7.614 ms, 28.15 s total | |
[ 2023-10-08 02:11:28 ] Completed Epoch: 13 batch 755: forward pass 116.591 ms, 28.27 s total | |
[ 2023-10-08 02:11:28 ] Completed Epoch: 13 batch 755: backward pass 79.625 ms, 28.35 s total | |
[ 2023-10-08 02:11:28 ] Completed Epoch: 13 batch 755: computing loss 109.431 ms, 28.46 s total | |
EPOCH: [13], BATCH: [755/889], loss: 0.383, loss_box_reg: 0.115, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 755 | |
[ 2023-10-08 02:11:30 ] Completed saving temp checkpoint 1,428.389 ms, 29.89 s total | |
[ 2023-10-08 02:11:30 ] Completed replacing temp checkpoint with checkpoint 66.854 ms, 29.95 s total | |
[ 2023-10-08 02:11:30 ] Completed Epoch: 13 batch 756: moving batch data to device 7.697 ms, 29.96 s total | |
[ 2023-10-08 02:11:30 ] Completed Epoch: 13 batch 756: forward pass 108.665 ms, 30.07 s total | |
[ 2023-10-08 02:11:30 ] Completed Epoch: 13 batch 756: backward pass 74.063 ms, 30.14 s total | |
[ 2023-10-08 02:11:30 ] Completed Epoch: 13 batch 756: computing loss 185.797 ms, 30.33 s total | |
EPOCH: [13], BATCH: [756/889], loss: 0.365, loss_box_reg: 0.110, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 756 | |
[ 2023-10-08 02:11:32 ] Completed saving temp checkpoint 1,588.206 ms, 31.92 s total | |
[ 2023-10-08 02:11:32 ] Completed replacing temp checkpoint with checkpoint 70.649 ms, 31.99 s total | |
[ 2023-10-08 02:11:32 ] Completed Epoch: 13 batch 757: moving batch data to device 8.059 ms, 32.00 s total | |
[ 2023-10-08 02:11:32 ] Completed Epoch: 13 batch 757: forward pass 106.154 ms, 32.10 s total | |
[ 2023-10-08 02:11:32 ] Completed Epoch: 13 batch 757: backward pass 42.606 ms, 32.15 s total | |
[ 2023-10-08 02:11:32 ] Completed Epoch: 13 batch 757: computing loss 148.375 ms, 32.29 s total | |
EPOCH: [13], BATCH: [757/889], loss: 0.372, loss_box_reg: 0.113, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 757 | |
[ 2023-10-08 02:11:33 ] Completed saving temp checkpoint 1,020.062 ms, 33.31 s total | |
[ 2023-10-08 02:11:33 ] Completed replacing temp checkpoint with checkpoint 71.167 ms, 33.39 s total | |
[ 2023-10-08 02:11:33 ] Completed Epoch: 13 batch 758: moving batch data to device 7.094 ms, 33.39 s total | |
[ 2023-10-08 02:11:33 ] Completed Epoch: 13 batch 758: forward pass 104.734 ms, 33.50 s total | |
[ 2023-10-08 02:11:33 ] Completed Epoch: 13 batch 758: backward pass 75.829 ms, 33.57 s total | |
[ 2023-10-08 02:11:33 ] Completed Epoch: 13 batch 758: computing loss 115.193 ms, 33.69 s total | |
EPOCH: [13], BATCH: [758/889], loss: 0.361, loss_box_reg: 0.111, loss_classifier: 0.094, loss_mask: 0.122, loss_objectness: 0.012, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 758 | |
[ 2023-10-08 02:11:35 ] Completed saving temp checkpoint 1,439.964 ms, 35.13 s total | |
[ 2023-10-08 02:11:35 ] Completed replacing temp checkpoint with checkpoint 86.899 ms, 35.21 s total | |
[ 2023-10-08 02:11:35 ] Completed Epoch: 13 batch 759: moving batch data to device 8.843 ms, 35.22 s total | |
[ 2023-10-08 02:11:35 ] Completed Epoch: 13 batch 759: forward pass 103.842 ms, 35.33 s total | |
[ 2023-10-08 02:11:35 ] Completed Epoch: 13 batch 759: backward pass 81.327 ms, 35.41 s total | |
[ 2023-10-08 02:11:35 ] Completed Epoch: 13 batch 759: computing loss 115.118 ms, 35.52 s total | |
EPOCH: [13], BATCH: [759/889], loss: 0.413, loss_box_reg: 0.120, loss_classifier: 0.106, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 759 | |
[ 2023-10-08 02:11:36 ] Completed saving temp checkpoint 1,078.101 ms, 36.60 s total | |
[ 2023-10-08 02:11:36 ] Completed replacing temp checkpoint with checkpoint 65.840 ms, 36.67 s total | |
[ 2023-10-08 02:11:36 ] Completed Epoch: 13 batch 760: moving batch data to device 7.645 ms, 36.68 s total | |
[ 2023-10-08 02:11:37 ] Completed Epoch: 13 batch 760: forward pass 103.870 ms, 36.78 s total | |
[ 2023-10-08 02:11:37 ] Completed Epoch: 13 batch 760: backward pass 47.543 ms, 36.83 s total | |
[ 2023-10-08 02:11:37 ] Completed Epoch: 13 batch 760: computing loss 154.569 ms, 36.98 s total | |
EPOCH: [13], BATCH: [760/889], loss: 0.381, loss_box_reg: 0.110, loss_classifier: 0.100, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 760 | |
[ 2023-10-08 02:11:38 ] Completed saving temp checkpoint 1,563.584 ms, 38.55 s total | |
[ 2023-10-08 02:11:38 ] Completed replacing temp checkpoint with checkpoint 97.111 ms, 38.64 s total | |
[ 2023-10-08 02:11:38 ] Completed Epoch: 13 batch 761: moving batch data to device 9.644 ms, 38.65 s total | |
[ 2023-10-08 02:11:39 ] Completed Epoch: 13 batch 761: forward pass 104.669 ms, 38.76 s total | |
[ 2023-10-08 02:11:39 ] Completed Epoch: 13 batch 761: backward pass 74.425 ms, 38.83 s total | |
[ 2023-10-08 02:11:39 ] Completed Epoch: 13 batch 761: computing loss 117.306 ms, 38.95 s total | |
EPOCH: [13], BATCH: [761/889], loss: 0.410, loss_box_reg: 0.124, loss_classifier: 0.100, loss_mask: 0.134, loss_objectness: 0.019, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 761 | |
[ 2023-10-08 02:11:40 ] Completed saving temp checkpoint 1,276.176 ms, 40.22 s total | |
[ 2023-10-08 02:11:40 ] Completed replacing temp checkpoint with checkpoint 82.192 ms, 40.31 s total | |
[ 2023-10-08 02:11:40 ] Completed Epoch: 13 batch 762: moving batch data to device 6.995 ms, 40.31 s total | |
[ 2023-10-08 02:11:40 ] Completed Epoch: 13 batch 762: forward pass 106.594 ms, 40.42 s total | |
[ 2023-10-08 02:11:40 ] Completed Epoch: 13 batch 762: backward pass 51.376 ms, 40.47 s total | |
[ 2023-10-08 02:11:40 ] Completed Epoch: 13 batch 762: computing loss 137.695 ms, 40.61 s total | |
EPOCH: [13], BATCH: [762/889], loss: 0.431, loss_box_reg: 0.131, loss_classifier: 0.109, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.039 | |
Saving checkpoint at epoch 13 train batch 762 | |
[ 2023-10-08 02:11:42 ] Completed saving temp checkpoint 1,444.797 ms, 42.05 s total | |
[ 2023-10-08 02:11:42 ] Completed replacing temp checkpoint with checkpoint 54.045 ms, 42.11 s total | |
[ 2023-10-08 02:11:42 ] Completed Epoch: 13 batch 763: moving batch data to device 4.654 ms, 42.11 s total | |
[ 2023-10-08 02:11:42 ] Completed Epoch: 13 batch 763: forward pass 101.952 ms, 42.21 s total | |
[ 2023-10-08 02:11:42 ] Completed Epoch: 13 batch 763: backward pass 49.850 ms, 42.26 s total | |
[ 2023-10-08 02:11:42 ] Completed Epoch: 13 batch 763: computing loss 141.349 ms, 42.41 s total | |
EPOCH: [13], BATCH: [763/889], loss: 0.387, loss_box_reg: 0.121, loss_classifier: 0.094, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 763 | |
[ 2023-10-08 02:11:43 ] Completed saving temp checkpoint 1,294.020 ms, 43.70 s total | |
[ 2023-10-08 02:11:43 ] Completed replacing temp checkpoint with checkpoint 45.333 ms, 43.75 s total | |
[ 2023-10-08 02:11:44 ] Completed Epoch: 13 batch 764: moving batch data to device 5.280 ms, 43.75 s total | |
[ 2023-10-08 02:11:44 ] Completed Epoch: 13 batch 764: forward pass 105.047 ms, 43.86 s total | |
[ 2023-10-08 02:11:44 ] Completed Epoch: 13 batch 764: backward pass 73.262 ms, 43.93 s total | |
[ 2023-10-08 02:11:44 ] Completed Epoch: 13 batch 764: computing loss 118.395 ms, 44.05 s total | |
EPOCH: [13], BATCH: [764/889], loss: 0.362, loss_box_reg: 0.112, loss_classifier: 0.097, loss_mask: 0.121, loss_objectness: 0.012, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 764 | |
[ 2023-10-08 02:11:45 ] Completed saving temp checkpoint 1,458.861 ms, 45.51 s total | |
[ 2023-10-08 02:11:45 ] Completed replacing temp checkpoint with checkpoint 102.191 ms, 45.61 s total | |
[ 2023-10-08 02:11:45 ] Completed Epoch: 13 batch 765: moving batch data to device 7.723 ms, 45.62 s total | |
[ 2023-10-08 02:11:45 ] Completed Epoch: 13 batch 765: forward pass 104.582 ms, 45.72 s total | |
[ 2023-10-08 02:11:46 ] Completed Epoch: 13 batch 765: backward pass 71.119 ms, 45.79 s total | |
[ 2023-10-08 02:11:46 ] Completed Epoch: 13 batch 765: computing loss 123.602 ms, 45.92 s total | |
EPOCH: [13], BATCH: [765/889], loss: 0.366, loss_box_reg: 0.106, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 765 | |
[ 2023-10-08 02:11:47 ] Completed saving temp checkpoint 1,413.756 ms, 47.33 s total | |
[ 2023-10-08 02:11:47 ] Completed replacing temp checkpoint with checkpoint 91.981 ms, 47.42 s total | |
[ 2023-10-08 02:11:47 ] Completed Epoch: 13 batch 766: moving batch data to device 7.573 ms, 47.43 s total | |
[ 2023-10-08 02:11:47 ] Completed Epoch: 13 batch 766: forward pass 104.198 ms, 47.53 s total | |
[ 2023-10-08 02:11:47 ] Completed Epoch: 13 batch 766: backward pass 75.280 ms, 47.61 s total | |
[ 2023-10-08 02:11:47 ] Completed Epoch: 13 batch 766: computing loss 92.962 ms, 47.70 s total | |
EPOCH: [13], BATCH: [766/889], loss: 0.363, loss_box_reg: 0.107, loss_classifier: 0.089, loss_mask: 0.122, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 766 | |
[ 2023-10-08 02:11:50 ] Completed saving temp checkpoint 2,055.116 ms, 49.76 s total | |
[ 2023-10-08 02:11:50 ] Completed replacing temp checkpoint with checkpoint 104.747 ms, 49.86 s total | |
[ 2023-10-08 02:11:50 ] Completed Epoch: 13 batch 767: moving batch data to device 7.231 ms, 49.87 s total | |
[ 2023-10-08 02:11:50 ] Completed Epoch: 13 batch 767: forward pass 101.711 ms, 49.97 s total | |
[ 2023-10-08 02:11:50 ] Completed Epoch: 13 batch 767: backward pass 43.830 ms, 50.01 s total | |
[ 2023-10-08 02:11:50 ] Completed Epoch: 13 batch 767: computing loss 142.833 ms, 50.16 s total | |
EPOCH: [13], BATCH: [767/889], loss: 0.398, loss_box_reg: 0.127, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 767 | |
[ 2023-10-08 02:11:51 ] Completed saving temp checkpoint 1,293.653 ms, 51.45 s total | |
[ 2023-10-08 02:11:51 ] Completed replacing temp checkpoint with checkpoint 89.140 ms, 51.54 s total | |
[ 2023-10-08 02:11:51 ] Completed Epoch: 13 batch 768: moving batch data to device 9.564 ms, 51.55 s total | |
[ 2023-10-08 02:11:51 ] Completed Epoch: 13 batch 768: forward pass 102.134 ms, 51.65 s total | |
[ 2023-10-08 02:11:51 ] Completed Epoch: 13 batch 768: backward pass 73.674 ms, 51.72 s total | |
[ 2023-10-08 02:11:52 ] Completed Epoch: 13 batch 768: computing loss 216.438 ms, 51.94 s total | |
EPOCH: [13], BATCH: [768/889], loss: 0.410, loss_box_reg: 0.122, loss_classifier: 0.100, loss_mask: 0.143, loss_objectness: 0.018, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 768 | |
[ 2023-10-08 02:11:53 ] Completed saving temp checkpoint 1,666.371 ms, 53.61 s total | |
[ 2023-10-08 02:11:53 ] Completed replacing temp checkpoint with checkpoint 93.203 ms, 53.70 s total | |
[ 2023-10-08 02:11:53 ] Completed Epoch: 13 batch 769: moving batch data to device 7.233 ms, 53.71 s total | |
[ 2023-10-08 02:11:54 ] Completed Epoch: 13 batch 769: forward pass 104.423 ms, 53.81 s total | |
[ 2023-10-08 02:11:54 ] Completed Epoch: 13 batch 769: backward pass 35.036 ms, 53.85 s total | |
[ 2023-10-08 02:11:54 ] Completed Epoch: 13 batch 769: computing loss 157.190 ms, 54.00 s total | |
EPOCH: [13], BATCH: [769/889], loss: 0.364, loss_box_reg: 0.108, loss_classifier: 0.091, loss_mask: 0.124, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 769 | |
[ 2023-10-08 02:11:55 ] Completed saving temp checkpoint 1,262.791 ms, 55.27 s total | |
[ 2023-10-08 02:11:55 ] Completed replacing temp checkpoint with checkpoint 84.186 ms, 55.35 s total | |
[ 2023-10-08 02:11:55 ] Completed Epoch: 13 batch 770: moving batch data to device 8.123 ms, 55.36 s total | |
[ 2023-10-08 02:11:55 ] Completed Epoch: 13 batch 770: forward pass 103.825 ms, 55.46 s total | |
[ 2023-10-08 02:11:55 ] Completed Epoch: 13 batch 770: backward pass 43.409 ms, 55.51 s total | |
[ 2023-10-08 02:11:55 ] Completed Epoch: 13 batch 770: computing loss 139.025 ms, 55.65 s total | |
EPOCH: [13], BATCH: [770/889], loss: 0.373, loss_box_reg: 0.110, loss_classifier: 0.100, loss_mask: 0.123, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 770 | |
[ 2023-10-08 02:11:57 ] Completed saving temp checkpoint 1,332.848 ms, 56.98 s total | |
[ 2023-10-08 02:11:57 ] Completed replacing temp checkpoint with checkpoint 72.744 ms, 57.05 s total | |
[ 2023-10-08 02:11:57 ] Completed Epoch: 13 batch 771: moving batch data to device 7.265 ms, 57.06 s total | |
[ 2023-10-08 02:11:57 ] Completed Epoch: 13 batch 771: forward pass 111.886 ms, 57.17 s total | |
[ 2023-10-08 02:11:57 ] Completed Epoch: 13 batch 771: backward pass 39.940 ms, 57.21 s total | |
[ 2023-10-08 02:11:57 ] Completed Epoch: 13 batch 771: computing loss 153.842 ms, 57.36 s total | |
EPOCH: [13], BATCH: [771/889], loss: 0.428, loss_box_reg: 0.135, loss_classifier: 0.112, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 771 | |
[ 2023-10-08 02:11:58 ] Completed saving temp checkpoint 1,205.849 ms, 58.57 s total | |
[ 2023-10-08 02:11:58 ] Completed replacing temp checkpoint with checkpoint 62.969 ms, 58.63 s total | |
[ 2023-10-08 02:11:58 ] Completed Epoch: 13 batch 772: moving batch data to device 4.590 ms, 58.64 s total | |
[ 2023-10-08 02:11:58 ] Completed Epoch: 13 batch 772: forward pass 106.324 ms, 58.74 s total | |
[ 2023-10-08 02:11:59 ] Completed Epoch: 13 batch 772: backward pass 67.030 ms, 58.81 s total | |
[ 2023-10-08 02:11:59 ] Completed Epoch: 13 batch 772: computing loss 102.838 ms, 58.91 s total | |
EPOCH: [13], BATCH: [772/889], loss: 0.362, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 772 | |
[ 2023-10-08 02:12:00 ] Completed saving temp checkpoint 1,315.294 ms, 60.23 s total | |
[ 2023-10-08 02:12:00 ] Completed replacing temp checkpoint with checkpoint 82.634 ms, 60.31 s total | |
[ 2023-10-08 02:12:00 ] Completed Epoch: 13 batch 773: moving batch data to device 8.534 ms, 60.32 s total | |
[ 2023-10-08 02:12:00 ] Completed Epoch: 13 batch 773: forward pass 108.068 ms, 60.43 s total | |
[ 2023-10-08 02:12:00 ] Completed Epoch: 13 batch 773: backward pass 50.130 ms, 60.48 s total | |
[ 2023-10-08 02:12:00 ] Completed Epoch: 13 batch 773: computing loss 119.269 ms, 60.60 s total | |
EPOCH: [13], BATCH: [773/889], loss: 0.397, loss_box_reg: 0.122, loss_classifier: 0.098, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 773 | |
[ 2023-10-08 02:12:02 ] Completed saving temp checkpoint 1,195.751 ms, 61.79 s total | |
[ 2023-10-08 02:12:02 ] Completed replacing temp checkpoint with checkpoint 56.804 ms, 61.85 s total | |
[ 2023-10-08 02:12:02 ] Completed Epoch: 13 batch 774: moving batch data to device 9.166 ms, 61.86 s total | |
[ 2023-10-08 02:12:02 ] Completed Epoch: 13 batch 774: forward pass 110.497 ms, 61.97 s total | |
[ 2023-10-08 02:12:02 ] Completed Epoch: 13 batch 774: backward pass 78.826 ms, 62.05 s total | |
[ 2023-10-08 02:12:02 ] Completed Epoch: 13 batch 774: computing loss 113.015 ms, 62.16 s total | |
EPOCH: [13], BATCH: [774/889], loss: 0.377, loss_box_reg: 0.109, loss_classifier: 0.096, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 774 | |
[ 2023-10-08 02:12:03 ] Completed saving temp checkpoint 1,295.906 ms, 63.46 s total | |
[ 2023-10-08 02:12:03 ] Completed replacing temp checkpoint with checkpoint 89.839 ms, 63.55 s total | |
[ 2023-10-08 02:12:03 ] Completed Epoch: 13 batch 775: moving batch data to device 8.607 ms, 63.56 s total | |
[ 2023-10-08 02:12:03 ] Completed Epoch: 13 batch 775: forward pass 107.256 ms, 63.66 s total | |
[ 2023-10-08 02:12:03 ] Completed Epoch: 13 batch 775: backward pass 58.819 ms, 63.72 s total | |
[ 2023-10-08 02:12:04 ] Completed Epoch: 13 batch 775: computing loss 128.359 ms, 63.85 s total | |
EPOCH: [13], BATCH: [775/889], loss: 0.416, loss_box_reg: 0.129, loss_classifier: 0.107, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 775 | |
[ 2023-10-08 02:12:05 ] Completed saving temp checkpoint 1,161.136 ms, 65.01 s total | |
[ 2023-10-08 02:12:05 ] Completed replacing temp checkpoint with checkpoint 71.325 ms, 65.08 s total | |
[ 2023-10-08 02:12:05 ] Completed Epoch: 13 batch 776: moving batch data to device 8.555 ms, 65.09 s total | |
[ 2023-10-08 02:12:05 ] Completed Epoch: 13 batch 776: forward pass 107.145 ms, 65.20 s total | |
[ 2023-10-08 02:12:05 ] Completed Epoch: 13 batch 776: backward pass 75.477 ms, 65.27 s total | |
[ 2023-10-08 02:12:05 ] Completed Epoch: 13 batch 776: computing loss 111.426 ms, 65.39 s total | |
EPOCH: [13], BATCH: [776/889], loss: 0.374, loss_box_reg: 0.111, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 776 | |
[ 2023-10-08 02:12:06 ] Completed saving temp checkpoint 1,275.099 ms, 66.66 s total | |
[ 2023-10-08 02:12:06 ] Completed replacing temp checkpoint with checkpoint 73.391 ms, 66.73 s total | |
[ 2023-10-08 02:12:06 ] Completed Epoch: 13 batch 777: moving batch data to device 7.041 ms, 66.74 s total | |
[ 2023-10-08 02:12:07 ] Completed Epoch: 13 batch 777: forward pass 109.685 ms, 66.85 s total | |
[ 2023-10-08 02:12:07 ] Completed Epoch: 13 batch 777: backward pass 80.707 ms, 66.93 s total | |
[ 2023-10-08 02:12:07 ] Completed Epoch: 13 batch 777: computing loss 118.612 ms, 67.05 s total | |
EPOCH: [13], BATCH: [777/889], loss: 0.415, loss_box_reg: 0.129, loss_classifier: 0.109, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 777 | |
[ 2023-10-08 02:12:08 ] Completed saving temp checkpoint 1,158.376 ms, 68.21 s total | |
[ 2023-10-08 02:12:08 ] Completed replacing temp checkpoint with checkpoint 81.735 ms, 68.29 s total | |
[ 2023-10-08 02:12:08 ] Completed Epoch: 13 batch 778: moving batch data to device 7.978 ms, 68.30 s total | |
[ 2023-10-08 02:12:08 ] Completed Epoch: 13 batch 778: forward pass 101.564 ms, 68.40 s total | |
[ 2023-10-08 02:12:08 ] Completed Epoch: 13 batch 778: backward pass 75.771 ms, 68.48 s total | |
[ 2023-10-08 02:12:08 ] Completed Epoch: 13 batch 778: computing loss 118.249 ms, 68.59 s total | |
EPOCH: [13], BATCH: [778/889], loss: 0.375, loss_box_reg: 0.112, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 778 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 02:25:25 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:25:25 ] Completed importing Timer 0.019 ms, 0.00 s total | |
[ 2023-10-08 02:25:25 ] Completed importing everything else 592.101 ms, 0.59 s total | |
[ 2023-10-08 02:25:25 ] Completed defined other functions 0.032 ms, 0.59 s total | |
| distributed init (rank 5): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 2): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 02:25:28 ] Completed main preliminaries 2,685.988 ms, 3.28 s total | |
loading annotations into memory... | |
Done (t=10.36s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.27s) | |
creating index... | |
index created! | |
[ 2023-10-08 02:25:40 ] Completed loading data 12,072.906 ms, 15.35 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 02:25:40 ] Completed creating data samplers 92.705 ms, 15.44 s total | |
[ 2023-10-08 02:25:40 ] Completed creating data loaders 0.197 ms, 15.44 s total | |
[ 2023-10-08 02:25:41 ] Completed creating model and .to(device) 668.984 ms, 16.11 s total | |
[ 2023-10-08 02:25:43 ] Completed preparing model for distributed training 2,134.598 ms, 18.25 s total | |
[ 2023-10-08 02:25:43 ] Completed optimizer and scaler 0.631 ms, 18.25 s total | |
[ 2023-10-08 02:25:43 ] Completed learning rate schedulers 0.233 ms, 18.25 s total | |
[ 2023-10-08 02:25:44 ] Completed init coco evaluator 942.040 ms, 19.19 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 02:25:45 ] Completed retrieving checkpoint 956.142 ms, 20.15 s total | |
EPOCH :: 13 | |
[ 2023-10-08 02:25:45 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:25:45 ] Completed training preliminaries 1.030 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 778 | |
[ 2023-10-08 02:25:46 ] Completed Epoch: 13 batch 778: moving batch data to device 613.527 ms, 0.61 s total | |
[ 2023-10-08 02:25:46 ] Completed Epoch: 13 batch 778: forward pass 976.563 ms, 1.59 s total | |
[ 2023-10-08 02:25:47 ] Completed Epoch: 13 batch 778: backward pass 179.536 ms, 1.77 s total | |
[ 2023-10-08 02:25:47 ] Completed Epoch: 13 batch 778: computing loss 171.064 ms, 1.94 s total | |
EPOCH: [13], BATCH: [778/889], loss: 0.373, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 778 | |
[ 2023-10-08 02:25:48 ] Completed saving temp checkpoint 1,315.139 ms, 3.26 s total | |
[ 2023-10-08 02:25:48 ] Completed replacing temp checkpoint with checkpoint 153.297 ms, 3.41 s total | |
[ 2023-10-08 02:25:48 ] Completed Epoch: 13 batch 779: moving batch data to device 4.381 ms, 3.41 s total | |
[ 2023-10-08 02:25:49 ] Completed Epoch: 13 batch 779: forward pass 213.054 ms, 3.63 s total | |
[ 2023-10-08 02:25:49 ] Completed Epoch: 13 batch 779: backward pass 106.319 ms, 3.73 s total | |
[ 2023-10-08 02:25:49 ] Completed Epoch: 13 batch 779: computing loss 132.626 ms, 3.87 s total | |
EPOCH: [13], BATCH: [779/889], loss: 0.368, loss_box_reg: 0.109, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 779 | |
[ 2023-10-08 02:25:50 ] Completed saving temp checkpoint 1,561.580 ms, 5.43 s total | |
[ 2023-10-08 02:25:50 ] Completed replacing temp checkpoint with checkpoint 70.778 ms, 5.50 s total | |
[ 2023-10-08 02:25:50 ] Completed Epoch: 13 batch 780: moving batch data to device 3.737 ms, 5.50 s total | |
[ 2023-10-08 02:25:51 ] Completed Epoch: 13 batch 780: forward pass 111.343 ms, 5.61 s total | |
[ 2023-10-08 02:25:51 ] Completed Epoch: 13 batch 780: backward pass 71.289 ms, 5.69 s total | |
[ 2023-10-08 02:25:51 ] Completed Epoch: 13 batch 780: computing loss 144.886 ms, 5.83 s total | |
EPOCH: [13], BATCH: [780/889], loss: 0.423, loss_box_reg: 0.129, loss_classifier: 0.112, loss_mask: 0.135, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 780 | |
[ 2023-10-08 02:25:52 ] Completed saving temp checkpoint 1,590.175 ms, 7.42 s total | |
[ 2023-10-08 02:25:52 ] Completed replacing temp checkpoint with checkpoint 52.320 ms, 7.47 s total | |
[ 2023-10-08 02:25:52 ] Completed Epoch: 13 batch 781: moving batch data to device 3.761 ms, 7.48 s total | |
[ 2023-10-08 02:25:52 ] Completed Epoch: 13 batch 781: forward pass 107.998 ms, 7.58 s total | |
[ 2023-10-08 02:25:53 ] Completed Epoch: 13 batch 781: backward pass 81.426 ms, 7.67 s total | |
[ 2023-10-08 02:25:53 ] Completed Epoch: 13 batch 781: computing loss 134.976 ms, 7.80 s total | |
EPOCH: [13], BATCH: [781/889], loss: 0.384, loss_box_reg: 0.116, loss_classifier: 0.103, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 781 | |
[ 2023-10-08 02:25:54 ] Completed saving temp checkpoint 1,197.258 ms, 9.00 s total | |
[ 2023-10-08 02:25:54 ] Completed replacing temp checkpoint with checkpoint 83.386 ms, 9.08 s total | |
[ 2023-10-08 02:25:54 ] Completed Epoch: 13 batch 782: moving batch data to device 10.285 ms, 9.09 s total | |
[ 2023-10-08 02:25:54 ] Completed Epoch: 13 batch 782: forward pass 109.694 ms, 9.20 s total | |
[ 2023-10-08 02:25:54 ] Completed Epoch: 13 batch 782: backward pass 77.309 ms, 9.28 s total | |
[ 2023-10-08 02:25:54 ] Completed Epoch: 13 batch 782: computing loss 133.509 ms, 9.41 s total | |
EPOCH: [13], BATCH: [782/889], loss: 0.394, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 782 | |
[ 2023-10-08 02:25:55 ] Completed saving temp checkpoint 973.232 ms, 10.39 s total | |
[ 2023-10-08 02:25:55 ] Completed replacing temp checkpoint with checkpoint 49.549 ms, 10.44 s total | |
[ 2023-10-08 02:25:55 ] Completed Epoch: 13 batch 783: moving batch data to device 4.255 ms, 10.44 s total | |
[ 2023-10-08 02:25:56 ] Completed Epoch: 13 batch 783: forward pass 180.644 ms, 10.62 s total | |
[ 2023-10-08 02:25:56 ] Completed Epoch: 13 batch 783: backward pass 78.696 ms, 10.70 s total | |
[ 2023-10-08 02:25:56 ] Completed Epoch: 13 batch 783: computing loss 127.323 ms, 10.83 s total | |
EPOCH: [13], BATCH: [783/889], loss: 0.387, loss_box_reg: 0.117, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 783 | |
[ 2023-10-08 02:25:57 ] Completed saving temp checkpoint 1,153.341 ms, 11.98 s total | |
[ 2023-10-08 02:25:57 ] Completed replacing temp checkpoint with checkpoint 67.041 ms, 12.05 s total | |
[ 2023-10-08 02:25:57 ] Completed Epoch: 13 batch 784: moving batch data to device 2.320 ms, 12.05 s total | |
[ 2023-10-08 02:25:57 ] Completed Epoch: 13 batch 784: forward pass 107.866 ms, 12.16 s total | |
[ 2023-10-08 02:25:57 ] Completed Epoch: 13 batch 784: backward pass 82.200 ms, 12.24 s total | |
[ 2023-10-08 02:25:57 ] Completed Epoch: 13 batch 784: computing loss 130.671 ms, 12.37 s total | |
EPOCH: [13], BATCH: [784/889], loss: 0.386, loss_box_reg: 0.121, loss_classifier: 0.095, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 784 | |
[ 2023-10-08 02:25:58 ] Completed saving temp checkpoint 1,003.810 ms, 13.37 s total | |
[ 2023-10-08 02:25:58 ] Completed replacing temp checkpoint with checkpoint 41.369 ms, 13.41 s total | |
[ 2023-10-08 02:25:58 ] Completed Epoch: 13 batch 785: moving batch data to device 5.031 ms, 13.42 s total | |
[ 2023-10-08 02:25:58 ] Completed Epoch: 13 batch 785: forward pass 161.500 ms, 13.58 s total | |
[ 2023-10-08 02:25:59 ] Completed Epoch: 13 batch 785: backward pass 69.364 ms, 13.65 s total | |
[ 2023-10-08 02:25:59 ] Completed Epoch: 13 batch 785: computing loss 122.374 ms, 13.77 s total | |
EPOCH: [13], BATCH: [785/889], loss: 0.388, loss_box_reg: 0.116, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 785 | |
[ 2023-10-08 02:26:00 ] Completed saving temp checkpoint 1,142.941 ms, 14.92 s total | |
[ 2023-10-08 02:26:00 ] Completed replacing temp checkpoint with checkpoint 78.098 ms, 14.99 s total | |
[ 2023-10-08 02:26:00 ] Completed Epoch: 13 batch 786: moving batch data to device 6.438 ms, 15.00 s total | |
[ 2023-10-08 02:26:00 ] Completed Epoch: 13 batch 786: forward pass 109.226 ms, 15.11 s total | |
[ 2023-10-08 02:26:00 ] Completed Epoch: 13 batch 786: backward pass 92.472 ms, 15.20 s total | |
[ 2023-10-08 02:26:00 ] Completed Epoch: 13 batch 786: computing loss 108.613 ms, 15.31 s total | |
EPOCH: [13], BATCH: [786/889], loss: 0.406, loss_box_reg: 0.132, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 786 | |
[ 2023-10-08 02:26:01 ] Completed saving temp checkpoint 1,018.567 ms, 16.33 s total | |
[ 2023-10-08 02:26:01 ] Completed replacing temp checkpoint with checkpoint 46.951 ms, 16.38 s total | |
[ 2023-10-08 02:26:01 ] Completed Epoch: 13 batch 787: moving batch data to device 8.672 ms, 16.38 s total | |
[ 2023-10-08 02:26:01 ] Completed Epoch: 13 batch 787: forward pass 107.121 ms, 16.49 s total | |
[ 2023-10-08 02:26:01 ] Completed Epoch: 13 batch 787: backward pass 80.870 ms, 16.57 s total | |
[ 2023-10-08 02:26:02 ] Completed Epoch: 13 batch 787: computing loss 116.795 ms, 16.69 s total | |
EPOCH: [13], BATCH: [787/889], loss: 0.380, loss_box_reg: 0.113, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 787 | |
[ 2023-10-08 02:26:03 ] Completed saving temp checkpoint 1,204.313 ms, 17.89 s total | |
[ 2023-10-08 02:26:03 ] Completed replacing temp checkpoint with checkpoint 86.591 ms, 17.98 s total | |
[ 2023-10-08 02:26:03 ] Completed Epoch: 13 batch 788: moving batch data to device 6.830 ms, 17.99 s total | |
[ 2023-10-08 02:26:03 ] Completed Epoch: 13 batch 788: forward pass 109.409 ms, 18.10 s total | |
[ 2023-10-08 02:26:03 ] Completed Epoch: 13 batch 788: backward pass 79.059 ms, 18.18 s total | |
[ 2023-10-08 02:26:03 ] Completed Epoch: 13 batch 788: computing loss 162.441 ms, 18.34 s total | |
EPOCH: [13], BATCH: [788/889], loss: 0.368, loss_box_reg: 0.107, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.018, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 788 | |
[ 2023-10-08 02:26:04 ] Completed saving temp checkpoint 904.875 ms, 19.24 s total | |
[ 2023-10-08 02:26:04 ] Completed replacing temp checkpoint with checkpoint 60.148 ms, 19.30 s total | |
[ 2023-10-08 02:26:04 ] Completed Epoch: 13 batch 789: moving batch data to device 6.771 ms, 19.31 s total | |
[ 2023-10-08 02:26:04 ] Completed Epoch: 13 batch 789: forward pass 104.449 ms, 19.41 s total | |
[ 2023-10-08 02:26:04 ] Completed Epoch: 13 batch 789: backward pass 72.062 ms, 19.49 s total | |
[ 2023-10-08 02:26:05 ] Completed Epoch: 13 batch 789: computing loss 119.655 ms, 19.61 s total | |
EPOCH: [13], BATCH: [789/889], loss: 0.342, loss_box_reg: 0.104, loss_classifier: 0.082, loss_mask: 0.124, loss_objectness: 0.012, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 789 | |
[ 2023-10-08 02:26:06 ] Completed saving temp checkpoint 1,030.929 ms, 20.64 s total | |
[ 2023-10-08 02:26:06 ] Completed replacing temp checkpoint with checkpoint 78.208 ms, 20.72 s total | |
[ 2023-10-08 02:26:06 ] Completed Epoch: 13 batch 790: moving batch data to device 7.274 ms, 20.72 s total | |
[ 2023-10-08 02:26:06 ] Completed Epoch: 13 batch 790: forward pass 109.990 ms, 20.83 s total | |
[ 2023-10-08 02:26:06 ] Completed Epoch: 13 batch 790: backward pass 77.281 ms, 20.91 s total | |
[ 2023-10-08 02:26:06 ] Completed Epoch: 13 batch 790: computing loss 113.462 ms, 21.02 s total | |
EPOCH: [13], BATCH: [790/889], loss: 0.417, loss_box_reg: 0.127, loss_classifier: 0.105, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 790 | |
[ 2023-10-08 02:26:07 ] Completed saving temp checkpoint 1,018.498 ms, 22.04 s total | |
[ 2023-10-08 02:26:07 ] Completed replacing temp checkpoint with checkpoint 75.502 ms, 22.12 s total | |
[ 2023-10-08 02:26:07 ] Completed Epoch: 13 batch 791: moving batch data to device 6.926 ms, 22.12 s total | |
[ 2023-10-08 02:26:07 ] Completed Epoch: 13 batch 791: forward pass 110.271 ms, 22.23 s total | |
[ 2023-10-08 02:26:07 ] Completed Epoch: 13 batch 791: backward pass 75.043 ms, 22.31 s total | |
[ 2023-10-08 02:26:07 ] Completed Epoch: 13 batch 791: computing loss 122.963 ms, 22.43 s total | |
EPOCH: [13], BATCH: [791/889], loss: 0.396, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 791 | |
[ 2023-10-08 02:26:09 ] Completed saving temp checkpoint 1,586.945 ms, 24.02 s total | |
[ 2023-10-08 02:26:09 ] Completed replacing temp checkpoint with checkpoint 90.792 ms, 24.11 s total | |
[ 2023-10-08 02:26:09 ] Completed Epoch: 13 batch 792: moving batch data to device 7.698 ms, 24.12 s total | |
[ 2023-10-08 02:26:09 ] Completed Epoch: 13 batch 792: forward pass 107.002 ms, 24.22 s total | |
[ 2023-10-08 02:26:09 ] Completed Epoch: 13 batch 792: backward pass 79.864 ms, 24.30 s total | |
[ 2023-10-08 02:26:09 ] Completed Epoch: 13 batch 792: computing loss 119.083 ms, 24.42 s total | |
EPOCH: [13], BATCH: [792/889], loss: 0.374, loss_box_reg: 0.115, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 792 | |
[ 2023-10-08 02:26:11 ] Completed saving temp checkpoint 1,402.829 ms, 25.83 s total | |
[ 2023-10-08 02:26:11 ] Completed replacing temp checkpoint with checkpoint 60.979 ms, 25.89 s total | |
[ 2023-10-08 02:26:11 ] Completed Epoch: 13 batch 793: moving batch data to device 4.929 ms, 25.89 s total | |
[ 2023-10-08 02:26:11 ] Completed Epoch: 13 batch 793: forward pass 109.623 ms, 26.00 s total | |
[ 2023-10-08 02:26:11 ] Completed Epoch: 13 batch 793: backward pass 69.783 ms, 26.07 s total | |
[ 2023-10-08 02:26:11 ] Completed Epoch: 13 batch 793: computing loss 128.275 ms, 26.20 s total | |
EPOCH: [13], BATCH: [793/889], loss: 0.392, loss_box_reg: 0.120, loss_classifier: 0.100, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 793 | |
[ 2023-10-08 02:26:13 ] Completed saving temp checkpoint 1,805.598 ms, 28.01 s total | |
[ 2023-10-08 02:26:13 ] Completed replacing temp checkpoint with checkpoint 72.384 ms, 28.08 s total | |
[ 2023-10-08 02:26:13 ] Completed Epoch: 13 batch 794: moving batch data to device 4.696 ms, 28.08 s total | |
[ 2023-10-08 02:26:13 ] Completed Epoch: 13 batch 794: forward pass 105.159 ms, 28.19 s total | |
[ 2023-10-08 02:26:13 ] Completed Epoch: 13 batch 794: backward pass 69.920 ms, 28.26 s total | |
[ 2023-10-08 02:26:13 ] Completed Epoch: 13 batch 794: computing loss 130.483 ms, 28.39 s total | |
EPOCH: [13], BATCH: [794/889], loss: 0.388, loss_box_reg: 0.115, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 794 | |
[ 2023-10-08 02:26:14 ] Completed saving temp checkpoint 1,061.192 ms, 29.45 s total | |
[ 2023-10-08 02:26:14 ] Completed replacing temp checkpoint with checkpoint 46.991 ms, 29.50 s total | |
[ 2023-10-08 02:26:14 ] Completed Epoch: 13 batch 795: moving batch data to device 5.901 ms, 29.50 s total | |
[ 2023-10-08 02:26:15 ] Completed Epoch: 13 batch 795: forward pass 106.036 ms, 29.61 s total | |
[ 2023-10-08 02:26:15 ] Completed Epoch: 13 batch 795: backward pass 87.228 ms, 29.70 s total | |
[ 2023-10-08 02:26:15 ] Completed Epoch: 13 batch 795: computing loss 106.971 ms, 29.80 s total | |
EPOCH: [13], BATCH: [795/889], loss: 0.406, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.137, loss_objectness: 0.017, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 795 | |
[ 2023-10-08 02:26:16 ] Completed saving temp checkpoint 1,217.055 ms, 31.02 s total | |
[ 2023-10-08 02:26:16 ] Completed replacing temp checkpoint with checkpoint 61.465 ms, 31.08 s total | |
[ 2023-10-08 02:26:16 ] Completed Epoch: 13 batch 796: moving batch data to device 5.154 ms, 31.09 s total | |
[ 2023-10-08 02:26:16 ] Completed Epoch: 13 batch 796: forward pass 107.304 ms, 31.19 s total | |
[ 2023-10-08 02:26:16 ] Completed Epoch: 13 batch 796: backward pass 47.777 ms, 31.24 s total | |
[ 2023-10-08 02:26:16 ] Completed Epoch: 13 batch 796: computing loss 144.752 ms, 31.39 s total | |
EPOCH: [13], BATCH: [796/889], loss: 0.382, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.016, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 796 | |
[ 2023-10-08 02:26:17 ] Completed saving temp checkpoint 1,040.598 ms, 32.43 s total | |
[ 2023-10-08 02:26:17 ] Completed replacing temp checkpoint with checkpoint 50.519 ms, 32.48 s total | |
[ 2023-10-08 02:26:17 ] Completed Epoch: 13 batch 797: moving batch data to device 6.844 ms, 32.48 s total | |
[ 2023-10-08 02:26:17 ] Completed Epoch: 13 batch 797: forward pass 104.774 ms, 32.59 s total | |
[ 2023-10-08 02:26:18 ] Completed Epoch: 13 batch 797: backward pass 54.265 ms, 32.64 s total | |
[ 2023-10-08 02:26:18 ] Completed Epoch: 13 batch 797: computing loss 144.447 ms, 32.79 s total | |
EPOCH: [13], BATCH: [797/889], loss: 0.371, loss_box_reg: 0.110, loss_classifier: 0.094, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 797 | |
[ 2023-10-08 02:26:19 ] Completed saving temp checkpoint 1,207.208 ms, 34.00 s total | |
[ 2023-10-08 02:26:19 ] Completed replacing temp checkpoint with checkpoint 66.175 ms, 34.06 s total | |
[ 2023-10-08 02:26:19 ] Completed Epoch: 13 batch 798: moving batch data to device 6.521 ms, 34.07 s total | |
[ 2023-10-08 02:26:19 ] Completed Epoch: 13 batch 798: forward pass 107.656 ms, 34.18 s total | |
[ 2023-10-08 02:26:19 ] Completed Epoch: 13 batch 798: backward pass 56.871 ms, 34.23 s total | |
[ 2023-10-08 02:26:19 ] Completed Epoch: 13 batch 798: computing loss 130.064 ms, 34.36 s total | |
EPOCH: [13], BATCH: [798/889], loss: 0.402, loss_box_reg: 0.120, loss_classifier: 0.096, loss_mask: 0.139, loss_objectness: 0.016, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 798 | |
[ 2023-10-08 02:26:20 ] Completed saving temp checkpoint 1,004.668 ms, 35.37 s total | |
[ 2023-10-08 02:26:20 ] Completed replacing temp checkpoint with checkpoint 34.458 ms, 35.40 s total | |
[ 2023-10-08 02:26:20 ] Completed Epoch: 13 batch 799: moving batch data to device 5.552 ms, 35.41 s total | |
[ 2023-10-08 02:26:20 ] Completed Epoch: 13 batch 799: forward pass 106.868 ms, 35.51 s total | |
[ 2023-10-08 02:26:20 ] Completed Epoch: 13 batch 799: backward pass 45.985 ms, 35.56 s total | |
[ 2023-10-08 02:26:21 ] Completed Epoch: 13 batch 799: computing loss 145.736 ms, 35.71 s total | |
EPOCH: [13], BATCH: [799/889], loss: 0.420, loss_box_reg: 0.130, loss_classifier: 0.104, loss_mask: 0.138, loss_objectness: 0.020, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 799 | |
[ 2023-10-08 02:26:22 ] Completed saving temp checkpoint 1,138.065 ms, 36.84 s total | |
[ 2023-10-08 02:26:22 ] Completed replacing temp checkpoint with checkpoint 60.114 ms, 36.90 s total | |
[ 2023-10-08 02:26:22 ] Completed Epoch: 13 batch 800: moving batch data to device 4.718 ms, 36.91 s total | |
[ 2023-10-08 02:26:22 ] Completed Epoch: 13 batch 800: forward pass 103.390 ms, 37.01 s total | |
[ 2023-10-08 02:26:22 ] Completed Epoch: 13 batch 800: backward pass 43.151 ms, 37.06 s total | |
[ 2023-10-08 02:26:22 ] Completed Epoch: 13 batch 800: computing loss 125.583 ms, 37.18 s total | |
EPOCH: [13], BATCH: [800/889], loss: 0.372, loss_box_reg: 0.110, loss_classifier: 0.091, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 800 | |
[ 2023-10-08 02:26:23 ] Completed saving temp checkpoint 1,068.080 ms, 38.25 s total | |
[ 2023-10-08 02:26:23 ] Completed replacing temp checkpoint with checkpoint 72.516 ms, 38.32 s total | |
[ 2023-10-08 02:26:23 ] Completed Epoch: 13 batch 801: moving batch data to device 8.333 ms, 38.33 s total | |
[ 2023-10-08 02:26:23 ] Completed Epoch: 13 batch 801: forward pass 101.339 ms, 38.43 s total | |
[ 2023-10-08 02:26:23 ] Completed Epoch: 13 batch 801: backward pass 74.927 ms, 38.51 s total | |
[ 2023-10-08 02:26:24 ] Completed Epoch: 13 batch 801: computing loss 111.953 ms, 38.62 s total | |
EPOCH: [13], BATCH: [801/889], loss: 0.385, loss_box_reg: 0.115, loss_classifier: 0.098, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 801 | |
[ 2023-10-08 02:26:25 ] Completed saving temp checkpoint 1,529.212 ms, 40.15 s total | |
[ 2023-10-08 02:26:25 ] Completed replacing temp checkpoint with checkpoint 75.241 ms, 40.22 s total | |
[ 2023-10-08 02:26:25 ] Completed Epoch: 13 batch 802: moving batch data to device 8.201 ms, 40.23 s total | |
[ 2023-10-08 02:26:25 ] Completed Epoch: 13 batch 802: forward pass 105.243 ms, 40.34 s total | |
[ 2023-10-08 02:26:25 ] Completed Epoch: 13 batch 802: backward pass 82.027 ms, 40.42 s total | |
[ 2023-10-08 02:26:25 ] Completed Epoch: 13 batch 802: computing loss 89.041 ms, 40.51 s total | |
EPOCH: [13], BATCH: [802/889], loss: 0.393, loss_box_reg: 0.122, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 802 | |
[ 2023-10-08 02:26:27 ] Completed saving temp checkpoint 1,418.284 ms, 41.93 s total | |
[ 2023-10-08 02:26:27 ] Completed replacing temp checkpoint with checkpoint 83.941 ms, 42.01 s total | |
[ 2023-10-08 02:26:27 ] Completed Epoch: 13 batch 803: moving batch data to device 8.324 ms, 42.02 s total | |
[ 2023-10-08 02:26:27 ] Completed Epoch: 13 batch 803: forward pass 107.593 ms, 42.12 s total | |
[ 2023-10-08 02:26:27 ] Completed Epoch: 13 batch 803: backward pass 68.038 ms, 42.19 s total | |
[ 2023-10-08 02:26:27 ] Completed Epoch: 13 batch 803: computing loss 129.408 ms, 42.32 s total | |
EPOCH: [13], BATCH: [803/889], loss: 0.339, loss_box_reg: 0.101, loss_classifier: 0.084, loss_mask: 0.115, loss_objectness: 0.013, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 803 | |
[ 2023-10-08 02:26:29 ] Completed saving temp checkpoint 1,907.435 ms, 44.23 s total | |
[ 2023-10-08 02:26:29 ] Completed replacing temp checkpoint with checkpoint 99.468 ms, 44.33 s total | |
[ 2023-10-08 02:26:29 ] Completed Epoch: 13 batch 804: moving batch data to device 7.710 ms, 44.34 s total | |
[ 2023-10-08 02:26:29 ] Completed Epoch: 13 batch 804: forward pass 103.227 ms, 44.44 s total | |
[ 2023-10-08 02:26:29 ] Completed Epoch: 13 batch 804: backward pass 37.987 ms, 44.48 s total | |
[ 2023-10-08 02:26:30 ] Completed Epoch: 13 batch 804: computing loss 155.717 ms, 44.63 s total | |
EPOCH: [13], BATCH: [804/889], loss: 0.354, loss_box_reg: 0.106, loss_classifier: 0.092, loss_mask: 0.122, loss_objectness: 0.012, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 804 | |
[ 2023-10-08 02:26:31 ] Completed saving temp checkpoint 1,479.754 ms, 46.11 s total | |
[ 2023-10-08 02:26:31 ] Completed replacing temp checkpoint with checkpoint 97.524 ms, 46.21 s total | |
[ 2023-10-08 02:26:31 ] Completed Epoch: 13 batch 805: moving batch data to device 7.410 ms, 46.22 s total | |
[ 2023-10-08 02:26:31 ] Completed Epoch: 13 batch 805: forward pass 109.655 ms, 46.33 s total | |
[ 2023-10-08 02:26:31 ] Completed Epoch: 13 batch 805: backward pass 73.832 ms, 46.40 s total | |
[ 2023-10-08 02:26:31 ] Completed Epoch: 13 batch 805: computing loss 123.169 ms, 46.53 s total | |
EPOCH: [13], BATCH: [805/889], loss: 0.366, loss_box_reg: 0.108, loss_classifier: 0.089, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 805 | |
[ 2023-10-08 02:26:33 ] Completed saving temp checkpoint 1,479.033 ms, 48.00 s total | |
[ 2023-10-08 02:26:33 ] Completed replacing temp checkpoint with checkpoint 89.870 ms, 48.09 s total | |
[ 2023-10-08 02:26:33 ] Completed Epoch: 13 batch 806: moving batch data to device 6.765 ms, 48.10 s total | |
[ 2023-10-08 02:26:33 ] Completed Epoch: 13 batch 806: forward pass 106.857 ms, 48.21 s total | |
[ 2023-10-08 02:26:33 ] Completed Epoch: 13 batch 806: backward pass 66.236 ms, 48.27 s total | |
[ 2023-10-08 02:26:33 ] Completed Epoch: 13 batch 806: computing loss 133.700 ms, 48.41 s total | |
EPOCH: [13], BATCH: [806/889], loss: 0.388, loss_box_reg: 0.119, loss_classifier: 0.094, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 806 | |
[ 2023-10-08 02:26:34 ] Completed saving temp checkpoint 943.396 ms, 49.35 s total | |
[ 2023-10-08 02:26:34 ] Completed replacing temp checkpoint with checkpoint 46.622 ms, 49.40 s total | |
[ 2023-10-08 02:26:34 ] Completed Epoch: 13 batch 807: moving batch data to device 8.149 ms, 49.41 s total | |
[ 2023-10-08 02:26:34 ] Completed Epoch: 13 batch 807: forward pass 106.590 ms, 49.51 s total | |
[ 2023-10-08 02:26:34 ] Completed Epoch: 13 batch 807: backward pass 79.566 ms, 49.59 s total | |
[ 2023-10-08 02:26:35 ] Completed Epoch: 13 batch 807: computing loss 108.155 ms, 49.70 s total | |
EPOCH: [13], BATCH: [807/889], loss: 0.399, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.128, loss_objectness: 0.019, loss_rpn_box_reg: 0.036 | |
Saving checkpoint at epoch 13 train batch 807 | |
[ 2023-10-08 02:26:36 ] Completed saving temp checkpoint 1,050.638 ms, 50.75 s total | |
[ 2023-10-08 02:26:36 ] Completed replacing temp checkpoint with checkpoint 82.395 ms, 50.83 s total | |
[ 2023-10-08 02:26:36 ] Completed Epoch: 13 batch 808: moving batch data to device 7.323 ms, 50.84 s total | |
[ 2023-10-08 02:26:36 ] Completed Epoch: 13 batch 808: forward pass 104.392 ms, 50.94 s total | |
[ 2023-10-08 02:26:36 ] Completed Epoch: 13 batch 808: backward pass 66.547 ms, 51.01 s total | |
[ 2023-10-08 02:26:36 ] Completed Epoch: 13 batch 808: computing loss 119.460 ms, 51.13 s total | |
EPOCH: [13], BATCH: [808/889], loss: 0.403, loss_box_reg: 0.129, loss_classifier: 0.095, loss_mask: 0.139, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 808 | |
[ 2023-10-08 02:26:37 ] Completed saving temp checkpoint 972.568 ms, 52.10 s total | |
[ 2023-10-08 02:26:37 ] Completed replacing temp checkpoint with checkpoint 50.536 ms, 52.15 s total | |
[ 2023-10-08 02:26:37 ] Completed Epoch: 13 batch 809: moving batch data to device 4.591 ms, 52.16 s total | |
[ 2023-10-08 02:26:37 ] Completed Epoch: 13 batch 809: forward pass 132.686 ms, 52.29 s total | |
[ 2023-10-08 02:26:37 ] Completed Epoch: 13 batch 809: backward pass 71.356 ms, 52.36 s total | |
[ 2023-10-08 02:26:37 ] Completed Epoch: 13 batch 809: computing loss 110.183 ms, 52.47 s total | |
EPOCH: [13], BATCH: [809/889], loss: 0.393, loss_box_reg: 0.123, loss_classifier: 0.095, loss_mask: 0.127, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 809 | |
[ 2023-10-08 02:26:38 ] Completed saving temp checkpoint 1,103.840 ms, 53.58 s total | |
[ 2023-10-08 02:26:39 ] Completed replacing temp checkpoint with checkpoint 60.823 ms, 53.64 s total | |
[ 2023-10-08 02:26:39 ] Completed Epoch: 13 batch 810: moving batch data to device 5.635 ms, 53.64 s total | |
[ 2023-10-08 02:26:39 ] Completed Epoch: 13 batch 810: forward pass 105.186 ms, 53.75 s total | |
[ 2023-10-08 02:26:39 ] Completed Epoch: 13 batch 810: backward pass 37.261 ms, 53.79 s total | |
[ 2023-10-08 02:26:39 ] Completed Epoch: 13 batch 810: computing loss 151.887 ms, 53.94 s total | |
EPOCH: [13], BATCH: [810/889], loss: 0.410, loss_box_reg: 0.126, loss_classifier: 0.105, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 810 | |
[ 2023-10-08 02:26:40 ] Completed saving temp checkpoint 1,016.028 ms, 54.95 s total | |
[ 2023-10-08 02:26:40 ] Completed replacing temp checkpoint with checkpoint 69.132 ms, 55.02 s total | |
[ 2023-10-08 02:26:40 ] Completed Epoch: 13 batch 811: moving batch data to device 5.990 ms, 55.03 s total | |
[ 2023-10-08 02:26:40 ] Completed Epoch: 13 batch 811: forward pass 101.741 ms, 55.13 s total | |
[ 2023-10-08 02:26:40 ] Completed Epoch: 13 batch 811: backward pass 69.973 ms, 55.20 s total | |
[ 2023-10-08 02:26:40 ] Completed Epoch: 13 batch 811: computing loss 129.029 ms, 55.33 s total | |
EPOCH: [13], BATCH: [811/889], loss: 0.399, loss_box_reg: 0.121, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.020, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 811 | |
[ 2023-10-08 02:26:41 ] Completed saving temp checkpoint 1,128.455 ms, 56.46 s total | |
[ 2023-10-08 02:26:41 ] Completed replacing temp checkpoint with checkpoint 32.398 ms, 56.49 s total | |
[ 2023-10-08 02:26:41 ] Completed Epoch: 13 batch 812: moving batch data to device 7.400 ms, 56.50 s total | |
[ 2023-10-08 02:26:42 ] Completed Epoch: 13 batch 812: forward pass 107.533 ms, 56.61 s total | |
[ 2023-10-08 02:26:42 ] Completed Epoch: 13 batch 812: backward pass 75.779 ms, 56.68 s total | |
[ 2023-10-08 02:26:42 ] Completed Epoch: 13 batch 812: computing loss 120.474 ms, 56.80 s total | |
EPOCH: [13], BATCH: [812/889], loss: 0.382, loss_box_reg: 0.116, loss_classifier: 0.096, loss_mask: 0.132, loss_objectness: 0.013, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 812 | |
[ 2023-10-08 02:26:43 ] Completed saving temp checkpoint 1,009.791 ms, 57.81 s total | |
[ 2023-10-08 02:26:43 ] Completed replacing temp checkpoint with checkpoint 49.024 ms, 57.86 s total | |
[ 2023-10-08 02:26:43 ] Completed Epoch: 13 batch 813: moving batch data to device 3.616 ms, 57.86 s total | |
[ 2023-10-08 02:26:43 ] Completed Epoch: 13 batch 813: forward pass 100.764 ms, 57.96 s total | |
[ 2023-10-08 02:26:43 ] Completed Epoch: 13 batch 813: backward pass 79.968 ms, 58.04 s total | |
[ 2023-10-08 02:26:43 ] Completed Epoch: 13 batch 813: computing loss 110.746 ms, 58.16 s total | |
EPOCH: [13], BATCH: [813/889], loss: 0.413, loss_box_reg: 0.124, loss_classifier: 0.104, loss_mask: 0.143, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 813 | |
[ 2023-10-08 02:26:44 ] Completed saving temp checkpoint 1,151.599 ms, 59.31 s total | |
[ 2023-10-08 02:26:44 ] Completed replacing temp checkpoint with checkpoint 82.688 ms, 59.39 s total | |
[ 2023-10-08 02:26:44 ] Completed Epoch: 13 batch 814: moving batch data to device 8.034 ms, 59.40 s total | |
[ 2023-10-08 02:26:44 ] Completed Epoch: 13 batch 814: forward pass 104.091 ms, 59.50 s total | |
[ 2023-10-08 02:26:44 ] Completed Epoch: 13 batch 814: backward pass 38.700 ms, 59.54 s total | |
[ 2023-10-08 02:26:45 ] Completed Epoch: 13 batch 814: computing loss 157.299 ms, 59.70 s total | |
EPOCH: [13], BATCH: [814/889], loss: 0.393, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 814 | |
[ 2023-10-08 02:26:46 ] Completed saving temp checkpoint 1,260.269 ms, 60.96 s total | |
[ 2023-10-08 02:26:46 ] Completed replacing temp checkpoint with checkpoint 52.901 ms, 61.01 s total | |
[ 2023-10-08 02:26:46 ] Completed Epoch: 13 batch 815: moving batch data to device 5.539 ms, 61.02 s total | |
[ 2023-10-08 02:26:46 ] Completed Epoch: 13 batch 815: forward pass 109.539 ms, 61.13 s total | |
[ 2023-10-08 02:26:46 ] Completed Epoch: 13 batch 815: backward pass 77.919 ms, 61.20 s total | |
[ 2023-10-08 02:26:46 ] Completed Epoch: 13 batch 815: computing loss 114.431 ms, 61.32 s total | |
EPOCH: [13], BATCH: [815/889], loss: 0.368, loss_box_reg: 0.112, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 815 | |
[ 2023-10-08 02:26:48 ] Completed saving temp checkpoint 1,983.797 ms, 63.30 s total | |
[ 2023-10-08 02:26:48 ] Completed replacing temp checkpoint with checkpoint 82.546 ms, 63.38 s total | |
[ 2023-10-08 02:26:48 ] Completed Epoch: 13 batch 816: moving batch data to device 7.523 ms, 63.39 s total | |
[ 2023-10-08 02:26:48 ] Completed Epoch: 13 batch 816: forward pass 104.256 ms, 63.50 s total | |
[ 2023-10-08 02:26:48 ] Completed Epoch: 13 batch 816: backward pass 66.891 ms, 63.56 s total | |
[ 2023-10-08 02:26:49 ] Completed Epoch: 13 batch 816: computing loss 120.948 ms, 63.68 s total | |
EPOCH: [13], BATCH: [816/889], loss: 0.405, loss_box_reg: 0.120, loss_classifier: 0.108, loss_mask: 0.131, loss_objectness: 0.019, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 816 | |
[ 2023-10-08 02:26:50 ] Completed saving temp checkpoint 1,547.277 ms, 65.23 s total | |
[ 2023-10-08 02:26:50 ] Completed replacing temp checkpoint with checkpoint 39.171 ms, 65.27 s total | |
[ 2023-10-08 02:26:50 ] Completed Epoch: 13 batch 817: moving batch data to device 5.060 ms, 65.28 s total | |
[ 2023-10-08 02:26:50 ] Completed Epoch: 13 batch 817: forward pass 107.186 ms, 65.38 s total | |
[ 2023-10-08 02:26:50 ] Completed Epoch: 13 batch 817: backward pass 82.853 ms, 65.47 s total | |
[ 2023-10-08 02:26:50 ] Completed Epoch: 13 batch 817: computing loss 111.663 ms, 65.58 s total | |
EPOCH: [13], BATCH: [817/889], loss: 0.382, loss_box_reg: 0.113, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 817 | |
[ 2023-10-08 02:26:52 ] Completed saving temp checkpoint 1,914.849 ms, 67.49 s total | |
[ 2023-10-08 02:26:52 ] Completed replacing temp checkpoint with checkpoint 95.059 ms, 67.59 s total | |
[ 2023-10-08 02:26:52 ] Completed Epoch: 13 batch 818: moving batch data to device 7.368 ms, 67.59 s total | |
[ 2023-10-08 02:26:53 ] Completed Epoch: 13 batch 818: forward pass 106.965 ms, 67.70 s total | |
[ 2023-10-08 02:26:53 ] Completed Epoch: 13 batch 818: backward pass 71.189 ms, 67.77 s total | |
[ 2023-10-08 02:26:53 ] Completed Epoch: 13 batch 818: computing loss 96.764 ms, 67.87 s total | |
EPOCH: [13], BATCH: [818/889], loss: 0.337, loss_box_reg: 0.095, loss_classifier: 0.084, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 818 | |
[ 2023-10-08 02:26:54 ] Completed saving temp checkpoint 1,034.467 ms, 68.90 s total | |
[ 2023-10-08 02:26:54 ] Completed replacing temp checkpoint with checkpoint 63.347 ms, 68.97 s total | |
[ 2023-10-08 02:26:54 ] Completed Epoch: 13 batch 819: moving batch data to device 5.018 ms, 68.97 s total | |
[ 2023-10-08 02:26:54 ] Completed Epoch: 13 batch 819: forward pass 101.666 ms, 69.07 s total | |
[ 2023-10-08 02:26:54 ] Completed Epoch: 13 batch 819: backward pass 48.865 ms, 69.12 s total | |
[ 2023-10-08 02:26:54 ] Completed Epoch: 13 batch 819: computing loss 136.163 ms, 69.26 s total | |
EPOCH: [13], BATCH: [819/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.103, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 819 | |
[ 2023-10-08 02:26:55 ] Completed saving temp checkpoint 1,111.956 ms, 70.37 s total | |
[ 2023-10-08 02:26:55 ] Completed replacing temp checkpoint with checkpoint 49.617 ms, 70.42 s total | |
[ 2023-10-08 02:26:55 ] Completed Epoch: 13 batch 820: moving batch data to device 6.188 ms, 70.43 s total | |
[ 2023-10-08 02:26:55 ] Completed Epoch: 13 batch 820: forward pass 103.665 ms, 70.53 s total | |
[ 2023-10-08 02:26:55 ] Completed Epoch: 13 batch 820: backward pass 43.804 ms, 70.57 s total | |
[ 2023-10-08 02:26:56 ] Completed Epoch: 13 batch 820: computing loss 143.786 ms, 70.72 s total | |
EPOCH: [13], BATCH: [820/889], loss: 0.364, loss_box_reg: 0.109, loss_classifier: 0.087, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 820 | |
[ 2023-10-08 02:26:57 ] Completed saving temp checkpoint 1,010.743 ms, 71.73 s total | |
[ 2023-10-08 02:26:57 ] Completed replacing temp checkpoint with checkpoint 71.493 ms, 71.80 s total | |
[ 2023-10-08 02:26:57 ] Completed Epoch: 13 batch 821: moving batch data to device 7.793 ms, 71.81 s total | |
[ 2023-10-08 02:26:57 ] Completed Epoch: 13 batch 821: forward pass 101.827 ms, 71.91 s total | |
[ 2023-10-08 02:26:57 ] Completed Epoch: 13 batch 821: backward pass 79.941 ms, 71.99 s total | |
[ 2023-10-08 02:26:57 ] Completed Epoch: 13 batch 821: computing loss 87.822 ms, 72.08 s total | |
EPOCH: [13], BATCH: [821/889], loss: 0.406, loss_box_reg: 0.122, loss_classifier: 0.104, loss_mask: 0.138, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 821 | |
[ 2023-10-08 02:26:58 ] Completed saving temp checkpoint 1,510.073 ms, 73.59 s total | |
[ 2023-10-08 02:26:59 ] Completed replacing temp checkpoint with checkpoint 61.659 ms, 73.65 s total | |
[ 2023-10-08 02:26:59 ] Completed Epoch: 13 batch 822: moving batch data to device 5.408 ms, 73.65 s total | |
[ 2023-10-08 02:26:59 ] Completed Epoch: 13 batch 822: forward pass 109.814 ms, 73.76 s total | |
[ 2023-10-08 02:26:59 ] Completed Epoch: 13 batch 822: backward pass 72.975 ms, 73.84 s total | |
[ 2023-10-08 02:26:59 ] Completed Epoch: 13 batch 822: computing loss 117.050 ms, 73.95 s total | |
EPOCH: [13], BATCH: [822/889], loss: 0.388, loss_box_reg: 0.118, loss_classifier: 0.105, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 822 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 02:40:24 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:40:24 ] Completed importing Timer 0.025 ms, 0.00 s total | |
[ 2023-10-08 02:40:25 ] Completed importing everything else 528.271 ms, 0.53 s total | |
[ 2023-10-08 02:40:25 ] Completed defined other functions 0.025 ms, 0.53 s total | |
| distributed init (rank 5): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 1): env:// | |
| distributed init (rank 0): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 02:40:27 ] Completed main preliminaries 2,791.290 ms, 3.32 s total | |
loading annotations into memory... | |
Done (t=11.34s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.28s) | |
creating index... | |
index created! | |
[ 2023-10-08 02:40:41 ] Completed loading data 13,291.573 ms, 16.61 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 02:40:41 ] Completed creating data samplers 106.787 ms, 16.72 s total | |
[ 2023-10-08 02:40:41 ] Completed creating data loaders 0.260 ms, 16.72 s total | |
[ 2023-10-08 02:40:41 ] Completed creating model and .to(device) 648.079 ms, 17.37 s total | |
[ 2023-10-08 02:40:42 ] Completed preparing model for distributed training 746.923 ms, 18.11 s total | |
[ 2023-10-08 02:40:42 ] Completed optimizer and scaler 0.638 ms, 18.11 s total | |
[ 2023-10-08 02:40:42 ] Completed learning rate schedulers 0.239 ms, 18.11 s total | |
[ 2023-10-08 02:40:43 ] Completed init coco evaluator 971.612 ms, 19.09 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 02:40:44 ] Completed retrieving checkpoint 862.084 ms, 19.95 s total | |
EPOCH :: 13 | |
[ 2023-10-08 02:40:44 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:40:44 ] Completed training preliminaries 0.903 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 822 | |
[ 2023-10-08 02:40:45 ] Completed Epoch: 13 batch 822: moving batch data to device 545.213 ms, 0.55 s total | |
[ 2023-10-08 02:40:46 ] Completed Epoch: 13 batch 822: forward pass 1,098.499 ms, 1.64 s total | |
[ 2023-10-08 02:40:46 ] Completed Epoch: 13 batch 822: backward pass 175.768 ms, 1.82 s total | |
[ 2023-10-08 02:40:46 ] Completed Epoch: 13 batch 822: computing loss 341.749 ms, 2.16 s total | |
EPOCH: [13], BATCH: [822/889], loss: 0.387, loss_box_reg: 0.118, loss_classifier: 0.104, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 822 | |
[ 2023-10-08 02:40:47 ] Completed saving temp checkpoint 985.608 ms, 3.15 s total | |
[ 2023-10-08 02:40:47 ] Completed replacing temp checkpoint with checkpoint 164.616 ms, 3.31 s total | |
[ 2023-10-08 02:40:47 ] Completed Epoch: 13 batch 823: moving batch data to device 3.042 ms, 3.32 s total | |
[ 2023-10-08 02:40:47 ] Completed Epoch: 13 batch 823: forward pass 110.312 ms, 3.43 s total | |
[ 2023-10-08 02:40:47 ] Completed Epoch: 13 batch 823: backward pass 71.621 ms, 3.50 s total | |
[ 2023-10-08 02:40:48 ] Completed Epoch: 13 batch 823: computing loss 150.115 ms, 3.65 s total | |
EPOCH: [13], BATCH: [823/889], loss: 0.383, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 823 | |
[ 2023-10-08 02:40:49 ] Completed saving temp checkpoint 891.374 ms, 4.54 s total | |
[ 2023-10-08 02:40:49 ] Completed replacing temp checkpoint with checkpoint 67.526 ms, 4.61 s total | |
[ 2023-10-08 02:40:49 ] Completed Epoch: 13 batch 824: moving batch data to device 3.458 ms, 4.61 s total | |
[ 2023-10-08 02:40:49 ] Completed Epoch: 13 batch 824: forward pass 113.747 ms, 4.72 s total | |
[ 2023-10-08 02:40:49 ] Completed Epoch: 13 batch 824: backward pass 115.117 ms, 4.84 s total | |
[ 2023-10-08 02:40:49 ] Completed Epoch: 13 batch 824: computing loss 103.215 ms, 4.94 s total | |
EPOCH: [13], BATCH: [824/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.106, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 824 | |
[ 2023-10-08 02:40:50 ] Completed saving temp checkpoint 984.908 ms, 5.93 s total | |
[ 2023-10-08 02:40:50 ] Completed replacing temp checkpoint with checkpoint 80.284 ms, 6.01 s total | |
[ 2023-10-08 02:40:50 ] Completed Epoch: 13 batch 825: moving batch data to device 5.979 ms, 6.01 s total | |
[ 2023-10-08 02:40:50 ] Completed Epoch: 13 batch 825: forward pass 112.484 ms, 6.13 s total | |
[ 2023-10-08 02:40:50 ] Completed Epoch: 13 batch 825: backward pass 66.146 ms, 6.19 s total | |
[ 2023-10-08 02:40:50 ] Completed Epoch: 13 batch 825: computing loss 148.216 ms, 6.34 s total | |
EPOCH: [13], BATCH: [825/889], loss: 0.386, loss_box_reg: 0.117, loss_classifier: 0.099, loss_mask: 0.125, loss_objectness: 0.017, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 825 | |
[ 2023-10-08 02:40:51 ] Completed saving temp checkpoint 755.174 ms, 7.10 s total | |
[ 2023-10-08 02:40:51 ] Completed replacing temp checkpoint with checkpoint 45.923 ms, 7.14 s total | |
[ 2023-10-08 02:40:51 ] Completed Epoch: 13 batch 826: moving batch data to device 14.290 ms, 7.16 s total | |
[ 2023-10-08 02:40:51 ] Completed Epoch: 13 batch 826: forward pass 104.859 ms, 7.26 s total | |
[ 2023-10-08 02:40:51 ] Completed Epoch: 13 batch 826: backward pass 93.229 ms, 7.35 s total | |
[ 2023-10-08 02:40:51 ] Completed Epoch: 13 batch 826: computing loss 121.982 ms, 7.48 s total | |
EPOCH: [13], BATCH: [826/889], loss: 0.379, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 826 | |
[ 2023-10-08 02:40:52 ] Completed saving temp checkpoint 755.437 ms, 8.23 s total | |
[ 2023-10-08 02:40:52 ] Completed replacing temp checkpoint with checkpoint 54.361 ms, 8.29 s total | |
[ 2023-10-08 02:40:52 ] Completed Epoch: 13 batch 827: moving batch data to device 3.854 ms, 8.29 s total | |
[ 2023-10-08 02:40:52 ] Completed Epoch: 13 batch 827: forward pass 109.785 ms, 8.40 s total | |
[ 2023-10-08 02:40:52 ] Completed Epoch: 13 batch 827: backward pass 70.907 ms, 8.47 s total | |
[ 2023-10-08 02:40:53 ] Completed Epoch: 13 batch 827: computing loss 142.617 ms, 8.61 s total | |
EPOCH: [13], BATCH: [827/889], loss: 0.428, loss_box_reg: 0.133, loss_classifier: 0.109, loss_mask: 0.141, loss_objectness: 0.018, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 827 | |
[ 2023-10-08 02:40:53 ] Completed saving temp checkpoint 773.102 ms, 9.39 s total | |
[ 2023-10-08 02:40:53 ] Completed replacing temp checkpoint with checkpoint 54.793 ms, 9.44 s total | |
[ 2023-10-08 02:40:53 ] Completed Epoch: 13 batch 828: moving batch data to device 4.061 ms, 9.44 s total | |
[ 2023-10-08 02:40:54 ] Completed Epoch: 13 batch 828: forward pass 106.652 ms, 9.55 s total | |
[ 2023-10-08 02:40:54 ] Completed Epoch: 13 batch 828: backward pass 69.128 ms, 9.62 s total | |
[ 2023-10-08 02:40:54 ] Completed Epoch: 13 batch 828: computing loss 140.697 ms, 9.76 s total | |
EPOCH: [13], BATCH: [828/889], loss: 0.408, loss_box_reg: 0.125, loss_classifier: 0.099, loss_mask: 0.135, loss_objectness: 0.018, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 828 | |
[ 2023-10-08 02:40:55 ] Completed saving temp checkpoint 1,215.317 ms, 10.98 s total | |
[ 2023-10-08 02:40:55 ] Completed replacing temp checkpoint with checkpoint 71.579 ms, 11.05 s total | |
[ 2023-10-08 02:40:55 ] Completed Epoch: 13 batch 829: moving batch data to device 6.765 ms, 11.05 s total | |
[ 2023-10-08 02:40:55 ] Completed Epoch: 13 batch 829: forward pass 105.732 ms, 11.16 s total | |
[ 2023-10-08 02:40:55 ] Completed Epoch: 13 batch 829: backward pass 75.999 ms, 11.24 s total | |
[ 2023-10-08 02:40:55 ] Completed Epoch: 13 batch 829: computing loss 120.688 ms, 11.36 s total | |
EPOCH: [13], BATCH: [829/889], loss: 0.354, loss_box_reg: 0.105, loss_classifier: 0.093, loss_mask: 0.118, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 829 | |
[ 2023-10-08 02:40:57 ] Completed saving temp checkpoint 1,478.132 ms, 12.83 s total | |
[ 2023-10-08 02:40:57 ] Completed replacing temp checkpoint with checkpoint 103.960 ms, 12.94 s total | |
[ 2023-10-08 02:40:57 ] Completed Epoch: 13 batch 830: moving batch data to device 3.460 ms, 12.94 s total | |
[ 2023-10-08 02:40:57 ] Completed Epoch: 13 batch 830: forward pass 106.327 ms, 13.05 s total | |
[ 2023-10-08 02:40:57 ] Completed Epoch: 13 batch 830: backward pass 72.184 ms, 13.12 s total | |
[ 2023-10-08 02:40:57 ] Completed Epoch: 13 batch 830: computing loss 128.551 ms, 13.25 s total | |
EPOCH: [13], BATCH: [830/889], loss: 0.391, loss_box_reg: 0.113, loss_classifier: 0.099, loss_mask: 0.127, loss_objectness: 0.020, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 830 | |
[ 2023-10-08 02:40:59 ] Completed saving temp checkpoint 1,793.859 ms, 15.04 s total | |
[ 2023-10-08 02:40:59 ] Completed replacing temp checkpoint with checkpoint 108.564 ms, 15.15 s total | |
[ 2023-10-08 02:40:59 ] Completed Epoch: 13 batch 831: moving batch data to device 7.232 ms, 15.16 s total | |
[ 2023-10-08 02:40:59 ] Completed Epoch: 13 batch 831: forward pass 106.903 ms, 15.27 s total | |
[ 2023-10-08 02:40:59 ] Completed Epoch: 13 batch 831: backward pass 79.642 ms, 15.35 s total | |
[ 2023-10-08 02:41:00 ] Completed Epoch: 13 batch 831: computing loss 307.734 ms, 15.65 s total | |
EPOCH: [13], BATCH: [831/889], loss: 0.363, loss_box_reg: 0.111, loss_classifier: 0.090, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 831 | |
[ 2023-10-08 02:41:01 ] Completed saving temp checkpoint 1,034.901 ms, 16.69 s total | |
[ 2023-10-08 02:41:01 ] Completed replacing temp checkpoint with checkpoint 61.668 ms, 16.75 s total | |
[ 2023-10-08 02:41:01 ] Completed Epoch: 13 batch 832: moving batch data to device 5.433 ms, 16.76 s total | |
[ 2023-10-08 02:41:01 ] Completed Epoch: 13 batch 832: forward pass 105.231 ms, 16.86 s total | |
[ 2023-10-08 02:41:01 ] Completed Epoch: 13 batch 832: backward pass 79.684 ms, 16.94 s total | |
[ 2023-10-08 02:41:01 ] Completed Epoch: 13 batch 832: computing loss 118.137 ms, 17.06 s total | |
EPOCH: [13], BATCH: [832/889], loss: 0.368, loss_box_reg: 0.107, loss_classifier: 0.093, loss_mask: 0.125, loss_objectness: 0.017, loss_rpn_box_reg: 0.026 | |
Saving checkpoint at epoch 13 train batch 832 | |
[ 2023-10-08 02:41:02 ] Completed saving temp checkpoint 1,194.090 ms, 18.25 s total | |
[ 2023-10-08 02:41:02 ] Completed replacing temp checkpoint with checkpoint 63.585 ms, 18.32 s total | |
[ 2023-10-08 02:41:02 ] Completed Epoch: 13 batch 833: moving batch data to device 6.940 ms, 18.32 s total | |
[ 2023-10-08 02:41:02 ] Completed Epoch: 13 batch 833: forward pass 101.426 ms, 18.42 s total | |
[ 2023-10-08 02:41:02 ] Completed Epoch: 13 batch 833: backward pass 29.383 ms, 18.45 s total | |
[ 2023-10-08 02:41:03 ] Completed Epoch: 13 batch 833: computing loss 157.103 ms, 18.61 s total | |
EPOCH: [13], BATCH: [833/889], loss: 0.386, loss_box_reg: 0.111, loss_classifier: 0.093, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 833 | |
[ 2023-10-08 02:41:04 ] Completed saving temp checkpoint 1,820.994 ms, 20.43 s total | |
[ 2023-10-08 02:41:04 ] Completed replacing temp checkpoint with checkpoint 63.041 ms, 20.49 s total | |
[ 2023-10-08 02:41:04 ] Completed Epoch: 13 batch 834: moving batch data to device 6.277 ms, 20.50 s total | |
[ 2023-10-08 02:41:05 ] Completed Epoch: 13 batch 834: forward pass 106.916 ms, 20.61 s total | |
[ 2023-10-08 02:41:05 ] Completed Epoch: 13 batch 834: backward pass 76.387 ms, 20.68 s total | |
[ 2023-10-08 02:41:05 ] Completed Epoch: 13 batch 834: computing loss 118.289 ms, 20.80 s total | |
EPOCH: [13], BATCH: [834/889], loss: 0.392, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 834 | |
[ 2023-10-08 02:41:06 ] Completed saving temp checkpoint 1,453.132 ms, 22.26 s total | |
[ 2023-10-08 02:41:06 ] Completed replacing temp checkpoint with checkpoint 68.477 ms, 22.32 s total | |
[ 2023-10-08 02:41:06 ] Completed Epoch: 13 batch 835: moving batch data to device 6.913 ms, 22.33 s total | |
[ 2023-10-08 02:41:06 ] Completed Epoch: 13 batch 835: forward pass 106.877 ms, 22.44 s total | |
[ 2023-10-08 02:41:06 ] Completed Epoch: 13 batch 835: backward pass 73.181 ms, 22.51 s total | |
[ 2023-10-08 02:41:07 ] Completed Epoch: 13 batch 835: computing loss 122.793 ms, 22.63 s total | |
EPOCH: [13], BATCH: [835/889], loss: 0.361, loss_box_reg: 0.108, loss_classifier: 0.092, loss_mask: 0.123, loss_objectness: 0.013, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 835 | |
[ 2023-10-08 02:41:08 ] Completed saving temp checkpoint 1,084.111 ms, 23.72 s total | |
[ 2023-10-08 02:41:08 ] Completed replacing temp checkpoint with checkpoint 46.109 ms, 23.76 s total | |
[ 2023-10-08 02:41:08 ] Completed Epoch: 13 batch 836: moving batch data to device 7.955 ms, 23.77 s total | |
[ 2023-10-08 02:41:08 ] Completed Epoch: 13 batch 836: forward pass 104.733 ms, 23.88 s total | |
[ 2023-10-08 02:41:08 ] Completed Epoch: 13 batch 836: backward pass 80.728 ms, 23.96 s total | |
[ 2023-10-08 02:41:08 ] Completed Epoch: 13 batch 836: computing loss 113.490 ms, 24.07 s total | |
EPOCH: [13], BATCH: [836/889], loss: 0.401, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 836 | |
[ 2023-10-08 02:41:09 ] Completed saving temp checkpoint 1,201.190 ms, 25.27 s total | |
[ 2023-10-08 02:41:09 ] Completed replacing temp checkpoint with checkpoint 66.010 ms, 25.34 s total | |
[ 2023-10-08 02:41:09 ] Completed Epoch: 13 batch 837: moving batch data to device 6.065 ms, 25.34 s total | |
[ 2023-10-08 02:41:09 ] Completed Epoch: 13 batch 837: forward pass 106.865 ms, 25.45 s total | |
[ 2023-10-08 02:41:10 ] Completed Epoch: 13 batch 837: backward pass 77.581 ms, 25.53 s total | |
[ 2023-10-08 02:41:10 ] Completed Epoch: 13 batch 837: computing loss 118.921 ms, 25.65 s total | |
EPOCH: [13], BATCH: [837/889], loss: 0.404, loss_box_reg: 0.120, loss_classifier: 0.106, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 837 | |
[ 2023-10-08 02:41:11 ] Completed saving temp checkpoint 1,079.150 ms, 26.73 s total | |
[ 2023-10-08 02:41:11 ] Completed replacing temp checkpoint with checkpoint 84.861 ms, 26.81 s total | |
[ 2023-10-08 02:41:11 ] Completed Epoch: 13 batch 838: moving batch data to device 7.596 ms, 26.82 s total | |
[ 2023-10-08 02:41:11 ] Completed Epoch: 13 batch 838: forward pass 104.497 ms, 26.92 s total | |
[ 2023-10-08 02:41:11 ] Completed Epoch: 13 batch 838: backward pass 73.936 ms, 27.00 s total | |
[ 2023-10-08 02:41:11 ] Completed Epoch: 13 batch 838: computing loss 123.188 ms, 27.12 s total | |
EPOCH: [13], BATCH: [838/889], loss: 0.410, loss_box_reg: 0.125, loss_classifier: 0.107, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 838 | |
[ 2023-10-08 02:41:13 ] Completed saving temp checkpoint 1,683.567 ms, 28.80 s total | |
[ 2023-10-08 02:41:13 ] Completed replacing temp checkpoint with checkpoint 72.235 ms, 28.88 s total | |
[ 2023-10-08 02:41:13 ] Completed Epoch: 13 batch 839: moving batch data to device 7.313 ms, 28.88 s total | |
[ 2023-10-08 02:41:13 ] Completed Epoch: 13 batch 839: forward pass 103.041 ms, 28.99 s total | |
[ 2023-10-08 02:41:13 ] Completed Epoch: 13 batch 839: backward pass 69.082 ms, 29.06 s total | |
[ 2023-10-08 02:41:13 ] Completed Epoch: 13 batch 839: computing loss 174.886 ms, 29.23 s total | |
EPOCH: [13], BATCH: [839/889], loss: 0.361, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 839 | |
[ 2023-10-08 02:41:15 ] Completed saving temp checkpoint 1,718.643 ms, 30.95 s total | |
[ 2023-10-08 02:41:15 ] Completed replacing temp checkpoint with checkpoint 105.196 ms, 31.06 s total | |
[ 2023-10-08 02:41:15 ] Completed Epoch: 13 batch 840: moving batch data to device 7.403 ms, 31.06 s total | |
[ 2023-10-08 02:41:15 ] Completed Epoch: 13 batch 840: forward pass 105.783 ms, 31.17 s total | |
[ 2023-10-08 02:41:15 ] Completed Epoch: 13 batch 840: backward pass 54.329 ms, 31.22 s total | |
[ 2023-10-08 02:41:15 ] Completed Epoch: 13 batch 840: computing loss 136.513 ms, 31.36 s total | |
EPOCH: [13], BATCH: [840/889], loss: 0.377, loss_box_reg: 0.112, loss_classifier: 0.094, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 840 | |
[ 2023-10-08 02:41:17 ] Completed saving temp checkpoint 2,057.722 ms, 33.42 s total | |
[ 2023-10-08 02:41:17 ] Completed replacing temp checkpoint with checkpoint 102.556 ms, 33.52 s total | |
[ 2023-10-08 02:41:18 ] Completed Epoch: 13 batch 841: moving batch data to device 7.729 ms, 33.53 s total | |
[ 2023-10-08 02:41:18 ] Completed Epoch: 13 batch 841: forward pass 111.006 ms, 33.64 s total | |
[ 2023-10-08 02:41:18 ] Completed Epoch: 13 batch 841: backward pass 88.302 ms, 33.73 s total | |
[ 2023-10-08 02:41:18 ] Completed Epoch: 13 batch 841: computing loss 109.186 ms, 33.84 s total | |
EPOCH: [13], BATCH: [841/889], loss: 0.377, loss_box_reg: 0.117, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 841 | |
[ 2023-10-08 02:41:20 ] Completed saving temp checkpoint 1,801.240 ms, 35.64 s total | |
[ 2023-10-08 02:41:20 ] Completed replacing temp checkpoint with checkpoint 92.218 ms, 35.73 s total | |
[ 2023-10-08 02:41:20 ] Completed Epoch: 13 batch 842: moving batch data to device 7.358 ms, 35.74 s total | |
[ 2023-10-08 02:41:20 ] Completed Epoch: 13 batch 842: forward pass 102.076 ms, 35.84 s total | |
[ 2023-10-08 02:41:20 ] Completed Epoch: 13 batch 842: backward pass 70.585 ms, 35.91 s total | |
[ 2023-10-08 02:41:20 ] Completed Epoch: 13 batch 842: computing loss 130.372 ms, 36.04 s total | |
EPOCH: [13], BATCH: [842/889], loss: 0.395, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 842 | |
[ 2023-10-08 02:41:22 ] Completed saving temp checkpoint 1,581.807 ms, 37.62 s total | |
[ 2023-10-08 02:41:22 ] Completed replacing temp checkpoint with checkpoint 78.916 ms, 37.70 s total | |
[ 2023-10-08 02:41:22 ] Completed Epoch: 13 batch 843: moving batch data to device 7.227 ms, 37.71 s total | |
[ 2023-10-08 02:41:22 ] Completed Epoch: 13 batch 843: forward pass 107.600 ms, 37.82 s total | |
[ 2023-10-08 02:41:22 ] Completed Epoch: 13 batch 843: backward pass 85.833 ms, 37.90 s total | |
[ 2023-10-08 02:41:22 ] Completed Epoch: 13 batch 843: computing loss 85.449 ms, 37.99 s total | |
EPOCH: [13], BATCH: [843/889], loss: 0.354, loss_box_reg: 0.104, loss_classifier: 0.088, loss_mask: 0.124, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 843 | |
[ 2023-10-08 02:41:23 ] Completed saving temp checkpoint 1,353.618 ms, 39.34 s total | |
[ 2023-10-08 02:41:23 ] Completed replacing temp checkpoint with checkpoint 58.132 ms, 39.40 s total | |
[ 2023-10-08 02:41:23 ] Completed Epoch: 13 batch 844: moving batch data to device 5.445 ms, 39.40 s total | |
[ 2023-10-08 02:41:23 ] Completed Epoch: 13 batch 844: forward pass 101.375 ms, 39.50 s total | |
[ 2023-10-08 02:41:24 ] Completed Epoch: 13 batch 844: backward pass 70.743 ms, 39.58 s total | |
[ 2023-10-08 02:41:24 ] Completed Epoch: 13 batch 844: computing loss 123.645 ms, 39.70 s total | |
EPOCH: [13], BATCH: [844/889], loss: 0.359, loss_box_reg: 0.106, loss_classifier: 0.090, loss_mask: 0.126, loss_objectness: 0.014, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 844 | |
[ 2023-10-08 02:41:25 ] Completed saving temp checkpoint 1,531.075 ms, 41.23 s total | |
[ 2023-10-08 02:41:25 ] Completed replacing temp checkpoint with checkpoint 67.891 ms, 41.30 s total | |
[ 2023-10-08 02:41:25 ] Completed Epoch: 13 batch 845: moving batch data to device 8.298 ms, 41.31 s total | |
[ 2023-10-08 02:41:25 ] Completed Epoch: 13 batch 845: forward pass 102.856 ms, 41.41 s total | |
[ 2023-10-08 02:41:25 ] Completed Epoch: 13 batch 845: backward pass 65.025 ms, 41.47 s total | |
[ 2023-10-08 02:41:26 ] Completed Epoch: 13 batch 845: computing loss 130.913 ms, 41.61 s total | |
EPOCH: [13], BATCH: [845/889], loss: 0.398, loss_box_reg: 0.125, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 845 | |
[ 2023-10-08 02:41:27 ] Completed saving temp checkpoint 1,366.414 ms, 42.97 s total | |
[ 2023-10-08 02:41:27 ] Completed replacing temp checkpoint with checkpoint 97.167 ms, 43.07 s total | |
[ 2023-10-08 02:41:27 ] Completed Epoch: 13 batch 846: moving batch data to device 7.931 ms, 43.08 s total | |
[ 2023-10-08 02:41:27 ] Completed Epoch: 13 batch 846: forward pass 106.414 ms, 43.18 s total | |
[ 2023-10-08 02:41:27 ] Completed Epoch: 13 batch 846: backward pass 73.278 ms, 43.26 s total | |
[ 2023-10-08 02:41:27 ] Completed Epoch: 13 batch 846: computing loss 115.888 ms, 43.37 s total | |
EPOCH: [13], BATCH: [846/889], loss: 0.393, loss_box_reg: 0.118, loss_classifier: 0.101, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 846 | |
[ 2023-10-08 02:41:29 ] Completed saving temp checkpoint 1,552.495 ms, 44.92 s total | |
[ 2023-10-08 02:41:29 ] Completed replacing temp checkpoint with checkpoint 104.921 ms, 45.03 s total | |
[ 2023-10-08 02:41:29 ] Completed Epoch: 13 batch 847: moving batch data to device 7.652 ms, 45.04 s total | |
[ 2023-10-08 02:41:29 ] Completed Epoch: 13 batch 847: forward pass 106.008 ms, 45.14 s total | |
[ 2023-10-08 02:41:29 ] Completed Epoch: 13 batch 847: backward pass 82.103 ms, 45.23 s total | |
[ 2023-10-08 02:41:29 ] Completed Epoch: 13 batch 847: computing loss 107.136 ms, 45.33 s total | |
EPOCH: [13], BATCH: [847/889], loss: 0.368, loss_box_reg: 0.110, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.011, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 847 | |
[ 2023-10-08 02:41:31 ] Completed saving temp checkpoint 1,425.781 ms, 46.76 s total | |
[ 2023-10-08 02:41:31 ] Completed replacing temp checkpoint with checkpoint 96.268 ms, 46.85 s total | |
[ 2023-10-08 02:41:31 ] Completed Epoch: 13 batch 848: moving batch data to device 9.449 ms, 46.86 s total | |
[ 2023-10-08 02:41:31 ] Completed Epoch: 13 batch 848: forward pass 106.714 ms, 46.97 s total | |
[ 2023-10-08 02:41:31 ] Completed Epoch: 13 batch 848: backward pass 80.140 ms, 47.05 s total | |
[ 2023-10-08 02:41:31 ] Completed Epoch: 13 batch 848: computing loss 112.575 ms, 47.16 s total | |
EPOCH: [13], BATCH: [848/889], loss: 0.406, loss_box_reg: 0.127, loss_classifier: 0.100, loss_mask: 0.140, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 848 | |
[ 2023-10-08 02:41:33 ] Completed saving temp checkpoint 1,647.153 ms, 48.81 s total | |
[ 2023-10-08 02:41:33 ] Completed replacing temp checkpoint with checkpoint 72.359 ms, 48.88 s total | |
[ 2023-10-08 02:41:33 ] Completed Epoch: 13 batch 849: moving batch data to device 4.879 ms, 48.89 s total | |
[ 2023-10-08 02:41:33 ] Completed Epoch: 13 batch 849: forward pass 107.096 ms, 49.00 s total | |
[ 2023-10-08 02:41:33 ] Completed Epoch: 13 batch 849: backward pass 77.321 ms, 49.07 s total | |
[ 2023-10-08 02:41:33 ] Completed Epoch: 13 batch 849: computing loss 114.650 ms, 49.19 s total | |
EPOCH: [13], BATCH: [849/889], loss: 0.375, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.017, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 849 | |
[ 2023-10-08 02:41:35 ] Completed saving temp checkpoint 1,395.223 ms, 50.58 s total | |
[ 2023-10-08 02:41:35 ] Completed replacing temp checkpoint with checkpoint 84.898 ms, 50.67 s total | |
[ 2023-10-08 02:41:35 ] Completed Epoch: 13 batch 850: moving batch data to device 7.643 ms, 50.67 s total | |
[ 2023-10-08 02:41:35 ] Completed Epoch: 13 batch 850: forward pass 111.407 ms, 50.79 s total | |
[ 2023-10-08 02:41:35 ] Completed Epoch: 13 batch 850: backward pass 78.779 ms, 50.87 s total | |
[ 2023-10-08 02:41:35 ] Completed Epoch: 13 batch 850: computing loss 109.389 ms, 50.97 s total | |
EPOCH: [13], BATCH: [850/889], loss: 0.373, loss_box_reg: 0.110, loss_classifier: 0.097, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 850 | |
[ 2023-10-08 02:41:37 ] Completed saving temp checkpoint 1,600.079 ms, 52.57 s total | |
[ 2023-10-08 02:41:37 ] Completed replacing temp checkpoint with checkpoint 79.330 ms, 52.65 s total | |
[ 2023-10-08 02:41:37 ] Completed Epoch: 13 batch 851: moving batch data to device 7.268 ms, 52.66 s total | |
[ 2023-10-08 02:41:37 ] Completed Epoch: 13 batch 851: forward pass 108.193 ms, 52.77 s total | |
[ 2023-10-08 02:41:37 ] Completed Epoch: 13 batch 851: backward pass 54.649 ms, 52.82 s total | |
[ 2023-10-08 02:41:37 ] Completed Epoch: 13 batch 851: computing loss 138.169 ms, 52.96 s total | |
EPOCH: [13], BATCH: [851/889], loss: 0.356, loss_box_reg: 0.103, loss_classifier: 0.088, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 851 | |
[ 2023-10-08 02:41:39 ] Completed saving temp checkpoint 1,631.381 ms, 54.59 s total | |
[ 2023-10-08 02:41:39 ] Completed replacing temp checkpoint with checkpoint 48.380 ms, 54.64 s total | |
[ 2023-10-08 02:41:39 ] Completed Epoch: 13 batch 852: moving batch data to device 7.231 ms, 54.65 s total | |
[ 2023-10-08 02:41:39 ] Completed Epoch: 13 batch 852: forward pass 109.790 ms, 54.76 s total | |
[ 2023-10-08 02:41:39 ] Completed Epoch: 13 batch 852: backward pass 65.554 ms, 54.82 s total | |
[ 2023-10-08 02:41:39 ] Completed Epoch: 13 batch 852: computing loss 124.638 ms, 54.95 s total | |
EPOCH: [13], BATCH: [852/889], loss: 0.440, loss_box_reg: 0.131, loss_classifier: 0.106, loss_mask: 0.141, loss_objectness: 0.019, loss_rpn_box_reg: 0.044 | |
Saving checkpoint at epoch 13 train batch 852 | |
[ 2023-10-08 02:41:41 ] Completed saving temp checkpoint 2,002.150 ms, 56.95 s total | |
[ 2023-10-08 02:41:41 ] Completed replacing temp checkpoint with checkpoint 96.762 ms, 57.05 s total | |
[ 2023-10-08 02:41:41 ] Completed Epoch: 13 batch 853: moving batch data to device 7.616 ms, 57.06 s total | |
[ 2023-10-08 02:41:41 ] Completed Epoch: 13 batch 853: forward pass 107.976 ms, 57.16 s total | |
[ 2023-10-08 02:41:41 ] Completed Epoch: 13 batch 853: backward pass 75.869 ms, 57.24 s total | |
[ 2023-10-08 02:41:41 ] Completed Epoch: 13 batch 853: computing loss 120.108 ms, 57.36 s total | |
EPOCH: [13], BATCH: [853/889], loss: 0.384, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 853 | |
[ 2023-10-08 02:41:43 ] Completed saving temp checkpoint 1,397.791 ms, 58.76 s total | |
[ 2023-10-08 02:41:43 ] Completed replacing temp checkpoint with checkpoint 95.555 ms, 58.85 s total | |
[ 2023-10-08 02:41:43 ] Completed Epoch: 13 batch 854: moving batch data to device 7.382 ms, 58.86 s total | |
[ 2023-10-08 02:41:43 ] Completed Epoch: 13 batch 854: forward pass 105.391 ms, 58.97 s total | |
[ 2023-10-08 02:41:43 ] Completed Epoch: 13 batch 854: backward pass 80.915 ms, 59.05 s total | |
[ 2023-10-08 02:41:43 ] Completed Epoch: 13 batch 854: computing loss 102.261 ms, 59.15 s total | |
EPOCH: [13], BATCH: [854/889], loss: 0.392, loss_box_reg: 0.123, loss_classifier: 0.097, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 854 | |
[ 2023-10-08 02:41:45 ] Completed saving temp checkpoint 1,937.740 ms, 61.09 s total | |
[ 2023-10-08 02:41:45 ] Completed replacing temp checkpoint with checkpoint 82.431 ms, 61.17 s total | |
[ 2023-10-08 02:41:45 ] Completed Epoch: 13 batch 855: moving batch data to device 5.779 ms, 61.17 s total | |
[ 2023-10-08 02:41:45 ] Completed Epoch: 13 batch 855: forward pass 100.034 ms, 61.27 s total | |
[ 2023-10-08 02:41:45 ] Completed Epoch: 13 batch 855: backward pass 73.366 ms, 61.35 s total | |
[ 2023-10-08 02:41:45 ] Completed Epoch: 13 batch 855: computing loss 112.731 ms, 61.46 s total | |
EPOCH: [13], BATCH: [855/889], loss: 0.361, loss_box_reg: 0.109, loss_classifier: 0.087, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 855 | |
[ 2023-10-08 02:41:47 ] Completed saving temp checkpoint 1,198.864 ms, 62.66 s total | |
[ 2023-10-08 02:41:47 ] Completed replacing temp checkpoint with checkpoint 88.377 ms, 62.75 s total | |
[ 2023-10-08 02:41:47 ] Completed Epoch: 13 batch 856: moving batch data to device 7.070 ms, 62.76 s total | |
[ 2023-10-08 02:41:47 ] Completed Epoch: 13 batch 856: forward pass 109.910 ms, 62.87 s total | |
[ 2023-10-08 02:41:47 ] Completed Epoch: 13 batch 856: backward pass 74.744 ms, 62.94 s total | |
[ 2023-10-08 02:41:47 ] Completed Epoch: 13 batch 856: computing loss 118.726 ms, 63.06 s total | |
EPOCH: [13], BATCH: [856/889], loss: 0.390, loss_box_reg: 0.119, loss_classifier: 0.092, loss_mask: 0.134, loss_objectness: 0.013, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 856 | |
[ 2023-10-08 02:41:48 ] Completed saving temp checkpoint 1,303.259 ms, 64.36 s total | |
[ 2023-10-08 02:41:48 ] Completed replacing temp checkpoint with checkpoint 69.103 ms, 64.43 s total | |
[ 2023-10-08 02:41:48 ] Completed Epoch: 13 batch 857: moving batch data to device 8.068 ms, 64.44 s total | |
[ 2023-10-08 02:41:49 ] Completed Epoch: 13 batch 857: forward pass 112.232 ms, 64.55 s total | |
[ 2023-10-08 02:41:49 ] Completed Epoch: 13 batch 857: backward pass 36.399 ms, 64.59 s total | |
[ 2023-10-08 02:41:49 ] Completed Epoch: 13 batch 857: computing loss 154.290 ms, 64.74 s total | |
EPOCH: [13], BATCH: [857/889], loss: 0.398, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.032 | |
Saving checkpoint at epoch 13 train batch 857 | |
[ 2023-10-08 02:41:50 ] Completed saving temp checkpoint 1,163.714 ms, 65.91 s total | |
[ 2023-10-08 02:41:50 ] Completed replacing temp checkpoint with checkpoint 89.530 ms, 66.00 s total | |
[ 2023-10-08 02:41:50 ] Completed Epoch: 13 batch 858: moving batch data to device 6.809 ms, 66.00 s total | |
[ 2023-10-08 02:41:50 ] Completed Epoch: 13 batch 858: forward pass 101.696 ms, 66.10 s total | |
[ 2023-10-08 02:41:50 ] Completed Epoch: 13 batch 858: backward pass 66.704 ms, 66.17 s total | |
[ 2023-10-08 02:41:50 ] Completed Epoch: 13 batch 858: computing loss 131.091 ms, 66.30 s total | |
EPOCH: [13], BATCH: [858/889], loss: 0.398, loss_box_reg: 0.122, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 858 | |
[ 2023-10-08 02:41:52 ] Completed saving temp checkpoint 1,261.359 ms, 67.56 s total | |
[ 2023-10-08 02:41:52 ] Completed replacing temp checkpoint with checkpoint 80.531 ms, 67.64 s total | |
[ 2023-10-08 02:41:52 ] Completed Epoch: 13 batch 859: moving batch data to device 7.190 ms, 67.65 s total | |
[ 2023-10-08 02:41:52 ] Completed Epoch: 13 batch 859: forward pass 106.145 ms, 67.76 s total | |
[ 2023-10-08 02:41:52 ] Completed Epoch: 13 batch 859: backward pass 65.293 ms, 67.82 s total | |
[ 2023-10-08 02:41:52 ] Completed Epoch: 13 batch 859: computing loss 129.365 ms, 67.95 s total | |
EPOCH: [13], BATCH: [859/889], loss: 0.366, loss_box_reg: 0.108, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 859 | |
[ 2023-10-08 02:41:53 ] Completed saving temp checkpoint 1,377.400 ms, 69.33 s total | |
[ 2023-10-08 02:41:53 ] Completed replacing temp checkpoint with checkpoint 74.328 ms, 69.40 s total | |
[ 2023-10-08 02:41:53 ] Completed Epoch: 13 batch 860: moving batch data to device 7.759 ms, 69.41 s total | |
[ 2023-10-08 02:41:53 ] Completed Epoch: 13 batch 860: forward pass 104.613 ms, 69.52 s total | |
[ 2023-10-08 02:41:54 ] Completed Epoch: 13 batch 860: backward pass 43.648 ms, 69.56 s total | |
[ 2023-10-08 02:41:54 ] Completed Epoch: 13 batch 860: computing loss 150.584 ms, 69.71 s total | |
EPOCH: [13], BATCH: [860/889], loss: 0.419, loss_box_reg: 0.124, loss_classifier: 0.109, loss_mask: 0.136, loss_objectness: 0.019, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 860 | |
[ 2023-10-08 02:41:55 ] Completed saving temp checkpoint 1,712.039 ms, 71.42 s total | |
[ 2023-10-08 02:41:55 ] Completed replacing temp checkpoint with checkpoint 76.037 ms, 71.50 s total | |
[ 2023-10-08 02:41:55 ] Completed Epoch: 13 batch 861: moving batch data to device 7.066 ms, 71.51 s total | |
[ 2023-10-08 02:41:56 ] Completed Epoch: 13 batch 861: forward pass 105.576 ms, 71.61 s total | |
[ 2023-10-08 02:41:56 ] Completed Epoch: 13 batch 861: backward pass 72.186 ms, 71.68 s total | |
[ 2023-10-08 02:41:56 ] Completed Epoch: 13 batch 861: computing loss 127.944 ms, 71.81 s total | |
EPOCH: [13], BATCH: [861/889], loss: 0.365, loss_box_reg: 0.104, loss_classifier: 0.091, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.029 | |
Saving checkpoint at epoch 13 train batch 861 | |
[ 2023-10-08 02:41:57 ] Completed saving temp checkpoint 1,442.723 ms, 73.25 s total | |
[ 2023-10-08 02:41:57 ] Completed replacing temp checkpoint with checkpoint 66.759 ms, 73.32 s total | |
[ 2023-10-08 02:41:57 ] Completed Epoch: 13 batch 862: moving batch data to device 7.845 ms, 73.33 s total | |
[ 2023-10-08 02:41:57 ] Completed Epoch: 13 batch 862: forward pass 105.749 ms, 73.43 s total | |
[ 2023-10-08 02:41:57 ] Completed Epoch: 13 batch 862: backward pass 42.272 ms, 73.48 s total | |
[ 2023-10-08 02:41:58 ] Completed Epoch: 13 batch 862: computing loss 127.388 ms, 73.60 s total | |
EPOCH: [13], BATCH: [862/889], loss: 0.403, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 862 | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 02:55:12 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:55:12 ] Completed importing Timer 0.021 ms, 0.00 s total | |
[ 2023-10-08 02:55:13 ] Completed importing everything else 558.017 ms, 0.56 s total | |
[ 2023-10-08 02:55:13 ] Completed defined other functions 0.032 ms, 0.56 s total | |
| distributed init (rank 4): env:// | |
| distributed init (rank 3): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 1): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 02:55:20 ] Completed main preliminaries 7,531.250 ms, 8.09 s total | |
loading annotations into memory... | |
Done (t=10.56s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.27s) | |
creating index... | |
index created! | |
[ 2023-10-08 02:55:32 ] Completed loading data 12,260.571 ms, 20.35 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 02:55:33 ] Completed creating data samplers 93.827 ms, 20.44 s total | |
[ 2023-10-08 02:55:33 ] Completed creating data loaders 0.206 ms, 20.44 s total | |
[ 2023-10-08 02:55:33 ] Completed creating model and .to(device) 651.415 ms, 21.10 s total | |
[ 2023-10-08 02:55:35 ] Completed preparing model for distributed training 2,006.848 ms, 23.10 s total | |
[ 2023-10-08 02:55:35 ] Completed optimizer and scaler 0.630 ms, 23.10 s total | |
[ 2023-10-08 02:55:35 ] Completed learning rate schedulers 0.257 ms, 23.10 s total | |
[ 2023-10-08 02:55:36 ] Completed init coco evaluator 953.271 ms, 24.06 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 02:55:37 ] Completed retrieving checkpoint 880.884 ms, 24.94 s total | |
EPOCH :: 13 | |
[ 2023-10-08 02:55:37 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:55:37 ] Completed training preliminaries 0.924 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 862 | |
[ 2023-10-08 02:55:37 ] Completed Epoch: 13 batch 862: moving batch data to device 470.412 ms, 0.47 s total | |
[ 2023-10-08 02:55:39 ] Completed Epoch: 13 batch 862: forward pass 1,195.002 ms, 1.67 s total | |
[ 2023-10-08 02:55:39 ] Completed Epoch: 13 batch 862: backward pass 168.974 ms, 1.84 s total | |
[ 2023-10-08 02:55:39 ] Completed Epoch: 13 batch 862: computing loss 183.903 ms, 2.02 s total | |
EPOCH: [13], BATCH: [862/889], loss: 0.406, loss_box_reg: 0.121, loss_classifier: 0.101, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 862 | |
[ 2023-10-08 02:55:40 ] Completed saving temp checkpoint 1,033.535 ms, 3.05 s total | |
[ 2023-10-08 02:55:40 ] Completed replacing temp checkpoint with checkpoint 175.965 ms, 3.23 s total | |
[ 2023-10-08 02:55:40 ] Completed Epoch: 13 batch 863: moving batch data to device 50.340 ms, 3.28 s total | |
[ 2023-10-08 02:55:40 ] Completed Epoch: 13 batch 863: forward pass 112.870 ms, 3.39 s total | |
[ 2023-10-08 02:55:40 ] Completed Epoch: 13 batch 863: backward pass 82.227 ms, 3.47 s total | |
[ 2023-10-08 02:55:41 ] Completed Epoch: 13 batch 863: computing loss 217.741 ms, 3.69 s total | |
EPOCH: [13], BATCH: [863/889], loss: 0.409, loss_box_reg: 0.122, loss_classifier: 0.104, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.033 | |
Saving checkpoint at epoch 13 train batch 863 | |
[ 2023-10-08 02:55:42 ] Completed saving temp checkpoint 1,079.712 ms, 4.77 s total | |
[ 2023-10-08 02:55:42 ] Completed replacing temp checkpoint with checkpoint 53.007 ms, 4.82 s total | |
[ 2023-10-08 02:55:42 ] Completed Epoch: 13 batch 864: moving batch data to device 6.150 ms, 4.83 s total | |
[ 2023-10-08 02:55:42 ] Completed Epoch: 13 batch 864: forward pass 107.105 ms, 4.94 s total | |
[ 2023-10-08 02:55:42 ] Completed Epoch: 13 batch 864: backward pass 92.955 ms, 5.03 s total | |
[ 2023-10-08 02:55:42 ] Completed Epoch: 13 batch 864: computing loss 127.873 ms, 5.16 s total | |
EPOCH: [13], BATCH: [864/889], loss: 0.349, loss_box_reg: 0.104, loss_classifier: 0.088, loss_mask: 0.125, loss_objectness: 0.012, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 864 | |
[ 2023-10-08 02:55:43 ] Completed saving temp checkpoint 999.740 ms, 6.16 s total | |
[ 2023-10-08 02:55:43 ] Completed replacing temp checkpoint with checkpoint 67.743 ms, 6.23 s total | |
[ 2023-10-08 02:55:43 ] Completed Epoch: 13 batch 865: moving batch data to device 3.425 ms, 6.23 s total | |
[ 2023-10-08 02:55:43 ] Completed Epoch: 13 batch 865: forward pass 130.505 ms, 6.36 s total | |
[ 2023-10-08 02:55:43 ] Completed Epoch: 13 batch 865: backward pass 45.068 ms, 6.41 s total | |
[ 2023-10-08 02:55:44 ] Completed Epoch: 13 batch 865: computing loss 213.918 ms, 6.62 s total | |
EPOCH: [13], BATCH: [865/889], loss: 0.363, loss_box_reg: 0.107, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 865 | |
[ 2023-10-08 02:55:45 ] Completed saving temp checkpoint 1,103.682 ms, 7.72 s total | |
[ 2023-10-08 02:55:45 ] Completed replacing temp checkpoint with checkpoint 72.869 ms, 7.80 s total | |
[ 2023-10-08 02:55:45 ] Completed Epoch: 13 batch 866: moving batch data to device 4.344 ms, 7.80 s total | |
[ 2023-10-08 02:55:45 ] Completed Epoch: 13 batch 866: forward pass 110.813 ms, 7.91 s total | |
[ 2023-10-08 02:55:45 ] Completed Epoch: 13 batch 866: backward pass 82.114 ms, 7.99 s total | |
[ 2023-10-08 02:55:45 ] Completed Epoch: 13 batch 866: computing loss 132.278 ms, 8.13 s total | |
EPOCH: [13], BATCH: [866/889], loss: 0.382, loss_box_reg: 0.113, loss_classifier: 0.099, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 866 | |
[ 2023-10-08 02:55:46 ] Completed saving temp checkpoint 974.616 ms, 9.10 s total | |
[ 2023-10-08 02:55:46 ] Completed replacing temp checkpoint with checkpoint 61.430 ms, 9.16 s total | |
[ 2023-10-08 02:55:46 ] Completed Epoch: 13 batch 867: moving batch data to device 13.207 ms, 9.17 s total | |
[ 2023-10-08 02:55:46 ] Completed Epoch: 13 batch 867: forward pass 106.532 ms, 9.28 s total | |
[ 2023-10-08 02:55:46 ] Completed Epoch: 13 batch 867: backward pass 120.244 ms, 9.40 s total | |
[ 2023-10-08 02:55:47 ] Completed Epoch: 13 batch 867: computing loss 91.187 ms, 9.49 s total | |
EPOCH: [13], BATCH: [867/889], loss: 0.394, loss_box_reg: 0.118, loss_classifier: 0.102, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 867 | |
[ 2023-10-08 02:55:48 ] Completed saving temp checkpoint 1,061.323 ms, 10.55 s total | |
[ 2023-10-08 02:55:48 ] Completed replacing temp checkpoint with checkpoint 74.484 ms, 10.63 s total | |
[ 2023-10-08 02:55:48 ] Completed Epoch: 13 batch 868: moving batch data to device 3.620 ms, 10.63 s total | |
[ 2023-10-08 02:55:48 ] Completed Epoch: 13 batch 868: forward pass 107.625 ms, 10.74 s total | |
[ 2023-10-08 02:55:48 ] Completed Epoch: 13 batch 868: backward pass 37.824 ms, 10.78 s total | |
[ 2023-10-08 02:55:48 ] Completed Epoch: 13 batch 868: computing loss 161.016 ms, 10.94 s total | |
EPOCH: [13], BATCH: [868/889], loss: 0.363, loss_box_reg: 0.108, loss_classifier: 0.091, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 868 | |
[ 2023-10-08 02:55:49 ] Completed saving temp checkpoint 937.239 ms, 11.88 s total | |
[ 2023-10-08 02:55:49 ] Completed replacing temp checkpoint with checkpoint 67.957 ms, 11.94 s total | |
[ 2023-10-08 02:55:49 ] Completed Epoch: 13 batch 869: moving batch data to device 8.251 ms, 11.95 s total | |
[ 2023-10-08 02:55:49 ] Completed Epoch: 13 batch 869: forward pass 106.959 ms, 12.06 s total | |
[ 2023-10-08 02:55:49 ] Completed Epoch: 13 batch 869: backward pass 41.414 ms, 12.10 s total | |
[ 2023-10-08 02:55:49 ] Completed Epoch: 13 batch 869: computing loss 153.956 ms, 12.25 s total | |
EPOCH: [13], BATCH: [869/889], loss: 0.418, loss_box_reg: 0.125, loss_classifier: 0.106, loss_mask: 0.135, loss_objectness: 0.019, loss_rpn_box_reg: 0.034 | |
Saving checkpoint at epoch 13 train batch 869 | |
[ 2023-10-08 02:55:51 ] Completed saving temp checkpoint 1,463.701 ms, 13.72 s total | |
[ 2023-10-08 02:55:51 ] Completed replacing temp checkpoint with checkpoint 93.267 ms, 13.81 s total | |
[ 2023-10-08 02:55:51 ] Completed Epoch: 13 batch 870: moving batch data to device 4.940 ms, 13.82 s total | |
[ 2023-10-08 02:55:51 ] Completed Epoch: 13 batch 870: forward pass 104.950 ms, 13.92 s total | |
[ 2023-10-08 02:55:51 ] Completed Epoch: 13 batch 870: backward pass 53.033 ms, 13.97 s total | |
[ 2023-10-08 02:55:51 ] Completed Epoch: 13 batch 870: computing loss 136.424 ms, 14.11 s total | |
EPOCH: [13], BATCH: [870/889], loss: 0.370, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 870 | |
[ 2023-10-08 02:55:53 ] Completed saving temp checkpoint 1,563.056 ms, 15.67 s total | |
[ 2023-10-08 02:55:53 ] Completed replacing temp checkpoint with checkpoint 97.868 ms, 15.77 s total | |
[ 2023-10-08 02:55:53 ] Completed Epoch: 13 batch 871: moving batch data to device 7.551 ms, 15.78 s total | |
[ 2023-10-08 02:55:53 ] Completed Epoch: 13 batch 871: forward pass 106.629 ms, 15.89 s total | |
[ 2023-10-08 02:55:53 ] Completed Epoch: 13 batch 871: backward pass 39.484 ms, 15.92 s total | |
[ 2023-10-08 02:55:53 ] Completed Epoch: 13 batch 871: computing loss 251.439 ms, 16.18 s total | |
EPOCH: [13], BATCH: [871/889], loss: 0.396, loss_box_reg: 0.126, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 871 | |
[ 2023-10-08 02:55:55 ] Completed saving temp checkpoint 1,681.177 ms, 17.86 s total | |
[ 2023-10-08 02:55:55 ] Completed replacing temp checkpoint with checkpoint 110.883 ms, 17.97 s total | |
[ 2023-10-08 02:55:55 ] Completed Epoch: 13 batch 872: moving batch data to device 6.834 ms, 17.98 s total | |
[ 2023-10-08 02:55:55 ] Completed Epoch: 13 batch 872: forward pass 105.078 ms, 18.08 s total | |
[ 2023-10-08 02:55:55 ] Completed Epoch: 13 batch 872: backward pass 67.977 ms, 18.15 s total | |
[ 2023-10-08 02:55:55 ] Completed Epoch: 13 batch 872: computing loss 122.443 ms, 18.27 s total | |
EPOCH: [13], BATCH: [872/889], loss: 0.375, loss_box_reg: 0.111, loss_classifier: 0.094, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.028 | |
Saving checkpoint at epoch 13 train batch 872 | |
[ 2023-10-08 02:55:57 ] Completed saving temp checkpoint 1,466.420 ms, 19.74 s total | |
[ 2023-10-08 02:55:57 ] Completed replacing temp checkpoint with checkpoint 75.728 ms, 19.81 s total | |
[ 2023-10-08 02:55:57 ] Completed Epoch: 13 batch 873: moving batch data to device 5.494 ms, 19.82 s total | |
[ 2023-10-08 02:55:57 ] Completed Epoch: 13 batch 873: forward pass 103.595 ms, 19.92 s total | |
[ 2023-10-08 02:55:57 ] Completed Epoch: 13 batch 873: backward pass 41.272 ms, 19.96 s total | |
[ 2023-10-08 02:55:57 ] Completed Epoch: 13 batch 873: computing loss 150.226 ms, 20.11 s total | |
EPOCH: [13], BATCH: [873/889], loss: 0.378, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 873 | |
[ 2023-10-08 02:55:59 ] Completed saving temp checkpoint 1,534.362 ms, 21.65 s total | |
[ 2023-10-08 02:55:59 ] Completed replacing temp checkpoint with checkpoint 93.873 ms, 21.74 s total | |
[ 2023-10-08 02:55:59 ] Completed Epoch: 13 batch 874: moving batch data to device 6.975 ms, 21.75 s total | |
[ 2023-10-08 02:55:59 ] Completed Epoch: 13 batch 874: forward pass 105.375 ms, 21.85 s total | |
[ 2023-10-08 02:55:59 ] Completed Epoch: 13 batch 874: backward pass 68.161 ms, 21.92 s total | |
[ 2023-10-08 02:55:59 ] Completed Epoch: 13 batch 874: computing loss 126.179 ms, 22.05 s total | |
EPOCH: [13], BATCH: [874/889], loss: 0.380, loss_box_reg: 0.111, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 874 | |
[ 2023-10-08 02:56:00 ] Completed saving temp checkpoint 1,009.542 ms, 23.06 s total | |
[ 2023-10-08 02:56:00 ] Completed replacing temp checkpoint with checkpoint 57.433 ms, 23.12 s total | |
[ 2023-10-08 02:56:00 ] Completed Epoch: 13 batch 875: moving batch data to device 7.985 ms, 23.12 s total | |
[ 2023-10-08 02:56:00 ] Completed Epoch: 13 batch 875: forward pass 104.050 ms, 23.23 s total | |
[ 2023-10-08 02:56:00 ] Completed Epoch: 13 batch 875: backward pass 80.793 ms, 23.31 s total | |
[ 2023-10-08 02:56:00 ] Completed Epoch: 13 batch 875: computing loss 170.539 ms, 23.48 s total | |
EPOCH: [13], BATCH: [875/889], loss: 0.416, loss_box_reg: 0.127, loss_classifier: 0.109, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.030 | |
Saving checkpoint at epoch 13 train batch 875 | |
[ 2023-10-08 02:56:02 ] Completed saving temp checkpoint 1,136.823 ms, 24.62 s total | |
[ 2023-10-08 02:56:02 ] Completed replacing temp checkpoint with checkpoint 80.415 ms, 24.70 s total | |
[ 2023-10-08 02:56:02 ] Completed Epoch: 13 batch 876: moving batch data to device 7.565 ms, 24.70 s total | |
[ 2023-10-08 02:56:02 ] Completed Epoch: 13 batch 876: forward pass 102.977 ms, 24.81 s total | |
[ 2023-10-08 02:56:02 ] Completed Epoch: 13 batch 876: backward pass 75.389 ms, 24.88 s total | |
[ 2023-10-08 02:56:02 ] Completed Epoch: 13 batch 876: computing loss 110.172 ms, 24.99 s total | |
EPOCH: [13], BATCH: [876/889], loss: 0.409, loss_box_reg: 0.123, loss_classifier: 0.101, loss_mask: 0.137, loss_objectness: 0.021, loss_rpn_box_reg: 0.027 | |
Saving checkpoint at epoch 13 train batch 876 | |
[ 2023-10-08 02:56:03 ] Completed saving temp checkpoint 987.229 ms, 25.98 s total | |
[ 2023-10-08 02:56:03 ] Completed replacing temp checkpoint with checkpoint 47.654 ms, 26.03 s total | |
[ 2023-10-08 02:56:03 ] Completed Epoch: 13 batch 877: moving batch data to device 7.673 ms, 26.03 s total | |
[ 2023-10-08 02:56:03 ] Completed Epoch: 13 batch 877: forward pass 100.335 ms, 26.14 s total | |
[ 2023-10-08 02:56:03 ] Completed Epoch: 13 batch 877: backward pass 77.758 ms, 26.21 s total | |
[ 2023-10-08 02:56:03 ] Completed Epoch: 13 batch 877: computing loss 111.754 ms, 26.32 s total | |
EPOCH: [13], BATCH: [877/889], loss: 0.393, loss_box_reg: 0.120, loss_classifier: 0.096, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 877 | |
[ 2023-10-08 02:56:05 ] Completed saving temp checkpoint 1,173.579 ms, 27.50 s total | |
[ 2023-10-08 02:56:05 ] Completed replacing temp checkpoint with checkpoint 84.847 ms, 27.58 s total | |
[ 2023-10-08 02:56:05 ] Completed Epoch: 13 batch 878: moving batch data to device 8.185 ms, 27.59 s total | |
[ 2023-10-08 02:56:05 ] Completed Epoch: 13 batch 878: forward pass 105.848 ms, 27.70 s total | |
[ 2023-10-08 02:56:05 ] Completed Epoch: 13 batch 878: backward pass 51.418 ms, 27.75 s total | |
[ 2023-10-08 02:56:05 ] Completed Epoch: 13 batch 878: computing loss 143.287 ms, 27.89 s total | |
EPOCH: [13], BATCH: [878/889], loss: 0.386, loss_box_reg: 0.117, loss_classifier: 0.100, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.023 | |
Saving checkpoint at epoch 13 train batch 878 | |
[ 2023-10-08 02:56:06 ] Completed saving temp checkpoint 1,108.064 ms, 29.00 s total | |
[ 2023-10-08 02:56:06 ] Completed replacing temp checkpoint with checkpoint 45.825 ms, 29.05 s total | |
[ 2023-10-08 02:56:06 ] Completed Epoch: 13 batch 879: moving batch data to device 11.600 ms, 29.06 s total | |
[ 2023-10-08 02:56:06 ] Completed Epoch: 13 batch 879: forward pass 110.700 ms, 29.17 s total | |
[ 2023-10-08 02:56:06 ] Completed Epoch: 13 batch 879: backward pass 74.012 ms, 29.24 s total | |
[ 2023-10-08 02:56:06 ] Completed Epoch: 13 batch 879: computing loss 122.604 ms, 29.36 s total | |
EPOCH: [13], BATCH: [879/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.107, loss_mask: 0.131, loss_objectness: 0.019, loss_rpn_box_reg: 0.031 | |
Saving checkpoint at epoch 13 train batch 879 | |
[ 2023-10-08 02:56:08 ] Completed saving temp checkpoint 1,164.365 ms, 30.53 s total | |
[ 2023-10-08 02:56:08 ] Completed replacing temp checkpoint with checkpoint 54.965 ms, 30.58 s total | |
[ 2023-10-08 02:56:08 ] Completed Epoch: 13 batch 880: moving batch data to device 4.831 ms, 30.59 s total | |
[ 2023-10-08 02:56:08 ] Completed Epoch: 13 batch 880: forward pass 101.522 ms, 30.69 s total | |
[ 2023-10-08 02:56:08 ] Completed Epoch: 13 batch 880: backward pass 52.775 ms, 30.74 s total | |
[ 2023-10-08 02:56:08 ] Completed Epoch: 13 batch 880: computing loss 139.454 ms, 30.88 s total | |
EPOCH: [13], BATCH: [880/889], loss: 0.398, loss_box_reg: 0.123, loss_classifier: 0.105, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 880 | |
[ 2023-10-08 02:56:09 ] Completed saving temp checkpoint 1,074.830 ms, 31.96 s total | |
[ 2023-10-08 02:56:09 ] Completed replacing temp checkpoint with checkpoint 46.695 ms, 32.00 s total | |
[ 2023-10-08 02:56:09 ] Completed Epoch: 13 batch 881: moving batch data to device 6.141 ms, 32.01 s total | |
[ 2023-10-08 02:56:09 ] Completed Epoch: 13 batch 881: forward pass 107.303 ms, 32.12 s total | |
[ 2023-10-08 02:56:09 ] Completed Epoch: 13 batch 881: backward pass 78.679 ms, 32.20 s total | |
[ 2023-10-08 02:56:09 ] Completed Epoch: 13 batch 881: computing loss 110.465 ms, 32.31 s total | |
EPOCH: [13], BATCH: [881/889], loss: 0.360, loss_box_reg: 0.107, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 881 | |
[ 2023-10-08 02:56:11 ] Completed saving temp checkpoint 1,230.609 ms, 33.54 s total | |
[ 2023-10-08 02:56:11 ] Completed replacing temp checkpoint with checkpoint 83.552 ms, 33.62 s total | |
[ 2023-10-08 02:56:11 ] Completed Epoch: 13 batch 882: moving batch data to device 8.926 ms, 33.63 s total | |
[ 2023-10-08 02:56:11 ] Completed Epoch: 13 batch 882: forward pass 103.895 ms, 33.73 s total | |
[ 2023-10-08 02:56:11 ] Completed Epoch: 13 batch 882: backward pass 62.530 ms, 33.80 s total | |
[ 2023-10-08 02:56:11 ] Completed Epoch: 13 batch 882: computing loss 109.247 ms, 33.91 s total | |
EPOCH: [13], BATCH: [882/889], loss: 0.370, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.129, loss_objectness: 0.012, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 882 | |
[ 2023-10-08 02:56:12 ] Completed saving temp checkpoint 1,439.795 ms, 35.35 s total | |
[ 2023-10-08 02:56:12 ] Completed replacing temp checkpoint with checkpoint 71.803 ms, 35.42 s total | |
[ 2023-10-08 02:56:12 ] Completed Epoch: 13 batch 883: moving batch data to device 6.398 ms, 35.42 s total | |
[ 2023-10-08 02:56:13 ] Completed Epoch: 13 batch 883: forward pass 103.889 ms, 35.53 s total | |
[ 2023-10-08 02:56:13 ] Completed Epoch: 13 batch 883: backward pass 78.984 ms, 35.61 s total | |
[ 2023-10-08 02:56:13 ] Completed Epoch: 13 batch 883: computing loss 91.017 ms, 35.70 s total | |
EPOCH: [13], BATCH: [883/889], loss: 0.361, loss_box_reg: 0.111, loss_classifier: 0.090, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 883 | |
[ 2023-10-08 02:56:15 ] Completed saving temp checkpoint 1,909.669 ms, 37.61 s total | |
[ 2023-10-08 02:56:15 ] Completed replacing temp checkpoint with checkpoint 93.062 ms, 37.70 s total | |
[ 2023-10-08 02:56:15 ] Completed Epoch: 13 batch 884: moving batch data to device 7.969 ms, 37.71 s total | |
[ 2023-10-08 02:56:15 ] Completed Epoch: 13 batch 884: forward pass 106.843 ms, 37.81 s total | |
[ 2023-10-08 02:56:15 ] Completed Epoch: 13 batch 884: backward pass 53.372 ms, 37.87 s total | |
[ 2023-10-08 02:56:15 ] Completed Epoch: 13 batch 884: computing loss 128.459 ms, 38.00 s total | |
EPOCH: [13], BATCH: [884/889], loss: 0.348, loss_box_reg: 0.105, loss_classifier: 0.083, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.022 | |
Saving checkpoint at epoch 13 train batch 884 | |
[ 2023-10-08 02:56:16 ] Completed saving temp checkpoint 1,341.290 ms, 39.34 s total | |
[ 2023-10-08 02:56:16 ] Completed replacing temp checkpoint with checkpoint 48.099 ms, 39.39 s total | |
[ 2023-10-08 02:56:16 ] Completed Epoch: 13 batch 885: moving batch data to device 6.728 ms, 39.39 s total | |
[ 2023-10-08 02:56:17 ] Completed Epoch: 13 batch 885: forward pass 106.381 ms, 39.50 s total | |
[ 2023-10-08 02:56:17 ] Completed Epoch: 13 batch 885: backward pass 46.268 ms, 39.55 s total | |
[ 2023-10-08 02:56:17 ] Completed Epoch: 13 batch 885: computing loss 138.231 ms, 39.68 s total | |
EPOCH: [13], BATCH: [885/889], loss: 0.351, loss_box_reg: 0.103, loss_classifier: 0.087, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.021 | |
Saving checkpoint at epoch 13 train batch 885 | |
[ 2023-10-08 02:56:19 ] Completed saving temp checkpoint 1,931.988 ms, 41.62 s total | |
[ 2023-10-08 02:56:19 ] Completed replacing temp checkpoint with checkpoint 98.443 ms, 41.71 s total | |
[ 2023-10-08 02:56:19 ] Completed Epoch: 13 batch 886: moving batch data to device 8.433 ms, 41.72 s total | |
[ 2023-10-08 02:56:19 ] Completed Epoch: 13 batch 886: forward pass 113.442 ms, 41.84 s total | |
[ 2023-10-08 02:56:19 ] Completed Epoch: 13 batch 886: backward pass 80.379 ms, 41.92 s total | |
[ 2023-10-08 02:56:19 ] Completed Epoch: 13 batch 886: computing loss 109.302 ms, 42.03 s total | |
EPOCH: [13], BATCH: [886/889], loss: 0.365, loss_box_reg: 0.111, loss_classifier: 0.088, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.024 | |
Saving checkpoint at epoch 13 train batch 886 | |
[ 2023-10-08 02:56:20 ] Completed saving temp checkpoint 1,210.242 ms, 43.24 s total | |
[ 2023-10-08 02:56:20 ] Completed replacing temp checkpoint with checkpoint 49.097 ms, 43.28 s total | |
[ 2023-10-08 02:56:20 ] Completed Epoch: 13 batch 887: moving batch data to device 5.494 ms, 43.29 s total | |
[ 2023-10-08 02:56:20 ] Completed Epoch: 13 batch 887: forward pass 105.226 ms, 43.40 s total | |
[ 2023-10-08 02:56:20 ] Completed Epoch: 13 batch 887: backward pass 85.913 ms, 43.48 s total | |
[ 2023-10-08 02:56:21 ] Completed Epoch: 13 batch 887: computing loss 104.786 ms, 43.59 s total | |
EPOCH: [13], BATCH: [887/889], loss: 0.317, loss_box_reg: 0.095, loss_classifier: 0.072, loss_mask: 0.117, loss_objectness: 0.013, loss_rpn_box_reg: 0.020 | |
Saving checkpoint at epoch 13 train batch 887 | |
[ 2023-10-08 02:56:22 ] Completed saving temp checkpoint 1,153.446 ms, 44.74 s total | |
[ 2023-10-08 02:56:22 ] Completed replacing temp checkpoint with checkpoint 85.360 ms, 44.83 s total | |
[ 2023-10-08 02:56:22 ] Completed Epoch: 13 batch 888: moving batch data to device 7.453 ms, 44.83 s total | |
[ 2023-10-08 02:56:22 ] Completed Epoch: 13 batch 888: forward pass 106.699 ms, 44.94 s total | |
[ 2023-10-08 02:56:22 ] Completed Epoch: 13 batch 888: backward pass 80.609 ms, 45.02 s total | |
[ 2023-10-08 02:56:22 ] Completed Epoch: 13 batch 888: computing loss 108.982 ms, 45.13 s total | |
EPOCH: [13], BATCH: [888/889], loss: 0.358, loss_box_reg: 0.106, loss_classifier: 0.087, loss_mask: 0.126, loss_objectness: 0.014, loss_rpn_box_reg: 0.025 | |
Saving checkpoint at epoch 13 train batch 888 | |
[ 2023-10-08 02:56:24 ] Completed saving temp checkpoint 1,389.753 ms, 46.52 s total | |
[ 2023-10-08 02:56:24 ] Completed replacing temp checkpoint with checkpoint 64.811 ms, 46.58 s total | |
[ 2023-10-08 02:56:24 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 02:56:24 ] Completed starting evaluation routine 0.116 ms, 0.00 s total | |
[ 2023-10-08 02:56:24 ] Completed evaluation preliminaries 17.236 ms, 0.02 s total | |
Evaluating / resuming epoch 13 from eval step 0 | |
[ 2023-10-08 02:56:24 ] Completed Epoch 13 batch: 0 moving to device 254.961 ms, 0.27 s total | |
[ 2023-10-08 02:56:24 ] Completed Epoch 13 batch: 0 forward through model 93.982 ms, 0.37 s total | |
[ 2023-10-08 02:56:24 ] Completed Epoch 13 batch: 0 outputs back to cpu 3.281 ms, 0.37 s total | |
[ 2023-10-08 02:56:24 ] Completed Epoch 13 batch: 0 update evaluator 17.399 ms, 0.39 s total | |
Saving checkpoint at epoch 13 eval batch 0 | |
[ 2023-10-08 02:56:26 ] Completed saving temp checkpoint 1,833.747 ms, 2.22 s total | |
[ 2023-10-08 02:56:26 ] Completed replacing temp checkpoint with checkpoint 110.866 ms, 2.33 s total | |
[ 2023-10-08 02:56:26 ] Completed Epoch 13 batch: 1 moving to device 9.863 ms, 2.34 s total | |
[ 2023-10-08 02:56:26 ] Completed Epoch 13 batch: 1 forward through model 112.951 ms, 2.45 s total | |
[ 2023-10-08 02:56:26 ] Completed Epoch 13 batch: 1 outputs back to cpu 34.341 ms, 2.49 s total | |
[ 2023-10-08 02:56:26 ] Completed Epoch 13 batch: 1 update evaluator 63.790 ms, 2.55 s total | |
Saving checkpoint at epoch 13 eval batch 1 | |
[ 2023-10-08 02:56:27 ] Completed saving temp checkpoint 997.910 ms, 3.55 s total | |
[ 2023-10-08 02:56:27 ] Completed replacing temp checkpoint with checkpoint 71.666 ms, 3.62 s total | |
[ 2023-10-08 02:56:27 ] Completed Epoch 13 batch: 2 moving to device 5.771 ms, 3.63 s total | |
[ 2023-10-08 02:56:27 ] Completed Epoch 13 batch: 2 forward through model 119.182 ms, 3.75 s total | |
[ 2023-10-08 02:56:27 ] Completed Epoch 13 batch: 2 outputs back to cpu 33.820 ms, 3.78 s total | |
[ 2023-10-08 02:56:28 ] Completed Epoch 13 batch: 2 update evaluator 76.541 ms, 3.86 s total | |
Saving checkpoint at epoch 13 eval batch 2 | |
[ 2023-10-08 02:56:29 ] Completed saving temp checkpoint 1,106.024 ms, 4.96 s total | |
[ 2023-10-08 02:56:29 ] Completed replacing temp checkpoint with checkpoint 90.148 ms, 5.05 s total | |
[ 2023-10-08 02:56:29 ] Completed Epoch 13 batch: 3 moving to device 1.613 ms, 5.06 s total | |
[ 2023-10-08 02:56:29 ] Completed Epoch 13 batch: 3 forward through model 76.913 ms, 5.13 s total | |
[ 2023-10-08 02:56:29 ] Completed Epoch 13 batch: 3 outputs back to cpu 0.632 ms, 5.13 s total | |
[ 2023-10-08 02:56:29 ] Completed Epoch 13 batch: 3 update evaluator 11.017 ms, 5.14 s total | |
Saving checkpoint at epoch 13 eval batch 3 | |
[ 2023-10-08 02:56:30 ] Completed saving temp checkpoint 961.210 ms, 6.10 s total | |
[ 2023-10-08 02:56:30 ] Completed replacing temp checkpoint with checkpoint 78.496 ms, 6.18 s total | |
[ 2023-10-08 02:56:30 ] Completed Epoch 13 batch: 4 moving to device 1.717 ms, 6.19 s total | |
[ 2023-10-08 02:56:30 ] Completed Epoch 13 batch: 4 forward through model 44.572 ms, 6.23 s total | |
[ 2023-10-08 02:56:30 ] Completed Epoch 13 batch: 4 outputs back to cpu 1.907 ms, 6.23 s total | |
[ 2023-10-08 02:56:30 ] Completed Epoch 13 batch: 4 update evaluator 10.184 ms, 6.24 s total | |
Saving checkpoint at epoch 13 eval batch 4 | |
[ 2023-10-08 02:56:31 ] Completed saving temp checkpoint 1,055.995 ms, 7.30 s total | |
[ 2023-10-08 02:56:31 ] Completed replacing temp checkpoint with checkpoint 56.911 ms, 7.35 s total | |
[ 2023-10-08 02:56:31 ] Completed Epoch 13 batch: 5 moving to device 1.952 ms, 7.36 s total | |
[ 2023-10-08 02:56:31 ] Completed Epoch 13 batch: 5 forward through model 96.128 ms, 7.45 s total | |
[ 2023-10-08 02:56:31 ] Completed Epoch 13 batch: 5 outputs back to cpu 10.011 ms, 7.46 s total | |
[ 2023-10-08 02:56:31 ] Completed Epoch 13 batch: 5 update evaluator 18.029 ms, 7.48 s total | |
Saving checkpoint at epoch 13 eval batch 5 | |
[ 2023-10-08 02:56:32 ] Completed saving temp checkpoint 1,055.148 ms, 8.54 s total | |
[ 2023-10-08 02:56:32 ] Completed replacing temp checkpoint with checkpoint 69.606 ms, 8.61 s total | |
[ 2023-10-08 02:56:32 ] Completed Epoch 13 batch: 6 moving to device 3.607 ms, 8.61 s total | |
[ 2023-10-08 02:56:32 ] Completed Epoch 13 batch: 6 forward through model 49.807 ms, 8.66 s total | |
[ 2023-10-08 02:56:32 ] Completed Epoch 13 batch: 6 outputs back to cpu 1.794 ms, 8.66 s total | |
[ 2023-10-08 02:56:32 ] Completed Epoch 13 batch: 6 update evaluator 15.729 ms, 8.68 s total | |
Saving checkpoint at epoch 13 eval batch 6 | |
[ 2023-10-08 02:56:34 ] Completed saving temp checkpoint 1,157.452 ms, 9.83 s total | |
[ 2023-10-08 02:56:34 ] Completed replacing temp checkpoint with checkpoint 62.869 ms, 9.90 s total | |
[ 2023-10-08 02:56:34 ] Completed Epoch 13 batch: 7 moving to device 1.237 ms, 9.90 s total | |
[ 2023-10-08 02:56:34 ] Completed Epoch 13 batch: 7 forward through model 71.608 ms, 9.97 s total | |
[ 2023-10-08 02:56:34 ] Completed Epoch 13 batch: 7 outputs back to cpu 0.495 ms, 9.97 s total | |
[ 2023-10-08 02:56:34 ] Completed Epoch 13 batch: 7 update evaluator 5.886 ms, 9.98 s total | |
Saving checkpoint at epoch 13 eval batch 7 | |
[ 2023-10-08 02:56:35 ] Completed saving temp checkpoint 1,058.776 ms, 11.03 s total | |
[ 2023-10-08 02:56:35 ] Completed replacing temp checkpoint with checkpoint 62.169 ms, 11.10 s total | |
[ 2023-10-08 02:56:35 ] Completed Epoch 13 batch: 8 moving to device 1.170 ms, 11.10 s total | |
[ 2023-10-08 02:56:35 ] Completed Epoch 13 batch: 8 forward through model 55.248 ms, 11.15 s total | |
[ 2023-10-08 02:56:35 ] Completed Epoch 13 batch: 8 outputs back to cpu 13.509 ms, 11.17 s total | |
[ 2023-10-08 02:56:35 ] Completed Epoch 13 batch: 8 update evaluator 23.607 ms, 11.19 s total | |
Saving checkpoint at epoch 13 eval batch 8 | |
[ 2023-10-08 02:56:36 ] Completed saving temp checkpoint 1,370.896 ms, 12.56 s total | |
[ 2023-10-08 02:56:36 ] Completed replacing temp checkpoint with checkpoint 60.258 ms, 12.62 s total | |
[ 2023-10-08 02:56:36 ] Completed Epoch 13 batch: 9 moving to device 2.620 ms, 12.62 s total | |
[ 2023-10-08 02:56:36 ] Completed Epoch 13 batch: 9 forward through model 59.546 ms, 12.68 s total | |
[ 2023-10-08 02:56:36 ] Completed Epoch 13 batch: 9 outputs back to cpu 23.720 ms, 12.71 s total | |
[ 2023-10-08 02:56:36 ] Completed Epoch 13 batch: 9 update evaluator 45.755 ms, 12.75 s total | |
Saving checkpoint at epoch 13 eval batch 9 | |
[ 2023-10-08 02:56:38 ] Completed saving temp checkpoint 1,240.061 ms, 13.99 s total | |
[ 2023-10-08 02:56:38 ] Completed replacing temp checkpoint with checkpoint 74.170 ms, 14.07 s total | |
[ 2023-10-08 02:56:38 ] Completed Epoch 13 batch: 10 moving to device 3.015 ms, 14.07 s total | |
[ 2023-10-08 02:56:38 ] Completed Epoch 13 batch: 10 forward through model 61.416 ms, 14.13 s total | |
[ 2023-10-08 02:56:38 ] Completed Epoch 13 batch: 10 outputs back to cpu 25.763 ms, 14.16 s total | |
[ 2023-10-08 02:56:38 ] Completed Epoch 13 batch: 10 update evaluator 50.560 ms, 14.21 s total | |
Saving checkpoint at epoch 13 eval batch 10 | |
[ 2023-10-08 02:56:40 ] Completed saving temp checkpoint 1,844.147 ms, 16.05 s total | |
[ 2023-10-08 02:56:40 ] Completed replacing temp checkpoint with checkpoint 106.231 ms, 16.16 s total | |
[ 2023-10-08 02:56:40 ] Completed Epoch 13 batch: 11 moving to device 4.444 ms, 16.16 s total | |
[ 2023-10-08 02:56:40 ] Completed Epoch 13 batch: 11 forward through model 82.146 ms, 16.25 s total | |
[ 2023-10-08 02:56:40 ] Completed Epoch 13 batch: 11 outputs back to cpu 0.585 ms, 16.25 s total | |
[ 2023-10-08 02:56:40 ] Completed Epoch 13 batch: 11 update evaluator 10.273 ms, 16.26 s total | |
Saving checkpoint at epoch 13 eval batch 11 | |
[ 2023-10-08 02:56:41 ] Completed saving temp checkpoint 1,109.043 ms, 17.37 s total | |
[ 2023-10-08 02:56:41 ] Completed replacing temp checkpoint with checkpoint 53.566 ms, 17.42 s total | |
[ 2023-10-08 02:56:41 ] Completed Epoch 13 batch: 12 moving to device 3.107 ms, 17.42 s total | |
[ 2023-10-08 02:56:41 ] Completed Epoch 13 batch: 12 forward through model 49.196 ms, 17.47 s total | |
[ 2023-10-08 02:56:41 ] Completed Epoch 13 batch: 12 outputs back to cpu 2.761 ms, 17.47 s total | |
[ 2023-10-08 02:56:41 ] Completed Epoch 13 batch: 12 update evaluator 15.456 ms, 17.49 s total | |
Saving checkpoint at epoch 13 eval batch 12 | |
[ 2023-10-08 02:56:43 ] Completed saving temp checkpoint 1,693.807 ms, 19.18 s total | |
[ 2023-10-08 02:56:43 ] Completed replacing temp checkpoint with checkpoint 113.595 ms, 19.30 s total | |
[ 2023-10-08 02:56:43 ] Completed Epoch 13 batch: 13 moving to device 4.056 ms, 19.30 s total | |
[ 2023-10-08 02:56:43 ] Completed Epoch 13 batch: 13 forward through model 51.970 ms, 19.35 s total | |
[ 2023-10-08 02:56:43 ] Completed Epoch 13 batch: 13 outputs back to cpu 9.337 ms, 19.36 s total | |
[ 2023-10-08 02:56:43 ] Completed Epoch 13 batch: 13 update evaluator 19.254 ms, 19.38 s total | |
Saving checkpoint at epoch 13 eval batch 13 | |
[ 2023-10-08 02:56:45 ] Completed saving temp checkpoint 1,536.467 ms, 20.92 s total | |
[ 2023-10-08 02:56:45 ] Completed replacing temp checkpoint with checkpoint 90.446 ms, 21.01 s total | |
[ 2023-10-08 02:56:45 ] Completed Epoch 13 batch: 14 moving to device 3.544 ms, 21.01 s total | |
[ 2023-10-08 02:56:45 ] Completed Epoch 13 batch: 14 forward through model 54.640 ms, 21.07 s total | |
[ 2023-10-08 02:56:45 ] Completed Epoch 13 batch: 14 outputs back to cpu 12.330 ms, 21.08 s total | |
[ 2023-10-08 02:56:45 ] Completed Epoch 13 batch: 14 update evaluator 26.097 ms, 21.10 s total | |
Saving checkpoint at epoch 13 eval batch 14 | |
[ 2023-10-08 02:56:46 ] Completed saving temp checkpoint 1,496.570 ms, 22.60 s total | |
[ 2023-10-08 02:56:46 ] Completed replacing temp checkpoint with checkpoint 75.320 ms, 22.68 s total | |
[ 2023-10-08 02:56:46 ] Completed Epoch 13 batch: 15 moving to device 4.263 ms, 22.68 s total | |
[ 2023-10-08 02:56:46 ] Completed Epoch 13 batch: 15 forward through model 102.364 ms, 22.78 s total | |
[ 2023-10-08 02:56:47 ] Completed Epoch 13 batch: 15 outputs back to cpu 32.962 ms, 22.82 s total | |
WARNING:torch.distributed.run: | |
***************************************** | |
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
***************************************** | |
[ 2023-10-08 03:10:00 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 03:10:00 ] Completed importing Timer 0.021 ms, 0.00 s total | |
[ 2023-10-08 03:10:01 ] Completed importing everything else 590.312 ms, 0.59 s total | |
[ 2023-10-08 03:10:01 ] Completed defined other functions 0.021 ms, 0.59 s total | |
| distributed init (rank 3): env:// | |
| distributed init (rank 0): env:// | |
| distributed init (rank 4): env:// | |
| distributed init (rank 2): env:// | |
| distributed init (rank 5): env:// | |
| distributed init (rank 1): env:// | |
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl') | |
[ 2023-10-08 03:10:08 ] Completed main preliminaries 7,424.900 ms, 8.02 s total | |
loading annotations into memory... | |
Done (t=10.19s) | |
creating index... | |
index created! | |
loading annotations into memory... | |
Done (t=0.27s) | |
creating index... | |
index created! | |
[ 2023-10-08 03:10:20 ] Completed loading data 11,899.787 ms, 19.92 s total | |
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization | |
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158] | |
[ 2023-10-08 03:10:20 ] Completed creating data samplers 96.352 ms, 20.01 s total | |
[ 2023-10-08 03:10:20 ] Completed creating data loaders 0.206 ms, 20.01 s total | |
[ 2023-10-08 03:10:21 ] Completed creating model and .to(device) 645.497 ms, 20.66 s total | |
[ 2023-10-08 03:10:23 ] Completed preparing model for distributed training 2,345.412 ms, 23.00 s total | |
[ 2023-10-08 03:10:23 ] Completed optimizer and scaler 0.607 ms, 23.00 s total | |
[ 2023-10-08 03:10:23 ] Completed learning rate schedulers 0.258 ms, 23.00 s total | |
[ 2023-10-08 03:10:24 ] Completed init coco evaluator 951.904 ms, 23.96 s total | |
RESUMING FROM CURRENT JOB | |
[ 2023-10-08 03:10:25 ] Completed retrieving checkpoint 988.624 ms, 24.94 s total | |
EPOCH :: 13 | |
[ 2023-10-08 03:10:25 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 03:10:25 ] Completed training preliminaries 0.853 ms, 0.00 s total | |
Training / resuming epoch 13 from training step 889 | |
[ 2023-10-08 03:10:26 ] Completed Start 0.000 ms, 0.00 s total | |
[ 2023-10-08 03:10:26 ] Completed starting evaluation routine 0.099 ms, 0.00 s total | |
[ 2023-10-08 03:10:26 ] Completed evaluation preliminaries 38.742 ms, 0.04 s total | |
Evaluating / resuming epoch 13 from eval step 15 | |
[ 2023-10-08 03:10:26 ] Completed Epoch 13 batch: 15 moving to device 317.782 ms, 0.36 s total | |
[ 2023-10-08 03:10:27 ] Completed Epoch 13 batch: 15 forward through model 680.338 ms, 1.04 s total | |
[ 2023-10-08 03:10:27 ] Completed Epoch 13 batch: 15 outputs back to cpu 34.189 ms, 1.07 s total | |
[ 2023-10-08 03:10:27 ] Completed Epoch 13 batch: 15 update evaluator 60.929 ms, 1.13 s total | |
Saving checkpoint at epoch 13 eval batch 15 | |
[ 2023-10-08 03:10:28 ] Completed saving temp checkpoint 880.177 ms, 2.01 s total | |
[ 2023-10-08 03:10:28 ] Completed replacing temp checkpoint with checkpoint 188.812 ms, 2.20 s total | |
[ 2023-10-08 03:10:28 ] Completed Epoch 13 batch: 16 moving to device 2.228 ms, 2.20 s total | |
[ 2023-10-08 03:10:28 ] Completed Epoch 13 batch: 16 forward through model 116.323 ms, 2.32 s total | |
[ 2023-10-08 03:10:28 ] Completed Epoch 13 batch: 16 outputs back to cpu 28.430 ms, 2.35 s total | |
[ 2023-10-08 03:10:28 ] Completed Epoch 13 batch: 16 update evaluator 59.894 ms, 2.41 s total | |
Saving checkpoint at epoch 13 eval batch 16 | |
[ 2023-10-08 03:10:30 ] Completed saving temp checkpoint 1,429.590 ms, 3.84 s total | |
[ 2023-10-08 03:10:30 ] Completed replacing temp checkpoint with checkpoint 64.579 ms, 3.90 s total | |
[ 2023-10-08 03:10:30 ] Completed Epoch 13 batch: 17 moving to device 1.801 ms, 3.90 s total | |
[ 2023-10-08 03:10:30 ] Completed Epoch 13 batch: 17 forward through model 123.185 ms, 4.03 s total | |
[ 2023-10-08 03:10:30 ] Completed Epoch 13 batch: 17 outputs back to cpu 23.601 ms, 4.05 s total | |
[ 2023-10-08 03:10:30 ] Completed Epoch 13 batch: 17 update evaluator 57.694 ms, 4.11 s total | |
Saving checkpoint at epoch 13 eval batch 17 | |
[ 2023-10-08 03:10:31 ] Completed saving temp checkpoint 1,328.422 ms, 5.44 s total | |
[ 2023-10-08 03:10:31 ] Completed replacing temp checkpoint with checkpoint 91.615 ms, 5.53 s total | |
[ 2023-10-08 03:10:31 ] Completed Epoch 13 batch: 18 moving to device 8.407 ms, 5.54 s total | |
[ 2023-10-08 03:10:32 ] Completed Epoch 13 batch: 18 forward through model 55.460 ms, 5.59 s total | |
[ 2023-10-08 03:10:32 ] Completed Epoch 13 batch: 18 outputs back to cpu 38.000 ms, 5.63 s total | |
[ 2023-10-08 03:10:32 ] Completed Epoch 13 batch: 18 update evaluator 77.280 ms, 5.71 s total | |
Saving checkpoint at epoch 13 eval batch 18 | |
[ 2023-10-08 03:10:33 ] Completed saving temp checkpoint 1,681.993 ms, 7.39 s total | |
[ 2023-10-08 03:10:33 ] Completed replacing temp checkpoint with checkpoint 70.107 ms, 7.46 s total | |
[ 2023-10-08 03:10:33 ] Completed Epoch 13 batch: 19 moving to device 1.701 ms, 7.46 s total | |
[ 2023-10-08 03:10:34 ] Completed Epoch 13 batch: 19 forward through model 92.426 ms, 7.55 s total | |
[ 2023-10-08 03:10:34 ] Completed Epoch 13 batch: 19 outputs back to cpu 2.035 ms, 7.56 s total | |
[ 2023-10-08 03:10:34 ] Completed Epoch 13 batch: 19 update evaluator 18.178 ms, 7.57 s total | |
Saving checkpoint at epoch 13 eval batch 19 | |
[ 2023-10-08 03:10:35 ] Completed saving temp checkpoint 1,016.275 ms, 8.59 s total | |
[ 2023-10-08 03:10:35 ] Completed replacing temp checkpoint with checkpoint 75.480 ms, 8.67 s total | |
[ 2023-10-08 03:10:35 ] Completed Epoch 13 batch: 20 moving to device 1.634 ms, 8.67 s total | |
[ 2023-10-08 03:10:35 ] Completed Epoch 13 batch: 20 forward through model 85.484 ms, 8.75 s total | |
[ 2023-10-08 03:10:35 ] Completed Epoch 13 batch: 20 outputs back to cpu 0.236 ms, 8.75 s total | |
[ 2023-10-08 03:10:35 ] Completed Epoch 13 batch: 20 update evaluator 3.921 ms, 8.76 s total | |
Saving checkpoint at epoch 13 eval batch 20 | |
[ 2023-10-08 03:10:36 ] Completed saving temp checkpoint 1,176.232 ms, 9.93 s total | |
[ 2023-10-08 03:10:36 ] Completed replacing temp checkpoint with checkpoint 67.784 ms, 10.00 s total | |
[ 2023-10-08 03:10:36 ] Completed Epoch 13 batch: 21 moving to device 1.686 ms, 10.00 s total | |
[ 2023-10-08 03:10:36 ] Completed Epoch 13 batch: 21 forward through model 51.000 ms, 10.05 s total | |
[ 2023-10-08 03:10:36 ] Completed Epoch 13 batch: 21 outputs back to cpu 2.648 ms, 10.06 s total | |
[ 2023-10-08 03:10:36 ] Completed Epoch 13 batch: 21 update evaluator 18.715 ms, 10.08 s total | |
Saving checkpoint at epoch 13 eval batch 21 | |
[ 2023-10-08 03:10:37 ] Completed saving temp checkpoint 1,094.010 ms, 11.17 s total | |
[ 2023-10-08 03:10:37 ] Completed replacing temp checkpoint with checkpoint 72.610 ms, 11.24 s total | |
[ 2023-10-08 03:10:37 ] Completed Epoch 13 batch: 22 moving to device 2.736 ms, 11.24 s total | |
[ 2023-10-08 03:10:37 ] Completed Epoch 13 batch: 22 forward through model 53.906 ms, 11.30 s total | |
[ 2023-10-08 03:10:37 ] Completed Epoch 13 batch: 22 outputs back to cpu 0.532 ms, 11.30 s total | |
[ 2023-10-08 03:10:37 ] Completed Epoch 13 batch: 22 update evaluator 7.192 ms, 11.31 s total | |
Saving checkpoint at epoch 13 eval batch 22 | |
[ 2023-10-08 03:10:38 ] Completed saving temp checkpoint 1,226.853 ms, 12.53 s total | |
[ 2023-10-08 03:10:39 ] Completed replacing temp checkpoint with checkpoint 65.240 ms, 12.60 s total | |
[ 2023-10-08 03:10:39 ] Completed Epoch 13 batch: 23 moving to device 2.618 ms, 12.60 s total | |
[ 2023-10-08 03:10:39 ] Completed Epoch 13 batch: 23 forward through model 54.172 ms, 12.65 s total | |
[ 2023-10-08 03:10:39 ] Completed Epoch 13 batch: 23 outputs back to cpu 18.088 ms, 12.67 s total | |
[ 2023-10-08 03:10:39 ] Completed Epoch 13 batch: 23 update evaluator 28.648 ms, 12.70 s total | |
Saving checkpoint at epoch 13 eval batch 23 | |
[ 2023-10-08 03:10:40 ] Completed saving temp checkpoint 1,077.279 ms, 13.78 s total | |
[ 2023-10-08 03:10:40 ] Completed replacing temp checkpoint with checkpoint 79.412 ms, 13.86 s total | |
[ 2023-10-08 03:10:40 ] Completed Epoch 13 batch: 24 moving to device 3.492 ms, 13.86 s total | |
[ 2023-10-08 03:10:40 ] Completed Epoch 13 batch: 24 forward through model 74.582 ms, 13.94 s total | |
[ 2023-10-08 03:10:40 ] Completed Epoch 13 batch: 24 outputs back to cpu 33.964 ms, 13.97 s total | |
[ 2023-10-08 03:10:40 ] Completed Epoch 13 batch: 24 update evaluator 84.066 ms, 14.05 s total | |
Saving checkpoint at epoch 13 eval batch 24 | |
[ 2023-10-08 03:10:41 ] Completed saving temp checkpoint 1,207.352 ms, 15.26 s total | |
[ 2023-10-08 03:10:41 ] Completed replacing temp checkpoint with checkpoint 85.369 ms, 15.35 s total | |
[ 2023-10-08 03:10:41 ] Completed Epoch 13 batch: 25 moving to device 4.297 ms, 15.35 s total | |
[ 2023-10-08 03:10:41 ] Completed Epoch 13 batch: 25 forward through model 49.299 ms, 15.40 s total | |
[ 2023-10-08 03:10:41 ] Completed Epoch 13 batch: 25 outputs back to cpu 1.788 ms, 15.40 s total | |
[ 2023-10-08 03:10:41 ] Completed Epoch 13 batch: 25 update evaluator 19.332 ms, 15.42 s total | |
Saving checkpoint at epoch 13 eval batch 25 | |
[ 2023-10-08 03:10:42 ] Completed saving temp checkpoint 1,090.093 ms, 16.51 s total | |
[ 2023-10-08 03:10:43 ] Completed replacing temp checkpoint with checkpoint 76.838 ms, 16.59 s total | |
[ 2023-10-08 03:10:43 ] Completed Epoch 13 batch: 26 moving to device 3.862 ms, 16.59 s total | |
[ 2023-10-08 03:10:43 ] Completed Epoch 13 batch: 26 forward through model 52.686 ms, 16.65 s total | |
[ 2023-10-08 03:10:43 ] Completed Epoch 13 batch: 26 outputs back to cpu 15.129 ms, 16.66 s total | |
[ 2023-10-08 03:10:43 ] Completed Epoch 13 batch: 26 update evaluator 30.085 ms, 16.69 s total | |
Saving checkpoint at epoch 13 eval batch 26 | |
[ 2023-10-08 03:10:44 ] Completed saving temp checkpoint 1,292.559 ms, 17.98 s total | |
[ 2023-10-08 03:10:44 ] Completed replacing temp checkpoint with checkpoint 84.476 ms, 18.07 s total | |
[ 2023-10-08 03:10:44 ] Completed Epoch 13 batch: 27 moving to device 4.664 ms, 18.07 s total | |
[ 2023-10-08 03:10:44 ] Completed Epoch 13 batch: 27 forward through model 55.312 ms, 18.13 s total | |
[ 2023-10-08 03:10:44 ] Completed Epoch 13 batch: 27 outputs back to cpu 15.942 ms, 18.14 s total | |
[ 2023-10-08 03:10:44 ] Completed Epoch 13 batch: 27 update evaluator 30.231 ms, 18.17 s total | |
Saving checkpoint at epoch 13 eval batch 27 | |
[ 2023-10-08 03:10:46 ] Completed saving temp checkpoint 1,672.633 ms, 19.85 s total | |
[ 2023-10-08 03:10:46 ] Completed replacing temp checkpoint with checkpoint 51.342 ms, 19.90 s total | |
[ 2023-10-08 03:10:46 ] Completed Epoch 13 batch: 28 moving to device 4.533 ms, 19.90 s total | |
[ 2023-10-08 03:10:46 ] Completed Epoch 13 batch: 28 forward through model 92.320 ms, 19.99 s total | |
[ 2023-10-08 03:10:46 ] Completed Epoch 13 batch: 28 outputs back to cpu 9.571 ms, 20.00 s total | |
[ 2023-10-08 03:10:46 ] Completed Epoch 13 batch: 28 update evaluator 20.959 ms, 20.03 s total | |
Saving checkpoint at epoch 13 eval batch 28 | |
[ 2023-10-08 03:10:48 ] Completed saving temp checkpoint 1,763.604 ms, 21.79 s total | |
[ 2023-10-08 03:10:48 ] Completed replacing temp checkpoint with checkpoint 92.948 ms, 21.88 s total | |
[ 2023-10-08 03:10:48 ] Completed Epoch 13 batch: 29 moving to device 2.655 ms, 21.88 s total | |
[ 2023-10-08 03:10:48 ] Completed Epoch 13 batch: 29 forward through model 47.961 ms, 21.93 s total | |
[ 2023-10-08 03:10:48 ] Completed Epoch 13 batch: 29 outputs back to cpu 1.459 ms, 21.93 s total | |
[ 2023-10-08 03:10:48 ] Completed Epoch 13 batch: 29 update evaluator 14.077 ms, 21.95 s total | |
Saving checkpoint at epoch 13 eval batch 29 | |
[ 2023-10-08 03:10:49 ] Completed saving temp checkpoint 1,244.354 ms, 23.19 s total | |
[ 2023-10-08 03:10:49 ] Completed replacing temp checkpoint with checkpoint 60.819 ms, 23.25 s total | |
[ 2023-10-08 03:10:49 ] Completed Epoch 13 batch: 30 moving to device 2.840 ms, 23.26 s total | |
[ 2023-10-08 03:10:49 ] Completed Epoch 13 batch: 30 forward through model 76.834 ms, 23.33 s total | |
[ 2023-10-08 03:10:49 ] Completed Epoch 13 batch: 30 outputs back to cpu 5.922 ms, 23.34 s total | |
[ 2023-10-08 03:10:49 ] Completed Epoch 13 batch: 30 update evaluator 17.096 ms, 23.36 s total | |
Saving checkpoint at epoch 13 eval batch 30 | |
[ 2023-10-08 03:10:51 ] Completed saving temp checkpoint 1,436.404 ms, 24.79 s total | |
[ 2023-10-08 03:10:51 ] Completed replacing temp checkpoint with checkpoint 80.917 ms, 24.87 s total | |
[ 2023-10-08 03:10:51 ] Completed Epoch 13 batch: 31 moving to device 3.006 ms, 24.88 s total | |
[ 2023-10-08 03:10:51 ] Completed Epoch 13 batch: 31 forward through model 43.127 ms, 24.92 s total | |
[ 2023-10-08 03:10:51 ] Completed Epoch 13 batch: 31 outputs back to cpu 17.243 ms, 24.94 s total | |
[ 2023-10-08 03:10:51 ] Completed Epoch 13 batch: 31 update evaluator 28.655 ms, 24.97 s total | |
Saving checkpoint at epoch 13 eval batch 31 | |
[ 2023-10-08 03:10:54 ] Completed saving temp checkpoint 2,732.500 ms, 27.70 s total | |
[ 2023-10-08 03:10:54 ] Completed replacing temp checkpoint with checkpoint 66.451 ms, 27.76 s total | |
[ 2023-10-08 03:10:54 ] Completed Epoch 13 batch: 32 moving to device 2.839 ms, 27.77 s total | |
[ 2023-10-08 03:10:54 ] Completed Epoch 13 batch: 32 forward through model 54.000 ms, 27.82 s total | |
[ 2023-10-08 03:10:54 ] Completed Epoch 13 batch: 32 outputs back to cpu 3.098 ms, 27.82 s total | |
[ 2023-10-08 03:10:54 ] Completed Epoch 13 batch: 32 update evaluator 25.493 ms, 27.85 s total | |
Saving checkpoint at epoch 13 eval batch 32 | |
[ 2023-10-08 03:10:55 ] Completed saving temp checkpoint 1,447.735 ms, 29.30 s total | |
[ 2023-10-08 03:10:55 ] Completed replacing temp checkpoint with checkpoint 53.265 ms, 29.35 s total | |
[ 2023-10-08 03:10:55 ] Completed Epoch 13 batch: 33 moving to device 2.733 ms, 29.35 s total | |
[ 2023-10-08 03:10:55 ] Completed Epoch 13 batch: 33 forward through model 45.863 ms, 29.40 s total | |
[ 2023-10-08 03:10:55 ] Completed Epoch 13 batch: 33 outputs back to cpu 7.476 ms, 29.41 s total | |
[ 2023-10-08 03:10:55 ] Completed Epoch 13 batch: 33 update evaluator 17.130 ms, 29.42 s total | |
Saving checkpoint at epoch 13 eval batch 33 | |
[ 2023-10-08 03:10:57 ] Completed saving temp checkpoint 1,319.541 ms, 30.74 s total | |
[ 2023-10-08 03:10:57 ] Completed replacing temp checkpoint with checkpoint 88.119 ms, 30.83 s total | |
[ 2023-10-08 03:10:57 ] Completed Epoch 13 batch: 34 moving to device 3.924 ms, 30.84 s total | |
[ 2023-10-08 03:10:57 ] Completed Epoch 13 batch: 34 forward through model 63.677 ms, 30.90 s total | |
[ 2023-10-08 03:10:57 ] Completed Epoch 13 batch: 34 outputs back to cpu 33.674 ms, 30.93 s total | |
[ 2023-10-08 03:10:57 ] Completed Epoch 13 batch: 34 update evaluator 61.421 ms, 30.99 s total | |
Saving checkpoint at epoch 13 eval batch 34 | |
[ 2023-10-08 03:10:58 ] Completed saving temp checkpoint 1,440.214 ms, 32.43 s total | |
[ 2023-10-08 03:10:58 ] Completed replacing temp checkpoint with checkpoint 90.246 ms, 32.52 s total | |
[ 2023-10-08 03:10:58 ] Completed Epoch 13 batch: 35 moving to device 4.571 ms, 32.53 s total | |
[ 2023-10-08 03:10:59 ] Completed Epoch 13 batch: 35 forward through model 50.663 ms, 32.58 s total | |
[ 2023-10-08 03:10:59 ] Completed Epoch 13 batch: 35 outputs back to cpu 1.841 ms, 32.58 s total | |
[ 2023-10-08 03:10:59 ] Completed Epoch 13 batch: 35 update evaluator 21.221 ms, 32.60 s total | |
Saving checkpoint at epoch 13 eval batch 35 | |
[ 2023-10-08 03:11:00 ] Completed saving temp checkpoint 1,319.335 ms, 33.92 s total | |
[ 2023-10-08 03:11:00 ] Completed replacing temp checkpoint with checkpoint 61.727 ms, 33.98 s total | |
[ 2023-10-08 03:11:00 ] Completed Epoch 13 batch: 36 moving to device 4.079 ms, 33.99 s total | |
[ 2023-10-08 03:11:00 ] Completed Epoch 13 batch: 36 forward through model 61.349 ms, 34.05 s total | |
[ 2023-10-08 03:11:00 ] Completed Epoch 13 batch: 36 outputs back to cpu 23.733 ms, 34.07 s total | |
[ 2023-10-08 03:11:00 ] Completed Epoch 13 batch: 36 update evaluator 40.745 ms, 34.11 s total | |
Saving checkpoint at epoch 13 eval batch 36 | |
[ 2023-10-08 03:11:02 ] Completed saving temp checkpoint 1,798.649 ms, 35.91 s total | |
[ 2023-10-08 03:11:02 ] Completed replacing temp checkpoint with checkpoint 71.565 ms, 35.98 s total | |
[ 2023-10-08 03:11:02 ] Completed Epoch 13 batch: 37 moving to device 3.534 ms, 35.99 s total | |
[ 2023-10-08 03:11:02 ] Completed Epoch 13 batch: 37 forward through model 71.762 ms, 36.06 s total | |
[ 2023-10-08 03:11:02 ] Completed Epoch 13 batch: 37 outputs back to cpu 23.564 ms, 36.08 s total | |
[ 2023-10-08 03:11:02 ] Completed Epoch 13 batch: 37 update evaluator 55.991 ms, 36.14 s total | |
Saving checkpoint at epoch 13 eval batch 37 | |
[ 2023-10-08 03:11:03 ] Completed saving temp checkpoint 1,340.792 ms, 37.48 s total | |
[ 2023-10-08 03:11:04 ] Completed replacing temp checkpoint with checkpoint 93.663 ms, 37.57 s total | |
[ 2023-10-08 03:11:04 ] Completed Epoch 13 batch: 38 moving to device 4.057 ms, 37.58 s total | |
[ 2023-10-08 03:11:04 ] Completed Epoch 13 batch: 38 forward through model 50.859 ms, 37.63 s total | |
[ 2023-10-08 03:11:04 ] Completed Epoch 13 batch: 38 outputs back to cpu 15.756 ms, 37.64 s total | |
[ 2023-10-08 03:11:04 ] Completed Epoch 13 batch: 38 update evaluator 31.196 ms, 37.68 s total | |
Saving checkpoint at epoch 13 eval batch 38 | |
[ 2023-10-08 03:11:05 ] Completed saving temp checkpoint 1,482.176 ms, 39.16 s total | |
[ 2023-10-08 03:11:05 ] Completed replacing temp checkpoint with checkpoint 74.504 ms, 39.23 s total | |
[ 2023-10-08 03:11:05 ] Completed Epoch 13 batch: 39 moving to device 4.184 ms, 39.24 s total | |
[ 2023-10-08 03:11:05 ] Completed Epoch 13 batch: 39 forward through model 46.789 ms, 39.28 s total | |
[ 2023-10-08 03:11:05 ] Completed Epoch 13 batch: 39 outputs back to cpu 14.758 ms, 39.30 s total | |
[ 2023-10-08 03:11:05 ] Completed Epoch 13 batch: 39 update evaluator 26.291 ms, 39.32 s total | |
Saving checkpoint at epoch 13 eval batch 39 | |
[ 2023-10-08 03:11:07 ] Completed saving temp checkpoint 1,756.377 ms, 41.08 s total | |
[ 2023-10-08 03:11:07 ] Completed replacing temp checkpoint with checkpoint 75.036 ms, 41.16 s total | |
[ 2023-10-08 03:11:07 ] Completed Epoch 13 batch: 40 moving to device 3.187 ms, 41.16 s total | |
[ 2023-10-08 03:11:07 ] Completed Epoch 13 batch: 40 forward through model 42.730 ms, 41.20 s total | |
[ 2023-10-08 03:11:07 ] Completed Epoch 13 batch: 40 outputs back to cpu 0.812 ms, 41.20 s total | |
[ 2023-10-08 03:11:07 ] Completed Epoch 13 batch: 40 update evaluator 10.331 ms, 41.21 s total | |
Saving checkpoint at epoch 13 eval batch 40 | |
[ 2023-10-08 03:11:09 ] Completed saving temp checkpoint 1,837.132 ms, 43.05 s total | |
[ 2023-10-08 03:11:09 ] Completed replacing temp checkpoint with checkpoint 71.679 ms, 43.12 s total | |
[ 2023-10-08 03:11:09 ] Completed Epoch 13 batch: 41 moving to device 3.599 ms, 43.12 s total | |
[ 2023-10-08 03:11:09 ] Completed Epoch 13 batch: 41 forward through model 44.046 ms, 43.17 s total | |
[ 2023-10-08 03:11:09 ] Completed Epoch 13 batch: 41 outputs back to cpu 0.888 ms, 43.17 s total | |
[ 2023-10-08 03:11:09 ] Completed Epoch 13 batch: 41 update evaluator 9.602 ms, 43.18 s total | |
Saving checkpoint at epoch 13 eval batch 41 | |
[ 2023-10-08 03:11:11 ] Completed saving temp checkpoint 1,381.114 ms, 44.56 s total | |
[ 2023-10-08 03:11:11 ] Completed replacing temp checkpoint with checkpoint 87.767 ms, 44.65 s total | |
[ 2023-10-08 03:11:11 ] Completed Epoch 13 batch: 42 moving to device 3.751 ms, 44.65 s total | |
[ 2023-10-08 03:11:11 ] Completed Epoch 13 batch: 42 forward through model 58.221 ms, 44.71 s total | |
[ 2023-10-08 03:11:11 ] Completed Epoch 13 batch: 42 outputs back to cpu 22.767 ms, 44.73 s total | |
[ 2023-10-08 03:11:11 ] Completed Epoch 13 batch: 42 update evaluator 38.435 ms, 44.77 s total | |
Saving checkpoint at epoch 13 eval batch 42 | |
[ 2023-10-08 03:11:12 ] Completed saving temp checkpoint 1,475.712 ms, 46.25 s total | |
[ 2023-10-08 03:11:12 ] Completed replacing temp checkpoint with checkpoint 77.050 ms, 46.32 s total | |
[ 2023-10-08 03:11:12 ] Completed Epoch 13 batch: 43 moving to device 4.458 ms, 46.33 s total | |
[ 2023-10-08 03:11:12 ] Completed Epoch 13 batch: 43 forward through model 41.484 ms, 46.37 s total | |
[ 2023-10-08 03:11:12 ] Completed Epoch 13 batch: 43 outputs back to cpu 1.729 ms, 46.37 s total | |
[ 2023-10-08 03:11:12 ] Completed Epoch 13 batch: 43 update evaluator 16.873 ms, 46.39 s total | |
Saving checkpoint at epoch 13 eval batch 43 | |
[ 2023-10-08 03:11:14 ] Completed saving temp checkpoint 1,356.612 ms, 47.75 s total | |
[ 2023-10-08 03:11:14 ] Completed replacing temp checkpoint with checkpoint 65.703 ms, 47.81 s total | |
[ 2023-10-08 03:11:14 ] Completed Epoch 13 batch: 44 moving to device 2.640 ms, 47.81 s total | |
[ 2023-10-08 03:11:14 ] Completed Epoch 13 batch: 44 forward through model 44.760 ms, 47.86 s total | |
[ 2023-10-08 03:11:14 ] Completed Epoch 13 batch: 44 outputs back to cpu 1.067 ms, 47.86 s total | |
[ 2023-10-08 03:11:14 ] Completed Epoch 13 batch: 44 update evaluator 9.824 ms, 47.87 s total | |
Saving checkpoint at epoch 13 eval batch 44 | |
[ 2023-10-08 03:11:15 ] Completed saving temp checkpoint 1,424.392 ms, 49.29 s total | |
[ 2023-10-08 03:11:15 ] Completed replacing temp checkpoint with checkpoint 60.214 ms, 49.35 s total | |
[ 2023-10-08 03:11:15 ] Completed Epoch 13 batch: 45 moving to device 2.624 ms, 49.36 s total | |
[ 2023-10-08 03:11:15 ] Completed Epoch 13 batch: 45 forward through model 92.847 ms, 49.45 s total | |
[ 2023-10-08 03:11:15 ] Completed Epoch 13 batch: 45 outputs back to cpu 24.480 ms, 49.47 s total | |
[ 2023-10-08 03:11:15 ] Completed Epoch 13 batch: 45 update evaluator 43.413 ms, 49.52 s total | |
Saving checkpoint at epoch 13 eval batch 45 | |
[ 2023-10-08 03:11:17 ] Completed saving temp checkpoint 1,534.886 ms, 51.05 s total | |
[ 2023-10-08 03:11:17 ] Completed replacing temp checkpoint with checkpoint 70.143 ms, 51.12 s total | |
[ 2023-10-08 03:11:17 ] Completed Epoch 13 batch: 46 moving to device 4.401 ms, 51.13 s total | |
[ 2023-10-08 03:11:17 ] Completed Epoch 13 batch: 46 forward through model 52.652 ms, 51.18 s total | |
[ 2023-10-08 03:11:17 ] Completed Epoch 13 batch: 46 outputs back to cpu 10.537 ms, 51.19 s total | |
[ 2023-10-08 03:11:17 ] Completed Epoch 13 batch: 46 update evaluator 23.979 ms, 51.21 s total | |
Saving checkpoint at epoch 13 eval batch 46 | |
[ 2023-10-08 03:11:19 ] Completed saving temp checkpoint 1,734.566 ms, 52.95 s total | |
[ 2023-10-08 03:11:19 ] Completed replacing temp checkpoint with checkpoint 67.724 ms, 53.02 s total | |
[ 2023-10-08 03:11:19 ] Completed Epoch 13 batch: 47 moving to device 4.365 ms, 53.02 s total | |
[ 2023-10-08 03:11:19 ] Completed Epoch 13 batch: 47 forward through model 44.784 ms, 53.07 s total | |
[ 2023-10-08 03:11:19 ] Completed Epoch 13 batch: 47 outputs back to cpu 1.993 ms, 53.07 s total | |
[ |