@adam-peaston-SC
Last active October 10, 2023 20:27
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 20:29:48 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 20:29:48 ] Completed importing Timer 0.020 ms, 0.00 s total
[ 2023-10-07 20:29:48 ] Completed importing everything else 645.807 ms, 0.65 s total
[ 2023-10-07 20:29:48 ] Completed defined other functions 0.021 ms, 0.65 s total
| distributed init (rank 5): env://
| distributed init (rank 4): env://
| distributed init (rank 3): env://
| distributed init (rank 2): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 20:29:51 ] Completed main preliminaries 2,672.082 ms, 3.32 s total
loading annotations into memory...
Done (t=12.39s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 20:30:06 ] Completed loading data 14,458.412 ms, 17.78 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 20:30:06 ] Completed creating data samplers 102.056 ms, 17.88 s total
[ 2023-10-07 20:30:06 ] Completed creating data loaders 0.209 ms, 17.88 s total
[ 2023-10-07 20:30:07 ] Completed creating model and .to(device) 1,784.297 ms, 19.66 s total
[ 2023-10-07 20:30:08 ] Completed preparing model for distributed training 391.627 ms, 20.05 s total
[ 2023-10-07 20:30:08 ] Completed optimizer and scaler 0.568 ms, 20.06 s total
[ 2023-10-07 20:30:08 ] Completed learning rate schedulers 0.123 ms, 20.06 s total
[ 2023-10-07 20:30:09 ] Completed init coco evaluator 966.697 ms, 21.02 s total
RESUMING FROM PREVIOUS JOB /mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc
[ 2023-10-07 20:30:10 ] Completed retrieving checkpoint 953.051 ms, 21.97 s total
EPOCH :: 13
[ 2023-10-07 20:30:10 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 20:30:10 ] Completed training preliminaries 0.831 ms, 0.00 s total
Training / resuming epoch 13 from training step 178
[ 2023-10-07 20:30:10 ] Completed Epoch: 13 batch 178: moving batch data to device 263.850 ms, 0.26 s total
[ 2023-10-07 20:30:16 ] Completed Epoch: 13 batch 178: forward pass 5,668.097 ms, 5.93 s total
[ 2023-10-07 20:30:16 ] Completed Epoch: 13 batch 178: backward pass 320.484 ms, 6.25 s total
[ 2023-10-07 20:30:17 ] Completed Epoch: 13 batch 178: computing loss 919.551 ms, 7.17 s total
EPOCH: [13], BATCH: [178/889], loss: 0.353, loss_box_reg: 0.103, loss_classifier: 0.088, loss_mask: 0.126, loss_objectness: 0.012, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 178
[ 2023-10-07 20:30:19 ] Completed saving temp checkpoint 1,665.854 ms, 8.84 s total
[ 2023-10-07 20:30:19 ] Completed replacing temp checkpoint with checkpoint 23.998 ms, 8.86 s total
[ 2023-10-07 20:30:19 ] Completed Epoch: 13 batch 179: moving batch data to device 18.899 ms, 8.88 s total
[ 2023-10-07 20:30:19 ] Completed Epoch: 13 batch 179: forward pass 306.148 ms, 9.19 s total
[ 2023-10-07 20:30:19 ] Completed Epoch: 13 batch 179: backward pass 149.391 ms, 9.34 s total
[ 2023-10-07 20:30:20 ] Completed Epoch: 13 batch 179: computing loss 1,204.551 ms, 10.54 s total
EPOCH: [13], BATCH: [179/889], loss: 0.363, loss_box_reg: 0.106, loss_classifier: 0.092, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 179
[ 2023-10-07 20:30:21 ] Completed saving temp checkpoint 1,004.944 ms, 11.55 s total
[ 2023-10-07 20:30:21 ] Completed replacing temp checkpoint with checkpoint 65.612 ms, 11.61 s total
[ 2023-10-07 20:30:21 ] Completed Epoch: 13 batch 180: moving batch data to device 21.205 ms, 11.63 s total
[ 2023-10-07 20:30:22 ] Completed Epoch: 13 batch 180: forward pass 320.299 ms, 11.95 s total
[ 2023-10-07 20:30:22 ] Completed Epoch: 13 batch 180: backward pass 375.374 ms, 12.33 s total
[ 2023-10-07 20:30:23 ] Completed Epoch: 13 batch 180: computing loss 926.449 ms, 13.26 s total
EPOCH: [13], BATCH: [180/889], loss: 0.418, loss_box_reg: 0.119, loss_classifier: 0.102, loss_mask: 0.141, loss_objectness: 0.020, loss_rpn_box_reg: 0.036
Saving checkpoint at epoch 13 train batch 180
[ 2023-10-07 20:30:24 ] Completed saving temp checkpoint 1,044.228 ms, 14.30 s total
[ 2023-10-07 20:30:24 ] Completed replacing temp checkpoint with checkpoint 70.090 ms, 14.37 s total
[ 2023-10-07 20:30:24 ] Completed Epoch: 13 batch 181: moving batch data to device 20.761 ms, 14.39 s total
[ 2023-10-07 20:30:25 ] Completed Epoch: 13 batch 181: forward pass 333.843 ms, 14.72 s total
[ 2023-10-07 20:30:25 ] Completed Epoch: 13 batch 181: backward pass 81.624 ms, 14.81 s total
[ 2023-10-07 20:30:26 ] Completed Epoch: 13 batch 181: computing loss 1,816.710 ms, 16.62 s total
EPOCH: [13], BATCH: [181/889], loss: 0.402, loss_box_reg: 0.122, loss_classifier: 0.101, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 181
[ 2023-10-07 20:30:28 ] Completed saving temp checkpoint 1,446.053 ms, 18.07 s total
[ 2023-10-07 20:30:28 ] Completed replacing temp checkpoint with checkpoint 69.901 ms, 18.14 s total
[ 2023-10-07 20:30:28 ] Completed Epoch: 13 batch 182: moving batch data to device 21.792 ms, 18.16 s total
[ 2023-10-07 20:30:28 ] Completed Epoch: 13 batch 182: forward pass 313.629 ms, 18.47 s total
[ 2023-10-07 20:30:29 ] Completed Epoch: 13 batch 182: backward pass 393.885 ms, 18.87 s total
[ 2023-10-07 20:30:30 ] Completed Epoch: 13 batch 182: computing loss 1,555.282 ms, 20.42 s total
EPOCH: [13], BATCH: [182/889], loss: 0.352, loss_box_reg: 0.104, loss_classifier: 0.086, loss_mask: 0.124, loss_objectness: 0.017, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 182
[ 2023-10-07 20:30:31 ] Completed saving temp checkpoint 986.916 ms, 21.41 s total
[ 2023-10-07 20:30:31 ] Completed replacing temp checkpoint with checkpoint 67.232 ms, 21.48 s total
[ 2023-10-07 20:30:31 ] Completed Epoch: 13 batch 183: moving batch data to device 19.967 ms, 21.50 s total
[ 2023-10-07 20:30:32 ] Completed Epoch: 13 batch 183: forward pass 363.909 ms, 21.86 s total
[ 2023-10-07 20:30:32 ] Completed Epoch: 13 batch 183: backward pass 68.638 ms, 21.93 s total
[ 2023-10-07 20:30:33 ] Completed Epoch: 13 batch 183: computing loss 1,089.971 ms, 23.02 s total
EPOCH: [13], BATCH: [183/889], loss: 0.376, loss_box_reg: 0.113, loss_classifier: 0.088, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 183
[ 2023-10-07 20:30:34 ] Completed saving temp checkpoint 1,205.219 ms, 24.23 s total
[ 2023-10-07 20:30:34 ] Completed replacing temp checkpoint with checkpoint 60.021 ms, 24.29 s total
[ 2023-10-07 20:30:34 ] Completed Epoch: 13 batch 184: moving batch data to device 22.305 ms, 24.31 s total
[ 2023-10-07 20:30:34 ] Completed Epoch: 13 batch 184: forward pass 327.497 ms, 24.64 s total
[ 2023-10-07 20:30:35 ] Completed Epoch: 13 batch 184: backward pass 87.571 ms, 24.72 s total
[ 2023-10-07 20:30:36 ] Completed Epoch: 13 batch 184: computing loss 1,211.946 ms, 25.93 s total
EPOCH: [13], BATCH: [184/889], loss: 0.419, loss_box_reg: 0.125, loss_classifier: 0.106, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 184
[ 2023-10-07 20:30:37 ] Completed saving temp checkpoint 1,157.785 ms, 27.09 s total
[ 2023-10-07 20:30:37 ] Completed replacing temp checkpoint with checkpoint 42.763 ms, 27.14 s total
[ 2023-10-07 20:30:37 ] Completed Epoch: 13 batch 185: moving batch data to device 17.957 ms, 27.15 s total
[ 2023-10-07 20:30:37 ] Completed Epoch: 13 batch 185: forward pass 334.698 ms, 27.49 s total
[ 2023-10-07 20:30:37 ] Completed Epoch: 13 batch 185: backward pass 68.048 ms, 27.56 s total
[ 2023-10-07 20:30:39 ] Completed Epoch: 13 batch 185: computing loss 1,239.720 ms, 28.80 s total
EPOCH: [13], BATCH: [185/889], loss: 0.380, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.126, loss_objectness: 0.019, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 185
[ 2023-10-07 20:30:39 ] Completed saving temp checkpoint 903.262 ms, 29.70 s total
[ 2023-10-07 20:30:40 ] Completed replacing temp checkpoint with checkpoint 58.779 ms, 29.76 s total
[ 2023-10-07 20:30:40 ] Completed Epoch: 13 batch 186: moving batch data to device 21.418 ms, 29.78 s total
[ 2023-10-07 20:30:40 ] Completed Epoch: 13 batch 186: forward pass 326.448 ms, 30.11 s total
[ 2023-10-07 20:30:40 ] Completed Epoch: 13 batch 186: backward pass 74.928 ms, 30.18 s total
[ 2023-10-07 20:30:41 ] Completed Epoch: 13 batch 186: computing loss 1,327.581 ms, 31.51 s total
EPOCH: [13], BATCH: [186/889], loss: 0.363, loss_box_reg: 0.110, loss_classifier: 0.088, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 186
[ 2023-10-07 20:30:42 ] Completed saving temp checkpoint 1,050.988 ms, 32.56 s total
[ 2023-10-07 20:30:42 ] Completed replacing temp checkpoint with checkpoint 65.173 ms, 32.62 s total
[ 2023-10-07 20:30:42 ] Completed Epoch: 13 batch 187: moving batch data to device 24.124 ms, 32.65 s total
[ 2023-10-07 20:30:43 ] Completed Epoch: 13 batch 187: forward pass 326.560 ms, 32.97 s total
[ 2023-10-07 20:30:43 ] Completed Epoch: 13 batch 187: backward pass 42.582 ms, 33.02 s total
[ 2023-10-07 20:30:44 ] Completed Epoch: 13 batch 187: computing loss 1,377.614 ms, 34.39 s total
EPOCH: [13], BATCH: [187/889], loss: 0.379, loss_box_reg: 0.115, loss_classifier: 0.096, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 187
[ 2023-10-07 20:30:46 ] Completed saving temp checkpoint 1,495.497 ms, 35.89 s total
[ 2023-10-07 20:30:46 ] Completed replacing temp checkpoint with checkpoint 73.748 ms, 35.96 s total
[ 2023-10-07 20:30:46 ] Completed Epoch: 13 batch 188: moving batch data to device 22.204 ms, 35.99 s total
[ 2023-10-07 20:30:46 ] Completed Epoch: 13 batch 188: forward pass 302.889 ms, 36.29 s total
[ 2023-10-07 20:30:46 ] Completed Epoch: 13 batch 188: backward pass 80.070 ms, 36.37 s total
[ 2023-10-07 20:30:47 ] Completed Epoch: 13 batch 188: computing loss 1,234.838 ms, 37.60 s total
EPOCH: [13], BATCH: [188/889], loss: 0.432, loss_box_reg: 0.129, loss_classifier: 0.109, loss_mask: 0.143, loss_objectness: 0.018, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 188
[ 2023-10-07 20:30:49 ] Completed saving temp checkpoint 1,713.400 ms, 39.32 s total
[ 2023-10-07 20:30:49 ] Completed replacing temp checkpoint with checkpoint 69.660 ms, 39.39 s total
[ 2023-10-07 20:30:49 ] Completed Epoch: 13 batch 189: moving batch data to device 23.996 ms, 39.41 s total
[ 2023-10-07 20:30:49 ] Completed Epoch: 13 batch 189: forward pass 292.397 ms, 39.70 s total
[ 2023-10-07 20:30:50 ] Completed Epoch: 13 batch 189: backward pass 89.729 ms, 39.79 s total
[ 2023-10-07 20:30:51 ] Completed Epoch: 13 batch 189: computing loss 1,208.920 ms, 41.00 s total
EPOCH: [13], BATCH: [189/889], loss: 0.391, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 189
[ 2023-10-07 20:30:52 ] Completed saving temp checkpoint 1,116.745 ms, 42.12 s total
[ 2023-10-07 20:30:52 ] Completed replacing temp checkpoint with checkpoint 48.592 ms, 42.17 s total
[ 2023-10-07 20:30:52 ] Completed Epoch: 13 batch 190: moving batch data to device 22.751 ms, 42.19 s total
[ 2023-10-07 20:30:52 ] Completed Epoch: 13 batch 190: forward pass 325.562 ms, 42.52 s total
[ 2023-10-07 20:30:52 ] Completed Epoch: 13 batch 190: backward pass 75.002 ms, 42.59 s total
[ 2023-10-07 20:30:54 ] Completed Epoch: 13 batch 190: computing loss 1,248.879 ms, 43.84 s total
EPOCH: [13], BATCH: [190/889], loss: 0.362, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.013, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 190
[ 2023-10-07 20:30:55 ] Completed saving temp checkpoint 1,150.627 ms, 44.99 s total
[ 2023-10-07 20:30:55 ] Completed replacing temp checkpoint with checkpoint 43.850 ms, 45.03 s total
[ 2023-10-07 20:30:55 ] Completed Epoch: 13 batch 191: moving batch data to device 22.360 ms, 45.06 s total
[ 2023-10-07 20:30:55 ] Completed Epoch: 13 batch 191: forward pass 310.296 ms, 45.37 s total
[ 2023-10-07 20:30:55 ] Completed Epoch: 13 batch 191: backward pass 71.113 ms, 45.44 s total
[ 2023-10-07 20:30:57 ] Completed Epoch: 13 batch 191: computing loss 1,687.658 ms, 47.13 s total
EPOCH: [13], BATCH: [191/889], loss: 0.388, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 191
[ 2023-10-07 20:30:59 ] Completed saving temp checkpoint 1,601.125 ms, 48.73 s total
[ 2023-10-07 20:30:59 ] Completed replacing temp checkpoint with checkpoint 88.879 ms, 48.82 s total
[ 2023-10-07 20:30:59 ] Completed Epoch: 13 batch 192: moving batch data to device 20.862 ms, 48.84 s total
[ 2023-10-07 20:30:59 ] Completed Epoch: 13 batch 192: forward pass 291.588 ms, 49.13 s total
[ 2023-10-07 20:30:59 ] Completed Epoch: 13 batch 192: backward pass 94.742 ms, 49.22 s total
[ 2023-10-07 20:31:00 ] Completed Epoch: 13 batch 192: computing loss 1,408.642 ms, 50.63 s total
EPOCH: [13], BATCH: [192/889], loss: 0.375, loss_box_reg: 0.111, loss_classifier: 0.096, loss_mask: 0.124, loss_objectness: 0.014, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 192
[ 2023-10-07 20:31:01 ] Completed saving temp checkpoint 1,016.514 ms, 51.65 s total
[ 2023-10-07 20:31:01 ] Completed replacing temp checkpoint with checkpoint 63.104 ms, 51.71 s total
[ 2023-10-07 20:31:02 ] Completed Epoch: 13 batch 193: moving batch data to device 24.064 ms, 51.74 s total
[ 2023-10-07 20:31:02 ] Completed Epoch: 13 batch 193: forward pass 330.109 ms, 52.07 s total
[ 2023-10-07 20:31:02 ] Completed Epoch: 13 batch 193: backward pass 391.592 ms, 52.46 s total
[ 2023-10-07 20:31:03 ] Completed Epoch: 13 batch 193: computing loss 918.971 ms, 53.38 s total
EPOCH: [13], BATCH: [193/889], loss: 0.404, loss_box_reg: 0.125, loss_classifier: 0.099, loss_mask: 0.138, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 193
[ 2023-10-07 20:31:04 ] Completed saving temp checkpoint 1,005.405 ms, 54.38 s total
[ 2023-10-07 20:31:04 ] Completed replacing temp checkpoint with checkpoint 58.940 ms, 54.44 s total
[ 2023-10-07 20:31:04 ] Completed Epoch: 13 batch 194: moving batch data to device 21.607 ms, 54.46 s total
[ 2023-10-07 20:31:05 ] Completed Epoch: 13 batch 194: forward pass 339.481 ms, 54.80 s total
[ 2023-10-07 20:31:05 ] Completed Epoch: 13 batch 194: backward pass 80.159 ms, 54.88 s total
[ 2023-10-07 20:31:06 ] Completed Epoch: 13 batch 194: computing loss 1,244.043 ms, 56.13 s total
EPOCH: [13], BATCH: [194/889], loss: 0.406, loss_box_reg: 0.117, loss_classifier: 0.107, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 194
[ 2023-10-07 20:31:07 ] Completed saving temp checkpoint 1,189.988 ms, 57.32 s total
[ 2023-10-07 20:31:07 ] Completed replacing temp checkpoint with checkpoint 64.686 ms, 57.38 s total
[ 2023-10-07 20:31:07 ] Completed Epoch: 13 batch 195: moving batch data to device 22.344 ms, 57.40 s total
[ 2023-10-07 20:31:08 ] Completed Epoch: 13 batch 195: forward pass 318.383 ms, 57.72 s total
[ 2023-10-07 20:31:08 ] Completed Epoch: 13 batch 195: backward pass 55.502 ms, 57.78 s total
[ 2023-10-07 20:31:09 ] Completed Epoch: 13 batch 195: computing loss 1,191.752 ms, 58.97 s total
EPOCH: [13], BATCH: [195/889], loss: 0.402, loss_box_reg: 0.124, loss_classifier: 0.107, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 195
[ 2023-10-07 20:31:10 ] Completed saving temp checkpoint 1,045.894 ms, 60.01 s total
[ 2023-10-07 20:31:10 ] Completed replacing temp checkpoint with checkpoint 38.715 ms, 60.05 s total
[ 2023-10-07 20:31:10 ] Completed Epoch: 13 batch 196: moving batch data to device 20.997 ms, 60.07 s total
[ 2023-10-07 20:31:10 ] Completed Epoch: 13 batch 196: forward pass 318.081 ms, 60.39 s total
[ 2023-10-07 20:31:10 ] Completed Epoch: 13 batch 196: backward pass 58.734 ms, 60.45 s total
[ 2023-10-07 20:31:12 ] Completed Epoch: 13 batch 196: computing loss 1,523.792 ms, 61.97 s total
EPOCH: [13], BATCH: [196/889], loss: 0.385, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 196
[ 2023-10-07 20:31:13 ] Completed saving temp checkpoint 1,128.574 ms, 63.10 s total
[ 2023-10-07 20:31:13 ] Completed replacing temp checkpoint with checkpoint 67.397 ms, 63.17 s total
[ 2023-10-07 20:31:13 ] Completed Epoch: 13 batch 197: moving batch data to device 22.168 ms, 63.19 s total
[ 2023-10-07 20:31:13 ] Completed Epoch: 13 batch 197: forward pass 307.550 ms, 63.50 s total
[ 2023-10-07 20:31:13 ] Completed Epoch: 13 batch 197: backward pass 55.182 ms, 63.56 s total
[ 2023-10-07 20:31:15 ] Completed Epoch: 13 batch 197: computing loss 1,501.842 ms, 65.06 s total
EPOCH: [13], BATCH: [197/889], loss: 0.368, loss_box_reg: 0.106, loss_classifier: 0.089, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 197
[ 2023-10-07 20:31:16 ] Completed saving temp checkpoint 1,358.905 ms, 66.42 s total
[ 2023-10-07 20:31:16 ] Completed replacing temp checkpoint with checkpoint 50.451 ms, 66.47 s total
[ 2023-10-07 20:31:16 ] Completed Epoch: 13 batch 198: moving batch data to device 22.912 ms, 66.49 s total
[ 2023-10-07 20:31:17 ] Completed Epoch: 13 batch 198: forward pass 361.130 ms, 66.85 s total
[ 2023-10-07 20:31:17 ] Completed Epoch: 13 batch 198: backward pass 107.381 ms, 66.96 s total
[ 2023-10-07 20:31:18 ] Completed Epoch: 13 batch 198: computing loss 1,314.248 ms, 68.27 s total
EPOCH: [13], BATCH: [198/889], loss: 0.396, loss_box_reg: 0.121, loss_classifier: 0.101, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 198
[ 2023-10-07 20:31:19 ] Completed saving temp checkpoint 989.193 ms, 69.26 s total
[ 2023-10-07 20:31:19 ] Completed replacing temp checkpoint with checkpoint 50.916 ms, 69.31 s total
[ 2023-10-07 20:31:19 ] Completed Epoch: 13 batch 199: moving batch data to device 21.029 ms, 69.33 s total
[ 2023-10-07 20:31:19 ] Completed Epoch: 13 batch 199: forward pass 338.646 ms, 69.67 s total
[ 2023-10-07 20:31:20 ] Completed Epoch: 13 batch 199: backward pass 69.430 ms, 69.74 s total
[ 2023-10-07 20:31:21 ] Completed Epoch: 13 batch 199: computing loss 1,041.847 ms, 70.78 s total
EPOCH: [13], BATCH: [199/889], loss: 0.360, loss_box_reg: 0.106, loss_classifier: 0.088, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 199
[ 2023-10-07 20:31:22 ] Completed saving temp checkpoint 1,154.441 ms, 71.94 s total
[ 2023-10-07 20:31:22 ] Completed replacing temp checkpoint with checkpoint 39.406 ms, 71.98 s total
[ 2023-10-07 20:31:22 ] Completed Epoch: 13 batch 200: moving batch data to device 21.565 ms, 72.00 s total
[ 2023-10-07 20:31:22 ] Completed Epoch: 13 batch 200: forward pass 313.716 ms, 72.31 s total
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 20:53:06 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 20:53:06 ] Completed importing Timer 0.027 ms, 0.00 s total
[ 2023-10-07 20:53:07 ] Completed importing everything else 707.623 ms, 0.71 s total
[ 2023-10-07 20:53:07 ] Completed defined other functions 0.026 ms, 0.71 s total
| distributed init (rank 1): env://
| distributed init (rank 4): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 0): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 20:53:10 ] Completed main preliminaries 2,901.928 ms, 3.61 s total
loading annotations into memory...
Done (t=11.30s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 20:53:23 ] Completed loading data 13,135.575 ms, 16.75 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 20:53:23 ] Completed creating data samplers 100.083 ms, 16.85 s total
[ 2023-10-07 20:53:23 ] Completed creating data loaders 0.231 ms, 16.85 s total
[ 2023-10-07 20:53:24 ] Completed creating model and .to(device) 664.227 ms, 17.51 s total
[ 2023-10-07 20:53:26 ] Completed preparing model for distributed training 2,263.756 ms, 19.77 s total
[ 2023-10-07 20:53:26 ] Completed optimizer and scaler 0.553 ms, 19.77 s total
[ 2023-10-07 20:53:26 ] Completed learning rate schedulers 0.148 ms, 19.77 s total
[ 2023-10-07 20:53:27 ] Completed init coco evaluator 986.679 ms, 20.76 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 20:53:28 ] Completed retrieving checkpoint 815.101 ms, 21.58 s total
EPOCH :: 13
[ 2023-10-07 20:53:28 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 20:53:28 ] Completed training preliminaries 0.880 ms, 0.00 s total
Training / resuming epoch 13 from training step 200
[ 2023-10-07 20:53:28 ] Completed Epoch: 13 batch 200: moving batch data to device 291.686 ms, 0.29 s total
[ 2023-10-07 20:53:34 ] Completed Epoch: 13 batch 200: forward pass 6,033.523 ms, 6.33 s total
[ 2023-10-07 20:53:34 ] Completed Epoch: 13 batch 200: backward pass 145.206 ms, 6.47 s total
[ 2023-10-07 20:53:36 ] Completed Epoch: 13 batch 200: computing loss 1,173.047 ms, 7.64 s total
EPOCH: [13], BATCH: [200/889], loss: 0.415, loss_box_reg: 0.128, loss_classifier: 0.106, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 200
[ 2023-10-07 20:53:39 ] Completed saving temp checkpoint 3,270.844 ms, 10.92 s total
[ 2023-10-07 20:53:39 ] Completed replacing temp checkpoint with checkpoint 153.115 ms, 11.07 s total
[ 2023-10-07 20:53:39 ] Completed Epoch: 13 batch 201: moving batch data to device 2.834 ms, 11.07 s total
[ 2023-10-07 20:53:39 ] Completed Epoch: 13 batch 201: forward pass 431.358 ms, 11.50 s total
[ 2023-10-07 20:53:39 ] Completed Epoch: 13 batch 201: backward pass 63.226 ms, 11.57 s total
[ 2023-10-07 20:53:42 ] Completed Epoch: 13 batch 201: computing loss 2,385.076 ms, 13.95 s total
EPOCH: [13], BATCH: [201/889], loss: 0.350, loss_box_reg: 0.106, loss_classifier: 0.087, loss_mask: 0.122, loss_objectness: 0.012, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 201
[ 2023-10-07 20:53:44 ] Completed saving temp checkpoint 1,796.375 ms, 15.75 s total
[ 2023-10-07 20:53:44 ] Completed replacing temp checkpoint with checkpoint 843.611 ms, 16.59 s total
[ 2023-10-07 20:53:45 ] Completed Epoch: 13 batch 202: moving batch data to device 63.811 ms, 16.65 s total
[ 2023-10-07 20:53:45 ] Completed Epoch: 13 batch 202: forward pass 458.349 ms, 17.11 s total
[ 2023-10-07 20:53:45 ] Completed Epoch: 13 batch 202: backward pass 356.659 ms, 17.47 s total
[ 2023-10-07 20:53:47 ] Completed Epoch: 13 batch 202: computing loss 1,526.293 ms, 19.00 s total
EPOCH: [13], BATCH: [202/889], loss: 0.393, loss_box_reg: 0.122, loss_classifier: 0.097, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 202
[ 2023-10-07 20:53:49 ] Completed saving temp checkpoint 2,271.994 ms, 21.27 s total
[ 2023-10-07 20:53:49 ] Completed replacing temp checkpoint with checkpoint 54.271 ms, 21.32 s total
[ 2023-10-07 20:53:49 ] Completed Epoch: 13 batch 203: moving batch data to device 4.006 ms, 21.33 s total
[ 2023-10-07 20:53:50 ] Completed Epoch: 13 batch 203: forward pass 448.070 ms, 21.77 s total
[ 2023-10-07 20:53:50 ] Completed Epoch: 13 batch 203: backward pass 395.800 ms, 22.17 s total
[ 2023-10-07 20:53:52 ] Completed Epoch: 13 batch 203: computing loss 1,767.707 ms, 23.94 s total
EPOCH: [13], BATCH: [203/889], loss: 0.370, loss_box_reg: 0.115, loss_classifier: 0.089, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 203
[ 2023-10-07 20:53:54 ] Completed saving temp checkpoint 2,079.378 ms, 26.02 s total
[ 2023-10-07 20:53:54 ] Completed replacing temp checkpoint with checkpoint 81.680 ms, 26.10 s total
[ 2023-10-07 20:53:54 ] Completed Epoch: 13 batch 204: moving batch data to device 4.535 ms, 26.10 s total
[ 2023-10-07 20:53:54 ] Completed Epoch: 13 batch 204: forward pass 432.856 ms, 26.54 s total
[ 2023-10-07 20:53:55 ] Completed Epoch: 13 batch 204: backward pass 73.450 ms, 26.61 s total
[ 2023-10-07 20:53:57 ] Completed Epoch: 13 batch 204: computing loss 2,022.677 ms, 28.63 s total
EPOCH: [13], BATCH: [204/889], loss: 0.399, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 204
[ 2023-10-07 20:53:59 ] Completed saving temp checkpoint 2,925.630 ms, 31.56 s total
[ 2023-10-07 20:54:01 ] Completed replacing temp checkpoint with checkpoint 1,262.232 ms, 32.82 s total
[ 2023-10-07 20:54:01 ] Completed Epoch: 13 batch 205: moving batch data to device 3.743 ms, 32.82 s total
[ 2023-10-07 20:54:01 ] Completed Epoch: 13 batch 205: forward pass 444.860 ms, 33.27 s total
[ 2023-10-07 20:54:01 ] Completed Epoch: 13 batch 205: backward pass 79.133 ms, 33.35 s total
[ 2023-10-07 20:54:03 ] Completed Epoch: 13 batch 205: computing loss 2,026.958 ms, 35.37 s total
EPOCH: [13], BATCH: [205/889], loss: 0.387, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 205
[ 2023-10-07 20:54:05 ] Completed saving temp checkpoint 1,500.724 ms, 36.88 s total
[ 2023-10-07 20:54:05 ] Completed replacing temp checkpoint with checkpoint 601.770 ms, 37.48 s total
[ 2023-10-07 20:54:05 ] Completed Epoch: 13 batch 206: moving batch data to device 4.351 ms, 37.48 s total
[ 2023-10-07 20:54:06 ] Completed Epoch: 13 batch 206: forward pass 442.955 ms, 37.92 s total
[ 2023-10-07 20:54:06 ] Completed Epoch: 13 batch 206: backward pass 66.843 ms, 37.99 s total
[ 2023-10-07 20:54:08 ] Completed Epoch: 13 batch 206: computing loss 2,003.969 ms, 40.00 s total
EPOCH: [13], BATCH: [206/889], loss: 0.384, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 206
[ 2023-10-07 20:54:09 ] Completed saving temp checkpoint 1,223.861 ms, 41.22 s total
[ 2023-10-07 20:54:10 ] Completed replacing temp checkpoint with checkpoint 599.129 ms, 41.82 s total
[ 2023-10-07 20:54:10 ] Completed Epoch: 13 batch 207: moving batch data to device 10.081 ms, 41.83 s total
[ 2023-10-07 20:54:10 ] Completed Epoch: 13 batch 207: forward pass 427.359 ms, 42.26 s total
[ 2023-10-07 20:54:10 ] Completed Epoch: 13 batch 207: backward pass 87.673 ms, 42.34 s total
[ 2023-10-07 20:54:12 ] Completed Epoch: 13 batch 207: computing loss 1,680.482 ms, 44.02 s total
EPOCH: [13], BATCH: [207/889], loss: 0.385, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 207
[ 2023-10-07 20:54:14 ] Completed saving temp checkpoint 2,095.066 ms, 46.12 s total
[ 2023-10-07 20:54:14 ] Completed replacing temp checkpoint with checkpoint 39.957 ms, 46.16 s total
[ 2023-10-07 20:54:14 ] Completed Epoch: 13 batch 208: moving batch data to device 9.064 ms, 46.17 s total
[ 2023-10-07 20:54:15 ] Completed Epoch: 13 batch 208: forward pass 457.951 ms, 46.63 s total
[ 2023-10-07 20:54:15 ] Completed Epoch: 13 batch 208: backward pass 43.512 ms, 46.67 s total
[ 2023-10-07 20:54:17 ] Completed Epoch: 13 batch 208: computing loss 2,385.377 ms, 49.05 s total
EPOCH: [13], BATCH: [208/889], loss: 0.406, loss_box_reg: 0.126, loss_classifier: 0.105, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 208
[ 2023-10-07 20:54:19 ] Completed saving temp checkpoint 2,141.603 ms, 51.20 s total
[ 2023-10-07 20:54:20 ] Completed replacing temp checkpoint with checkpoint 1,334.263 ms, 52.53 s total
[ 2023-10-07 20:54:20 ] Completed Epoch: 13 batch 209: moving batch data to device 9.119 ms, 52.54 s total
[ 2023-10-07 20:54:21 ] Completed Epoch: 13 batch 209: forward pass 378.244 ms, 52.92 s total
[ 2023-10-07 20:54:21 ] Completed Epoch: 13 batch 209: backward pass 121.692 ms, 53.04 s total
[ 2023-10-07 20:54:21 ] Completed Epoch: 13 batch 209: computing loss 300.367 ms, 53.34 s total
EPOCH: [13], BATCH: [209/889], loss: 0.393, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 209
[ 2023-10-07 20:54:23 ] Completed saving temp checkpoint 1,595.424 ms, 54.94 s total
[ 2023-10-07 20:54:23 ] Completed replacing temp checkpoint with checkpoint 55.819 ms, 54.99 s total
[ 2023-10-07 20:54:23 ] Completed Epoch: 13 batch 210: moving batch data to device 6.597 ms, 55.00 s total
[ 2023-10-07 20:54:23 ] Completed Epoch: 13 batch 210: forward pass 432.733 ms, 55.43 s total
[ 2023-10-07 20:54:23 ] Completed Epoch: 13 batch 210: backward pass 68.437 ms, 55.50 s total
[ 2023-10-07 20:54:25 ] Completed Epoch: 13 batch 210: computing loss 1,396.526 ms, 56.90 s total
EPOCH: [13], BATCH: [210/889], loss: 0.393, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 210
[ 2023-10-07 20:54:27 ] Completed saving temp checkpoint 1,792.026 ms, 58.69 s total
[ 2023-10-07 20:54:27 ] Completed replacing temp checkpoint with checkpoint 114.719 ms, 58.80 s total
[ 2023-10-07 20:54:27 ] Completed Epoch: 13 batch 211: moving batch data to device 5.653 ms, 58.81 s total
[ 2023-10-07 20:54:27 ] Completed Epoch: 13 batch 211: forward pass 137.900 ms, 58.95 s total
[ 2023-10-07 20:54:27 ] Completed Epoch: 13 batch 211: backward pass 90.613 ms, 59.04 s total
[ 2023-10-07 20:54:27 ] Completed Epoch: 13 batch 211: computing loss 137.834 ms, 59.17 s total
EPOCH: [13], BATCH: [211/889], loss: 0.390, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 211
[ 2023-10-07 20:54:28 ] Completed saving temp checkpoint 1,198.300 ms, 60.37 s total
[ 2023-10-07 20:54:28 ] Completed replacing temp checkpoint with checkpoint 50.485 ms, 60.42 s total
[ 2023-10-07 20:54:28 ] Completed Epoch: 13 batch 212: moving batch data to device 4.321 ms, 60.43 s total
[ 2023-10-07 20:54:28 ] Completed Epoch: 13 batch 212: forward pass 112.222 ms, 60.54 s total
[ 2023-10-07 20:54:29 ] Completed Epoch: 13 batch 212: backward pass 120.474 ms, 60.66 s total
[ 2023-10-07 20:54:29 ] Completed Epoch: 13 batch 212: computing loss 86.290 ms, 60.75 s total
EPOCH: [13], BATCH: [212/889], loss: 0.384, loss_box_reg: 0.115, loss_classifier: 0.091, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 212
[ 2023-10-07 20:54:30 ] Completed saving temp checkpoint 1,270.790 ms, 62.02 s total
[ 2023-10-07 20:54:30 ] Completed replacing temp checkpoint with checkpoint 64.948 ms, 62.08 s total
[ 2023-10-07 20:54:30 ] Completed Epoch: 13 batch 213: moving batch data to device 4.548 ms, 62.09 s total
[ 2023-10-07 20:54:30 ] Completed Epoch: 13 batch 213: forward pass 107.804 ms, 62.19 s total
[ 2023-10-07 20:54:30 ] Completed Epoch: 13 batch 213: backward pass 78.285 ms, 62.27 s total
[ 2023-10-07 20:54:30 ] Completed Epoch: 13 batch 213: computing loss 117.143 ms, 62.39 s total
EPOCH: [13], BATCH: [213/889], loss: 0.396, loss_box_reg: 0.124, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 213
[ 2023-10-07 20:54:31 ] Completed saving temp checkpoint 1,135.718 ms, 63.53 s total
[ 2023-10-07 20:54:31 ] Completed replacing temp checkpoint with checkpoint 54.082 ms, 63.58 s total
[ 2023-10-07 20:54:31 ] Completed Epoch: 13 batch 214: moving batch data to device 6.147 ms, 63.59 s total
[ 2023-10-07 20:54:32 ] Completed Epoch: 13 batch 214: forward pass 112.651 ms, 63.70 s total
[ 2023-10-07 20:54:32 ] Completed Epoch: 13 batch 214: backward pass 78.181 ms, 63.78 s total
[ 2023-10-07 20:54:32 ] Completed Epoch: 13 batch 214: computing loss 116.033 ms, 63.89 s total
EPOCH: [13], BATCH: [214/889], loss: 0.406, loss_box_reg: 0.121, loss_classifier: 0.102, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 214
[ 2023-10-07 20:54:33 ] Completed saving temp checkpoint 1,172.984 ms, 65.07 s total
[ 2023-10-07 20:54:33 ] Completed replacing temp checkpoint with checkpoint 53.118 ms, 65.12 s total
[ 2023-10-07 20:54:33 ] Completed Epoch: 13 batch 215: moving batch data to device 7.978 ms, 65.13 s total
[ 2023-10-07 20:54:33 ] Completed Epoch: 13 batch 215: forward pass 113.263 ms, 65.24 s total
[ 2023-10-07 20:54:33 ] Completed Epoch: 13 batch 215: backward pass 118.975 ms, 65.36 s total
[ 2023-10-07 20:54:33 ] Completed Epoch: 13 batch 215: computing loss 84.587 ms, 65.44 s total
EPOCH: [13], BATCH: [215/889], loss: 0.421, loss_box_reg: 0.127, loss_classifier: 0.103, loss_mask: 0.142, loss_objectness: 0.018, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 215
[ 2023-10-07 20:54:34 ] Completed saving temp checkpoint 1,120.460 ms, 66.56 s total
[ 2023-10-07 20:54:35 ] Completed replacing temp checkpoint with checkpoint 61.744 ms, 66.63 s total
[ 2023-10-07 20:54:35 ] Completed Epoch: 13 batch 216: moving batch data to device 7.009 ms, 66.63 s total
[ 2023-10-07 20:54:35 ] Completed Epoch: 13 batch 216: forward pass 403.194 ms, 67.04 s total
[ 2023-10-07 20:54:35 ] Completed Epoch: 13 batch 216: backward pass 113.931 ms, 67.15 s total
[ 2023-10-07 20:54:36 ] Completed Epoch: 13 batch 216: computing loss 958.208 ms, 68.11 s total
EPOCH: [13], BATCH: [216/889], loss: 0.406, loss_box_reg: 0.121, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 216
[ 2023-10-07 20:54:37 ] Completed saving temp checkpoint 1,165.457 ms, 69.27 s total
[ 2023-10-07 20:54:37 ] Completed replacing temp checkpoint with checkpoint 66.807 ms, 69.34 s total
[ 2023-10-07 20:54:37 ] Completed Epoch: 13 batch 217: moving batch data to device 10.224 ms, 69.35 s total
[ 2023-10-07 20:54:37 ] Completed Epoch: 13 batch 217: forward pass 107.063 ms, 69.46 s total
[ 2023-10-07 20:54:37 ] Completed Epoch: 13 batch 217: backward pass 82.125 ms, 69.54 s total
[ 2023-10-07 20:54:38 ] Completed Epoch: 13 batch 217: computing loss 99.647 ms, 69.64 s total
EPOCH: [13], BATCH: [217/889], loss: 0.404, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.138, loss_objectness: 0.020, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 217
[ 2023-10-07 20:54:39 ] Completed saving temp checkpoint 1,103.708 ms, 70.74 s total
[ 2023-10-07 20:54:39 ] Completed replacing temp checkpoint with checkpoint 59.976 ms, 70.80 s total
[ 2023-10-07 20:54:39 ] Completed Epoch: 13 batch 218: moving batch data to device 5.522 ms, 70.81 s total
[ 2023-10-07 20:54:39 ] Completed Epoch: 13 batch 218: forward pass 115.590 ms, 70.92 s total
[ 2023-10-07 20:54:39 ] Completed Epoch: 13 batch 218: backward pass 96.852 ms, 71.02 s total
[ 2023-10-07 20:54:39 ] Completed Epoch: 13 batch 218: computing loss 132.900 ms, 71.15 s total
EPOCH: [13], BATCH: [218/889], loss: 0.411, loss_box_reg: 0.123, loss_classifier: 0.106, loss_mask: 0.132, loss_objectness: 0.020, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 218
[ 2023-10-07 20:54:40 ] Completed saving temp checkpoint 1,339.643 ms, 72.49 s total
[ 2023-10-07 20:54:40 ] Completed replacing temp checkpoint with checkpoint 62.063 ms, 72.56 s total
[ 2023-10-07 20:54:40 ] Completed Epoch: 13 batch 219: moving batch data to device 23.736 ms, 72.58 s total
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 21:21:31 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 21:21:31 ] Completed importing Timer 0.026 ms, 0.00 s total
[ 2023-10-07 21:21:32 ] Completed importing everything else 510.624 ms, 0.51 s total
[ 2023-10-07 21:21:32 ] Completed defined other functions 0.025 ms, 0.51 s total
| distributed init (rank 4): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
| distributed init (rank 0): env://
| distributed init (rank 3): env://
| distributed init (rank 5): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 21:21:35 ] Completed main preliminaries 3,051.202 ms, 3.56 s total
loading annotations into memory...
Done (t=12.83s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 21:21:50 ] Completed loading data 14,825.410 ms, 18.39 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 21:21:50 ] Completed creating data samplers 107.380 ms, 18.49 s total
[ 2023-10-07 21:21:50 ] Completed creating data loaders 0.240 ms, 18.49 s total
[ 2023-10-07 21:21:50 ] Completed creating model and .to(device) 731.898 ms, 19.23 s total
[ 2023-10-07 21:21:52 ] Completed preparing model for distributed training 1,281.266 ms, 20.51 s total
[ 2023-10-07 21:21:52 ] Completed optimizer and scaler 0.557 ms, 20.51 s total
[ 2023-10-07 21:21:52 ] Completed learning rate schedulers 0.211 ms, 20.51 s total
[ 2023-10-07 21:21:53 ] Completed init coco evaluator 967.945 ms, 21.48 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 21:21:54 ] Completed retrieving checkpoint 974.301 ms, 22.45 s total
EPOCH :: 13
[ 2023-10-07 21:21:54 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 21:21:54 ] Completed training preliminaries 0.876 ms, 0.00 s total
Training / resuming epoch 13 from training step 219
[ 2023-10-07 21:21:54 ] Completed Epoch: 13 batch 219: moving batch data to device 243.872 ms, 0.24 s total
[ 2023-10-07 21:21:59 ] Completed Epoch: 13 batch 219: forward pass 5,249.355 ms, 5.49 s total
[ 2023-10-07 21:21:59 ] Completed Epoch: 13 batch 219: backward pass 266.134 ms, 5.76 s total
[ 2023-10-07 21:22:00 ] Completed Epoch: 13 batch 219: computing loss 1,030.547 ms, 6.79 s total
EPOCH: [13], BATCH: [219/889], loss: 0.361, loss_box_reg: 0.106, loss_classifier: 0.082, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 219
[ 2023-10-07 21:22:02 ] Completed saving temp checkpoint 2,039.547 ms, 8.83 s total
[ 2023-10-07 21:22:03 ] Completed replacing temp checkpoint with checkpoint 160.846 ms, 8.99 s total
[ 2023-10-07 21:22:03 ] Completed Epoch: 13 batch 220: moving batch data to device 63.593 ms, 9.05 s total
[ 2023-10-07 21:22:03 ] Completed Epoch: 13 batch 220: forward pass 342.548 ms, 9.40 s total
[ 2023-10-07 21:22:03 ] Completed Epoch: 13 batch 220: backward pass 412.471 ms, 9.81 s total
[ 2023-10-07 21:22:05 ] Completed Epoch: 13 batch 220: computing loss 1,678.773 ms, 11.49 s total
EPOCH: [13], BATCH: [220/889], loss: 0.377, loss_box_reg: 0.119, loss_classifier: 0.096, loss_mask: 0.124, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 220
[ 2023-10-07 21:22:06 ] Completed saving temp checkpoint 1,401.175 ms, 12.89 s total
[ 2023-10-07 21:22:07 ] Completed replacing temp checkpoint with checkpoint 45.700 ms, 12.94 s total
[ 2023-10-07 21:22:07 ] Completed Epoch: 13 batch 221: moving batch data to device 19.078 ms, 12.95 s total
[ 2023-10-07 21:22:07 ] Completed Epoch: 13 batch 221: forward pass 326.889 ms, 13.28 s total
[ 2023-10-07 21:22:07 ] Completed Epoch: 13 batch 221: backward pass 40.970 ms, 13.32 s total
[ 2023-10-07 21:22:08 ] Completed Epoch: 13 batch 221: computing loss 1,274.695 ms, 14.60 s total
EPOCH: [13], BATCH: [221/889], loss: 0.378, loss_box_reg: 0.109, loss_classifier: 0.099, loss_mask: 0.135, loss_objectness: 0.016, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 221
[ 2023-10-07 21:22:09 ] Completed saving temp checkpoint 1,171.356 ms, 15.77 s total
[ 2023-10-07 21:22:09 ] Completed replacing temp checkpoint with checkpoint 77.042 ms, 15.85 s total
[ 2023-10-07 21:22:09 ] Completed Epoch: 13 batch 222: moving batch data to device 24.459 ms, 15.87 s total
[ 2023-10-07 21:22:10 ] Completed Epoch: 13 batch 222: forward pass 349.117 ms, 16.22 s total
[ 2023-10-07 21:22:10 ] Completed Epoch: 13 batch 222: backward pass 50.014 ms, 16.27 s total
[ 2023-10-07 21:22:11 ] Completed Epoch: 13 batch 222: computing loss 1,275.117 ms, 17.54 s total
EPOCH: [13], BATCH: [222/889], loss: 0.389, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 222
[ 2023-10-07 21:22:13 ] Completed saving temp checkpoint 1,806.208 ms, 19.35 s total
[ 2023-10-07 21:22:13 ] Completed replacing temp checkpoint with checkpoint 61.136 ms, 19.41 s total
[ 2023-10-07 21:22:13 ] Completed Epoch: 13 batch 223: moving batch data to device 3.906 ms, 19.42 s total
[ 2023-10-07 21:22:13 ] Completed Epoch: 13 batch 223: forward pass 441.284 ms, 19.86 s total
[ 2023-10-07 21:22:14 ] Completed Epoch: 13 batch 223: backward pass 376.843 ms, 20.23 s total
[ 2023-10-07 21:22:15 ] Completed Epoch: 13 batch 223: computing loss 1,012.753 ms, 21.25 s total
EPOCH: [13], BATCH: [223/889], loss: 0.404, loss_box_reg: 0.123, loss_classifier: 0.109, loss_mask: 0.131, loss_objectness: 0.018, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 223
[ 2023-10-07 21:22:16 ] Completed saving temp checkpoint 1,175.613 ms, 22.42 s total
[ 2023-10-07 21:22:16 ] Completed replacing temp checkpoint with checkpoint 36.986 ms, 22.46 s total
[ 2023-10-07 21:22:16 ] Completed Epoch: 13 batch 224: moving batch data to device 21.800 ms, 22.48 s total
[ 2023-10-07 21:22:16 ] Completed Epoch: 13 batch 224: forward pass 304.285 ms, 22.78 s total
[ 2023-10-07 21:22:17 ] Completed Epoch: 13 batch 224: backward pass 638.819 ms, 23.42 s total
[ 2023-10-07 21:22:18 ] Completed Epoch: 13 batch 224: computing loss 831.048 ms, 24.25 s total
EPOCH: [13], BATCH: [224/889], loss: 0.403, loss_box_reg: 0.122, loss_classifier: 0.106, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 224
[ 2023-10-07 21:22:19 ] Completed saving temp checkpoint 1,075.172 ms, 25.33 s total
[ 2023-10-07 21:22:19 ] Completed replacing temp checkpoint with checkpoint 41.491 ms, 25.37 s total
[ 2023-10-07 21:22:19 ] Completed Epoch: 13 batch 225: moving batch data to device 22.336 ms, 25.39 s total
[ 2023-10-07 21:22:19 ] Completed Epoch: 13 batch 225: forward pass 324.066 ms, 25.72 s total
[ 2023-10-07 21:22:19 ] Completed Epoch: 13 batch 225: backward pass 39.983 ms, 25.76 s total
[ 2023-10-07 21:22:21 ] Completed Epoch: 13 batch 225: computing loss 1,470.817 ms, 27.23 s total
EPOCH: [13], BATCH: [225/889], loss: 0.389, loss_box_reg: 0.116, loss_classifier: 0.098, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 225
[ 2023-10-07 21:22:22 ] Completed saving temp checkpoint 1,114.838 ms, 28.34 s total
[ 2023-10-07 21:22:22 ] Completed replacing temp checkpoint with checkpoint 67.471 ms, 28.41 s total
[ 2023-10-07 21:22:22 ] Completed Epoch: 13 batch 226: moving batch data to device 24.899 ms, 28.44 s total
[ 2023-10-07 21:22:22 ] Completed Epoch: 13 batch 226: forward pass 337.503 ms, 28.77 s total
[ 2023-10-07 21:22:22 ] Completed Epoch: 13 batch 226: backward pass 34.536 ms, 28.81 s total
[ 2023-10-07 21:22:24 ] Completed Epoch: 13 batch 226: computing loss 1,211.138 ms, 30.02 s total
EPOCH: [13], BATCH: [226/889], loss: 0.422, loss_box_reg: 0.130, loss_classifier: 0.109, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 226
[ 2023-10-07 21:22:25 ] Completed saving temp checkpoint 1,183.239 ms, 31.20 s total
[ 2023-10-07 21:22:25 ] Completed replacing temp checkpoint with checkpoint 67.894 ms, 31.27 s total
[ 2023-10-07 21:22:25 ] Completed Epoch: 13 batch 227: moving batch data to device 20.568 ms, 31.29 s total
[ 2023-10-07 21:22:25 ] Completed Epoch: 13 batch 227: forward pass 338.908 ms, 31.63 s total
[ 2023-10-07 21:22:25 ] Completed Epoch: 13 batch 227: backward pass 82.642 ms, 31.71 s total
[ 2023-10-07 21:22:27 ] Completed Epoch: 13 batch 227: computing loss 1,236.324 ms, 32.95 s total
EPOCH: [13], BATCH: [227/889], loss: 0.405, loss_box_reg: 0.127, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 227
[ 2023-10-07 21:22:28 ] Completed saving temp checkpoint 1,170.192 ms, 34.12 s total
[ 2023-10-07 21:22:28 ] Completed replacing temp checkpoint with checkpoint 44.143 ms, 34.16 s total
[ 2023-10-07 21:22:28 ] Completed Epoch: 13 batch 228: moving batch data to device 21.484 ms, 34.18 s total
[ 2023-10-07 21:22:28 ] Completed Epoch: 13 batch 228: forward pass 336.720 ms, 34.52 s total
[ 2023-10-07 21:22:29 ] Completed Epoch: 13 batch 228: backward pass 396.681 ms, 34.92 s total
[ 2023-10-07 21:22:29 ] Completed Epoch: 13 batch 228: computing loss 853.897 ms, 35.77 s total
EPOCH: [13], BATCH: [228/889], loss: 0.397, loss_box_reg: 0.122, loss_classifier: 0.110, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 228
[ 2023-10-07 21:22:30 ] Completed saving temp checkpoint 902.539 ms, 36.67 s total
[ 2023-10-07 21:22:30 ] Completed replacing temp checkpoint with checkpoint 71.010 ms, 36.75 s total
[ 2023-10-07 21:22:30 ] Completed Epoch: 13 batch 229: moving batch data to device 24.275 ms, 36.77 s total
[ 2023-10-07 21:22:31 ] Completed Epoch: 13 batch 229: forward pass 348.430 ms, 37.12 s total
[ 2023-10-07 21:22:31 ] Completed Epoch: 13 batch 229: backward pass 126.232 ms, 37.24 s total
[ 2023-10-07 21:22:32 ] Completed Epoch: 13 batch 229: computing loss 1,565.254 ms, 38.81 s total
EPOCH: [13], BATCH: [229/889], loss: 0.355, loss_box_reg: 0.105, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 229
[ 2023-10-07 21:22:34 ] Completed saving temp checkpoint 1,458.709 ms, 40.27 s total
[ 2023-10-07 21:22:34 ] Completed replacing temp checkpoint with checkpoint 53.474 ms, 40.32 s total
[ 2023-10-07 21:22:34 ] Completed Epoch: 13 batch 230: moving batch data to device 21.517 ms, 40.34 s total
[ 2023-10-07 21:22:34 ] Completed Epoch: 13 batch 230: forward pass 317.831 ms, 40.66 s total
[ 2023-10-07 21:22:34 ] Completed Epoch: 13 batch 230: backward pass 50.135 ms, 40.71 s total
[ 2023-10-07 21:22:36 ] Completed Epoch: 13 batch 230: computing loss 1,932.606 ms, 42.64 s total
EPOCH: [13], BATCH: [230/889], loss: 0.379, loss_box_reg: 0.112, loss_classifier: 0.089, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 230
[ 2023-10-07 21:22:38 ] Completed saving temp checkpoint 1,453.577 ms, 44.10 s total
[ 2023-10-07 21:22:38 ] Completed replacing temp checkpoint with checkpoint 56.552 ms, 44.15 s total
[ 2023-10-07 21:22:38 ] Completed Epoch: 13 batch 231: moving batch data to device 23.541 ms, 44.18 s total
[ 2023-10-07 21:22:38 ] Completed Epoch: 13 batch 231: forward pass 329.859 ms, 44.51 s total
[ 2023-10-07 21:22:38 ] Completed Epoch: 13 batch 231: backward pass 73.592 ms, 44.58 s total
[ 2023-10-07 21:22:39 ] Completed Epoch: 13 batch 231: computing loss 1,238.040 ms, 45.82 s total
EPOCH: [13], BATCH: [231/889], loss: 0.370, loss_box_reg: 0.108, loss_classifier: 0.097, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 231
[ 2023-10-07 21:22:40 ] Completed saving temp checkpoint 1,086.622 ms, 46.91 s total
[ 2023-10-07 21:22:41 ] Completed replacing temp checkpoint with checkpoint 55.825 ms, 46.96 s total
[ 2023-10-07 21:22:41 ] Completed Epoch: 13 batch 232: moving batch data to device 23.577 ms, 46.98 s total
[ 2023-10-07 21:22:41 ] Completed Epoch: 13 batch 232: forward pass 339.513 ms, 47.32 s total
[ 2023-10-07 21:22:41 ] Completed Epoch: 13 batch 232: backward pass 90.031 ms, 47.41 s total
[ 2023-10-07 21:22:42 ] Completed Epoch: 13 batch 232: computing loss 1,225.779 ms, 48.64 s total
EPOCH: [13], BATCH: [232/889], loss: 0.372, loss_box_reg: 0.106, loss_classifier: 0.099, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 232
[ 2023-10-07 21:22:44 ] Completed saving temp checkpoint 1,457.633 ms, 50.10 s total
[ 2023-10-07 21:22:44 ] Completed replacing temp checkpoint with checkpoint 43.304 ms, 50.14 s total
[ 2023-10-07 21:22:44 ] Completed Epoch: 13 batch 233: moving batch data to device 21.574 ms, 50.16 s total
[ 2023-10-07 21:22:44 ] Completed Epoch: 13 batch 233: forward pass 330.986 ms, 50.49 s total
[ 2023-10-07 21:22:44 ] Completed Epoch: 13 batch 233: backward pass 71.645 ms, 50.57 s total
[ 2023-10-07 21:22:46 ] Completed Epoch: 13 batch 233: computing loss 1,834.122 ms, 52.40 s total
EPOCH: [13], BATCH: [233/889], loss: 0.382, loss_box_reg: 0.112, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 233
[ 2023-10-07 21:22:48 ] Completed saving temp checkpoint 1,799.946 ms, 54.20 s total
[ 2023-10-07 21:22:48 ] Completed replacing temp checkpoint with checkpoint 59.325 ms, 54.26 s total
[ 2023-10-07 21:22:48 ] Completed Epoch: 13 batch 234: moving batch data to device 23.381 ms, 54.28 s total
[ 2023-10-07 21:22:48 ] Completed Epoch: 13 batch 234: forward pass 405.964 ms, 54.69 s total
[ 2023-10-07 21:22:48 ] Completed Epoch: 13 batch 234: backward pass 41.439 ms, 54.73 s total
[ 2023-10-07 21:22:49 ] Completed Epoch: 13 batch 234: computing loss 1,020.559 ms, 55.75 s total
EPOCH: [13], BATCH: [234/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 234
[ 2023-10-07 21:22:50 ] Completed saving temp checkpoint 1,025.129 ms, 56.78 s total
[ 2023-10-07 21:22:50 ] Completed replacing temp checkpoint with checkpoint 58.446 ms, 56.83 s total
[ 2023-10-07 21:22:50 ] Completed Epoch: 13 batch 235: moving batch data to device 22.947 ms, 56.86 s total
[ 2023-10-07 21:22:51 ] Completed Epoch: 13 batch 235: forward pass 325.226 ms, 57.18 s total
[ 2023-10-07 21:22:51 ] Completed Epoch: 13 batch 235: backward pass 49.066 ms, 57.23 s total
[ 2023-10-07 21:22:52 ] Completed Epoch: 13 batch 235: computing loss 1,449.461 ms, 58.68 s total
EPOCH: [13], BATCH: [235/889], loss: 0.402, loss_box_reg: 0.117, loss_classifier: 0.109, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 235
[ 2023-10-07 21:22:53 ] Completed saving temp checkpoint 1,067.417 ms, 59.75 s total
[ 2023-10-07 21:22:53 ] Completed replacing temp checkpoint with checkpoint 55.752 ms, 59.80 s total
[ 2023-10-07 21:22:53 ] Completed Epoch: 13 batch 236: moving batch data to device 22.620 ms, 59.83 s total
[ 2023-10-07 21:22:54 ] Completed Epoch: 13 batch 236: forward pass 331.063 ms, 60.16 s total
[ 2023-10-07 21:22:54 ] Completed Epoch: 13 batch 236: backward pass 85.726 ms, 60.24 s total
[ 2023-10-07 21:22:55 ] Completed Epoch: 13 batch 236: computing loss 1,595.707 ms, 61.84 s total
EPOCH: [13], BATCH: [236/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.103, loss_mask: 0.131, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 236
[ 2023-10-07 21:22:57 ] Completed saving temp checkpoint 1,369.757 ms, 63.21 s total
[ 2023-10-07 21:22:57 ] Completed replacing temp checkpoint with checkpoint 66.424 ms, 63.27 s total
[ 2023-10-07 21:22:57 ] Completed Epoch: 13 batch 237: moving batch data to device 25.064 ms, 63.30 s total
[ 2023-10-07 21:22:57 ] Completed Epoch: 13 batch 237: forward pass 311.921 ms, 63.61 s total
[ 2023-10-07 21:22:57 ] Completed Epoch: 13 batch 237: backward pass 41.054 ms, 63.65 s total
[ 2023-10-07 21:22:59 ] Completed Epoch: 13 batch 237: computing loss 1,385.378 ms, 65.04 s total
EPOCH: [13], BATCH: [237/889], loss: 0.409, loss_box_reg: 0.120, loss_classifier: 0.115, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 237
[ 2023-10-07 21:23:00 ] Completed saving temp checkpoint 1,781.589 ms, 66.82 s total
[ 2023-10-07 21:23:00 ] Completed replacing temp checkpoint with checkpoint 59.599 ms, 66.88 s total
[ 2023-10-07 21:23:00 ] Completed Epoch: 13 batch 238: moving batch data to device 6.168 ms, 66.89 s total
[ 2023-10-07 21:23:01 ] Completed Epoch: 13 batch 238: forward pass 447.173 ms, 67.33 s total
[ 2023-10-07 21:23:01 ] Completed Epoch: 13 batch 238: backward pass 67.778 ms, 67.40 s total
[ 2023-10-07 21:23:02 ] Completed Epoch: 13 batch 238: computing loss 857.329 ms, 68.26 s total
EPOCH: [13], BATCH: [238/889], loss: 0.406, loss_box_reg: 0.118, loss_classifier: 0.106, loss_mask: 0.135, loss_objectness: 0.019, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 238
[ 2023-10-07 21:23:03 ] Completed saving temp checkpoint 1,210.376 ms, 69.47 s total
[ 2023-10-07 21:23:03 ] Completed replacing temp checkpoint with checkpoint 72.455 ms, 69.54 s total
[ 2023-10-07 21:23:03 ] Completed Epoch: 13 batch 239: moving batch data to device 21.724 ms, 69.56 s total
[ 2023-10-07 21:23:03 ] Completed Epoch: 13 batch 239: forward pass 322.209 ms, 69.88 s total
[ 2023-10-07 21:23:04 ] Completed Epoch: 13 batch 239: backward pass 67.942 ms, 69.95 s total
[ 2023-10-07 21:23:05 ] Completed Epoch: 13 batch 239: computing loss 1,338.141 ms, 71.29 s total
EPOCH: [13], BATCH: [239/889], loss: 0.386, loss_box_reg: 0.116, loss_classifier: 0.103, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 239
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 21:42:05 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 21:42:05 ] Completed importing Timer 0.022 ms, 0.00 s total
[ 2023-10-07 21:42:05 ] Completed importing everything else 715.647 ms, 0.72 s total
[ 2023-10-07 21:42:05 ] Completed defined other functions 0.022 ms, 0.72 s total
| distributed init (rank 2): env://
| distributed init (rank 4): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 5): env://
| distributed init (rank 3): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 21:42:13 ] Completed main preliminaries 7,812.464 ms, 8.53 s total
loading annotations into memory...
Done (t=11.64s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 21:42:27 ] Completed loading data 13,569.845 ms, 22.10 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 21:42:27 ] Completed creating data samplers 108.759 ms, 22.21 s total
[ 2023-10-07 21:42:27 ] Completed creating data loaders 0.222 ms, 22.21 s total
[ 2023-10-07 21:42:28 ] Completed creating model and .to(device) 667.254 ms, 22.87 s total
[ 2023-10-07 21:42:30 ] Completed preparing model for distributed training 2,210.429 ms, 25.08 s total
[ 2023-10-07 21:42:30 ] Completed optimizer and scaler 0.542 ms, 25.09 s total
[ 2023-10-07 21:42:30 ] Completed learning rate schedulers 0.132 ms, 25.09 s total
[ 2023-10-07 21:42:31 ] Completed init coco evaluator 977.855 ms, 26.06 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 21:42:32 ] Completed retrieving checkpoint 890.640 ms, 26.95 s total
EPOCH :: 13
[ 2023-10-07 21:42:32 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 21:42:32 ] Completed training preliminaries 0.861 ms, 0.00 s total
Training / resuming epoch 13 from training step 239
[ 2023-10-07 21:42:32 ] Completed Epoch: 13 batch 239: moving batch data to device 262.362 ms, 0.26 s total
[ 2023-10-07 21:42:37 ] Completed Epoch: 13 batch 239: forward pass 5,453.160 ms, 5.72 s total
[ 2023-10-07 21:42:38 ] Completed Epoch: 13 batch 239: backward pass 218.153 ms, 5.93 s total
[ 2023-10-07 21:42:39 ] Completed Epoch: 13 batch 239: computing loss 1,031.126 ms, 6.97 s total
EPOCH: [13], BATCH: [239/889], loss: 0.386, loss_box_reg: 0.116, loss_classifier: 0.102, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 239
[ 2023-10-07 21:42:41 ] Completed saving temp checkpoint 2,624.210 ms, 9.59 s total
[ 2023-10-07 21:42:41 ] Completed replacing temp checkpoint with checkpoint 182.548 ms, 9.77 s total
[ 2023-10-07 21:42:42 ] Completed Epoch: 13 batch 240: moving batch data to device 61.757 ms, 9.83 s total
[ 2023-10-07 21:42:42 ] Completed Epoch: 13 batch 240: forward pass 377.813 ms, 10.21 s total
[ 2023-10-07 21:42:42 ] Completed Epoch: 13 batch 240: backward pass 72.774 ms, 10.28 s total
[ 2023-10-07 21:42:44 ] Completed Epoch: 13 batch 240: computing loss 1,736.049 ms, 12.02 s total
EPOCH: [13], BATCH: [240/889], loss: 0.362, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 240
[ 2023-10-07 21:42:46 ] Completed saving temp checkpoint 2,136.967 ms, 14.16 s total
[ 2023-10-07 21:42:46 ] Completed replacing temp checkpoint with checkpoint 98.776 ms, 14.26 s total
[ 2023-10-07 21:42:46 ] Completed Epoch: 13 batch 241: moving batch data to device 19.954 ms, 14.28 s total
[ 2023-10-07 21:42:46 ] Completed Epoch: 13 batch 241: forward pass 325.517 ms, 14.60 s total
[ 2023-10-07 21:42:47 ] Completed Epoch: 13 batch 241: backward pass 365.055 ms, 14.97 s total
[ 2023-10-07 21:42:48 ] Completed Epoch: 13 batch 241: computing loss 1,432.836 ms, 16.40 s total
EPOCH: [13], BATCH: [241/889], loss: 0.410, loss_box_reg: 0.123, loss_classifier: 0.103, loss_mask: 0.135, loss_objectness: 0.018, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 241
[ 2023-10-07 21:42:50 ] Completed saving temp checkpoint 1,773.419 ms, 18.17 s total
[ 2023-10-07 21:42:50 ] Completed replacing temp checkpoint with checkpoint 38.498 ms, 18.21 s total
[ 2023-10-07 21:42:50 ] Completed Epoch: 13 batch 242: moving batch data to device 19.921 ms, 18.23 s total
[ 2023-10-07 21:42:50 ] Completed Epoch: 13 batch 242: forward pass 378.703 ms, 18.61 s total
[ 2023-10-07 21:42:50 ] Completed Epoch: 13 batch 242: backward pass 85.965 ms, 18.70 s total
[ 2023-10-07 21:42:52 ] Completed Epoch: 13 batch 242: computing loss 1,625.480 ms, 20.32 s total
EPOCH: [13], BATCH: [242/889], loss: 0.378, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 242
[ 2023-10-07 21:42:54 ] Completed saving temp checkpoint 2,089.049 ms, 22.41 s total
[ 2023-10-07 21:42:54 ] Completed replacing temp checkpoint with checkpoint 65.335 ms, 22.48 s total
[ 2023-10-07 21:42:54 ] Completed Epoch: 13 batch 243: moving batch data to device 20.021 ms, 22.50 s total
[ 2023-10-07 21:42:55 ] Completed Epoch: 13 batch 243: forward pass 399.342 ms, 22.90 s total
[ 2023-10-07 21:42:55 ] Completed Epoch: 13 batch 243: backward pass 66.536 ms, 22.96 s total
[ 2023-10-07 21:42:57 ] Completed Epoch: 13 batch 243: computing loss 1,854.960 ms, 24.82 s total
EPOCH: [13], BATCH: [243/889], loss: 0.413, loss_box_reg: 0.129, loss_classifier: 0.110, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 243
[ 2023-10-07 21:42:59 ] Completed saving temp checkpoint 1,999.818 ms, 26.82 s total
[ 2023-10-07 21:42:59 ] Completed replacing temp checkpoint with checkpoint 81.612 ms, 26.90 s total
[ 2023-10-07 21:42:59 ] Completed Epoch: 13 batch 244: moving batch data to device 24.608 ms, 26.92 s total
[ 2023-10-07 21:42:59 ] Completed Epoch: 13 batch 244: forward pass 317.442 ms, 27.24 s total
[ 2023-10-07 21:42:59 ] Completed Epoch: 13 batch 244: backward pass 185.203 ms, 27.43 s total
[ 2023-10-07 21:43:01 ] Completed Epoch: 13 batch 244: computing loss 1,756.140 ms, 29.18 s total
EPOCH: [13], BATCH: [244/889], loss: 0.422, loss_box_reg: 0.131, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.035
Saving checkpoint at epoch 13 train batch 244
[ 2023-10-07 21:43:02 ] Completed saving temp checkpoint 1,431.293 ms, 30.61 s total
[ 2023-10-07 21:43:02 ] Completed replacing temp checkpoint with checkpoint 53.942 ms, 30.67 s total
[ 2023-10-07 21:43:02 ] Completed Epoch: 13 batch 245: moving batch data to device 19.163 ms, 30.69 s total
[ 2023-10-07 21:43:03 ] Completed Epoch: 13 batch 245: forward pass 325.542 ms, 31.01 s total
[ 2023-10-07 21:43:03 ] Completed Epoch: 13 batch 245: backward pass 113.673 ms, 31.13 s total
[ 2023-10-07 21:43:05 ] Completed Epoch: 13 batch 245: computing loss 2,261.558 ms, 33.39 s total
EPOCH: [13], BATCH: [245/889], loss: 0.403, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 245
[ 2023-10-07 21:43:07 ] Completed saving temp checkpoint 1,857.590 ms, 35.24 s total
[ 2023-10-07 21:43:07 ] Completed replacing temp checkpoint with checkpoint 62.223 ms, 35.31 s total
[ 2023-10-07 21:43:07 ] Completed Epoch: 13 batch 246: moving batch data to device 21.174 ms, 35.33 s total
[ 2023-10-07 21:43:07 ] Completed Epoch: 13 batch 246: forward pass 335.024 ms, 35.66 s total
[ 2023-10-07 21:43:07 ] Completed Epoch: 13 batch 246: backward pass 75.994 ms, 35.74 s total
[ 2023-10-07 21:43:09 ] Completed Epoch: 13 batch 246: computing loss 1,774.734 ms, 37.51 s total
EPOCH: [13], BATCH: [246/889], loss: 0.395, loss_box_reg: 0.118, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 246
[ 2023-10-07 21:43:11 ] Completed saving temp checkpoint 1,970.926 ms, 39.48 s total
[ 2023-10-07 21:43:11 ] Completed replacing temp checkpoint with checkpoint 57.848 ms, 39.54 s total
[ 2023-10-07 21:43:11 ] Completed Epoch: 13 batch 247: moving batch data to device 2.924 ms, 39.55 s total
[ 2023-10-07 21:43:12 ] Completed Epoch: 13 batch 247: forward pass 445.254 ms, 39.99 s total
[ 2023-10-07 21:43:12 ] Completed Epoch: 13 batch 247: backward pass 65.274 ms, 40.06 s total
[ 2023-10-07 21:43:13 ] Completed Epoch: 13 batch 247: computing loss 1,226.314 ms, 41.28 s total
EPOCH: [13], BATCH: [247/889], loss: 0.419, loss_box_reg: 0.124, loss_classifier: 0.109, loss_mask: 0.137, loss_objectness: 0.016, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 247
[ 2023-10-07 21:43:14 ] Completed saving temp checkpoint 1,399.100 ms, 42.68 s total
[ 2023-10-07 21:43:14 ] Completed replacing temp checkpoint with checkpoint 35.921 ms, 42.72 s total
[ 2023-10-07 21:43:14 ] Completed Epoch: 13 batch 248: moving batch data to device 20.987 ms, 42.74 s total
[ 2023-10-07 21:43:15 ] Completed Epoch: 13 batch 248: forward pass 327.202 ms, 43.07 s total
[ 2023-10-07 21:43:15 ] Completed Epoch: 13 batch 248: backward pass 357.232 ms, 43.42 s total
[ 2023-10-07 21:43:17 ] Completed Epoch: 13 batch 248: computing loss 1,445.853 ms, 44.87 s total
EPOCH: [13], BATCH: [248/889], loss: 0.378, loss_box_reg: 0.108, loss_classifier: 0.095, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 248
[ 2023-10-07 21:43:19 ] Completed saving temp checkpoint 1,989.372 ms, 46.86 s total
[ 2023-10-07 21:43:19 ] Completed replacing temp checkpoint with checkpoint 75.888 ms, 46.93 s total
[ 2023-10-07 21:43:19 ] Completed Epoch: 13 batch 249: moving batch data to device 21.782 ms, 46.96 s total
[ 2023-10-07 21:43:19 ] Completed Epoch: 13 batch 249: forward pass 310.365 ms, 47.27 s total
[ 2023-10-07 21:43:19 ] Completed Epoch: 13 batch 249: backward pass 45.444 ms, 47.31 s total
[ 2023-10-07 21:43:21 ] Completed Epoch: 13 batch 249: computing loss 2,213.336 ms, 49.52 s total
EPOCH: [13], BATCH: [249/889], loss: 0.384, loss_box_reg: 0.113, loss_classifier: 0.098, loss_mask: 0.134, loss_objectness: 0.013, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 249
[ 2023-10-07 21:43:25 ] Completed saving temp checkpoint 3,907.823 ms, 53.43 s total
[ 2023-10-07 21:43:25 ] Completed replacing temp checkpoint with checkpoint 100.264 ms, 53.53 s total
[ 2023-10-07 21:43:25 ] Completed Epoch: 13 batch 250: moving batch data to device 6.145 ms, 53.54 s total
[ 2023-10-07 21:43:26 ] Completed Epoch: 13 batch 250: forward pass 434.978 ms, 53.97 s total
[ 2023-10-07 21:43:26 ] Completed Epoch: 13 batch 250: backward pass 77.360 ms, 54.05 s total
[ 2023-10-07 21:43:28 ] Completed Epoch: 13 batch 250: computing loss 1,765.851 ms, 55.82 s total
EPOCH: [13], BATCH: [250/889], loss: 0.395, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 250
[ 2023-10-07 21:43:33 ] Completed saving temp checkpoint 5,878.672 ms, 61.70 s total
[ 2023-10-07 21:43:33 ] Completed replacing temp checkpoint with checkpoint 68.456 ms, 61.76 s total
[ 2023-10-07 21:43:33 ] Completed Epoch: 13 batch 251: moving batch data to device 8.696 ms, 61.77 s total
[ 2023-10-07 21:43:34 ] Completed Epoch: 13 batch 251: forward pass 421.775 ms, 62.19 s total
[ 2023-10-07 21:43:34 ] Completed Epoch: 13 batch 251: backward pass 75.791 ms, 62.27 s total
[ 2023-10-07 21:43:36 ] Completed Epoch: 13 batch 251: computing loss 2,340.480 ms, 64.61 s total
EPOCH: [13], BATCH: [251/889], loss: 0.405, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 251
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 22:01:27 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 22:01:27 ] Completed importing Timer 0.023 ms, 0.00 s total
[ 2023-10-07 22:01:28 ] Completed importing everything else 551.466 ms, 0.55 s total
[ 2023-10-07 22:01:28 ] Completed defined other functions 0.025 ms, 0.55 s total
| distributed init (rank 5): env://
| distributed init (rank 3): env://
| distributed init (rank 2): env://
| distributed init (rank 1): env://
| distributed init (rank 4): env://
| distributed init (rank 0): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 22:01:36 ] Completed main preliminaries 8,221.088 ms, 8.77 s total
loading annotations into memory...
Done (t=11.85s)
creating index...
index created!
loading annotations into memory...
Done (t=0.29s)
creating index...
index created!
[ 2023-10-07 22:01:50 ] Completed loading data 14,007.493 ms, 22.78 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 22:01:50 ] Completed creating data samplers 128.170 ms, 22.91 s total
[ 2023-10-07 22:01:50 ] Completed creating data loaders 0.241 ms, 22.91 s total
[ 2023-10-07 22:01:51 ] Completed creating model and .to(device) 679.890 ms, 23.59 s total
[ 2023-10-07 22:01:53 ] Completed preparing model for distributed training 1,982.271 ms, 25.57 s total
[ 2023-10-07 22:01:53 ] Completed optimizer and scaler 0.611 ms, 25.57 s total
[ 2023-10-07 22:01:53 ] Completed learning rate schedulers 0.161 ms, 25.57 s total
[ 2023-10-07 22:01:54 ] Completed init coco evaluator 1,045.979 ms, 26.62 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 22:01:55 ] Completed retrieving checkpoint 807.166 ms, 27.42 s total
EPOCH :: 13
[ 2023-10-07 22:01:55 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 22:01:55 ] Completed training preliminaries 0.870 ms, 0.00 s total
Training / resuming epoch 13 from training step 251
[ 2023-10-07 22:01:55 ] Completed Epoch: 13 batch 251: moving batch data to device 369.538 ms, 0.37 s total
[ 2023-10-07 22:02:01 ] Completed Epoch: 13 batch 251: forward pass 5,466.479 ms, 5.84 s total
[ 2023-10-07 22:02:01 ] Completed Epoch: 13 batch 251: backward pass 265.855 ms, 6.10 s total
[ 2023-10-07 22:02:02 ] Completed Epoch: 13 batch 251: computing loss 992.731 ms, 7.10 s total
EPOCH: [13], BATCH: [251/889], loss: 0.406, loss_box_reg: 0.124, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 251
[ 2023-10-07 22:02:04 ] Completed saving temp checkpoint 1,786.217 ms, 8.88 s total
[ 2023-10-07 22:02:04 ] Completed replacing temp checkpoint with checkpoint 166.190 ms, 9.05 s total
[ 2023-10-07 22:02:04 ] Completed Epoch: 13 batch 252: moving batch data to device 19.190 ms, 9.07 s total
[ 2023-10-07 22:02:04 ] Completed Epoch: 13 batch 252: forward pass 321.590 ms, 9.39 s total
[ 2023-10-07 22:02:05 ] Completed Epoch: 13 batch 252: backward pass 387.156 ms, 9.78 s total
[ 2023-10-07 22:02:06 ] Completed Epoch: 13 batch 252: computing loss 1,541.009 ms, 11.32 s total
EPOCH: [13], BATCH: [252/889], loss: 0.392, loss_box_reg: 0.115, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 252
[ 2023-10-07 22:02:08 ] Completed saving temp checkpoint 1,723.546 ms, 13.04 s total
[ 2023-10-07 22:02:08 ] Completed replacing temp checkpoint with checkpoint 52.678 ms, 13.09 s total
[ 2023-10-07 22:02:08 ] Completed Epoch: 13 batch 253: moving batch data to device 18.802 ms, 13.11 s total
[ 2023-10-07 22:02:08 ] Completed Epoch: 13 batch 253: forward pass 317.156 ms, 13.43 s total
[ 2023-10-07 22:02:08 ] Completed Epoch: 13 batch 253: backward pass 78.653 ms, 13.51 s total
[ 2023-10-07 22:02:10 ] Completed Epoch: 13 batch 253: computing loss 1,946.918 ms, 15.45 s total
EPOCH: [13], BATCH: [253/889], loss: 0.402, loss_box_reg: 0.120, loss_classifier: 0.107, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 253
[ 2023-10-07 22:02:12 ] Completed saving temp checkpoint 1,527.802 ms, 16.98 s total
[ 2023-10-07 22:02:12 ] Completed replacing temp checkpoint with checkpoint 44.723 ms, 17.03 s total
[ 2023-10-07 22:02:12 ] Completed Epoch: 13 batch 254: moving batch data to device 9.931 ms, 17.04 s total
[ 2023-10-07 22:02:12 ] Completed Epoch: 13 batch 254: forward pass 380.784 ms, 17.42 s total
[ 2023-10-07 22:02:12 ] Completed Epoch: 13 batch 254: backward pass 70.222 ms, 17.49 s total
[ 2023-10-07 22:02:13 ] Completed Epoch: 13 batch 254: computing loss 1,239.490 ms, 18.73 s total
EPOCH: [13], BATCH: [254/889], loss: 0.406, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.036
Saving checkpoint at epoch 13 train batch 254
[ 2023-10-07 22:02:15 ] Completed saving temp checkpoint 1,230.818 ms, 19.96 s total
[ 2023-10-07 22:02:15 ] Completed replacing temp checkpoint with checkpoint 48.286 ms, 20.01 s total
[ 2023-10-07 22:02:15 ] Completed Epoch: 13 batch 255: moving batch data to device 27.356 ms, 20.03 s total
[ 2023-10-07 22:02:15 ] Completed Epoch: 13 batch 255: forward pass 314.359 ms, 20.35 s total
[ 2023-10-07 22:02:15 ] Completed Epoch: 13 batch 255: backward pass 71.528 ms, 20.42 s total
[ 2023-10-07 22:02:17 ] Completed Epoch: 13 batch 255: computing loss 1,848.888 ms, 22.27 s total
EPOCH: [13], BATCH: [255/889], loss: 0.380, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.131, loss_objectness: 0.018, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 255
[ 2023-10-07 22:02:19 ] Completed saving temp checkpoint 1,828.894 ms, 24.10 s total
[ 2023-10-07 22:02:19 ] Completed replacing temp checkpoint with checkpoint 89.585 ms, 24.19 s total
[ 2023-10-07 22:02:19 ] Completed Epoch: 13 batch 256: moving batch data to device 19.999 ms, 24.21 s total
[ 2023-10-07 22:02:19 ] Completed Epoch: 13 batch 256: forward pass 323.841 ms, 24.53 s total
[ 2023-10-07 22:02:19 ] Completed Epoch: 13 batch 256: backward pass 74.606 ms, 24.61 s total
[ 2023-10-07 22:02:21 ] Completed Epoch: 13 batch 256: computing loss 1,540.043 ms, 26.15 s total
EPOCH: [13], BATCH: [256/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.095, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 256
[ 2023-10-07 22:02:23 ] Completed saving temp checkpoint 1,861.753 ms, 28.01 s total
[ 2023-10-07 22:02:23 ] Completed replacing temp checkpoint with checkpoint 63.815 ms, 28.07 s total
[ 2023-10-07 22:02:23 ] Completed Epoch: 13 batch 257: moving batch data to device 3.793 ms, 28.08 s total
[ 2023-10-07 22:02:23 ] Completed Epoch: 13 batch 257: forward pass 436.565 ms, 28.51 s total
[ 2023-10-07 22:02:23 ] Completed Epoch: 13 batch 257: backward pass 68.704 ms, 28.58 s total
[ 2023-10-07 22:02:25 ] Completed Epoch: 13 batch 257: computing loss 1,472.676 ms, 30.05 s total
EPOCH: [13], BATCH: [257/889], loss: 0.399, loss_box_reg: 0.114, loss_classifier: 0.098, loss_mask: 0.141, loss_objectness: 0.019, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 257
[ 2023-10-07 22:02:26 ] Completed saving temp checkpoint 1,298.303 ms, 31.35 s total
[ 2023-10-07 22:02:26 ] Completed replacing temp checkpoint with checkpoint 57.357 ms, 31.41 s total
[ 2023-10-07 22:02:26 ] Completed Epoch: 13 batch 258: moving batch data to device 23.130 ms, 31.43 s total
[ 2023-10-07 22:02:26 ] Completed Epoch: 13 batch 258: forward pass 316.621 ms, 31.75 s total
[ 2023-10-07 22:02:27 ] Completed Epoch: 13 batch 258: backward pass 109.746 ms, 31.86 s total
[ 2023-10-07 22:02:28 ] Completed Epoch: 13 batch 258: computing loss 1,677.634 ms, 33.54 s total
EPOCH: [13], BATCH: [258/889], loss: 0.377, loss_box_reg: 0.111, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 258
[ 2023-10-07 22:02:30 ] Completed saving temp checkpoint 1,626.398 ms, 35.16 s total
[ 2023-10-07 22:02:30 ] Completed replacing temp checkpoint with checkpoint 62.180 ms, 35.22 s total
[ 2023-10-07 22:02:30 ] Completed Epoch: 13 batch 259: moving batch data to device 22.629 ms, 35.25 s total
[ 2023-10-07 22:02:30 ] Completed Epoch: 13 batch 259: forward pass 307.273 ms, 35.55 s total
[ 2023-10-07 22:02:30 ] Completed Epoch: 13 batch 259: backward pass 68.927 ms, 35.62 s total
[ 2023-10-07 22:02:32 ] Completed Epoch: 13 batch 259: computing loss 1,799.651 ms, 37.42 s total
EPOCH: [13], BATCH: [259/889], loss: 0.402, loss_box_reg: 0.118, loss_classifier: 0.106, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 259
[ 2023-10-07 22:02:34 ] Completed saving temp checkpoint 2,124.627 ms, 39.55 s total
[ 2023-10-07 22:02:34 ] Completed replacing temp checkpoint with checkpoint 81.331 ms, 39.63 s total
[ 2023-10-07 22:02:34 ] Completed Epoch: 13 batch 260: moving batch data to device 8.684 ms, 39.64 s total
[ 2023-10-07 22:02:35 ] Completed Epoch: 13 batch 260: forward pass 376.512 ms, 40.01 s total
[ 2023-10-07 22:02:35 ] Completed Epoch: 13 batch 260: backward pass 69.445 ms, 40.08 s total
[ 2023-10-07 22:02:36 ] Completed Epoch: 13 batch 260: computing loss 1,206.590 ms, 41.29 s total
EPOCH: [13], BATCH: [260/889], loss: 0.405, loss_box_reg: 0.124, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 260
[ 2023-10-07 22:02:37 ] Completed saving temp checkpoint 1,188.984 ms, 42.48 s total
[ 2023-10-07 22:02:37 ] Completed replacing temp checkpoint with checkpoint 35.389 ms, 42.51 s total
[ 2023-10-07 22:02:37 ] Completed Epoch: 13 batch 261: moving batch data to device 20.541 ms, 42.53 s total
[ 2023-10-07 22:02:38 ] Completed Epoch: 13 batch 261: forward pass 310.483 ms, 42.85 s total
[ 2023-10-07 22:02:38 ] Completed Epoch: 13 batch 261: backward pass 79.204 ms, 42.92 s total
[ 2023-10-07 22:02:39 ] Completed Epoch: 13 batch 261: computing loss 1,483.295 ms, 44.41 s total
EPOCH: [13], BATCH: [261/889], loss: 0.383, loss_box_reg: 0.116, loss_classifier: 0.094, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 261
[ 2023-10-07 22:02:41 ] Completed saving temp checkpoint 2,135.781 ms, 46.54 s total
[ 2023-10-07 22:02:41 ] Completed replacing temp checkpoint with checkpoint 54.366 ms, 46.60 s total
[ 2023-10-07 22:02:41 ] Completed Epoch: 13 batch 262: moving batch data to device 21.314 ms, 46.62 s total
[ 2023-10-07 22:02:42 ] Completed Epoch: 13 batch 262: forward pass 400.913 ms, 47.02 s total
[ 2023-10-07 22:02:42 ] Completed Epoch: 13 batch 262: backward pass 56.845 ms, 47.08 s total
[ 2023-10-07 22:02:43 ] Completed Epoch: 13 batch 262: computing loss 1,405.219 ms, 48.48 s total
EPOCH: [13], BATCH: [262/889], loss: 0.387, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 262
[ 2023-10-07 22:02:45 ] Completed saving temp checkpoint 2,000.498 ms, 50.48 s total
[ 2023-10-07 22:02:45 ] Completed replacing temp checkpoint with checkpoint 44.713 ms, 50.53 s total
[ 2023-10-07 22:02:45 ] Completed Epoch: 13 batch 263: moving batch data to device 22.140 ms, 50.55 s total
[ 2023-10-07 22:02:46 ] Completed Epoch: 13 batch 263: forward pass 298.351 ms, 50.85 s total
[ 2023-10-07 22:02:46 ] Completed Epoch: 13 batch 263: backward pass 74.333 ms, 50.92 s total
[ 2023-10-07 22:02:47 ] Completed Epoch: 13 batch 263: computing loss 1,807.585 ms, 52.73 s total
EPOCH: [13], BATCH: [263/889], loss: 0.390, loss_box_reg: 0.118, loss_classifier: 0.102, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 263
[ 2023-10-07 22:02:49 ] Completed saving temp checkpoint 1,776.992 ms, 54.51 s total
[ 2023-10-07 22:02:49 ] Completed replacing temp checkpoint with checkpoint 66.338 ms, 54.57 s total
[ 2023-10-07 22:02:49 ] Completed Epoch: 13 batch 264: moving batch data to device 25.119 ms, 54.60 s total
[ 2023-10-07 22:02:50 ] Completed Epoch: 13 batch 264: forward pass 401.078 ms, 55.00 s total
[ 2023-10-07 22:02:50 ] Completed Epoch: 13 batch 264: backward pass 44.110 ms, 55.04 s total
[ 2023-10-07 22:02:51 ] Completed Epoch: 13 batch 264: computing loss 1,322.211 ms, 56.37 s total
EPOCH: [13], BATCH: [264/889], loss: 0.363, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 264
[ 2023-10-07 22:02:53 ] Completed saving temp checkpoint 1,936.250 ms, 58.30 s total
[ 2023-10-07 22:02:53 ] Completed replacing temp checkpoint with checkpoint 79.650 ms, 58.38 s total
[ 2023-10-07 22:02:53 ] Completed Epoch: 13 batch 265: moving batch data to device 23.717 ms, 58.41 s total
[ 2023-10-07 22:02:54 ] Completed Epoch: 13 batch 265: forward pass 396.701 ms, 58.80 s total
[ 2023-10-07 22:02:54 ] Completed Epoch: 13 batch 265: backward pass 104.881 ms, 58.91 s total
[ 2023-10-07 22:02:55 ] Completed Epoch: 13 batch 265: computing loss 1,531.652 ms, 60.44 s total
EPOCH: [13], BATCH: [265/889], loss: 0.380, loss_box_reg: 0.118, loss_classifier: 0.099, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 265
[ 2023-10-07 22:02:57 ] Completed saving temp checkpoint 1,563.136 ms, 62.00 s total
[ 2023-10-07 22:02:57 ] Completed replacing temp checkpoint with checkpoint 41.972 ms, 62.04 s total
[ 2023-10-07 22:02:57 ] Completed Epoch: 13 batch 266: moving batch data to device 21.430 ms, 62.07 s total
[ 2023-10-07 22:02:57 ] Completed Epoch: 13 batch 266: forward pass 307.133 ms, 62.37 s total
[ 2023-10-07 22:02:57 ] Completed Epoch: 13 batch 266: backward pass 34.837 ms, 62.41 s total
[ 2023-10-07 22:02:59 ] Completed Epoch: 13 batch 266: computing loss 2,026.554 ms, 64.43 s total
EPOCH: [13], BATCH: [266/889], loss: 0.375, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 266
[ 2023-10-07 22:03:01 ] Completed saving temp checkpoint 1,754.032 ms, 66.19 s total
[ 2023-10-07 22:03:01 ] Completed replacing temp checkpoint with checkpoint 40.575 ms, 66.23 s total
[ 2023-10-07 22:03:01 ] Completed Epoch: 13 batch 267: moving batch data to device 23.131 ms, 66.25 s total
[ 2023-10-07 22:03:01 ] Completed Epoch: 13 batch 267: forward pass 420.030 ms, 66.67 s total
[ 2023-10-07 22:03:01 ] Completed Epoch: 13 batch 267: backward pass 89.221 ms, 66.76 s total
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 22:19:24 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 22:19:24 ] Completed importing Timer 0.026 ms, 0.00 s total
[ 2023-10-07 22:19:25 ] Completed importing everything else 466.542 ms, 0.47 s total
[ 2023-10-07 22:19:25 ] Completed defined other functions 0.024 ms, 0.47 s total
| distributed init (rank 2): env://
| distributed init (rank 1): env://
| distributed init (rank 5): env://
| distributed init (rank 4): env://
| distributed init (rank 0): env://
| distributed init (rank 3): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 22:19:31 ] Completed main preliminaries 5,583.766 ms, 6.05 s total
loading annotations into memory...
Done (t=12.21s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 22:19:45 ] Completed loading data 14,122.596 ms, 20.17 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 22:19:45 ] Completed creating data samplers 101.259 ms, 20.27 s total
[ 2023-10-07 22:19:45 ] Completed creating data loaders 0.205 ms, 20.27 s total
[ 2023-10-07 22:19:46 ] Completed creating model and .to(device) 1,652.175 ms, 21.93 s total
[ 2023-10-07 22:19:54 ] Completed preparing model for distributed training 7,941.974 ms, 29.87 s total
[ 2023-10-07 22:19:54 ] Completed optimizer and scaler 0.584 ms, 29.87 s total
[ 2023-10-07 22:19:54 ] Completed learning rate schedulers 0.222 ms, 29.87 s total
[ 2023-10-07 22:19:55 ] Completed init coco evaluator 1,041.918 ms, 30.91 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 22:19:56 ] Completed retrieving checkpoint 775.672 ms, 31.69 s total
EPOCH :: 13
[ 2023-10-07 22:19:56 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 22:19:56 ] Completed training preliminaries 0.882 ms, 0.00 s total
Training / resuming epoch 13 from training step 267
[ 2023-10-07 22:19:56 ] Completed Epoch: 13 batch 267: moving batch data to device 205.722 ms, 0.21 s total
[ 2023-10-07 22:20:08 ] Completed Epoch: 13 batch 267: forward pass 11,558.690 ms, 11.77 s total
[ 2023-10-07 22:20:08 ] Completed Epoch: 13 batch 267: backward pass 152.627 ms, 11.92 s total
[ 2023-10-07 22:20:10 ] Completed Epoch: 13 batch 267: computing loss 1,967.682 ms, 13.89 s total
EPOCH: [13], BATCH: [267/889], loss: 0.369, loss_box_reg: 0.111, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 267
[ 2023-10-07 22:20:11 ] Completed saving temp checkpoint 970.952 ms, 14.86 s total
[ 2023-10-07 22:20:11 ] Completed replacing temp checkpoint with checkpoint 169.831 ms, 15.03 s total
[ 2023-10-07 22:20:11 ] Completed Epoch: 13 batch 268: moving batch data to device 78.087 ms, 15.10 s total
[ 2023-10-07 22:20:12 ] Completed Epoch: 13 batch 268: forward pass 737.190 ms, 15.84 s total
[ 2023-10-07 22:20:13 ] Completed Epoch: 13 batch 268: backward pass 632.964 ms, 16.47 s total
[ 2023-10-07 22:20:15 ] Completed Epoch: 13 batch 268: computing loss 2,812.743 ms, 19.29 s total
EPOCH: [13], BATCH: [268/889], loss: 0.385, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 268
[ 2023-10-07 22:20:16 ] Completed saving temp checkpoint 1,041.909 ms, 20.33 s total
[ 2023-10-07 22:20:17 ] Completed replacing temp checkpoint with checkpoint 75.034 ms, 20.40 s total
[ 2023-10-07 22:20:17 ] Completed Epoch: 13 batch 269: moving batch data to device 19.674 ms, 20.42 s total
[ 2023-10-07 22:20:17 ] Completed Epoch: 13 batch 269: forward pass 747.699 ms, 21.17 s total
[ 2023-10-07 22:20:18 ] Completed Epoch: 13 batch 269: backward pass 1,111.140 ms, 22.28 s total
[ 2023-10-07 22:20:21 ] Completed Epoch: 13 batch 269: computing loss 2,381.261 ms, 24.66 s total
EPOCH: [13], BATCH: [269/889], loss: 0.387, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 269
[ 2023-10-07 22:20:22 ] Completed saving temp checkpoint 1,040.453 ms, 25.70 s total
[ 2023-10-07 22:20:22 ] Completed replacing temp checkpoint with checkpoint 60.887 ms, 25.77 s total
[ 2023-10-07 22:20:22 ] Completed Epoch: 13 batch 270: moving batch data to device 20.226 ms, 25.79 s total
[ 2023-10-07 22:20:23 ] Completed Epoch: 13 batch 270: forward pass 783.929 ms, 26.57 s total
[ 2023-10-07 22:20:23 ] Completed Epoch: 13 batch 270: backward pass 79.342 ms, 26.65 s total
[ 2023-10-07 22:20:26 ] Completed Epoch: 13 batch 270: computing loss 3,394.605 ms, 30.04 s total
EPOCH: [13], BATCH: [270/889], loss: 0.379, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 270
[ 2023-10-07 22:20:27 ] Completed saving temp checkpoint 1,160.817 ms, 31.20 s total
[ 2023-10-07 22:20:27 ] Completed replacing temp checkpoint with checkpoint 61.697 ms, 31.27 s total
[ 2023-10-07 22:20:27 ] Completed Epoch: 13 batch 271: moving batch data to device 20.911 ms, 31.29 s total
[ 2023-10-07 22:20:28 ] Completed Epoch: 13 batch 271: forward pass 736.234 ms, 32.02 s total
[ 2023-10-07 22:20:28 ] Completed Epoch: 13 batch 271: backward pass 104.959 ms, 32.13 s total
[ 2023-10-07 22:20:32 ] Completed Epoch: 13 batch 271: computing loss 3,363.156 ms, 35.49 s total
EPOCH: [13], BATCH: [271/889], loss: 0.361, loss_box_reg: 0.108, loss_classifier: 0.089, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 271
[ 2023-10-07 22:20:33 ] Completed saving temp checkpoint 1,760.888 ms, 37.25 s total
[ 2023-10-07 22:20:33 ] Completed replacing temp checkpoint with checkpoint 91.717 ms, 37.34 s total
[ 2023-10-07 22:20:34 ] Completed Epoch: 13 batch 272: moving batch data to device 26.558 ms, 37.37 s total
[ 2023-10-07 22:20:34 ] Completed Epoch: 13 batch 272: forward pass 724.626 ms, 38.10 s total
[ 2023-10-07 22:20:35 ] Completed Epoch: 13 batch 272: backward pass 545.061 ms, 38.64 s total
[ 2023-10-07 22:20:38 ] Completed Epoch: 13 batch 272: computing loss 2,895.725 ms, 41.54 s total
EPOCH: [13], BATCH: [272/889], loss: 0.372, loss_box_reg: 0.112, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 272
[ 2023-10-07 22:20:39 ] Completed saving temp checkpoint 995.225 ms, 42.53 s total
[ 2023-10-07 22:20:39 ] Completed replacing temp checkpoint with checkpoint 51.020 ms, 42.58 s total
[ 2023-10-07 22:20:39 ] Completed Epoch: 13 batch 273: moving batch data to device 18.453 ms, 42.60 s total
[ 2023-10-07 22:20:39 ] Completed Epoch: 13 batch 273: forward pass 737.988 ms, 43.34 s total
[ 2023-10-07 22:20:40 ] Completed Epoch: 13 batch 273: backward pass 611.119 ms, 43.95 s total
[ 2023-10-07 22:20:43 ] Completed Epoch: 13 batch 273: computing loss 2,845.232 ms, 46.79 s total
EPOCH: [13], BATCH: [273/889], loss: 0.370, loss_box_reg: 0.107, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 273
[ 2023-10-07 22:20:44 ] Completed saving temp checkpoint 1,214.512 ms, 48.01 s total
[ 2023-10-07 22:20:44 ] Completed replacing temp checkpoint with checkpoint 64.791 ms, 48.07 s total
[ 2023-10-07 22:20:44 ] Completed Epoch: 13 batch 274: moving batch data to device 24.920 ms, 48.10 s total
[ 2023-10-07 22:20:45 ] Completed Epoch: 13 batch 274: forward pass 740.762 ms, 48.84 s total
[ 2023-10-07 22:20:45 ] Completed Epoch: 13 batch 274: backward pass 65.252 ms, 48.91 s total
[ 2023-10-07 22:20:48 ] Completed Epoch: 13 batch 274: computing loss 3,373.270 ms, 52.28 s total
EPOCH: [13], BATCH: [274/889], loss: 0.401, loss_box_reg: 0.121, loss_classifier: 0.099, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 274
[ 2023-10-07 22:20:49 ] Completed saving temp checkpoint 1,038.440 ms, 53.32 s total
[ 2023-10-07 22:20:50 ] Completed replacing temp checkpoint with checkpoint 72.382 ms, 53.39 s total
[ 2023-10-07 22:20:50 ] Completed Epoch: 13 batch 275: moving batch data to device 22.142 ms, 53.41 s total
[ 2023-10-07 22:20:50 ] Completed Epoch: 13 batch 275: forward pass 743.514 ms, 54.15 s total
[ 2023-10-07 22:20:50 ] Completed Epoch: 13 batch 275: backward pass 61.960 ms, 54.22 s total
[ 2023-10-07 22:20:54 ] Completed Epoch: 13 batch 275: computing loss 3,437.587 ms, 57.65 s total
EPOCH: [13], BATCH: [275/889], loss: 0.366, loss_box_reg: 0.109, loss_classifier: 0.089, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 275
[ 2023-10-07 22:20:55 ] Completed saving temp checkpoint 970.030 ms, 58.62 s total
[ 2023-10-07 22:20:55 ] Completed replacing temp checkpoint with checkpoint 49.988 ms, 58.67 s total
[ 2023-10-07 22:20:55 ] Completed Epoch: 13 batch 276: moving batch data to device 22.796 ms, 58.70 s total
[ 2023-10-07 22:20:56 ] Completed Epoch: 13 batch 276: forward pass 732.531 ms, 59.43 s total
[ 2023-10-07 22:20:56 ] Completed Epoch: 13 batch 276: backward pass 72.573 ms, 59.50 s total
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 22:34:12 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 22:34:12 ] Completed importing Timer 0.022 ms, 0.00 s total
[ 2023-10-07 22:34:12 ] Completed importing everything else 460.062 ms, 0.46 s total
[ 2023-10-07 22:34:12 ] Completed defined other functions 0.026 ms, 0.46 s total
| distributed init (rank 5): env://
| distributed init (rank 3): env://
| distributed init (rank 0): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
| distributed init (rank 4): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 22:34:22 ] Completed main preliminaries 10,129.197 ms, 10.59 s total
loading annotations into memory...
Done (t=12.35s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 22:34:37 ] Completed loading data 14,289.510 ms, 24.88 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 22:34:37 ] Completed creating data samplers 102.005 ms, 24.98 s total
[ 2023-10-07 22:34:37 ] Completed creating data loaders 0.199 ms, 24.98 s total
[ 2023-10-07 22:34:38 ] Completed creating model and .to(device) 920.146 ms, 25.90 s total
[ 2023-10-07 22:34:45 ] Completed preparing model for distributed training 7,700.027 ms, 33.60 s total
[ 2023-10-07 22:34:45 ] Completed optimizer and scaler 0.566 ms, 33.60 s total
[ 2023-10-07 22:34:45 ] Completed learning rate schedulers 0.217 ms, 33.60 s total
[ 2023-10-07 22:34:46 ] Completed init coco evaluator 963.893 ms, 34.57 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 22:34:47 ] Completed retrieving checkpoint 992.096 ms, 35.56 s total
EPOCH :: 13
[ 2023-10-07 22:34:47 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 22:34:47 ] Completed training preliminaries 0.844 ms, 0.00 s total
Training / resuming epoch 13 from training step 276
[ 2023-10-07 22:34:48 ] Completed Epoch: 13 batch 276: moving batch data to device 253.728 ms, 0.25 s total
[ 2023-10-07 22:34:59 ] Completed Epoch: 13 batch 276: forward pass 11,418.600 ms, 11.67 s total
[ 2023-10-07 22:34:59 ] Completed Epoch: 13 batch 276: backward pass 194.640 ms, 11.87 s total
[ 2023-10-07 22:35:01 ] Completed Epoch: 13 batch 276: computing loss 1,958.372 ms, 13.83 s total
EPOCH: [13], BATCH: [276/889], loss: 0.401, loss_box_reg: 0.122, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 276
[ 2023-10-07 22:35:02 ] Completed saving temp checkpoint 1,081.217 ms, 14.91 s total
[ 2023-10-07 22:35:03 ] Completed replacing temp checkpoint with checkpoint 192.169 ms, 15.10 s total
[ 2023-10-07 22:35:03 ] Completed Epoch: 13 batch 277: moving batch data to device 20.577 ms, 15.12 s total
[ 2023-10-07 22:35:03 ] Completed Epoch: 13 batch 277: forward pass 739.676 ms, 15.86 s total
[ 2023-10-07 22:35:03 ] Completed Epoch: 13 batch 277: backward pass 70.292 ms, 15.93 s total
[ 2023-10-07 22:35:07 ] Completed Epoch: 13 batch 277: computing loss 3,430.871 ms, 19.36 s total
EPOCH: [13], BATCH: [277/889], loss: 0.386, loss_box_reg: 0.116, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 277
[ 2023-10-07 22:35:08 ] Completed saving temp checkpoint 1,111.717 ms, 20.47 s total
[ 2023-10-07 22:35:08 ] Completed replacing temp checkpoint with checkpoint 52.866 ms, 20.53 s total
[ 2023-10-07 22:35:08 ] Completed Epoch: 13 batch 278: moving batch data to device 20.167 ms, 20.55 s total
[ 2023-10-07 22:35:09 ] Completed Epoch: 13 batch 278: forward pass 754.879 ms, 21.30 s total
[ 2023-10-07 22:35:09 ] Completed Epoch: 13 batch 278: backward pass 260.566 ms, 21.56 s total
[ 2023-10-07 22:35:12 ] Completed Epoch: 13 batch 278: computing loss 3,320.664 ms, 24.88 s total
EPOCH: [13], BATCH: [278/889], loss: 0.376, loss_box_reg: 0.114, loss_classifier: 0.090, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 278
[ 2023-10-07 22:35:14 ] Completed saving temp checkpoint 1,234.766 ms, 26.12 s total
[ 2023-10-07 22:35:14 ] Completed replacing temp checkpoint with checkpoint 69.776 ms, 26.19 s total
[ 2023-10-07 22:35:14 ] Completed Epoch: 13 batch 279: moving batch data to device 18.645 ms, 26.21 s total
[ 2023-10-07 22:35:14 ] Completed Epoch: 13 batch 279: forward pass 756.969 ms, 26.96 s total
[ 2023-10-07 22:35:15 ] Completed Epoch: 13 batch 279: backward pass 572.414 ms, 27.53 s total
[ 2023-10-07 22:35:18 ] Completed Epoch: 13 batch 279: computing loss 2,913.303 ms, 30.45 s total
EPOCH: [13], BATCH: [279/889], loss: 0.385, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 279
[ 2023-10-07 22:35:19 ] Completed saving temp checkpoint 1,052.600 ms, 31.50 s total
[ 2023-10-07 22:35:19 ] Completed replacing temp checkpoint with checkpoint 64.452 ms, 31.56 s total
[ 2023-10-07 22:35:19 ] Completed Epoch: 13 batch 280: moving batch data to device 27.875 ms, 31.59 s total
[ 2023-10-07 22:35:20 ] Completed Epoch: 13 batch 280: forward pass 729.686 ms, 32.32 s total
[ 2023-10-07 22:35:20 ] Completed Epoch: 13 batch 280: backward pass 96.007 ms, 32.42 s total
[ 2023-10-07 22:35:23 ] Completed Epoch: 13 batch 280: computing loss 3,351.986 ms, 35.77 s total
EPOCH: [13], BATCH: [280/889], loss: 0.358, loss_box_reg: 0.102, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 280
[ 2023-10-07 22:35:24 ] Completed saving temp checkpoint 1,048.768 ms, 36.82 s total
[ 2023-10-07 22:35:24 ] Completed replacing temp checkpoint with checkpoint 71.099 ms, 36.89 s total
[ 2023-10-07 22:35:24 ] Completed Epoch: 13 batch 281: moving batch data to device 19.592 ms, 36.91 s total
[ 2023-10-07 22:35:25 ] Completed Epoch: 13 batch 281: forward pass 730.608 ms, 37.64 s total
[ 2023-10-07 22:35:25 ] Completed Epoch: 13 batch 281: backward pass 132.007 ms, 37.77 s total
[ 2023-10-07 22:35:28 ] Completed Epoch: 13 batch 281: computing loss 3,182.632 ms, 40.96 s total
EPOCH: [13], BATCH: [281/889], loss: 0.393, loss_box_reg: 0.120, loss_classifier: 0.104, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 281
[ 2023-10-07 22:35:29 ] Completed saving temp checkpoint 1,117.755 ms, 42.07 s total
[ 2023-10-07 22:35:30 ] Completed replacing temp checkpoint with checkpoint 70.138 ms, 42.14 s total
[ 2023-10-07 22:35:30 ] Completed Epoch: 13 batch 282: moving batch data to device 22.294 ms, 42.17 s total
[ 2023-10-07 22:35:30 ] Completed Epoch: 13 batch 282: forward pass 753.400 ms, 42.92 s total
[ 2023-10-07 22:35:30 ] Completed Epoch: 13 batch 282: backward pass 71.580 ms, 42.99 s total
[ 2023-10-07 22:35:34 ] Completed Epoch: 13 batch 282: computing loss 3,320.048 ms, 46.31 s total
EPOCH: [13], BATCH: [282/889], loss: 0.378, loss_box_reg: 0.119, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 282
[ 2023-10-07 22:35:35 ] Completed saving temp checkpoint 882.555 ms, 47.19 s total
[ 2023-10-07 22:35:35 ] Completed replacing temp checkpoint with checkpoint 43.350 ms, 47.24 s total
[ 2023-10-07 22:35:35 ] Completed Epoch: 13 batch 283: moving batch data to device 19.797 ms, 47.26 s total
[ 2023-10-07 22:35:35 ] Completed Epoch: 13 batch 283: forward pass 749.036 ms, 48.00 s total
[ 2023-10-07 22:35:35 ] Completed Epoch: 13 batch 283: backward pass 84.646 ms, 48.09 s total
[ 2023-10-07 22:35:39 ] Completed Epoch: 13 batch 283: computing loss 3,356.819 ms, 51.45 s total
EPOCH: [13], BATCH: [283/889], loss: 0.396, loss_box_reg: 0.118, loss_classifier: 0.103, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 283
[ 2023-10-07 22:35:41 ] Completed saving temp checkpoint 1,648.975 ms, 53.10 s total
[ 2023-10-07 22:35:41 ] Completed replacing temp checkpoint with checkpoint 102.755 ms, 53.20 s total
[ 2023-10-07 22:35:41 ] Completed Epoch: 13 batch 284: moving batch data to device 24.429 ms, 53.22 s total
[ 2023-10-07 22:35:41 ] Completed Epoch: 13 batch 284: forward pass 754.868 ms, 53.98 s total
[ 2023-10-07 22:35:41 ] Completed Epoch: 13 batch 284: backward pass 74.388 ms, 54.05 s total
[ 2023-10-07 22:35:45 ] Completed Epoch: 13 batch 284: computing loss 3,281.365 ms, 57.33 s total
EPOCH: [13], BATCH: [284/889], loss: 0.372, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.124, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 284
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 22:48:58 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 22:48:58 ] Completed importing Timer 0.021 ms, 0.00 s total
[ 2023-10-07 22:48:59 ] Completed importing everything else 710.724 ms, 0.71 s total
[ 2023-10-07 22:48:59 ] Completed defined other functions 0.028 ms, 0.71 s total
| distributed init (rank 5): env://
| distributed init (rank 3): env://
| distributed init (rank 0): env://
| distributed init (rank 4): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 22:49:02 ] Completed main preliminaries 3,165.165 ms, 3.88 s total
loading annotations into memory...
Done (t=11.57s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 22:49:16 ] Completed loading data 13,583.008 ms, 17.46 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 22:49:16 ] Completed creating data samplers 130.644 ms, 17.59 s total
[ 2023-10-07 22:49:16 ] Completed creating data loaders 0.251 ms, 17.59 s total
[ 2023-10-07 22:49:16 ] Completed creating model and .to(device) 679.967 ms, 18.27 s total
[ 2023-10-07 22:49:19 ] Completed preparing model for distributed training 2,499.317 ms, 20.77 s total
[ 2023-10-07 22:49:19 ] Completed optimizer and scaler 0.532 ms, 20.77 s total
[ 2023-10-07 22:49:19 ] Completed learning rate schedulers 0.120 ms, 20.77 s total
[ 2023-10-07 22:49:20 ] Completed init coco evaluator 973.400 ms, 21.74 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 22:49:21 ] Completed retrieving checkpoint 1,052.511 ms, 22.80 s total
EPOCH :: 13
[ 2023-10-07 22:49:21 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 22:49:21 ] Completed training preliminaries 0.838 ms, 0.00 s total
Training / resuming epoch 13 from training step 284
[ 2023-10-07 22:49:21 ] Completed Epoch: 13 batch 284: moving batch data to device 243.604 ms, 0.24 s total
[ 2023-10-07 22:49:27 ] Completed Epoch: 13 batch 284: forward pass 5,332.136 ms, 5.58 s total
[ 2023-10-07 22:49:27 ] Completed Epoch: 13 batch 284: backward pass 165.398 ms, 5.74 s total
[ 2023-10-07 22:49:28 ] Completed Epoch: 13 batch 284: computing loss 1,159.899 ms, 6.90 s total
EPOCH: [13], BATCH: [284/889], loss: 0.373, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 284
[ 2023-10-07 22:49:29 ] Completed saving temp checkpoint 1,355.206 ms, 8.26 s total
[ 2023-10-07 22:49:29 ] Completed replacing temp checkpoint with checkpoint 147.577 ms, 8.40 s total
[ 2023-10-07 22:49:29 ] Completed Epoch: 13 batch 285: moving batch data to device 23.661 ms, 8.43 s total
[ 2023-10-07 22:49:30 ] Completed Epoch: 13 batch 285: forward pass 321.395 ms, 8.75 s total
[ 2023-10-07 22:49:30 ] Completed Epoch: 13 batch 285: backward pass 73.272 ms, 8.82 s total
[ 2023-10-07 22:49:31 ] Completed Epoch: 13 batch 285: computing loss 1,535.616 ms, 10.36 s total
EPOCH: [13], BATCH: [285/889], loss: 0.399, loss_box_reg: 0.126, loss_classifier: 0.102, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 285
[ 2023-10-07 22:49:33 ] Completed saving temp checkpoint 1,456.228 ms, 11.81 s total
[ 2023-10-07 22:49:33 ] Completed replacing temp checkpoint with checkpoint 51.519 ms, 11.87 s total
[ 2023-10-07 22:49:33 ] Completed Epoch: 13 batch 286: moving batch data to device 21.001 ms, 11.89 s total
[ 2023-10-07 22:49:33 ] Completed Epoch: 13 batch 286: forward pass 311.570 ms, 12.20 s total
[ 2023-10-07 22:49:33 ] Completed Epoch: 13 batch 286: backward pass 143.135 ms, 12.34 s total
[ 2023-10-07 22:49:35 ] Completed Epoch: 13 batch 286: computing loss 1,815.399 ms, 14.16 s total
EPOCH: [13], BATCH: [286/889], loss: 0.396, loss_box_reg: 0.119, loss_classifier: 0.102, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 286
[ 2023-10-07 22:49:37 ] Completed saving temp checkpoint 1,744.326 ms, 15.90 s total
[ 2023-10-07 22:49:37 ] Completed replacing temp checkpoint with checkpoint 58.950 ms, 15.96 s total
[ 2023-10-07 22:49:37 ] Completed Epoch: 13 batch 287: moving batch data to device 22.263 ms, 15.98 s total
[ 2023-10-07 22:49:37 ] Completed Epoch: 13 batch 287: forward pass 332.324 ms, 16.32 s total
[ 2023-10-07 22:49:38 ] Completed Epoch: 13 batch 287: backward pass 392.631 ms, 16.71 s total
[ 2023-10-07 22:49:39 ] Completed Epoch: 13 batch 287: computing loss 1,137.273 ms, 17.85 s total
EPOCH: [13], BATCH: [287/889], loss: 0.388, loss_box_reg: 0.116, loss_classifier: 0.100, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 287
[ 2023-10-07 22:49:40 ] Completed saving temp checkpoint 1,266.818 ms, 19.11 s total
[ 2023-10-07 22:49:40 ] Completed replacing temp checkpoint with checkpoint 50.186 ms, 19.16 s total
[ 2023-10-07 22:49:40 ] Completed Epoch: 13 batch 288: moving batch data to device 73.031 ms, 19.24 s total
[ 2023-10-07 22:49:41 ] Completed Epoch: 13 batch 288: forward pass 331.716 ms, 19.57 s total
[ 2023-10-07 22:49:41 ] Completed Epoch: 13 batch 288: backward pass 88.984 ms, 19.66 s total
[ 2023-10-07 22:49:42 ] Completed Epoch: 13 batch 288: computing loss 1,841.053 ms, 21.50 s total
EPOCH: [13], BATCH: [288/889], loss: 0.398, loss_box_reg: 0.122, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 288
[ 2023-10-07 22:49:44 ] Completed saving temp checkpoint 1,816.615 ms, 23.31 s total
[ 2023-10-07 22:49:44 ] Completed replacing temp checkpoint with checkpoint 82.812 ms, 23.40 s total
[ 2023-10-07 22:49:44 ] Completed Epoch: 13 batch 289: moving batch data to device 20.154 ms, 23.42 s total
[ 2023-10-07 22:49:45 ] Completed Epoch: 13 batch 289: forward pass 325.805 ms, 23.74 s total
[ 2023-10-07 22:49:45 ] Completed Epoch: 13 batch 289: backward pass 87.435 ms, 23.83 s total
[ 2023-10-07 22:49:47 ] Completed Epoch: 13 batch 289: computing loss 2,014.451 ms, 25.84 s total
EPOCH: [13], BATCH: [289/889], loss: 0.381, loss_box_reg: 0.113, loss_classifier: 0.092, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 289
[ 2023-10-07 22:49:49 ] Completed saving temp checkpoint 2,047.516 ms, 27.89 s total
[ 2023-10-07 22:49:49 ] Completed replacing temp checkpoint with checkpoint 58.873 ms, 27.95 s total
[ 2023-10-07 22:49:49 ] Completed Epoch: 13 batch 290: moving batch data to device 2.852 ms, 27.95 s total
[ 2023-10-07 22:49:49 ] Completed Epoch: 13 batch 290: forward pass 435.818 ms, 28.39 s total
[ 2023-10-07 22:49:50 ] Completed Epoch: 13 batch 290: backward pass 370.142 ms, 28.76 s total
[ 2023-10-07 22:49:51 ] Completed Epoch: 13 batch 290: computing loss 1,599.513 ms, 30.36 s total
EPOCH: [13], BATCH: [290/889], loss: 0.391, loss_box_reg: 0.117, loss_classifier: 0.099, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 290
[ 2023-10-07 22:49:53 ] Completed saving temp checkpoint 1,676.741 ms, 32.04 s total
[ 2023-10-07 22:49:53 ] Completed replacing temp checkpoint with checkpoint 65.710 ms, 32.10 s total
[ 2023-10-07 22:49:53 ] Completed Epoch: 13 batch 291: moving batch data to device 7.515 ms, 32.11 s total
[ 2023-10-07 22:49:54 ] Completed Epoch: 13 batch 291: forward pass 446.162 ms, 32.56 s total
[ 2023-10-07 22:49:54 ] Completed Epoch: 13 batch 291: backward pass 38.502 ms, 32.59 s total
[ 2023-10-07 22:49:55 ] Completed Epoch: 13 batch 291: computing loss 1,809.728 ms, 34.40 s total
EPOCH: [13], BATCH: [291/889], loss: 0.397, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.127, loss_objectness: 0.019, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 291
[ 2023-10-07 22:49:57 ] Completed saving temp checkpoint 1,856.743 ms, 36.26 s total
[ 2023-10-07 22:49:57 ] Completed replacing temp checkpoint with checkpoint 54.842 ms, 36.31 s total
[ 2023-10-07 22:49:57 ] Completed Epoch: 13 batch 292: moving batch data to device 21.504 ms, 36.34 s total
[ 2023-10-07 22:49:58 ] Completed Epoch: 13 batch 292: forward pass 324.782 ms, 36.66 s total
[ 2023-10-07 22:49:58 ] Completed Epoch: 13 batch 292: backward pass 75.960 ms, 36.74 s total
[ 2023-10-07 22:49:59 ] Completed Epoch: 13 batch 292: computing loss 1,532.465 ms, 38.27 s total
EPOCH: [13], BATCH: [292/889], loss: 0.367, loss_box_reg: 0.112, loss_classifier: 0.091, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 292
[ 2023-10-07 22:50:01 ] Completed saving temp checkpoint 1,371.852 ms, 39.64 s total
[ 2023-10-07 22:50:01 ] Completed replacing temp checkpoint with checkpoint 52.632 ms, 39.69 s total
[ 2023-10-07 22:50:01 ] Completed Epoch: 13 batch 293: moving batch data to device 21.904 ms, 39.72 s total
[ 2023-10-07 22:50:01 ] Completed Epoch: 13 batch 293: forward pass 315.612 ms, 40.03 s total
[ 2023-10-07 22:50:01 ] Completed Epoch: 13 batch 293: backward pass 74.773 ms, 40.11 s total
[ 2023-10-07 22:50:03 ] Completed Epoch: 13 batch 293: computing loss 1,460.755 ms, 41.57 s total
EPOCH: [13], BATCH: [293/889], loss: 0.400, loss_box_reg: 0.122, loss_classifier: 0.095, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 293
[ 2023-10-07 22:50:04 ] Completed saving temp checkpoint 1,289.994 ms, 42.86 s total
[ 2023-10-07 22:50:04 ] Completed replacing temp checkpoint with checkpoint 56.116 ms, 42.91 s total
[ 2023-10-07 22:50:04 ] Completed Epoch: 13 batch 294: moving batch data to device 21.162 ms, 42.93 s total
[ 2023-10-07 22:50:04 ] Completed Epoch: 13 batch 294: forward pass 313.039 ms, 43.25 s total
[ 2023-10-07 22:50:05 ] Completed Epoch: 13 batch 294: backward pass 342.197 ms, 43.59 s total
[ 2023-10-07 22:50:06 ] Completed Epoch: 13 batch 294: computing loss 1,168.170 ms, 44.76 s total
EPOCH: [13], BATCH: [294/889], loss: 0.418, loss_box_reg: 0.127, loss_classifier: 0.110, loss_mask: 0.137, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 294
[ 2023-10-07 22:50:07 ] Completed saving temp checkpoint 1,449.425 ms, 46.21 s total
[ 2023-10-07 22:50:07 ] Completed replacing temp checkpoint with checkpoint 65.273 ms, 46.27 s total
[ 2023-10-07 22:50:07 ] Completed Epoch: 13 batch 295: moving batch data to device 23.513 ms, 46.30 s total
[ 2023-10-07 22:50:08 ] Completed Epoch: 13 batch 295: forward pass 353.318 ms, 46.65 s total
[ 2023-10-07 22:50:08 ] Completed Epoch: 13 batch 295: backward pass 68.594 ms, 46.72 s total
[ 2023-10-07 22:50:09 ] Completed Epoch: 13 batch 295: computing loss 1,655.603 ms, 48.37 s total
EPOCH: [13], BATCH: [295/889], loss: 0.397, loss_box_reg: 0.117, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.019, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 295
[ 2023-10-07 22:50:11 ] Completed saving temp checkpoint 1,946.601 ms, 50.32 s total
[ 2023-10-07 22:50:11 ] Completed replacing temp checkpoint with checkpoint 65.313 ms, 50.39 s total
[ 2023-10-07 22:50:11 ] Completed Epoch: 13 batch 296: moving batch data to device 21.644 ms, 50.41 s total
[ 2023-10-07 22:50:12 ] Completed Epoch: 13 batch 296: forward pass 296.284 ms, 50.70 s total
[ 2023-10-07 22:50:12 ] Completed Epoch: 13 batch 296: backward pass 74.890 ms, 50.78 s total
[ 2023-10-07 22:50:13 ] Completed Epoch: 13 batch 296: computing loss 1,594.515 ms, 52.37 s total
EPOCH: [13], BATCH: [296/889], loss: 0.375, loss_box_reg: 0.112, loss_classifier: 0.099, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 296
[ 2023-10-07 22:50:15 ] Completed saving temp checkpoint 1,265.598 ms, 53.64 s total
[ 2023-10-07 22:50:15 ] Completed replacing temp checkpoint with checkpoint 37.035 ms, 53.68 s total
[ 2023-10-07 22:50:15 ] Completed Epoch: 13 batch 297: moving batch data to device 21.791 ms, 53.70 s total
[ 2023-10-07 22:50:15 ] Completed Epoch: 13 batch 297: forward pass 383.074 ms, 54.08 s total
[ 2023-10-07 22:50:15 ] Completed Epoch: 13 batch 297: backward pass 74.060 ms, 54.15 s total
[ 2023-10-07 22:50:17 ] Completed Epoch: 13 batch 297: computing loss 1,389.431 ms, 55.54 s total
EPOCH: [13], BATCH: [297/889], loss: 0.360, loss_box_reg: 0.107, loss_classifier: 0.090, loss_mask: 0.126, loss_objectness: 0.014, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 297
[ 2023-10-07 22:50:18 ] Completed saving temp checkpoint 1,305.610 ms, 56.85 s total
[ 2023-10-07 22:50:18 ] Completed replacing temp checkpoint with checkpoint 52.837 ms, 56.90 s total
[ 2023-10-07 22:50:18 ] Completed Epoch: 13 batch 298: moving batch data to device 22.614 ms, 56.92 s total
[ 2023-10-07 22:50:18 ] Completed Epoch: 13 batch 298: forward pass 306.823 ms, 57.23 s total
[ 2023-10-07 22:50:18 ] Completed Epoch: 13 batch 298: backward pass 39.933 ms, 57.27 s total
[ 2023-10-07 22:50:20 ] Completed Epoch: 13 batch 298: computing loss 1,498.558 ms, 58.77 s total
EPOCH: [13], BATCH: [298/889], loss: 0.362, loss_box_reg: 0.112, loss_classifier: 0.092, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 298
[ 2023-10-07 22:50:21 ] Completed saving temp checkpoint 1,257.317 ms, 60.03 s total
[ 2023-10-07 22:50:21 ] Completed replacing temp checkpoint with checkpoint 70.707 ms, 60.10 s total
[ 2023-10-07 22:50:21 ] Completed Epoch: 13 batch 299: moving batch data to device 23.388 ms, 60.12 s total
[ 2023-10-07 22:50:21 ] Completed Epoch: 13 batch 299: forward pass 345.682 ms, 60.47 s total
[ 2023-10-07 22:50:22 ] Completed Epoch: 13 batch 299: backward pass 86.590 ms, 60.55 s total
[ 2023-10-07 22:50:23 ] Completed Epoch: 13 batch 299: computing loss 1,395.030 ms, 61.95 s total
EPOCH: [13], BATCH: [299/889], loss: 0.389, loss_box_reg: 0.114, loss_classifier: 0.096, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 299
[ 2023-10-07 22:50:24 ] Completed saving temp checkpoint 1,305.594 ms, 63.25 s total
[ 2023-10-07 22:50:24 ] Completed replacing temp checkpoint with checkpoint 56.161 ms, 63.31 s total
[ 2023-10-07 22:50:24 ] Completed Epoch: 13 batch 300: moving batch data to device 22.869 ms, 63.33 s total
[ 2023-10-07 22:50:25 ] Completed Epoch: 13 batch 300: forward pass 316.257 ms, 63.65 s total
[ 2023-10-07 22:50:25 ] Completed Epoch: 13 batch 300: backward pass 45.363 ms, 63.70 s total
[ 2023-10-07 22:50:27 ] Completed Epoch: 13 batch 300: computing loss 1,841.517 ms, 65.54 s total
EPOCH: [13], BATCH: [300/889], loss: 0.388, loss_box_reg: 0.122, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 300
[ 2023-10-07 22:50:28 ] Completed saving temp checkpoint 1,659.750 ms, 67.20 s total
[ 2023-10-07 22:50:28 ] Completed replacing temp checkpoint with checkpoint 73.653 ms, 67.27 s total
[ 2023-10-07 22:50:28 ] Completed Epoch: 13 batch 301: moving batch data to device 23.098 ms, 67.29 s total
[ 2023-10-07 22:50:29 ] Completed Epoch: 13 batch 301: forward pass 338.769 ms, 67.63 s total
[ 2023-10-07 22:50:29 ] Completed Epoch: 13 batch 301: backward pass 67.281 ms, 67.70 s total
[ 2023-10-07 22:50:31 ] Completed Epoch: 13 batch 301: computing loss 2,225.979 ms, 69.93 s total
EPOCH: [13], BATCH: [301/889], loss: 0.373, loss_box_reg: 0.108, loss_classifier: 0.093, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 301
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 23:03:42 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 23:03:42 ] Completed importing Timer 0.029 ms, 0.00 s total
[ 2023-10-07 23:03:43 ] Completed importing everything else 538.647 ms, 0.54 s total
[ 2023-10-07 23:03:43 ] Completed defined other functions 0.026 ms, 0.54 s total
| distributed init (rank 2): env://
| distributed init (rank 1): env://
| distributed init (rank 5): env://
| distributed init (rank 3): env://
| distributed init (rank 0): env://
| distributed init (rank 4): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 23:03:46 ] Completed main preliminaries 3,430.887 ms, 3.97 s total
loading annotations into memory...
Done (t=11.59s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 23:04:00 ] Completed loading data 13,493.033 ms, 17.46 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 23:04:00 ] Completed creating data samplers 103.517 ms, 17.57 s total
[ 2023-10-07 23:04:00 ] Completed creating data loaders 0.220 ms, 17.57 s total
[ 2023-10-07 23:04:01 ] Completed creating model and .to(device) 662.858 ms, 18.23 s total
[ 2023-10-07 23:04:03 ] Completed preparing model for distributed training 2,522.539 ms, 20.75 s total
[ 2023-10-07 23:04:03 ] Completed optimizer and scaler 0.548 ms, 20.75 s total
[ 2023-10-07 23:04:03 ] Completed learning rate schedulers 0.128 ms, 20.75 s total
[ 2023-10-07 23:04:04 ] Completed init coco evaluator 983.124 ms, 21.74 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 23:04:05 ] Completed retrieving checkpoint 880.634 ms, 22.62 s total
EPOCH :: 13
[ 2023-10-07 23:04:05 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 23:04:05 ] Completed training preliminaries 0.878 ms, 0.00 s total
Training / resuming epoch 13 from training step 301
[ 2023-10-07 23:04:05 ] Completed Epoch: 13 batch 301: moving batch data to device 243.322 ms, 0.24 s total
[ 2023-10-07 23:04:11 ] Completed Epoch: 13 batch 301: forward pass 5,355.716 ms, 5.60 s total
[ 2023-10-07 23:04:11 ] Completed Epoch: 13 batch 301: backward pass 147.753 ms, 5.75 s total
[ 2023-10-07 23:04:12 ] Completed Epoch: 13 batch 301: computing loss 1,092.340 ms, 6.84 s total
EPOCH: [13], BATCH: [301/889], loss: 0.372, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 301
[ 2023-10-07 23:04:13 ] Completed saving temp checkpoint 1,430.508 ms, 8.27 s total
[ 2023-10-07 23:04:14 ] Completed replacing temp checkpoint with checkpoint 136.463 ms, 8.41 s total
[ 2023-10-07 23:04:14 ] Completed Epoch: 13 batch 302: moving batch data to device 51.146 ms, 8.46 s total
[ 2023-10-07 23:04:14 ] Completed Epoch: 13 batch 302: forward pass 435.122 ms, 8.89 s total
[ 2023-10-07 23:04:14 ] Completed Epoch: 13 batch 302: backward pass 166.494 ms, 9.06 s total
[ 2023-10-07 23:04:16 ] Completed Epoch: 13 batch 302: computing loss 1,657.215 ms, 10.72 s total
EPOCH: [13], BATCH: [302/889], loss: 0.400, loss_box_reg: 0.125, loss_classifier: 0.105, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 302
[ 2023-10-07 23:04:17 ] Completed saving temp checkpoint 1,168.966 ms, 11.89 s total
[ 2023-10-07 23:04:17 ] Completed replacing temp checkpoint with checkpoint 70.602 ms, 11.96 s total
[ 2023-10-07 23:04:17 ] Completed Epoch: 13 batch 303: moving batch data to device 20.686 ms, 11.98 s total
[ 2023-10-07 23:04:17 ] Completed Epoch: 13 batch 303: forward pass 323.360 ms, 12.30 s total
[ 2023-10-07 23:04:18 ] Completed Epoch: 13 batch 303: backward pass 419.196 ms, 12.72 s total
[ 2023-10-07 23:04:19 ] Completed Epoch: 13 batch 303: computing loss 957.482 ms, 13.68 s total
EPOCH: [13], BATCH: [303/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.099, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 303
[ 2023-10-07 23:04:20 ] Completed saving temp checkpoint 1,339.188 ms, 15.02 s total
[ 2023-10-07 23:04:20 ] Completed replacing temp checkpoint with checkpoint 80.619 ms, 15.10 s total
[ 2023-10-07 23:04:20 ] Completed Epoch: 13 batch 304: moving batch data to device 27.539 ms, 15.12 s total
[ 2023-10-07 23:04:21 ] Completed Epoch: 13 batch 304: forward pass 332.471 ms, 15.46 s total
[ 2023-10-07 23:04:21 ] Completed Epoch: 13 batch 304: backward pass 73.261 ms, 15.53 s total
[ 2023-10-07 23:04:22 ] Completed Epoch: 13 batch 304: computing loss 1,294.268 ms, 16.82 s total
EPOCH: [13], BATCH: [304/889], loss: 0.408, loss_box_reg: 0.127, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 304
[ 2023-10-07 23:04:23 ] Completed saving temp checkpoint 1,386.639 ms, 18.21 s total
[ 2023-10-07 23:04:23 ] Completed replacing temp checkpoint with checkpoint 69.203 ms, 18.28 s total
[ 2023-10-07 23:04:23 ] Completed Epoch: 13 batch 305: moving batch data to device 19.032 ms, 18.30 s total
[ 2023-10-07 23:04:24 ] Completed Epoch: 13 batch 305: forward pass 330.573 ms, 18.63 s total
[ 2023-10-07 23:04:24 ] Completed Epoch: 13 batch 305: backward pass 67.202 ms, 18.70 s total
[ 2023-10-07 23:04:25 ] Completed Epoch: 13 batch 305: computing loss 1,334.916 ms, 20.03 s total
EPOCH: [13], BATCH: [305/889], loss: 0.390, loss_box_reg: 0.119, loss_classifier: 0.103, loss_mask: 0.126, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 305
[ 2023-10-07 23:04:26 ] Completed saving temp checkpoint 1,338.655 ms, 21.37 s total
[ 2023-10-07 23:04:27 ] Completed replacing temp checkpoint with checkpoint 51.345 ms, 21.42 s total
[ 2023-10-07 23:04:27 ] Completed Epoch: 13 batch 306: moving batch data to device 22.069 ms, 21.44 s total
[ 2023-10-07 23:04:27 ] Completed Epoch: 13 batch 306: forward pass 365.246 ms, 21.81 s total
[ 2023-10-07 23:04:27 ] Completed Epoch: 13 batch 306: backward pass 94.330 ms, 21.90 s total
[ 2023-10-07 23:04:29 ] Completed Epoch: 13 batch 306: computing loss 1,636.522 ms, 23.54 s total
EPOCH: [13], BATCH: [306/889], loss: 0.380, loss_box_reg: 0.114, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.017, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 306
[ 2023-10-07 23:04:30 ] Completed saving temp checkpoint 1,665.230 ms, 25.21 s total
[ 2023-10-07 23:04:30 ] Completed replacing temp checkpoint with checkpoint 105.027 ms, 25.31 s total
[ 2023-10-07 23:04:30 ] Completed Epoch: 13 batch 307: moving batch data to device 21.304 ms, 25.33 s total
[ 2023-10-07 23:04:31 ] Completed Epoch: 13 batch 307: forward pass 313.372 ms, 25.65 s total
[ 2023-10-07 23:04:31 ] Completed Epoch: 13 batch 307: backward pass 69.406 ms, 25.71 s total
[ 2023-10-07 23:04:32 ] Completed Epoch: 13 batch 307: computing loss 1,407.106 ms, 27.12 s total
EPOCH: [13], BATCH: [307/889], loss: 0.382, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.020, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 307
[ 2023-10-07 23:04:33 ] Completed saving temp checkpoint 1,112.016 ms, 28.23 s total
[ 2023-10-07 23:04:33 ] Completed replacing temp checkpoint with checkpoint 48.038 ms, 28.28 s total
[ 2023-10-07 23:04:33 ] Completed Epoch: 13 batch 308: moving batch data to device 24.594 ms, 28.31 s total
[ 2023-10-07 23:04:34 ] Completed Epoch: 13 batch 308: forward pass 310.615 ms, 28.62 s total
[ 2023-10-07 23:04:34 ] Completed Epoch: 13 batch 308: backward pass 78.591 ms, 28.70 s total
[ 2023-10-07 23:04:35 ] Completed Epoch: 13 batch 308: computing loss 1,394.855 ms, 30.09 s total
EPOCH: [13], BATCH: [308/889], loss: 0.410, loss_box_reg: 0.130, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 308
[ 2023-10-07 23:04:37 ] Completed saving temp checkpoint 1,699.131 ms, 31.79 s total
[ 2023-10-07 23:04:37 ] Completed replacing temp checkpoint with checkpoint 74.590 ms, 31.86 s total
[ 2023-10-07 23:04:37 ] Completed Epoch: 13 batch 309: moving batch data to device 21.640 ms, 31.89 s total
[ 2023-10-07 23:04:37 ] Completed Epoch: 13 batch 309: forward pass 304.554 ms, 32.19 s total
[ 2023-10-07 23:04:37 ] Completed Epoch: 13 batch 309: backward pass 74.502 ms, 32.26 s total
[ 2023-10-07 23:04:39 ] Completed Epoch: 13 batch 309: computing loss 1,677.974 ms, 33.94 s total
EPOCH: [13], BATCH: [309/889], loss: 0.386, loss_box_reg: 0.113, loss_classifier: 0.102, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 309
[ 2023-10-07 23:04:40 ] Completed saving temp checkpoint 1,324.402 ms, 35.27 s total
[ 2023-10-07 23:04:40 ] Completed replacing temp checkpoint with checkpoint 33.534 ms, 35.30 s total
[ 2023-10-07 23:04:40 ] Completed Epoch: 13 batch 310: moving batch data to device 21.241 ms, 35.32 s total
[ 2023-10-07 23:04:41 ] Completed Epoch: 13 batch 310: forward pass 400.110 ms, 35.72 s total
[ 2023-10-07 23:04:41 ] Completed Epoch: 13 batch 310: backward pass 38.415 ms, 35.76 s total
[ 2023-10-07 23:04:42 ] Completed Epoch: 13 batch 310: computing loss 1,153.256 ms, 36.91 s total
EPOCH: [13], BATCH: [310/889], loss: 0.436, loss_box_reg: 0.136, loss_classifier: 0.116, loss_mask: 0.133, loss_objectness: 0.020, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 310
[ 2023-10-07 23:04:43 ] Completed saving temp checkpoint 1,155.327 ms, 38.07 s total
[ 2023-10-07 23:04:43 ] Completed replacing temp checkpoint with checkpoint 72.886 ms, 38.14 s total
[ 2023-10-07 23:04:43 ] Completed Epoch: 13 batch 311: moving batch data to device 24.239 ms, 38.17 s total
[ 2023-10-07 23:04:44 ] Completed Epoch: 13 batch 311: forward pass 326.253 ms, 38.49 s total
[ 2023-10-07 23:04:44 ] Completed Epoch: 13 batch 311: backward pass 57.155 ms, 38.55 s total
[ 2023-10-07 23:04:45 ] Completed Epoch: 13 batch 311: computing loss 1,431.964 ms, 39.98 s total
EPOCH: [13], BATCH: [311/889], loss: 0.413, loss_box_reg: 0.126, loss_classifier: 0.109, loss_mask: 0.128, loss_objectness: 0.018, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 311
[ 2023-10-07 23:04:46 ] Completed saving temp checkpoint 1,126.873 ms, 41.11 s total
[ 2023-10-07 23:04:46 ] Completed replacing temp checkpoint with checkpoint 82.451 ms, 41.19 s total
[ 2023-10-07 23:04:46 ] Completed Epoch: 13 batch 312: moving batch data to device 24.011 ms, 41.21 s total
[ 2023-10-07 23:04:47 ] Completed Epoch: 13 batch 312: forward pass 328.376 ms, 41.54 s total
[ 2023-10-07 23:04:47 ] Completed Epoch: 13 batch 312: backward pass 74.964 ms, 41.62 s total
[ 2023-10-07 23:04:48 ] Completed Epoch: 13 batch 312: computing loss 1,542.955 ms, 43.16 s total
EPOCH: [13], BATCH: [312/889], loss: 0.407, loss_box_reg: 0.122, loss_classifier: 0.107, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 312
[ 2023-10-07 23:04:50 ] Completed saving temp checkpoint 1,723.195 ms, 44.88 s total
[ 2023-10-07 23:04:50 ] Completed replacing temp checkpoint with checkpoint 90.924 ms, 44.98 s total
[ 2023-10-07 23:04:50 ] Completed Epoch: 13 batch 313: moving batch data to device 25.602 ms, 45.00 s total
[ 2023-10-07 23:04:51 ] Completed Epoch: 13 batch 313: forward pass 424.119 ms, 45.43 s total
[ 2023-10-07 23:04:51 ] Completed Epoch: 13 batch 313: backward pass 120.875 ms, 45.55 s total
[ 2023-10-07 23:04:52 ] Completed Epoch: 13 batch 313: computing loss 1,276.906 ms, 46.82 s total
EPOCH: [13], BATCH: [313/889], loss: 0.375, loss_box_reg: 0.113, loss_classifier: 0.092, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 313
[ 2023-10-07 23:04:53 ] Completed saving temp checkpoint 1,257.744 ms, 48.08 s total
[ 2023-10-07 23:04:53 ] Completed replacing temp checkpoint with checkpoint 78.780 ms, 48.16 s total
[ 2023-10-07 23:04:53 ] Completed Epoch: 13 batch 314: moving batch data to device 25.905 ms, 48.19 s total
[ 2023-10-07 23:04:54 ] Completed Epoch: 13 batch 314: forward pass 310.520 ms, 48.50 s total
[ 2023-10-07 23:04:54 ] Completed Epoch: 13 batch 314: backward pass 97.505 ms, 48.59 s total
[ 2023-10-07 23:04:55 ] Completed Epoch: 13 batch 314: computing loss 1,381.827 ms, 49.98 s total
EPOCH: [13], BATCH: [314/889], loss: 0.359, loss_box_reg: 0.109, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 314
[ 2023-10-07 23:04:57 ] Completed saving temp checkpoint 1,761.358 ms, 51.74 s total
[ 2023-10-07 23:04:57 ] Completed replacing temp checkpoint with checkpoint 57.995 ms, 51.79 s total
[ 2023-10-07 23:04:57 ] Completed Epoch: 13 batch 315: moving batch data to device 5.549 ms, 51.80 s total
[ 2023-10-07 23:04:57 ] Completed Epoch: 13 batch 315: forward pass 445.665 ms, 52.25 s total
[ 2023-10-07 23:04:57 ] Completed Epoch: 13 batch 315: backward pass 78.487 ms, 52.32 s total
[ 2023-10-07 23:04:59 ] Completed Epoch: 13 batch 315: computing loss 1,574.620 ms, 53.90 s total
EPOCH: [13], BATCH: [315/889], loss: 0.345, loss_box_reg: 0.103, loss_classifier: 0.080, loss_mask: 0.125, loss_objectness: 0.012, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 315
[ 2023-10-07 23:05:00 ] Completed saving temp checkpoint 1,191.869 ms, 55.09 s total
[ 2023-10-07 23:05:00 ] Completed replacing temp checkpoint with checkpoint 59.433 ms, 55.15 s total
[ 2023-10-07 23:05:00 ] Completed Epoch: 13 batch 316: moving batch data to device 22.522 ms, 55.17 s total
[ 2023-10-07 23:05:01 ] Completed Epoch: 13 batch 316: forward pass 332.471 ms, 55.51 s total
[ 2023-10-07 23:05:01 ] Completed Epoch: 13 batch 316: backward pass 75.105 ms, 55.58 s total
[ 2023-10-07 23:05:02 ] Completed Epoch: 13 batch 316: computing loss 1,329.994 ms, 56.91 s total
EPOCH: [13], BATCH: [316/889], loss: 0.401, loss_box_reg: 0.121, loss_classifier: 0.097, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 316
[ 2023-10-07 23:05:03 ] Completed saving temp checkpoint 1,077.108 ms, 57.99 s total
[ 2023-10-07 23:05:03 ] Completed replacing temp checkpoint with checkpoint 50.702 ms, 58.04 s total
[ 2023-10-07 23:05:03 ] Completed Epoch: 13 batch 317: moving batch data to device 20.158 ms, 58.06 s total
[ 2023-10-07 23:05:03 ] Completed Epoch: 13 batch 317: forward pass 315.937 ms, 58.37 s total
[ 2023-10-07 23:05:04 ] Completed Epoch: 13 batch 317: backward pass 35.615 ms, 58.41 s total
[ 2023-10-07 23:05:05 ] Completed Epoch: 13 batch 317: computing loss 1,420.966 ms, 59.83 s total
EPOCH: [13], BATCH: [317/889], loss: 0.382, loss_box_reg: 0.109, loss_classifier: 0.098, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 317
[ 2023-10-07 23:05:06 ] Completed saving temp checkpoint 1,197.669 ms, 61.03 s total
[ 2023-10-07 23:05:06 ] Completed replacing temp checkpoint with checkpoint 77.506 ms, 61.11 s total
[ 2023-10-07 23:05:06 ] Completed Epoch: 13 batch 318: moving batch data to device 21.772 ms, 61.13 s total
[ 2023-10-07 23:05:07 ] Completed Epoch: 13 batch 318: forward pass 310.432 ms, 61.44 s total
[ 2023-10-07 23:05:07 ] Completed Epoch: 13 batch 318: backward pass 72.765 ms, 61.51 s total
[ 2023-10-07 23:05:08 ] Completed Epoch: 13 batch 318: computing loss 1,486.382 ms, 63.00 s total
EPOCH: [13], BATCH: [318/889], loss: 0.375, loss_box_reg: 0.108, loss_classifier: 0.093, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 318
[ 2023-10-07 23:05:10 ] Completed saving temp checkpoint 1,562.765 ms, 64.56 s total
[ 2023-10-07 23:05:10 ] Completed replacing temp checkpoint with checkpoint 41.529 ms, 64.60 s total
[ 2023-10-07 23:05:10 ] Completed Epoch: 13 batch 319: moving batch data to device 7.443 ms, 64.61 s total
[ 2023-10-07 23:05:10 ] Completed Epoch: 13 batch 319: forward pass 431.437 ms, 65.04 s total
[ 2023-10-07 23:05:10 ] Completed Epoch: 13 batch 319: backward pass 97.587 ms, 65.14 s total
[ 2023-10-07 23:05:11 ] Completed Epoch: 13 batch 319: computing loss 1,235.868 ms, 66.37 s total
EPOCH: [13], BATCH: [319/889], loss: 0.341, loss_box_reg: 0.104, loss_classifier: 0.084, loss_mask: 0.125, loss_objectness: 0.012, loss_rpn_box_reg: 0.018
Saving checkpoint at epoch 13 train batch 319
[ 2023-10-07 23:05:13 ] Completed saving temp checkpoint 1,263.550 ms, 67.64 s total
[ 2023-10-07 23:05:13 ] Completed replacing temp checkpoint with checkpoint 66.271 ms, 67.70 s total
[ 2023-10-07 23:05:13 ] Completed Epoch: 13 batch 320: moving batch data to device 22.925 ms, 67.73 s total
[ 2023-10-07 23:05:13 ] Completed Epoch: 13 batch 320: forward pass 312.307 ms, 68.04 s total
[ 2023-10-07 23:05:13 ] Completed Epoch: 13 batch 320: backward pass 52.473 ms, 68.09 s total
[ 2023-10-07 23:05:15 ] Completed Epoch: 13 batch 320: computing loss 1,307.788 ms, 69.40 s total
EPOCH: [13], BATCH: [320/889], loss: 0.378, loss_box_reg: 0.109, loss_classifier: 0.094, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 320
[ 2023-10-07 23:05:16 ] Completed saving temp checkpoint 1,235.148 ms, 70.63 s total
[ 2023-10-07 23:05:16 ] Completed replacing temp checkpoint with checkpoint 81.179 ms, 70.72 s total
[ 2023-10-07 23:05:16 ] Completed Epoch: 13 batch 321: moving batch data to device 21.750 ms, 70.74 s total
[ 2023-10-07 23:05:16 ] Completed Epoch: 13 batch 321: forward pass 322.865 ms, 71.06 s total
[ 2023-10-07 23:05:16 ] Completed Epoch: 13 batch 321: backward pass 68.438 ms, 71.13 s total
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 23:18:15 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 23:18:15 ] Completed importing Timer 0.022 ms, 0.00 s total
[ 2023-10-07 23:18:16 ] Completed importing everything else 641.526 ms, 0.64 s total
[ 2023-10-07 23:18:16 ] Completed defined other functions 0.025 ms, 0.64 s total
| distributed init (rank 0): env://
| distributed init (rank 4): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 23:18:19 ] Completed main preliminaries 2,914.096 ms, 3.56 s total
loading annotations into memory...
Done (t=12.76s)
creating index...
index created!
loading annotations into memory...
Done (t=0.30s)
creating index...
index created!
[ 2023-10-07 23:18:34 ] Completed loading data 14,805.068 ms, 18.36 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 23:18:34 ] Completed creating data samplers 110.107 ms, 18.47 s total
[ 2023-10-07 23:18:34 ] Completed creating data loaders 0.219 ms, 18.47 s total
[ 2023-10-07 23:18:35 ] Completed creating model and .to(device) 1,675.373 ms, 20.15 s total
[ 2023-10-07 23:18:36 ] Completed preparing model for distributed training 564.833 ms, 20.71 s total
[ 2023-10-07 23:18:36 ] Completed optimizer and scaler 0.585 ms, 20.71 s total
[ 2023-10-07 23:18:36 ] Completed learning rate schedulers 0.132 ms, 20.71 s total
[ 2023-10-07 23:18:37 ] Completed init coco evaluator 1,013.679 ms, 21.73 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 23:18:38 ] Completed retrieving checkpoint 824.760 ms, 22.55 s total
EPOCH :: 13
[ 2023-10-07 23:18:38 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 23:18:38 ] Completed training preliminaries 0.870 ms, 0.00 s total
Training / resuming epoch 13 from training step 321
[ 2023-10-07 23:18:38 ] Completed Epoch: 13 batch 321: moving batch data to device 278.177 ms, 0.28 s total
[ 2023-10-07 23:18:44 ] Completed Epoch: 13 batch 321: forward pass 5,785.119 ms, 6.06 s total
[ 2023-10-07 23:18:44 ] Completed Epoch: 13 batch 321: backward pass 404.636 ms, 6.47 s total
[ 2023-10-07 23:18:45 ] Completed Epoch: 13 batch 321: computing loss 740.755 ms, 7.21 s total
EPOCH: [13], BATCH: [321/889], loss: 0.409, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 321
[ 2023-10-07 23:18:46 ] Completed saving temp checkpoint 1,503.950 ms, 8.71 s total
[ 2023-10-07 23:18:47 ] Completed replacing temp checkpoint with checkpoint 157.773 ms, 8.87 s total
[ 2023-10-07 23:18:47 ] Completed Epoch: 13 batch 322: moving batch data to device 19.801 ms, 8.89 s total
[ 2023-10-07 23:18:47 ] Completed Epoch: 13 batch 322: forward pass 319.160 ms, 9.21 s total
[ 2023-10-07 23:18:47 ] Completed Epoch: 13 batch 322: backward pass 403.427 ms, 9.61 s total
[ 2023-10-07 23:18:48 ] Completed Epoch: 13 batch 322: computing loss 1,027.970 ms, 10.64 s total
EPOCH: [13], BATCH: [322/889], loss: 0.375, loss_box_reg: 0.109, loss_classifier: 0.089, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 322
[ 2023-10-07 23:18:50 ] Completed saving temp checkpoint 1,332.080 ms, 11.97 s total
[ 2023-10-07 23:18:50 ] Completed replacing temp checkpoint with checkpoint 75.118 ms, 12.05 s total
[ 2023-10-07 23:18:50 ] Completed Epoch: 13 batch 323: moving batch data to device 20.907 ms, 12.07 s total
[ 2023-10-07 23:18:50 ] Completed Epoch: 13 batch 323: forward pass 316.478 ms, 12.39 s total
[ 2023-10-07 23:18:50 ] Completed Epoch: 13 batch 323: backward pass 68.078 ms, 12.45 s total
[ 2023-10-07 23:18:52 ] Completed Epoch: 13 batch 323: computing loss 1,531.039 ms, 13.99 s total
EPOCH: [13], BATCH: [323/889], loss: 0.413, loss_box_reg: 0.118, loss_classifier: 0.111, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 323
[ 2023-10-07 23:18:53 ] Completed saving temp checkpoint 1,413.841 ms, 15.40 s total
[ 2023-10-07 23:18:53 ] Completed replacing temp checkpoint with checkpoint 88.066 ms, 15.49 s total
[ 2023-10-07 23:18:53 ] Completed Epoch: 13 batch 324: moving batch data to device 29.615 ms, 15.52 s total
[ 2023-10-07 23:18:54 ] Completed Epoch: 13 batch 324: forward pass 317.897 ms, 15.83 s total
[ 2023-10-07 23:18:54 ] Completed Epoch: 13 batch 324: backward pass 396.467 ms, 16.23 s total
[ 2023-10-07 23:18:55 ] Completed Epoch: 13 batch 324: computing loss 1,177.481 ms, 17.41 s total
EPOCH: [13], BATCH: [324/889], loss: 0.387, loss_box_reg: 0.113, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 324
[ 2023-10-07 23:18:56 ] Completed saving temp checkpoint 1,129.828 ms, 18.54 s total
[ 2023-10-07 23:18:56 ] Completed replacing temp checkpoint with checkpoint 57.139 ms, 18.60 s total
[ 2023-10-07 23:18:56 ] Completed Epoch: 13 batch 325: moving batch data to device 21.841 ms, 18.62 s total
[ 2023-10-07 23:18:57 ] Completed Epoch: 13 batch 325: forward pass 326.708 ms, 18.94 s total
[ 2023-10-07 23:18:57 ] Completed Epoch: 13 batch 325: backward pass 91.485 ms, 19.04 s total
[ 2023-10-07 23:18:58 ] Completed Epoch: 13 batch 325: computing loss 1,578.181 ms, 20.61 s total
EPOCH: [13], BATCH: [325/889], loss: 0.382, loss_box_reg: 0.116, loss_classifier: 0.095, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 325
[ 2023-10-07 23:19:00 ] Completed saving temp checkpoint 1,204.481 ms, 21.82 s total
[ 2023-10-07 23:19:00 ] Completed replacing temp checkpoint with checkpoint 50.852 ms, 21.87 s total
[ 2023-10-07 23:19:00 ] Completed Epoch: 13 batch 326: moving batch data to device 20.596 ms, 21.89 s total
[ 2023-10-07 23:19:00 ] Completed Epoch: 13 batch 326: forward pass 334.277 ms, 22.22 s total
[ 2023-10-07 23:19:00 ] Completed Epoch: 13 batch 326: backward pass 71.318 ms, 22.30 s total
[ 2023-10-07 23:19:01 ] Completed Epoch: 13 batch 326: computing loss 1,330.727 ms, 23.63 s total
EPOCH: [13], BATCH: [326/889], loss: 0.365, loss_box_reg: 0.109, loss_classifier: 0.091, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 326
[ 2023-10-07 23:19:03 ] Completed saving temp checkpoint 1,310.637 ms, 24.94 s total
[ 2023-10-07 23:19:03 ] Completed replacing temp checkpoint with checkpoint 76.460 ms, 25.01 s total
[ 2023-10-07 23:19:03 ] Completed Epoch: 13 batch 327: moving batch data to device 19.581 ms, 25.03 s total
[ 2023-10-07 23:19:03 ] Completed Epoch: 13 batch 327: forward pass 322.169 ms, 25.35 s total
[ 2023-10-07 23:19:03 ] Completed Epoch: 13 batch 327: backward pass 69.885 ms, 25.42 s total
[ 2023-10-07 23:19:05 ] Completed Epoch: 13 batch 327: computing loss 1,410.949 ms, 26.84 s total
EPOCH: [13], BATCH: [327/889], loss: 0.338, loss_box_reg: 0.105, loss_classifier: 0.083, loss_mask: 0.125, loss_objectness: 0.012, loss_rpn_box_reg: 0.014
Saving checkpoint at epoch 13 train batch 327
[ 2023-10-07 23:19:06 ] Completed saving temp checkpoint 1,484.391 ms, 28.32 s total
[ 2023-10-07 23:19:06 ] Completed replacing temp checkpoint with checkpoint 41.104 ms, 28.36 s total
[ 2023-10-07 23:19:06 ] Completed Epoch: 13 batch 328: moving batch data to device 5.811 ms, 28.37 s total
[ 2023-10-07 23:19:07 ] Completed Epoch: 13 batch 328: forward pass 448.668 ms, 28.82 s total
[ 2023-10-07 23:19:07 ] Completed Epoch: 13 batch 328: backward pass 84.196 ms, 28.90 s total
[ 2023-10-07 23:19:07 ] Completed Epoch: 13 batch 328: computing loss 869.091 ms, 29.77 s total
EPOCH: [13], BATCH: [328/889], loss: 0.379, loss_box_reg: 0.120, loss_classifier: 0.095, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 328
[ 2023-10-07 23:19:09 ] Completed saving temp checkpoint 1,176.187 ms, 30.95 s total
[ 2023-10-07 23:19:09 ] Completed replacing temp checkpoint with checkpoint 72.518 ms, 31.02 s total
[ 2023-10-07 23:19:09 ] Completed Epoch: 13 batch 329: moving batch data to device 20.823 ms, 31.04 s total
[ 2023-10-07 23:19:09 ] Completed Epoch: 13 batch 329: forward pass 256.583 ms, 31.30 s total
[ 2023-10-07 23:19:09 ] Completed Epoch: 13 batch 329: backward pass 44.435 ms, 31.34 s total
[ 2023-10-07 23:19:09 ] Completed Epoch: 13 batch 329: computing loss 288.484 ms, 31.63 s total
EPOCH: [13], BATCH: [329/889], loss: 0.370, loss_box_reg: 0.110, loss_classifier: 0.099, loss_mask: 0.122, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 329
[ 2023-10-07 23:19:12 ] Completed saving temp checkpoint 2,651.412 ms, 34.28 s total
[ 2023-10-07 23:19:12 ] Completed replacing temp checkpoint with checkpoint 52.028 ms, 34.33 s total
[ 2023-10-07 23:19:12 ] Completed Epoch: 13 batch 330: moving batch data to device 5.928 ms, 34.34 s total
[ 2023-10-07 23:19:12 ] Completed Epoch: 13 batch 330: forward pass 434.143 ms, 34.77 s total
[ 2023-10-07 23:19:13 ] Completed Epoch: 13 batch 330: backward pass 48.449 ms, 34.82 s total
[ 2023-10-07 23:19:13 ] Completed Epoch: 13 batch 330: computing loss 421.426 ms, 35.24 s total
EPOCH: [13], BATCH: [330/889], loss: 0.384, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 330
[ 2023-10-07 23:19:14 ] Completed saving temp checkpoint 1,347.189 ms, 36.59 s total
[ 2023-10-07 23:19:15 ] Completed replacing temp checkpoint with checkpoint 476.493 ms, 37.07 s total
[ 2023-10-07 23:19:15 ] Completed Epoch: 13 batch 331: moving batch data to device 7.056 ms, 37.07 s total
[ 2023-10-07 23:19:15 ] Completed Epoch: 13 batch 331: forward pass 182.579 ms, 37.25 s total
[ 2023-10-07 23:19:15 ] Completed Epoch: 13 batch 331: backward pass 85.177 ms, 37.34 s total
[ 2023-10-07 23:19:15 ] Completed Epoch: 13 batch 331: computing loss 404.887 ms, 37.74 s total
EPOCH: [13], BATCH: [331/889], loss: 0.408, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 331
[ 2023-10-07 23:19:17 ] Completed saving temp checkpoint 1,932.763 ms, 39.68 s total
[ 2023-10-07 23:19:18 ] Completed replacing temp checkpoint with checkpoint 610.041 ms, 40.29 s total
[ 2023-10-07 23:19:18 ] Completed Epoch: 13 batch 332: moving batch data to device 4.958 ms, 40.29 s total
[ 2023-10-07 23:19:18 ] Completed Epoch: 13 batch 332: forward pass 163.774 ms, 40.46 s total
[ 2023-10-07 23:19:18 ] Completed Epoch: 13 batch 332: backward pass 37.137 ms, 40.49 s total
[ 2023-10-07 23:19:19 ] Completed Epoch: 13 batch 332: computing loss 495.396 ms, 40.99 s total
EPOCH: [13], BATCH: [332/889], loss: 0.387, loss_box_reg: 0.115, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 332
[ 2023-10-07 23:19:21 ] Completed saving temp checkpoint 2,056.961 ms, 43.05 s total
[ 2023-10-07 23:19:21 ] Completed replacing temp checkpoint with checkpoint 61.231 ms, 43.11 s total
[ 2023-10-07 23:19:21 ] Completed Epoch: 13 batch 333: moving batch data to device 8.198 ms, 43.12 s total
[ 2023-10-07 23:19:21 ] Completed Epoch: 13 batch 333: forward pass 164.047 ms, 43.28 s total
[ 2023-10-07 23:19:21 ] Completed Epoch: 13 batch 333: backward pass 153.606 ms, 43.43 s total
[ 2023-10-07 23:19:22 ] Completed Epoch: 13 batch 333: computing loss 440.974 ms, 43.87 s total
EPOCH: [13], BATCH: [333/889], loss: 0.413, loss_box_reg: 0.127, loss_classifier: 0.104, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 333
[ 2023-10-07 23:19:24 ] Completed saving temp checkpoint 2,351.858 ms, 46.23 s total
[ 2023-10-07 23:19:24 ] Completed replacing temp checkpoint with checkpoint 89.276 ms, 46.32 s total
[ 2023-10-07 23:19:24 ] Completed Epoch: 13 batch 334: moving batch data to device 7.956 ms, 46.32 s total
[ 2023-10-07 23:19:24 ] Completed Epoch: 13 batch 334: forward pass 175.482 ms, 46.50 s total
[ 2023-10-07 23:19:24 ] Completed Epoch: 13 batch 334: backward pass 79.918 ms, 46.58 s total
[ 2023-10-07 23:19:25 ] Completed Epoch: 13 batch 334: computing loss 357.521 ms, 46.94 s total
EPOCH: [13], BATCH: [334/889], loss: 0.407, loss_box_reg: 0.127, loss_classifier: 0.104, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 334
[ 2023-10-07 23:19:27 ] Completed saving temp checkpoint 1,894.754 ms, 48.83 s total
[ 2023-10-07 23:19:28 ] Completed replacing temp checkpoint with checkpoint 982.677 ms, 49.81 s total
[ 2023-10-07 23:19:28 ] Completed Epoch: 13 batch 335: moving batch data to device 5.759 ms, 49.82 s total
[ 2023-10-07 23:19:28 ] Completed Epoch: 13 batch 335: forward pass 159.817 ms, 49.98 s total
[ 2023-10-07 23:19:28 ] Completed Epoch: 13 batch 335: backward pass 61.291 ms, 50.04 s total
[ 2023-10-07 23:19:28 ] Completed Epoch: 13 batch 335: computing loss 349.832 ms, 50.39 s total
EPOCH: [13], BATCH: [335/889], loss: 0.350, loss_box_reg: 0.104, loss_classifier: 0.087, loss_mask: 0.123, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 335
[ 2023-10-07 23:19:30 ] Completed saving temp checkpoint 1,816.666 ms, 52.21 s total
[ 2023-10-07 23:19:30 ] Completed replacing temp checkpoint with checkpoint 55.035 ms, 52.26 s total
[ 2023-10-07 23:19:30 ] Completed Epoch: 13 batch 336: moving batch data to device 8.858 ms, 52.27 s total
[ 2023-10-07 23:19:30 ] Completed Epoch: 13 batch 336: forward pass 157.371 ms, 52.43 s total
[ 2023-10-07 23:19:30 ] Completed Epoch: 13 batch 336: backward pass 154.859 ms, 52.58 s total
[ 2023-10-07 23:19:31 ] Completed Epoch: 13 batch 336: computing loss 331.870 ms, 52.91 s total
EPOCH: [13], BATCH: [336/889], loss: 0.358, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 336
[ 2023-10-07 23:19:32 ] Completed saving temp checkpoint 1,762.175 ms, 54.68 s total
[ 2023-10-07 23:19:33 ] Completed replacing temp checkpoint with checkpoint 620.930 ms, 55.30 s total
[ 2023-10-07 23:19:33 ] Completed Epoch: 13 batch 337: moving batch data to device 6.934 ms, 55.30 s total
[ 2023-10-07 23:19:33 ] Completed Epoch: 13 batch 337: forward pass 165.216 ms, 55.47 s total
[ 2023-10-07 23:19:33 ] Completed Epoch: 13 batch 337: backward pass 73.175 ms, 55.54 s total
[ 2023-10-07 23:19:34 ] Completed Epoch: 13 batch 337: computing loss 373.905 ms, 55.92 s total
EPOCH: [13], BATCH: [337/889], loss: 0.360, loss_box_reg: 0.106, loss_classifier: 0.087, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 337
[ 2023-10-07 23:19:35 ] Completed saving temp checkpoint 1,486.968 ms, 57.40 s total
[ 2023-10-07 23:19:35 ] Completed replacing temp checkpoint with checkpoint 96.844 ms, 57.50 s total
[ 2023-10-07 23:19:35 ] Completed Epoch: 13 batch 338: moving batch data to device 5.998 ms, 57.51 s total
[ 2023-10-07 23:19:35 ] Completed Epoch: 13 batch 338: forward pass 164.582 ms, 57.67 s total
[ 2023-10-07 23:19:35 ] Completed Epoch: 13 batch 338: backward pass 64.191 ms, 57.74 s total
[ 2023-10-07 23:19:36 ] Completed Epoch: 13 batch 338: computing loss 375.411 ms, 58.11 s total
EPOCH: [13], BATCH: [338/889], loss: 0.396, loss_box_reg: 0.118, loss_classifier: 0.103, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 338
[ 2023-10-07 23:19:38 ] Completed saving temp checkpoint 1,887.530 ms, 60.00 s total
[ 2023-10-07 23:19:38 ] Completed replacing temp checkpoint with checkpoint 92.703 ms, 60.09 s total
[ 2023-10-07 23:19:38 ] Completed Epoch: 13 batch 339: moving batch data to device 5.161 ms, 60.10 s total
[ 2023-10-07 23:19:38 ] Completed Epoch: 13 batch 339: forward pass 163.000 ms, 60.26 s total
[ 2023-10-07 23:19:38 ] Completed Epoch: 13 batch 339: backward pass 64.061 ms, 60.32 s total
[ 2023-10-07 23:19:39 ] Completed Epoch: 13 batch 339: computing loss 479.046 ms, 60.80 s total
EPOCH: [13], BATCH: [339/889], loss: 0.341, loss_box_reg: 0.100, loss_classifier: 0.085, loss_mask: 0.122, loss_objectness: 0.012, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 339
[ 2023-10-07 23:19:41 ] Completed saving temp checkpoint 2,113.673 ms, 62.92 s total
[ 2023-10-07 23:19:42 ] Completed replacing temp checkpoint with checkpoint 920.496 ms, 63.84 s total
[ 2023-10-07 23:19:42 ] Completed Epoch: 13 batch 340: moving batch data to device 8.945 ms, 63.85 s total
[ 2023-10-07 23:19:42 ] Completed Epoch: 13 batch 340: forward pass 156.035 ms, 64.00 s total
[ 2023-10-07 23:19:42 ] Completed Epoch: 13 batch 340: backward pass 83.486 ms, 64.09 s total
[ 2023-10-07 23:19:42 ] Completed Epoch: 13 batch 340: computing loss 330.806 ms, 64.42 s total
EPOCH: [13], BATCH: [340/889], loss: 0.382, loss_box_reg: 0.114, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 340
[ 2023-10-07 23:19:45 ] Completed saving temp checkpoint 2,637.513 ms, 67.05 s total
[ 2023-10-07 23:19:45 ] Completed replacing temp checkpoint with checkpoint 66.049 ms, 67.12 s total
[ 2023-10-07 23:19:45 ] Completed Epoch: 13 batch 341: moving batch data to device 8.325 ms, 67.13 s total
[ 2023-10-07 23:19:45 ] Completed Epoch: 13 batch 341: forward pass 166.794 ms, 67.29 s total
[ 2023-10-07 23:19:45 ] Completed Epoch: 13 batch 341: backward pass 76.817 ms, 67.37 s total
[ 2023-10-07 23:19:46 ] Completed Epoch: 13 batch 341: computing loss 511.830 ms, 67.88 s total
EPOCH: [13], BATCH: [341/889], loss: 0.402, loss_box_reg: 0.123, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 341
[ 2023-10-07 23:19:47 ] Completed saving temp checkpoint 1,557.509 ms, 69.44 s total
[ 2023-10-07 23:19:48 ] Completed replacing temp checkpoint with checkpoint 491.105 ms, 69.93 s total
[ 2023-10-07 23:19:48 ] Completed Epoch: 13 batch 342: moving batch data to device 8.116 ms, 69.94 s total
[ 2023-10-07 23:19:48 ] Completed Epoch: 13 batch 342: forward pass 156.847 ms, 70.10 s total
[ 2023-10-07 23:19:48 ] Completed Epoch: 13 batch 342: backward pass 74.633 ms, 70.17 s total
[ 2023-10-07 23:19:48 ] Completed Epoch: 13 batch 342: computing loss 495.586 ms, 70.67 s total
EPOCH: [13], BATCH: [342/889], loss: 0.380, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 342
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 23:32:47 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 23:32:47 ] Completed importing Timer 0.031 ms, 0.00 s total
[ 2023-10-07 23:32:48 ] Completed importing everything else 494.857 ms, 0.49 s total
[ 2023-10-07 23:32:48 ] Completed defined other functions 0.023 ms, 0.49 s total
| distributed init (rank 0): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
| distributed init (rank 4): env://
| distributed init (rank 3): env://
| distributed init (rank 5): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 23:32:51 ] Completed main preliminaries 2,950.913 ms, 3.45 s total
loading annotations into memory...
Done (t=12.48s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-07 23:33:05 ] Completed loading data 14,440.212 ms, 17.89 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 23:33:05 ] Completed creating data samplers 98.215 ms, 17.98 s total
[ 2023-10-07 23:33:05 ] Completed creating data loaders 0.206 ms, 17.98 s total
[ 2023-10-07 23:33:06 ] Completed creating model and .to(device) 734.790 ms, 18.72 s total
[ 2023-10-07 23:33:08 ] Completed preparing model for distributed training 1,503.595 ms, 20.22 s total
[ 2023-10-07 23:33:08 ] Completed optimizer and scaler 0.568 ms, 20.22 s total
[ 2023-10-07 23:33:08 ] Completed learning rate schedulers 0.249 ms, 20.22 s total
[ 2023-10-07 23:33:09 ] Completed init coco evaluator 976.995 ms, 21.20 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 23:33:10 ] Completed retrieving checkpoint 915.699 ms, 22.12 s total
EPOCH :: 13
[ 2023-10-07 23:33:10 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 23:33:10 ] Completed training preliminaries 0.894 ms, 0.00 s total
Training / resuming epoch 13 from training step 342
[ 2023-10-07 23:33:10 ] Completed Epoch: 13 batch 342: moving batch data to device 302.752 ms, 0.30 s total
[ 2023-10-07 23:33:15 ] Completed Epoch: 13 batch 342: forward pass 5,517.893 ms, 5.82 s total
[ 2023-10-07 23:33:16 ] Completed Epoch: 13 batch 342: backward pass 262.761 ms, 6.08 s total
[ 2023-10-07 23:33:17 ] Completed Epoch: 13 batch 342: computing loss 991.985 ms, 7.08 s total
EPOCH: [13], BATCH: [342/889], loss: 0.383, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 342
[ 2023-10-07 23:33:18 ] Completed saving temp checkpoint 1,198.801 ms, 8.28 s total
[ 2023-10-07 23:33:18 ] Completed replacing temp checkpoint with checkpoint 177.211 ms, 8.45 s total
[ 2023-10-07 23:33:18 ] Completed Epoch: 13 batch 343: moving batch data to device 20.730 ms, 8.47 s total
[ 2023-10-07 23:33:18 ] Completed Epoch: 13 batch 343: forward pass 319.295 ms, 8.79 s total
[ 2023-10-07 23:33:19 ] Completed Epoch: 13 batch 343: backward pass 414.537 ms, 9.21 s total
[ 2023-10-07 23:33:20 ] Completed Epoch: 13 batch 343: computing loss 943.688 ms, 10.15 s total
EPOCH: [13], BATCH: [343/889], loss: 0.361, loss_box_reg: 0.103, loss_classifier: 0.093, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 343
[ 2023-10-07 23:33:21 ] Completed saving temp checkpoint 1,683.340 ms, 11.83 s total
[ 2023-10-07 23:33:21 ] Completed replacing temp checkpoint with checkpoint 77.303 ms, 11.91 s total
[ 2023-10-07 23:33:21 ] Completed Epoch: 13 batch 344: moving batch data to device 20.894 ms, 11.93 s total
[ 2023-10-07 23:33:22 ] Completed Epoch: 13 batch 344: forward pass 312.740 ms, 12.24 s total
[ 2023-10-07 23:33:22 ] Completed Epoch: 13 batch 344: backward pass 83.460 ms, 12.33 s total
[ 2023-10-07 23:33:23 ] Completed Epoch: 13 batch 344: computing loss 1,361.220 ms, 13.69 s total
EPOCH: [13], BATCH: [344/889], loss: 0.365, loss_box_reg: 0.107, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 344
[ 2023-10-07 23:33:25 ] Completed saving temp checkpoint 1,322.759 ms, 15.01 s total
[ 2023-10-07 23:33:25 ] Completed replacing temp checkpoint with checkpoint 70.827 ms, 15.08 s total
[ 2023-10-07 23:33:25 ] Completed Epoch: 13 batch 345: moving batch data to device 19.250 ms, 15.10 s total
[ 2023-10-07 23:33:25 ] Completed Epoch: 13 batch 345: forward pass 321.074 ms, 15.42 s total
[ 2023-10-07 23:33:25 ] Completed Epoch: 13 batch 345: backward pass 67.750 ms, 15.49 s total
[ 2023-10-07 23:33:27 ] Completed Epoch: 13 batch 345: computing loss 2,071.751 ms, 17.56 s total
EPOCH: [13], BATCH: [345/889], loss: 0.367, loss_box_reg: 0.105, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 345
[ 2023-10-07 23:33:28 ] Completed saving temp checkpoint 1,114.762 ms, 18.68 s total
[ 2023-10-07 23:33:28 ] Completed replacing temp checkpoint with checkpoint 61.837 ms, 18.74 s total
[ 2023-10-07 23:33:28 ] Completed Epoch: 13 batch 346: moving batch data to device 26.316 ms, 18.77 s total
[ 2023-10-07 23:33:29 ] Completed Epoch: 13 batch 346: forward pass 321.675 ms, 19.09 s total
[ 2023-10-07 23:33:29 ] Completed Epoch: 13 batch 346: backward pass 81.498 ms, 19.17 s total
[ 2023-10-07 23:33:31 ] Completed Epoch: 13 batch 346: computing loss 1,852.726 ms, 21.02 s total
EPOCH: [13], BATCH: [346/889], loss: 0.413, loss_box_reg: 0.123, loss_classifier: 0.109, loss_mask: 0.133, loss_objectness: 0.019, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 346
[ 2023-10-07 23:33:32 ] Completed saving temp checkpoint 1,634.594 ms, 22.66 s total
[ 2023-10-07 23:33:32 ] Completed replacing temp checkpoint with checkpoint 45.039 ms, 22.70 s total
[ 2023-10-07 23:33:32 ] Completed Epoch: 13 batch 347: moving batch data to device 18.529 ms, 22.72 s total
[ 2023-10-07 23:33:33 ] Completed Epoch: 13 batch 347: forward pass 305.981 ms, 23.03 s total
[ 2023-10-07 23:33:33 ] Completed Epoch: 13 batch 347: backward pass 70.307 ms, 23.10 s total
[ 2023-10-07 23:33:34 ] Completed Epoch: 13 batch 347: computing loss 1,507.570 ms, 24.60 s total
EPOCH: [13], BATCH: [347/889], loss: 0.413, loss_box_reg: 0.129, loss_classifier: 0.109, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 347
[ 2023-10-07 23:33:35 ] Completed saving temp checkpoint 1,226.077 ms, 25.83 s total
[ 2023-10-07 23:33:35 ] Completed replacing temp checkpoint with checkpoint 55.635 ms, 25.89 s total
[ 2023-10-07 23:33:35 ] Completed Epoch: 13 batch 348: moving batch data to device 18.613 ms, 25.90 s total
[ 2023-10-07 23:33:36 ] Completed Epoch: 13 batch 348: forward pass 383.111 ms, 26.29 s total
[ 2023-10-07 23:33:36 ] Completed Epoch: 13 batch 348: backward pass 55.111 ms, 26.34 s total
[ 2023-10-07 23:33:37 ] Completed Epoch: 13 batch 348: computing loss 1,142.869 ms, 27.49 s total
EPOCH: [13], BATCH: [348/889], loss: 0.388, loss_box_reg: 0.117, loss_classifier: 0.101, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 348
[ 2023-10-07 23:33:38 ] Completed saving temp checkpoint 1,174.962 ms, 28.66 s total
[ 2023-10-07 23:33:38 ] Completed replacing temp checkpoint with checkpoint 47.639 ms, 28.71 s total
[ 2023-10-07 23:33:38 ] Completed Epoch: 13 batch 349: moving batch data to device 20.743 ms, 28.73 s total
[ 2023-10-07 23:33:39 ] Completed Epoch: 13 batch 349: forward pass 306.627 ms, 29.04 s total
[ 2023-10-07 23:33:39 ] Completed Epoch: 13 batch 349: backward pass 71.743 ms, 29.11 s total
[ 2023-10-07 23:33:40 ] Completed Epoch: 13 batch 349: computing loss 1,246.800 ms, 30.35 s total
EPOCH: [13], BATCH: [349/889], loss: 0.401, loss_box_reg: 0.125, loss_classifier: 0.102, loss_mask: 0.135, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 349
[ 2023-10-07 23:33:41 ] Completed saving temp checkpoint 959.687 ms, 31.31 s total
[ 2023-10-07 23:33:41 ] Completed replacing temp checkpoint with checkpoint 55.856 ms, 31.37 s total
[ 2023-10-07 23:33:41 ] Completed Epoch: 13 batch 350: moving batch data to device 20.964 ms, 31.39 s total
[ 2023-10-07 23:33:41 ] Completed Epoch: 13 batch 350: forward pass 328.210 ms, 31.72 s total
[ 2023-10-07 23:33:41 ] Completed Epoch: 13 batch 350: backward pass 39.729 ms, 31.76 s total
[ 2023-10-07 23:33:43 ] Completed Epoch: 13 batch 350: computing loss 1,392.202 ms, 33.15 s total
EPOCH: [13], BATCH: [350/889], loss: 0.387, loss_box_reg: 0.121, loss_classifier: 0.095, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 350
[ 2023-10-07 23:33:44 ] Completed saving temp checkpoint 1,023.027 ms, 34.17 s total
[ 2023-10-07 23:33:44 ] Completed replacing temp checkpoint with checkpoint 56.690 ms, 34.23 s total
[ 2023-10-07 23:33:44 ] Completed Epoch: 13 batch 351: moving batch data to device 21.409 ms, 34.25 s total
[ 2023-10-07 23:33:44 ] Completed Epoch: 13 batch 351: forward pass 325.704 ms, 34.58 s total
[ 2023-10-07 23:33:44 ] Completed Epoch: 13 batch 351: backward pass 46.213 ms, 34.62 s total
[ 2023-10-07 23:33:46 ] Completed Epoch: 13 batch 351: computing loss 1,642.433 ms, 36.27 s total
EPOCH: [13], BATCH: [351/889], loss: 0.444, loss_box_reg: 0.135, loss_classifier: 0.116, loss_mask: 0.142, loss_objectness: 0.017, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 351
[ 2023-10-07 23:33:47 ] Completed saving temp checkpoint 1,061.084 ms, 37.33 s total
[ 2023-10-07 23:33:47 ] Completed replacing temp checkpoint with checkpoint 72.389 ms, 37.40 s total
[ 2023-10-07 23:33:47 ] Completed Epoch: 13 batch 352: moving batch data to device 23.281 ms, 37.42 s total
[ 2023-10-07 23:33:47 ] Completed Epoch: 13 batch 352: forward pass 329.441 ms, 37.75 s total
[ 2023-10-07 23:33:47 ] Completed Epoch: 13 batch 352: backward pass 113.344 ms, 37.87 s total
[ 2023-10-07 23:33:49 ] Completed Epoch: 13 batch 352: computing loss 1,419.980 ms, 39.29 s total
EPOCH: [13], BATCH: [352/889], loss: 0.400, loss_box_reg: 0.121, loss_classifier: 0.103, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 352
[ 2023-10-07 23:33:50 ] Completed saving temp checkpoint 1,275.732 ms, 40.56 s total
[ 2023-10-07 23:33:50 ] Completed replacing temp checkpoint with checkpoint 47.744 ms, 40.61 s total
[ 2023-10-07 23:33:50 ] Completed Epoch: 13 batch 353: moving batch data to device 23.691 ms, 40.63 s total
[ 2023-10-07 23:33:50 ] Completed Epoch: 13 batch 353: forward pass 330.090 ms, 40.96 s total
[ 2023-10-07 23:33:51 ] Completed Epoch: 13 batch 353: backward pass 75.218 ms, 41.04 s total
[ 2023-10-07 23:33:52 ] Completed Epoch: 13 batch 353: computing loss 1,348.106 ms, 42.39 s total
EPOCH: [13], BATCH: [353/889], loss: 0.373, loss_box_reg: 0.112, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 353
[ 2023-10-07 23:33:54 ] Completed saving temp checkpoint 1,920.569 ms, 44.31 s total
[ 2023-10-07 23:33:54 ] Completed replacing temp checkpoint with checkpoint 79.363 ms, 44.39 s total
[ 2023-10-07 23:33:54 ] Completed Epoch: 13 batch 354: moving batch data to device 22.917 ms, 44.41 s total
[ 2023-10-07 23:33:54 ] Completed Epoch: 13 batch 354: forward pass 364.436 ms, 44.77 s total
[ 2023-10-07 23:33:54 ] Completed Epoch: 13 batch 354: backward pass 89.313 ms, 44.86 s total
[ 2023-10-07 23:33:56 ] Completed Epoch: 13 batch 354: computing loss 1,315.139 ms, 46.18 s total
EPOCH: [13], BATCH: [354/889], loss: 0.402, loss_box_reg: 0.117, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.020, loss_rpn_box_reg: 0.036
Saving checkpoint at epoch 13 train batch 354
[ 2023-10-07 23:33:57 ] Completed saving temp checkpoint 1,360.318 ms, 47.54 s total
[ 2023-10-07 23:33:57 ] Completed replacing temp checkpoint with checkpoint 38.209 ms, 47.58 s total
[ 2023-10-07 23:33:57 ] Completed Epoch: 13 batch 355: moving batch data to device 21.975 ms, 47.60 s total
[ 2023-10-07 23:33:58 ] Completed Epoch: 13 batch 355: forward pass 384.107 ms, 47.98 s total
[ 2023-10-07 23:33:58 ] Completed Epoch: 13 batch 355: backward pass 64.964 ms, 48.05 s total
[ 2023-10-07 23:34:00 ] Completed Epoch: 13 batch 355: computing loss 2,054.364 ms, 50.10 s total
EPOCH: [13], BATCH: [355/889], loss: 0.385, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 355
[ 2023-10-07 23:34:01 ] Completed saving temp checkpoint 1,339.994 ms, 51.44 s total
[ 2023-10-07 23:34:01 ] Completed replacing temp checkpoint with checkpoint 68.263 ms, 51.51 s total
[ 2023-10-07 23:34:01 ] Completed Epoch: 13 batch 356: moving batch data to device 24.354 ms, 51.53 s total
[ 2023-10-07 23:34:01 ] Completed Epoch: 13 batch 356: forward pass 404.173 ms, 51.94 s total
[ 2023-10-07 23:34:02 ] Completed Epoch: 13 batch 356: backward pass 74.507 ms, 52.01 s total
[ 2023-10-07 23:34:03 ] Completed Epoch: 13 batch 356: computing loss 1,342.839 ms, 53.36 s total
EPOCH: [13], BATCH: [356/889], loss: 0.338, loss_box_reg: 0.101, loss_classifier: 0.086, loss_mask: 0.118, loss_objectness: 0.012, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 356
[ 2023-10-07 23:34:05 ] Completed saving temp checkpoint 1,723.358 ms, 55.08 s total
[ 2023-10-07 23:34:05 ] Completed replacing temp checkpoint with checkpoint 74.056 ms, 55.15 s total
[ 2023-10-07 23:34:05 ] Completed Epoch: 13 batch 357: moving batch data to device 12.970 ms, 55.17 s total
[ 2023-10-07 23:34:05 ] Completed Epoch: 13 batch 357: forward pass 373.815 ms, 55.54 s total
[ 2023-10-07 23:34:05 ] Completed Epoch: 13 batch 357: backward pass 71.087 ms, 55.61 s total
[ 2023-10-07 23:34:06 ] Completed Epoch: 13 batch 357: computing loss 1,151.223 ms, 56.76 s total
EPOCH: [13], BATCH: [357/889], loss: 0.386, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.131, loss_objectness: 0.013, loss_rpn_box_reg: 0.019
Saving checkpoint at epoch 13 train batch 357
[ 2023-10-07 23:34:07 ] Completed saving temp checkpoint 1,146.887 ms, 57.91 s total
[ 2023-10-07 23:34:07 ] Completed replacing temp checkpoint with checkpoint 35.700 ms, 57.94 s total
[ 2023-10-07 23:34:07 ] Completed Epoch: 13 batch 358: moving batch data to device 22.465 ms, 57.97 s total
[ 2023-10-07 23:34:08 ] Completed Epoch: 13 batch 358: forward pass 325.907 ms, 58.29 s total
[ 2023-10-07 23:34:08 ] Completed Epoch: 13 batch 358: backward pass 75.351 ms, 58.37 s total
[ 2023-10-07 23:34:09 ] Completed Epoch: 13 batch 358: computing loss 1,243.902 ms, 59.61 s total
EPOCH: [13], BATCH: [358/889], loss: 0.369, loss_box_reg: 0.111, loss_classifier: 0.089, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 358
[ 2023-10-07 23:34:10 ] Completed saving temp checkpoint 1,272.832 ms, 60.89 s total
[ 2023-10-07 23:34:10 ] Completed replacing temp checkpoint with checkpoint 59.639 ms, 60.94 s total
[ 2023-10-07 23:34:10 ] Completed Epoch: 13 batch 359: moving batch data to device 21.379 ms, 60.97 s total
[ 2023-10-07 23:34:11 ] Completed Epoch: 13 batch 359: forward pass 333.195 ms, 61.30 s total
[ 2023-10-07 23:34:11 ] Completed Epoch: 13 batch 359: backward pass 67.879 ms, 61.37 s total
[ 2023-10-07 23:34:12 ] Completed Epoch: 13 batch 359: computing loss 1,261.087 ms, 62.63 s total
EPOCH: [13], BATCH: [359/889], loss: 0.402, loss_box_reg: 0.121, loss_classifier: 0.105, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 359
[ 2023-10-07 23:34:13 ] Completed saving temp checkpoint 1,088.679 ms, 63.72 s total
[ 2023-10-07 23:34:13 ] Completed replacing temp checkpoint with checkpoint 58.503 ms, 63.78 s total
[ 2023-10-07 23:34:13 ] Completed Epoch: 13 batch 360: moving batch data to device 22.417 ms, 63.80 s total
[ 2023-10-07 23:34:14 ] Completed Epoch: 13 batch 360: forward pass 328.434 ms, 64.13 s total
[ 2023-10-07 23:34:14 ] Completed Epoch: 13 batch 360: backward pass 37.743 ms, 64.16 s total
[ 2023-10-07 23:34:15 ] Completed Epoch: 13 batch 360: computing loss 1,255.995 ms, 65.42 s total
EPOCH: [13], BATCH: [360/889], loss: 0.402, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 360
[ 2023-10-07 23:34:16 ] Completed saving temp checkpoint 956.155 ms, 66.38 s total
[ 2023-10-07 23:34:16 ] Completed replacing temp checkpoint with checkpoint 59.376 ms, 66.44 s total
[ 2023-10-07 23:34:16 ] Completed Epoch: 13 batch 361: moving batch data to device 21.456 ms, 66.46 s total
[ 2023-10-07 23:34:16 ] Completed Epoch: 13 batch 361: forward pass 330.692 ms, 66.79 s total
[ 2023-10-07 23:34:16 ] Completed Epoch: 13 batch 361: backward pass 53.448 ms, 66.84 s total
[ 2023-10-07 23:34:18 ] Completed Epoch: 13 batch 361: computing loss 1,737.514 ms, 68.58 s total
EPOCH: [13], BATCH: [361/889], loss: 0.378, loss_box_reg: 0.108, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 361
[ 2023-10-07 23:34:19 ] Completed saving temp checkpoint 1,093.964 ms, 69.67 s total
[ 2023-10-07 23:34:19 ] Completed replacing temp checkpoint with checkpoint 68.408 ms, 69.74 s total
[ 2023-10-07 23:34:19 ] Completed Epoch: 13 batch 362: moving batch data to device 23.580 ms, 69.76 s total
[ 2023-10-07 23:34:20 ] Completed Epoch: 13 batch 362: forward pass 333.161 ms, 70.10 s total
[ 2023-10-07 23:34:20 ] Completed Epoch: 13 batch 362: backward pass 38.347 ms, 70.14 s total
[ 2023-10-07 23:34:21 ] Completed Epoch: 13 batch 362: computing loss 1,685.245 ms, 71.82 s total
EPOCH: [13], BATCH: [362/889], loss: 0.408, loss_box_reg: 0.123, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 362
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-07 23:47:24 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 23:47:24 ] Completed importing Timer 0.023 ms, 0.00 s total
[ 2023-10-07 23:47:24 ] Completed importing everything else 515.048 ms, 0.52 s total
[ 2023-10-07 23:47:24 ] Completed defined other functions 0.022 ms, 0.52 s total
| distributed init (rank 0): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 4): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-07 23:47:33 ] Completed main preliminaries 8,253.865 ms, 8.77 s total
loading annotations into memory...
Done (t=10.81s)
creating index...
index created!
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
[ 2023-10-07 23:47:45 ] Completed loading data 12,592.059 ms, 21.36 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-07 23:47:45 ] Completed creating data samplers 97.896 ms, 21.46 s total
[ 2023-10-07 23:47:45 ] Completed creating data loaders 0.201 ms, 21.46 s total
[ 2023-10-07 23:47:46 ] Completed creating model and .to(device) 644.435 ms, 22.10 s total
[ 2023-10-07 23:47:48 ] Completed preparing model for distributed training 2,345.585 ms, 24.45 s total
[ 2023-10-07 23:47:48 ] Completed optimizer and scaler 0.612 ms, 24.45 s total
[ 2023-10-07 23:47:48 ] Completed learning rate schedulers 0.234 ms, 24.45 s total
[ 2023-10-07 23:47:49 ] Completed init coco evaluator 970.920 ms, 25.42 s total
RESUMING FROM CURRENT JOB
[ 2023-10-07 23:47:50 ] Completed retrieving checkpoint 876.522 ms, 26.30 s total
EPOCH :: 13
[ 2023-10-07 23:47:50 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-07 23:47:50 ] Completed training preliminaries 0.846 ms, 0.00 s total
Training / resuming epoch 13 from training step 362
[ 2023-10-07 23:47:51 ] Completed Epoch: 13 batch 362: moving batch data to device 493.186 ms, 0.49 s total
[ 2023-10-07 23:47:52 ] Completed Epoch: 13 batch 362: forward pass 1,225.079 ms, 1.72 s total
[ 2023-10-07 23:47:52 ] Completed Epoch: 13 batch 362: backward pass 162.234 ms, 1.88 s total
[ 2023-10-07 23:47:53 ] Completed Epoch: 13 batch 362: computing loss 563.206 ms, 2.44 s total
EPOCH: [13], BATCH: [362/889], loss: 0.409, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.138, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 362
[ 2023-10-07 23:47:54 ] Completed saving temp checkpoint 1,027.034 ms, 3.47 s total
[ 2023-10-07 23:47:54 ] Completed replacing temp checkpoint with checkpoint 148.988 ms, 3.62 s total
[ 2023-10-07 23:47:54 ] Completed Epoch: 13 batch 363: moving batch data to device 4.902 ms, 3.63 s total
[ 2023-10-07 23:47:54 ] Completed Epoch: 13 batch 363: forward pass 211.946 ms, 3.84 s total
[ 2023-10-07 23:47:54 ] Completed Epoch: 13 batch 363: backward pass 251.583 ms, 4.09 s total
[ 2023-10-07 23:47:54 ] Completed Epoch: 13 batch 363: computing loss 96.260 ms, 4.19 s total
EPOCH: [13], BATCH: [363/889], loss: 0.370, loss_box_reg: 0.107, loss_classifier: 0.094, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 363
[ 2023-10-07 23:47:55 ] Completed saving temp checkpoint 967.078 ms, 5.15 s total
[ 2023-10-07 23:47:55 ] Completed replacing temp checkpoint with checkpoint 66.191 ms, 5.22 s total
[ 2023-10-07 23:47:55 ] Completed Epoch: 13 batch 364: moving batch data to device 2.361 ms, 5.22 s total
[ 2023-10-07 23:47:56 ] Completed Epoch: 13 batch 364: forward pass 110.610 ms, 5.33 s total
[ 2023-10-07 23:47:56 ] Completed Epoch: 13 batch 364: backward pass 119.743 ms, 5.45 s total
[ 2023-10-07 23:47:56 ] Completed Epoch: 13 batch 364: computing loss 101.803 ms, 5.55 s total
EPOCH: [13], BATCH: [364/889], loss: 0.375, loss_box_reg: 0.114, loss_classifier: 0.088, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 364
[ 2023-10-07 23:47:57 ] Completed saving temp checkpoint 1,091.830 ms, 6.64 s total
[ 2023-10-07 23:47:57 ] Completed replacing temp checkpoint with checkpoint 58.145 ms, 6.70 s total
[ 2023-10-07 23:47:57 ] Completed Epoch: 13 batch 365: moving batch data to device 62.499 ms, 6.77 s total
[ 2023-10-07 23:47:57 ] Completed Epoch: 13 batch 365: forward pass 106.019 ms, 6.87 s total
[ 2023-10-07 23:47:57 ] Completed Epoch: 13 batch 365: backward pass 82.787 ms, 6.95 s total
[ 2023-10-07 23:47:57 ] Completed Epoch: 13 batch 365: computing loss 122.762 ms, 7.08 s total
EPOCH: [13], BATCH: [365/889], loss: 0.373, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 365
[ 2023-10-07 23:47:59 ] Completed saving temp checkpoint 1,534.573 ms, 8.61 s total
[ 2023-10-07 23:47:59 ] Completed replacing temp checkpoint with checkpoint 27.969 ms, 8.64 s total
[ 2023-10-07 23:47:59 ] Completed Epoch: 13 batch 366: moving batch data to device 2.295 ms, 8.64 s total
[ 2023-10-07 23:47:59 ] Completed Epoch: 13 batch 366: forward pass 106.308 ms, 8.75 s total
[ 2023-10-07 23:47:59 ] Completed Epoch: 13 batch 366: backward pass 33.797 ms, 8.78 s total
[ 2023-10-07 23:47:59 ] Completed Epoch: 13 batch 366: computing loss 179.739 ms, 8.96 s total
EPOCH: [13], BATCH: [366/889], loss: 0.382, loss_box_reg: 0.113, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 366
[ 2023-10-07 23:48:01 ] Completed saving temp checkpoint 1,949.756 ms, 10.91 s total
[ 2023-10-07 23:48:01 ] Completed replacing temp checkpoint with checkpoint 66.658 ms, 10.98 s total
[ 2023-10-07 23:48:01 ] Completed Epoch: 13 batch 367: moving batch data to device 4.493 ms, 10.98 s total
[ 2023-10-07 23:48:01 ] Completed Epoch: 13 batch 367: forward pass 104.361 ms, 11.09 s total
[ 2023-10-07 23:48:01 ] Completed Epoch: 13 batch 367: backward pass 79.752 ms, 11.17 s total
[ 2023-10-07 23:48:02 ] Completed Epoch: 13 batch 367: computing loss 124.332 ms, 11.29 s total
EPOCH: [13], BATCH: [367/889], loss: 0.382, loss_box_reg: 0.118, loss_classifier: 0.097, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 367
[ 2023-10-07 23:48:03 ] Completed saving temp checkpoint 1,309.782 ms, 12.60 s total
[ 2023-10-07 23:48:03 ] Completed replacing temp checkpoint with checkpoint 48.290 ms, 12.65 s total
[ 2023-10-07 23:48:03 ] Completed Epoch: 13 batch 368: moving batch data to device 3.990 ms, 12.65 s total
[ 2023-10-07 23:48:03 ] Completed Epoch: 13 batch 368: forward pass 111.543 ms, 12.76 s total
[ 2023-10-07 23:48:03 ] Completed Epoch: 13 batch 368: backward pass 81.619 ms, 12.85 s total
[ 2023-10-07 23:48:03 ] Completed Epoch: 13 batch 368: computing loss 124.674 ms, 12.97 s total
EPOCH: [13], BATCH: [368/889], loss: 0.388, loss_box_reg: 0.115, loss_classifier: 0.098, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 368
[ 2023-10-07 23:48:05 ] Completed saving temp checkpoint 1,522.662 ms, 14.49 s total
[ 2023-10-07 23:48:05 ] Completed replacing temp checkpoint with checkpoint 100.141 ms, 14.59 s total
[ 2023-10-07 23:48:05 ] Completed Epoch: 13 batch 369: moving batch data to device 7.041 ms, 14.60 s total
[ 2023-10-07 23:48:05 ] Completed Epoch: 13 batch 369: forward pass 108.958 ms, 14.71 s total
[ 2023-10-07 23:48:05 ] Completed Epoch: 13 batch 369: backward pass 69.310 ms, 14.78 s total
[ 2023-10-07 23:48:05 ] Completed Epoch: 13 batch 369: computing loss 124.962 ms, 14.90 s total
EPOCH: [13], BATCH: [369/889], loss: 0.375, loss_box_reg: 0.112, loss_classifier: 0.093, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 369
[ 2023-10-07 23:48:07 ] Completed saving temp checkpoint 1,372.162 ms, 16.28 s total
[ 2023-10-07 23:48:07 ] Completed replacing temp checkpoint with checkpoint 92.533 ms, 16.37 s total
[ 2023-10-07 23:48:07 ] Completed Epoch: 13 batch 370: moving batch data to device 3.018 ms, 16.37 s total
[ 2023-10-07 23:48:07 ] Completed Epoch: 13 batch 370: forward pass 169.624 ms, 16.54 s total
[ 2023-10-07 23:48:07 ] Completed Epoch: 13 batch 370: backward pass 67.342 ms, 16.61 s total
[ 2023-10-07 23:48:07 ] Completed Epoch: 13 batch 370: computing loss 145.552 ms, 16.75 s total
EPOCH: [13], BATCH: [370/889], loss: 0.346, loss_box_reg: 0.107, loss_classifier: 0.085, loss_mask: 0.122, loss_objectness: 0.013, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 370
[ 2023-10-07 23:48:09 ] Completed saving temp checkpoint 1,651.696 ms, 18.41 s total
[ 2023-10-07 23:48:09 ] Completed replacing temp checkpoint with checkpoint 100.076 ms, 18.51 s total
[ 2023-10-07 23:48:09 ] Completed Epoch: 13 batch 371: moving batch data to device 9.638 ms, 18.52 s total
[ 2023-10-07 23:48:09 ] Completed Epoch: 13 batch 371: forward pass 108.913 ms, 18.62 s total
[ 2023-10-07 23:48:09 ] Completed Epoch: 13 batch 371: backward pass 77.859 ms, 18.70 s total
[ 2023-10-07 23:48:09 ] Completed Epoch: 13 batch 371: computing loss 110.183 ms, 18.81 s total
EPOCH: [13], BATCH: [371/889], loss: 0.354, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.121, loss_objectness: 0.013, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 371
[ 2023-10-07 23:48:11 ] Completed saving temp checkpoint 1,673.481 ms, 20.49 s total
[ 2023-10-07 23:48:11 ] Completed replacing temp checkpoint with checkpoint 91.700 ms, 20.58 s total
[ 2023-10-07 23:48:11 ] Completed Epoch: 13 batch 372: moving batch data to device 7.860 ms, 20.59 s total
[ 2023-10-07 23:48:11 ] Completed Epoch: 13 batch 372: forward pass 112.286 ms, 20.70 s total
[ 2023-10-07 23:48:11 ] Completed Epoch: 13 batch 372: backward pass 57.925 ms, 20.76 s total
[ 2023-10-07 23:48:11 ] Completed Epoch: 13 batch 372: computing loss 134.790 ms, 20.89 s total
EPOCH: [13], BATCH: [372/889], loss: 0.407, loss_box_reg: 0.125, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.020, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 372
[ 2023-10-07 23:48:13 ] Completed saving temp checkpoint 1,913.322 ms, 22.80 s total
[ 2023-10-07 23:48:13 ] Completed replacing temp checkpoint with checkpoint 61.025 ms, 22.87 s total
[ 2023-10-07 23:48:13 ] Completed Epoch: 13 batch 373: moving batch data to device 5.316 ms, 22.87 s total
[ 2023-10-07 23:48:13 ] Completed Epoch: 13 batch 373: forward pass 104.895 ms, 22.98 s total
[ 2023-10-07 23:48:13 ] Completed Epoch: 13 batch 373: backward pass 88.657 ms, 23.06 s total
[ 2023-10-07 23:48:13 ] Completed Epoch: 13 batch 373: computing loss 104.703 ms, 23.17 s total
EPOCH: [13], BATCH: [373/889], loss: 0.370, loss_box_reg: 0.112, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.019
Saving checkpoint at epoch 13 train batch 373
[ 2023-10-07 23:48:15 ] Completed saving temp checkpoint 1,665.656 ms, 24.83 s total
[ 2023-10-07 23:48:15 ] Completed replacing temp checkpoint with checkpoint 43.104 ms, 24.88 s total
[ 2023-10-07 23:48:15 ] Completed Epoch: 13 batch 374: moving batch data to device 4.945 ms, 24.88 s total
[ 2023-10-07 23:48:15 ] Completed Epoch: 13 batch 374: forward pass 107.574 ms, 24.99 s total
[ 2023-10-07 23:48:15 ] Completed Epoch: 13 batch 374: backward pass 78.182 ms, 25.07 s total
[ 2023-10-07 23:48:15 ] Completed Epoch: 13 batch 374: computing loss 111.556 ms, 25.18 s total
EPOCH: [13], BATCH: [374/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 374
[ 2023-10-07 23:48:17 ] Completed saving temp checkpoint 1,866.382 ms, 27.05 s total
[ 2023-10-07 23:48:17 ] Completed replacing temp checkpoint with checkpoint 57.519 ms, 27.10 s total
[ 2023-10-07 23:48:17 ] Completed Epoch: 13 batch 375: moving batch data to device 8.798 ms, 27.11 s total
[ 2023-10-07 23:48:17 ] Completed Epoch: 13 batch 375: forward pass 105.348 ms, 27.22 s total
[ 2023-10-07 23:48:18 ] Completed Epoch: 13 batch 375: backward pass 73.175 ms, 27.29 s total
[ 2023-10-07 23:48:18 ] Completed Epoch: 13 batch 375: computing loss 124.368 ms, 27.42 s total
EPOCH: [13], BATCH: [375/889], loss: 0.400, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.134, loss_objectness: 0.019, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 375
[ 2023-10-07 23:48:19 ] Completed saving temp checkpoint 1,751.293 ms, 29.17 s total
[ 2023-10-07 23:48:19 ] Completed replacing temp checkpoint with checkpoint 65.426 ms, 29.23 s total
[ 2023-10-07 23:48:19 ] Completed Epoch: 13 batch 376: moving batch data to device 7.237 ms, 29.24 s total
[ 2023-10-07 23:48:20 ] Completed Epoch: 13 batch 376: forward pass 109.062 ms, 29.35 s total
[ 2023-10-07 23:48:20 ] Completed Epoch: 13 batch 376: backward pass 80.903 ms, 29.43 s total
[ 2023-10-07 23:48:20 ] Completed Epoch: 13 batch 376: computing loss 110.329 ms, 29.54 s total
EPOCH: [13], BATCH: [376/889], loss: 0.387, loss_box_reg: 0.120, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 376
[ 2023-10-07 23:48:22 ] Completed saving temp checkpoint 2,177.750 ms, 31.72 s total
[ 2023-10-07 23:48:22 ] Completed replacing temp checkpoint with checkpoint 74.440 ms, 31.79 s total
[ 2023-10-07 23:48:22 ] Completed Epoch: 13 batch 377: moving batch data to device 5.085 ms, 31.80 s total
[ 2023-10-07 23:48:22 ] Completed Epoch: 13 batch 377: forward pass 106.509 ms, 31.90 s total
[ 2023-10-07 23:48:22 ] Completed Epoch: 13 batch 377: backward pass 73.767 ms, 31.98 s total
[ 2023-10-07 23:48:22 ] Completed Epoch: 13 batch 377: computing loss 126.992 ms, 32.10 s total
EPOCH: [13], BATCH: [377/889], loss: 0.413, loss_box_reg: 0.127, loss_classifier: 0.103, loss_mask: 0.134, loss_objectness: 0.019, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 377
[ 2023-10-07 23:48:24 ] Completed saving temp checkpoint 1,570.130 ms, 33.67 s total
[ 2023-10-07 23:48:24 ] Completed replacing temp checkpoint with checkpoint 37.146 ms, 33.71 s total
[ 2023-10-07 23:48:24 ] Completed Epoch: 13 batch 378: moving batch data to device 6.902 ms, 33.72 s total
[ 2023-10-07 23:48:24 ] Completed Epoch: 13 batch 378: forward pass 107.489 ms, 33.83 s total
[ 2023-10-07 23:48:24 ] Completed Epoch: 13 batch 378: backward pass 80.317 ms, 33.91 s total
[ 2023-10-07 23:48:24 ] Completed Epoch: 13 batch 378: computing loss 115.563 ms, 34.02 s total
EPOCH: [13], BATCH: [378/889], loss: 0.366, loss_box_reg: 0.105, loss_classifier: 0.084, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 378
[ 2023-10-07 23:48:26 ] Completed saving temp checkpoint 1,783.015 ms, 35.80 s total
[ 2023-10-07 23:48:26 ] Completed replacing temp checkpoint with checkpoint 73.394 ms, 35.88 s total
[ 2023-10-07 23:48:26 ] Completed Epoch: 13 batch 379: moving batch data to device 6.756 ms, 35.88 s total
[ 2023-10-07 23:48:26 ] Completed Epoch: 13 batch 379: forward pass 105.477 ms, 35.99 s total
[ 2023-10-07 23:48:26 ] Completed Epoch: 13 batch 379: backward pass 72.866 ms, 36.06 s total
[ 2023-10-07 23:48:26 ] Completed Epoch: 13 batch 379: computing loss 112.501 ms, 36.18 s total
EPOCH: [13], BATCH: [379/889], loss: 0.377, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 379
[ 2023-10-07 23:48:28 ] Completed saving temp checkpoint 1,740.085 ms, 37.92 s total
[ 2023-10-07 23:48:28 ] Completed replacing temp checkpoint with checkpoint 51.078 ms, 37.97 s total
[ 2023-10-07 23:48:28 ] Completed Epoch: 13 batch 380: moving batch data to device 4.776 ms, 37.97 s total
[ 2023-10-07 23:48:28 ] Completed Epoch: 13 batch 380: forward pass 103.542 ms, 38.08 s total
[ 2023-10-07 23:48:28 ] Completed Epoch: 13 batch 380: backward pass 71.719 ms, 38.15 s total
[ 2023-10-07 23:48:29 ] Completed Epoch: 13 batch 380: computing loss 203.538 ms, 38.35 s total
EPOCH: [13], BATCH: [380/889], loss: 0.386, loss_box_reg: 0.115, loss_classifier: 0.096, loss_mask: 0.133, loss_objectness: 0.013, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 380
[ 2023-10-07 23:48:30 ] Completed saving temp checkpoint 1,626.568 ms, 39.98 s total
[ 2023-10-07 23:48:30 ] Completed replacing temp checkpoint with checkpoint 71.523 ms, 40.05 s total
[ 2023-10-07 23:48:30 ] Completed Epoch: 13 batch 381: moving batch data to device 7.019 ms, 40.06 s total
[ 2023-10-07 23:48:30 ] Completed Epoch: 13 batch 381: forward pass 102.579 ms, 40.16 s total
[ 2023-10-07 23:48:30 ] Completed Epoch: 13 batch 381: backward pass 33.896 ms, 40.19 s total
[ 2023-10-07 23:48:31 ] Completed Epoch: 13 batch 381: computing loss 155.006 ms, 40.35 s total
EPOCH: [13], BATCH: [381/889], loss: 0.371, loss_box_reg: 0.112, loss_classifier: 0.087, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 381
[ 2023-10-07 23:48:32 ] Completed saving temp checkpoint 1,488.113 ms, 41.84 s total
[ 2023-10-07 23:48:32 ] Completed replacing temp checkpoint with checkpoint 46.609 ms, 41.88 s total
[ 2023-10-07 23:48:32 ] Completed Epoch: 13 batch 382: moving batch data to device 5.370 ms, 41.89 s total
[ 2023-10-07 23:48:32 ] Completed Epoch: 13 batch 382: forward pass 106.807 ms, 41.99 s total
[ 2023-10-07 23:48:32 ] Completed Epoch: 13 batch 382: backward pass 38.508 ms, 42.03 s total
[ 2023-10-07 23:48:32 ] Completed Epoch: 13 batch 382: computing loss 151.446 ms, 42.18 s total
EPOCH: [13], BATCH: [382/889], loss: 0.389, loss_box_reg: 0.118, loss_classifier: 0.095, loss_mask: 0.136, loss_objectness: 0.013, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 382
[ 2023-10-07 23:48:35 ] Completed saving temp checkpoint 2,117.420 ms, 44.30 s total
[ 2023-10-07 23:48:35 ] Completed replacing temp checkpoint with checkpoint 31.635 ms, 44.33 s total
[ 2023-10-07 23:48:35 ] Completed Epoch: 13 batch 383: moving batch data to device 8.080 ms, 44.34 s total
[ 2023-10-07 23:48:35 ] Completed Epoch: 13 batch 383: forward pass 104.409 ms, 44.45 s total
[ 2023-10-07 23:48:35 ] Completed Epoch: 13 batch 383: backward pass 56.629 ms, 44.50 s total
[ 2023-10-07 23:48:35 ] Completed Epoch: 13 batch 383: computing loss 142.091 ms, 44.64 s total
EPOCH: [13], BATCH: [383/889], loss: 0.404, loss_box_reg: 0.119, loss_classifier: 0.103, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 383
[ 2023-10-07 23:48:36 ] Completed saving temp checkpoint 1,296.659 ms, 45.94 s total
[ 2023-10-07 23:48:36 ] Completed replacing temp checkpoint with checkpoint 68.046 ms, 46.01 s total
[ 2023-10-07 23:48:36 ] Completed Epoch: 13 batch 384: moving batch data to device 8.051 ms, 46.02 s total
[ 2023-10-07 23:48:36 ] Completed Epoch: 13 batch 384: forward pass 103.593 ms, 46.12 s total
[ 2023-10-07 23:48:36 ] Completed Epoch: 13 batch 384: backward pass 78.081 ms, 46.20 s total
[ 2023-10-07 23:48:37 ] Completed Epoch: 13 batch 384: computing loss 117.721 ms, 46.32 s total
EPOCH: [13], BATCH: [384/889], loss: 0.376, loss_box_reg: 0.107, loss_classifier: 0.094, loss_mask: 0.123, loss_objectness: 0.024, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 384
[ 2023-10-07 23:48:38 ] Completed saving temp checkpoint 1,350.469 ms, 47.67 s total
[ 2023-10-07 23:48:38 ] Completed replacing temp checkpoint with checkpoint 50.772 ms, 47.72 s total
[ 2023-10-07 23:48:38 ] Completed Epoch: 13 batch 385: moving batch data to device 7.990 ms, 47.73 s total
[ 2023-10-07 23:48:38 ] Completed Epoch: 13 batch 385: forward pass 105.723 ms, 47.83 s total
[ 2023-10-07 23:48:38 ] Completed Epoch: 13 batch 385: backward pass 31.400 ms, 47.86 s total
[ 2023-10-07 23:48:38 ] Completed Epoch: 13 batch 385: computing loss 138.899 ms, 48.00 s total
EPOCH: [13], BATCH: [385/889], loss: 0.396, loss_box_reg: 0.118, loss_classifier: 0.095, loss_mask: 0.144, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 385
[ 2023-10-07 23:48:39 ] Completed saving temp checkpoint 1,215.342 ms, 49.22 s total
[ 2023-10-07 23:48:40 ] Completed replacing temp checkpoint with checkpoint 83.291 ms, 49.30 s total
[ 2023-10-07 23:48:40 ] Completed Epoch: 13 batch 386: moving batch data to device 9.583 ms, 49.31 s total
[ 2023-10-07 23:48:40 ] Completed Epoch: 13 batch 386: forward pass 104.164 ms, 49.41 s total
[ 2023-10-07 23:48:40 ] Completed Epoch: 13 batch 386: backward pass 38.731 ms, 49.45 s total
[ 2023-10-07 23:48:40 ] Completed Epoch: 13 batch 386: computing loss 155.418 ms, 49.61 s total
EPOCH: [13], BATCH: [386/889], loss: 0.393, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 386
[ 2023-10-07 23:48:41 ] Completed saving temp checkpoint 1,297.559 ms, 50.91 s total
[ 2023-10-07 23:48:41 ] Completed replacing temp checkpoint with checkpoint 67.528 ms, 50.97 s total
[ 2023-10-07 23:48:41 ] Completed Epoch: 13 batch 387: moving batch data to device 6.555 ms, 50.98 s total
[ 2023-10-07 23:48:41 ] Completed Epoch: 13 batch 387: forward pass 110.481 ms, 51.09 s total
[ 2023-10-07 23:48:41 ] Completed Epoch: 13 batch 387: backward pass 68.880 ms, 51.16 s total
[ 2023-10-07 23:48:42 ] Completed Epoch: 13 batch 387: computing loss 122.511 ms, 51.28 s total
EPOCH: [13], BATCH: [387/889], loss: 0.389, loss_box_reg: 0.117, loss_classifier: 0.097, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 387
[ 2023-10-07 23:48:43 ] Completed saving temp checkpoint 1,133.769 ms, 52.42 s total
[ 2023-10-07 23:48:43 ] Completed replacing temp checkpoint with checkpoint 77.255 ms, 52.49 s total
[ 2023-10-07 23:48:43 ] Completed Epoch: 13 batch 388: moving batch data to device 7.039 ms, 52.50 s total
[ 2023-10-07 23:48:43 ] Completed Epoch: 13 batch 388: forward pass 103.424 ms, 52.60 s total
[ 2023-10-07 23:48:43 ] Completed Epoch: 13 batch 388: backward pass 72.432 ms, 52.68 s total
[ 2023-10-07 23:48:43 ] Completed Epoch: 13 batch 388: computing loss 123.787 ms, 52.80 s total
EPOCH: [13], BATCH: [388/889], loss: 0.397, loss_box_reg: 0.114, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 388
[ 2023-10-07 23:48:44 ] Completed saving temp checkpoint 1,237.926 ms, 54.04 s total
[ 2023-10-07 23:48:44 ] Completed replacing temp checkpoint with checkpoint 56.034 ms, 54.09 s total
[ 2023-10-07 23:48:44 ] Completed Epoch: 13 batch 389: moving batch data to device 4.636 ms, 54.10 s total
[ 2023-10-07 23:48:44 ] Completed Epoch: 13 batch 389: forward pass 100.810 ms, 54.20 s total
[ 2023-10-07 23:48:44 ] Completed Epoch: 13 batch 389: backward pass 47.355 ms, 54.25 s total
[ 2023-10-07 23:48:45 ] Completed Epoch: 13 batch 389: computing loss 122.380 ms, 54.37 s total
EPOCH: [13], BATCH: [389/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.035
Saving checkpoint at epoch 13 train batch 389
[ 2023-10-07 23:48:46 ] Completed saving temp checkpoint 1,151.704 ms, 55.52 s total
[ 2023-10-07 23:48:46 ] Completed replacing temp checkpoint with checkpoint 64.797 ms, 55.58 s total
[ 2023-10-07 23:48:46 ] Completed Epoch: 13 batch 390: moving batch data to device 4.861 ms, 55.59 s total
[ 2023-10-07 23:48:46 ] Completed Epoch: 13 batch 390: forward pass 104.399 ms, 55.69 s total
[ 2023-10-07 23:48:46 ] Completed Epoch: 13 batch 390: backward pass 34.300 ms, 55.73 s total
[ 2023-10-07 23:48:46 ] Completed Epoch: 13 batch 390: computing loss 158.703 ms, 55.89 s total
EPOCH: [13], BATCH: [390/889], loss: 0.409, loss_box_reg: 0.130, loss_classifier: 0.102, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 390
[ 2023-10-07 23:48:48 ] Completed saving temp checkpoint 1,404.513 ms, 57.29 s total
[ 2023-10-07 23:48:48 ] Completed replacing temp checkpoint with checkpoint 80.087 ms, 57.37 s total
[ 2023-10-07 23:48:48 ] Completed Epoch: 13 batch 391: moving batch data to device 6.115 ms, 57.38 s total
[ 2023-10-07 23:48:48 ] Completed Epoch: 13 batch 391: forward pass 102.127 ms, 57.48 s total
[ 2023-10-07 23:48:48 ] Completed Epoch: 13 batch 391: backward pass 47.419 ms, 57.53 s total
[ 2023-10-07 23:48:48 ] Completed Epoch: 13 batch 391: computing loss 145.631 ms, 57.67 s total
EPOCH: [13], BATCH: [391/889], loss: 0.394, loss_box_reg: 0.125, loss_classifier: 0.098, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 391
[ 2023-10-07 23:48:49 ] Completed saving temp checkpoint 1,117.501 ms, 58.79 s total
[ 2023-10-07 23:48:49 ] Completed replacing temp checkpoint with checkpoint 40.987 ms, 58.83 s total
[ 2023-10-07 23:48:49 ] Completed Epoch: 13 batch 392: moving batch data to device 4.396 ms, 58.84 s total
[ 2023-10-07 23:48:49 ] Completed Epoch: 13 batch 392: forward pass 106.345 ms, 58.94 s total
[ 2023-10-07 23:48:49 ] Completed Epoch: 13 batch 392: backward pass 36.088 ms, 58.98 s total
[ 2023-10-07 23:48:49 ] Completed Epoch: 13 batch 392: computing loss 155.103 ms, 59.13 s total
EPOCH: [13], BATCH: [392/889], loss: 0.362, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 392
[ 2023-10-07 23:48:51 ] Completed saving temp checkpoint 1,975.229 ms, 61.11 s total
[ 2023-10-07 23:48:51 ] Completed replacing temp checkpoint with checkpoint 82.394 ms, 61.19 s total
[ 2023-10-07 23:48:51 ] Completed Epoch: 13 batch 393: moving batch data to device 6.410 ms, 61.20 s total
[ 2023-10-07 23:48:52 ] Completed Epoch: 13 batch 393: forward pass 111.458 ms, 61.31 s total
[ 2023-10-07 23:48:52 ] Completed Epoch: 13 batch 393: backward pass 73.696 ms, 61.38 s total
[ 2023-10-07 23:48:52 ] Completed Epoch: 13 batch 393: computing loss 114.262 ms, 61.50 s total
EPOCH: [13], BATCH: [393/889], loss: 0.390, loss_box_reg: 0.120, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 393
[ 2023-10-07 23:48:53 ] Completed saving temp checkpoint 1,220.980 ms, 62.72 s total
[ 2023-10-07 23:48:53 ] Completed replacing temp checkpoint with checkpoint 41.441 ms, 62.76 s total
[ 2023-10-07 23:48:53 ] Completed Epoch: 13 batch 394: moving batch data to device 4.497 ms, 62.76 s total
[ 2023-10-07 23:48:53 ] Completed Epoch: 13 batch 394: forward pass 101.206 ms, 62.86 s total
[ 2023-10-07 23:48:53 ] Completed Epoch: 13 batch 394: backward pass 75.207 ms, 62.94 s total
[ 2023-10-07 23:48:53 ] Completed Epoch: 13 batch 394: computing loss 97.040 ms, 63.04 s total
EPOCH: [13], BATCH: [394/889], loss: 0.383, loss_box_reg: 0.118, loss_classifier: 0.097, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 394
[ 2023-10-07 23:48:55 ] Completed saving temp checkpoint 1,625.280 ms, 64.66 s total
[ 2023-10-07 23:48:55 ] Completed replacing temp checkpoint with checkpoint 44.880 ms, 64.71 s total
[ 2023-10-07 23:48:55 ] Completed Epoch: 13 batch 395: moving batch data to device 4.977 ms, 64.71 s total
[ 2023-10-07 23:48:55 ] Completed Epoch: 13 batch 395: forward pass 107.839 ms, 64.82 s total
[ 2023-10-07 23:48:55 ] Completed Epoch: 13 batch 395: backward pass 70.974 ms, 64.89 s total
[ 2023-10-07 23:48:55 ] Completed Epoch: 13 batch 395: computing loss 111.584 ms, 65.00 s total
EPOCH: [13], BATCH: [395/889], loss: 0.368, loss_box_reg: 0.112, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 395
[ 2023-10-07 23:48:57 ] Completed saving temp checkpoint 1,782.452 ms, 66.79 s total
[ 2023-10-07 23:48:57 ] Completed replacing temp checkpoint with checkpoint 43.479 ms, 66.83 s total
[ 2023-10-07 23:48:57 ] Completed Epoch: 13 batch 396: moving batch data to device 5.201 ms, 66.83 s total
[ 2023-10-07 23:48:57 ] Completed Epoch: 13 batch 396: forward pass 104.990 ms, 66.94 s total
[ 2023-10-07 23:48:57 ] Completed Epoch: 13 batch 396: backward pass 43.994 ms, 66.98 s total
[ 2023-10-07 23:48:57 ] Completed Epoch: 13 batch 396: computing loss 148.510 ms, 67.13 s total
EPOCH: [13], BATCH: [396/889], loss: 0.400, loss_box_reg: 0.124, loss_classifier: 0.101, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 396
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 00:09:11 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 00:09:11 ] Completed importing Timer 0.019 ms, 0.00 s total
[ 2023-10-08 00:09:12 ] Completed importing everything else 568.003 ms, 0.57 s total
[ 2023-10-08 00:09:12 ] Completed defined other functions 0.021 ms, 0.57 s total
| distributed init (rank 4): env://
| distributed init (rank 5): env://
| distributed init (rank 3): env://
| distributed init (rank 0): env://
| distributed init (rank 2): env://
| distributed init (rank 1): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 00:09:20 ] Completed main preliminaries 7,607.749 ms, 8.18 s total
loading annotations into memory...
Done (t=10.76s)
creating index...
index created!
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
[ 2023-10-08 00:09:32 ] Completed loading data 12,547.218 ms, 20.72 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 00:09:32 ] Completed creating data samplers 98.207 ms, 20.82 s total
[ 2023-10-08 00:09:32 ] Completed creating data loaders 0.194 ms, 20.82 s total
[ 2023-10-08 00:09:33 ] Completed creating model and .to(device) 639.926 ms, 21.46 s total
[ 2023-10-08 00:09:35 ] Completed preparing model for distributed training 1,730.414 ms, 23.19 s total
[ 2023-10-08 00:09:35 ] Completed optimizer and scaler 0.636 ms, 23.19 s total
[ 2023-10-08 00:09:35 ] Completed learning rate schedulers 0.265 ms, 23.19 s total
[ 2023-10-08 00:09:36 ] Completed init coco evaluator 959.684 ms, 24.15 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 00:09:36 ] Completed retrieving checkpoint 841.078 ms, 24.99 s total
EPOCH :: 13
[ 2023-10-08 00:09:36 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 00:09:36 ] Completed training preliminaries 0.875 ms, 0.00 s total
Training / resuming epoch 13 from training step 396
[ 2023-10-08 00:09:37 ] Completed Epoch: 13 batch 396: moving batch data to device 464.246 ms, 0.47 s total
[ 2023-10-08 00:09:38 ] Completed Epoch: 13 batch 396: forward pass 1,334.083 ms, 1.80 s total
[ 2023-10-08 00:09:38 ] Completed Epoch: 13 batch 396: backward pass 171.036 ms, 1.97 s total
[ 2023-10-08 00:09:39 ] Completed Epoch: 13 batch 396: computing loss 182.057 ms, 2.15 s total
EPOCH: [13], BATCH: [396/889], loss: 0.402, loss_box_reg: 0.123, loss_classifier: 0.103, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 396
[ 2023-10-08 00:09:40 ] Completed saving temp checkpoint 968.499 ms, 3.12 s total
[ 2023-10-08 00:09:40 ] Completed replacing temp checkpoint with checkpoint 180.485 ms, 3.30 s total
[ 2023-10-08 00:09:40 ] Completed Epoch: 13 batch 397: moving batch data to device 59.216 ms, 3.36 s total
[ 2023-10-08 00:09:40 ] Completed Epoch: 13 batch 397: forward pass 113.800 ms, 3.47 s total
[ 2023-10-08 00:09:40 ] Completed Epoch: 13 batch 397: backward pass 77.970 ms, 3.55 s total
[ 2023-10-08 00:09:40 ] Completed Epoch: 13 batch 397: computing loss 140.631 ms, 3.69 s total
EPOCH: [13], BATCH: [397/889], loss: 0.382, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.126, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 397
[ 2023-10-08 00:09:41 ] Completed saving temp checkpoint 1,077.484 ms, 4.77 s total
[ 2023-10-08 00:09:41 ] Completed replacing temp checkpoint with checkpoint 34.308 ms, 4.80 s total
[ 2023-10-08 00:09:41 ] Completed Epoch: 13 batch 398: moving batch data to device 3.354 ms, 4.81 s total
[ 2023-10-08 00:09:41 ] Completed Epoch: 13 batch 398: forward pass 112.446 ms, 4.92 s total
[ 2023-10-08 00:09:41 ] Completed Epoch: 13 batch 398: backward pass 84.266 ms, 5.00 s total
[ 2023-10-08 00:09:42 ] Completed Epoch: 13 batch 398: computing loss 133.887 ms, 5.14 s total
EPOCH: [13], BATCH: [398/889], loss: 0.377, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.125, loss_objectness: 0.017, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 398
[ 2023-10-08 00:09:42 ] Completed saving temp checkpoint 967.043 ms, 6.11 s total
[ 2023-10-08 00:09:43 ] Completed replacing temp checkpoint with checkpoint 69.118 ms, 6.17 s total
[ 2023-10-08 00:09:43 ] Completed Epoch: 13 batch 399: moving batch data to device 20.891 ms, 6.20 s total
[ 2023-10-08 00:09:43 ] Completed Epoch: 13 batch 399: forward pass 209.693 ms, 6.41 s total
[ 2023-10-08 00:09:43 ] Completed Epoch: 13 batch 399: backward pass 78.746 ms, 6.48 s total
[ 2023-10-08 00:09:43 ] Completed Epoch: 13 batch 399: computing loss 139.851 ms, 6.62 s total
EPOCH: [13], BATCH: [399/889], loss: 0.356, loss_box_reg: 0.105, loss_classifier: 0.089, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 399
[ 2023-10-08 00:09:44 ] Completed saving temp checkpoint 1,216.887 ms, 7.84 s total
[ 2023-10-08 00:09:44 ] Completed replacing temp checkpoint with checkpoint 62.667 ms, 7.90 s total
[ 2023-10-08 00:09:44 ] Completed Epoch: 13 batch 400: moving batch data to device 5.584 ms, 7.91 s total
[ 2023-10-08 00:09:45 ] Completed Epoch: 13 batch 400: forward pass 223.588 ms, 8.13 s total
[ 2023-10-08 00:09:45 ] Completed Epoch: 13 batch 400: backward pass 36.227 ms, 8.17 s total
[ 2023-10-08 00:09:45 ] Completed Epoch: 13 batch 400: computing loss 166.526 ms, 8.34 s total
EPOCH: [13], BATCH: [400/889], loss: 0.358, loss_box_reg: 0.107, loss_classifier: 0.087, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 400
[ 2023-10-08 00:09:46 ] Completed saving temp checkpoint 953.912 ms, 9.29 s total
[ 2023-10-08 00:09:46 ] Completed replacing temp checkpoint with checkpoint 54.316 ms, 9.34 s total
[ 2023-10-08 00:09:46 ] Completed Epoch: 13 batch 401: moving batch data to device 2.499 ms, 9.35 s total
[ 2023-10-08 00:09:46 ] Completed Epoch: 13 batch 401: forward pass 110.621 ms, 9.46 s total
[ 2023-10-08 00:09:46 ] Completed Epoch: 13 batch 401: backward pass 81.333 ms, 9.54 s total
[ 2023-10-08 00:09:46 ] Completed Epoch: 13 batch 401: computing loss 125.349 ms, 9.66 s total
EPOCH: [13], BATCH: [401/889], loss: 0.396, loss_box_reg: 0.121, loss_classifier: 0.103, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 401
[ 2023-10-08 00:09:48 ] Completed saving temp checkpoint 1,522.659 ms, 11.19 s total
[ 2023-10-08 00:09:48 ] Completed replacing temp checkpoint with checkpoint 83.748 ms, 11.27 s total
[ 2023-10-08 00:09:48 ] Completed Epoch: 13 batch 402: moving batch data to device 3.752 ms, 11.27 s total
[ 2023-10-08 00:09:48 ] Completed Epoch: 13 batch 402: forward pass 108.143 ms, 11.38 s total
[ 2023-10-08 00:09:48 ] Completed Epoch: 13 batch 402: backward pass 97.312 ms, 11.48 s total
[ 2023-10-08 00:09:48 ] Completed Epoch: 13 batch 402: computing loss 115.408 ms, 11.59 s total
EPOCH: [13], BATCH: [402/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.109, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 402
[ 2023-10-08 00:09:49 ] Completed saving temp checkpoint 958.594 ms, 12.55 s total
[ 2023-10-08 00:09:49 ] Completed replacing temp checkpoint with checkpoint 45.184 ms, 12.60 s total
[ 2023-10-08 00:09:49 ] Completed Epoch: 13 batch 403: moving batch data to device 5.178 ms, 12.60 s total
[ 2023-10-08 00:09:49 ] Completed Epoch: 13 batch 403: forward pass 104.928 ms, 12.71 s total
[ 2023-10-08 00:09:49 ] Completed Epoch: 13 batch 403: backward pass 124.811 ms, 12.83 s total
[ 2023-10-08 00:09:49 ] Completed Epoch: 13 batch 403: computing loss 83.152 ms, 12.92 s total
EPOCH: [13], BATCH: [403/889], loss: 0.378, loss_box_reg: 0.115, loss_classifier: 0.100, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 403
[ 2023-10-08 00:09:50 ] Completed saving temp checkpoint 1,071.455 ms, 13.99 s total
[ 2023-10-08 00:09:50 ] Completed replacing temp checkpoint with checkpoint 68.797 ms, 14.06 s total
[ 2023-10-08 00:09:50 ] Completed Epoch: 13 batch 404: moving batch data to device 3.360 ms, 14.06 s total
[ 2023-10-08 00:09:51 ] Completed Epoch: 13 batch 404: forward pass 106.990 ms, 14.17 s total
[ 2023-10-08 00:09:51 ] Completed Epoch: 13 batch 404: backward pass 69.227 ms, 14.24 s total
[ 2023-10-08 00:09:51 ] Completed Epoch: 13 batch 404: computing loss 124.393 ms, 14.36 s total
EPOCH: [13], BATCH: [404/889], loss: 0.400, loss_box_reg: 0.121, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 404
[ 2023-10-08 00:09:52 ] Completed saving temp checkpoint 984.532 ms, 15.35 s total
[ 2023-10-08 00:09:52 ] Completed replacing temp checkpoint with checkpoint 71.505 ms, 15.42 s total
[ 2023-10-08 00:09:52 ] Completed Epoch: 13 batch 405: moving batch data to device 9.926 ms, 15.43 s total
[ 2023-10-08 00:09:52 ] Completed Epoch: 13 batch 405: forward pass 105.710 ms, 15.53 s total
[ 2023-10-08 00:09:52 ] Completed Epoch: 13 batch 405: backward pass 40.474 ms, 15.57 s total
[ 2023-10-08 00:09:52 ] Completed Epoch: 13 batch 405: computing loss 152.714 ms, 15.73 s total
EPOCH: [13], BATCH: [405/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 405
[ 2023-10-08 00:09:53 ] Completed saving temp checkpoint 1,131.871 ms, 16.86 s total
[ 2023-10-08 00:09:53 ] Completed replacing temp checkpoint with checkpoint 66.503 ms, 16.92 s total
[ 2023-10-08 00:09:53 ] Completed Epoch: 13 batch 406: moving batch data to device 9.081 ms, 16.93 s total
[ 2023-10-08 00:09:53 ] Completed Epoch: 13 batch 406: forward pass 107.662 ms, 17.04 s total
[ 2023-10-08 00:09:54 ] Completed Epoch: 13 batch 406: backward pass 83.092 ms, 17.12 s total
[ 2023-10-08 00:09:54 ] Completed Epoch: 13 batch 406: computing loss 117.158 ms, 17.24 s total
EPOCH: [13], BATCH: [406/889], loss: 0.427, loss_box_reg: 0.132, loss_classifier: 0.112, loss_mask: 0.129, loss_objectness: 0.021, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 406
[ 2023-10-08 00:09:55 ] Completed saving temp checkpoint 998.686 ms, 18.24 s total
[ 2023-10-08 00:09:55 ] Completed replacing temp checkpoint with checkpoint 69.365 ms, 18.31 s total
[ 2023-10-08 00:09:55 ] Completed Epoch: 13 batch 407: moving batch data to device 9.616 ms, 18.32 s total
[ 2023-10-08 00:09:55 ] Completed Epoch: 13 batch 407: forward pass 107.858 ms, 18.43 s total
[ 2023-10-08 00:09:55 ] Completed Epoch: 13 batch 407: backward pass 39.585 ms, 18.47 s total
[ 2023-10-08 00:09:55 ] Completed Epoch: 13 batch 407: computing loss 155.974 ms, 18.62 s total
EPOCH: [13], BATCH: [407/889], loss: 0.346, loss_box_reg: 0.100, loss_classifier: 0.082, loss_mask: 0.124, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 407
[ 2023-10-08 00:09:56 ] Completed saving temp checkpoint 1,156.981 ms, 19.78 s total
[ 2023-10-08 00:09:56 ] Completed replacing temp checkpoint with checkpoint 59.010 ms, 19.84 s total
[ 2023-10-08 00:09:56 ] Completed Epoch: 13 batch 408: moving batch data to device 8.004 ms, 19.85 s total
[ 2023-10-08 00:09:56 ] Completed Epoch: 13 batch 408: forward pass 105.285 ms, 19.95 s total
[ 2023-10-08 00:09:56 ] Completed Epoch: 13 batch 408: backward pass 72.263 ms, 20.02 s total
[ 2023-10-08 00:09:57 ] Completed Epoch: 13 batch 408: computing loss 123.374 ms, 20.15 s total
EPOCH: [13], BATCH: [408/889], loss: 0.358, loss_box_reg: 0.102, loss_classifier: 0.089, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 408
[ 2023-10-08 00:09:58 ] Completed saving temp checkpoint 993.402 ms, 21.14 s total
[ 2023-10-08 00:09:58 ] Completed replacing temp checkpoint with checkpoint 71.679 ms, 21.21 s total
[ 2023-10-08 00:09:58 ] Completed Epoch: 13 batch 409: moving batch data to device 5.925 ms, 21.22 s total
[ 2023-10-08 00:09:58 ] Completed Epoch: 13 batch 409: forward pass 106.876 ms, 21.32 s total
[ 2023-10-08 00:09:58 ] Completed Epoch: 13 batch 409: backward pass 31.971 ms, 21.36 s total
[ 2023-10-08 00:09:58 ] Completed Epoch: 13 batch 409: computing loss 164.053 ms, 21.52 s total
EPOCH: [13], BATCH: [409/889], loss: 0.410, loss_box_reg: 0.122, loss_classifier: 0.102, loss_mask: 0.139, loss_objectness: 0.016, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 409
[ 2023-10-08 00:09:59 ] Completed saving temp checkpoint 1,539.490 ms, 23.06 s total
[ 2023-10-08 00:10:00 ] Completed replacing temp checkpoint with checkpoint 76.489 ms, 23.14 s total
[ 2023-10-08 00:10:00 ] Completed Epoch: 13 batch 410: moving batch data to device 6.481 ms, 23.14 s total
[ 2023-10-08 00:10:00 ] Completed Epoch: 13 batch 410: forward pass 104.378 ms, 23.25 s total
[ 2023-10-08 00:10:00 ] Completed Epoch: 13 batch 410: backward pass 73.428 ms, 23.32 s total
[ 2023-10-08 00:10:00 ] Completed Epoch: 13 batch 410: computing loss 124.198 ms, 23.45 s total
EPOCH: [13], BATCH: [410/889], loss: 0.354, loss_box_reg: 0.106, loss_classifier: 0.082, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 410
[ 2023-10-08 00:10:02 ] Completed saving temp checkpoint 1,690.232 ms, 25.14 s total
[ 2023-10-08 00:10:02 ] Completed replacing temp checkpoint with checkpoint 79.079 ms, 25.21 s total
[ 2023-10-08 00:10:02 ] Completed Epoch: 13 batch 411: moving batch data to device 7.765 ms, 25.22 s total
[ 2023-10-08 00:10:02 ] Completed Epoch: 13 batch 411: forward pass 103.767 ms, 25.33 s total
[ 2023-10-08 00:10:02 ] Completed Epoch: 13 batch 411: backward pass 72.064 ms, 25.40 s total
[ 2023-10-08 00:10:02 ] Completed Epoch: 13 batch 411: computing loss 124.269 ms, 25.52 s total
EPOCH: [13], BATCH: [411/889], loss: 0.401, loss_box_reg: 0.123, loss_classifier: 0.105, loss_mask: 0.133, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 411
[ 2023-10-08 00:10:03 ] Completed saving temp checkpoint 1,225.109 ms, 26.75 s total
[ 2023-10-08 00:10:03 ] Completed replacing temp checkpoint with checkpoint 42.839 ms, 26.79 s total
[ 2023-10-08 00:10:03 ] Completed Epoch: 13 batch 412: moving batch data to device 5.001 ms, 26.80 s total
[ 2023-10-08 00:10:03 ] Completed Epoch: 13 batch 412: forward pass 105.178 ms, 26.90 s total
[ 2023-10-08 00:10:03 ] Completed Epoch: 13 batch 412: backward pass 71.978 ms, 26.97 s total
[ 2023-10-08 00:10:03 ] Completed Epoch: 13 batch 412: computing loss 120.017 ms, 27.09 s total
EPOCH: [13], BATCH: [412/889], loss: 0.372, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 412
[ 2023-10-08 00:10:05 ] Completed saving temp checkpoint 1,602.358 ms, 28.69 s total
[ 2023-10-08 00:10:05 ] Completed replacing temp checkpoint with checkpoint 105.759 ms, 28.80 s total
[ 2023-10-08 00:10:05 ] Completed Epoch: 13 batch 413: moving batch data to device 8.318 ms, 28.81 s total
[ 2023-10-08 00:10:05 ] Completed Epoch: 13 batch 413: forward pass 106.159 ms, 28.92 s total
[ 2023-10-08 00:10:05 ] Completed Epoch: 13 batch 413: backward pass 46.327 ms, 28.96 s total
[ 2023-10-08 00:10:05 ] Completed Epoch: 13 batch 413: computing loss 148.327 ms, 29.11 s total
EPOCH: [13], BATCH: [413/889], loss: 0.384, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.127, loss_objectness: 0.020, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 413
[ 2023-10-08 00:10:07 ] Completed saving temp checkpoint 1,715.622 ms, 30.83 s total
[ 2023-10-08 00:10:07 ] Completed replacing temp checkpoint with checkpoint 65.504 ms, 30.89 s total
[ 2023-10-08 00:10:07 ] Completed Epoch: 13 batch 414: moving batch data to device 7.711 ms, 30.90 s total
[ 2023-10-08 00:10:07 ] Completed Epoch: 13 batch 414: forward pass 109.577 ms, 31.01 s total
[ 2023-10-08 00:10:07 ] Completed Epoch: 13 batch 414: backward pass 80.269 ms, 31.09 s total
[ 2023-10-08 00:10:08 ] Completed Epoch: 13 batch 414: computing loss 112.593 ms, 31.20 s total
EPOCH: [13], BATCH: [414/889], loss: 0.345, loss_box_reg: 0.101, loss_classifier: 0.083, loss_mask: 0.123, loss_objectness: 0.017, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 414
[ 2023-10-08 00:10:09 ] Completed saving temp checkpoint 996.668 ms, 32.20 s total
[ 2023-10-08 00:10:09 ] Completed replacing temp checkpoint with checkpoint 69.654 ms, 32.27 s total
[ 2023-10-08 00:10:09 ] Completed Epoch: 13 batch 415: moving batch data to device 7.258 ms, 32.27 s total
[ 2023-10-08 00:10:09 ] Completed Epoch: 13 batch 415: forward pass 107.855 ms, 32.38 s total
[ 2023-10-08 00:10:09 ] Completed Epoch: 13 batch 415: backward pass 78.175 ms, 32.46 s total
[ 2023-10-08 00:10:09 ] Completed Epoch: 13 batch 415: computing loss 113.852 ms, 32.57 s total
EPOCH: [13], BATCH: [415/889], loss: 0.354, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 415
[ 2023-10-08 00:10:10 ] Completed saving temp checkpoint 988.153 ms, 33.56 s total
[ 2023-10-08 00:10:10 ] Completed replacing temp checkpoint with checkpoint 70.198 ms, 33.63 s total
[ 2023-10-08 00:10:10 ] Completed Epoch: 13 batch 416: moving batch data to device 8.468 ms, 33.64 s total
[ 2023-10-08 00:10:10 ] Completed Epoch: 13 batch 416: forward pass 106.398 ms, 33.75 s total
[ 2023-10-08 00:10:10 ] Completed Epoch: 13 batch 416: backward pass 42.163 ms, 33.79 s total
[ 2023-10-08 00:10:10 ] Completed Epoch: 13 batch 416: computing loss 140.533 ms, 33.93 s total
EPOCH: [13], BATCH: [416/889], loss: 0.375, loss_box_reg: 0.116, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 416
[ 2023-10-08 00:10:11 ] Completed saving temp checkpoint 959.335 ms, 34.89 s total
[ 2023-10-08 00:10:11 ] Completed replacing temp checkpoint with checkpoint 51.264 ms, 34.94 s total
[ 2023-10-08 00:10:11 ] Completed Epoch: 13 batch 417: moving batch data to device 8.485 ms, 34.95 s total
[ 2023-10-08 00:10:11 ] Completed Epoch: 13 batch 417: forward pass 108.838 ms, 35.06 s total
[ 2023-10-08 00:10:12 ] Completed Epoch: 13 batch 417: backward pass 76.837 ms, 35.14 s total
[ 2023-10-08 00:10:12 ] Completed Epoch: 13 batch 417: computing loss 119.379 ms, 35.25 s total
EPOCH: [13], BATCH: [417/889], loss: 0.362, loss_box_reg: 0.110, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 417
[ 2023-10-08 00:10:13 ] Completed saving temp checkpoint 1,181.724 ms, 36.44 s total
[ 2023-10-08 00:10:13 ] Completed replacing temp checkpoint with checkpoint 78.737 ms, 36.51 s total
[ 2023-10-08 00:10:13 ] Completed Epoch: 13 batch 418: moving batch data to device 7.097 ms, 36.52 s total
[ 2023-10-08 00:10:13 ] Completed Epoch: 13 batch 418: forward pass 107.082 ms, 36.63 s total
[ 2023-10-08 00:10:13 ] Completed Epoch: 13 batch 418: backward pass 75.453 ms, 36.70 s total
[ 2023-10-08 00:10:13 ] Completed Epoch: 13 batch 418: computing loss 117.590 ms, 36.82 s total
EPOCH: [13], BATCH: [418/889], loss: 0.372, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 418
[ 2023-10-08 00:10:14 ] Completed saving temp checkpoint 972.739 ms, 37.79 s total
[ 2023-10-08 00:10:14 ] Completed replacing temp checkpoint with checkpoint 26.786 ms, 37.82 s total
[ 2023-10-08 00:10:14 ] Completed Epoch: 13 batch 419: moving batch data to device 5.971 ms, 37.83 s total
[ 2023-10-08 00:10:14 ] Completed Epoch: 13 batch 419: forward pass 104.280 ms, 37.93 s total
[ 2023-10-08 00:10:14 ] Completed Epoch: 13 batch 419: backward pass 78.307 ms, 38.01 s total
[ 2023-10-08 00:10:15 ] Completed Epoch: 13 batch 419: computing loss 106.303 ms, 38.12 s total
EPOCH: [13], BATCH: [419/889], loss: 0.348, loss_box_reg: 0.104, loss_classifier: 0.082, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 419
[ 2023-10-08 00:10:16 ] Completed saving temp checkpoint 1,090.195 ms, 39.21 s total
[ 2023-10-08 00:10:16 ] Completed replacing temp checkpoint with checkpoint 73.979 ms, 39.28 s total
[ 2023-10-08 00:10:16 ] Completed Epoch: 13 batch 420: moving batch data to device 8.612 ms, 39.29 s total
[ 2023-10-08 00:10:16 ] Completed Epoch: 13 batch 420: forward pass 105.264 ms, 39.39 s total
[ 2023-10-08 00:10:16 ] Completed Epoch: 13 batch 420: backward pass 73.417 ms, 39.47 s total
[ 2023-10-08 00:10:16 ] Completed Epoch: 13 batch 420: computing loss 122.085 ms, 39.59 s total
EPOCH: [13], BATCH: [420/889], loss: 0.380, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 420
[ 2023-10-08 00:10:17 ] Completed saving temp checkpoint 1,051.361 ms, 40.64 s total
[ 2023-10-08 00:10:17 ] Completed replacing temp checkpoint with checkpoint 79.444 ms, 40.72 s total
[ 2023-10-08 00:10:17 ] Completed Epoch: 13 batch 421: moving batch data to device 6.457 ms, 40.73 s total
[ 2023-10-08 00:10:17 ] Completed Epoch: 13 batch 421: forward pass 107.342 ms, 40.83 s total
[ 2023-10-08 00:10:17 ] Completed Epoch: 13 batch 421: backward pass 35.905 ms, 40.87 s total
[ 2023-10-08 00:10:17 ] Completed Epoch: 13 batch 421: computing loss 134.440 ms, 41.01 s total
EPOCH: [13], BATCH: [421/889], loss: 0.370, loss_box_reg: 0.104, loss_classifier: 0.091, loss_mask: 0.126, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 421
[ 2023-10-08 00:10:19 ] Completed saving temp checkpoint 1,425.464 ms, 42.43 s total
[ 2023-10-08 00:10:19 ] Completed replacing temp checkpoint with checkpoint 68.341 ms, 42.50 s total
[ 2023-10-08 00:10:19 ] Completed Epoch: 13 batch 422: moving batch data to device 8.662 ms, 42.51 s total
[ 2023-10-08 00:10:19 ] Completed Epoch: 13 batch 422: forward pass 110.967 ms, 42.62 s total
[ 2023-10-08 00:10:19 ] Completed Epoch: 13 batch 422: backward pass 69.929 ms, 42.69 s total
[ 2023-10-08 00:10:19 ] Completed Epoch: 13 batch 422: computing loss 129.456 ms, 42.82 s total
EPOCH: [13], BATCH: [422/889], loss: 0.383, loss_box_reg: 0.122, loss_classifier: 0.090, loss_mask: 0.132, loss_objectness: 0.013, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 422
[ 2023-10-08 00:10:21 ] Completed saving temp checkpoint 1,396.568 ms, 44.21 s total
[ 2023-10-08 00:10:21 ] Completed replacing temp checkpoint with checkpoint 80.047 ms, 44.29 s total
[ 2023-10-08 00:10:21 ] Completed Epoch: 13 batch 423: moving batch data to device 6.923 ms, 44.30 s total
[ 2023-10-08 00:10:21 ] Completed Epoch: 13 batch 423: forward pass 110.219 ms, 44.41 s total
[ 2023-10-08 00:10:21 ] Completed Epoch: 13 batch 423: backward pass 76.882 ms, 44.49 s total
[ 2023-10-08 00:10:21 ] Completed Epoch: 13 batch 423: computing loss 112.401 ms, 44.60 s total
EPOCH: [13], BATCH: [423/889], loss: 0.377, loss_box_reg: 0.110, loss_classifier: 0.093, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 423
[ 2023-10-08 00:10:22 ] Completed saving temp checkpoint 1,252.446 ms, 45.85 s total
[ 2023-10-08 00:10:22 ] Completed replacing temp checkpoint with checkpoint 50.373 ms, 45.90 s total
[ 2023-10-08 00:10:22 ] Completed Epoch: 13 batch 424: moving batch data to device 7.364 ms, 45.91 s total
[ 2023-10-08 00:10:22 ] Completed Epoch: 13 batch 424: forward pass 109.785 ms, 46.02 s total
[ 2023-10-08 00:10:22 ] Completed Epoch: 13 batch 424: backward pass 72.388 ms, 46.09 s total
[ 2023-10-08 00:10:23 ] Completed Epoch: 13 batch 424: computing loss 102.446 ms, 46.20 s total
EPOCH: [13], BATCH: [424/889], loss: 0.389, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 424
[ 2023-10-08 00:10:24 ] Completed saving temp checkpoint 1,183.574 ms, 47.38 s total
[ 2023-10-08 00:10:24 ] Completed replacing temp checkpoint with checkpoint 89.904 ms, 47.47 s total
[ 2023-10-08 00:10:24 ] Completed Epoch: 13 batch 425: moving batch data to device 7.415 ms, 47.48 s total
[ 2023-10-08 00:10:24 ] Completed Epoch: 13 batch 425: forward pass 105.981 ms, 47.58 s total
[ 2023-10-08 00:10:24 ] Completed Epoch: 13 batch 425: backward pass 59.368 ms, 47.64 s total
[ 2023-10-08 00:10:24 ] Completed Epoch: 13 batch 425: computing loss 134.380 ms, 47.78 s total
EPOCH: [13], BATCH: [425/889], loss: 0.377, loss_box_reg: 0.116, loss_classifier: 0.095, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 425
[ 2023-10-08 00:10:26 ] Completed saving temp checkpoint 1,943.491 ms, 49.72 s total
[ 2023-10-08 00:10:26 ] Completed replacing temp checkpoint with checkpoint 108.407 ms, 49.83 s total
[ 2023-10-08 00:10:26 ] Completed Epoch: 13 batch 426: moving batch data to device 9.293 ms, 49.84 s total
[ 2023-10-08 00:10:26 ] Completed Epoch: 13 batch 426: forward pass 101.954 ms, 49.94 s total
[ 2023-10-08 00:10:26 ] Completed Epoch: 13 batch 426: backward pass 41.970 ms, 49.98 s total
[ 2023-10-08 00:10:27 ] Completed Epoch: 13 batch 426: computing loss 142.402 ms, 50.12 s total
EPOCH: [13], BATCH: [426/889], loss: 0.382, loss_box_reg: 0.115, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 426
[ 2023-10-08 00:10:28 ] Completed saving temp checkpoint 1,306.116 ms, 51.43 s total
[ 2023-10-08 00:10:28 ] Completed replacing temp checkpoint with checkpoint 88.309 ms, 51.52 s total
[ 2023-10-08 00:10:28 ] Completed Epoch: 13 batch 427: moving batch data to device 7.107 ms, 51.53 s total
[ 2023-10-08 00:10:28 ] Completed Epoch: 13 batch 427: forward pass 104.430 ms, 51.63 s total
[ 2023-10-08 00:10:28 ] Completed Epoch: 13 batch 427: backward pass 33.783 ms, 51.66 s total
[ 2023-10-08 00:10:28 ] Completed Epoch: 13 batch 427: computing loss 154.129 ms, 51.82 s total
EPOCH: [13], BATCH: [427/889], loss: 0.365, loss_box_reg: 0.111, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 427
[ 2023-10-08 00:10:29 ] Completed saving temp checkpoint 1,201.799 ms, 53.02 s total
[ 2023-10-08 00:10:29 ] Completed replacing temp checkpoint with checkpoint 63.996 ms, 53.08 s total
[ 2023-10-08 00:10:29 ] Completed Epoch: 13 batch 428: moving batch data to device 8.677 ms, 53.09 s total
[ 2023-10-08 00:10:30 ] Completed Epoch: 13 batch 428: forward pass 104.601 ms, 53.20 s total
[ 2023-10-08 00:10:30 ] Completed Epoch: 13 batch 428: backward pass 71.831 ms, 53.27 s total
[ 2023-10-08 00:10:30 ] Completed Epoch: 13 batch 428: computing loss 124.018 ms, 53.39 s total
EPOCH: [13], BATCH: [428/889], loss: 0.348, loss_box_reg: 0.107, loss_classifier: 0.082, loss_mask: 0.127, loss_objectness: 0.012, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 428
[ 2023-10-08 00:10:31 ] Completed saving temp checkpoint 954.882 ms, 54.35 s total
[ 2023-10-08 00:10:31 ] Completed replacing temp checkpoint with checkpoint 75.514 ms, 54.42 s total
[ 2023-10-08 00:10:31 ] Completed Epoch: 13 batch 429: moving batch data to device 7.001 ms, 54.43 s total
[ 2023-10-08 00:10:31 ] Completed Epoch: 13 batch 429: forward pass 105.870 ms, 54.54 s total
[ 2023-10-08 00:10:31 ] Completed Epoch: 13 batch 429: backward pass 74.038 ms, 54.61 s total
[ 2023-10-08 00:10:31 ] Completed Epoch: 13 batch 429: computing loss 120.416 ms, 54.73 s total
EPOCH: [13], BATCH: [429/889], loss: 0.395, loss_box_reg: 0.122, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 429
[ 2023-10-08 00:10:32 ] Completed saving temp checkpoint 945.189 ms, 55.68 s total
[ 2023-10-08 00:10:32 ] Completed replacing temp checkpoint with checkpoint 72.102 ms, 55.75 s total
[ 2023-10-08 00:10:32 ] Completed Epoch: 13 batch 430: moving batch data to device 8.776 ms, 55.76 s total
[ 2023-10-08 00:10:32 ] Completed Epoch: 13 batch 430: forward pass 101.526 ms, 55.86 s total
[ 2023-10-08 00:10:32 ] Completed Epoch: 13 batch 430: backward pass 76.065 ms, 55.93 s total
[ 2023-10-08 00:10:32 ] Completed Epoch: 13 batch 430: computing loss 95.837 ms, 56.03 s total
EPOCH: [13], BATCH: [430/889], loss: 0.400, loss_box_reg: 0.115, loss_classifier: 0.109, loss_mask: 0.126, loss_objectness: 0.017, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 430
[ 2023-10-08 00:10:33 ] Completed saving temp checkpoint 951.885 ms, 56.98 s total
[ 2023-10-08 00:10:33 ] Completed replacing temp checkpoint with checkpoint 63.036 ms, 57.04 s total
[ 2023-10-08 00:10:33 ] Completed Epoch: 13 batch 431: moving batch data to device 7.279 ms, 57.05 s total
[ 2023-10-08 00:10:34 ] Completed Epoch: 13 batch 431: forward pass 108.168 ms, 57.16 s total
[ 2023-10-08 00:10:34 ] Completed Epoch: 13 batch 431: backward pass 71.093 ms, 57.23 s total
[ 2023-10-08 00:10:34 ] Completed Epoch: 13 batch 431: computing loss 123.645 ms, 57.36 s total
EPOCH: [13], BATCH: [431/889], loss: 0.412, loss_box_reg: 0.129, loss_classifier: 0.103, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 431
[ 2023-10-08 00:10:35 ] Completed saving temp checkpoint 1,067.476 ms, 58.42 s total
[ 2023-10-08 00:10:35 ] Completed replacing temp checkpoint with checkpoint 69.122 ms, 58.49 s total
[ 2023-10-08 00:10:35 ] Completed Epoch: 13 batch 432: moving batch data to device 5.125 ms, 58.50 s total
[ 2023-10-08 00:10:35 ] Completed Epoch: 13 batch 432: forward pass 110.949 ms, 58.61 s total
[ 2023-10-08 00:10:35 ] Completed Epoch: 13 batch 432: backward pass 79.536 ms, 58.69 s total
[ 2023-10-08 00:10:35 ] Completed Epoch: 13 batch 432: computing loss 119.756 ms, 58.81 s total
EPOCH: [13], BATCH: [432/889], loss: 0.371, loss_box_reg: 0.107, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 432
[ 2023-10-08 00:10:37 ] Completed saving temp checkpoint 1,321.100 ms, 60.13 s total
[ 2023-10-08 00:10:37 ] Completed replacing temp checkpoint with checkpoint 68.820 ms, 60.20 s total
[ 2023-10-08 00:10:37 ] Completed Epoch: 13 batch 433: moving batch data to device 8.519 ms, 60.21 s total
[ 2023-10-08 00:10:37 ] Completed Epoch: 13 batch 433: forward pass 106.738 ms, 60.31 s total
[ 2023-10-08 00:10:37 ] Completed Epoch: 13 batch 433: backward pass 49.450 ms, 60.36 s total
[ 2023-10-08 00:10:37 ] Completed Epoch: 13 batch 433: computing loss 124.465 ms, 60.49 s total
EPOCH: [13], BATCH: [433/889], loss: 0.404, loss_box_reg: 0.117, loss_classifier: 0.104, loss_mask: 0.139, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 433
[ 2023-10-08 00:10:38 ] Completed saving temp checkpoint 1,624.244 ms, 62.11 s total
[ 2023-10-08 00:10:39 ] Completed replacing temp checkpoint with checkpoint 81.543 ms, 62.19 s total
[ 2023-10-08 00:10:39 ] Completed Epoch: 13 batch 434: moving batch data to device 9.682 ms, 62.20 s total
[ 2023-10-08 00:10:39 ] Completed Epoch: 13 batch 434: forward pass 109.819 ms, 62.31 s total
[ 2023-10-08 00:10:39 ] Completed Epoch: 13 batch 434: backward pass 79.493 ms, 62.39 s total
[ 2023-10-08 00:10:39 ] Completed Epoch: 13 batch 434: computing loss 118.891 ms, 62.51 s total
EPOCH: [13], BATCH: [434/889], loss: 0.408, loss_box_reg: 0.128, loss_classifier: 0.104, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 434
[ 2023-10-08 00:10:40 ] Completed saving temp checkpoint 1,449.955 ms, 63.96 s total
[ 2023-10-08 00:10:40 ] Completed replacing temp checkpoint with checkpoint 65.405 ms, 64.03 s total
[ 2023-10-08 00:10:40 ] Completed Epoch: 13 batch 435: moving batch data to device 6.607 ms, 64.03 s total
[ 2023-10-08 00:10:41 ] Completed Epoch: 13 batch 435: forward pass 109.764 ms, 64.14 s total
[ 2023-10-08 00:10:41 ] Completed Epoch: 13 batch 435: backward pass 71.658 ms, 64.21 s total
[ 2023-10-08 00:10:41 ] Completed Epoch: 13 batch 435: computing loss 120.854 ms, 64.33 s total
EPOCH: [13], BATCH: [435/889], loss: 0.385, loss_box_reg: 0.115, loss_classifier: 0.092, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 435
[ 2023-10-08 00:10:42 ] Completed saving temp checkpoint 1,630.021 ms, 65.96 s total
[ 2023-10-08 00:10:42 ] Completed replacing temp checkpoint with checkpoint 58.745 ms, 66.02 s total
[ 2023-10-08 00:10:42 ] Completed Epoch: 13 batch 436: moving batch data to device 6.784 ms, 66.03 s total
[ 2023-10-08 00:10:43 ] Completed Epoch: 13 batch 436: forward pass 101.636 ms, 66.13 s total
[ 2023-10-08 00:10:43 ] Completed Epoch: 13 batch 436: backward pass 79.814 ms, 66.21 s total
[ 2023-10-08 00:10:43 ] Completed Epoch: 13 batch 436: computing loss 114.576 ms, 66.33 s total
EPOCH: [13], BATCH: [436/889], loss: 0.360, loss_box_reg: 0.106, loss_classifier: 0.091, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 436
[ 2023-10-08 00:10:44 ] Completed saving temp checkpoint 1,023.315 ms, 67.35 s total
[ 2023-10-08 00:10:44 ] Completed replacing temp checkpoint with checkpoint 56.390 ms, 67.41 s total
[ 2023-10-08 00:10:44 ] Completed Epoch: 13 batch 437: moving batch data to device 4.311 ms, 67.41 s total
[ 2023-10-08 00:10:44 ] Completed Epoch: 13 batch 437: forward pass 106.662 ms, 67.52 s total
[ 2023-10-08 00:10:44 ] Completed Epoch: 13 batch 437: backward pass 34.966 ms, 67.55 s total
[ 2023-10-08 00:10:44 ] Completed Epoch: 13 batch 437: computing loss 131.644 ms, 67.68 s total
EPOCH: [13], BATCH: [437/889], loss: 0.392, loss_box_reg: 0.117, loss_classifier: 0.095, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 437
[ 2023-10-08 00:10:45 ] Completed saving temp checkpoint 1,135.008 ms, 68.82 s total
[ 2023-10-08 00:10:45 ] Completed replacing temp checkpoint with checkpoint 61.065 ms, 68.88 s total
[ 2023-10-08 00:10:45 ] Completed Epoch: 13 batch 438: moving batch data to device 7.632 ms, 68.89 s total
[ 2023-10-08 00:10:45 ] Completed Epoch: 13 batch 438: forward pass 107.459 ms, 68.99 s total
[ 2023-10-08 00:10:45 ] Completed Epoch: 13 batch 438: backward pass 71.343 ms, 69.07 s total
[ 2023-10-08 00:10:46 ] Completed Epoch: 13 batch 438: computing loss 105.443 ms, 69.17 s total
EPOCH: [13], BATCH: [438/889], loss: 0.398, loss_box_reg: 0.126, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 438
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 00:26:47 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 00:26:47 ] Completed importing Timer 0.026 ms, 0.00 s total
[ 2023-10-08 00:26:48 ] Completed importing everything else 561.263 ms, 0.56 s total
[ 2023-10-08 00:26:48 ] Completed defined other functions 0.024 ms, 0.56 s total
| distributed init (rank 1): env://
| distributed init (rank 4): env://
| distributed init (rank 5): env://
| distributed init (rank 2): env://
| distributed init (rank 0): env://
| distributed init (rank 3): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 00:26:51 ] Completed main preliminaries 3,091.399 ms, 3.65 s total
loading annotations into memory...
Done (t=10.26s)
creating index...
index created!
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
[ 2023-10-08 00:27:03 ] Completed loading data 11,979.255 ms, 15.63 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 00:27:03 ] Completed creating data samplers 93.937 ms, 15.73 s total
[ 2023-10-08 00:27:03 ] Completed creating data loaders 0.194 ms, 15.73 s total
[ 2023-10-08 00:27:03 ] Completed creating model and .to(device) 647.964 ms, 16.37 s total
[ 2023-10-08 00:27:06 ] Completed preparing model for distributed training 2,742.250 ms, 19.12 s total
[ 2023-10-08 00:27:06 ] Completed optimizer and scaler 0.568 ms, 19.12 s total
[ 2023-10-08 00:27:06 ] Completed learning rate schedulers 0.206 ms, 19.12 s total
[ 2023-10-08 00:27:07 ] Completed init coco evaluator 954.332 ms, 20.07 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 00:27:08 ] Completed retrieving checkpoint 897.476 ms, 20.97 s total
EPOCH :: 13
[ 2023-10-08 00:27:08 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 00:27:08 ] Completed training preliminaries 0.929 ms, 0.00 s total
Training / resuming epoch 13 from training step 438
[ 2023-10-08 00:27:09 ] Completed Epoch: 13 batch 438: moving batch data to device 466.706 ms, 0.47 s total
[ 2023-10-08 00:27:10 ] Completed Epoch: 13 batch 438: forward pass 1,012.193 ms, 1.48 s total
[ 2023-10-08 00:27:10 ] Completed Epoch: 13 batch 438: backward pass 185.818 ms, 1.67 s total
[ 2023-10-08 00:27:10 ] Completed Epoch: 13 batch 438: computing loss 472.224 ms, 2.14 s total
EPOCH: [13], BATCH: [438/889], loss: 0.404, loss_box_reg: 0.129, loss_classifier: 0.102, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 438
[ 2023-10-08 00:27:11 ] Completed saving temp checkpoint 1,046.752 ms, 3.18 s total
[ 2023-10-08 00:27:11 ] Completed replacing temp checkpoint with checkpoint 146.926 ms, 3.33 s total
[ 2023-10-08 00:27:11 ] Completed Epoch: 13 batch 439: moving batch data to device 2.879 ms, 3.33 s total
[ 2023-10-08 00:27:11 ] Completed Epoch: 13 batch 439: forward pass 108.444 ms, 3.44 s total
[ 2023-10-08 00:27:12 ] Completed Epoch: 13 batch 439: backward pass 81.614 ms, 3.52 s total
[ 2023-10-08 00:27:12 ] Completed Epoch: 13 batch 439: computing loss 138.446 ms, 3.66 s total
EPOCH: [13], BATCH: [439/889], loss: 0.377, loss_box_reg: 0.115, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 439
[ 2023-10-08 00:27:13 ] Completed saving temp checkpoint 930.346 ms, 4.59 s total
[ 2023-10-08 00:27:13 ] Completed replacing temp checkpoint with checkpoint 54.635 ms, 4.65 s total
[ 2023-10-08 00:27:13 ] Completed Epoch: 13 batch 440: moving batch data to device 4.822 ms, 4.65 s total
[ 2023-10-08 00:27:13 ] Completed Epoch: 13 batch 440: forward pass 110.622 ms, 4.76 s total
[ 2023-10-08 00:27:13 ] Completed Epoch: 13 batch 440: backward pass 89.442 ms, 4.85 s total
[ 2023-10-08 00:27:13 ] Completed Epoch: 13 batch 440: computing loss 127.149 ms, 4.98 s total
EPOCH: [13], BATCH: [440/889], loss: 0.368, loss_box_reg: 0.109, loss_classifier: 0.097, loss_mask: 0.121, loss_objectness: 0.017, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 440
[ 2023-10-08 00:27:14 ] Completed saving temp checkpoint 1,006.189 ms, 5.99 s total
[ 2023-10-08 00:27:14 ] Completed replacing temp checkpoint with checkpoint 62.406 ms, 6.05 s total
[ 2023-10-08 00:27:14 ] Completed Epoch: 13 batch 441: moving batch data to device 60.081 ms, 6.11 s total
[ 2023-10-08 00:27:14 ] Completed Epoch: 13 batch 441: forward pass 106.278 ms, 6.21 s total
[ 2023-10-08 00:27:14 ] Completed Epoch: 13 batch 441: backward pass 39.206 ms, 6.25 s total
[ 2023-10-08 00:27:14 ] Completed Epoch: 13 batch 441: computing loss 170.487 ms, 6.42 s total
EPOCH: [13], BATCH: [441/889], loss: 0.381, loss_box_reg: 0.113, loss_classifier: 0.092, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.035
Saving checkpoint at epoch 13 train batch 441
[ 2023-10-08 00:27:15 ] Completed saving temp checkpoint 983.290 ms, 7.41 s total
[ 2023-10-08 00:27:16 ] Completed replacing temp checkpoint with checkpoint 67.286 ms, 7.48 s total
[ 2023-10-08 00:27:16 ] Completed Epoch: 13 batch 442: moving batch data to device 4.200 ms, 7.48 s total
[ 2023-10-08 00:27:16 ] Completed Epoch: 13 batch 442: forward pass 111.924 ms, 7.59 s total
[ 2023-10-08 00:27:16 ] Completed Epoch: 13 batch 442: backward pass 80.811 ms, 7.67 s total
[ 2023-10-08 00:27:16 ] Completed Epoch: 13 batch 442: computing loss 128.474 ms, 7.80 s total
EPOCH: [13], BATCH: [442/889], loss: 0.407, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 442
[ 2023-10-08 00:27:17 ] Completed saving temp checkpoint 1,034.166 ms, 8.83 s total
[ 2023-10-08 00:27:17 ] Completed replacing temp checkpoint with checkpoint 71.574 ms, 8.91 s total
[ 2023-10-08 00:27:17 ] Completed Epoch: 13 batch 443: moving batch data to device 3.767 ms, 8.91 s total
[ 2023-10-08 00:27:17 ] Completed Epoch: 13 batch 443: forward pass 107.761 ms, 9.02 s total
[ 2023-10-08 00:27:17 ] Completed Epoch: 13 batch 443: backward pass 73.438 ms, 9.09 s total
[ 2023-10-08 00:27:17 ] Completed Epoch: 13 batch 443: computing loss 138.865 ms, 9.23 s total
EPOCH: [13], BATCH: [443/889], loss: 0.396, loss_box_reg: 0.119, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 443
[ 2023-10-08 00:27:18 ] Completed saving temp checkpoint 908.331 ms, 10.14 s total
[ 2023-10-08 00:27:18 ] Completed replacing temp checkpoint with checkpoint 45.678 ms, 10.18 s total
[ 2023-10-08 00:27:18 ] Completed Epoch: 13 batch 444: moving batch data to device 3.848 ms, 10.19 s total
[ 2023-10-08 00:27:18 ] Completed Epoch: 13 batch 444: forward pass 185.089 ms, 10.37 s total
[ 2023-10-08 00:27:18 ] Completed Epoch: 13 batch 444: backward pass 54.344 ms, 10.43 s total
[ 2023-10-08 00:27:19 ] Completed Epoch: 13 batch 444: computing loss 251.128 ms, 10.68 s total
EPOCH: [13], BATCH: [444/889], loss: 0.410, loss_box_reg: 0.126, loss_classifier: 0.103, loss_mask: 0.135, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 444
[ 2023-10-08 00:27:20 ] Completed saving temp checkpoint 1,024.248 ms, 11.70 s total
[ 2023-10-08 00:27:20 ] Completed replacing temp checkpoint with checkpoint 72.850 ms, 11.78 s total
[ 2023-10-08 00:27:20 ] Completed Epoch: 13 batch 445: moving batch data to device 7.246 ms, 11.78 s total
[ 2023-10-08 00:27:20 ] Completed Epoch: 13 batch 445: forward pass 108.208 ms, 11.89 s total
[ 2023-10-08 00:27:20 ] Completed Epoch: 13 batch 445: backward pass 74.781 ms, 11.97 s total
[ 2023-10-08 00:27:20 ] Completed Epoch: 13 batch 445: computing loss 117.956 ms, 12.08 s total
EPOCH: [13], BATCH: [445/889], loss: 0.382, loss_box_reg: 0.110, loss_classifier: 0.096, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 445
[ 2023-10-08 00:27:21 ] Completed saving temp checkpoint 794.574 ms, 12.88 s total
[ 2023-10-08 00:27:21 ] Completed replacing temp checkpoint with checkpoint 60.623 ms, 12.94 s total
[ 2023-10-08 00:27:21 ] Completed Epoch: 13 batch 446: moving batch data to device 4.231 ms, 12.94 s total
[ 2023-10-08 00:27:21 ] Completed Epoch: 13 batch 446: forward pass 114.677 ms, 13.06 s total
[ 2023-10-08 00:27:21 ] Completed Epoch: 13 batch 446: backward pass 74.685 ms, 13.13 s total
[ 2023-10-08 00:27:21 ] Completed Epoch: 13 batch 446: computing loss 120.701 ms, 13.25 s total
EPOCH: [13], BATCH: [446/889], loss: 0.391, loss_box_reg: 0.119, loss_classifier: 0.096, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 446
[ 2023-10-08 00:27:23 ] Completed saving temp checkpoint 1,246.891 ms, 14.50 s total
[ 2023-10-08 00:27:23 ] Completed replacing temp checkpoint with checkpoint 69.354 ms, 14.57 s total
[ 2023-10-08 00:27:23 ] Completed Epoch: 13 batch 447: moving batch data to device 6.937 ms, 14.58 s total
[ 2023-10-08 00:27:23 ] Completed Epoch: 13 batch 447: forward pass 106.765 ms, 14.68 s total
[ 2023-10-08 00:27:23 ] Completed Epoch: 13 batch 447: backward pass 39.750 ms, 14.72 s total
[ 2023-10-08 00:27:23 ] Completed Epoch: 13 batch 447: computing loss 150.955 ms, 14.87 s total
EPOCH: [13], BATCH: [447/889], loss: 0.382, loss_box_reg: 0.109, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 447
[ 2023-10-08 00:27:24 ] Completed saving temp checkpoint 894.944 ms, 15.77 s total
[ 2023-10-08 00:27:24 ] Completed replacing temp checkpoint with checkpoint 56.535 ms, 15.83 s total
[ 2023-10-08 00:27:24 ] Completed Epoch: 13 batch 448: moving batch data to device 4.997 ms, 15.83 s total
[ 2023-10-08 00:27:24 ] Completed Epoch: 13 batch 448: forward pass 105.805 ms, 15.94 s total
[ 2023-10-08 00:27:24 ] Completed Epoch: 13 batch 448: backward pass 80.304 ms, 16.02 s total
[ 2023-10-08 00:27:24 ] Completed Epoch: 13 batch 448: computing loss 109.684 ms, 16.13 s total
EPOCH: [13], BATCH: [448/889], loss: 0.390, loss_box_reg: 0.112, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 448
[ 2023-10-08 00:27:26 ] Completed saving temp checkpoint 1,769.599 ms, 17.90 s total
[ 2023-10-08 00:27:26 ] Completed replacing temp checkpoint with checkpoint 72.178 ms, 17.97 s total
[ 2023-10-08 00:27:26 ] Completed Epoch: 13 batch 449: moving batch data to device 5.390 ms, 17.97 s total
[ 2023-10-08 00:27:26 ] Completed Epoch: 13 batch 449: forward pass 106.190 ms, 18.08 s total
[ 2023-10-08 00:27:26 ] Completed Epoch: 13 batch 449: backward pass 73.987 ms, 18.15 s total
[ 2023-10-08 00:27:26 ] Completed Epoch: 13 batch 449: computing loss 115.214 ms, 18.27 s total
EPOCH: [13], BATCH: [449/889], loss: 0.406, loss_box_reg: 0.121, loss_classifier: 0.102, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 449
[ 2023-10-08 00:27:27 ] Completed saving temp checkpoint 1,042.814 ms, 19.31 s total
[ 2023-10-08 00:27:27 ] Completed replacing temp checkpoint with checkpoint 48.396 ms, 19.36 s total
[ 2023-10-08 00:27:27 ] Completed Epoch: 13 batch 450: moving batch data to device 4.678 ms, 19.36 s total
[ 2023-10-08 00:27:28 ] Completed Epoch: 13 batch 450: forward pass 107.892 ms, 19.47 s total
[ 2023-10-08 00:27:28 ] Completed Epoch: 13 batch 450: backward pass 50.137 ms, 19.52 s total
[ 2023-10-08 00:27:28 ] Completed Epoch: 13 batch 450: computing loss 143.301 ms, 19.67 s total
EPOCH: [13], BATCH: [450/889], loss: 0.412, loss_box_reg: 0.126, loss_classifier: 0.106, loss_mask: 0.132, loss_objectness: 0.020, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 450
[ 2023-10-08 00:27:30 ] Completed saving temp checkpoint 2,050.121 ms, 21.72 s total
[ 2023-10-08 00:27:30 ] Completed replacing temp checkpoint with checkpoint 64.998 ms, 21.78 s total
[ 2023-10-08 00:27:30 ] Completed Epoch: 13 batch 451: moving batch data to device 4.809 ms, 21.79 s total
[ 2023-10-08 00:27:30 ] Completed Epoch: 13 batch 451: forward pass 102.926 ms, 21.89 s total
[ 2023-10-08 00:27:30 ] Completed Epoch: 13 batch 451: backward pass 81.644 ms, 21.97 s total
[ 2023-10-08 00:27:30 ] Completed Epoch: 13 batch 451: computing loss 115.423 ms, 22.09 s total
EPOCH: [13], BATCH: [451/889], loss: 0.418, loss_box_reg: 0.125, loss_classifier: 0.105, loss_mask: 0.136, loss_objectness: 0.019, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 451
[ 2023-10-08 00:27:31 ] Completed saving temp checkpoint 965.228 ms, 23.05 s total
[ 2023-10-08 00:27:31 ] Completed replacing temp checkpoint with checkpoint 68.568 ms, 23.12 s total
[ 2023-10-08 00:27:31 ] Completed Epoch: 13 batch 452: moving batch data to device 7.143 ms, 23.13 s total
[ 2023-10-08 00:27:31 ] Completed Epoch: 13 batch 452: forward pass 103.507 ms, 23.23 s total
[ 2023-10-08 00:27:31 ] Completed Epoch: 13 batch 452: backward pass 51.675 ms, 23.28 s total
[ 2023-10-08 00:27:31 ] Completed Epoch: 13 batch 452: computing loss 149.339 ms, 23.43 s total
EPOCH: [13], BATCH: [452/889], loss: 0.370, loss_box_reg: 0.109, loss_classifier: 0.090, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 452
[ 2023-10-08 00:27:33 ] Completed saving temp checkpoint 1,104.294 ms, 24.54 s total
[ 2023-10-08 00:27:33 ] Completed replacing temp checkpoint with checkpoint 73.873 ms, 24.61 s total
[ 2023-10-08 00:27:33 ] Completed Epoch: 13 batch 453: moving batch data to device 7.403 ms, 24.62 s total
[ 2023-10-08 00:27:33 ] Completed Epoch: 13 batch 453: forward pass 107.599 ms, 24.72 s total
[ 2023-10-08 00:27:33 ] Completed Epoch: 13 batch 453: backward pass 71.845 ms, 24.80 s total
[ 2023-10-08 00:27:33 ] Completed Epoch: 13 batch 453: computing loss 117.709 ms, 24.91 s total
EPOCH: [13], BATCH: [453/889], loss: 0.393, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 453
[ 2023-10-08 00:27:34 ] Completed saving temp checkpoint 948.605 ms, 25.86 s total
[ 2023-10-08 00:27:34 ] Completed replacing temp checkpoint with checkpoint 67.362 ms, 25.93 s total
[ 2023-10-08 00:27:34 ] Completed Epoch: 13 batch 454: moving batch data to device 6.855 ms, 25.94 s total
[ 2023-10-08 00:27:34 ] Completed Epoch: 13 batch 454: forward pass 109.587 ms, 26.05 s total
[ 2023-10-08 00:27:34 ] Completed Epoch: 13 batch 454: backward pass 49.519 ms, 26.10 s total
[ 2023-10-08 00:27:34 ] Completed Epoch: 13 batch 454: computing loss 142.130 ms, 26.24 s total
EPOCH: [13], BATCH: [454/889], loss: 0.381, loss_box_reg: 0.117, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 454
[ 2023-10-08 00:27:35 ] Completed saving temp checkpoint 1,201.299 ms, 27.44 s total
[ 2023-10-08 00:27:36 ] Completed replacing temp checkpoint with checkpoint 74.214 ms, 27.51 s total
[ 2023-10-08 00:27:36 ] Completed Epoch: 13 batch 455: moving batch data to device 8.654 ms, 27.52 s total
[ 2023-10-08 00:27:36 ] Completed Epoch: 13 batch 455: forward pass 106.475 ms, 27.63 s total
[ 2023-10-08 00:27:36 ] Completed Epoch: 13 batch 455: backward pass 75.284 ms, 27.70 s total
[ 2023-10-08 00:27:36 ] Completed Epoch: 13 batch 455: computing loss 119.655 ms, 27.82 s total
EPOCH: [13], BATCH: [455/889], loss: 0.391, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 455
[ 2023-10-08 00:27:37 ] Completed saving temp checkpoint 1,010.907 ms, 28.83 s total
[ 2023-10-08 00:27:37 ] Completed replacing temp checkpoint with checkpoint 72.131 ms, 28.91 s total
[ 2023-10-08 00:27:37 ] Completed Epoch: 13 batch 456: moving batch data to device 8.182 ms, 28.92 s total
[ 2023-10-08 00:27:37 ] Completed Epoch: 13 batch 456: forward pass 100.909 ms, 29.02 s total
[ 2023-10-08 00:27:37 ] Completed Epoch: 13 batch 456: backward pass 49.347 ms, 29.07 s total
[ 2023-10-08 00:27:37 ] Completed Epoch: 13 batch 456: computing loss 142.818 ms, 29.21 s total
EPOCH: [13], BATCH: [456/889], loss: 0.364, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 456
[ 2023-10-08 00:27:38 ] Completed saving temp checkpoint 1,039.352 ms, 30.25 s total
[ 2023-10-08 00:27:38 ] Completed replacing temp checkpoint with checkpoint 48.820 ms, 30.30 s total
[ 2023-10-08 00:27:38 ] Completed Epoch: 13 batch 457: moving batch data to device 7.210 ms, 30.30 s total
[ 2023-10-08 00:27:38 ] Completed Epoch: 13 batch 457: forward pass 104.599 ms, 30.41 s total
[ 2023-10-08 00:27:38 ] Completed Epoch: 13 batch 457: backward pass 38.459 ms, 30.45 s total
[ 2023-10-08 00:27:39 ] Completed Epoch: 13 batch 457: computing loss 157.175 ms, 30.60 s total
EPOCH: [13], BATCH: [457/889], loss: 0.380, loss_box_reg: 0.111, loss_classifier: 0.100, loss_mask: 0.124, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 457
[ 2023-10-08 00:27:40 ] Completed saving temp checkpoint 978.910 ms, 31.58 s total
[ 2023-10-08 00:27:40 ] Completed replacing temp checkpoint with checkpoint 50.745 ms, 31.63 s total
[ 2023-10-08 00:27:40 ] Completed Epoch: 13 batch 458: moving batch data to device 5.231 ms, 31.64 s total
[ 2023-10-08 00:27:40 ] Completed Epoch: 13 batch 458: forward pass 103.627 ms, 31.74 s total
[ 2023-10-08 00:27:40 ] Completed Epoch: 13 batch 458: backward pass 34.002 ms, 31.78 s total
[ 2023-10-08 00:27:40 ] Completed Epoch: 13 batch 458: computing loss 157.623 ms, 31.93 s total
EPOCH: [13], BATCH: [458/889], loss: 0.378, loss_box_reg: 0.118, loss_classifier: 0.093, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 458
[ 2023-10-08 00:27:41 ] Completed saving temp checkpoint 1,209.694 ms, 33.14 s total
[ 2023-10-08 00:27:41 ] Completed replacing temp checkpoint with checkpoint 72.741 ms, 33.22 s total
[ 2023-10-08 00:27:41 ] Completed Epoch: 13 batch 459: moving batch data to device 7.548 ms, 33.22 s total
[ 2023-10-08 00:27:41 ] Completed Epoch: 13 batch 459: forward pass 108.812 ms, 33.33 s total
[ 2023-10-08 00:27:41 ] Completed Epoch: 13 batch 459: backward pass 81.325 ms, 33.41 s total
[ 2023-10-08 00:27:42 ] Completed Epoch: 13 batch 459: computing loss 107.860 ms, 33.52 s total
EPOCH: [13], BATCH: [459/889], loss: 0.412, loss_box_reg: 0.122, loss_classifier: 0.101, loss_mask: 0.139, loss_objectness: 0.016, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 459
[ 2023-10-08 00:27:43 ] Completed saving temp checkpoint 1,216.412 ms, 34.74 s total
[ 2023-10-08 00:27:43 ] Completed replacing temp checkpoint with checkpoint 65.863 ms, 34.80 s total
[ 2023-10-08 00:27:43 ] Completed Epoch: 13 batch 460: moving batch data to device 4.960 ms, 34.81 s total
[ 2023-10-08 00:27:43 ] Completed Epoch: 13 batch 460: forward pass 105.510 ms, 34.91 s total
[ 2023-10-08 00:27:43 ] Completed Epoch: 13 batch 460: backward pass 45.692 ms, 34.96 s total
[ 2023-10-08 00:27:43 ] Completed Epoch: 13 batch 460: computing loss 143.720 ms, 35.10 s total
EPOCH: [13], BATCH: [460/889], loss: 0.346, loss_box_reg: 0.102, loss_classifier: 0.083, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 460
[ 2023-10-08 00:27:45 ] Completed saving temp checkpoint 1,617.656 ms, 36.72 s total
[ 2023-10-08 00:27:45 ] Completed replacing temp checkpoint with checkpoint 89.987 ms, 36.81 s total
[ 2023-10-08 00:27:45 ] Completed Epoch: 13 batch 461: moving batch data to device 8.021 ms, 36.82 s total
[ 2023-10-08 00:27:45 ] Completed Epoch: 13 batch 461: forward pass 104.611 ms, 36.92 s total
[ 2023-10-08 00:27:45 ] Completed Epoch: 13 batch 461: backward pass 42.164 ms, 36.97 s total
[ 2023-10-08 00:27:45 ] Completed Epoch: 13 batch 461: computing loss 141.510 ms, 37.11 s total
EPOCH: [13], BATCH: [461/889], loss: 0.407, loss_box_reg: 0.124, loss_classifier: 0.107, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 461
[ 2023-10-08 00:27:46 ] Completed saving temp checkpoint 993.900 ms, 38.10 s total
[ 2023-10-08 00:27:46 ] Completed replacing temp checkpoint with checkpoint 71.706 ms, 38.17 s total
[ 2023-10-08 00:27:46 ] Completed Epoch: 13 batch 462: moving batch data to device 6.764 ms, 38.18 s total
[ 2023-10-08 00:27:46 ] Completed Epoch: 13 batch 462: forward pass 116.752 ms, 38.30 s total
[ 2023-10-08 00:27:46 ] Completed Epoch: 13 batch 462: backward pass 84.333 ms, 38.38 s total
[ 2023-10-08 00:27:47 ] Completed Epoch: 13 batch 462: computing loss 103.025 ms, 38.48 s total
EPOCH: [13], BATCH: [462/889], loss: 0.382, loss_box_reg: 0.114, loss_classifier: 0.101, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 462
[ 2023-10-08 00:27:48 ] Completed saving temp checkpoint 1,728.020 ms, 40.21 s total
[ 2023-10-08 00:27:48 ] Completed replacing temp checkpoint with checkpoint 86.761 ms, 40.30 s total
[ 2023-10-08 00:27:48 ] Completed Epoch: 13 batch 463: moving batch data to device 9.055 ms, 40.31 s total
[ 2023-10-08 00:27:48 ] Completed Epoch: 13 batch 463: forward pass 105.415 ms, 40.41 s total
[ 2023-10-08 00:27:49 ] Completed Epoch: 13 batch 463: backward pass 81.602 ms, 40.50 s total
[ 2023-10-08 00:27:49 ] Completed Epoch: 13 batch 463: computing loss 108.276 ms, 40.60 s total
EPOCH: [13], BATCH: [463/889], loss: 0.415, loss_box_reg: 0.129, loss_classifier: 0.108, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 463
[ 2023-10-08 00:27:50 ] Completed saving temp checkpoint 1,031.141 ms, 41.63 s total
[ 2023-10-08 00:27:50 ] Completed replacing temp checkpoint with checkpoint 70.420 ms, 41.71 s total
[ 2023-10-08 00:27:50 ] Completed Epoch: 13 batch 464: moving batch data to device 8.894 ms, 41.71 s total
[ 2023-10-08 00:27:50 ] Completed Epoch: 13 batch 464: forward pass 108.671 ms, 41.82 s total
[ 2023-10-08 00:27:50 ] Completed Epoch: 13 batch 464: backward pass 74.876 ms, 41.90 s total
[ 2023-10-08 00:27:50 ] Completed Epoch: 13 batch 464: computing loss 119.630 ms, 42.02 s total
EPOCH: [13], BATCH: [464/889], loss: 0.385, loss_box_reg: 0.111, loss_classifier: 0.093, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 464
[ 2023-10-08 00:27:51 ] Completed saving temp checkpoint 1,139.162 ms, 43.16 s total
[ 2023-10-08 00:27:51 ] Completed replacing temp checkpoint with checkpoint 63.691 ms, 43.22 s total
[ 2023-10-08 00:27:51 ] Completed Epoch: 13 batch 465: moving batch data to device 7.099 ms, 43.23 s total
[ 2023-10-08 00:27:51 ] Completed Epoch: 13 batch 465: forward pass 106.853 ms, 43.33 s total
[ 2023-10-08 00:27:51 ] Completed Epoch: 13 batch 465: backward pass 34.916 ms, 43.37 s total
[ 2023-10-08 00:27:52 ] Completed Epoch: 13 batch 465: computing loss 370.792 ms, 43.74 s total
EPOCH: [13], BATCH: [465/889], loss: 0.384, loss_box_reg: 0.112, loss_classifier: 0.099, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 465
[ 2023-10-08 00:27:53 ] Completed saving temp checkpoint 1,049.448 ms, 44.79 s total
[ 2023-10-08 00:27:53 ] Completed replacing temp checkpoint with checkpoint 72.676 ms, 44.86 s total
[ 2023-10-08 00:27:53 ] Completed Epoch: 13 batch 466: moving batch data to device 8.762 ms, 44.87 s total
[ 2023-10-08 00:27:53 ] Completed Epoch: 13 batch 466: forward pass 106.274 ms, 44.98 s total
[ 2023-10-08 00:27:53 ] Completed Epoch: 13 batch 466: backward pass 41.865 ms, 45.02 s total
[ 2023-10-08 00:27:53 ] Completed Epoch: 13 batch 466: computing loss 152.258 ms, 45.17 s total
EPOCH: [13], BATCH: [466/889], loss: 0.400, loss_box_reg: 0.116, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 466
[ 2023-10-08 00:27:54 ] Completed saving temp checkpoint 1,093.764 ms, 46.26 s total
[ 2023-10-08 00:27:54 ] Completed replacing temp checkpoint with checkpoint 60.484 ms, 46.33 s total
[ 2023-10-08 00:27:54 ] Completed Epoch: 13 batch 467: moving batch data to device 6.267 ms, 46.33 s total
[ 2023-10-08 00:27:54 ] Completed Epoch: 13 batch 467: forward pass 104.425 ms, 46.44 s total
[ 2023-10-08 00:27:55 ] Completed Epoch: 13 batch 467: backward pass 79.081 ms, 46.51 s total
[ 2023-10-08 00:27:55 ] Completed Epoch: 13 batch 467: computing loss 113.438 ms, 46.63 s total
EPOCH: [13], BATCH: [467/889], loss: 0.354, loss_box_reg: 0.106, loss_classifier: 0.086, loss_mask: 0.123, loss_objectness: 0.017, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 467
[ 2023-10-08 00:27:56 ] Completed saving temp checkpoint 1,087.961 ms, 47.72 s total
[ 2023-10-08 00:27:56 ] Completed replacing temp checkpoint with checkpoint 73.579 ms, 47.79 s total
[ 2023-10-08 00:27:56 ] Completed Epoch: 13 batch 468: moving batch data to device 8.476 ms, 47.80 s total
[ 2023-10-08 00:27:56 ] Completed Epoch: 13 batch 468: forward pass 103.971 ms, 47.90 s total
[ 2023-10-08 00:27:56 ] Completed Epoch: 13 batch 468: backward pass 73.076 ms, 47.98 s total
[ 2023-10-08 00:27:56 ] Completed Epoch: 13 batch 468: computing loss 120.382 ms, 48.10 s total
EPOCH: [13], BATCH: [468/889], loss: 0.400, loss_box_reg: 0.119, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.019, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 468
[ 2023-10-08 00:27:57 ] Completed saving temp checkpoint 1,105.642 ms, 49.20 s total
[ 2023-10-08 00:27:57 ] Completed replacing temp checkpoint with checkpoint 48.036 ms, 49.25 s total
[ 2023-10-08 00:27:57 ] Completed Epoch: 13 batch 469: moving batch data to device 7.256 ms, 49.26 s total
[ 2023-10-08 00:27:57 ] Completed Epoch: 13 batch 469: forward pass 103.210 ms, 49.36 s total
[ 2023-10-08 00:27:57 ] Completed Epoch: 13 batch 469: backward pass 70.191 ms, 49.43 s total
[ 2023-10-08 00:27:58 ] Completed Epoch: 13 batch 469: computing loss 123.173 ms, 49.55 s total
EPOCH: [13], BATCH: [469/889], loss: 0.352, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.019
Saving checkpoint at epoch 13 train batch 469
[ 2023-10-08 00:27:59 ] Completed saving temp checkpoint 1,032.172 ms, 50.59 s total
[ 2023-10-08 00:27:59 ] Completed replacing temp checkpoint with checkpoint 46.422 ms, 50.63 s total
[ 2023-10-08 00:27:59 ] Completed Epoch: 13 batch 470: moving batch data to device 8.820 ms, 50.64 s total
[ 2023-10-08 00:27:59 ] Completed Epoch: 13 batch 470: forward pass 104.609 ms, 50.75 s total
[ 2023-10-08 00:27:59 ] Completed Epoch: 13 batch 470: backward pass 82.510 ms, 50.83 s total
[ 2023-10-08 00:27:59 ] Completed Epoch: 13 batch 470: computing loss 88.752 ms, 50.92 s total
EPOCH: [13], BATCH: [470/889], loss: 0.374, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 470
[ 2023-10-08 00:28:00 ] Completed saving temp checkpoint 1,137.938 ms, 52.05 s total
[ 2023-10-08 00:28:00 ] Completed replacing temp checkpoint with checkpoint 60.209 ms, 52.11 s total
[ 2023-10-08 00:28:00 ] Completed Epoch: 13 batch 471: moving batch data to device 5.849 ms, 52.12 s total
[ 2023-10-08 00:28:00 ] Completed Epoch: 13 batch 471: forward pass 99.723 ms, 52.22 s total
[ 2023-10-08 00:28:00 ] Completed Epoch: 13 batch 471: backward pass 77.125 ms, 52.30 s total
[ 2023-10-08 00:28:00 ] Completed Epoch: 13 batch 471: computing loss 116.637 ms, 52.41 s total
EPOCH: [13], BATCH: [471/889], loss: 0.371, loss_box_reg: 0.113, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 471
[ 2023-10-08 00:28:02 ] Completed saving temp checkpoint 1,306.294 ms, 53.72 s total
[ 2023-10-08 00:28:02 ] Completed replacing temp checkpoint with checkpoint 81.563 ms, 53.80 s total
[ 2023-10-08 00:28:02 ] Completed Epoch: 13 batch 472: moving batch data to device 7.002 ms, 53.81 s total
[ 2023-10-08 00:28:02 ] Completed Epoch: 13 batch 472: forward pass 108.315 ms, 53.92 s total
[ 2023-10-08 00:28:02 ] Completed Epoch: 13 batch 472: backward pass 81.526 ms, 54.00 s total
[ 2023-10-08 00:28:02 ] Completed Epoch: 13 batch 472: computing loss 89.539 ms, 54.09 s total
EPOCH: [13], BATCH: [472/889], loss: 0.401, loss_box_reg: 0.120, loss_classifier: 0.108, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 472
[ 2023-10-08 00:28:04 ] Completed saving temp checkpoint 1,516.185 ms, 55.60 s total
[ 2023-10-08 00:28:04 ] Completed replacing temp checkpoint with checkpoint 89.486 ms, 55.69 s total
[ 2023-10-08 00:28:04 ] Completed Epoch: 13 batch 473: moving batch data to device 6.292 ms, 55.70 s total
[ 2023-10-08 00:28:04 ] Completed Epoch: 13 batch 473: forward pass 109.600 ms, 55.81 s total
[ 2023-10-08 00:28:04 ] Completed Epoch: 13 batch 473: backward pass 73.331 ms, 55.88 s total
[ 2023-10-08 00:28:04 ] Completed Epoch: 13 batch 473: computing loss 107.421 ms, 55.99 s total
EPOCH: [13], BATCH: [473/889], loss: 0.364, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 473
[ 2023-10-08 00:28:05 ] Completed saving temp checkpoint 1,373.161 ms, 57.36 s total
[ 2023-10-08 00:28:05 ] Completed replacing temp checkpoint with checkpoint 66.297 ms, 57.43 s total
[ 2023-10-08 00:28:05 ] Completed Epoch: 13 batch 474: moving batch data to device 7.843 ms, 57.44 s total
[ 2023-10-08 00:28:06 ] Completed Epoch: 13 batch 474: forward pass 106.236 ms, 57.54 s total
[ 2023-10-08 00:28:06 ] Completed Epoch: 13 batch 474: backward pass 69.873 ms, 57.61 s total
[ 2023-10-08 00:28:06 ] Completed Epoch: 13 batch 474: computing loss 109.866 ms, 57.72 s total
EPOCH: [13], BATCH: [474/889], loss: 0.388, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.128, loss_objectness: 0.018, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 474
[ 2023-10-08 00:28:07 ] Completed saving temp checkpoint 1,647.687 ms, 59.37 s total
[ 2023-10-08 00:28:07 ] Completed replacing temp checkpoint with checkpoint 44.047 ms, 59.42 s total
[ 2023-10-08 00:28:07 ] Completed Epoch: 13 batch 475: moving batch data to device 6.069 ms, 59.42 s total
[ 2023-10-08 00:28:08 ] Completed Epoch: 13 batch 475: forward pass 109.540 ms, 59.53 s total
[ 2023-10-08 00:28:08 ] Completed Epoch: 13 batch 475: backward pass 77.556 ms, 59.61 s total
[ 2023-10-08 00:28:08 ] Completed Epoch: 13 batch 475: computing loss 110.889 ms, 59.72 s total
EPOCH: [13], BATCH: [475/889], loss: 0.406, loss_box_reg: 0.127, loss_classifier: 0.104, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 475
[ 2023-10-08 00:28:09 ] Completed saving temp checkpoint 1,068.043 ms, 60.79 s total
[ 2023-10-08 00:28:09 ] Completed replacing temp checkpoint with checkpoint 69.553 ms, 60.86 s total
[ 2023-10-08 00:28:09 ] Completed Epoch: 13 batch 476: moving batch data to device 10.060 ms, 60.87 s total
[ 2023-10-08 00:28:09 ] Completed Epoch: 13 batch 476: forward pass 104.652 ms, 60.97 s total
[ 2023-10-08 00:28:09 ] Completed Epoch: 13 batch 476: backward pass 76.827 ms, 61.05 s total
[ 2023-10-08 00:28:09 ] Completed Epoch: 13 batch 476: computing loss 109.068 ms, 61.16 s total
EPOCH: [13], BATCH: [476/889], loss: 0.400, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 476
[ 2023-10-08 00:28:10 ] Completed saving temp checkpoint 1,149.827 ms, 62.31 s total
[ 2023-10-08 00:28:10 ] Completed replacing temp checkpoint with checkpoint 71.717 ms, 62.38 s total
[ 2023-10-08 00:28:10 ] Completed Epoch: 13 batch 477: moving batch data to device 9.432 ms, 62.39 s total
[ 2023-10-08 00:28:11 ] Completed Epoch: 13 batch 477: forward pass 108.758 ms, 62.50 s total
[ 2023-10-08 00:28:11 ] Completed Epoch: 13 batch 477: backward pass 36.685 ms, 62.53 s total
[ 2023-10-08 00:28:11 ] Completed Epoch: 13 batch 477: computing loss 160.386 ms, 62.69 s total
EPOCH: [13], BATCH: [477/889], loss: 0.400, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 477
[ 2023-10-08 00:28:12 ] Completed saving temp checkpoint 971.962 ms, 63.67 s total
[ 2023-10-08 00:28:12 ] Completed replacing temp checkpoint with checkpoint 68.238 ms, 63.73 s total
[ 2023-10-08 00:28:12 ] Completed Epoch: 13 batch 478: moving batch data to device 10.215 ms, 63.75 s total
[ 2023-10-08 00:28:12 ] Completed Epoch: 13 batch 478: forward pass 104.188 ms, 63.85 s total
[ 2023-10-08 00:28:12 ] Completed Epoch: 13 batch 478: backward pass 56.562 ms, 63.91 s total
[ 2023-10-08 00:28:12 ] Completed Epoch: 13 batch 478: computing loss 138.515 ms, 64.04 s total
EPOCH: [13], BATCH: [478/889], loss: 0.396, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.131, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 478
[ 2023-10-08 00:28:13 ] Completed saving temp checkpoint 1,137.559 ms, 65.18 s total
[ 2023-10-08 00:28:13 ] Completed replacing temp checkpoint with checkpoint 84.973 ms, 65.27 s total
[ 2023-10-08 00:28:13 ] Completed Epoch: 13 batch 479: moving batch data to device 6.846 ms, 65.27 s total
[ 2023-10-08 00:28:13 ] Completed Epoch: 13 batch 479: forward pass 102.797 ms, 65.38 s total
[ 2023-10-08 00:28:13 ] Completed Epoch: 13 batch 479: backward pass 70.725 ms, 65.45 s total
[ 2023-10-08 00:28:14 ] Completed Epoch: 13 batch 479: computing loss 118.000 ms, 65.57 s total
EPOCH: [13], BATCH: [479/889], loss: 0.426, loss_box_reg: 0.127, loss_classifier: 0.109, loss_mask: 0.138, loss_objectness: 0.022, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 479
[ 2023-10-08 00:28:15 ] Completed saving temp checkpoint 1,011.411 ms, 66.58 s total
[ 2023-10-08 00:28:15 ] Completed replacing temp checkpoint with checkpoint 68.859 ms, 66.65 s total
[ 2023-10-08 00:28:15 ] Completed Epoch: 13 batch 480: moving batch data to device 7.093 ms, 66.65 s total
[ 2023-10-08 00:28:15 ] Completed Epoch: 13 batch 480: forward pass 105.908 ms, 66.76 s total
[ 2023-10-08 00:28:15 ] Completed Epoch: 13 batch 480: backward pass 70.688 ms, 66.83 s total
[ 2023-10-08 00:28:15 ] Completed Epoch: 13 batch 480: computing loss 120.359 ms, 66.95 s total
EPOCH: [13], BATCH: [480/889], loss: 0.381, loss_box_reg: 0.112, loss_classifier: 0.099, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 480
[ 2023-10-08 00:28:16 ] Completed saving temp checkpoint 1,151.187 ms, 68.10 s total
[ 2023-10-08 00:28:16 ] Completed replacing temp checkpoint with checkpoint 80.634 ms, 68.18 s total
[ 2023-10-08 00:28:16 ] Completed Epoch: 13 batch 481: moving batch data to device 9.749 ms, 68.19 s total
[ 2023-10-08 00:28:16 ] Completed Epoch: 13 batch 481: forward pass 102.241 ms, 68.29 s total
[ 2023-10-08 00:28:16 ] Completed Epoch: 13 batch 481: backward pass 40.300 ms, 68.33 s total
[ 2023-10-08 00:28:17 ] Completed Epoch: 13 batch 481: computing loss 141.875 ms, 68.48 s total
EPOCH: [13], BATCH: [481/889], loss: 0.392, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.036
Saving checkpoint at epoch 13 train batch 481
[ 2023-10-08 00:28:18 ] Completed saving temp checkpoint 1,022.004 ms, 69.50 s total
[ 2023-10-08 00:28:18 ] Completed replacing temp checkpoint with checkpoint 67.224 ms, 69.56 s total
[ 2023-10-08 00:28:18 ] Completed Epoch: 13 batch 482: moving batch data to device 9.173 ms, 69.57 s total
[ 2023-10-08 00:28:18 ] Completed Epoch: 13 batch 482: forward pass 105.433 ms, 69.68 s total
[ 2023-10-08 00:28:18 ] Completed Epoch: 13 batch 482: backward pass 78.447 ms, 69.76 s total
[ 2023-10-08 00:28:18 ] Completed Epoch: 13 batch 482: computing loss 116.493 ms, 69.87 s total
EPOCH: [13], BATCH: [482/889], loss: 0.363, loss_box_reg: 0.109, loss_classifier: 0.089, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 482
[ 2023-10-08 00:28:19 ] Completed saving temp checkpoint 1,174.242 ms, 71.05 s total
[ 2023-10-08 00:28:19 ] Completed replacing temp checkpoint with checkpoint 29.608 ms, 71.08 s total
[ 2023-10-08 00:28:19 ] Completed Epoch: 13 batch 483: moving batch data to device 4.394 ms, 71.08 s total
[ 2023-10-08 00:28:19 ] Completed Epoch: 13 batch 483: forward pass 104.354 ms, 71.19 s total
[ 2023-10-08 00:28:19 ] Completed Epoch: 13 batch 483: backward pass 71.039 ms, 71.26 s total
[ 2023-10-08 00:28:19 ] Completed Epoch: 13 batch 483: computing loss 113.142 ms, 71.37 s total
EPOCH: [13], BATCH: [483/889], loss: 0.410, loss_box_reg: 0.123, loss_classifier: 0.107, loss_mask: 0.136, loss_objectness: 0.020, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 483
[ 2023-10-08 00:28:21 ] Completed saving temp checkpoint 1,342.829 ms, 72.71 s total
[ 2023-10-08 00:28:21 ] Completed replacing temp checkpoint with checkpoint 80.046 ms, 72.79 s total
[ 2023-10-08 00:28:21 ] Completed Epoch: 13 batch 484: moving batch data to device 6.781 ms, 72.80 s total
[ 2023-10-08 00:28:21 ] Completed Epoch: 13 batch 484: forward pass 104.698 ms, 72.91 s total
[ 2023-10-08 00:28:21 ] Completed Epoch: 13 batch 484: backward pass 69.686 ms, 72.98 s total
[ 2023-10-08 00:28:21 ] Completed Epoch: 13 batch 484: computing loss 123.560 ms, 73.10 s total
EPOCH: [13], BATCH: [484/889], loss: 0.394, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 484
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 00:41:36 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 00:41:36 ] Completed importing Timer 0.065 ms, 0.00 s total
[ 2023-10-08 00:41:37 ] Completed importing everything else 466.263 ms, 0.47 s total
[ 2023-10-08 00:41:37 ] Completed defined other functions 0.026 ms, 0.47 s total
| distributed init (rank 5): env://
| distributed init (rank 4): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
| distributed init (rank 0): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 00:41:40 ] Completed main preliminaries 2,871.601 ms, 3.34 s total
loading annotations into memory...
Done (t=11.32s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-08 00:41:53 ] Completed loading data 13,267.920 ms, 16.61 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 00:41:53 ] Completed creating data samplers 104.993 ms, 16.71 s total
[ 2023-10-08 00:41:53 ] Completed creating data loaders 0.252 ms, 16.71 s total
[ 2023-10-08 00:41:54 ] Completed creating model and .to(device) 651.308 ms, 17.36 s total
[ 2023-10-08 00:41:55 ] Completed preparing model for distributed training 976.222 ms, 18.34 s total
[ 2023-10-08 00:41:55 ] Completed optimizer and scaler 0.597 ms, 18.34 s total
[ 2023-10-08 00:41:55 ] Completed learning rate schedulers 0.241 ms, 18.34 s total
[ 2023-10-08 00:41:56 ] Completed init coco evaluator 971.876 ms, 19.31 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 00:41:56 ] Completed retrieving checkpoint 854.506 ms, 20.17 s total
EPOCH :: 13
[ 2023-10-08 00:41:56 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 00:41:56 ] Completed training preliminaries 0.881 ms, 0.00 s total
Training / resuming epoch 13 from training step 484
[ 2023-10-08 00:41:57 ] Completed Epoch: 13 batch 484: moving batch data to device 571.843 ms, 0.57 s total
[ 2023-10-08 00:41:58 ] Completed Epoch: 13 batch 484: forward pass 755.045 ms, 1.33 s total
[ 2023-10-08 00:41:58 ] Completed Epoch: 13 batch 484: backward pass 173.060 ms, 1.50 s total
[ 2023-10-08 00:41:58 ] Completed Epoch: 13 batch 484: computing loss 505.550 ms, 2.01 s total
EPOCH: [13], BATCH: [484/889], loss: 0.393, loss_box_reg: 0.116, loss_classifier: 0.093, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 484
[ 2023-10-08 00:41:59 ] Completed saving temp checkpoint 1,063.900 ms, 3.07 s total
[ 2023-10-08 00:42:00 ] Completed replacing temp checkpoint with checkpoint 185.103 ms, 3.26 s total
[ 2023-10-08 00:42:00 ] Completed Epoch: 13 batch 485: moving batch data to device 3.709 ms, 3.26 s total
[ 2023-10-08 00:42:00 ] Completed Epoch: 13 batch 485: forward pass 111.152 ms, 3.37 s total
[ 2023-10-08 00:42:00 ] Completed Epoch: 13 batch 485: backward pass 82.170 ms, 3.45 s total
[ 2023-10-08 00:42:00 ] Completed Epoch: 13 batch 485: computing loss 134.573 ms, 3.59 s total
EPOCH: [13], BATCH: [485/889], loss: 0.391, loss_box_reg: 0.115, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 485
[ 2023-10-08 00:42:01 ] Completed saving temp checkpoint 960.725 ms, 4.55 s total
[ 2023-10-08 00:42:01 ] Completed replacing temp checkpoint with checkpoint 46.214 ms, 4.59 s total
[ 2023-10-08 00:42:01 ] Completed Epoch: 13 batch 486: moving batch data to device 3.038 ms, 4.60 s total
[ 2023-10-08 00:42:01 ] Completed Epoch: 13 batch 486: forward pass 109.465 ms, 4.71 s total
[ 2023-10-08 00:42:01 ] Completed Epoch: 13 batch 486: backward pass 75.852 ms, 4.78 s total
[ 2023-10-08 00:42:01 ] Completed Epoch: 13 batch 486: computing loss 144.669 ms, 4.93 s total
EPOCH: [13], BATCH: [486/889], loss: 0.414, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.138, loss_objectness: 0.021, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 486
[ 2023-10-08 00:42:02 ] Completed saving temp checkpoint 1,103.491 ms, 6.03 s total
[ 2023-10-08 00:42:02 ] Completed replacing temp checkpoint with checkpoint 72.578 ms, 6.10 s total
[ 2023-10-08 00:42:02 ] Completed Epoch: 13 batch 487: moving batch data to device 3.517 ms, 6.11 s total
[ 2023-10-08 00:42:03 ] Completed Epoch: 13 batch 487: forward pass 109.319 ms, 6.22 s total
[ 2023-10-08 00:42:03 ] Completed Epoch: 13 batch 487: backward pass 90.213 ms, 6.31 s total
[ 2023-10-08 00:42:03 ] Completed Epoch: 13 batch 487: computing loss 128.069 ms, 6.43 s total
EPOCH: [13], BATCH: [487/889], loss: 0.403, loss_box_reg: 0.129, loss_classifier: 0.102, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 487
[ 2023-10-08 00:42:04 ] Completed saving temp checkpoint 925.122 ms, 7.36 s total
[ 2023-10-08 00:42:04 ] Completed replacing temp checkpoint with checkpoint 67.649 ms, 7.43 s total
[ 2023-10-08 00:42:04 ] Completed Epoch: 13 batch 488: moving batch data to device 10.500 ms, 7.44 s total
[ 2023-10-08 00:42:04 ] Completed Epoch: 13 batch 488: forward pass 224.915 ms, 7.66 s total
[ 2023-10-08 00:42:04 ] Completed Epoch: 13 batch 488: backward pass 32.195 ms, 7.69 s total
[ 2023-10-08 00:42:04 ] Completed Epoch: 13 batch 488: computing loss 186.831 ms, 7.88 s total
EPOCH: [13], BATCH: [488/889], loss: 0.415, loss_box_reg: 0.125, loss_classifier: 0.112, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 488
[ 2023-10-08 00:42:06 ] Completed saving temp checkpoint 1,295.748 ms, 9.18 s total
[ 2023-10-08 00:42:06 ] Completed replacing temp checkpoint with checkpoint 75.964 ms, 9.25 s total
[ 2023-10-08 00:42:06 ] Completed Epoch: 13 batch 489: moving batch data to device 3.773 ms, 9.26 s total
[ 2023-10-08 00:42:06 ] Completed Epoch: 13 batch 489: forward pass 110.258 ms, 9.37 s total
[ 2023-10-08 00:42:06 ] Completed Epoch: 13 batch 489: backward pass 92.444 ms, 9.46 s total
[ 2023-10-08 00:42:06 ] Completed Epoch: 13 batch 489: computing loss 125.618 ms, 9.59 s total
EPOCH: [13], BATCH: [489/889], loss: 0.363, loss_box_reg: 0.109, loss_classifier: 0.095, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 489
[ 2023-10-08 00:42:07 ] Completed saving temp checkpoint 1,471.975 ms, 11.06 s total
[ 2023-10-08 00:42:07 ] Completed replacing temp checkpoint with checkpoint 76.089 ms, 11.13 s total
[ 2023-10-08 00:42:08 ] Completed Epoch: 13 batch 490: moving batch data to device 7.613 ms, 11.14 s total
[ 2023-10-08 00:42:08 ] Completed Epoch: 13 batch 490: forward pass 107.760 ms, 11.25 s total
[ 2023-10-08 00:42:08 ] Completed Epoch: 13 batch 490: backward pass 82.193 ms, 11.33 s total
[ 2023-10-08 00:42:08 ] Completed Epoch: 13 batch 490: computing loss 208.685 ms, 11.54 s total
EPOCH: [13], BATCH: [490/889], loss: 0.395, loss_box_reg: 0.117, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 490
[ 2023-10-08 00:42:10 ] Completed saving temp checkpoint 1,755.074 ms, 13.29 s total
[ 2023-10-08 00:42:10 ] Completed replacing temp checkpoint with checkpoint 85.178 ms, 13.38 s total
[ 2023-10-08 00:42:10 ] Completed Epoch: 13 batch 491: moving batch data to device 3.177 ms, 13.38 s total
[ 2023-10-08 00:42:10 ] Completed Epoch: 13 batch 491: forward pass 107.116 ms, 13.49 s total
[ 2023-10-08 00:42:10 ] Completed Epoch: 13 batch 491: backward pass 75.718 ms, 13.57 s total
[ 2023-10-08 00:42:10 ] Completed Epoch: 13 batch 491: computing loss 120.070 ms, 13.69 s total
EPOCH: [13], BATCH: [491/889], loss: 0.390, loss_box_reg: 0.117, loss_classifier: 0.101, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 491
[ 2023-10-08 00:42:11 ] Completed saving temp checkpoint 1,024.426 ms, 14.71 s total
[ 2023-10-08 00:42:11 ] Completed replacing temp checkpoint with checkpoint 69.638 ms, 14.78 s total
[ 2023-10-08 00:42:11 ] Completed Epoch: 13 batch 492: moving batch data to device 6.741 ms, 14.79 s total
[ 2023-10-08 00:42:11 ] Completed Epoch: 13 batch 492: forward pass 101.586 ms, 14.89 s total
[ 2023-10-08 00:42:11 ] Completed Epoch: 13 batch 492: backward pass 69.618 ms, 14.96 s total
[ 2023-10-08 00:42:11 ] Completed Epoch: 13 batch 492: computing loss 170.824 ms, 15.13 s total
EPOCH: [13], BATCH: [492/889], loss: 0.364, loss_box_reg: 0.110, loss_classifier: 0.095, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 492
[ 2023-10-08 00:42:13 ] Completed saving temp checkpoint 1,682.376 ms, 16.81 s total
[ 2023-10-08 00:42:13 ] Completed replacing temp checkpoint with checkpoint 61.357 ms, 16.87 s total
[ 2023-10-08 00:42:13 ] Completed Epoch: 13 batch 493: moving batch data to device 7.431 ms, 16.88 s total
[ 2023-10-08 00:42:13 ] Completed Epoch: 13 batch 493: forward pass 103.966 ms, 16.98 s total
[ 2023-10-08 00:42:13 ] Completed Epoch: 13 batch 493: backward pass 76.157 ms, 17.06 s total
[ 2023-10-08 00:42:14 ] Completed Epoch: 13 batch 493: computing loss 120.258 ms, 17.18 s total
EPOCH: [13], BATCH: [493/889], loss: 0.376, loss_box_reg: 0.111, loss_classifier: 0.089, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 493
[ 2023-10-08 00:42:15 ] Completed saving temp checkpoint 1,127.607 ms, 18.31 s total
[ 2023-10-08 00:42:15 ] Completed replacing temp checkpoint with checkpoint 82.293 ms, 18.39 s total
[ 2023-10-08 00:42:15 ] Completed Epoch: 13 batch 494: moving batch data to device 8.214 ms, 18.40 s total
[ 2023-10-08 00:42:15 ] Completed Epoch: 13 batch 494: forward pass 102.970 ms, 18.50 s total
[ 2023-10-08 00:42:15 ] Completed Epoch: 13 batch 494: backward pass 52.584 ms, 18.55 s total
[ 2023-10-08 00:42:15 ] Completed Epoch: 13 batch 494: computing loss 144.354 ms, 18.70 s total
EPOCH: [13], BATCH: [494/889], loss: 0.392, loss_box_reg: 0.119, loss_classifier: 0.098, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 494
[ 2023-10-08 00:42:16 ] Completed saving temp checkpoint 1,241.753 ms, 19.94 s total
[ 2023-10-08 00:42:16 ] Completed replacing temp checkpoint with checkpoint 63.271 ms, 20.00 s total
[ 2023-10-08 00:42:16 ] Completed Epoch: 13 batch 495: moving batch data to device 9.260 ms, 20.01 s total
[ 2023-10-08 00:42:16 ] Completed Epoch: 13 batch 495: forward pass 112.474 ms, 20.12 s total
[ 2023-10-08 00:42:17 ] Completed Epoch: 13 batch 495: backward pass 73.834 ms, 20.20 s total
[ 2023-10-08 00:42:17 ] Completed Epoch: 13 batch 495: computing loss 126.773 ms, 20.33 s total
EPOCH: [13], BATCH: [495/889], loss: 0.431, loss_box_reg: 0.130, loss_classifier: 0.117, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 495
[ 2023-10-08 00:42:18 ] Completed saving temp checkpoint 1,171.203 ms, 21.50 s total
[ 2023-10-08 00:42:18 ] Completed replacing temp checkpoint with checkpoint 79.220 ms, 21.58 s total
[ 2023-10-08 00:42:18 ] Completed Epoch: 13 batch 496: moving batch data to device 7.304 ms, 21.58 s total
[ 2023-10-08 00:42:18 ] Completed Epoch: 13 batch 496: forward pass 113.993 ms, 21.70 s total
[ 2023-10-08 00:42:18 ] Completed Epoch: 13 batch 496: backward pass 79.571 ms, 21.78 s total
[ 2023-10-08 00:42:18 ] Completed Epoch: 13 batch 496: computing loss 121.078 ms, 21.90 s total
EPOCH: [13], BATCH: [496/889], loss: 0.384, loss_box_reg: 0.120, loss_classifier: 0.098, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 496
[ 2023-10-08 00:42:20 ] Completed saving temp checkpoint 1,282.007 ms, 23.18 s total
[ 2023-10-08 00:42:20 ] Completed replacing temp checkpoint with checkpoint 77.568 ms, 23.26 s total
[ 2023-10-08 00:42:20 ] Completed Epoch: 13 batch 497: moving batch data to device 7.264 ms, 23.26 s total
[ 2023-10-08 00:42:20 ] Completed Epoch: 13 batch 497: forward pass 103.161 ms, 23.37 s total
[ 2023-10-08 00:42:20 ] Completed Epoch: 13 batch 497: backward pass 74.265 ms, 23.44 s total
[ 2023-10-08 00:42:20 ] Completed Epoch: 13 batch 497: computing loss 117.014 ms, 23.56 s total
EPOCH: [13], BATCH: [497/889], loss: 0.349, loss_box_reg: 0.103, loss_classifier: 0.090, loss_mask: 0.120, loss_objectness: 0.014, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 497
[ 2023-10-08 00:42:21 ] Completed saving temp checkpoint 1,187.427 ms, 24.75 s total
[ 2023-10-08 00:42:21 ] Completed replacing temp checkpoint with checkpoint 59.840 ms, 24.81 s total
[ 2023-10-08 00:42:21 ] Completed Epoch: 13 batch 498: moving batch data to device 5.428 ms, 24.81 s total
[ 2023-10-08 00:42:21 ] Completed Epoch: 13 batch 498: forward pass 105.443 ms, 24.92 s total
[ 2023-10-08 00:42:21 ] Completed Epoch: 13 batch 498: backward pass 35.913 ms, 24.95 s total
[ 2023-10-08 00:42:21 ] Completed Epoch: 13 batch 498: computing loss 161.613 ms, 25.11 s total
EPOCH: [13], BATCH: [498/889], loss: 0.387, loss_box_reg: 0.116, loss_classifier: 0.103, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 498
[ 2023-10-08 00:42:23 ] Completed saving temp checkpoint 1,282.745 ms, 26.40 s total
[ 2023-10-08 00:42:23 ] Completed replacing temp checkpoint with checkpoint 83.641 ms, 26.48 s total
[ 2023-10-08 00:42:23 ] Completed Epoch: 13 batch 499: moving batch data to device 8.540 ms, 26.49 s total
[ 2023-10-08 00:42:23 ] Completed Epoch: 13 batch 499: forward pass 105.100 ms, 26.59 s total
[ 2023-10-08 00:42:23 ] Completed Epoch: 13 batch 499: backward pass 59.232 ms, 26.65 s total
[ 2023-10-08 00:42:23 ] Completed Epoch: 13 batch 499: computing loss 136.919 ms, 26.79 s total
EPOCH: [13], BATCH: [499/889], loss: 0.355, loss_box_reg: 0.106, loss_classifier: 0.089, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 499
[ 2023-10-08 00:42:24 ] Completed saving temp checkpoint 1,162.383 ms, 27.95 s total
[ 2023-10-08 00:42:24 ] Completed replacing temp checkpoint with checkpoint 61.545 ms, 28.01 s total
[ 2023-10-08 00:42:24 ] Completed Epoch: 13 batch 500: moving batch data to device 6.397 ms, 28.02 s total
[ 2023-10-08 00:42:24 ] Completed Epoch: 13 batch 500: forward pass 108.993 ms, 28.13 s total
[ 2023-10-08 00:42:25 ] Completed Epoch: 13 batch 500: backward pass 47.472 ms, 28.18 s total
[ 2023-10-08 00:42:25 ] Completed Epoch: 13 batch 500: computing loss 140.351 ms, 28.32 s total
EPOCH: [13], BATCH: [500/889], loss: 0.378, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.019
Saving checkpoint at epoch 13 train batch 500
[ 2023-10-08 00:42:26 ] Completed saving temp checkpoint 1,277.705 ms, 29.60 s total
[ 2023-10-08 00:42:26 ] Completed replacing temp checkpoint with checkpoint 85.256 ms, 29.68 s total
[ 2023-10-08 00:42:26 ] Completed Epoch: 13 batch 501: moving batch data to device 9.302 ms, 29.69 s total
[ 2023-10-08 00:42:26 ] Completed Epoch: 13 batch 501: forward pass 106.812 ms, 29.80 s total
[ 2023-10-08 00:42:26 ] Completed Epoch: 13 batch 501: backward pass 78.537 ms, 29.88 s total
[ 2023-10-08 00:42:26 ] Completed Epoch: 13 batch 501: computing loss 116.626 ms, 29.99 s total
EPOCH: [13], BATCH: [501/889], loss: 0.422, loss_box_reg: 0.130, loss_classifier: 0.111, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 501
[ 2023-10-08 00:42:28 ] Completed saving temp checkpoint 1,676.636 ms, 31.67 s total
[ 2023-10-08 00:42:28 ] Completed replacing temp checkpoint with checkpoint 118.064 ms, 31.79 s total
[ 2023-10-08 00:42:28 ] Completed Epoch: 13 batch 502: moving batch data to device 8.152 ms, 31.80 s total
[ 2023-10-08 00:42:28 ] Completed Epoch: 13 batch 502: forward pass 105.995 ms, 31.90 s total
[ 2023-10-08 00:42:28 ] Completed Epoch: 13 batch 502: backward pass 78.327 ms, 31.98 s total
[ 2023-10-08 00:42:28 ] Completed Epoch: 13 batch 502: computing loss 118.244 ms, 32.10 s total
EPOCH: [13], BATCH: [502/889], loss: 0.396, loss_box_reg: 0.115, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 502
[ 2023-10-08 00:42:30 ] Completed saving temp checkpoint 1,592.308 ms, 33.69 s total
[ 2023-10-08 00:42:30 ] Completed replacing temp checkpoint with checkpoint 99.895 ms, 33.79 s total
[ 2023-10-08 00:42:30 ] Completed Epoch: 13 batch 503: moving batch data to device 8.597 ms, 33.80 s total
[ 2023-10-08 00:42:30 ] Completed Epoch: 13 batch 503: forward pass 103.171 ms, 33.90 s total
[ 2023-10-08 00:42:30 ] Completed Epoch: 13 batch 503: backward pass 73.960 ms, 33.98 s total
[ 2023-10-08 00:42:30 ] Completed Epoch: 13 batch 503: computing loss 113.244 ms, 34.09 s total
EPOCH: [13], BATCH: [503/889], loss: 0.372, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 503
[ 2023-10-08 00:42:32 ] Completed saving temp checkpoint 1,233.337 ms, 35.32 s total
[ 2023-10-08 00:42:32 ] Completed replacing temp checkpoint with checkpoint 81.680 ms, 35.40 s total
[ 2023-10-08 00:42:32 ] Completed Epoch: 13 batch 504: moving batch data to device 7.782 ms, 35.41 s total
[ 2023-10-08 00:42:32 ] Completed Epoch: 13 batch 504: forward pass 103.011 ms, 35.51 s total
[ 2023-10-08 00:42:32 ] Completed Epoch: 13 batch 504: backward pass 30.512 ms, 35.55 s total
[ 2023-10-08 00:42:32 ] Completed Epoch: 13 batch 504: computing loss 140.288 ms, 35.69 s total
EPOCH: [13], BATCH: [504/889], loss: 0.400, loss_box_reg: 0.125, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 504
[ 2023-10-08 00:42:33 ] Completed saving temp checkpoint 1,325.884 ms, 37.01 s total
[ 2023-10-08 00:42:33 ] Completed replacing temp checkpoint with checkpoint 71.478 ms, 37.08 s total
[ 2023-10-08 00:42:33 ] Completed Epoch: 13 batch 505: moving batch data to device 7.846 ms, 37.09 s total
[ 2023-10-08 00:42:34 ] Completed Epoch: 13 batch 505: forward pass 103.516 ms, 37.19 s total
[ 2023-10-08 00:42:34 ] Completed Epoch: 13 batch 505: backward pass 75.698 ms, 37.27 s total
[ 2023-10-08 00:42:34 ] Completed Epoch: 13 batch 505: computing loss 122.202 ms, 37.39 s total
EPOCH: [13], BATCH: [505/889], loss: 0.413, loss_box_reg: 0.125, loss_classifier: 0.109, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 505
[ 2023-10-08 00:42:35 ] Completed saving temp checkpoint 1,227.207 ms, 38.62 s total
[ 2023-10-08 00:42:35 ] Completed replacing temp checkpoint with checkpoint 83.592 ms, 38.70 s total
[ 2023-10-08 00:42:35 ] Completed Epoch: 13 batch 506: moving batch data to device 6.786 ms, 38.71 s total
[ 2023-10-08 00:42:35 ] Completed Epoch: 13 batch 506: forward pass 109.051 ms, 38.82 s total
[ 2023-10-08 00:42:35 ] Completed Epoch: 13 batch 506: backward pass 46.630 ms, 38.87 s total
[ 2023-10-08 00:42:35 ] Completed Epoch: 13 batch 506: computing loss 153.740 ms, 39.02 s total
EPOCH: [13], BATCH: [506/889], loss: 0.377, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 506
[ 2023-10-08 00:42:37 ] Completed saving temp checkpoint 1,482.970 ms, 40.50 s total
[ 2023-10-08 00:42:37 ] Completed replacing temp checkpoint with checkpoint 35.826 ms, 40.54 s total
[ 2023-10-08 00:42:37 ] Completed Epoch: 13 batch 507: moving batch data to device 4.616 ms, 40.54 s total
[ 2023-10-08 00:42:37 ] Completed Epoch: 13 batch 507: forward pass 103.299 ms, 40.65 s total
[ 2023-10-08 00:42:37 ] Completed Epoch: 13 batch 507: backward pass 81.398 ms, 40.73 s total
[ 2023-10-08 00:42:37 ] Completed Epoch: 13 batch 507: computing loss 113.998 ms, 40.84 s total
EPOCH: [13], BATCH: [507/889], loss: 0.439, loss_box_reg: 0.139, loss_classifier: 0.114, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 507
[ 2023-10-08 00:42:38 ] Completed saving temp checkpoint 1,058.163 ms, 41.90 s total
[ 2023-10-08 00:42:38 ] Completed replacing temp checkpoint with checkpoint 58.492 ms, 41.96 s total
[ 2023-10-08 00:42:38 ] Completed Epoch: 13 batch 508: moving batch data to device 6.707 ms, 41.96 s total
[ 2023-10-08 00:42:38 ] Completed Epoch: 13 batch 508: forward pass 106.511 ms, 42.07 s total
[ 2023-10-08 00:42:38 ] Completed Epoch: 13 batch 508: backward pass 67.376 ms, 42.14 s total
[ 2023-10-08 00:42:39 ] Completed Epoch: 13 batch 508: computing loss 125.996 ms, 42.26 s total
EPOCH: [13], BATCH: [508/889], loss: 0.419, loss_box_reg: 0.125, loss_classifier: 0.104, loss_mask: 0.140, loss_objectness: 0.017, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 508
[ 2023-10-08 00:42:40 ] Completed saving temp checkpoint 1,225.608 ms, 43.49 s total
[ 2023-10-08 00:42:40 ] Completed replacing temp checkpoint with checkpoint 82.058 ms, 43.57 s total
[ 2023-10-08 00:42:40 ] Completed Epoch: 13 batch 509: moving batch data to device 10.219 ms, 43.58 s total
[ 2023-10-08 00:42:40 ] Completed Epoch: 13 batch 509: forward pass 109.128 ms, 43.69 s total
[ 2023-10-08 00:42:40 ] Completed Epoch: 13 batch 509: backward pass 72.945 ms, 43.76 s total
[ 2023-10-08 00:42:40 ] Completed Epoch: 13 batch 509: computing loss 127.390 ms, 43.89 s total
EPOCH: [13], BATCH: [509/889], loss: 0.374, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 509
[ 2023-10-08 00:42:41 ] Completed saving temp checkpoint 1,115.261 ms, 45.01 s total
[ 2023-10-08 00:42:41 ] Completed replacing temp checkpoint with checkpoint 63.046 ms, 45.07 s total
[ 2023-10-08 00:42:41 ] Completed Epoch: 13 batch 510: moving batch data to device 6.402 ms, 45.08 s total
[ 2023-10-08 00:42:42 ] Completed Epoch: 13 batch 510: forward pass 101.612 ms, 45.18 s total
[ 2023-10-08 00:42:42 ] Completed Epoch: 13 batch 510: backward pass 52.817 ms, 45.23 s total
[ 2023-10-08 00:42:42 ] Completed Epoch: 13 batch 510: computing loss 136.727 ms, 45.37 s total
EPOCH: [13], BATCH: [510/889], loss: 0.388, loss_box_reg: 0.115, loss_classifier: 0.103, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 510
[ 2023-10-08 00:42:43 ] Completed saving temp checkpoint 1,195.784 ms, 46.56 s total
[ 2023-10-08 00:42:43 ] Completed replacing temp checkpoint with checkpoint 85.613 ms, 46.65 s total
[ 2023-10-08 00:42:43 ] Completed Epoch: 13 batch 511: moving batch data to device 8.121 ms, 46.66 s total
[ 2023-10-08 00:42:43 ] Completed Epoch: 13 batch 511: forward pass 104.055 ms, 46.76 s total
[ 2023-10-08 00:42:43 ] Completed Epoch: 13 batch 511: backward pass 47.682 ms, 46.81 s total
[ 2023-10-08 00:42:43 ] Completed Epoch: 13 batch 511: computing loss 147.441 ms, 46.96 s total
EPOCH: [13], BATCH: [511/889], loss: 0.377, loss_box_reg: 0.112, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 511
[ 2023-10-08 00:42:44 ] Completed saving temp checkpoint 1,083.064 ms, 48.04 s total
[ 2023-10-08 00:42:44 ] Completed replacing temp checkpoint with checkpoint 65.687 ms, 48.11 s total
[ 2023-10-08 00:42:44 ] Completed Epoch: 13 batch 512: moving batch data to device 7.559 ms, 48.11 s total
[ 2023-10-08 00:42:45 ] Completed Epoch: 13 batch 512: forward pass 105.019 ms, 48.22 s total
[ 2023-10-08 00:42:45 ] Completed Epoch: 13 batch 512: backward pass 39.690 ms, 48.26 s total
[ 2023-10-08 00:42:45 ] Completed Epoch: 13 batch 512: computing loss 153.467 ms, 48.41 s total
EPOCH: [13], BATCH: [512/889], loss: 0.396, loss_box_reg: 0.114, loss_classifier: 0.096, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 512
[ 2023-10-08 00:42:46 ] Completed saving temp checkpoint 1,222.682 ms, 49.63 s total
[ 2023-10-08 00:42:46 ] Completed replacing temp checkpoint with checkpoint 82.054 ms, 49.72 s total
[ 2023-10-08 00:42:46 ] Completed Epoch: 13 batch 513: moving batch data to device 8.454 ms, 49.72 s total
[ 2023-10-08 00:42:46 ] Completed Epoch: 13 batch 513: forward pass 108.324 ms, 49.83 s total
[ 2023-10-08 00:42:46 ] Completed Epoch: 13 batch 513: backward pass 77.072 ms, 49.91 s total
[ 2023-10-08 00:42:46 ] Completed Epoch: 13 batch 513: computing loss 92.908 ms, 50.00 s total
EPOCH: [13], BATCH: [513/889], loss: 0.401, loss_box_reg: 0.123, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 513
[ 2023-10-08 00:42:48 ] Completed saving temp checkpoint 1,316.103 ms, 51.32 s total
[ 2023-10-08 00:42:48 ] Completed replacing temp checkpoint with checkpoint 50.884 ms, 51.37 s total
[ 2023-10-08 00:42:48 ] Completed Epoch: 13 batch 514: moving batch data to device 8.273 ms, 51.38 s total
[ 2023-10-08 00:42:48 ] Completed Epoch: 13 batch 514: forward pass 106.417 ms, 51.48 s total
[ 2023-10-08 00:42:48 ] Completed Epoch: 13 batch 514: backward pass 63.146 ms, 51.55 s total
[ 2023-10-08 00:42:48 ] Completed Epoch: 13 batch 514: computing loss 130.583 ms, 51.68 s total
EPOCH: [13], BATCH: [514/889], loss: 0.398, loss_box_reg: 0.123, loss_classifier: 0.109, loss_mask: 0.126, loss_objectness: 0.017, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 514
[ 2023-10-08 00:42:50 ] Completed saving temp checkpoint 1,904.603 ms, 53.58 s total
[ 2023-10-08 00:42:50 ] Completed replacing temp checkpoint with checkpoint 98.843 ms, 53.68 s total
[ 2023-10-08 00:42:50 ] Completed Epoch: 13 batch 515: moving batch data to device 7.151 ms, 53.69 s total
[ 2023-10-08 00:42:50 ] Completed Epoch: 13 batch 515: forward pass 108.963 ms, 53.80 s total
[ 2023-10-08 00:42:50 ] Completed Epoch: 13 batch 515: backward pass 74.205 ms, 53.87 s total
[ 2023-10-08 00:42:50 ] Completed Epoch: 13 batch 515: computing loss 111.873 ms, 53.98 s total
EPOCH: [13], BATCH: [515/889], loss: 0.365, loss_box_reg: 0.109, loss_classifier: 0.089, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 515
[ 2023-10-08 00:42:52 ] Completed saving temp checkpoint 1,215.095 ms, 55.20 s total
[ 2023-10-08 00:42:52 ] Completed replacing temp checkpoint with checkpoint 54.084 ms, 55.25 s total
[ 2023-10-08 00:42:52 ] Completed Epoch: 13 batch 516: moving batch data to device 4.724 ms, 55.26 s total
[ 2023-10-08 00:42:52 ] Completed Epoch: 13 batch 516: forward pass 101.875 ms, 55.36 s total
[ 2023-10-08 00:42:52 ] Completed Epoch: 13 batch 516: backward pass 48.223 ms, 55.41 s total
[ 2023-10-08 00:42:52 ] Completed Epoch: 13 batch 516: computing loss 136.150 ms, 55.54 s total
EPOCH: [13], BATCH: [516/889], loss: 0.346, loss_box_reg: 0.102, loss_classifier: 0.083, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 516
[ 2023-10-08 00:42:53 ] Completed saving temp checkpoint 1,364.489 ms, 56.91 s total
[ 2023-10-08 00:42:53 ] Completed replacing temp checkpoint with checkpoint 63.930 ms, 56.97 s total
[ 2023-10-08 00:42:53 ] Completed Epoch: 13 batch 517: moving batch data to device 10.240 ms, 56.98 s total
[ 2023-10-08 00:42:53 ] Completed Epoch: 13 batch 517: forward pass 107.529 ms, 57.09 s total
[ 2023-10-08 00:42:54 ] Completed Epoch: 13 batch 517: backward pass 75.998 ms, 57.17 s total
[ 2023-10-08 00:42:54 ] Completed Epoch: 13 batch 517: computing loss 118.099 ms, 57.28 s total
EPOCH: [13], BATCH: [517/889], loss: 0.380, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 517
[ 2023-10-08 00:42:55 ] Completed saving temp checkpoint 1,031.929 ms, 58.32 s total
[ 2023-10-08 00:42:55 ] Completed replacing temp checkpoint with checkpoint 51.735 ms, 58.37 s total
[ 2023-10-08 00:42:55 ] Completed Epoch: 13 batch 518: moving batch data to device 4.605 ms, 58.37 s total
[ 2023-10-08 00:42:55 ] Completed Epoch: 13 batch 518: forward pass 101.053 ms, 58.47 s total
[ 2023-10-08 00:42:55 ] Completed Epoch: 13 batch 518: backward pass 35.259 ms, 58.51 s total
[ 2023-10-08 00:42:55 ] Completed Epoch: 13 batch 518: computing loss 165.083 ms, 58.67 s total
EPOCH: [13], BATCH: [518/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 518
[ 2023-10-08 00:42:57 ] Completed saving temp checkpoint 1,608.112 ms, 60.28 s total
[ 2023-10-08 00:42:57 ] Completed replacing temp checkpoint with checkpoint 62.119 ms, 60.34 s total
[ 2023-10-08 00:42:57 ] Completed Epoch: 13 batch 519: moving batch data to device 8.083 ms, 60.35 s total
[ 2023-10-08 00:42:57 ] Completed Epoch: 13 batch 519: forward pass 106.191 ms, 60.46 s total
[ 2023-10-08 00:42:57 ] Completed Epoch: 13 batch 519: backward pass 36.280 ms, 60.49 s total
[ 2023-10-08 00:42:57 ] Completed Epoch: 13 batch 519: computing loss 160.835 ms, 60.66 s total
EPOCH: [13], BATCH: [519/889], loss: 0.381, loss_box_reg: 0.111, loss_classifier: 0.094, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 519
[ 2023-10-08 00:42:58 ] Completed saving temp checkpoint 970.089 ms, 61.63 s total
[ 2023-10-08 00:42:58 ] Completed replacing temp checkpoint with checkpoint 47.517 ms, 61.67 s total
[ 2023-10-08 00:42:58 ] Completed Epoch: 13 batch 520: moving batch data to device 4.525 ms, 61.68 s total
[ 2023-10-08 00:42:58 ] Completed Epoch: 13 batch 520: forward pass 101.770 ms, 61.78 s total
[ 2023-10-08 00:42:58 ] Completed Epoch: 13 batch 520: backward pass 55.008 ms, 61.83 s total
[ 2023-10-08 00:42:58 ] Completed Epoch: 13 batch 520: computing loss 130.433 ms, 61.96 s total
EPOCH: [13], BATCH: [520/889], loss: 0.367, loss_box_reg: 0.110, loss_classifier: 0.092, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 520
[ 2023-10-08 00:42:59 ] Completed saving temp checkpoint 1,147.475 ms, 63.11 s total
[ 2023-10-08 00:43:00 ] Completed replacing temp checkpoint with checkpoint 69.396 ms, 63.18 s total
[ 2023-10-08 00:43:00 ] Completed Epoch: 13 batch 521: moving batch data to device 8.175 ms, 63.19 s total
[ 2023-10-08 00:43:00 ] Completed Epoch: 13 batch 521: forward pass 107.647 ms, 63.30 s total
[ 2023-10-08 00:43:00 ] Completed Epoch: 13 batch 521: backward pass 45.757 ms, 63.34 s total
[ 2023-10-08 00:43:00 ] Completed Epoch: 13 batch 521: computing loss 123.815 ms, 63.47 s total
EPOCH: [13], BATCH: [521/889], loss: 0.362, loss_box_reg: 0.109, loss_classifier: 0.086, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 521
[ 2023-10-08 00:43:01 ] Completed saving temp checkpoint 1,045.285 ms, 64.51 s total
[ 2023-10-08 00:43:01 ] Completed replacing temp checkpoint with checkpoint 72.868 ms, 64.59 s total
[ 2023-10-08 00:43:01 ] Completed Epoch: 13 batch 522: moving batch data to device 6.849 ms, 64.59 s total
[ 2023-10-08 00:43:01 ] Completed Epoch: 13 batch 522: forward pass 106.163 ms, 64.70 s total
[ 2023-10-08 00:43:01 ] Completed Epoch: 13 batch 522: backward pass 73.907 ms, 64.77 s total
[ 2023-10-08 00:43:01 ] Completed Epoch: 13 batch 522: computing loss 94.983 ms, 64.87 s total
EPOCH: [13], BATCH: [522/889], loss: 0.415, loss_box_reg: 0.127, loss_classifier: 0.101, loss_mask: 0.142, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 522
[ 2023-10-08 00:43:02 ] Completed saving temp checkpoint 1,176.786 ms, 66.04 s total
[ 2023-10-08 00:43:02 ] Completed replacing temp checkpoint with checkpoint 69.862 ms, 66.11 s total
[ 2023-10-08 00:43:02 ] Completed Epoch: 13 batch 523: moving batch data to device 7.914 ms, 66.12 s total
[ 2023-10-08 00:43:03 ] Completed Epoch: 13 batch 523: forward pass 102.742 ms, 66.22 s total
[ 2023-10-08 00:43:03 ] Completed Epoch: 13 batch 523: backward pass 42.184 ms, 66.27 s total
[ 2023-10-08 00:43:03 ] Completed Epoch: 13 batch 523: computing loss 156.693 ms, 66.42 s total
EPOCH: [13], BATCH: [523/889], loss: 0.386, loss_box_reg: 0.117, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 523
[ 2023-10-08 00:43:04 ] Completed saving temp checkpoint 1,071.206 ms, 67.49 s total
[ 2023-10-08 00:43:04 ] Completed replacing temp checkpoint with checkpoint 75.062 ms, 67.57 s total
[ 2023-10-08 00:43:04 ] Completed Epoch: 13 batch 524: moving batch data to device 8.157 ms, 67.58 s total
[ 2023-10-08 00:43:04 ] Completed Epoch: 13 batch 524: forward pass 104.333 ms, 67.68 s total
[ 2023-10-08 00:43:04 ] Completed Epoch: 13 batch 524: backward pass 38.717 ms, 67.72 s total
[ 2023-10-08 00:43:04 ] Completed Epoch: 13 batch 524: computing loss 145.239 ms, 67.87 s total
EPOCH: [13], BATCH: [524/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 524
[ 2023-10-08 00:43:05 ] Completed saving temp checkpoint 1,186.759 ms, 69.05 s total
[ 2023-10-08 00:43:05 ] Completed replacing temp checkpoint with checkpoint 48.991 ms, 69.10 s total
[ 2023-10-08 00:43:05 ] Completed Epoch: 13 batch 525: moving batch data to device 5.626 ms, 69.11 s total
[ 2023-10-08 00:43:06 ] Completed Epoch: 13 batch 525: forward pass 104.628 ms, 69.21 s total
[ 2023-10-08 00:43:06 ] Completed Epoch: 13 batch 525: backward pass 80.315 ms, 69.29 s total
[ 2023-10-08 00:43:06 ] Completed Epoch: 13 batch 525: computing loss 92.816 ms, 69.38 s total
EPOCH: [13], BATCH: [525/889], loss: 0.381, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 525
[ 2023-10-08 00:43:07 ] Completed saving temp checkpoint 1,009.221 ms, 70.39 s total
[ 2023-10-08 00:43:07 ] Completed replacing temp checkpoint with checkpoint 80.929 ms, 70.48 s total
[ 2023-10-08 00:43:07 ] Completed Epoch: 13 batch 526: moving batch data to device 6.484 ms, 70.48 s total
[ 2023-10-08 00:43:07 ] Completed Epoch: 13 batch 526: forward pass 107.301 ms, 70.59 s total
[ 2023-10-08 00:43:07 ] Completed Epoch: 13 batch 526: backward pass 74.561 ms, 70.66 s total
[ 2023-10-08 00:43:07 ] Completed Epoch: 13 batch 526: computing loss 96.025 ms, 70.76 s total
EPOCH: [13], BATCH: [526/889], loss: 0.404, loss_box_reg: 0.126, loss_classifier: 0.102, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 526
[ 2023-10-08 00:43:09 ] Completed saving temp checkpoint 1,672.849 ms, 72.43 s total
[ 2023-10-08 00:43:09 ] Completed replacing temp checkpoint with checkpoint 93.058 ms, 72.53 s total
[ 2023-10-08 00:43:09 ] Completed Epoch: 13 batch 527: moving batch data to device 8.239 ms, 72.53 s total
[ 2023-10-08 00:43:09 ] Completed Epoch: 13 batch 527: forward pass 103.693 ms, 72.64 s total
[ 2023-10-08 00:43:09 ] Completed Epoch: 13 batch 527: backward pass 43.616 ms, 72.68 s total
[ 2023-10-08 00:43:09 ] Completed Epoch: 13 batch 527: computing loss 148.418 ms, 72.83 s total
EPOCH: [13], BATCH: [527/889], loss: 0.384, loss_box_reg: 0.116, loss_classifier: 0.094, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 527
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 00:56:24 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 00:56:24 ] Completed importing Timer 0.022 ms, 0.00 s total
[ 2023-10-08 00:56:25 ] Completed importing everything else 526.947 ms, 0.53 s total
[ 2023-10-08 00:56:25 ] Completed defined other functions 0.023 ms, 0.53 s total
| distributed init (rank 5): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 4): env://
| distributed init (rank 3): env://
| distributed init (rank 2): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 00:56:32 ] Completed main preliminaries 7,586.647 ms, 8.11 s total
loading annotations into memory...
Done (t=10.50s)
creating index...
index created!
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
[ 2023-10-08 00:56:44 ] Completed loading data 12,201.541 ms, 20.32 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 00:56:44 ] Completed creating data samplers 90.325 ms, 20.41 s total
[ 2023-10-08 00:56:44 ] Completed creating data loaders 0.196 ms, 20.41 s total
[ 2023-10-08 00:56:45 ] Completed creating model and .to(device) 650.232 ms, 21.06 s total
[ 2023-10-08 00:56:47 ] Completed preparing model for distributed training 2,093.232 ms, 23.15 s total
[ 2023-10-08 00:56:47 ] Completed optimizer and scaler 0.610 ms, 23.15 s total
[ 2023-10-08 00:56:47 ] Completed learning rate schedulers 0.250 ms, 23.15 s total
[ 2023-10-08 00:56:48 ] Completed init coco evaluator 936.205 ms, 24.09 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 00:56:49 ] Completed retrieving checkpoint 850.377 ms, 24.94 s total
EPOCH :: 13
[ 2023-10-08 00:56:49 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 00:56:49 ] Completed training preliminaries 1.227 ms, 0.00 s total
Training / resuming epoch 13 from training step 527
[ 2023-10-08 00:56:49 ] Completed Epoch: 13 batch 527: moving batch data to device 520.832 ms, 0.52 s total
[ 2023-10-08 00:56:50 ] Completed Epoch: 13 batch 527: forward pass 915.714 ms, 1.44 s total
[ 2023-10-08 00:56:51 ] Completed Epoch: 13 batch 527: backward pass 178.502 ms, 1.62 s total
[ 2023-10-08 00:56:51 ] Completed Epoch: 13 batch 527: computing loss 533.326 ms, 2.15 s total
EPOCH: [13], BATCH: [527/889], loss: 0.385, loss_box_reg: 0.116, loss_classifier: 0.094, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 527
[ 2023-10-08 00:56:52 ] Completed saving temp checkpoint 888.979 ms, 3.04 s total
[ 2023-10-08 00:56:52 ] Completed replacing temp checkpoint with checkpoint 148.237 ms, 3.19 s total
[ 2023-10-08 00:56:52 ] Completed Epoch: 13 batch 528: moving batch data to device 4.580 ms, 3.19 s total
[ 2023-10-08 00:56:52 ] Completed Epoch: 13 batch 528: forward pass 111.189 ms, 3.30 s total
[ 2023-10-08 00:56:52 ] Completed Epoch: 13 batch 528: backward pass 121.660 ms, 3.42 s total
[ 2023-10-08 00:56:52 ] Completed Epoch: 13 batch 528: computing loss 102.731 ms, 3.53 s total
EPOCH: [13], BATCH: [528/889], loss: 0.373, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 528
[ 2023-10-08 00:56:54 ] Completed saving temp checkpoint 1,102.679 ms, 4.63 s total
[ 2023-10-08 00:56:54 ] Completed replacing temp checkpoint with checkpoint 58.591 ms, 4.69 s total
[ 2023-10-08 00:56:54 ] Completed Epoch: 13 batch 529: moving batch data to device 4.364 ms, 4.69 s total
[ 2023-10-08 00:56:54 ] Completed Epoch: 13 batch 529: forward pass 108.761 ms, 4.80 s total
[ 2023-10-08 00:56:54 ] Completed Epoch: 13 batch 529: backward pass 43.333 ms, 4.84 s total
[ 2023-10-08 00:56:54 ] Completed Epoch: 13 batch 529: computing loss 177.924 ms, 5.02 s total
EPOCH: [13], BATCH: [529/889], loss: 0.386, loss_box_reg: 0.118, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 529
[ 2023-10-08 00:56:55 ] Completed saving temp checkpoint 1,014.975 ms, 6.04 s total
[ 2023-10-08 00:56:55 ] Completed replacing temp checkpoint with checkpoint 62.629 ms, 6.10 s total
[ 2023-10-08 00:56:55 ] Completed Epoch: 13 batch 530: moving batch data to device 12.550 ms, 6.11 s total
[ 2023-10-08 00:56:55 ] Completed Epoch: 13 batch 530: forward pass 104.637 ms, 6.22 s total
[ 2023-10-08 00:56:55 ] Completed Epoch: 13 batch 530: backward pass 123.873 ms, 6.34 s total
[ 2023-10-08 00:56:55 ] Completed Epoch: 13 batch 530: computing loss 91.857 ms, 6.43 s total
EPOCH: [13], BATCH: [530/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 530
[ 2023-10-08 00:56:56 ] Completed saving temp checkpoint 1,071.772 ms, 7.50 s total
[ 2023-10-08 00:56:57 ] Completed replacing temp checkpoint with checkpoint 52.733 ms, 7.56 s total
[ 2023-10-08 00:56:57 ] Completed Epoch: 13 batch 531: moving batch data to device 3.571 ms, 7.56 s total
[ 2023-10-08 00:56:57 ] Completed Epoch: 13 batch 531: forward pass 108.553 ms, 7.67 s total
[ 2023-10-08 00:56:57 ] Completed Epoch: 13 batch 531: backward pass 111.824 ms, 7.78 s total
[ 2023-10-08 00:56:57 ] Completed Epoch: 13 batch 531: computing loss 98.785 ms, 7.88 s total
EPOCH: [13], BATCH: [531/889], loss: 0.371, loss_box_reg: 0.114, loss_classifier: 0.093, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 531
[ 2023-10-08 00:56:58 ] Completed saving temp checkpoint 872.243 ms, 8.75 s total
[ 2023-10-08 00:56:58 ] Completed replacing temp checkpoint with checkpoint 65.080 ms, 8.82 s total
[ 2023-10-08 00:56:58 ] Completed Epoch: 13 batch 532: moving batch data to device 6.035 ms, 8.82 s total
[ 2023-10-08 00:56:58 ] Completed Epoch: 13 batch 532: forward pass 108.768 ms, 8.93 s total
[ 2023-10-08 00:56:58 ] Completed Epoch: 13 batch 532: backward pass 78.861 ms, 9.01 s total
[ 2023-10-08 00:56:58 ] Completed Epoch: 13 batch 532: computing loss 125.088 ms, 9.14 s total
EPOCH: [13], BATCH: [532/889], loss: 0.384, loss_box_reg: 0.115, loss_classifier: 0.097, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 532
[ 2023-10-08 00:56:59 ] Completed saving temp checkpoint 991.307 ms, 10.13 s total
[ 2023-10-08 00:56:59 ] Completed replacing temp checkpoint with checkpoint 36.727 ms, 10.16 s total
[ 2023-10-08 00:56:59 ] Completed Epoch: 13 batch 533: moving batch data to device 2.470 ms, 10.17 s total
[ 2023-10-08 00:56:59 ] Completed Epoch: 13 batch 533: forward pass 104.850 ms, 10.27 s total
[ 2023-10-08 00:56:59 ] Completed Epoch: 13 batch 533: backward pass 73.387 ms, 10.35 s total
[ 2023-10-08 00:56:59 ] Completed Epoch: 13 batch 533: computing loss 142.174 ms, 10.49 s total
EPOCH: [13], BATCH: [533/889], loss: 0.387, loss_box_reg: 0.115, loss_classifier: 0.098, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 533
[ 2023-10-08 00:57:00 ] Completed saving temp checkpoint 765.547 ms, 11.25 s total
[ 2023-10-08 00:57:00 ] Completed replacing temp checkpoint with checkpoint 66.476 ms, 11.32 s total
[ 2023-10-08 00:57:00 ] Completed Epoch: 13 batch 534: moving batch data to device 11.938 ms, 11.33 s total
[ 2023-10-08 00:57:00 ] Completed Epoch: 13 batch 534: forward pass 182.389 ms, 11.51 s total
[ 2023-10-08 00:57:01 ] Completed Epoch: 13 batch 534: backward pass 79.228 ms, 11.59 s total
[ 2023-10-08 00:57:01 ] Completed Epoch: 13 batch 534: computing loss 132.497 ms, 11.73 s total
EPOCH: [13], BATCH: [534/889], loss: 0.429, loss_box_reg: 0.132, loss_classifier: 0.112, loss_mask: 0.135, loss_objectness: 0.016, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 534
[ 2023-10-08 00:57:02 ] Completed saving temp checkpoint 1,008.773 ms, 12.73 s total
[ 2023-10-08 00:57:02 ] Completed replacing temp checkpoint with checkpoint 70.881 ms, 12.81 s total
[ 2023-10-08 00:57:02 ] Completed Epoch: 13 batch 535: moving batch data to device 3.944 ms, 12.81 s total
[ 2023-10-08 00:57:02 ] Completed Epoch: 13 batch 535: forward pass 107.682 ms, 12.92 s total
[ 2023-10-08 00:57:02 ] Completed Epoch: 13 batch 535: backward pass 81.121 ms, 13.00 s total
[ 2023-10-08 00:57:02 ] Completed Epoch: 13 batch 535: computing loss 113.579 ms, 13.11 s total
EPOCH: [13], BATCH: [535/889], loss: 0.355, loss_box_reg: 0.106, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.013, loss_rpn_box_reg: 0.018
Saving checkpoint at epoch 13 train batch 535
[ 2023-10-08 00:57:03 ] Completed saving temp checkpoint 1,044.111 ms, 14.16 s total
[ 2023-10-08 00:57:03 ] Completed replacing temp checkpoint with checkpoint 39.285 ms, 14.19 s total
[ 2023-10-08 00:57:03 ] Completed Epoch: 13 batch 536: moving batch data to device 5.304 ms, 14.20 s total
[ 2023-10-08 00:57:03 ] Completed Epoch: 13 batch 536: forward pass 108.829 ms, 14.31 s total
[ 2023-10-08 00:57:03 ] Completed Epoch: 13 batch 536: backward pass 49.005 ms, 14.36 s total
[ 2023-10-08 00:57:03 ] Completed Epoch: 13 batch 536: computing loss 143.953 ms, 14.50 s total
EPOCH: [13], BATCH: [536/889], loss: 0.383, loss_box_reg: 0.117, loss_classifier: 0.098, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 536
[ 2023-10-08 00:57:05 ] Completed saving temp checkpoint 1,358.229 ms, 15.86 s total
[ 2023-10-08 00:57:05 ] Completed replacing temp checkpoint with checkpoint 90.491 ms, 15.95 s total
[ 2023-10-08 00:57:05 ] Completed Epoch: 13 batch 537: moving batch data to device 6.905 ms, 15.96 s total
[ 2023-10-08 00:57:05 ] Completed Epoch: 13 batch 537: forward pass 107.992 ms, 16.07 s total
[ 2023-10-08 00:57:05 ] Completed Epoch: 13 batch 537: backward pass 77.719 ms, 16.14 s total
[ 2023-10-08 00:57:05 ] Completed Epoch: 13 batch 537: computing loss 115.172 ms, 16.26 s total
EPOCH: [13], BATCH: [537/889], loss: 0.357, loss_box_reg: 0.106, loss_classifier: 0.083, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 537
[ 2023-10-08 00:57:07 ] Completed saving temp checkpoint 1,461.423 ms, 17.72 s total
[ 2023-10-08 00:57:07 ] Completed replacing temp checkpoint with checkpoint 87.624 ms, 17.81 s total
[ 2023-10-08 00:57:07 ] Completed Epoch: 13 batch 538: moving batch data to device 6.634 ms, 17.81 s total
[ 2023-10-08 00:57:07 ] Completed Epoch: 13 batch 538: forward pass 109.444 ms, 17.92 s total
[ 2023-10-08 00:57:07 ] Completed Epoch: 13 batch 538: backward pass 47.971 ms, 17.97 s total
[ 2023-10-08 00:57:07 ] Completed Epoch: 13 batch 538: computing loss 150.146 ms, 18.12 s total
EPOCH: [13], BATCH: [538/889], loss: 0.375, loss_box_reg: 0.113, loss_classifier: 0.090, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 538
[ 2023-10-08 00:57:09 ] Completed saving temp checkpoint 1,717.518 ms, 19.84 s total
[ 2023-10-08 00:57:09 ] Completed replacing temp checkpoint with checkpoint 73.948 ms, 19.91 s total
[ 2023-10-08 00:57:09 ] Completed Epoch: 13 batch 539: moving batch data to device 8.644 ms, 19.92 s total
[ 2023-10-08 00:57:09 ] Completed Epoch: 13 batch 539: forward pass 107.199 ms, 20.03 s total
[ 2023-10-08 00:57:09 ] Completed Epoch: 13 batch 539: backward pass 40.085 ms, 20.07 s total
[ 2023-10-08 00:57:09 ] Completed Epoch: 13 batch 539: computing loss 145.964 ms, 20.22 s total
EPOCH: [13], BATCH: [539/889], loss: 0.385, loss_box_reg: 0.117, loss_classifier: 0.096, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 539
[ 2023-10-08 00:57:10 ] Completed saving temp checkpoint 1,021.601 ms, 21.24 s total
[ 2023-10-08 00:57:10 ] Completed replacing temp checkpoint with checkpoint 61.268 ms, 21.30 s total
[ 2023-10-08 00:57:10 ] Completed Epoch: 13 batch 540: moving batch data to device 6.949 ms, 21.30 s total
[ 2023-10-08 00:57:10 ] Completed Epoch: 13 batch 540: forward pass 104.998 ms, 21.41 s total
[ 2023-10-08 00:57:10 ] Completed Epoch: 13 batch 540: backward pass 46.221 ms, 21.46 s total
[ 2023-10-08 00:57:11 ] Completed Epoch: 13 batch 540: computing loss 140.599 ms, 21.60 s total
EPOCH: [13], BATCH: [540/889], loss: 0.377, loss_box_reg: 0.119, loss_classifier: 0.098, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.019
Saving checkpoint at epoch 13 train batch 540
[ 2023-10-08 00:57:12 ] Completed saving temp checkpoint 1,501.016 ms, 23.10 s total
[ 2023-10-08 00:57:12 ] Completed replacing temp checkpoint with checkpoint 47.843 ms, 23.15 s total
[ 2023-10-08 00:57:12 ] Completed Epoch: 13 batch 541: moving batch data to device 4.997 ms, 23.15 s total
[ 2023-10-08 00:57:12 ] Completed Epoch: 13 batch 541: forward pass 103.622 ms, 23.25 s total
[ 2023-10-08 00:57:12 ] Completed Epoch: 13 batch 541: backward pass 73.379 ms, 23.33 s total
[ 2023-10-08 00:57:12 ] Completed Epoch: 13 batch 541: computing loss 117.636 ms, 23.45 s total
EPOCH: [13], BATCH: [541/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.123, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 541
[ 2023-10-08 00:57:14 ] Completed saving temp checkpoint 1,564.629 ms, 25.01 s total
[ 2023-10-08 00:57:14 ] Completed replacing temp checkpoint with checkpoint 82.025 ms, 25.09 s total
[ 2023-10-08 00:57:14 ] Completed Epoch: 13 batch 542: moving batch data to device 7.626 ms, 25.10 s total
[ 2023-10-08 00:57:14 ] Completed Epoch: 13 batch 542: forward pass 104.113 ms, 25.20 s total
[ 2023-10-08 00:57:14 ] Completed Epoch: 13 batch 542: backward pass 53.004 ms, 25.26 s total
[ 2023-10-08 00:57:14 ] Completed Epoch: 13 batch 542: computing loss 134.150 ms, 25.39 s total
EPOCH: [13], BATCH: [542/889], loss: 0.381, loss_box_reg: 0.115, loss_classifier: 0.089, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 542
[ 2023-10-08 00:57:16 ] Completed saving temp checkpoint 1,387.018 ms, 26.78 s total
[ 2023-10-08 00:57:16 ] Completed replacing temp checkpoint with checkpoint 86.421 ms, 26.86 s total
[ 2023-10-08 00:57:16 ] Completed Epoch: 13 batch 543: moving batch data to device 7.004 ms, 26.87 s total
[ 2023-10-08 00:57:16 ] Completed Epoch: 13 batch 543: forward pass 108.692 ms, 26.98 s total
[ 2023-10-08 00:57:16 ] Completed Epoch: 13 batch 543: backward pass 74.890 ms, 27.05 s total
[ 2023-10-08 00:57:16 ] Completed Epoch: 13 batch 543: computing loss 109.897 ms, 27.16 s total
EPOCH: [13], BATCH: [543/889], loss: 0.391, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 543
[ 2023-10-08 00:57:17 ] Completed saving temp checkpoint 1,010.338 ms, 28.17 s total
[ 2023-10-08 00:57:17 ] Completed replacing temp checkpoint with checkpoint 53.616 ms, 28.23 s total
[ 2023-10-08 00:57:17 ] Completed Epoch: 13 batch 544: moving batch data to device 8.224 ms, 28.24 s total
[ 2023-10-08 00:57:17 ] Completed Epoch: 13 batch 544: forward pass 102.136 ms, 28.34 s total
[ 2023-10-08 00:57:17 ] Completed Epoch: 13 batch 544: backward pass 46.314 ms, 28.39 s total
[ 2023-10-08 00:57:17 ] Completed Epoch: 13 batch 544: computing loss 145.723 ms, 28.53 s total
EPOCH: [13], BATCH: [544/889], loss: 0.392, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 544
[ 2023-10-08 00:57:19 ] Completed saving temp checkpoint 1,207.610 ms, 29.74 s total
[ 2023-10-08 00:57:19 ] Completed replacing temp checkpoint with checkpoint 79.166 ms, 29.82 s total
[ 2023-10-08 00:57:19 ] Completed Epoch: 13 batch 545: moving batch data to device 5.891 ms, 29.82 s total
[ 2023-10-08 00:57:19 ] Completed Epoch: 13 batch 545: forward pass 104.219 ms, 29.93 s total
[ 2023-10-08 00:57:19 ] Completed Epoch: 13 batch 545: backward pass 79.617 ms, 30.01 s total
[ 2023-10-08 00:57:19 ] Completed Epoch: 13 batch 545: computing loss 120.084 ms, 30.13 s total
EPOCH: [13], BATCH: [545/889], loss: 0.406, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.138, loss_objectness: 0.019, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 545
[ 2023-10-08 00:57:20 ] Completed saving temp checkpoint 957.788 ms, 31.09 s total
[ 2023-10-08 00:57:20 ] Completed replacing temp checkpoint with checkpoint 48.076 ms, 31.13 s total
[ 2023-10-08 00:57:20 ] Completed Epoch: 13 batch 546: moving batch data to device 6.862 ms, 31.14 s total
[ 2023-10-08 00:57:20 ] Completed Epoch: 13 batch 546: forward pass 161.391 ms, 31.30 s total
[ 2023-10-08 00:57:20 ] Completed Epoch: 13 batch 546: backward pass 34.958 ms, 31.34 s total
[ 2023-10-08 00:57:20 ] Completed Epoch: 13 batch 546: computing loss 149.884 ms, 31.49 s total
EPOCH: [13], BATCH: [546/889], loss: 0.390, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 546
[ 2023-10-08 00:57:22 ] Completed saving temp checkpoint 1,060.030 ms, 32.55 s total
[ 2023-10-08 00:57:22 ] Completed replacing temp checkpoint with checkpoint 71.367 ms, 32.62 s total
[ 2023-10-08 00:57:22 ] Completed Epoch: 13 batch 547: moving batch data to device 8.914 ms, 32.63 s total
[ 2023-10-08 00:57:22 ] Completed Epoch: 13 batch 547: forward pass 105.323 ms, 32.73 s total
[ 2023-10-08 00:57:22 ] Completed Epoch: 13 batch 547: backward pass 78.143 ms, 32.81 s total
[ 2023-10-08 00:57:22 ] Completed Epoch: 13 batch 547: computing loss 111.627 ms, 32.92 s total
EPOCH: [13], BATCH: [547/889], loss: 0.413, loss_box_reg: 0.129, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 547
[ 2023-10-08 00:57:23 ] Completed saving temp checkpoint 961.736 ms, 33.88 s total
[ 2023-10-08 00:57:23 ] Completed replacing temp checkpoint with checkpoint 68.827 ms, 33.95 s total
[ 2023-10-08 00:57:23 ] Completed Epoch: 13 batch 548: moving batch data to device 8.212 ms, 33.96 s total
[ 2023-10-08 00:57:23 ] Completed Epoch: 13 batch 548: forward pass 107.932 ms, 34.07 s total
[ 2023-10-08 00:57:23 ] Completed Epoch: 13 batch 548: backward pass 73.142 ms, 34.14 s total
[ 2023-10-08 00:57:23 ] Completed Epoch: 13 batch 548: computing loss 97.996 ms, 34.24 s total
EPOCH: [13], BATCH: [548/889], loss: 0.391, loss_box_reg: 0.124, loss_classifier: 0.094, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 548
[ 2023-10-08 00:57:24 ] Completed saving temp checkpoint 1,197.097 ms, 35.44 s total
[ 2023-10-08 00:57:24 ] Completed replacing temp checkpoint with checkpoint 70.085 ms, 35.51 s total
[ 2023-10-08 00:57:24 ] Completed Epoch: 13 batch 549: moving batch data to device 7.266 ms, 35.51 s total
[ 2023-10-08 00:57:25 ] Completed Epoch: 13 batch 549: forward pass 113.086 ms, 35.63 s total
[ 2023-10-08 00:57:25 ] Completed Epoch: 13 batch 549: backward pass 79.516 ms, 35.71 s total
[ 2023-10-08 00:57:25 ] Completed Epoch: 13 batch 549: computing loss 118.107 ms, 35.82 s total
EPOCH: [13], BATCH: [549/889], loss: 0.404, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 549
[ 2023-10-08 00:57:26 ] Completed saving temp checkpoint 1,013.694 ms, 36.84 s total
[ 2023-10-08 00:57:26 ] Completed replacing temp checkpoint with checkpoint 71.777 ms, 36.91 s total
[ 2023-10-08 00:57:26 ] Completed Epoch: 13 batch 550: moving batch data to device 8.742 ms, 36.92 s total
[ 2023-10-08 00:57:26 ] Completed Epoch: 13 batch 550: forward pass 101.601 ms, 37.02 s total
[ 2023-10-08 00:57:26 ] Completed Epoch: 13 batch 550: backward pass 46.305 ms, 37.07 s total
[ 2023-10-08 00:57:26 ] Completed Epoch: 13 batch 550: computing loss 122.384 ms, 37.19 s total
EPOCH: [13], BATCH: [550/889], loss: 0.363, loss_box_reg: 0.109, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 550
[ 2023-10-08 00:57:27 ] Completed saving temp checkpoint 1,150.058 ms, 38.34 s total
[ 2023-10-08 00:57:27 ] Completed replacing temp checkpoint with checkpoint 65.687 ms, 38.41 s total
[ 2023-10-08 00:57:27 ] Completed Epoch: 13 batch 551: moving batch data to device 6.235 ms, 38.41 s total
[ 2023-10-08 00:57:27 ] Completed Epoch: 13 batch 551: forward pass 105.059 ms, 38.52 s total
[ 2023-10-08 00:57:28 ] Completed Epoch: 13 batch 551: backward pass 52.921 ms, 38.57 s total
[ 2023-10-08 00:57:28 ] Completed Epoch: 13 batch 551: computing loss 144.337 ms, 38.71 s total
EPOCH: [13], BATCH: [551/889], loss: 0.381, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 551
[ 2023-10-08 00:57:29 ] Completed saving temp checkpoint 1,007.947 ms, 39.72 s total
[ 2023-10-08 00:57:29 ] Completed replacing temp checkpoint with checkpoint 70.279 ms, 39.79 s total
[ 2023-10-08 00:57:29 ] Completed Epoch: 13 batch 552: moving batch data to device 7.367 ms, 39.80 s total
[ 2023-10-08 00:57:29 ] Completed Epoch: 13 batch 552: forward pass 103.372 ms, 39.90 s total
[ 2023-10-08 00:57:29 ] Completed Epoch: 13 batch 552: backward pass 76.847 ms, 39.98 s total
[ 2023-10-08 00:57:29 ] Completed Epoch: 13 batch 552: computing loss 93.294 ms, 40.07 s total
EPOCH: [13], BATCH: [552/889], loss: 0.402, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.036
Saving checkpoint at epoch 13 train batch 552
[ 2023-10-08 00:57:30 ] Completed saving temp checkpoint 1,431.680 ms, 41.50 s total
[ 2023-10-08 00:57:31 ] Completed replacing temp checkpoint with checkpoint 79.337 ms, 41.58 s total
[ 2023-10-08 00:57:31 ] Completed Epoch: 13 batch 553: moving batch data to device 8.929 ms, 41.59 s total
[ 2023-10-08 00:57:31 ] Completed Epoch: 13 batch 553: forward pass 105.606 ms, 41.70 s total
[ 2023-10-08 00:57:31 ] Completed Epoch: 13 batch 553: backward pass 53.821 ms, 41.75 s total
[ 2023-10-08 00:57:31 ] Completed Epoch: 13 batch 553: computing loss 145.301 ms, 41.90 s total
EPOCH: [13], BATCH: [553/889], loss: 0.372, loss_box_reg: 0.117, loss_classifier: 0.087, loss_mask: 0.128, loss_objectness: 0.012, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 553
[ 2023-10-08 00:57:32 ] Completed saving temp checkpoint 1,531.088 ms, 43.43 s total
[ 2023-10-08 00:57:32 ] Completed replacing temp checkpoint with checkpoint 74.524 ms, 43.50 s total
[ 2023-10-08 00:57:32 ] Completed Epoch: 13 batch 554: moving batch data to device 6.400 ms, 43.51 s total
[ 2023-10-08 00:57:33 ] Completed Epoch: 13 batch 554: forward pass 104.496 ms, 43.61 s total
[ 2023-10-08 00:57:33 ] Completed Epoch: 13 batch 554: backward pass 69.084 ms, 43.68 s total
[ 2023-10-08 00:57:33 ] Completed Epoch: 13 batch 554: computing loss 122.880 ms, 43.81 s total
EPOCH: [13], BATCH: [554/889], loss: 0.364, loss_box_reg: 0.109, loss_classifier: 0.088, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 554
[ 2023-10-08 00:57:34 ] Completed saving temp checkpoint 1,600.222 ms, 45.41 s total
[ 2023-10-08 00:57:34 ] Completed replacing temp checkpoint with checkpoint 75.219 ms, 45.48 s total
[ 2023-10-08 00:57:34 ] Completed Epoch: 13 batch 555: moving batch data to device 6.236 ms, 45.49 s total
[ 2023-10-08 00:57:35 ] Completed Epoch: 13 batch 555: forward pass 106.072 ms, 45.59 s total
[ 2023-10-08 00:57:35 ] Completed Epoch: 13 batch 555: backward pass 78.416 ms, 45.67 s total
[ 2023-10-08 00:57:35 ] Completed Epoch: 13 batch 555: computing loss 94.276 ms, 45.77 s total
EPOCH: [13], BATCH: [555/889], loss: 0.387, loss_box_reg: 0.116, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 555
[ 2023-10-08 00:57:36 ] Completed saving temp checkpoint 1,273.717 ms, 47.04 s total
[ 2023-10-08 00:57:36 ] Completed replacing temp checkpoint with checkpoint 54.922 ms, 47.10 s total
[ 2023-10-08 00:57:36 ] Completed Epoch: 13 batch 556: moving batch data to device 7.510 ms, 47.10 s total
[ 2023-10-08 00:57:36 ] Completed Epoch: 13 batch 556: forward pass 105.232 ms, 47.21 s total
[ 2023-10-08 00:57:36 ] Completed Epoch: 13 batch 556: backward pass 48.763 ms, 47.26 s total
[ 2023-10-08 00:57:36 ] Completed Epoch: 13 batch 556: computing loss 145.268 ms, 47.40 s total
EPOCH: [13], BATCH: [556/889], loss: 0.388, loss_box_reg: 0.119, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 556
[ 2023-10-08 00:57:38 ] Completed saving temp checkpoint 2,038.376 ms, 49.44 s total
[ 2023-10-08 00:57:38 ] Completed replacing temp checkpoint with checkpoint 76.533 ms, 49.52 s total
[ 2023-10-08 00:57:38 ] Completed Epoch: 13 batch 557: moving batch data to device 5.730 ms, 49.52 s total
[ 2023-10-08 00:57:39 ] Completed Epoch: 13 batch 557: forward pass 105.052 ms, 49.63 s total
[ 2023-10-08 00:57:39 ] Completed Epoch: 13 batch 557: backward pass 60.647 ms, 49.69 s total
[ 2023-10-08 00:57:39 ] Completed Epoch: 13 batch 557: computing loss 136.984 ms, 49.83 s total
EPOCH: [13], BATCH: [557/889], loss: 0.403, loss_box_reg: 0.126, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 557
[ 2023-10-08 00:57:40 ] Completed saving temp checkpoint 1,156.396 ms, 50.98 s total
[ 2023-10-08 00:57:40 ] Completed replacing temp checkpoint with checkpoint 59.172 ms, 51.04 s total
[ 2023-10-08 00:57:40 ] Completed Epoch: 13 batch 558: moving batch data to device 8.366 ms, 51.05 s total
[ 2023-10-08 00:57:40 ] Completed Epoch: 13 batch 558: forward pass 104.926 ms, 51.15 s total
[ 2023-10-08 00:57:40 ] Completed Epoch: 13 batch 558: backward pass 73.769 ms, 51.23 s total
[ 2023-10-08 00:57:40 ] Completed Epoch: 13 batch 558: computing loss 116.492 ms, 51.34 s total
EPOCH: [13], BATCH: [558/889], loss: 0.392, loss_box_reg: 0.116, loss_classifier: 0.105, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 558
[ 2023-10-08 00:57:42 ] Completed saving temp checkpoint 1,252.143 ms, 52.60 s total
[ 2023-10-08 00:57:42 ] Completed replacing temp checkpoint with checkpoint 76.524 ms, 52.67 s total
[ 2023-10-08 00:57:42 ] Completed Epoch: 13 batch 559: moving batch data to device 6.568 ms, 52.68 s total
[ 2023-10-08 00:57:42 ] Completed Epoch: 13 batch 559: forward pass 102.443 ms, 52.78 s total
[ 2023-10-08 00:57:42 ] Completed Epoch: 13 batch 559: backward pass 67.602 ms, 52.85 s total
[ 2023-10-08 00:57:42 ] Completed Epoch: 13 batch 559: computing loss 119.724 ms, 52.97 s total
EPOCH: [13], BATCH: [559/889], loss: 0.378, loss_box_reg: 0.113, loss_classifier: 0.098, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 559
[ 2023-10-08 00:57:43 ] Completed saving temp checkpoint 1,103.174 ms, 54.07 s total
[ 2023-10-08 00:57:43 ] Completed replacing temp checkpoint with checkpoint 58.563 ms, 54.13 s total
[ 2023-10-08 00:57:43 ] Completed Epoch: 13 batch 560: moving batch data to device 5.359 ms, 54.14 s total
[ 2023-10-08 00:57:43 ] Completed Epoch: 13 batch 560: forward pass 106.597 ms, 54.24 s total
[ 2023-10-08 00:57:43 ] Completed Epoch: 13 batch 560: backward pass 34.179 ms, 54.28 s total
[ 2023-10-08 00:57:43 ] Completed Epoch: 13 batch 560: computing loss 133.553 ms, 54.41 s total
EPOCH: [13], BATCH: [560/889], loss: 0.354, loss_box_reg: 0.103, loss_classifier: 0.089, loss_mask: 0.121, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 560
[ 2023-10-08 00:57:45 ] Completed saving temp checkpoint 1,212.498 ms, 55.62 s total
[ 2023-10-08 00:57:45 ] Completed replacing temp checkpoint with checkpoint 78.619 ms, 55.70 s total
[ 2023-10-08 00:57:45 ] Completed Epoch: 13 batch 561: moving batch data to device 6.335 ms, 55.71 s total
[ 2023-10-08 00:57:45 ] Completed Epoch: 13 batch 561: forward pass 103.083 ms, 55.81 s total
[ 2023-10-08 00:57:45 ] Completed Epoch: 13 batch 561: backward pass 47.007 ms, 55.86 s total
[ 2023-10-08 00:57:45 ] Completed Epoch: 13 batch 561: computing loss 121.612 ms, 55.98 s total
EPOCH: [13], BATCH: [561/889], loss: 0.389, loss_box_reg: 0.112, loss_classifier: 0.097, loss_mask: 0.137, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 561
[ 2023-10-08 00:57:46 ] Completed saving temp checkpoint 1,094.193 ms, 57.07 s total
[ 2023-10-08 00:57:46 ] Completed replacing temp checkpoint with checkpoint 82.993 ms, 57.16 s total
[ 2023-10-08 00:57:46 ] Completed Epoch: 13 batch 562: moving batch data to device 8.324 ms, 57.17 s total
[ 2023-10-08 00:57:46 ] Completed Epoch: 13 batch 562: forward pass 104.639 ms, 57.27 s total
[ 2023-10-08 00:57:46 ] Completed Epoch: 13 batch 562: backward pass 75.239 ms, 57.35 s total
[ 2023-10-08 00:57:46 ] Completed Epoch: 13 batch 562: computing loss 115.966 ms, 57.46 s total
EPOCH: [13], BATCH: [562/889], loss: 0.386, loss_box_reg: 0.114, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.017, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 562
[ 2023-10-08 00:57:48 ] Completed saving temp checkpoint 1,239.831 ms, 58.70 s total
[ 2023-10-08 00:57:48 ] Completed replacing temp checkpoint with checkpoint 68.087 ms, 58.77 s total
[ 2023-10-08 00:57:48 ] Completed Epoch: 13 batch 563: moving batch data to device 4.628 ms, 58.77 s total
[ 2023-10-08 00:57:48 ] Completed Epoch: 13 batch 563: forward pass 104.559 ms, 58.88 s total
[ 2023-10-08 00:57:48 ] Completed Epoch: 13 batch 563: backward pass 80.304 ms, 58.96 s total
[ 2023-10-08 00:57:48 ] Completed Epoch: 13 batch 563: computing loss 115.155 ms, 59.07 s total
EPOCH: [13], BATCH: [563/889], loss: 0.377, loss_box_reg: 0.117, loss_classifier: 0.094, loss_mask: 0.123, loss_objectness: 0.018, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 563
[ 2023-10-08 00:57:49 ] Completed saving temp checkpoint 1,110.223 ms, 60.18 s total
[ 2023-10-08 00:57:49 ] Completed replacing temp checkpoint with checkpoint 47.891 ms, 60.23 s total
[ 2023-10-08 00:57:49 ] Completed Epoch: 13 batch 564: moving batch data to device 6.752 ms, 60.24 s total
[ 2023-10-08 00:57:49 ] Completed Epoch: 13 batch 564: forward pass 113.005 ms, 60.35 s total
[ 2023-10-08 00:57:49 ] Completed Epoch: 13 batch 564: backward pass 70.761 ms, 60.42 s total
[ 2023-10-08 00:57:49 ] Completed Epoch: 13 batch 564: computing loss 103.301 ms, 60.53 s total
EPOCH: [13], BATCH: [564/889], loss: 0.359, loss_box_reg: 0.107, loss_classifier: 0.087, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 564
[ 2023-10-08 00:57:51 ] Completed saving temp checkpoint 1,274.329 ms, 61.80 s total
[ 2023-10-08 00:57:51 ] Completed replacing temp checkpoint with checkpoint 89.751 ms, 61.89 s total
[ 2023-10-08 00:57:51 ] Completed Epoch: 13 batch 565: moving batch data to device 7.436 ms, 61.90 s total
[ 2023-10-08 00:57:51 ] Completed Epoch: 13 batch 565: forward pass 104.396 ms, 62.00 s total
[ 2023-10-08 00:57:51 ] Completed Epoch: 13 batch 565: backward pass 75.271 ms, 62.08 s total
[ 2023-10-08 00:57:51 ] Completed Epoch: 13 batch 565: computing loss 122.076 ms, 62.20 s total
EPOCH: [13], BATCH: [565/889], loss: 0.391, loss_box_reg: 0.122, loss_classifier: 0.099, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 565
[ 2023-10-08 00:57:53 ] Completed saving temp checkpoint 1,628.517 ms, 63.83 s total
[ 2023-10-08 00:57:53 ] Completed replacing temp checkpoint with checkpoint 87.532 ms, 63.92 s total
[ 2023-10-08 00:57:53 ] Completed Epoch: 13 batch 566: moving batch data to device 8.162 ms, 63.92 s total
[ 2023-10-08 00:57:53 ] Completed Epoch: 13 batch 566: forward pass 103.842 ms, 64.03 s total
[ 2023-10-08 00:57:53 ] Completed Epoch: 13 batch 566: backward pass 75.459 ms, 64.10 s total
[ 2023-10-08 00:57:53 ] Completed Epoch: 13 batch 566: computing loss 92.419 ms, 64.19 s total
EPOCH: [13], BATCH: [566/889], loss: 0.385, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 566
[ 2023-10-08 00:57:55 ] Completed saving temp checkpoint 1,939.362 ms, 66.13 s total
[ 2023-10-08 00:57:55 ] Completed replacing temp checkpoint with checkpoint 94.713 ms, 66.23 s total
[ 2023-10-08 00:57:55 ] Completed Epoch: 13 batch 567: moving batch data to device 8.366 ms, 66.24 s total
[ 2023-10-08 00:57:55 ] Completed Epoch: 13 batch 567: forward pass 107.416 ms, 66.34 s total
[ 2023-10-08 00:57:55 ] Completed Epoch: 13 batch 567: backward pass 73.205 ms, 66.42 s total
[ 2023-10-08 00:57:55 ] Completed Epoch: 13 batch 567: computing loss 113.341 ms, 66.53 s total
EPOCH: [13], BATCH: [567/889], loss: 0.412, loss_box_reg: 0.124, loss_classifier: 0.105, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 567
[ 2023-10-08 00:57:57 ] Completed saving temp checkpoint 1,456.273 ms, 67.99 s total
[ 2023-10-08 00:57:57 ] Completed replacing temp checkpoint with checkpoint 97.677 ms, 68.09 s total
[ 2023-10-08 00:57:57 ] Completed Epoch: 13 batch 568: moving batch data to device 8.245 ms, 68.09 s total
[ 2023-10-08 00:57:57 ] Completed Epoch: 13 batch 568: forward pass 104.925 ms, 68.20 s total
[ 2023-10-08 00:57:57 ] Completed Epoch: 13 batch 568: backward pass 70.493 ms, 68.27 s total
[ 2023-10-08 00:57:57 ] Completed Epoch: 13 batch 568: computing loss 126.705 ms, 68.40 s total
EPOCH: [13], BATCH: [568/889], loss: 0.375, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.131, loss_objectness: 0.013, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 568
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 01:11:11 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 01:11:11 ] Completed importing Timer 0.021 ms, 0.00 s total
[ 2023-10-08 01:11:12 ] Completed importing everything else 534.423 ms, 0.53 s total
[ 2023-10-08 01:11:12 ] Completed defined other functions 0.021 ms, 0.53 s total
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 5): env://
| distributed init (rank 3): env://
| distributed init (rank 2): env://
| distributed init (rank 4): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 01:11:14 ] Completed main preliminaries 2,915.570 ms, 3.45 s total
loading annotations into memory...
Done (t=11.25s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-08 01:11:28 ] Completed loading data 13,181.690 ms, 16.63 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 01:11:28 ] Completed creating data samplers 110.381 ms, 16.74 s total
[ 2023-10-08 01:11:28 ] Completed creating data loaders 0.228 ms, 16.74 s total
[ 2023-10-08 01:11:28 ] Completed creating model and .to(device) 667.427 ms, 17.41 s total
[ 2023-10-08 01:11:30 ] Completed preparing model for distributed training 1,531.996 ms, 18.94 s total
[ 2023-10-08 01:11:30 ] Completed optimizer and scaler 0.627 ms, 18.94 s total
[ 2023-10-08 01:11:30 ] Completed learning rate schedulers 0.279 ms, 18.94 s total
[ 2023-10-08 01:11:31 ] Completed init coco evaluator 972.331 ms, 19.91 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 01:11:32 ] Completed retrieving checkpoint 817.981 ms, 20.73 s total
EPOCH :: 13
[ 2023-10-08 01:11:32 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 01:11:32 ] Completed training preliminaries 0.880 ms, 0.00 s total
Training / resuming epoch 13 from training step 568
[ 2023-10-08 01:11:32 ] Completed Epoch: 13 batch 568: moving batch data to device 557.138 ms, 0.56 s total
[ 2023-10-08 01:11:33 ] Completed Epoch: 13 batch 568: forward pass 1,197.166 ms, 1.76 s total
[ 2023-10-08 01:11:34 ] Completed Epoch: 13 batch 568: backward pass 181.581 ms, 1.94 s total
[ 2023-10-08 01:11:34 ] Completed Epoch: 13 batch 568: computing loss 160.286 ms, 2.10 s total
EPOCH: [13], BATCH: [568/889], loss: 0.374, loss_box_reg: 0.112, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 568
[ 2023-10-08 01:11:35 ] Completed saving temp checkpoint 1,027.241 ms, 3.12 s total
[ 2023-10-08 01:11:35 ] Completed replacing temp checkpoint with checkpoint 150.079 ms, 3.27 s total
[ 2023-10-08 01:11:35 ] Completed Epoch: 13 batch 569: moving batch data to device 3.561 ms, 3.28 s total
[ 2023-10-08 01:11:35 ] Completed Epoch: 13 batch 569: forward pass 108.663 ms, 3.39 s total
[ 2023-10-08 01:11:35 ] Completed Epoch: 13 batch 569: backward pass 123.645 ms, 3.51 s total
[ 2023-10-08 01:11:35 ] Completed Epoch: 13 batch 569: computing loss 99.372 ms, 3.61 s total
EPOCH: [13], BATCH: [569/889], loss: 0.380, loss_box_reg: 0.116, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 569
[ 2023-10-08 01:11:36 ] Completed saving temp checkpoint 1,039.206 ms, 4.65 s total
[ 2023-10-08 01:11:36 ] Completed replacing temp checkpoint with checkpoint 70.962 ms, 4.72 s total
[ 2023-10-08 01:11:36 ] Completed Epoch: 13 batch 570: moving batch data to device 3.317 ms, 4.72 s total
[ 2023-10-08 01:11:37 ] Completed Epoch: 13 batch 570: forward pass 109.464 ms, 4.83 s total
[ 2023-10-08 01:11:37 ] Completed Epoch: 13 batch 570: backward pass 82.495 ms, 4.92 s total
[ 2023-10-08 01:11:37 ] Completed Epoch: 13 batch 570: computing loss 138.979 ms, 5.05 s total
EPOCH: [13], BATCH: [570/889], loss: 0.415, loss_box_reg: 0.127, loss_classifier: 0.099, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 570
[ 2023-10-08 01:11:38 ] Completed saving temp checkpoint 1,118.625 ms, 6.17 s total
[ 2023-10-08 01:11:38 ] Completed replacing temp checkpoint with checkpoint 47.203 ms, 6.22 s total
[ 2023-10-08 01:11:38 ] Completed Epoch: 13 batch 571: moving batch data to device 7.374 ms, 6.23 s total
[ 2023-10-08 01:11:38 ] Completed Epoch: 13 batch 571: forward pass 103.848 ms, 6.33 s total
[ 2023-10-08 01:11:38 ] Completed Epoch: 13 batch 571: backward pass 49.906 ms, 6.38 s total
[ 2023-10-08 01:11:38 ] Completed Epoch: 13 batch 571: computing loss 201.311 ms, 6.58 s total
EPOCH: [13], BATCH: [571/889], loss: 0.364, loss_box_reg: 0.105, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.012, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 571
[ 2023-10-08 01:11:39 ] Completed saving temp checkpoint 934.523 ms, 7.52 s total
[ 2023-10-08 01:11:39 ] Completed replacing temp checkpoint with checkpoint 66.131 ms, 7.58 s total
[ 2023-10-08 01:11:39 ] Completed Epoch: 13 batch 572: moving batch data to device 11.880 ms, 7.59 s total
[ 2023-10-08 01:11:39 ] Completed Epoch: 13 batch 572: forward pass 114.110 ms, 7.71 s total
[ 2023-10-08 01:11:40 ] Completed Epoch: 13 batch 572: backward pass 76.141 ms, 7.79 s total
[ 2023-10-08 01:11:40 ] Completed Epoch: 13 batch 572: computing loss 142.399 ms, 7.93 s total
EPOCH: [13], BATCH: [572/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.094, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 572
[ 2023-10-08 01:11:41 ] Completed saving temp checkpoint 1,104.875 ms, 9.03 s total
[ 2023-10-08 01:11:41 ] Completed replacing temp checkpoint with checkpoint 75.413 ms, 9.11 s total
[ 2023-10-08 01:11:41 ] Completed Epoch: 13 batch 573: moving batch data to device 10.309 ms, 9.12 s total
[ 2023-10-08 01:11:41 ] Completed Epoch: 13 batch 573: forward pass 108.839 ms, 9.23 s total
[ 2023-10-08 01:11:41 ] Completed Epoch: 13 batch 573: backward pass 79.659 ms, 9.31 s total
[ 2023-10-08 01:11:41 ] Completed Epoch: 13 batch 573: computing loss 115.833 ms, 9.42 s total
EPOCH: [13], BATCH: [573/889], loss: 0.402, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.129, loss_objectness: 0.020, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 573
[ 2023-10-08 01:11:42 ] Completed saving temp checkpoint 867.507 ms, 10.29 s total
[ 2023-10-08 01:11:42 ] Completed replacing temp checkpoint with checkpoint 61.220 ms, 10.35 s total
[ 2023-10-08 01:11:42 ] Completed Epoch: 13 batch 574: moving batch data to device 4.876 ms, 10.36 s total
[ 2023-10-08 01:11:42 ] Completed Epoch: 13 batch 574: forward pass 105.876 ms, 10.46 s total
[ 2023-10-08 01:11:42 ] Completed Epoch: 13 batch 574: backward pass 81.307 ms, 10.54 s total
[ 2023-10-08 01:11:42 ] Completed Epoch: 13 batch 574: computing loss 130.309 ms, 10.67 s total
EPOCH: [13], BATCH: [574/889], loss: 0.394, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.014, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 574
[ 2023-10-08 01:11:43 ] Completed saving temp checkpoint 990.162 ms, 11.66 s total
[ 2023-10-08 01:11:43 ] Completed replacing temp checkpoint with checkpoint 69.417 ms, 11.73 s total
[ 2023-10-08 01:11:43 ] Completed Epoch: 13 batch 575: moving batch data to device 9.214 ms, 11.74 s total
[ 2023-10-08 01:11:44 ] Completed Epoch: 13 batch 575: forward pass 107.722 ms, 11.85 s total
[ 2023-10-08 01:11:44 ] Completed Epoch: 13 batch 575: backward pass 77.984 ms, 11.93 s total
[ 2023-10-08 01:11:44 ] Completed Epoch: 13 batch 575: computing loss 117.707 ms, 12.05 s total
EPOCH: [13], BATCH: [575/889], loss: 0.346, loss_box_reg: 0.102, loss_classifier: 0.090, loss_mask: 0.120, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 575
[ 2023-10-08 01:11:45 ] Completed saving temp checkpoint 750.570 ms, 12.80 s total
[ 2023-10-08 01:11:45 ] Completed replacing temp checkpoint with checkpoint 58.775 ms, 12.86 s total
[ 2023-10-08 01:11:45 ] Completed Epoch: 13 batch 576: moving batch data to device 3.474 ms, 12.86 s total
[ 2023-10-08 01:11:45 ] Completed Epoch: 13 batch 576: forward pass 105.797 ms, 12.96 s total
[ 2023-10-08 01:11:45 ] Completed Epoch: 13 batch 576: backward pass 48.613 ms, 13.01 s total
[ 2023-10-08 01:11:45 ] Completed Epoch: 13 batch 576: computing loss 148.538 ms, 13.16 s total
EPOCH: [13], BATCH: [576/889], loss: 0.419, loss_box_reg: 0.128, loss_classifier: 0.109, loss_mask: 0.141, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 576
[ 2023-10-08 01:11:46 ] Completed saving temp checkpoint 1,010.494 ms, 14.17 s total
[ 2023-10-08 01:11:46 ] Completed replacing temp checkpoint with checkpoint 26.290 ms, 14.20 s total
[ 2023-10-08 01:11:46 ] Completed Epoch: 13 batch 577: moving batch data to device 5.020 ms, 14.20 s total
[ 2023-10-08 01:11:46 ] Completed Epoch: 13 batch 577: forward pass 104.438 ms, 14.31 s total
[ 2023-10-08 01:11:46 ] Completed Epoch: 13 batch 577: backward pass 32.365 ms, 14.34 s total
[ 2023-10-08 01:11:46 ] Completed Epoch: 13 batch 577: computing loss 177.551 ms, 14.52 s total
EPOCH: [13], BATCH: [577/889], loss: 0.380, loss_box_reg: 0.112, loss_classifier: 0.093, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 577
[ 2023-10-08 01:11:47 ] Completed saving temp checkpoint 766.758 ms, 15.28 s total
[ 2023-10-08 01:11:47 ] Completed replacing temp checkpoint with checkpoint 67.515 ms, 15.35 s total
[ 2023-10-08 01:11:47 ] Completed Epoch: 13 batch 578: moving batch data to device 9.923 ms, 15.36 s total
[ 2023-10-08 01:11:47 ] Completed Epoch: 13 batch 578: forward pass 107.061 ms, 15.47 s total
[ 2023-10-08 01:11:47 ] Completed Epoch: 13 batch 578: backward pass 80.133 ms, 15.55 s total
[ 2023-10-08 01:11:47 ] Completed Epoch: 13 batch 578: computing loss 112.669 ms, 15.66 s total
EPOCH: [13], BATCH: [578/889], loss: 0.394, loss_box_reg: 0.121, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 578
[ 2023-10-08 01:11:49 ] Completed saving temp checkpoint 1,131.025 ms, 16.79 s total
[ 2023-10-08 01:11:49 ] Completed replacing temp checkpoint with checkpoint 78.804 ms, 16.87 s total
[ 2023-10-08 01:11:49 ] Completed Epoch: 13 batch 579: moving batch data to device 8.305 ms, 16.88 s total
[ 2023-10-08 01:11:49 ] Completed Epoch: 13 batch 579: forward pass 104.908 ms, 16.98 s total
[ 2023-10-08 01:11:49 ] Completed Epoch: 13 batch 579: backward pass 84.302 ms, 17.07 s total
[ 2023-10-08 01:11:49 ] Completed Epoch: 13 batch 579: computing loss 113.865 ms, 17.18 s total
EPOCH: [13], BATCH: [579/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.112, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 579
[ 2023-10-08 01:11:51 ] Completed saving temp checkpoint 1,611.714 ms, 18.79 s total
[ 2023-10-08 01:11:51 ] Completed replacing temp checkpoint with checkpoint 90.917 ms, 18.89 s total
[ 2023-10-08 01:11:51 ] Completed Epoch: 13 batch 580: moving batch data to device 10.034 ms, 18.90 s total
[ 2023-10-08 01:11:51 ] Completed Epoch: 13 batch 580: forward pass 107.275 ms, 19.00 s total
[ 2023-10-08 01:11:51 ] Completed Epoch: 13 batch 580: backward pass 59.885 ms, 19.06 s total
[ 2023-10-08 01:11:51 ] Completed Epoch: 13 batch 580: computing loss 132.287 ms, 19.20 s total
EPOCH: [13], BATCH: [580/889], loss: 0.389, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 580
[ 2023-10-08 01:11:53 ] Completed saving temp checkpoint 1,644.729 ms, 20.84 s total
[ 2023-10-08 01:11:53 ] Completed replacing temp checkpoint with checkpoint 93.452 ms, 20.93 s total
[ 2023-10-08 01:11:53 ] Completed Epoch: 13 batch 581: moving batch data to device 7.687 ms, 20.94 s total
[ 2023-10-08 01:11:53 ] Completed Epoch: 13 batch 581: forward pass 111.275 ms, 21.05 s total
[ 2023-10-08 01:11:53 ] Completed Epoch: 13 batch 581: backward pass 77.631 ms, 21.13 s total
[ 2023-10-08 01:11:53 ] Completed Epoch: 13 batch 581: computing loss 122.912 ms, 21.25 s total
EPOCH: [13], BATCH: [581/889], loss: 0.387, loss_box_reg: 0.114, loss_classifier: 0.101, loss_mask: 0.127, loss_objectness: 0.019, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 581
[ 2023-10-08 01:11:55 ] Completed saving temp checkpoint 1,673.379 ms, 22.93 s total
[ 2023-10-08 01:11:55 ] Completed replacing temp checkpoint with checkpoint 94.777 ms, 23.02 s total
[ 2023-10-08 01:11:55 ] Completed Epoch: 13 batch 582: moving batch data to device 8.137 ms, 23.03 s total
[ 2023-10-08 01:11:55 ] Completed Epoch: 13 batch 582: forward pass 105.868 ms, 23.13 s total
[ 2023-10-08 01:11:55 ] Completed Epoch: 13 batch 582: backward pass 75.883 ms, 23.21 s total
[ 2023-10-08 01:11:55 ] Completed Epoch: 13 batch 582: computing loss 113.943 ms, 23.32 s total
EPOCH: [13], BATCH: [582/889], loss: 0.377, loss_box_reg: 0.115, loss_classifier: 0.095, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 582
[ 2023-10-08 01:11:57 ] Completed saving temp checkpoint 1,728.639 ms, 25.05 s total
[ 2023-10-08 01:11:57 ] Completed replacing temp checkpoint with checkpoint 110.835 ms, 25.16 s total
[ 2023-10-08 01:11:57 ] Completed Epoch: 13 batch 583: moving batch data to device 6.112 ms, 25.17 s total
[ 2023-10-08 01:11:57 ] Completed Epoch: 13 batch 583: forward pass 102.162 ms, 25.27 s total
[ 2023-10-08 01:11:57 ] Completed Epoch: 13 batch 583: backward pass 45.952 ms, 25.32 s total
[ 2023-10-08 01:11:57 ] Completed Epoch: 13 batch 583: computing loss 127.159 ms, 25.45 s total
EPOCH: [13], BATCH: [583/889], loss: 0.389, loss_box_reg: 0.119, loss_classifier: 0.102, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 583
[ 2023-10-08 01:11:58 ] Completed saving temp checkpoint 1,247.320 ms, 26.69 s total
[ 2023-10-08 01:11:58 ] Completed replacing temp checkpoint with checkpoint 77.278 ms, 26.77 s total
[ 2023-10-08 01:11:59 ] Completed Epoch: 13 batch 584: moving batch data to device 9.156 ms, 26.78 s total
[ 2023-10-08 01:11:59 ] Completed Epoch: 13 batch 584: forward pass 105.135 ms, 26.88 s total
[ 2023-10-08 01:11:59 ] Completed Epoch: 13 batch 584: backward pass 76.140 ms, 26.96 s total
[ 2023-10-08 01:11:59 ] Completed Epoch: 13 batch 584: computing loss 120.720 ms, 27.08 s total
EPOCH: [13], BATCH: [584/889], loss: 0.361, loss_box_reg: 0.111, loss_classifier: 0.089, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 584
[ 2023-10-08 01:12:00 ] Completed saving temp checkpoint 1,316.857 ms, 28.40 s total
[ 2023-10-08 01:12:00 ] Completed replacing temp checkpoint with checkpoint 67.737 ms, 28.47 s total
[ 2023-10-08 01:12:00 ] Completed Epoch: 13 batch 585: moving batch data to device 9.579 ms, 28.48 s total
[ 2023-10-08 01:12:00 ] Completed Epoch: 13 batch 585: forward pass 107.895 ms, 28.58 s total
[ 2023-10-08 01:12:00 ] Completed Epoch: 13 batch 585: backward pass 78.504 ms, 28.66 s total
[ 2023-10-08 01:12:01 ] Completed Epoch: 13 batch 585: computing loss 113.883 ms, 28.78 s total
EPOCH: [13], BATCH: [585/889], loss: 0.357, loss_box_reg: 0.106, loss_classifier: 0.087, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 585
[ 2023-10-08 01:12:02 ] Completed saving temp checkpoint 1,357.861 ms, 30.13 s total
[ 2023-10-08 01:12:02 ] Completed replacing temp checkpoint with checkpoint 90.047 ms, 30.22 s total
[ 2023-10-08 01:12:02 ] Completed Epoch: 13 batch 586: moving batch data to device 10.242 ms, 30.23 s total
[ 2023-10-08 01:12:02 ] Completed Epoch: 13 batch 586: forward pass 110.906 ms, 30.34 s total
[ 2023-10-08 01:12:02 ] Completed Epoch: 13 batch 586: backward pass 76.100 ms, 30.42 s total
[ 2023-10-08 01:12:02 ] Completed Epoch: 13 batch 586: computing loss 120.519 ms, 30.54 s total
EPOCH: [13], BATCH: [586/889], loss: 0.417, loss_box_reg: 0.125, loss_classifier: 0.106, loss_mask: 0.139, loss_objectness: 0.019, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 586
[ 2023-10-08 01:12:04 ] Completed saving temp checkpoint 1,891.706 ms, 32.43 s total
[ 2023-10-08 01:12:04 ] Completed replacing temp checkpoint with checkpoint 81.538 ms, 32.51 s total
[ 2023-10-08 01:12:04 ] Completed Epoch: 13 batch 587: moving batch data to device 8.589 ms, 32.52 s total
[ 2023-10-08 01:12:04 ] Completed Epoch: 13 batch 587: forward pass 110.865 ms, 32.63 s total
[ 2023-10-08 01:12:04 ] Completed Epoch: 13 batch 587: backward pass 37.667 ms, 32.67 s total
[ 2023-10-08 01:12:05 ] Completed Epoch: 13 batch 587: computing loss 152.509 ms, 32.82 s total
EPOCH: [13], BATCH: [587/889], loss: 0.374, loss_box_reg: 0.105, loss_classifier: 0.090, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 587
[ 2023-10-08 01:12:06 ] Completed saving temp checkpoint 1,229.002 ms, 34.05 s total
[ 2023-10-08 01:12:06 ] Completed replacing temp checkpoint with checkpoint 84.910 ms, 34.14 s total
[ 2023-10-08 01:12:06 ] Completed Epoch: 13 batch 588: moving batch data to device 8.266 ms, 34.15 s total
[ 2023-10-08 01:12:06 ] Completed Epoch: 13 batch 588: forward pass 109.315 ms, 34.26 s total
[ 2023-10-08 01:12:06 ] Completed Epoch: 13 batch 588: backward pass 91.337 ms, 34.35 s total
[ 2023-10-08 01:12:06 ] Completed Epoch: 13 batch 588: computing loss 109.022 ms, 34.46 s total
EPOCH: [13], BATCH: [588/889], loss: 0.357, loss_box_reg: 0.102, loss_classifier: 0.090, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 588
[ 2023-10-08 01:12:08 ] Completed saving temp checkpoint 1,324.829 ms, 35.78 s total
[ 2023-10-08 01:12:08 ] Completed replacing temp checkpoint with checkpoint 82.453 ms, 35.86 s total
[ 2023-10-08 01:12:08 ] Completed Epoch: 13 batch 589: moving batch data to device 8.088 ms, 35.87 s total
[ 2023-10-08 01:12:08 ] Completed Epoch: 13 batch 589: forward pass 103.242 ms, 35.97 s total
[ 2023-10-08 01:12:08 ] Completed Epoch: 13 batch 589: backward pass 75.214 ms, 36.05 s total
[ 2023-10-08 01:12:08 ] Completed Epoch: 13 batch 589: computing loss 121.111 ms, 36.17 s total
EPOCH: [13], BATCH: [589/889], loss: 0.359, loss_box_reg: 0.103, loss_classifier: 0.089, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 589
[ 2023-10-08 01:12:09 ] Completed saving temp checkpoint 1,186.674 ms, 37.36 s total
[ 2023-10-08 01:12:09 ] Completed replacing temp checkpoint with checkpoint 59.718 ms, 37.42 s total
[ 2023-10-08 01:12:09 ] Completed Epoch: 13 batch 590: moving batch data to device 7.829 ms, 37.43 s total
[ 2023-10-08 01:12:09 ] Completed Epoch: 13 batch 590: forward pass 109.138 ms, 37.53 s total
[ 2023-10-08 01:12:09 ] Completed Epoch: 13 batch 590: backward pass 40.407 ms, 37.57 s total
[ 2023-10-08 01:12:09 ] Completed Epoch: 13 batch 590: computing loss 154.567 ms, 37.73 s total
EPOCH: [13], BATCH: [590/889], loss: 0.408, loss_box_reg: 0.122, loss_classifier: 0.107, loss_mask: 0.135, loss_objectness: 0.019, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 590
[ 2023-10-08 01:12:11 ] Completed saving temp checkpoint 1,306.854 ms, 39.04 s total
[ 2023-10-08 01:12:11 ] Completed replacing temp checkpoint with checkpoint 87.718 ms, 39.12 s total
[ 2023-10-08 01:12:11 ] Completed Epoch: 13 batch 591: moving batch data to device 7.465 ms, 39.13 s total
[ 2023-10-08 01:12:11 ] Completed Epoch: 13 batch 591: forward pass 101.864 ms, 39.23 s total
[ 2023-10-08 01:12:11 ] Completed Epoch: 13 batch 591: backward pass 70.738 ms, 39.30 s total
[ 2023-10-08 01:12:11 ] Completed Epoch: 13 batch 591: computing loss 127.863 ms, 39.43 s total
EPOCH: [13], BATCH: [591/889], loss: 0.354, loss_box_reg: 0.106, loss_classifier: 0.088, loss_mask: 0.122, loss_objectness: 0.017, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 591
[ 2023-10-08 01:12:12 ] Completed saving temp checkpoint 1,182.973 ms, 40.61 s total
[ 2023-10-08 01:12:12 ] Completed replacing temp checkpoint with checkpoint 73.187 ms, 40.69 s total
[ 2023-10-08 01:12:12 ] Completed Epoch: 13 batch 592: moving batch data to device 11.136 ms, 40.70 s total
[ 2023-10-08 01:12:13 ] Completed Epoch: 13 batch 592: forward pass 107.166 ms, 40.81 s total
[ 2023-10-08 01:12:13 ] Completed Epoch: 13 batch 592: backward pass 48.256 ms, 40.85 s total
[ 2023-10-08 01:12:13 ] Completed Epoch: 13 batch 592: computing loss 122.000 ms, 40.98 s total
EPOCH: [13], BATCH: [592/889], loss: 0.393, loss_box_reg: 0.115, loss_classifier: 0.094, loss_mask: 0.138, loss_objectness: 0.013, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 592
[ 2023-10-08 01:12:14 ] Completed saving temp checkpoint 1,339.093 ms, 42.32 s total
[ 2023-10-08 01:12:14 ] Completed replacing temp checkpoint with checkpoint 63.247 ms, 42.38 s total
[ 2023-10-08 01:12:14 ] Completed Epoch: 13 batch 593: moving batch data to device 7.692 ms, 42.39 s total
[ 2023-10-08 01:12:14 ] Completed Epoch: 13 batch 593: forward pass 105.662 ms, 42.49 s total
[ 2023-10-08 01:12:14 ] Completed Epoch: 13 batch 593: backward pass 70.429 ms, 42.56 s total
[ 2023-10-08 01:12:14 ] Completed Epoch: 13 batch 593: computing loss 123.564 ms, 42.69 s total
EPOCH: [13], BATCH: [593/889], loss: 0.377, loss_box_reg: 0.111, loss_classifier: 0.096, loss_mask: 0.124, loss_objectness: 0.017, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 593
[ 2023-10-08 01:12:16 ] Completed saving temp checkpoint 1,306.865 ms, 43.99 s total
[ 2023-10-08 01:12:16 ] Completed replacing temp checkpoint with checkpoint 88.008 ms, 44.08 s total
[ 2023-10-08 01:12:16 ] Completed Epoch: 13 batch 594: moving batch data to device 6.826 ms, 44.09 s total
[ 2023-10-08 01:12:16 ] Completed Epoch: 13 batch 594: forward pass 112.707 ms, 44.20 s total
[ 2023-10-08 01:12:16 ] Completed Epoch: 13 batch 594: backward pass 81.069 ms, 44.28 s total
[ 2023-10-08 01:12:16 ] Completed Epoch: 13 batch 594: computing loss 107.658 ms, 44.39 s total
EPOCH: [13], BATCH: [594/889], loss: 0.400, loss_box_reg: 0.118, loss_classifier: 0.101, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 594
[ 2023-10-08 01:12:18 ] Completed saving temp checkpoint 1,589.597 ms, 45.98 s total
[ 2023-10-08 01:12:18 ] Completed replacing temp checkpoint with checkpoint 74.895 ms, 46.05 s total
[ 2023-10-08 01:12:18 ] Completed Epoch: 13 batch 595: moving batch data to device 8.955 ms, 46.06 s total
[ 2023-10-08 01:12:18 ] Completed Epoch: 13 batch 595: forward pass 102.294 ms, 46.17 s total
[ 2023-10-08 01:12:18 ] Completed Epoch: 13 batch 595: backward pass 53.456 ms, 46.22 s total
[ 2023-10-08 01:12:18 ] Completed Epoch: 13 batch 595: computing loss 136.591 ms, 46.36 s total
EPOCH: [13], BATCH: [595/889], loss: 0.378, loss_box_reg: 0.112, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 595
[ 2023-10-08 01:12:20 ] Completed saving temp checkpoint 1,665.882 ms, 48.02 s total
[ 2023-10-08 01:12:20 ] Completed replacing temp checkpoint with checkpoint 94.410 ms, 48.12 s total
[ 2023-10-08 01:12:20 ] Completed Epoch: 13 batch 596: moving batch data to device 8.423 ms, 48.12 s total
[ 2023-10-08 01:12:20 ] Completed Epoch: 13 batch 596: forward pass 105.381 ms, 48.23 s total
[ 2023-10-08 01:12:20 ] Completed Epoch: 13 batch 596: backward pass 82.793 ms, 48.31 s total
[ 2023-10-08 01:12:20 ] Completed Epoch: 13 batch 596: computing loss 107.747 ms, 48.42 s total
EPOCH: [13], BATCH: [596/889], loss: 0.391, loss_box_reg: 0.116, loss_classifier: 0.100, loss_mask: 0.127, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 596
[ 2023-10-08 01:12:21 ] Completed saving temp checkpoint 1,299.089 ms, 49.72 s total
[ 2023-10-08 01:12:22 ] Completed replacing temp checkpoint with checkpoint 58.940 ms, 49.78 s total
[ 2023-10-08 01:12:22 ] Completed Epoch: 13 batch 597: moving batch data to device 5.034 ms, 49.78 s total
[ 2023-10-08 01:12:22 ] Completed Epoch: 13 batch 597: forward pass 101.933 ms, 49.88 s total
[ 2023-10-08 01:12:22 ] Completed Epoch: 13 batch 597: backward pass 73.037 ms, 49.96 s total
[ 2023-10-08 01:12:22 ] Completed Epoch: 13 batch 597: computing loss 122.322 ms, 50.08 s total
EPOCH: [13], BATCH: [597/889], loss: 0.353, loss_box_reg: 0.106, loss_classifier: 0.083, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 597
[ 2023-10-08 01:12:23 ] Completed saving temp checkpoint 1,098.803 ms, 51.18 s total
[ 2023-10-08 01:12:23 ] Completed replacing temp checkpoint with checkpoint 55.744 ms, 51.23 s total
[ 2023-10-08 01:12:23 ] Completed Epoch: 13 batch 598: moving batch data to device 6.926 ms, 51.24 s total
[ 2023-10-08 01:12:23 ] Completed Epoch: 13 batch 598: forward pass 105.390 ms, 51.35 s total
[ 2023-10-08 01:12:23 ] Completed Epoch: 13 batch 598: backward pass 70.342 ms, 51.42 s total
[ 2023-10-08 01:12:23 ] Completed Epoch: 13 batch 598: computing loss 123.955 ms, 51.54 s total
EPOCH: [13], BATCH: [598/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 598
[ 2023-10-08 01:12:25 ] Completed saving temp checkpoint 1,438.667 ms, 52.98 s total
[ 2023-10-08 01:12:25 ] Completed replacing temp checkpoint with checkpoint 73.674 ms, 53.05 s total
[ 2023-10-08 01:12:25 ] Completed Epoch: 13 batch 599: moving batch data to device 5.434 ms, 53.06 s total
[ 2023-10-08 01:12:25 ] Completed Epoch: 13 batch 599: forward pass 109.976 ms, 53.17 s total
[ 2023-10-08 01:12:25 ] Completed Epoch: 13 batch 599: backward pass 38.510 ms, 53.21 s total
[ 2023-10-08 01:12:25 ] Completed Epoch: 13 batch 599: computing loss 156.151 ms, 53.36 s total
EPOCH: [13], BATCH: [599/889], loss: 0.394, loss_box_reg: 0.120, loss_classifier: 0.096, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 599
[ 2023-10-08 01:12:26 ] Completed saving temp checkpoint 1,381.586 ms, 54.75 s total
[ 2023-10-08 01:12:27 ] Completed replacing temp checkpoint with checkpoint 82.685 ms, 54.83 s total
[ 2023-10-08 01:12:27 ] Completed Epoch: 13 batch 600: moving batch data to device 7.578 ms, 54.84 s total
[ 2023-10-08 01:12:27 ] Completed Epoch: 13 batch 600: forward pass 107.572 ms, 54.94 s total
[ 2023-10-08 01:12:27 ] Completed Epoch: 13 batch 600: backward pass 32.713 ms, 54.98 s total
[ 2023-10-08 01:12:27 ] Completed Epoch: 13 batch 600: computing loss 161.054 ms, 55.14 s total
EPOCH: [13], BATCH: [600/889], loss: 0.376, loss_box_reg: 0.114, loss_classifier: 0.090, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 600
[ 2023-10-08 01:12:28 ] Completed saving temp checkpoint 1,225.152 ms, 56.36 s total
[ 2023-10-08 01:12:28 ] Completed replacing temp checkpoint with checkpoint 52.844 ms, 56.41 s total
[ 2023-10-08 01:12:28 ] Completed Epoch: 13 batch 601: moving batch data to device 5.269 ms, 56.42 s total
[ 2023-10-08 01:12:28 ] Completed Epoch: 13 batch 601: forward pass 105.407 ms, 56.53 s total
[ 2023-10-08 01:12:28 ] Completed Epoch: 13 batch 601: backward pass 71.415 ms, 56.60 s total
[ 2023-10-08 01:12:28 ] Completed Epoch: 13 batch 601: computing loss 119.052 ms, 56.72 s total
EPOCH: [13], BATCH: [601/889], loss: 0.390, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 601
[ 2023-10-08 01:12:29 ] Completed saving temp checkpoint 1,030.369 ms, 57.75 s total
[ 2023-10-08 01:12:30 ] Completed replacing temp checkpoint with checkpoint 57.513 ms, 57.80 s total
[ 2023-10-08 01:12:30 ] Completed Epoch: 13 batch 602: moving batch data to device 5.043 ms, 57.81 s total
[ 2023-10-08 01:12:30 ] Completed Epoch: 13 batch 602: forward pass 104.305 ms, 57.91 s total
[ 2023-10-08 01:12:30 ] Completed Epoch: 13 batch 602: backward pass 36.314 ms, 57.95 s total
[ 2023-10-08 01:12:30 ] Completed Epoch: 13 batch 602: computing loss 158.053 ms, 58.11 s total
EPOCH: [13], BATCH: [602/889], loss: 0.389, loss_box_reg: 0.126, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 602
[ 2023-10-08 01:12:31 ] Completed saving temp checkpoint 1,102.465 ms, 59.21 s total
[ 2023-10-08 01:12:31 ] Completed replacing temp checkpoint with checkpoint 47.501 ms, 59.26 s total
[ 2023-10-08 01:12:31 ] Completed Epoch: 13 batch 603: moving batch data to device 10.306 ms, 59.27 s total
[ 2023-10-08 01:12:31 ] Completed Epoch: 13 batch 603: forward pass 99.718 ms, 59.37 s total
[ 2023-10-08 01:12:31 ] Completed Epoch: 13 batch 603: backward pass 57.775 ms, 59.43 s total
[ 2023-10-08 01:12:31 ] Completed Epoch: 13 batch 603: computing loss 137.764 ms, 59.56 s total
EPOCH: [13], BATCH: [603/889], loss: 0.384, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 603
[ 2023-10-08 01:12:32 ] Completed saving temp checkpoint 1,024.033 ms, 60.59 s total
[ 2023-10-08 01:12:32 ] Completed replacing temp checkpoint with checkpoint 78.165 ms, 60.67 s total
[ 2023-10-08 01:12:32 ] Completed Epoch: 13 batch 604: moving batch data to device 6.676 ms, 60.67 s total
[ 2023-10-08 01:12:33 ] Completed Epoch: 13 batch 604: forward pass 104.553 ms, 60.78 s total
[ 2023-10-08 01:12:33 ] Completed Epoch: 13 batch 604: backward pass 46.072 ms, 60.82 s total
[ 2023-10-08 01:12:33 ] Completed Epoch: 13 batch 604: computing loss 144.366 ms, 60.97 s total
EPOCH: [13], BATCH: [604/889], loss: 0.412, loss_box_reg: 0.124, loss_classifier: 0.106, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 604
[ 2023-10-08 01:12:34 ] Completed saving temp checkpoint 1,196.065 ms, 62.16 s total
[ 2023-10-08 01:12:34 ] Completed replacing temp checkpoint with checkpoint 88.165 ms, 62.25 s total
[ 2023-10-08 01:12:34 ] Completed Epoch: 13 batch 605: moving batch data to device 6.524 ms, 62.26 s total
[ 2023-10-08 01:12:34 ] Completed Epoch: 13 batch 605: forward pass 108.501 ms, 62.37 s total
[ 2023-10-08 01:12:34 ] Completed Epoch: 13 batch 605: backward pass 77.017 ms, 62.44 s total
[ 2023-10-08 01:12:34 ] Completed Epoch: 13 batch 605: computing loss 115.618 ms, 62.56 s total
EPOCH: [13], BATCH: [605/889], loss: 0.369, loss_box_reg: 0.107, loss_classifier: 0.093, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 605
[ 2023-10-08 01:12:35 ] Completed saving temp checkpoint 1,067.041 ms, 63.63 s total
[ 2023-10-08 01:12:35 ] Completed replacing temp checkpoint with checkpoint 73.829 ms, 63.70 s total
[ 2023-10-08 01:12:35 ] Completed Epoch: 13 batch 606: moving batch data to device 7.558 ms, 63.71 s total
[ 2023-10-08 01:12:36 ] Completed Epoch: 13 batch 606: forward pass 111.592 ms, 63.82 s total
[ 2023-10-08 01:12:36 ] Completed Epoch: 13 batch 606: backward pass 73.864 ms, 63.89 s total
[ 2023-10-08 01:12:36 ] Completed Epoch: 13 batch 606: computing loss 114.185 ms, 64.01 s total
EPOCH: [13], BATCH: [606/889], loss: 0.413, loss_box_reg: 0.124, loss_classifier: 0.107, loss_mask: 0.131, loss_objectness: 0.019, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 606
[ 2023-10-08 01:12:37 ] Completed saving temp checkpoint 1,599.594 ms, 65.61 s total
[ 2023-10-08 01:12:37 ] Completed replacing temp checkpoint with checkpoint 76.106 ms, 65.68 s total
[ 2023-10-08 01:12:37 ] Completed Epoch: 13 batch 607: moving batch data to device 6.000 ms, 65.69 s total
[ 2023-10-08 01:12:38 ] Completed Epoch: 13 batch 607: forward pass 110.164 ms, 65.80 s total
[ 2023-10-08 01:12:38 ] Completed Epoch: 13 batch 607: backward pass 45.679 ms, 65.84 s total
[ 2023-10-08 01:12:38 ] Completed Epoch: 13 batch 607: computing loss 152.217 ms, 66.00 s total
EPOCH: [13], BATCH: [607/889], loss: 0.422, loss_box_reg: 0.131, loss_classifier: 0.108, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 607
[ 2023-10-08 01:12:39 ] Completed saving temp checkpoint 1,673.684 ms, 67.67 s total
[ 2023-10-08 01:12:39 ] Completed replacing temp checkpoint with checkpoint 57.685 ms, 67.73 s total
[ 2023-10-08 01:12:39 ] Completed Epoch: 13 batch 608: moving batch data to device 4.807 ms, 67.73 s total
[ 2023-10-08 01:12:40 ] Completed Epoch: 13 batch 608: forward pass 98.680 ms, 67.83 s total
[ 2023-10-08 01:12:40 ] Completed Epoch: 13 batch 608: backward pass 33.910 ms, 67.87 s total
[ 2023-10-08 01:12:40 ] Completed Epoch: 13 batch 608: computing loss 150.059 ms, 68.02 s total
EPOCH: [13], BATCH: [608/889], loss: 0.365, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 608
[ 2023-10-08 01:12:41 ] Completed saving temp checkpoint 1,314.711 ms, 69.33 s total
[ 2023-10-08 01:12:41 ] Completed replacing temp checkpoint with checkpoint 53.455 ms, 69.38 s total
[ 2023-10-08 01:12:41 ] Completed Epoch: 13 batch 609: moving batch data to device 5.362 ms, 69.39 s total
[ 2023-10-08 01:12:41 ] Completed Epoch: 13 batch 609: forward pass 103.191 ms, 69.49 s total
[ 2023-10-08 01:12:41 ] Completed Epoch: 13 batch 609: backward pass 50.375 ms, 69.54 s total
[ 2023-10-08 01:12:41 ] Completed Epoch: 13 batch 609: computing loss 121.235 ms, 69.66 s total
EPOCH: [13], BATCH: [609/889], loss: 0.360, loss_box_reg: 0.111, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 609
[ 2023-10-08 01:12:42 ] Completed saving temp checkpoint 1,054.073 ms, 70.72 s total
[ 2023-10-08 01:12:43 ] Completed replacing temp checkpoint with checkpoint 65.187 ms, 70.78 s total
[ 2023-10-08 01:12:43 ] Completed Epoch: 13 batch 610: moving batch data to device 5.177 ms, 70.79 s total
[ 2023-10-08 01:12:43 ] Completed Epoch: 13 batch 610: forward pass 112.462 ms, 70.90 s total
[ 2023-10-08 01:12:43 ] Completed Epoch: 13 batch 610: backward pass 76.176 ms, 70.98 s total
[ 2023-10-08 01:12:43 ] Completed Epoch: 13 batch 610: computing loss 117.511 ms, 71.09 s total
EPOCH: [13], BATCH: [610/889], loss: 0.373, loss_box_reg: 0.111, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 610
[ 2023-10-08 01:12:44 ] Completed saving temp checkpoint 1,277.562 ms, 72.37 s total
[ 2023-10-08 01:12:44 ] Completed replacing temp checkpoint with checkpoint 56.640 ms, 72.43 s total
[ 2023-10-08 01:12:44 ] Completed Epoch: 13 batch 611: moving batch data to device 7.401 ms, 72.44 s total
[ 2023-10-08 01:12:44 ] Completed Epoch: 13 batch 611: forward pass 104.191 ms, 72.54 s total
[ 2023-10-08 01:12:44 ] Completed Epoch: 13 batch 611: backward pass 81.124 ms, 72.62 s total
[ 2023-10-08 01:12:44 ] Completed Epoch: 13 batch 611: computing loss 115.952 ms, 72.74 s total
EPOCH: [13], BATCH: [611/889], loss: 0.403, loss_box_reg: 0.123, loss_classifier: 0.102, loss_mask: 0.138, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 611
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 01:25:59 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 01:25:59 ] Completed importing Timer 0.022 ms, 0.00 s total
[ 2023-10-08 01:25:59 ] Completed importing everything else 509.578 ms, 0.51 s total
[ 2023-10-08 01:25:59 ] Completed defined other functions 0.023 ms, 0.51 s total
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 3): env://
| distributed init (rank 2): env://
| distributed init (rank 4): env://
| distributed init (rank 5): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 01:26:02 ] Completed main preliminaries 3,005.805 ms, 3.52 s total
loading annotations into memory...
Done (t=11.26s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-08 01:26:15 ] Completed loading data 13,090.801 ms, 16.61 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 01:26:15 ] Completed creating data samplers 109.291 ms, 16.72 s total
[ 2023-10-08 01:26:15 ] Completed creating data loaders 0.221 ms, 16.72 s total
[ 2023-10-08 01:26:16 ] Completed creating model and .to(device) 652.185 ms, 17.37 s total
[ 2023-10-08 01:26:17 ] Completed preparing model for distributed training 1,183.692 ms, 18.55 s total
[ 2023-10-08 01:26:17 ] Completed optimizer and scaler 0.556 ms, 18.55 s total
[ 2023-10-08 01:26:17 ] Completed learning rate schedulers 0.219 ms, 18.55 s total
[ 2023-10-08 01:26:18 ] Completed init coco evaluator 956.387 ms, 19.51 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 01:26:19 ] Completed retrieving checkpoint 897.609 ms, 20.41 s total
EPOCH :: 13
[ 2023-10-08 01:26:19 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 01:26:19 ] Completed training preliminaries 1.134 ms, 0.00 s total
Training / resuming epoch 13 from training step 611
[ 2023-10-08 01:26:20 ] Completed Epoch: 13 batch 611: moving batch data to device 497.458 ms, 0.50 s total
[ 2023-10-08 01:26:21 ] Completed Epoch: 13 batch 611: forward pass 1,049.936 ms, 1.55 s total
[ 2023-10-08 01:26:21 ] Completed Epoch: 13 batch 611: backward pass 181.788 ms, 1.73 s total
[ 2023-10-08 01:26:21 ] Completed Epoch: 13 batch 611: computing loss 521.949 ms, 2.25 s total
EPOCH: [13], BATCH: [611/889], loss: 0.401, loss_box_reg: 0.122, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 611
[ 2023-10-08 01:26:22 ] Completed saving temp checkpoint 1,040.024 ms, 3.29 s total
[ 2023-10-08 01:26:23 ] Completed replacing temp checkpoint with checkpoint 185.870 ms, 3.48 s total
[ 2023-10-08 01:26:23 ] Completed Epoch: 13 batch 612: moving batch data to device 7.712 ms, 3.49 s total
[ 2023-10-08 01:26:23 ] Completed Epoch: 13 batch 612: forward pass 115.920 ms, 3.60 s total
[ 2023-10-08 01:26:23 ] Completed Epoch: 13 batch 612: backward pass 99.098 ms, 3.70 s total
[ 2023-10-08 01:26:23 ] Completed Epoch: 13 batch 612: computing loss 125.932 ms, 3.83 s total
EPOCH: [13], BATCH: [612/889], loss: 0.409, loss_box_reg: 0.124, loss_classifier: 0.104, loss_mask: 0.138, loss_objectness: 0.014, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 612
[ 2023-10-08 01:26:24 ] Completed saving temp checkpoint 926.548 ms, 4.75 s total
[ 2023-10-08 01:26:24 ] Completed replacing temp checkpoint with checkpoint 45.651 ms, 4.80 s total
[ 2023-10-08 01:26:24 ] Completed Epoch: 13 batch 613: moving batch data to device 9.551 ms, 4.81 s total
[ 2023-10-08 01:26:24 ] Completed Epoch: 13 batch 613: forward pass 112.936 ms, 4.92 s total
[ 2023-10-08 01:26:24 ] Completed Epoch: 13 batch 613: backward pass 125.017 ms, 5.05 s total
[ 2023-10-08 01:26:24 ] Completed Epoch: 13 batch 613: computing loss 91.651 ms, 5.14 s total
EPOCH: [13], BATCH: [613/889], loss: 0.408, loss_box_reg: 0.125, loss_classifier: 0.099, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 613
[ 2023-10-08 01:26:26 ] Completed saving temp checkpoint 1,321.734 ms, 6.46 s total
[ 2023-10-08 01:26:26 ] Completed replacing temp checkpoint with checkpoint 95.027 ms, 6.55 s total
[ 2023-10-08 01:26:26 ] Completed Epoch: 13 batch 614: moving batch data to device 4.308 ms, 6.56 s total
[ 2023-10-08 01:26:26 ] Completed Epoch: 13 batch 614: forward pass 196.002 ms, 6.76 s total
[ 2023-10-08 01:26:26 ] Completed Epoch: 13 batch 614: backward pass 79.354 ms, 6.83 s total
[ 2023-10-08 01:26:26 ] Completed Epoch: 13 batch 614: computing loss 183.138 ms, 7.02 s total
EPOCH: [13], BATCH: [614/889], loss: 0.358, loss_box_reg: 0.107, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 614
[ 2023-10-08 01:26:28 ] Completed saving temp checkpoint 1,595.555 ms, 8.61 s total
[ 2023-10-08 01:26:28 ] Completed replacing temp checkpoint with checkpoint 108.370 ms, 8.72 s total
[ 2023-10-08 01:26:28 ] Completed Epoch: 13 batch 615: moving batch data to device 64.252 ms, 8.79 s total
[ 2023-10-08 01:26:28 ] Completed Epoch: 13 batch 615: forward pass 110.577 ms, 8.90 s total
[ 2023-10-08 01:26:28 ] Completed Epoch: 13 batch 615: backward pass 82.601 ms, 8.98 s total
[ 2023-10-08 01:26:28 ] Completed Epoch: 13 batch 615: computing loss 134.310 ms, 9.11 s total
EPOCH: [13], BATCH: [615/889], loss: 0.392, loss_box_reg: 0.115, loss_classifier: 0.096, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 615
[ 2023-10-08 01:26:30 ] Completed saving temp checkpoint 1,661.904 ms, 10.78 s total
[ 2023-10-08 01:26:30 ] Completed replacing temp checkpoint with checkpoint 52.645 ms, 10.83 s total
[ 2023-10-08 01:26:30 ] Completed Epoch: 13 batch 616: moving batch data to device 4.553 ms, 10.83 s total
[ 2023-10-08 01:26:30 ] Completed Epoch: 13 batch 616: forward pass 113.120 ms, 10.95 s total
[ 2023-10-08 01:26:30 ] Completed Epoch: 13 batch 616: backward pass 83.281 ms, 11.03 s total
[ 2023-10-08 01:26:30 ] Completed Epoch: 13 batch 616: computing loss 126.068 ms, 11.15 s total
EPOCH: [13], BATCH: [616/889], loss: 0.350, loss_box_reg: 0.101, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.012, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 616
[ 2023-10-08 01:26:31 ] Completed saving temp checkpoint 1,155.799 ms, 12.31 s total
[ 2023-10-08 01:26:31 ] Completed replacing temp checkpoint with checkpoint 79.360 ms, 12.39 s total
[ 2023-10-08 01:26:31 ] Completed Epoch: 13 batch 617: moving batch data to device 5.408 ms, 12.40 s total
[ 2023-10-08 01:26:32 ] Completed Epoch: 13 batch 617: forward pass 109.354 ms, 12.50 s total
[ 2023-10-08 01:26:32 ] Completed Epoch: 13 batch 617: backward pass 76.533 ms, 12.58 s total
[ 2023-10-08 01:26:32 ] Completed Epoch: 13 batch 617: computing loss 117.623 ms, 12.70 s total
EPOCH: [13], BATCH: [617/889], loss: 0.365, loss_box_reg: 0.109, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 617
[ 2023-10-08 01:26:33 ] Completed saving temp checkpoint 1,271.557 ms, 13.97 s total
[ 2023-10-08 01:26:33 ] Completed replacing temp checkpoint with checkpoint 72.455 ms, 14.04 s total
[ 2023-10-08 01:26:33 ] Completed Epoch: 13 batch 618: moving batch data to device 3.491 ms, 14.05 s total
[ 2023-10-08 01:26:33 ] Completed Epoch: 13 batch 618: forward pass 110.942 ms, 14.16 s total
[ 2023-10-08 01:26:33 ] Completed Epoch: 13 batch 618: backward pass 80.904 ms, 14.24 s total
[ 2023-10-08 01:26:33 ] Completed Epoch: 13 batch 618: computing loss 136.099 ms, 14.37 s total
EPOCH: [13], BATCH: [618/889], loss: 0.395, loss_box_reg: 0.121, loss_classifier: 0.099, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 618
[ 2023-10-08 01:26:35 ] Completed saving temp checkpoint 1,150.054 ms, 15.52 s total
[ 2023-10-08 01:26:35 ] Completed replacing temp checkpoint with checkpoint 70.008 ms, 15.59 s total
[ 2023-10-08 01:26:35 ] Completed Epoch: 13 batch 619: moving batch data to device 4.430 ms, 15.60 s total
[ 2023-10-08 01:26:35 ] Completed Epoch: 13 batch 619: forward pass 100.687 ms, 15.70 s total
[ 2023-10-08 01:26:35 ] Completed Epoch: 13 batch 619: backward pass 78.697 ms, 15.78 s total
[ 2023-10-08 01:26:35 ] Completed Epoch: 13 batch 619: computing loss 110.857 ms, 15.89 s total
EPOCH: [13], BATCH: [619/889], loss: 0.386, loss_box_reg: 0.113, loss_classifier: 0.097, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 619
[ 2023-10-08 01:26:36 ] Completed saving temp checkpoint 1,282.181 ms, 17.17 s total
[ 2023-10-08 01:26:36 ] Completed replacing temp checkpoint with checkpoint 57.692 ms, 17.23 s total
[ 2023-10-08 01:26:36 ] Completed Epoch: 13 batch 620: moving batch data to device 5.290 ms, 17.23 s total
[ 2023-10-08 01:26:36 ] Completed Epoch: 13 batch 620: forward pass 102.107 ms, 17.34 s total
[ 2023-10-08 01:26:36 ] Completed Epoch: 13 batch 620: backward pass 80.741 ms, 17.42 s total
[ 2023-10-08 01:26:37 ] Completed Epoch: 13 batch 620: computing loss 102.601 ms, 17.52 s total
EPOCH: [13], BATCH: [620/889], loss: 0.379, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.130, loss_objectness: 0.018, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 620
[ 2023-10-08 01:26:38 ] Completed saving temp checkpoint 1,150.917 ms, 18.67 s total
[ 2023-10-08 01:26:38 ] Completed replacing temp checkpoint with checkpoint 51.878 ms, 18.72 s total
[ 2023-10-08 01:26:38 ] Completed Epoch: 13 batch 621: moving batch data to device 7.596 ms, 18.73 s total
[ 2023-10-08 01:26:38 ] Completed Epoch: 13 batch 621: forward pass 105.605 ms, 18.84 s total
[ 2023-10-08 01:26:38 ] Completed Epoch: 13 batch 621: backward pass 38.539 ms, 18.87 s total
[ 2023-10-08 01:26:38 ] Completed Epoch: 13 batch 621: computing loss 160.603 ms, 19.03 s total
EPOCH: [13], BATCH: [621/889], loss: 0.372, loss_box_reg: 0.112, loss_classifier: 0.095, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 621
[ 2023-10-08 01:26:39 ] Completed saving temp checkpoint 1,271.449 ms, 20.31 s total
[ 2023-10-08 01:26:39 ] Completed replacing temp checkpoint with checkpoint 66.628 ms, 20.37 s total
[ 2023-10-08 01:26:39 ] Completed Epoch: 13 batch 622: moving batch data to device 6.467 ms, 20.38 s total
[ 2023-10-08 01:26:40 ] Completed Epoch: 13 batch 622: forward pass 105.812 ms, 20.49 s total
[ 2023-10-08 01:26:40 ] Completed Epoch: 13 batch 622: backward pass 80.841 ms, 20.57 s total
[ 2023-10-08 01:26:40 ] Completed Epoch: 13 batch 622: computing loss 118.994 ms, 20.69 s total
EPOCH: [13], BATCH: [622/889], loss: 0.399, loss_box_reg: 0.122, loss_classifier: 0.098, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 622
[ 2023-10-08 01:26:41 ] Completed saving temp checkpoint 1,686.178 ms, 22.37 s total
[ 2023-10-08 01:26:41 ] Completed replacing temp checkpoint with checkpoint 69.133 ms, 22.44 s total
[ 2023-10-08 01:26:41 ] Completed Epoch: 13 batch 623: moving batch data to device 5.748 ms, 22.45 s total
[ 2023-10-08 01:26:42 ] Completed Epoch: 13 batch 623: forward pass 105.687 ms, 22.55 s total
[ 2023-10-08 01:26:42 ] Completed Epoch: 13 batch 623: backward pass 40.227 ms, 22.59 s total
[ 2023-10-08 01:26:42 ] Completed Epoch: 13 batch 623: computing loss 323.023 ms, 22.92 s total
EPOCH: [13], BATCH: [623/889], loss: 0.373, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 623
[ 2023-10-08 01:26:44 ] Completed saving temp checkpoint 1,631.117 ms, 24.55 s total
[ 2023-10-08 01:26:44 ] Completed replacing temp checkpoint with checkpoint 80.850 ms, 24.63 s total
[ 2023-10-08 01:26:44 ] Completed Epoch: 13 batch 624: moving batch data to device 4.774 ms, 24.63 s total
[ 2023-10-08 01:26:44 ] Completed Epoch: 13 batch 624: forward pass 101.878 ms, 24.73 s total
[ 2023-10-08 01:26:44 ] Completed Epoch: 13 batch 624: backward pass 81.224 ms, 24.82 s total
[ 2023-10-08 01:26:44 ] Completed Epoch: 13 batch 624: computing loss 112.844 ms, 24.93 s total
EPOCH: [13], BATCH: [624/889], loss: 0.394, loss_box_reg: 0.119, loss_classifier: 0.096, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 624
[ 2023-10-08 01:26:45 ] Completed saving temp checkpoint 1,170.873 ms, 26.10 s total
[ 2023-10-08 01:26:45 ] Completed replacing temp checkpoint with checkpoint 73.072 ms, 26.17 s total
[ 2023-10-08 01:26:45 ] Completed Epoch: 13 batch 625: moving batch data to device 9.069 ms, 26.18 s total
[ 2023-10-08 01:26:45 ] Completed Epoch: 13 batch 625: forward pass 102.342 ms, 26.28 s total
[ 2023-10-08 01:26:45 ] Completed Epoch: 13 batch 625: backward pass 83.173 ms, 26.37 s total
[ 2023-10-08 01:26:46 ] Completed Epoch: 13 batch 625: computing loss 115.859 ms, 26.48 s total
EPOCH: [13], BATCH: [625/889], loss: 0.425, loss_box_reg: 0.134, loss_classifier: 0.107, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 625
[ 2023-10-08 01:26:47 ] Completed saving temp checkpoint 1,303.777 ms, 27.79 s total
[ 2023-10-08 01:26:47 ] Completed replacing temp checkpoint with checkpoint 59.016 ms, 27.85 s total
[ 2023-10-08 01:26:47 ] Completed Epoch: 13 batch 626: moving batch data to device 7.311 ms, 27.85 s total
[ 2023-10-08 01:26:47 ] Completed Epoch: 13 batch 626: forward pass 106.887 ms, 27.96 s total
[ 2023-10-08 01:26:47 ] Completed Epoch: 13 batch 626: backward pass 76.280 ms, 28.04 s total
[ 2023-10-08 01:26:47 ] Completed Epoch: 13 batch 626: computing loss 124.348 ms, 28.16 s total
EPOCH: [13], BATCH: [626/889], loss: 0.414, loss_box_reg: 0.127, loss_classifier: 0.105, loss_mask: 0.137, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 626
[ 2023-10-08 01:26:49 ] Completed saving temp checkpoint 1,338.932 ms, 29.50 s total
[ 2023-10-08 01:26:49 ] Completed replacing temp checkpoint with checkpoint 70.294 ms, 29.57 s total
[ 2023-10-08 01:26:49 ] Completed Epoch: 13 batch 627: moving batch data to device 4.636 ms, 29.57 s total
[ 2023-10-08 01:26:49 ] Completed Epoch: 13 batch 627: forward pass 106.219 ms, 29.68 s total
[ 2023-10-08 01:26:49 ] Completed Epoch: 13 batch 627: backward pass 78.222 ms, 29.76 s total
[ 2023-10-08 01:26:49 ] Completed Epoch: 13 batch 627: computing loss 116.318 ms, 29.87 s total
EPOCH: [13], BATCH: [627/889], loss: 0.388, loss_box_reg: 0.116, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 627
[ 2023-10-08 01:26:51 ] Completed saving temp checkpoint 2,039.982 ms, 31.91 s total
[ 2023-10-08 01:26:51 ] Completed replacing temp checkpoint with checkpoint 89.102 ms, 32.00 s total
[ 2023-10-08 01:26:51 ] Completed Epoch: 13 batch 628: moving batch data to device 6.281 ms, 32.01 s total
[ 2023-10-08 01:26:51 ] Completed Epoch: 13 batch 628: forward pass 103.716 ms, 32.11 s total
[ 2023-10-08 01:26:51 ] Completed Epoch: 13 batch 628: backward pass 68.459 ms, 32.18 s total
[ 2023-10-08 01:26:51 ] Completed Epoch: 13 batch 628: computing loss 125.045 ms, 32.31 s total
EPOCH: [13], BATCH: [628/889], loss: 0.418, loss_box_reg: 0.127, loss_classifier: 0.109, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 628
[ 2023-10-08 01:26:53 ] Completed saving temp checkpoint 1,203.892 ms, 33.51 s total
[ 2023-10-08 01:26:53 ] Completed replacing temp checkpoint with checkpoint 58.238 ms, 33.57 s total
[ 2023-10-08 01:26:53 ] Completed Epoch: 13 batch 629: moving batch data to device 5.605 ms, 33.57 s total
[ 2023-10-08 01:26:53 ] Completed Epoch: 13 batch 629: forward pass 103.654 ms, 33.68 s total
[ 2023-10-08 01:26:53 ] Completed Epoch: 13 batch 629: backward pass 73.198 ms, 33.75 s total
[ 2023-10-08 01:26:53 ] Completed Epoch: 13 batch 629: computing loss 123.245 ms, 33.87 s total
EPOCH: [13], BATCH: [629/889], loss: 0.381, loss_box_reg: 0.115, loss_classifier: 0.094, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 629
[ 2023-10-08 01:26:54 ] Completed saving temp checkpoint 1,206.513 ms, 35.08 s total
[ 2023-10-08 01:26:54 ] Completed replacing temp checkpoint with checkpoint 31.893 ms, 35.11 s total
[ 2023-10-08 01:26:54 ] Completed Epoch: 13 batch 630: moving batch data to device 5.955 ms, 35.12 s total
[ 2023-10-08 01:26:54 ] Completed Epoch: 13 batch 630: forward pass 107.414 ms, 35.23 s total
[ 2023-10-08 01:26:54 ] Completed Epoch: 13 batch 630: backward pass 74.254 ms, 35.30 s total
[ 2023-10-08 01:26:54 ] Completed Epoch: 13 batch 630: computing loss 128.171 ms, 35.43 s total
EPOCH: [13], BATCH: [630/889], loss: 0.417, loss_box_reg: 0.132, loss_classifier: 0.105, loss_mask: 0.135, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 630
[ 2023-10-08 01:26:56 ] Completed saving temp checkpoint 1,114.046 ms, 36.54 s total
[ 2023-10-08 01:26:56 ] Completed replacing temp checkpoint with checkpoint 56.312 ms, 36.60 s total
[ 2023-10-08 01:26:56 ] Completed Epoch: 13 batch 631: moving batch data to device 5.980 ms, 36.61 s total
[ 2023-10-08 01:26:56 ] Completed Epoch: 13 batch 631: forward pass 99.688 ms, 36.71 s total
[ 2023-10-08 01:26:56 ] Completed Epoch: 13 batch 631: backward pass 75.215 ms, 36.78 s total
[ 2023-10-08 01:26:56 ] Completed Epoch: 13 batch 631: computing loss 96.900 ms, 36.88 s total
EPOCH: [13], BATCH: [631/889], loss: 0.409, loss_box_reg: 0.121, loss_classifier: 0.107, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 631
[ 2023-10-08 01:26:57 ] Completed saving temp checkpoint 1,219.583 ms, 38.10 s total
[ 2023-10-08 01:26:57 ] Completed replacing temp checkpoint with checkpoint 73.627 ms, 38.17 s total
[ 2023-10-08 01:26:57 ] Completed Epoch: 13 batch 632: moving batch data to device 8.713 ms, 38.18 s total
[ 2023-10-08 01:26:57 ] Completed Epoch: 13 batch 632: forward pass 101.811 ms, 38.28 s total
[ 2023-10-08 01:26:57 ] Completed Epoch: 13 batch 632: backward pass 55.110 ms, 38.34 s total
[ 2023-10-08 01:26:58 ] Completed Epoch: 13 batch 632: computing loss 143.280 ms, 38.48 s total
EPOCH: [13], BATCH: [632/889], loss: 0.373, loss_box_reg: 0.109, loss_classifier: 0.093, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 632
[ 2023-10-08 01:26:59 ] Completed saving temp checkpoint 1,032.813 ms, 39.51 s total
[ 2023-10-08 01:26:59 ] Completed replacing temp checkpoint with checkpoint 73.521 ms, 39.59 s total
[ 2023-10-08 01:26:59 ] Completed Epoch: 13 batch 633: moving batch data to device 7.703 ms, 39.59 s total
[ 2023-10-08 01:26:59 ] Completed Epoch: 13 batch 633: forward pass 105.863 ms, 39.70 s total
[ 2023-10-08 01:26:59 ] Completed Epoch: 13 batch 633: backward pass 74.569 ms, 39.77 s total
[ 2023-10-08 01:26:59 ] Completed Epoch: 13 batch 633: computing loss 119.626 ms, 39.89 s total
EPOCH: [13], BATCH: [633/889], loss: 0.398, loss_box_reg: 0.117, loss_classifier: 0.101, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 633
[ 2023-10-08 01:27:00 ] Completed saving temp checkpoint 1,122.214 ms, 41.02 s total
[ 2023-10-08 01:27:00 ] Completed replacing temp checkpoint with checkpoint 65.341 ms, 41.08 s total
[ 2023-10-08 01:27:00 ] Completed Epoch: 13 batch 634: moving batch data to device 5.258 ms, 41.09 s total
[ 2023-10-08 01:27:00 ] Completed Epoch: 13 batch 634: forward pass 106.438 ms, 41.19 s total
[ 2023-10-08 01:27:00 ] Completed Epoch: 13 batch 634: backward pass 78.260 ms, 41.27 s total
[ 2023-10-08 01:27:00 ] Completed Epoch: 13 batch 634: computing loss 111.216 ms, 41.38 s total
EPOCH: [13], BATCH: [634/889], loss: 0.356, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.127, loss_objectness: 0.013, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 634
[ 2023-10-08 01:27:02 ] Completed saving temp checkpoint 1,449.220 ms, 42.83 s total
[ 2023-10-08 01:27:02 ] Completed replacing temp checkpoint with checkpoint 58.707 ms, 42.89 s total
[ 2023-10-08 01:27:02 ] Completed Epoch: 13 batch 635: moving batch data to device 7.167 ms, 42.90 s total
[ 2023-10-08 01:27:02 ] Completed Epoch: 13 batch 635: forward pass 108.423 ms, 43.01 s total
[ 2023-10-08 01:27:02 ] Completed Epoch: 13 batch 635: backward pass 70.360 ms, 43.08 s total
[ 2023-10-08 01:27:02 ] Completed Epoch: 13 batch 635: computing loss 123.413 ms, 43.20 s total
EPOCH: [13], BATCH: [635/889], loss: 0.375, loss_box_reg: 0.117, loss_classifier: 0.092, loss_mask: 0.133, loss_objectness: 0.014, loss_rpn_box_reg: 0.019
Saving checkpoint at epoch 13 train batch 635
[ 2023-10-08 01:27:04 ] Completed saving temp checkpoint 1,978.731 ms, 45.18 s total
[ 2023-10-08 01:27:04 ] Completed replacing temp checkpoint with checkpoint 47.950 ms, 45.23 s total
[ 2023-10-08 01:27:04 ] Completed Epoch: 13 batch 636: moving batch data to device 6.907 ms, 45.23 s total
[ 2023-10-08 01:27:04 ] Completed Epoch: 13 batch 636: forward pass 107.836 ms, 45.34 s total
[ 2023-10-08 01:27:04 ] Completed Epoch: 13 batch 636: backward pass 79.521 ms, 45.42 s total
[ 2023-10-08 01:27:05 ] Completed Epoch: 13 batch 636: computing loss 119.533 ms, 45.54 s total
EPOCH: [13], BATCH: [636/889], loss: 0.394, loss_box_reg: 0.121, loss_classifier: 0.098, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 636
[ 2023-10-08 01:27:06 ] Completed saving temp checkpoint 1,089.308 ms, 46.63 s total
[ 2023-10-08 01:27:06 ] Completed replacing temp checkpoint with checkpoint 60.625 ms, 46.69 s total
[ 2023-10-08 01:27:06 ] Completed Epoch: 13 batch 637: moving batch data to device 5.020 ms, 46.69 s total
[ 2023-10-08 01:27:06 ] Completed Epoch: 13 batch 637: forward pass 105.400 ms, 46.80 s total
[ 2023-10-08 01:27:06 ] Completed Epoch: 13 batch 637: backward pass 81.529 ms, 46.88 s total
[ 2023-10-08 01:27:06 ] Completed Epoch: 13 batch 637: computing loss 114.036 ms, 47.00 s total
EPOCH: [13], BATCH: [637/889], loss: 0.372, loss_box_reg: 0.114, loss_classifier: 0.093, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 637
[ 2023-10-08 01:27:07 ] Completed saving temp checkpoint 1,155.159 ms, 48.15 s total
[ 2023-10-08 01:27:07 ] Completed replacing temp checkpoint with checkpoint 59.118 ms, 48.21 s total
[ 2023-10-08 01:27:07 ] Completed Epoch: 13 batch 638: moving batch data to device 4.769 ms, 48.21 s total
[ 2023-10-08 01:27:07 ] Completed Epoch: 13 batch 638: forward pass 103.508 ms, 48.32 s total
[ 2023-10-08 01:27:07 ] Completed Epoch: 13 batch 638: backward pass 45.875 ms, 48.36 s total
[ 2023-10-08 01:27:08 ] Completed Epoch: 13 batch 638: computing loss 153.351 ms, 48.52 s total
EPOCH: [13], BATCH: [638/889], loss: 0.361, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.011, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 638
[ 2023-10-08 01:27:09 ] Completed saving temp checkpoint 1,189.140 ms, 49.71 s total
[ 2023-10-08 01:27:09 ] Completed replacing temp checkpoint with checkpoint 62.906 ms, 49.77 s total
[ 2023-10-08 01:27:09 ] Completed Epoch: 13 batch 639: moving batch data to device 6.781 ms, 49.78 s total
[ 2023-10-08 01:27:09 ] Completed Epoch: 13 batch 639: forward pass 103.470 ms, 49.88 s total
[ 2023-10-08 01:27:09 ] Completed Epoch: 13 batch 639: backward pass 76.585 ms, 49.96 s total
[ 2023-10-08 01:27:09 ] Completed Epoch: 13 batch 639: computing loss 115.785 ms, 50.07 s total
EPOCH: [13], BATCH: [639/889], loss: 0.401, loss_box_reg: 0.121, loss_classifier: 0.100, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 639
[ 2023-10-08 01:27:11 ] Completed saving temp checkpoint 1,532.919 ms, 51.61 s total
[ 2023-10-08 01:27:11 ] Completed replacing temp checkpoint with checkpoint 83.585 ms, 51.69 s total
[ 2023-10-08 01:27:11 ] Completed Epoch: 13 batch 640: moving batch data to device 5.001 ms, 51.69 s total
[ 2023-10-08 01:27:11 ] Completed Epoch: 13 batch 640: forward pass 101.052 ms, 51.79 s total
[ 2023-10-08 01:27:11 ] Completed Epoch: 13 batch 640: backward pass 74.435 ms, 51.87 s total
[ 2023-10-08 01:27:11 ] Completed Epoch: 13 batch 640: computing loss 126.832 ms, 52.00 s total
EPOCH: [13], BATCH: [640/889], loss: 0.393, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 640
[ 2023-10-08 01:27:13 ] Completed saving temp checkpoint 1,591.709 ms, 53.59 s total
[ 2023-10-08 01:27:13 ] Completed replacing temp checkpoint with checkpoint 101.947 ms, 53.69 s total
[ 2023-10-08 01:27:13 ] Completed Epoch: 13 batch 641: moving batch data to device 6.958 ms, 53.70 s total
[ 2023-10-08 01:27:13 ] Completed Epoch: 13 batch 641: forward pass 103.299 ms, 53.80 s total
[ 2023-10-08 01:27:13 ] Completed Epoch: 13 batch 641: backward pass 42.747 ms, 53.84 s total
[ 2023-10-08 01:27:13 ] Completed Epoch: 13 batch 641: computing loss 151.900 ms, 53.99 s total
EPOCH: [13], BATCH: [641/889], loss: 0.382, loss_box_reg: 0.117, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 641
[ 2023-10-08 01:27:14 ] Completed saving temp checkpoint 1,218.723 ms, 55.21 s total
[ 2023-10-08 01:27:14 ] Completed replacing temp checkpoint with checkpoint 79.785 ms, 55.29 s total
[ 2023-10-08 01:27:14 ] Completed Epoch: 13 batch 642: moving batch data to device 8.187 ms, 55.30 s total
[ 2023-10-08 01:27:14 ] Completed Epoch: 13 batch 642: forward pass 108.115 ms, 55.41 s total
[ 2023-10-08 01:27:15 ] Completed Epoch: 13 batch 642: backward pass 49.136 ms, 55.46 s total
[ 2023-10-08 01:27:15 ] Completed Epoch: 13 batch 642: computing loss 123.960 ms, 55.58 s total
EPOCH: [13], BATCH: [642/889], loss: 0.435, loss_box_reg: 0.133, loss_classifier: 0.110, loss_mask: 0.138, loss_objectness: 0.018, loss_rpn_box_reg: 0.036
Saving checkpoint at epoch 13 train batch 642
[ 2023-10-08 01:27:16 ] Completed saving temp checkpoint 982.046 ms, 56.56 s total
[ 2023-10-08 01:27:16 ] Completed replacing temp checkpoint with checkpoint 70.208 ms, 56.63 s total
[ 2023-10-08 01:27:16 ] Completed Epoch: 13 batch 643: moving batch data to device 8.227 ms, 56.64 s total
[ 2023-10-08 01:27:16 ] Completed Epoch: 13 batch 643: forward pass 107.122 ms, 56.75 s total
[ 2023-10-08 01:27:16 ] Completed Epoch: 13 batch 643: backward pass 71.203 ms, 56.82 s total
[ 2023-10-08 01:27:16 ] Completed Epoch: 13 batch 643: computing loss 125.222 ms, 56.95 s total
EPOCH: [13], BATCH: [643/889], loss: 0.409, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.141, loss_objectness: 0.015, loss_rpn_box_reg: 0.035
Saving checkpoint at epoch 13 train batch 643
[ 2023-10-08 01:27:17 ] Completed saving temp checkpoint 1,126.647 ms, 58.07 s total
[ 2023-10-08 01:27:17 ] Completed replacing temp checkpoint with checkpoint 75.256 ms, 58.15 s total
[ 2023-10-08 01:27:17 ] Completed Epoch: 13 batch 644: moving batch data to device 8.355 ms, 58.16 s total
[ 2023-10-08 01:27:17 ] Completed Epoch: 13 batch 644: forward pass 109.206 ms, 58.27 s total
[ 2023-10-08 01:27:17 ] Completed Epoch: 13 batch 644: backward pass 39.945 ms, 58.31 s total
[ 2023-10-08 01:27:18 ] Completed Epoch: 13 batch 644: computing loss 149.148 ms, 58.46 s total
EPOCH: [13], BATCH: [644/889], loss: 0.394, loss_box_reg: 0.118, loss_classifier: 0.103, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 644
[ 2023-10-08 01:27:18 ] Completed saving temp checkpoint 970.100 ms, 59.43 s total
[ 2023-10-08 01:27:19 ] Completed replacing temp checkpoint with checkpoint 66.525 ms, 59.49 s total
[ 2023-10-08 01:27:19 ] Completed Epoch: 13 batch 645: moving batch data to device 7.772 ms, 59.50 s total
[ 2023-10-08 01:27:19 ] Completed Epoch: 13 batch 645: forward pass 106.133 ms, 59.61 s total
[ 2023-10-08 01:27:19 ] Completed Epoch: 13 batch 645: backward pass 76.767 ms, 59.68 s total
[ 2023-10-08 01:27:19 ] Completed Epoch: 13 batch 645: computing loss 114.438 ms, 59.80 s total
EPOCH: [13], BATCH: [645/889], loss: 0.386, loss_box_reg: 0.118, loss_classifier: 0.092, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 645
[ 2023-10-08 01:27:20 ] Completed saving temp checkpoint 1,147.398 ms, 60.94 s total
[ 2023-10-08 01:27:20 ] Completed replacing temp checkpoint with checkpoint 54.441 ms, 61.00 s total
[ 2023-10-08 01:27:20 ] Completed Epoch: 13 batch 646: moving batch data to device 6.480 ms, 61.01 s total
[ 2023-10-08 01:27:20 ] Completed Epoch: 13 batch 646: forward pass 103.225 ms, 61.11 s total
[ 2023-10-08 01:27:20 ] Completed Epoch: 13 batch 646: backward pass 40.453 ms, 61.15 s total
[ 2023-10-08 01:27:20 ] Completed Epoch: 13 batch 646: computing loss 128.768 ms, 61.28 s total
EPOCH: [13], BATCH: [646/889], loss: 0.387, loss_box_reg: 0.115, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 646
[ 2023-10-08 01:27:21 ] Completed saving temp checkpoint 1,164.935 ms, 62.44 s total
[ 2023-10-08 01:27:22 ] Completed replacing temp checkpoint with checkpoint 81.469 ms, 62.52 s total
[ 2023-10-08 01:27:22 ] Completed Epoch: 13 batch 647: moving batch data to device 6.675 ms, 62.53 s total
[ 2023-10-08 01:27:22 ] Completed Epoch: 13 batch 647: forward pass 112.499 ms, 62.64 s total
[ 2023-10-08 01:27:22 ] Completed Epoch: 13 batch 647: backward pass 72.278 ms, 62.72 s total
[ 2023-10-08 01:27:22 ] Completed Epoch: 13 batch 647: computing loss 125.817 ms, 62.84 s total
EPOCH: [13], BATCH: [647/889], loss: 0.378, loss_box_reg: 0.114, loss_classifier: 0.097, loss_mask: 0.130, loss_objectness: 0.012, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 647
[ 2023-10-08 01:27:23 ] Completed saving temp checkpoint 1,533.710 ms, 64.38 s total
[ 2023-10-08 01:27:24 ] Completed replacing temp checkpoint with checkpoint 74.761 ms, 64.45 s total
[ 2023-10-08 01:27:24 ] Completed Epoch: 13 batch 648: moving batch data to device 5.846 ms, 64.46 s total
[ 2023-10-08 01:27:24 ] Completed Epoch: 13 batch 648: forward pass 104.873 ms, 64.56 s total
[ 2023-10-08 01:27:24 ] Completed Epoch: 13 batch 648: backward pass 73.055 ms, 64.63 s total
[ 2023-10-08 01:27:24 ] Completed Epoch: 13 batch 648: computing loss 123.586 ms, 64.76 s total
EPOCH: [13], BATCH: [648/889], loss: 0.376, loss_box_reg: 0.111, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.019, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 648
[ 2023-10-08 01:27:26 ] Completed saving temp checkpoint 1,706.145 ms, 66.46 s total
[ 2023-10-08 01:27:26 ] Completed replacing temp checkpoint with checkpoint 92.157 ms, 66.56 s total
[ 2023-10-08 01:27:26 ] Completed Epoch: 13 batch 649: moving batch data to device 11.081 ms, 66.57 s total
[ 2023-10-08 01:27:26 ] Completed Epoch: 13 batch 649: forward pass 103.347 ms, 66.67 s total
[ 2023-10-08 01:27:26 ] Completed Epoch: 13 batch 649: backward pass 71.364 ms, 66.74 s total
[ 2023-10-08 01:27:26 ] Completed Epoch: 13 batch 649: computing loss 166.295 ms, 66.91 s total
EPOCH: [13], BATCH: [649/889], loss: 0.393, loss_box_reg: 0.113, loss_classifier: 0.096, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.037
Saving checkpoint at epoch 13 train batch 649
[ 2023-10-08 01:27:27 ] Completed saving temp checkpoint 1,181.257 ms, 68.09 s total
[ 2023-10-08 01:27:27 ] Completed replacing temp checkpoint with checkpoint 62.903 ms, 68.15 s total
[ 2023-10-08 01:27:27 ] Completed Epoch: 13 batch 650: moving batch data to device 5.299 ms, 68.16 s total
[ 2023-10-08 01:27:27 ] Completed Epoch: 13 batch 650: forward pass 103.136 ms, 68.26 s total
[ 2023-10-08 01:27:27 ] Completed Epoch: 13 batch 650: backward pass 34.361 ms, 68.29 s total
[ 2023-10-08 01:27:28 ] Completed Epoch: 13 batch 650: computing loss 165.205 ms, 68.46 s total
EPOCH: [13], BATCH: [650/889], loss: 0.372, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.020, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 650
[ 2023-10-08 01:27:28 ] Completed saving temp checkpoint 976.753 ms, 69.44 s total
[ 2023-10-08 01:27:29 ] Completed replacing temp checkpoint with checkpoint 71.827 ms, 69.51 s total
[ 2023-10-08 01:27:29 ] Completed Epoch: 13 batch 651: moving batch data to device 7.175 ms, 69.52 s total
[ 2023-10-08 01:27:29 ] Completed Epoch: 13 batch 651: forward pass 102.954 ms, 69.62 s total
[ 2023-10-08 01:27:29 ] Completed Epoch: 13 batch 651: backward pass 34.690 ms, 69.65 s total
[ 2023-10-08 01:27:29 ] Completed Epoch: 13 batch 651: computing loss 168.277 ms, 69.82 s total
EPOCH: [13], BATCH: [651/889], loss: 0.405, loss_box_reg: 0.124, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 651
[ 2023-10-08 01:27:30 ] Completed saving temp checkpoint 1,567.003 ms, 71.39 s total
[ 2023-10-08 01:27:31 ] Completed replacing temp checkpoint with checkpoint 97.423 ms, 71.49 s total
[ 2023-10-08 01:27:31 ] Completed Epoch: 13 batch 652: moving batch data to device 6.931 ms, 71.49 s total
[ 2023-10-08 01:27:31 ] Completed Epoch: 13 batch 652: forward pass 108.909 ms, 71.60 s total
[ 2023-10-08 01:27:31 ] Completed Epoch: 13 batch 652: backward pass 47.988 ms, 71.65 s total
[ 2023-10-08 01:27:31 ] Completed Epoch: 13 batch 652: computing loss 147.720 ms, 71.80 s total
EPOCH: [13], BATCH: [652/889], loss: 0.382, loss_box_reg: 0.115, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 652
[ 2023-10-08 01:27:32 ] Completed saving temp checkpoint 1,358.632 ms, 73.16 s total
[ 2023-10-08 01:27:32 ] Completed replacing temp checkpoint with checkpoint 75.351 ms, 73.23 s total
[ 2023-10-08 01:27:32 ] Completed Epoch: 13 batch 653: moving batch data to device 7.899 ms, 73.24 s total
[ 2023-10-08 01:27:32 ] Completed Epoch: 13 batch 653: forward pass 106.045 ms, 73.35 s total
[ 2023-10-08 01:27:32 ] Completed Epoch: 13 batch 653: backward pass 76.226 ms, 73.42 s total
[ 2023-10-08 01:27:33 ] Completed Epoch: 13 batch 653: computing loss 116.567 ms, 73.54 s total
EPOCH: [13], BATCH: [653/889], loss: 0.398, loss_box_reg: 0.121, loss_classifier: 0.100, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 653
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 01:40:57 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 01:40:57 ] Completed importing Timer 0.021 ms, 0.00 s total
[ 2023-10-08 01:40:58 ] Completed importing everything else 554.587 ms, 0.55 s total
[ 2023-10-08 01:40:58 ] Completed defined other functions 0.021 ms, 0.55 s total
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 5): env://
| distributed init (rank 0): env://
| distributed init (rank 1): env://
| distributed init (rank 4): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 01:41:05 ] Completed main preliminaries 7,827.309 ms, 8.38 s total
loading annotations into memory...
Done (t=10.77s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-08 01:41:18 ] Completed loading data 12,577.176 ms, 20.96 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 01:41:18 ] Completed creating data samplers 105.694 ms, 21.06 s total
[ 2023-10-08 01:41:18 ] Completed creating data loaders 0.222 ms, 21.07 s total
[ 2023-10-08 01:41:19 ] Completed creating model and .to(device) 652.205 ms, 21.72 s total
[ 2023-10-08 01:41:20 ] Completed preparing model for distributed training 1,680.751 ms, 23.40 s total
[ 2023-10-08 01:41:20 ] Completed optimizer and scaler 0.622 ms, 23.40 s total
[ 2023-10-08 01:41:20 ] Completed learning rate schedulers 0.247 ms, 23.40 s total
[ 2023-10-08 01:41:21 ] Completed init coco evaluator 960.240 ms, 24.36 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 01:41:22 ] Completed retrieving checkpoint 875.605 ms, 25.23 s total
EPOCH :: 13
[ 2023-10-08 01:41:22 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 01:41:22 ] Completed training preliminaries 0.851 ms, 0.00 s total
Training / resuming epoch 13 from training step 653
[ 2023-10-08 01:41:23 ] Completed Epoch: 13 batch 653: moving batch data to device 444.854 ms, 0.45 s total
[ 2023-10-08 01:41:24 ] Completed Epoch: 13 batch 653: forward pass 1,278.660 ms, 1.72 s total
[ 2023-10-08 01:41:24 ] Completed Epoch: 13 batch 653: backward pass 172.484 ms, 1.90 s total
[ 2023-10-08 01:41:25 ] Completed Epoch: 13 batch 653: computing loss 440.910 ms, 2.34 s total
EPOCH: [13], BATCH: [653/889], loss: 0.396, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 653
[ 2023-10-08 01:41:25 ] Completed saving temp checkpoint 947.700 ms, 3.29 s total
[ 2023-10-08 01:41:26 ] Completed replacing temp checkpoint with checkpoint 153.532 ms, 3.44 s total
[ 2023-10-08 01:41:26 ] Completed Epoch: 13 batch 654: moving batch data to device 3.544 ms, 3.44 s total
[ 2023-10-08 01:41:26 ] Completed Epoch: 13 batch 654: forward pass 174.938 ms, 3.62 s total
[ 2023-10-08 01:41:26 ] Completed Epoch: 13 batch 654: backward pass 69.700 ms, 3.69 s total
[ 2023-10-08 01:41:26 ] Completed Epoch: 13 batch 654: computing loss 217.201 ms, 3.90 s total
EPOCH: [13], BATCH: [654/889], loss: 0.375, loss_box_reg: 0.111, loss_classifier: 0.100, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 654
[ 2023-10-08 01:41:27 ] Completed saving temp checkpoint 1,100.004 ms, 5.00 s total
[ 2023-10-08 01:41:27 ] Completed replacing temp checkpoint with checkpoint 74.076 ms, 5.08 s total
[ 2023-10-08 01:41:27 ] Completed Epoch: 13 batch 655: moving batch data to device 49.885 ms, 5.13 s total
[ 2023-10-08 01:41:27 ] Completed Epoch: 13 batch 655: forward pass 122.197 ms, 5.25 s total
[ 2023-10-08 01:41:28 ] Completed Epoch: 13 batch 655: backward pass 92.052 ms, 5.34 s total
[ 2023-10-08 01:41:28 ] Completed Epoch: 13 batch 655: computing loss 128.587 ms, 5.47 s total
EPOCH: [13], BATCH: [655/889], loss: 0.358, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.120, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 655
[ 2023-10-08 01:41:29 ] Completed saving temp checkpoint 971.711 ms, 6.44 s total
[ 2023-10-08 01:41:29 ] Completed replacing temp checkpoint with checkpoint 34.083 ms, 6.48 s total
[ 2023-10-08 01:41:29 ] Completed Epoch: 13 batch 656: moving batch data to device 2.167 ms, 6.48 s total
[ 2023-10-08 01:41:29 ] Completed Epoch: 13 batch 656: forward pass 210.397 ms, 6.69 s total
[ 2023-10-08 01:41:29 ] Completed Epoch: 13 batch 656: backward pass 79.622 ms, 6.77 s total
[ 2023-10-08 01:41:29 ] Completed Epoch: 13 batch 656: computing loss 133.401 ms, 6.90 s total
EPOCH: [13], BATCH: [656/889], loss: 0.400, loss_box_reg: 0.119, loss_classifier: 0.106, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 656
[ 2023-10-08 01:41:30 ] Completed saving temp checkpoint 942.621 ms, 7.85 s total
[ 2023-10-08 01:41:30 ] Completed replacing temp checkpoint with checkpoint 45.975 ms, 7.89 s total
[ 2023-10-08 01:41:30 ] Completed Epoch: 13 batch 657: moving batch data to device 8.210 ms, 7.90 s total
[ 2023-10-08 01:41:30 ] Completed Epoch: 13 batch 657: forward pass 104.455 ms, 8.00 s total
[ 2023-10-08 01:41:30 ] Completed Epoch: 13 batch 657: backward pass 38.325 ms, 8.04 s total
[ 2023-10-08 01:41:30 ] Completed Epoch: 13 batch 657: computing loss 177.051 ms, 8.22 s total
EPOCH: [13], BATCH: [657/889], loss: 0.380, loss_box_reg: 0.116, loss_classifier: 0.097, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 657
[ 2023-10-08 01:41:31 ] Completed saving temp checkpoint 784.786 ms, 9.00 s total
[ 2023-10-08 01:41:31 ] Completed replacing temp checkpoint with checkpoint 68.837 ms, 9.07 s total
[ 2023-10-08 01:41:31 ] Completed Epoch: 13 batch 658: moving batch data to device 4.648 ms, 9.08 s total
[ 2023-10-08 01:41:31 ] Completed Epoch: 13 batch 658: forward pass 111.186 ms, 9.19 s total
[ 2023-10-08 01:41:31 ] Completed Epoch: 13 batch 658: backward pass 82.817 ms, 9.27 s total
[ 2023-10-08 01:41:32 ] Completed Epoch: 13 batch 658: computing loss 240.436 ms, 9.51 s total
EPOCH: [13], BATCH: [658/889], loss: 0.402, loss_box_reg: 0.122, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 658
[ 2023-10-08 01:41:33 ] Completed saving temp checkpoint 818.963 ms, 10.33 s total
[ 2023-10-08 01:41:33 ] Completed replacing temp checkpoint with checkpoint 60.273 ms, 10.39 s total
[ 2023-10-08 01:41:33 ] Completed Epoch: 13 batch 659: moving batch data to device 2.434 ms, 10.39 s total
[ 2023-10-08 01:41:33 ] Completed Epoch: 13 batch 659: forward pass 100.733 ms, 10.49 s total
[ 2023-10-08 01:41:33 ] Completed Epoch: 13 batch 659: backward pass 47.044 ms, 10.54 s total
[ 2023-10-08 01:41:33 ] Completed Epoch: 13 batch 659: computing loss 160.422 ms, 10.70 s total
EPOCH: [13], BATCH: [659/889], loss: 0.399, loss_box_reg: 0.120, loss_classifier: 0.096, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 659
[ 2023-10-08 01:41:34 ] Completed saving temp checkpoint 1,434.919 ms, 12.14 s total
[ 2023-10-08 01:41:34 ] Completed replacing temp checkpoint with checkpoint 97.156 ms, 12.23 s total
[ 2023-10-08 01:41:34 ] Completed Epoch: 13 batch 660: moving batch data to device 10.033 ms, 12.24 s total
[ 2023-10-08 01:41:35 ] Completed Epoch: 13 batch 660: forward pass 111.820 ms, 12.36 s total
[ 2023-10-08 01:41:35 ] Completed Epoch: 13 batch 660: backward pass 73.202 ms, 12.43 s total
[ 2023-10-08 01:41:35 ] Completed Epoch: 13 batch 660: computing loss 124.009 ms, 12.55 s total
EPOCH: [13], BATCH: [660/889], loss: 0.393, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 660
[ 2023-10-08 01:41:37 ] Completed saving temp checkpoint 1,827.780 ms, 14.38 s total
[ 2023-10-08 01:41:37 ] Completed replacing temp checkpoint with checkpoint 99.834 ms, 14.48 s total
[ 2023-10-08 01:41:37 ] Completed Epoch: 13 batch 661: moving batch data to device 2.971 ms, 14.48 s total
[ 2023-10-08 01:41:37 ] Completed Epoch: 13 batch 661: forward pass 104.069 ms, 14.59 s total
[ 2023-10-08 01:41:37 ] Completed Epoch: 13 batch 661: backward pass 84.118 ms, 14.67 s total
[ 2023-10-08 01:41:37 ] Completed Epoch: 13 batch 661: computing loss 108.580 ms, 14.78 s total
EPOCH: [13], BATCH: [661/889], loss: 0.398, loss_box_reg: 0.119, loss_classifier: 0.104, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 661
[ 2023-10-08 01:41:39 ] Completed saving temp checkpoint 1,712.442 ms, 16.49 s total
[ 2023-10-08 01:41:39 ] Completed replacing temp checkpoint with checkpoint 59.641 ms, 16.55 s total
[ 2023-10-08 01:41:39 ] Completed Epoch: 13 batch 662: moving batch data to device 6.984 ms, 16.56 s total
[ 2023-10-08 01:41:39 ] Completed Epoch: 13 batch 662: forward pass 106.922 ms, 16.67 s total
[ 2023-10-08 01:41:39 ] Completed Epoch: 13 batch 662: backward pass 69.469 ms, 16.74 s total
[ 2023-10-08 01:41:39 ] Completed Epoch: 13 batch 662: computing loss 119.709 ms, 16.86 s total
EPOCH: [13], BATCH: [662/889], loss: 0.383, loss_box_reg: 0.110, loss_classifier: 0.095, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 662
[ 2023-10-08 01:41:41 ] Completed saving temp checkpoint 1,648.998 ms, 18.50 s total
[ 2023-10-08 01:41:41 ] Completed replacing temp checkpoint with checkpoint 68.605 ms, 18.57 s total
[ 2023-10-08 01:41:41 ] Completed Epoch: 13 batch 663: moving batch data to device 8.329 ms, 18.58 s total
[ 2023-10-08 01:41:41 ] Completed Epoch: 13 batch 663: forward pass 109.012 ms, 18.69 s total
[ 2023-10-08 01:41:41 ] Completed Epoch: 13 batch 663: backward pass 38.515 ms, 18.73 s total
[ 2023-10-08 01:41:41 ] Completed Epoch: 13 batch 663: computing loss 160.636 ms, 18.89 s total
EPOCH: [13], BATCH: [663/889], loss: 0.387, loss_box_reg: 0.119, loss_classifier: 0.101, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 663
[ 2023-10-08 01:41:42 ] Completed saving temp checkpoint 1,071.451 ms, 19.96 s total
[ 2023-10-08 01:41:42 ] Completed replacing temp checkpoint with checkpoint 71.871 ms, 20.03 s total
[ 2023-10-08 01:41:42 ] Completed Epoch: 13 batch 664: moving batch data to device 8.385 ms, 20.04 s total
[ 2023-10-08 01:41:42 ] Completed Epoch: 13 batch 664: forward pass 105.134 ms, 20.15 s total
[ 2023-10-08 01:41:42 ] Completed Epoch: 13 batch 664: backward pass 74.512 ms, 20.22 s total
[ 2023-10-08 01:41:43 ] Completed Epoch: 13 batch 664: computing loss 122.502 ms, 20.34 s total
EPOCH: [13], BATCH: [664/889], loss: 0.356, loss_box_reg: 0.105, loss_classifier: 0.093, loss_mask: 0.128, loss_objectness: 0.011, loss_rpn_box_reg: 0.019
Saving checkpoint at epoch 13 train batch 664
[ 2023-10-08 01:41:44 ] Completed saving temp checkpoint 1,198.784 ms, 21.54 s total
[ 2023-10-08 01:41:44 ] Completed replacing temp checkpoint with checkpoint 46.166 ms, 21.59 s total
[ 2023-10-08 01:41:44 ] Completed Epoch: 13 batch 665: moving batch data to device 5.238 ms, 21.59 s total
[ 2023-10-08 01:41:44 ] Completed Epoch: 13 batch 665: forward pass 106.567 ms, 21.70 s total
[ 2023-10-08 01:41:44 ] Completed Epoch: 13 batch 665: backward pass 78.298 ms, 21.78 s total
[ 2023-10-08 01:41:44 ] Completed Epoch: 13 batch 665: computing loss 121.125 ms, 21.90 s total
EPOCH: [13], BATCH: [665/889], loss: 0.392, loss_box_reg: 0.116, loss_classifier: 0.101, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 665
[ 2023-10-08 01:41:45 ] Completed saving temp checkpoint 1,030.479 ms, 22.93 s total
[ 2023-10-08 01:41:45 ] Completed replacing temp checkpoint with checkpoint 72.631 ms, 23.00 s total
[ 2023-10-08 01:41:45 ] Completed Epoch: 13 batch 666: moving batch data to device 9.490 ms, 23.01 s total
[ 2023-10-08 01:41:45 ] Completed Epoch: 13 batch 666: forward pass 107.482 ms, 23.12 s total
[ 2023-10-08 01:41:45 ] Completed Epoch: 13 batch 666: backward pass 72.195 ms, 23.19 s total
[ 2023-10-08 01:41:46 ] Completed Epoch: 13 batch 666: computing loss 299.857 ms, 23.49 s total
EPOCH: [13], BATCH: [666/889], loss: 0.433, loss_box_reg: 0.134, loss_classifier: 0.117, loss_mask: 0.139, loss_objectness: 0.019, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 666
[ 2023-10-08 01:41:47 ] Completed saving temp checkpoint 1,111.747 ms, 24.60 s total
[ 2023-10-08 01:41:47 ] Completed replacing temp checkpoint with checkpoint 63.096 ms, 24.67 s total
[ 2023-10-08 01:41:47 ] Completed Epoch: 13 batch 667: moving batch data to device 7.458 ms, 24.67 s total
[ 2023-10-08 01:41:47 ] Completed Epoch: 13 batch 667: forward pass 108.927 ms, 24.78 s total
[ 2023-10-08 01:41:47 ] Completed Epoch: 13 batch 667: backward pass 32.682 ms, 24.82 s total
[ 2023-10-08 01:41:47 ] Completed Epoch: 13 batch 667: computing loss 158.988 ms, 24.97 s total
EPOCH: [13], BATCH: [667/889], loss: 0.401, loss_box_reg: 0.120, loss_classifier: 0.098, loss_mask: 0.135, loss_objectness: 0.019, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 667
[ 2023-10-08 01:41:48 ] Completed saving temp checkpoint 1,016.217 ms, 25.99 s total
[ 2023-10-08 01:41:48 ] Completed replacing temp checkpoint with checkpoint 70.311 ms, 26.06 s total
[ 2023-10-08 01:41:48 ] Completed Epoch: 13 batch 668: moving batch data to device 7.867 ms, 26.07 s total
[ 2023-10-08 01:41:48 ] Completed Epoch: 13 batch 668: forward pass 102.700 ms, 26.17 s total
[ 2023-10-08 01:41:48 ] Completed Epoch: 13 batch 668: backward pass 78.047 ms, 26.25 s total
[ 2023-10-08 01:41:49 ] Completed Epoch: 13 batch 668: computing loss 128.564 ms, 26.38 s total
EPOCH: [13], BATCH: [668/889], loss: 0.362, loss_box_reg: 0.104, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 668
[ 2023-10-08 01:41:50 ] Completed saving temp checkpoint 1,069.003 ms, 27.45 s total
[ 2023-10-08 01:41:50 ] Completed replacing temp checkpoint with checkpoint 43.315 ms, 27.49 s total
[ 2023-10-08 01:41:50 ] Completed Epoch: 13 batch 669: moving batch data to device 8.262 ms, 27.50 s total
[ 2023-10-08 01:41:50 ] Completed Epoch: 13 batch 669: forward pass 108.305 ms, 27.61 s total
[ 2023-10-08 01:41:50 ] Completed Epoch: 13 batch 669: backward pass 78.412 ms, 27.69 s total
[ 2023-10-08 01:41:50 ] Completed Epoch: 13 batch 669: computing loss 116.174 ms, 27.80 s total
EPOCH: [13], BATCH: [669/889], loss: 0.393, loss_box_reg: 0.121, loss_classifier: 0.102, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 669
[ 2023-10-08 01:41:51 ] Completed saving temp checkpoint 995.723 ms, 28.80 s total
[ 2023-10-08 01:41:51 ] Completed replacing temp checkpoint with checkpoint 46.502 ms, 28.84 s total
[ 2023-10-08 01:41:51 ] Completed Epoch: 13 batch 670: moving batch data to device 7.742 ms, 28.85 s total
[ 2023-10-08 01:41:51 ] Completed Epoch: 13 batch 670: forward pass 123.522 ms, 28.98 s total
[ 2023-10-08 01:41:51 ] Completed Epoch: 13 batch 670: backward pass 47.861 ms, 29.02 s total
[ 2023-10-08 01:41:51 ] Completed Epoch: 13 batch 670: computing loss 146.909 ms, 29.17 s total
EPOCH: [13], BATCH: [670/889], loss: 0.357, loss_box_reg: 0.107, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 670
[ 2023-10-08 01:41:52 ] Completed saving temp checkpoint 1,040.632 ms, 30.21 s total
[ 2023-10-08 01:41:52 ] Completed replacing temp checkpoint with checkpoint 61.298 ms, 30.27 s total
[ 2023-10-08 01:41:52 ] Completed Epoch: 13 batch 671: moving batch data to device 7.500 ms, 30.28 s total
[ 2023-10-08 01:41:53 ] Completed Epoch: 13 batch 671: forward pass 105.598 ms, 30.39 s total
[ 2023-10-08 01:41:53 ] Completed Epoch: 13 batch 671: backward pass 74.517 ms, 30.46 s total
[ 2023-10-08 01:41:53 ] Completed Epoch: 13 batch 671: computing loss 112.008 ms, 30.57 s total
EPOCH: [13], BATCH: [671/889], loss: 0.390, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 671
[ 2023-10-08 01:41:54 ] Completed saving temp checkpoint 973.233 ms, 31.54 s total
[ 2023-10-08 01:41:54 ] Completed replacing temp checkpoint with checkpoint 47.194 ms, 31.59 s total
[ 2023-10-08 01:41:54 ] Completed Epoch: 13 batch 672: moving batch data to device 7.101 ms, 31.60 s total
[ 2023-10-08 01:41:54 ] Completed Epoch: 13 batch 672: forward pass 105.263 ms, 31.70 s total
[ 2023-10-08 01:41:54 ] Completed Epoch: 13 batch 672: backward pass 33.876 ms, 31.74 s total
[ 2023-10-08 01:41:54 ] Completed Epoch: 13 batch 672: computing loss 136.171 ms, 31.87 s total
EPOCH: [13], BATCH: [672/889], loss: 0.391, loss_box_reg: 0.120, loss_classifier: 0.101, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 672
[ 2023-10-08 01:41:56 ] Completed saving temp checkpoint 1,654.808 ms, 33.53 s total
[ 2023-10-08 01:41:56 ] Completed replacing temp checkpoint with checkpoint 45.881 ms, 33.58 s total
[ 2023-10-08 01:41:56 ] Completed Epoch: 13 batch 673: moving batch data to device 7.221 ms, 33.58 s total
[ 2023-10-08 01:41:56 ] Completed Epoch: 13 batch 673: forward pass 110.193 ms, 33.69 s total
[ 2023-10-08 01:41:56 ] Completed Epoch: 13 batch 673: backward pass 80.402 ms, 33.77 s total
[ 2023-10-08 01:41:56 ] Completed Epoch: 13 batch 673: computing loss 110.067 ms, 33.88 s total
EPOCH: [13], BATCH: [673/889], loss: 0.409, loss_box_reg: 0.123, loss_classifier: 0.105, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 673
[ 2023-10-08 01:41:57 ] Completed saving temp checkpoint 1,313.587 ms, 35.20 s total
[ 2023-10-08 01:41:57 ] Completed replacing temp checkpoint with checkpoint 87.593 ms, 35.28 s total
[ 2023-10-08 01:41:57 ] Completed Epoch: 13 batch 674: moving batch data to device 6.822 ms, 35.29 s total
[ 2023-10-08 01:41:58 ] Completed Epoch: 13 batch 674: forward pass 108.130 ms, 35.40 s total
[ 2023-10-08 01:41:58 ] Completed Epoch: 13 batch 674: backward pass 69.549 ms, 35.47 s total
[ 2023-10-08 01:41:58 ] Completed Epoch: 13 batch 674: computing loss 124.153 ms, 35.59 s total
EPOCH: [13], BATCH: [674/889], loss: 0.392, loss_box_reg: 0.117, loss_classifier: 0.102, loss_mask: 0.137, loss_objectness: 0.016, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 674
[ 2023-10-08 01:41:59 ] Completed saving temp checkpoint 1,167.439 ms, 36.76 s total
[ 2023-10-08 01:41:59 ] Completed replacing temp checkpoint with checkpoint 81.407 ms, 36.84 s total
[ 2023-10-08 01:41:59 ] Completed Epoch: 13 batch 675: moving batch data to device 6.320 ms, 36.85 s total
[ 2023-10-08 01:41:59 ] Completed Epoch: 13 batch 675: forward pass 100.317 ms, 36.95 s total
[ 2023-10-08 01:41:59 ] Completed Epoch: 13 batch 675: backward pass 73.816 ms, 37.02 s total
[ 2023-10-08 01:41:59 ] Completed Epoch: 13 batch 675: computing loss 108.512 ms, 37.13 s total
EPOCH: [13], BATCH: [675/889], loss: 0.386, loss_box_reg: 0.116, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 675
[ 2023-10-08 01:42:00 ] Completed saving temp checkpoint 995.105 ms, 38.13 s total
[ 2023-10-08 01:42:00 ] Completed replacing temp checkpoint with checkpoint 51.829 ms, 38.18 s total
[ 2023-10-08 01:42:00 ] Completed Epoch: 13 batch 676: moving batch data to device 6.515 ms, 38.18 s total
[ 2023-10-08 01:42:00 ] Completed Epoch: 13 batch 676: forward pass 103.576 ms, 38.29 s total
[ 2023-10-08 01:42:01 ] Completed Epoch: 13 batch 676: backward pass 79.647 ms, 38.37 s total
[ 2023-10-08 01:42:01 ] Completed Epoch: 13 batch 676: computing loss 112.609 ms, 38.48 s total
EPOCH: [13], BATCH: [676/889], loss: 0.409, loss_box_reg: 0.126, loss_classifier: 0.100, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 676
[ 2023-10-08 01:42:02 ] Completed saving temp checkpoint 1,182.308 ms, 39.66 s total
[ 2023-10-08 01:42:02 ] Completed replacing temp checkpoint with checkpoint 74.461 ms, 39.74 s total
[ 2023-10-08 01:42:02 ] Completed Epoch: 13 batch 677: moving batch data to device 6.410 ms, 39.74 s total
[ 2023-10-08 01:42:02 ] Completed Epoch: 13 batch 677: forward pass 106.038 ms, 39.85 s total
[ 2023-10-08 01:42:02 ] Completed Epoch: 13 batch 677: backward pass 77.005 ms, 39.93 s total
[ 2023-10-08 01:42:02 ] Completed Epoch: 13 batch 677: computing loss 98.853 ms, 40.02 s total
EPOCH: [13], BATCH: [677/889], loss: 0.346, loss_box_reg: 0.098, loss_classifier: 0.085, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 677
[ 2023-10-08 01:42:03 ] Completed saving temp checkpoint 952.381 ms, 40.98 s total
[ 2023-10-08 01:42:03 ] Completed replacing temp checkpoint with checkpoint 68.040 ms, 41.05 s total
[ 2023-10-08 01:42:03 ] Completed Epoch: 13 batch 678: moving batch data to device 6.858 ms, 41.05 s total
[ 2023-10-08 01:42:03 ] Completed Epoch: 13 batch 678: forward pass 101.482 ms, 41.15 s total
[ 2023-10-08 01:42:03 ] Completed Epoch: 13 batch 678: backward pass 78.074 ms, 41.23 s total
[ 2023-10-08 01:42:04 ] Completed Epoch: 13 batch 678: computing loss 117.767 ms, 41.35 s total
EPOCH: [13], BATCH: [678/889], loss: 0.393, loss_box_reg: 0.113, loss_classifier: 0.098, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 678
[ 2023-10-08 01:42:05 ] Completed saving temp checkpoint 1,109.014 ms, 42.46 s total
[ 2023-10-08 01:42:05 ] Completed replacing temp checkpoint with checkpoint 30.677 ms, 42.49 s total
[ 2023-10-08 01:42:05 ] Completed Epoch: 13 batch 679: moving batch data to device 5.161 ms, 42.49 s total
[ 2023-10-08 01:42:05 ] Completed Epoch: 13 batch 679: forward pass 106.157 ms, 42.60 s total
[ 2023-10-08 01:42:05 ] Completed Epoch: 13 batch 679: backward pass 41.448 ms, 42.64 s total
[ 2023-10-08 01:42:05 ] Completed Epoch: 13 batch 679: computing loss 156.883 ms, 42.80 s total
EPOCH: [13], BATCH: [679/889], loss: 0.372, loss_box_reg: 0.107, loss_classifier: 0.098, loss_mask: 0.125, loss_objectness: 0.018, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 679
[ 2023-10-08 01:42:07 ] Completed saving temp checkpoint 1,687.780 ms, 44.49 s total
[ 2023-10-08 01:42:07 ] Completed replacing temp checkpoint with checkpoint 40.496 ms, 44.53 s total
[ 2023-10-08 01:42:07 ] Completed Epoch: 13 batch 680: moving batch data to device 6.126 ms, 44.53 s total
[ 2023-10-08 01:42:07 ] Completed Epoch: 13 batch 680: forward pass 105.650 ms, 44.64 s total
[ 2023-10-08 01:42:07 ] Completed Epoch: 13 batch 680: backward pass 79.108 ms, 44.72 s total
[ 2023-10-08 01:42:07 ] Completed Epoch: 13 batch 680: computing loss 90.343 ms, 44.81 s total
EPOCH: [13], BATCH: [680/889], loss: 0.378, loss_box_reg: 0.113, loss_classifier: 0.097, loss_mask: 0.124, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 680
[ 2023-10-08 01:42:09 ] Completed saving temp checkpoint 1,771.296 ms, 46.58 s total
[ 2023-10-08 01:42:09 ] Completed replacing temp checkpoint with checkpoint 43.076 ms, 46.62 s total
[ 2023-10-08 01:42:09 ] Completed Epoch: 13 batch 681: moving batch data to device 6.770 ms, 46.63 s total
[ 2023-10-08 01:42:09 ] Completed Epoch: 13 batch 681: forward pass 107.463 ms, 46.74 s total
[ 2023-10-08 01:42:09 ] Completed Epoch: 13 batch 681: backward pass 94.715 ms, 46.83 s total
[ 2023-10-08 01:42:09 ] Completed Epoch: 13 batch 681: computing loss 109.641 ms, 46.94 s total
EPOCH: [13], BATCH: [681/889], loss: 0.399, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 681
[ 2023-10-08 01:42:10 ] Completed saving temp checkpoint 1,016.200 ms, 47.96 s total
[ 2023-10-08 01:42:10 ] Completed replacing temp checkpoint with checkpoint 72.297 ms, 48.03 s total
[ 2023-10-08 01:42:10 ] Completed Epoch: 13 batch 682: moving batch data to device 6.808 ms, 48.04 s total
[ 2023-10-08 01:42:10 ] Completed Epoch: 13 batch 682: forward pass 107.287 ms, 48.14 s total
[ 2023-10-08 01:42:10 ] Completed Epoch: 13 batch 682: backward pass 77.950 ms, 48.22 s total
[ 2023-10-08 01:42:11 ] Completed Epoch: 13 batch 682: computing loss 106.716 ms, 48.33 s total
EPOCH: [13], BATCH: [682/889], loss: 0.448, loss_box_reg: 0.142, loss_classifier: 0.116, loss_mask: 0.140, loss_objectness: 0.017, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 682
[ 2023-10-08 01:42:12 ] Completed saving temp checkpoint 1,075.568 ms, 49.40 s total
[ 2023-10-08 01:42:12 ] Completed replacing temp checkpoint with checkpoint 53.984 ms, 49.46 s total
[ 2023-10-08 01:42:12 ] Completed Epoch: 13 batch 683: moving batch data to device 5.938 ms, 49.46 s total
[ 2023-10-08 01:42:12 ] Completed Epoch: 13 batch 683: forward pass 101.217 ms, 49.57 s total
[ 2023-10-08 01:42:12 ] Completed Epoch: 13 batch 683: backward pass 53.267 ms, 49.62 s total
[ 2023-10-08 01:42:12 ] Completed Epoch: 13 batch 683: computing loss 139.362 ms, 49.76 s total
EPOCH: [13], BATCH: [683/889], loss: 0.380, loss_box_reg: 0.116, loss_classifier: 0.102, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 683
[ 2023-10-08 01:42:13 ] Completed saving temp checkpoint 1,195.542 ms, 50.95 s total
[ 2023-10-08 01:42:13 ] Completed replacing temp checkpoint with checkpoint 80.424 ms, 51.03 s total
[ 2023-10-08 01:42:13 ] Completed Epoch: 13 batch 684: moving batch data to device 5.885 ms, 51.04 s total
[ 2023-10-08 01:42:13 ] Completed Epoch: 13 batch 684: forward pass 109.662 ms, 51.15 s total
[ 2023-10-08 01:42:13 ] Completed Epoch: 13 batch 684: backward pass 80.762 ms, 51.23 s total
[ 2023-10-08 01:42:14 ] Completed Epoch: 13 batch 684: computing loss 88.349 ms, 51.32 s total
EPOCH: [13], BATCH: [684/889], loss: 0.381, loss_box_reg: 0.114, loss_classifier: 0.094, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 684
[ 2023-10-08 01:42:15 ] Completed saving temp checkpoint 1,083.350 ms, 52.40 s total
[ 2023-10-08 01:42:15 ] Completed replacing temp checkpoint with checkpoint 78.441 ms, 52.48 s total
[ 2023-10-08 01:42:15 ] Completed Epoch: 13 batch 685: moving batch data to device 7.356 ms, 52.49 s total
[ 2023-10-08 01:42:15 ] Completed Epoch: 13 batch 685: forward pass 100.686 ms, 52.59 s total
[ 2023-10-08 01:42:15 ] Completed Epoch: 13 batch 685: backward pass 55.602 ms, 52.64 s total
[ 2023-10-08 01:42:15 ] Completed Epoch: 13 batch 685: computing loss 134.624 ms, 52.78 s total
EPOCH: [13], BATCH: [685/889], loss: 0.414, loss_box_reg: 0.118, loss_classifier: 0.113, loss_mask: 0.132, loss_objectness: 0.019, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 685
[ 2023-10-08 01:42:16 ] Completed saving temp checkpoint 1,007.810 ms, 53.79 s total
[ 2023-10-08 01:42:16 ] Completed replacing temp checkpoint with checkpoint 74.274 ms, 53.86 s total
[ 2023-10-08 01:42:16 ] Completed Epoch: 13 batch 686: moving batch data to device 6.450 ms, 53.87 s total
[ 2023-10-08 01:42:16 ] Completed Epoch: 13 batch 686: forward pass 104.544 ms, 53.97 s total
[ 2023-10-08 01:42:16 ] Completed Epoch: 13 batch 686: backward pass 65.891 ms, 54.04 s total
[ 2023-10-08 01:42:16 ] Completed Epoch: 13 batch 686: computing loss 131.755 ms, 54.17 s total
EPOCH: [13], BATCH: [686/889], loss: 0.409, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.136, loss_objectness: 0.016, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 686
[ 2023-10-08 01:42:18 ] Completed saving temp checkpoint 1,589.437 ms, 55.76 s total
[ 2023-10-08 01:42:18 ] Completed replacing temp checkpoint with checkpoint 77.717 ms, 55.84 s total
[ 2023-10-08 01:42:18 ] Completed Epoch: 13 batch 687: moving batch data to device 7.200 ms, 55.84 s total
[ 2023-10-08 01:42:18 ] Completed Epoch: 13 batch 687: forward pass 108.212 ms, 55.95 s total
[ 2023-10-08 01:42:18 ] Completed Epoch: 13 batch 687: backward pass 46.510 ms, 56.00 s total
[ 2023-10-08 01:42:18 ] Completed Epoch: 13 batch 687: computing loss 146.865 ms, 56.15 s total
EPOCH: [13], BATCH: [687/889], loss: 0.353, loss_box_reg: 0.102, loss_classifier: 0.089, loss_mask: 0.127, loss_objectness: 0.013, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 687
[ 2023-10-08 01:42:20 ] Completed saving temp checkpoint 1,399.759 ms, 57.55 s total
[ 2023-10-08 01:42:20 ] Completed replacing temp checkpoint with checkpoint 40.991 ms, 57.59 s total
[ 2023-10-08 01:42:20 ] Completed Epoch: 13 batch 688: moving batch data to device 6.492 ms, 57.59 s total
[ 2023-10-08 01:42:20 ] Completed Epoch: 13 batch 688: forward pass 102.481 ms, 57.70 s total
[ 2023-10-08 01:42:20 ] Completed Epoch: 13 batch 688: backward pass 56.902 ms, 57.75 s total
[ 2023-10-08 01:42:20 ] Completed Epoch: 13 batch 688: computing loss 134.132 ms, 57.89 s total
EPOCH: [13], BATCH: [688/889], loss: 0.403, loss_box_reg: 0.120, loss_classifier: 0.103, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 688
[ 2023-10-08 01:42:21 ] Completed saving temp checkpoint 1,358.930 ms, 59.25 s total
[ 2023-10-08 01:42:22 ] Completed replacing temp checkpoint with checkpoint 58.851 ms, 59.30 s total
[ 2023-10-08 01:42:22 ] Completed Epoch: 13 batch 689: moving batch data to device 8.436 ms, 59.31 s total
[ 2023-10-08 01:42:22 ] Completed Epoch: 13 batch 689: forward pass 106.338 ms, 59.42 s total
[ 2023-10-08 01:42:22 ] Completed Epoch: 13 batch 689: backward pass 72.708 ms, 59.49 s total
[ 2023-10-08 01:42:22 ] Completed Epoch: 13 batch 689: computing loss 119.460 ms, 59.61 s total
EPOCH: [13], BATCH: [689/889], loss: 0.404, loss_box_reg: 0.120, loss_classifier: 0.104, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 689
[ 2023-10-08 01:42:23 ] Completed saving temp checkpoint 1,015.835 ms, 60.63 s total
[ 2023-10-08 01:42:23 ] Completed replacing temp checkpoint with checkpoint 46.606 ms, 60.67 s total
[ 2023-10-08 01:42:23 ] Completed Epoch: 13 batch 690: moving batch data to device 6.203 ms, 60.68 s total
[ 2023-10-08 01:42:23 ] Completed Epoch: 13 batch 690: forward pass 106.271 ms, 60.79 s total
[ 2023-10-08 01:42:23 ] Completed Epoch: 13 batch 690: backward pass 47.428 ms, 60.83 s total
[ 2023-10-08 01:42:23 ] Completed Epoch: 13 batch 690: computing loss 144.883 ms, 60.98 s total
EPOCH: [13], BATCH: [690/889], loss: 0.373, loss_box_reg: 0.107, loss_classifier: 0.095, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 690
[ 2023-10-08 01:42:24 ] Completed saving temp checkpoint 1,158.842 ms, 62.14 s total
[ 2023-10-08 01:42:24 ] Completed replacing temp checkpoint with checkpoint 81.188 ms, 62.22 s total
[ 2023-10-08 01:42:24 ] Completed Epoch: 13 batch 691: moving batch data to device 8.954 ms, 62.23 s total
[ 2023-10-08 01:42:25 ] Completed Epoch: 13 batch 691: forward pass 110.753 ms, 62.34 s total
[ 2023-10-08 01:42:25 ] Completed Epoch: 13 batch 691: backward pass 80.227 ms, 62.42 s total
[ 2023-10-08 01:42:25 ] Completed Epoch: 13 batch 691: computing loss 114.722 ms, 62.53 s total
EPOCH: [13], BATCH: [691/889], loss: 0.359, loss_box_reg: 0.108, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 691
[ 2023-10-08 01:42:26 ] Completed saving temp checkpoint 1,018.327 ms, 63.55 s total
[ 2023-10-08 01:42:26 ] Completed replacing temp checkpoint with checkpoint 58.493 ms, 63.61 s total
[ 2023-10-08 01:42:26 ] Completed Epoch: 13 batch 692: moving batch data to device 6.287 ms, 63.62 s total
[ 2023-10-08 01:42:26 ] Completed Epoch: 13 batch 692: forward pass 104.703 ms, 63.72 s total
[ 2023-10-08 01:42:26 ] Completed Epoch: 13 batch 692: backward pass 76.836 ms, 63.80 s total
[ 2023-10-08 01:42:26 ] Completed Epoch: 13 batch 692: computing loss 118.377 ms, 63.92 s total
EPOCH: [13], BATCH: [692/889], loss: 0.370, loss_box_reg: 0.109, loss_classifier: 0.092, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 692
[ 2023-10-08 01:42:27 ] Completed saving temp checkpoint 1,162.709 ms, 65.08 s total
[ 2023-10-08 01:42:27 ] Completed replacing temp checkpoint with checkpoint 77.880 ms, 65.16 s total
[ 2023-10-08 01:42:27 ] Completed Epoch: 13 batch 693: moving batch data to device 6.269 ms, 65.16 s total
[ 2023-10-08 01:42:27 ] Completed Epoch: 13 batch 693: forward pass 110.648 ms, 65.27 s total
[ 2023-10-08 01:42:28 ] Completed Epoch: 13 batch 693: backward pass 69.785 ms, 65.34 s total
[ 2023-10-08 01:42:28 ] Completed Epoch: 13 batch 693: computing loss 113.928 ms, 65.46 s total
EPOCH: [13], BATCH: [693/889], loss: 0.415, loss_box_reg: 0.124, loss_classifier: 0.105, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 693
[ 2023-10-08 01:42:29 ] Completed saving temp checkpoint 1,014.684 ms, 66.47 s total
[ 2023-10-08 01:42:29 ] Completed replacing temp checkpoint with checkpoint 52.195 ms, 66.52 s total
[ 2023-10-08 01:42:29 ] Completed Epoch: 13 batch 694: moving batch data to device 7.269 ms, 66.53 s total
[ 2023-10-08 01:42:29 ] Completed Epoch: 13 batch 694: forward pass 101.947 ms, 66.63 s total
[ 2023-10-08 01:42:29 ] Completed Epoch: 13 batch 694: backward pass 33.888 ms, 66.67 s total
[ 2023-10-08 01:42:29 ] Completed Epoch: 13 batch 694: computing loss 136.936 ms, 66.80 s total
EPOCH: [13], BATCH: [694/889], loss: 0.435, loss_box_reg: 0.133, loss_classifier: 0.110, loss_mask: 0.138, loss_objectness: 0.020, loss_rpn_box_reg: 0.035
Saving checkpoint at epoch 13 train batch 694
[ 2023-10-08 01:42:30 ] Completed saving temp checkpoint 1,192.971 ms, 68.00 s total
[ 2023-10-08 01:42:30 ] Completed replacing temp checkpoint with checkpoint 58.126 ms, 68.05 s total
[ 2023-10-08 01:42:30 ] Completed Epoch: 13 batch 695: moving batch data to device 5.855 ms, 68.06 s total
[ 2023-10-08 01:42:30 ] Completed Epoch: 13 batch 695: forward pass 102.710 ms, 68.16 s total
[ 2023-10-08 01:42:30 ] Completed Epoch: 13 batch 695: backward pass 81.669 ms, 68.25 s total
[ 2023-10-08 01:42:31 ] Completed Epoch: 13 batch 695: computing loss 121.380 ms, 68.37 s total
EPOCH: [13], BATCH: [695/889], loss: 0.406, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 695
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 01:55:45 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 01:55:45 ] Completed importing Timer 0.021 ms, 0.00 s total
[ 2023-10-08 01:55:45 ] Completed importing everything else 526.177 ms, 0.53 s total
[ 2023-10-08 01:55:45 ] Completed defined other functions 0.023 ms, 0.53 s total
| distributed init (rank 4): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 5): env://
| distributed init (rank 0): env://
| distributed init (rank 1): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 01:55:48 ] Completed main preliminaries 2,670.097 ms, 3.20 s total
loading annotations into memory...
Done (t=11.33s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-08 01:56:01 ] Completed loading data 13,273.341 ms, 16.47 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 01:56:01 ] Completed creating data samplers 105.959 ms, 16.58 s total
[ 2023-10-08 01:56:01 ] Completed creating data loaders 0.215 ms, 16.58 s total
[ 2023-10-08 01:56:02 ] Completed creating model and .to(device) 638.304 ms, 17.21 s total
[ 2023-10-08 01:56:03 ] Completed preparing model for distributed training 1,416.382 ms, 18.63 s total
[ 2023-10-08 01:56:03 ] Completed optimizer and scaler 0.626 ms, 18.63 s total
[ 2023-10-08 01:56:03 ] Completed learning rate schedulers 0.258 ms, 18.63 s total
[ 2023-10-08 01:56:04 ] Completed init coco evaluator 951.953 ms, 19.58 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 01:56:05 ] Completed retrieving checkpoint 929.446 ms, 20.51 s total
EPOCH :: 13
[ 2023-10-08 01:56:05 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 01:56:05 ] Completed training preliminaries 0.913 ms, 0.00 s total
Training / resuming epoch 13 from training step 695
[ 2023-10-08 01:56:06 ] Completed Epoch: 13 batch 695: moving batch data to device 522.244 ms, 0.52 s total
[ 2023-10-08 01:56:07 ] Completed Epoch: 13 batch 695: forward pass 846.108 ms, 1.37 s total
[ 2023-10-08 01:56:07 ] Completed Epoch: 13 batch 695: backward pass 177.229 ms, 1.55 s total
[ 2023-10-08 01:56:07 ] Completed Epoch: 13 batch 695: computing loss 527.307 ms, 2.07 s total
EPOCH: [13], BATCH: [695/889], loss: 0.404, loss_box_reg: 0.122, loss_classifier: 0.103, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 695
[ 2023-10-08 01:56:08 ] Completed saving temp checkpoint 826.990 ms, 2.90 s total
[ 2023-10-08 01:56:08 ] Completed replacing temp checkpoint with checkpoint 178.703 ms, 3.08 s total
[ 2023-10-08 01:56:08 ] Completed Epoch: 13 batch 696: moving batch data to device 3.936 ms, 3.08 s total
[ 2023-10-08 01:56:08 ] Completed Epoch: 13 batch 696: forward pass 109.345 ms, 3.19 s total
[ 2023-10-08 01:56:08 ] Completed Epoch: 13 batch 696: backward pass 128.837 ms, 3.32 s total
[ 2023-10-08 01:56:09 ] Completed Epoch: 13 batch 696: computing loss 97.569 ms, 3.42 s total
EPOCH: [13], BATCH: [696/889], loss: 0.364, loss_box_reg: 0.109, loss_classifier: 0.086, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 696
[ 2023-10-08 01:56:10 ] Completed saving temp checkpoint 1,047.460 ms, 4.47 s total
[ 2023-10-08 01:56:10 ] Completed replacing temp checkpoint with checkpoint 52.011 ms, 4.52 s total
[ 2023-10-08 01:56:10 ] Completed Epoch: 13 batch 697: moving batch data to device 3.586 ms, 4.52 s total
[ 2023-10-08 01:56:10 ] Completed Epoch: 13 batch 697: forward pass 110.296 ms, 4.63 s total
[ 2023-10-08 01:56:10 ] Completed Epoch: 13 batch 697: backward pass 70.152 ms, 4.70 s total
[ 2023-10-08 01:56:10 ] Completed Epoch: 13 batch 697: computing loss 146.967 ms, 4.85 s total
EPOCH: [13], BATCH: [697/889], loss: 0.394, loss_box_reg: 0.118, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 697
[ 2023-10-08 01:56:11 ] Completed saving temp checkpoint 1,014.375 ms, 5.86 s total
[ 2023-10-08 01:56:11 ] Completed replacing temp checkpoint with checkpoint 73.752 ms, 5.94 s total
[ 2023-10-08 01:56:11 ] Completed Epoch: 13 batch 698: moving batch data to device 13.309 ms, 5.95 s total
[ 2023-10-08 01:56:11 ] Completed Epoch: 13 batch 698: forward pass 107.339 ms, 6.06 s total
[ 2023-10-08 01:56:11 ] Completed Epoch: 13 batch 698: backward pass 81.096 ms, 6.14 s total
[ 2023-10-08 01:56:11 ] Completed Epoch: 13 batch 698: computing loss 136.091 ms, 6.28 s total
EPOCH: [13], BATCH: [698/889], loss: 0.394, loss_box_reg: 0.120, loss_classifier: 0.094, loss_mask: 0.139, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 698
[ 2023-10-08 01:56:12 ] Completed saving temp checkpoint 997.170 ms, 7.27 s total
[ 2023-10-08 01:56:12 ] Completed replacing temp checkpoint with checkpoint 70.209 ms, 7.34 s total
[ 2023-10-08 01:56:12 ] Completed Epoch: 13 batch 699: moving batch data to device 2.890 ms, 7.35 s total
[ 2023-10-08 01:56:13 ] Completed Epoch: 13 batch 699: forward pass 113.591 ms, 7.46 s total
[ 2023-10-08 01:56:13 ] Completed Epoch: 13 batch 699: backward pass 78.322 ms, 7.54 s total
[ 2023-10-08 01:56:13 ] Completed Epoch: 13 batch 699: computing loss 134.492 ms, 7.67 s total
EPOCH: [13], BATCH: [699/889], loss: 0.380, loss_box_reg: 0.119, loss_classifier: 0.095, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 699
[ 2023-10-08 01:56:14 ] Completed saving temp checkpoint 861.805 ms, 8.53 s total
[ 2023-10-08 01:56:14 ] Completed replacing temp checkpoint with checkpoint 63.080 ms, 8.60 s total
[ 2023-10-08 01:56:14 ] Completed Epoch: 13 batch 700: moving batch data to device 4.526 ms, 8.60 s total
[ 2023-10-08 01:56:14 ] Completed Epoch: 13 batch 700: forward pass 107.501 ms, 8.71 s total
[ 2023-10-08 01:56:14 ] Completed Epoch: 13 batch 700: backward pass 79.823 ms, 8.79 s total
[ 2023-10-08 01:56:14 ] Completed Epoch: 13 batch 700: computing loss 130.156 ms, 8.92 s total
EPOCH: [13], BATCH: [700/889], loss: 0.380, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 700
[ 2023-10-08 01:56:15 ] Completed saving temp checkpoint 1,366.562 ms, 10.29 s total
[ 2023-10-08 01:56:16 ] Completed replacing temp checkpoint with checkpoint 89.388 ms, 10.38 s total
[ 2023-10-08 01:56:16 ] Completed Epoch: 13 batch 701: moving batch data to device 8.327 ms, 10.38 s total
[ 2023-10-08 01:56:16 ] Completed Epoch: 13 batch 701: forward pass 107.247 ms, 10.49 s total
[ 2023-10-08 01:56:16 ] Completed Epoch: 13 batch 701: backward pass 44.721 ms, 10.54 s total
[ 2023-10-08 01:56:16 ] Completed Epoch: 13 batch 701: computing loss 168.888 ms, 10.70 s total
EPOCH: [13], BATCH: [701/889], loss: 0.351, loss_box_reg: 0.107, loss_classifier: 0.087, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 701
[ 2023-10-08 01:56:17 ] Completed saving temp checkpoint 1,253.711 ms, 11.96 s total
[ 2023-10-08 01:56:17 ] Completed replacing temp checkpoint with checkpoint 86.242 ms, 12.04 s total
[ 2023-10-08 01:56:17 ] Completed Epoch: 13 batch 702: moving batch data to device 13.674 ms, 12.06 s total
[ 2023-10-08 01:56:17 ] Completed Epoch: 13 batch 702: forward pass 104.892 ms, 12.16 s total
[ 2023-10-08 01:56:17 ] Completed Epoch: 13 batch 702: backward pass 79.969 ms, 12.24 s total
[ 2023-10-08 01:56:17 ] Completed Epoch: 13 batch 702: computing loss 114.798 ms, 12.36 s total
EPOCH: [13], BATCH: [702/889], loss: 0.386, loss_box_reg: 0.108, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 702
[ 2023-10-08 01:56:19 ] Completed saving temp checkpoint 1,793.261 ms, 14.15 s total
[ 2023-10-08 01:56:19 ] Completed replacing temp checkpoint with checkpoint 83.514 ms, 14.23 s total
[ 2023-10-08 01:56:19 ] Completed Epoch: 13 batch 703: moving batch data to device 2.646 ms, 14.24 s total
[ 2023-10-08 01:56:19 ] Completed Epoch: 13 batch 703: forward pass 107.618 ms, 14.34 s total
[ 2023-10-08 01:56:20 ] Completed Epoch: 13 batch 703: backward pass 69.917 ms, 14.41 s total
[ 2023-10-08 01:56:20 ] Completed Epoch: 13 batch 703: computing loss 129.319 ms, 14.54 s total
EPOCH: [13], BATCH: [703/889], loss: 0.405, loss_box_reg: 0.122, loss_classifier: 0.104, loss_mask: 0.129, loss_objectness: 0.018, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 703
[ 2023-10-08 01:56:21 ] Completed saving temp checkpoint 1,582.468 ms, 16.13 s total
[ 2023-10-08 01:56:21 ] Completed replacing temp checkpoint with checkpoint 95.820 ms, 16.22 s total
[ 2023-10-08 01:56:21 ] Completed Epoch: 13 batch 704: moving batch data to device 8.976 ms, 16.23 s total
[ 2023-10-08 01:56:21 ] Completed Epoch: 13 batch 704: forward pass 105.875 ms, 16.34 s total
[ 2023-10-08 01:56:22 ] Completed Epoch: 13 batch 704: backward pass 47.861 ms, 16.38 s total
[ 2023-10-08 01:56:22 ] Completed Epoch: 13 batch 704: computing loss 194.741 ms, 16.58 s total
EPOCH: [13], BATCH: [704/889], loss: 0.375, loss_box_reg: 0.115, loss_classifier: 0.094, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 704
[ 2023-10-08 01:56:23 ] Completed saving temp checkpoint 1,493.570 ms, 18.07 s total
[ 2023-10-08 01:56:23 ] Completed replacing temp checkpoint with checkpoint 93.600 ms, 18.17 s total
[ 2023-10-08 01:56:23 ] Completed Epoch: 13 batch 705: moving batch data to device 9.541 ms, 18.18 s total
[ 2023-10-08 01:56:23 ] Completed Epoch: 13 batch 705: forward pass 108.814 ms, 18.29 s total
[ 2023-10-08 01:56:24 ] Completed Epoch: 13 batch 705: backward pass 79.995 ms, 18.37 s total
[ 2023-10-08 01:56:24 ] Completed Epoch: 13 batch 705: computing loss 130.809 ms, 18.50 s total
EPOCH: [13], BATCH: [705/889], loss: 0.374, loss_box_reg: 0.108, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 705
[ 2023-10-08 01:56:25 ] Completed saving temp checkpoint 1,073.757 ms, 19.57 s total
[ 2023-10-08 01:56:25 ] Completed replacing temp checkpoint with checkpoint 59.390 ms, 19.63 s total
[ 2023-10-08 01:56:25 ] Completed Epoch: 13 batch 706: moving batch data to device 6.408 ms, 19.64 s total
[ 2023-10-08 01:56:25 ] Completed Epoch: 13 batch 706: forward pass 109.973 ms, 19.75 s total
[ 2023-10-08 01:56:25 ] Completed Epoch: 13 batch 706: backward pass 79.279 ms, 19.82 s total
[ 2023-10-08 01:56:25 ] Completed Epoch: 13 batch 706: computing loss 118.458 ms, 19.94 s total
EPOCH: [13], BATCH: [706/889], loss: 0.366, loss_box_reg: 0.111, loss_classifier: 0.095, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.019
Saving checkpoint at epoch 13 train batch 706
[ 2023-10-08 01:56:26 ] Completed saving temp checkpoint 1,119.954 ms, 21.06 s total
[ 2023-10-08 01:56:26 ] Completed replacing temp checkpoint with checkpoint 78.490 ms, 21.14 s total
[ 2023-10-08 01:56:26 ] Completed Epoch: 13 batch 707: moving batch data to device 7.441 ms, 21.15 s total
[ 2023-10-08 01:56:26 ] Completed Epoch: 13 batch 707: forward pass 107.054 ms, 21.26 s total
[ 2023-10-08 01:56:26 ] Completed Epoch: 13 batch 707: backward pass 42.669 ms, 21.30 s total
[ 2023-10-08 01:56:27 ] Completed Epoch: 13 batch 707: computing loss 209.375 ms, 21.51 s total
EPOCH: [13], BATCH: [707/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.017, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 707
[ 2023-10-08 01:56:28 ] Completed saving temp checkpoint 980.745 ms, 22.49 s total
[ 2023-10-08 01:56:28 ] Completed replacing temp checkpoint with checkpoint 57.256 ms, 22.55 s total
[ 2023-10-08 01:56:28 ] Completed Epoch: 13 batch 708: moving batch data to device 8.483 ms, 22.55 s total
[ 2023-10-08 01:56:28 ] Completed Epoch: 13 batch 708: forward pass 106.493 ms, 22.66 s total
[ 2023-10-08 01:56:28 ] Completed Epoch: 13 batch 708: backward pass 50.786 ms, 22.71 s total
[ 2023-10-08 01:56:28 ] Completed Epoch: 13 batch 708: computing loss 141.346 ms, 22.85 s total
EPOCH: [13], BATCH: [708/889], loss: 0.375, loss_box_reg: 0.116, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 708
[ 2023-10-08 01:56:29 ] Completed saving temp checkpoint 1,077.884 ms, 23.93 s total
[ 2023-10-08 01:56:29 ] Completed replacing temp checkpoint with checkpoint 76.120 ms, 24.01 s total
[ 2023-10-08 01:56:29 ] Completed Epoch: 13 batch 709: moving batch data to device 8.650 ms, 24.02 s total
[ 2023-10-08 01:56:29 ] Completed Epoch: 13 batch 709: forward pass 106.965 ms, 24.12 s total
[ 2023-10-08 01:56:29 ] Completed Epoch: 13 batch 709: backward pass 75.958 ms, 24.20 s total
[ 2023-10-08 01:56:29 ] Completed Epoch: 13 batch 709: computing loss 120.747 ms, 24.32 s total
EPOCH: [13], BATCH: [709/889], loss: 0.390, loss_box_reg: 0.120, loss_classifier: 0.092, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 709
[ 2023-10-08 01:56:31 ] Completed saving temp checkpoint 1,060.664 ms, 25.38 s total
[ 2023-10-08 01:56:31 ] Completed replacing temp checkpoint with checkpoint 46.704 ms, 25.43 s total
[ 2023-10-08 01:56:31 ] Completed Epoch: 13 batch 710: moving batch data to device 8.214 ms, 25.44 s total
[ 2023-10-08 01:56:31 ] Completed Epoch: 13 batch 710: forward pass 106.622 ms, 25.54 s total
[ 2023-10-08 01:56:31 ] Completed Epoch: 13 batch 710: backward pass 77.176 ms, 25.62 s total
[ 2023-10-08 01:56:31 ] Completed Epoch: 13 batch 710: computing loss 114.604 ms, 25.73 s total
EPOCH: [13], BATCH: [710/889], loss: 0.420, loss_box_reg: 0.130, loss_classifier: 0.106, loss_mask: 0.142, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 710
[ 2023-10-08 01:56:32 ] Completed saving temp checkpoint 1,171.822 ms, 26.91 s total
[ 2023-10-08 01:56:32 ] Completed replacing temp checkpoint with checkpoint 64.597 ms, 26.97 s total
[ 2023-10-08 01:56:32 ] Completed Epoch: 13 batch 711: moving batch data to device 4.608 ms, 26.97 s total
[ 2023-10-08 01:56:32 ] Completed Epoch: 13 batch 711: forward pass 108.088 ms, 27.08 s total
[ 2023-10-08 01:56:32 ] Completed Epoch: 13 batch 711: backward pass 43.021 ms, 27.13 s total
[ 2023-10-08 01:56:32 ] Completed Epoch: 13 batch 711: computing loss 155.182 ms, 27.28 s total
EPOCH: [13], BATCH: [711/889], loss: 0.373, loss_box_reg: 0.114, loss_classifier: 0.097, loss_mask: 0.128, loss_objectness: 0.012, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 711
[ 2023-10-08 01:56:33 ] Completed saving temp checkpoint 1,049.312 ms, 28.33 s total
[ 2023-10-08 01:56:34 ] Completed replacing temp checkpoint with checkpoint 73.100 ms, 28.40 s total
[ 2023-10-08 01:56:34 ] Completed Epoch: 13 batch 712: moving batch data to device 6.781 ms, 28.41 s total
[ 2023-10-08 01:56:34 ] Completed Epoch: 13 batch 712: forward pass 107.216 ms, 28.52 s total
[ 2023-10-08 01:56:34 ] Completed Epoch: 13 batch 712: backward pass 69.358 ms, 28.59 s total
[ 2023-10-08 01:56:34 ] Completed Epoch: 13 batch 712: computing loss 304.600 ms, 28.89 s total
EPOCH: [13], BATCH: [712/889], loss: 0.403, loss_box_reg: 0.125, loss_classifier: 0.099, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 712
[ 2023-10-08 01:56:35 ] Completed saving temp checkpoint 1,144.952 ms, 30.04 s total
[ 2023-10-08 01:56:35 ] Completed replacing temp checkpoint with checkpoint 56.406 ms, 30.09 s total
[ 2023-10-08 01:56:35 ] Completed Epoch: 13 batch 713: moving batch data to device 4.452 ms, 30.10 s total
[ 2023-10-08 01:56:35 ] Completed Epoch: 13 batch 713: forward pass 105.055 ms, 30.20 s total
[ 2023-10-08 01:56:35 ] Completed Epoch: 13 batch 713: backward pass 69.900 ms, 30.27 s total
[ 2023-10-08 01:56:36 ] Completed Epoch: 13 batch 713: computing loss 127.278 ms, 30.40 s total
EPOCH: [13], BATCH: [713/889], loss: 0.341, loss_box_reg: 0.100, loss_classifier: 0.081, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 713
[ 2023-10-08 01:56:37 ] Completed saving temp checkpoint 1,167.554 ms, 31.57 s total
[ 2023-10-08 01:56:37 ] Completed replacing temp checkpoint with checkpoint 66.526 ms, 31.63 s total
[ 2023-10-08 01:56:37 ] Completed Epoch: 13 batch 714: moving batch data to device 7.025 ms, 31.64 s total
[ 2023-10-08 01:56:37 ] Completed Epoch: 13 batch 714: forward pass 103.546 ms, 31.74 s total
[ 2023-10-08 01:56:37 ] Completed Epoch: 13 batch 714: backward pass 34.969 ms, 31.78 s total
[ 2023-10-08 01:56:37 ] Completed Epoch: 13 batch 714: computing loss 135.131 ms, 31.91 s total
EPOCH: [13], BATCH: [714/889], loss: 0.372, loss_box_reg: 0.118, loss_classifier: 0.092, loss_mask: 0.124, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 714
[ 2023-10-08 01:56:39 ] Completed saving temp checkpoint 1,806.228 ms, 33.72 s total
[ 2023-10-08 01:56:39 ] Completed replacing temp checkpoint with checkpoint 80.123 ms, 33.80 s total
[ 2023-10-08 01:56:39 ] Completed Epoch: 13 batch 715: moving batch data to device 8.613 ms, 33.81 s total
[ 2023-10-08 01:56:39 ] Completed Epoch: 13 batch 715: forward pass 105.220 ms, 33.91 s total
[ 2023-10-08 01:56:39 ] Completed Epoch: 13 batch 715: backward pass 42.116 ms, 33.96 s total
[ 2023-10-08 01:56:39 ] Completed Epoch: 13 batch 715: computing loss 146.858 ms, 34.10 s total
EPOCH: [13], BATCH: [715/889], loss: 0.346, loss_box_reg: 0.105, loss_classifier: 0.089, loss_mask: 0.117, loss_objectness: 0.013, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 715
[ 2023-10-08 01:56:41 ] Completed saving temp checkpoint 1,519.652 ms, 35.62 s total
[ 2023-10-08 01:56:41 ] Completed replacing temp checkpoint with checkpoint 68.269 ms, 35.69 s total
[ 2023-10-08 01:56:41 ] Completed Epoch: 13 batch 716: moving batch data to device 7.116 ms, 35.70 s total
[ 2023-10-08 01:56:41 ] Completed Epoch: 13 batch 716: forward pass 106.004 ms, 35.80 s total
[ 2023-10-08 01:56:41 ] Completed Epoch: 13 batch 716: backward pass 76.106 ms, 35.88 s total
[ 2023-10-08 01:56:41 ] Completed Epoch: 13 batch 716: computing loss 111.928 ms, 35.99 s total
EPOCH: [13], BATCH: [716/889], loss: 0.362, loss_box_reg: 0.104, loss_classifier: 0.098, loss_mask: 0.120, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 716
[ 2023-10-08 01:56:42 ] Completed saving temp checkpoint 1,241.774 ms, 37.23 s total
[ 2023-10-08 01:56:42 ] Completed replacing temp checkpoint with checkpoint 103.407 ms, 37.34 s total
[ 2023-10-08 01:56:42 ] Completed Epoch: 13 batch 717: moving batch data to device 7.517 ms, 37.35 s total
[ 2023-10-08 01:56:43 ] Completed Epoch: 13 batch 717: forward pass 108.986 ms, 37.45 s total
[ 2023-10-08 01:56:43 ] Completed Epoch: 13 batch 717: backward pass 44.657 ms, 37.50 s total
[ 2023-10-08 01:56:43 ] Completed Epoch: 13 batch 717: computing loss 150.311 ms, 37.65 s total
EPOCH: [13], BATCH: [717/889], loss: 0.371, loss_box_reg: 0.114, loss_classifier: 0.095, loss_mask: 0.123, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 717
[ 2023-10-08 01:56:44 ] Completed saving temp checkpoint 1,026.821 ms, 38.68 s total
[ 2023-10-08 01:56:44 ] Completed replacing temp checkpoint with checkpoint 66.710 ms, 38.74 s total
[ 2023-10-08 01:56:44 ] Completed Epoch: 13 batch 718: moving batch data to device 7.947 ms, 38.75 s total
[ 2023-10-08 01:56:44 ] Completed Epoch: 13 batch 718: forward pass 111.484 ms, 38.86 s total
[ 2023-10-08 01:56:44 ] Completed Epoch: 13 batch 718: backward pass 72.845 ms, 38.93 s total
[ 2023-10-08 01:56:44 ] Completed Epoch: 13 batch 718: computing loss 122.257 ms, 39.06 s total
EPOCH: [13], BATCH: [718/889], loss: 0.428, loss_box_reg: 0.131, loss_classifier: 0.112, loss_mask: 0.133, loss_objectness: 0.019, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 718
[ 2023-10-08 01:56:45 ] Completed saving temp checkpoint 1,077.112 ms, 40.13 s total
[ 2023-10-08 01:56:45 ] Completed replacing temp checkpoint with checkpoint 87.054 ms, 40.22 s total
[ 2023-10-08 01:56:45 ] Completed Epoch: 13 batch 719: moving batch data to device 7.410 ms, 40.23 s total
[ 2023-10-08 01:56:45 ] Completed Epoch: 13 batch 719: forward pass 108.744 ms, 40.34 s total
[ 2023-10-08 01:56:46 ] Completed Epoch: 13 batch 719: backward pass 72.093 ms, 40.41 s total
[ 2023-10-08 01:56:46 ] Completed Epoch: 13 batch 719: computing loss 123.600 ms, 40.53 s total
EPOCH: [13], BATCH: [719/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.104, loss_mask: 0.134, loss_objectness: 0.017, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 719
[ 2023-10-08 01:56:47 ] Completed saving temp checkpoint 1,011.889 ms, 41.54 s total
[ 2023-10-08 01:56:47 ] Completed replacing temp checkpoint with checkpoint 53.091 ms, 41.60 s total
[ 2023-10-08 01:56:47 ] Completed Epoch: 13 batch 720: moving batch data to device 7.950 ms, 41.61 s total
[ 2023-10-08 01:56:47 ] Completed Epoch: 13 batch 720: forward pass 101.506 ms, 41.71 s total
[ 2023-10-08 01:56:47 ] Completed Epoch: 13 batch 720: backward pass 66.120 ms, 41.77 s total
[ 2023-10-08 01:56:47 ] Completed Epoch: 13 batch 720: computing loss 222.902 ms, 42.00 s total
EPOCH: [13], BATCH: [720/889], loss: 0.368, loss_box_reg: 0.105, loss_classifier: 0.092, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 720
[ 2023-10-08 01:56:48 ] Completed saving temp checkpoint 1,100.939 ms, 43.10 s total
[ 2023-10-08 01:56:48 ] Completed replacing temp checkpoint with checkpoint 78.717 ms, 43.18 s total
[ 2023-10-08 01:56:48 ] Completed Epoch: 13 batch 721: moving batch data to device 10.851 ms, 43.19 s total
[ 2023-10-08 01:56:48 ] Completed Epoch: 13 batch 721: forward pass 103.992 ms, 43.29 s total
[ 2023-10-08 01:56:49 ] Completed Epoch: 13 batch 721: backward pass 72.754 ms, 43.36 s total
[ 2023-10-08 01:56:49 ] Completed Epoch: 13 batch 721: computing loss 123.795 ms, 43.49 s total
EPOCH: [13], BATCH: [721/889], loss: 0.374, loss_box_reg: 0.110, loss_classifier: 0.093, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 721
[ 2023-10-08 01:56:50 ] Completed saving temp checkpoint 1,393.385 ms, 44.88 s total
[ 2023-10-08 01:56:50 ] Completed replacing temp checkpoint with checkpoint 76.370 ms, 44.96 s total
[ 2023-10-08 01:56:50 ] Completed Epoch: 13 batch 722: moving batch data to device 6.245 ms, 44.96 s total
[ 2023-10-08 01:56:50 ] Completed Epoch: 13 batch 722: forward pass 105.886 ms, 45.07 s total
[ 2023-10-08 01:56:50 ] Completed Epoch: 13 batch 722: backward pass 80.388 ms, 45.15 s total
[ 2023-10-08 01:56:50 ] Completed Epoch: 13 batch 722: computing loss 90.171 ms, 45.24 s total
EPOCH: [13], BATCH: [722/889], loss: 0.372, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 722
[ 2023-10-08 01:56:52 ] Completed saving temp checkpoint 1,972.839 ms, 47.21 s total
[ 2023-10-08 01:56:52 ] Completed replacing temp checkpoint with checkpoint 92.623 ms, 47.31 s total
[ 2023-10-08 01:56:52 ] Completed Epoch: 13 batch 723: moving batch data to device 6.275 ms, 47.31 s total
[ 2023-10-08 01:56:53 ] Completed Epoch: 13 batch 723: forward pass 104.387 ms, 47.42 s total
[ 2023-10-08 01:56:53 ] Completed Epoch: 13 batch 723: backward pass 82.491 ms, 47.50 s total
[ 2023-10-08 01:56:53 ] Completed Epoch: 13 batch 723: computing loss 84.799 ms, 47.58 s total
EPOCH: [13], BATCH: [723/889], loss: 0.347, loss_box_reg: 0.107, loss_classifier: 0.088, loss_mask: 0.121, loss_objectness: 0.011, loss_rpn_box_reg: 0.018
Saving checkpoint at epoch 13 train batch 723
[ 2023-10-08 01:56:54 ] Completed saving temp checkpoint 1,421.752 ms, 49.01 s total
[ 2023-10-08 01:56:54 ] Completed replacing temp checkpoint with checkpoint 64.960 ms, 49.07 s total
[ 2023-10-08 01:56:54 ] Completed Epoch: 13 batch 724: moving batch data to device 4.719 ms, 49.07 s total
[ 2023-10-08 01:56:54 ] Completed Epoch: 13 batch 724: forward pass 102.074 ms, 49.18 s total
[ 2023-10-08 01:56:54 ] Completed Epoch: 13 batch 724: backward pass 68.540 ms, 49.25 s total
[ 2023-10-08 01:56:55 ] Completed Epoch: 13 batch 724: computing loss 327.864 ms, 49.57 s total
EPOCH: [13], BATCH: [724/889], loss: 0.396, loss_box_reg: 0.119, loss_classifier: 0.094, loss_mask: 0.137, loss_objectness: 0.015, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 724
[ 2023-10-08 01:56:56 ] Completed saving temp checkpoint 1,108.011 ms, 50.68 s total
[ 2023-10-08 01:56:56 ] Completed replacing temp checkpoint with checkpoint 62.951 ms, 50.74 s total
[ 2023-10-08 01:56:56 ] Completed Epoch: 13 batch 725: moving batch data to device 5.875 ms, 50.75 s total
[ 2023-10-08 01:56:56 ] Completed Epoch: 13 batch 725: forward pass 104.483 ms, 50.85 s total
[ 2023-10-08 01:56:56 ] Completed Epoch: 13 batch 725: backward pass 72.883 ms, 50.93 s total
[ 2023-10-08 01:56:56 ] Completed Epoch: 13 batch 725: computing loss 121.549 ms, 51.05 s total
EPOCH: [13], BATCH: [725/889], loss: 0.431, loss_box_reg: 0.131, loss_classifier: 0.110, loss_mask: 0.139, loss_objectness: 0.018, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 725
[ 2023-10-08 01:56:57 ] Completed saving temp checkpoint 949.175 ms, 52.00 s total
[ 2023-10-08 01:56:57 ] Completed replacing temp checkpoint with checkpoint 44.939 ms, 52.04 s total
[ 2023-10-08 01:56:57 ] Completed Epoch: 13 batch 726: moving batch data to device 5.582 ms, 52.05 s total
[ 2023-10-08 01:56:57 ] Completed Epoch: 13 batch 726: forward pass 104.306 ms, 52.15 s total
[ 2023-10-08 01:56:57 ] Completed Epoch: 13 batch 726: backward pass 73.143 ms, 52.23 s total
[ 2023-10-08 01:56:57 ] Completed Epoch: 13 batch 726: computing loss 116.470 ms, 52.34 s total
EPOCH: [13], BATCH: [726/889], loss: 0.409, loss_box_reg: 0.130, loss_classifier: 0.105, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 726
[ 2023-10-08 01:56:59 ] Completed saving temp checkpoint 1,427.820 ms, 53.77 s total
[ 2023-10-08 01:56:59 ] Completed replacing temp checkpoint with checkpoint 75.814 ms, 53.85 s total
[ 2023-10-08 01:56:59 ] Completed Epoch: 13 batch 727: moving batch data to device 5.159 ms, 53.85 s total
[ 2023-10-08 01:56:59 ] Completed Epoch: 13 batch 727: forward pass 105.025 ms, 53.96 s total
[ 2023-10-08 01:56:59 ] Completed Epoch: 13 batch 727: backward pass 42.042 ms, 54.00 s total
[ 2023-10-08 01:56:59 ] Completed Epoch: 13 batch 727: computing loss 149.142 ms, 54.15 s total
EPOCH: [13], BATCH: [727/889], loss: 0.349, loss_box_reg: 0.103, loss_classifier: 0.086, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 727
[ 2023-10-08 01:57:00 ] Completed saving temp checkpoint 1,065.151 ms, 55.21 s total
[ 2023-10-08 01:57:00 ] Completed replacing temp checkpoint with checkpoint 59.142 ms, 55.27 s total
[ 2023-10-08 01:57:00 ] Completed Epoch: 13 batch 728: moving batch data to device 4.628 ms, 55.28 s total
[ 2023-10-08 01:57:01 ] Completed Epoch: 13 batch 728: forward pass 108.572 ms, 55.39 s total
[ 2023-10-08 01:57:01 ] Completed Epoch: 13 batch 728: backward pass 77.236 ms, 55.46 s total
[ 2023-10-08 01:57:01 ] Completed Epoch: 13 batch 728: computing loss 95.404 ms, 55.56 s total
EPOCH: [13], BATCH: [728/889], loss: 0.390, loss_box_reg: 0.114, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 728
[ 2023-10-08 01:57:02 ] Completed saving temp checkpoint 1,260.546 ms, 56.82 s total
[ 2023-10-08 01:57:02 ] Completed replacing temp checkpoint with checkpoint 80.611 ms, 56.90 s total
[ 2023-10-08 01:57:02 ] Completed Epoch: 13 batch 729: moving batch data to device 7.019 ms, 56.91 s total
[ 2023-10-08 01:57:02 ] Completed Epoch: 13 batch 729: forward pass 105.489 ms, 57.01 s total
[ 2023-10-08 01:57:02 ] Completed Epoch: 13 batch 729: backward pass 73.379 ms, 57.08 s total
[ 2023-10-08 01:57:02 ] Completed Epoch: 13 batch 729: computing loss 95.125 ms, 57.18 s total
EPOCH: [13], BATCH: [729/889], loss: 0.392, loss_box_reg: 0.116, loss_classifier: 0.093, loss_mask: 0.130, loss_objectness: 0.017, loss_rpn_box_reg: 0.036
Saving checkpoint at epoch 13 train batch 729
[ 2023-10-08 01:57:03 ] Completed saving temp checkpoint 953.255 ms, 58.13 s total
[ 2023-10-08 01:57:03 ] Completed replacing temp checkpoint with checkpoint 70.097 ms, 58.20 s total
[ 2023-10-08 01:57:03 ] Completed Epoch: 13 batch 730: moving batch data to device 9.129 ms, 58.21 s total
[ 2023-10-08 01:57:03 ] Completed Epoch: 13 batch 730: forward pass 106.613 ms, 58.32 s total
[ 2023-10-08 01:57:04 ] Completed Epoch: 13 batch 730: backward pass 76.437 ms, 58.40 s total
[ 2023-10-08 01:57:04 ] Completed Epoch: 13 batch 730: computing loss 116.800 ms, 58.51 s total
EPOCH: [13], BATCH: [730/889], loss: 0.388, loss_box_reg: 0.120, loss_classifier: 0.097, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 730
[ 2023-10-08 01:57:05 ] Completed saving temp checkpoint 1,123.110 ms, 59.64 s total
[ 2023-10-08 01:57:05 ] Completed replacing temp checkpoint with checkpoint 65.979 ms, 59.70 s total
[ 2023-10-08 01:57:05 ] Completed Epoch: 13 batch 731: moving batch data to device 7.638 ms, 59.71 s total
[ 2023-10-08 01:57:05 ] Completed Epoch: 13 batch 731: forward pass 106.618 ms, 59.82 s total
[ 2023-10-08 01:57:05 ] Completed Epoch: 13 batch 731: backward pass 73.572 ms, 59.89 s total
[ 2023-10-08 01:57:05 ] Completed Epoch: 13 batch 731: computing loss 116.736 ms, 60.01 s total
EPOCH: [13], BATCH: [731/889], loss: 0.385, loss_box_reg: 0.115, loss_classifier: 0.096, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 731
[ 2023-10-08 01:57:06 ] Completed saving temp checkpoint 967.538 ms, 60.97 s total
[ 2023-10-08 01:57:06 ] Completed replacing temp checkpoint with checkpoint 71.989 ms, 61.05 s total
[ 2023-10-08 01:57:06 ] Completed Epoch: 13 batch 732: moving batch data to device 6.385 ms, 61.05 s total
[ 2023-10-08 01:57:06 ] Completed Epoch: 13 batch 732: forward pass 106.508 ms, 61.16 s total
[ 2023-10-08 01:57:06 ] Completed Epoch: 13 batch 732: backward pass 36.381 ms, 61.19 s total
[ 2023-10-08 01:57:06 ] Completed Epoch: 13 batch 732: computing loss 162.416 ms, 61.36 s total
EPOCH: [13], BATCH: [732/889], loss: 0.396, loss_box_reg: 0.119, loss_classifier: 0.099, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 732
[ 2023-10-08 01:57:08 ] Completed saving temp checkpoint 1,226.487 ms, 62.58 s total
[ 2023-10-08 01:57:08 ] Completed replacing temp checkpoint with checkpoint 39.046 ms, 62.62 s total
[ 2023-10-08 01:57:08 ] Completed Epoch: 13 batch 733: moving batch data to device 5.045 ms, 62.63 s total
[ 2023-10-08 01:57:08 ] Completed Epoch: 13 batch 733: forward pass 104.037 ms, 62.73 s total
[ 2023-10-08 01:57:08 ] Completed Epoch: 13 batch 733: backward pass 65.985 ms, 62.80 s total
[ 2023-10-08 01:57:08 ] Completed Epoch: 13 batch 733: computing loss 104.399 ms, 62.90 s total
EPOCH: [13], BATCH: [733/889], loss: 0.357, loss_box_reg: 0.105, loss_classifier: 0.090, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 733
[ 2023-10-08 01:57:10 ] Completed saving temp checkpoint 1,685.018 ms, 64.59 s total
[ 2023-10-08 01:57:10 ] Completed replacing temp checkpoint with checkpoint 64.342 ms, 64.65 s total
[ 2023-10-08 01:57:10 ] Completed Epoch: 13 batch 734: moving batch data to device 7.246 ms, 64.66 s total
[ 2023-10-08 01:57:10 ] Completed Epoch: 13 batch 734: forward pass 106.073 ms, 64.76 s total
[ 2023-10-08 01:57:10 ] Completed Epoch: 13 batch 734: backward pass 83.349 ms, 64.85 s total
[ 2023-10-08 01:57:10 ] Completed Epoch: 13 batch 734: computing loss 115.051 ms, 64.96 s total
EPOCH: [13], BATCH: [734/889], loss: 0.399, loss_box_reg: 0.118, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 734
[ 2023-10-08 01:57:12 ] Completed saving temp checkpoint 2,007.961 ms, 66.97 s total
[ 2023-10-08 01:57:12 ] Completed replacing temp checkpoint with checkpoint 76.782 ms, 67.05 s total
[ 2023-10-08 01:57:12 ] Completed Epoch: 13 batch 735: moving batch data to device 6.620 ms, 67.05 s total
[ 2023-10-08 01:57:12 ] Completed Epoch: 13 batch 735: forward pass 99.990 ms, 67.15 s total
[ 2023-10-08 01:57:12 ] Completed Epoch: 13 batch 735: backward pass 70.176 ms, 67.22 s total
[ 2023-10-08 01:57:12 ] Completed Epoch: 13 batch 735: computing loss 128.016 ms, 67.35 s total
EPOCH: [13], BATCH: [735/889], loss: 0.380, loss_box_reg: 0.122, loss_classifier: 0.091, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 735
[ 2023-10-08 01:57:14 ] Completed saving temp checkpoint 1,195.007 ms, 68.55 s total
[ 2023-10-08 01:57:14 ] Completed replacing temp checkpoint with checkpoint 53.772 ms, 68.60 s total
[ 2023-10-08 01:57:14 ] Completed Epoch: 13 batch 736: moving batch data to device 8.065 ms, 68.61 s total
[ 2023-10-08 01:57:14 ] Completed Epoch: 13 batch 736: forward pass 105.259 ms, 68.71 s total
[ 2023-10-08 01:57:14 ] Completed Epoch: 13 batch 736: backward pass 83.512 ms, 68.80 s total
[ 2023-10-08 01:57:14 ] Completed Epoch: 13 batch 736: computing loss 88.254 ms, 68.89 s total
EPOCH: [13], BATCH: [736/889], loss: 0.390, loss_box_reg: 0.120, loss_classifier: 0.098, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 736
[ 2023-10-08 01:57:15 ] Completed saving temp checkpoint 1,035.521 ms, 69.92 s total
[ 2023-10-08 01:57:15 ] Completed replacing temp checkpoint with checkpoint 51.668 ms, 69.97 s total
[ 2023-10-08 01:57:15 ] Completed Epoch: 13 batch 737: moving batch data to device 4.510 ms, 69.98 s total
[ 2023-10-08 01:57:15 ] Completed Epoch: 13 batch 737: forward pass 99.815 ms, 70.08 s total
[ 2023-10-08 01:57:15 ] Completed Epoch: 13 batch 737: backward pass 69.466 ms, 70.15 s total
[ 2023-10-08 01:57:15 ] Completed Epoch: 13 batch 737: computing loss 98.629 ms, 70.25 s total
EPOCH: [13], BATCH: [737/889], loss: 0.368, loss_box_reg: 0.115, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.012, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 737
[ 2023-10-08 01:57:16 ] Completed saving temp checkpoint 934.065 ms, 71.18 s total
[ 2023-10-08 01:57:16 ] Completed replacing temp checkpoint with checkpoint 48.182 ms, 71.23 s total
[ 2023-10-08 01:57:16 ] Completed Epoch: 13 batch 738: moving batch data to device 5.838 ms, 71.23 s total
[ 2023-10-08 01:57:16 ] Completed Epoch: 13 batch 738: forward pass 103.776 ms, 71.34 s total
[ 2023-10-08 01:57:17 ] Completed Epoch: 13 batch 738: backward pass 68.464 ms, 71.41 s total
[ 2023-10-08 01:57:17 ] Completed Epoch: 13 batch 738: computing loss 115.601 ms, 71.52 s total
EPOCH: [13], BATCH: [738/889], loss: 0.361, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 738
[ 2023-10-08 01:57:18 ] Completed saving temp checkpoint 1,074.715 ms, 72.60 s total
[ 2023-10-08 01:57:18 ] Completed replacing temp checkpoint with checkpoint 63.398 ms, 72.66 s total
[ 2023-10-08 01:57:18 ] Completed Epoch: 13 batch 739: moving batch data to device 7.530 ms, 72.67 s total
[ 2023-10-08 01:57:18 ] Completed Epoch: 13 batch 739: forward pass 102.255 ms, 72.77 s total
[ 2023-10-08 01:57:18 ] Completed Epoch: 13 batch 739: backward pass 51.637 ms, 72.82 s total
[ 2023-10-08 01:57:18 ] Completed Epoch: 13 batch 739: computing loss 148.229 ms, 72.97 s total
EPOCH: [13], BATCH: [739/889], loss: 0.368, loss_box_reg: 0.106, loss_classifier: 0.089, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 739
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 02:10:34 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:10:34 ] Completed importing Timer 0.021 ms, 0.00 s total
[ 2023-10-08 02:10:35 ] Completed importing everything else 583.042 ms, 0.58 s total
[ 2023-10-08 02:10:35 ] Completed defined other functions 0.026 ms, 0.58 s total
| distributed init (rank 2): env://
| distributed init (rank 0): env://
| distributed init (rank 3): env://
| distributed init (rank 4): env://
| distributed init (rank 5): env://
| distributed init (rank 1): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 02:10:43 ] Completed main preliminaries 7,432.592 ms, 8.02 s total
loading annotations into memory...
Done (t=11.23s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-08 02:10:56 ] Completed loading data 13,053.319 ms, 21.07 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 02:10:56 ] Completed creating data samplers 100.610 ms, 21.17 s total
[ 2023-10-08 02:10:56 ] Completed creating data loaders 0.210 ms, 21.17 s total
[ 2023-10-08 02:10:56 ] Completed creating model and .to(device) 668.945 ms, 21.84 s total
[ 2023-10-08 02:10:58 ] Completed preparing model for distributed training 1,621.409 ms, 23.46 s total
[ 2023-10-08 02:10:58 ] Completed optimizer and scaler 0.603 ms, 23.46 s total
[ 2023-10-08 02:10:58 ] Completed learning rate schedulers 0.223 ms, 23.46 s total
[ 2023-10-08 02:10:59 ] Completed init coco evaluator 947.531 ms, 24.41 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 02:11:00 ] Completed retrieving checkpoint 861.206 ms, 25.27 s total
EPOCH :: 13
[ 2023-10-08 02:11:00 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:11:00 ] Completed training preliminaries 0.873 ms, 0.00 s total
Training / resuming epoch 13 from training step 739
[ 2023-10-08 02:11:00 ] Completed Epoch: 13 batch 739: moving batch data to device 517.862 ms, 0.52 s total
[ 2023-10-08 02:11:01 ] Completed Epoch: 13 batch 739: forward pass 1,073.185 ms, 1.59 s total
[ 2023-10-08 02:11:02 ] Completed Epoch: 13 batch 739: backward pass 157.575 ms, 1.75 s total
[ 2023-10-08 02:11:02 ] Completed Epoch: 13 batch 739: computing loss 183.840 ms, 1.93 s total
EPOCH: [13], BATCH: [739/889], loss: 0.368, loss_box_reg: 0.108, loss_classifier: 0.088, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 739
[ 2023-10-08 02:11:03 ] Completed saving temp checkpoint 1,048.269 ms, 2.98 s total
[ 2023-10-08 02:11:03 ] Completed replacing temp checkpoint with checkpoint 172.257 ms, 3.15 s total
[ 2023-10-08 02:11:03 ] Completed Epoch: 13 batch 740: moving batch data to device 6.275 ms, 3.16 s total
[ 2023-10-08 02:11:03 ] Completed Epoch: 13 batch 740: forward pass 108.404 ms, 3.27 s total
[ 2023-10-08 02:11:03 ] Completed Epoch: 13 batch 740: backward pass 92.023 ms, 3.36 s total
[ 2023-10-08 02:11:03 ] Completed Epoch: 13 batch 740: computing loss 137.039 ms, 3.50 s total
EPOCH: [13], BATCH: [740/889], loss: 0.361, loss_box_reg: 0.106, loss_classifier: 0.088, loss_mask: 0.130, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 740
[ 2023-10-08 02:11:05 ] Completed saving temp checkpoint 1,340.469 ms, 4.84 s total
[ 2023-10-08 02:11:05 ] Completed replacing temp checkpoint with checkpoint 74.974 ms, 4.91 s total
[ 2023-10-08 02:11:05 ] Completed Epoch: 13 batch 741: moving batch data to device 3.898 ms, 4.92 s total
[ 2023-10-08 02:11:05 ] Completed Epoch: 13 batch 741: forward pass 109.704 ms, 5.03 s total
[ 2023-10-08 02:11:05 ] Completed Epoch: 13 batch 741: backward pass 79.400 ms, 5.11 s total
[ 2023-10-08 02:11:05 ] Completed Epoch: 13 batch 741: computing loss 212.634 ms, 5.32 s total
EPOCH: [13], BATCH: [741/889], loss: 0.382, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 741
[ 2023-10-08 02:11:06 ] Completed saving temp checkpoint 1,056.608 ms, 6.38 s total
[ 2023-10-08 02:11:06 ] Completed replacing temp checkpoint with checkpoint 75.945 ms, 6.45 s total
[ 2023-10-08 02:11:06 ] Completed Epoch: 13 batch 742: moving batch data to device 7.203 ms, 6.46 s total
[ 2023-10-08 02:11:06 ] Completed Epoch: 13 batch 742: forward pass 111.016 ms, 6.57 s total
[ 2023-10-08 02:11:06 ] Completed Epoch: 13 batch 742: backward pass 80.709 ms, 6.65 s total
[ 2023-10-08 02:11:07 ] Completed Epoch: 13 batch 742: computing loss 134.858 ms, 6.79 s total
EPOCH: [13], BATCH: [742/889], loss: 0.373, loss_box_reg: 0.114, loss_classifier: 0.094, loss_mask: 0.129, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 742
[ 2023-10-08 02:11:08 ] Completed saving temp checkpoint 1,101.089 ms, 7.89 s total
[ 2023-10-08 02:11:08 ] Completed replacing temp checkpoint with checkpoint 81.003 ms, 7.97 s total
[ 2023-10-08 02:11:08 ] Completed Epoch: 13 batch 743: moving batch data to device 39.215 ms, 8.01 s total
[ 2023-10-08 02:11:08 ] Completed Epoch: 13 batch 743: forward pass 113.162 ms, 8.12 s total
[ 2023-10-08 02:11:08 ] Completed Epoch: 13 batch 743: backward pass 79.536 ms, 8.20 s total
[ 2023-10-08 02:11:08 ] Completed Epoch: 13 batch 743: computing loss 205.348 ms, 8.40 s total
EPOCH: [13], BATCH: [743/889], loss: 0.355, loss_box_reg: 0.110, loss_classifier: 0.089, loss_mask: 0.125, loss_objectness: 0.011, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 743
[ 2023-10-08 02:11:10 ] Completed saving temp checkpoint 1,653.818 ms, 10.06 s total
[ 2023-10-08 02:11:10 ] Completed replacing temp checkpoint with checkpoint 60.459 ms, 10.12 s total
[ 2023-10-08 02:11:10 ] Completed Epoch: 13 batch 744: moving batch data to device 2.432 ms, 10.12 s total
[ 2023-10-08 02:11:10 ] Completed Epoch: 13 batch 744: forward pass 180.409 ms, 10.30 s total
[ 2023-10-08 02:11:10 ] Completed Epoch: 13 batch 744: backward pass 70.928 ms, 10.37 s total
[ 2023-10-08 02:11:10 ] Completed Epoch: 13 batch 744: computing loss 231.364 ms, 10.60 s total
EPOCH: [13], BATCH: [744/889], loss: 0.409, loss_box_reg: 0.125, loss_classifier: 0.102, loss_mask: 0.139, loss_objectness: 0.014, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 744
[ 2023-10-08 02:11:12 ] Completed saving temp checkpoint 1,266.988 ms, 11.87 s total
[ 2023-10-08 02:11:12 ] Completed replacing temp checkpoint with checkpoint 69.078 ms, 11.94 s total
[ 2023-10-08 02:11:12 ] Completed Epoch: 13 batch 745: moving batch data to device 5.022 ms, 11.94 s total
[ 2023-10-08 02:11:12 ] Completed Epoch: 13 batch 745: forward pass 105.329 ms, 12.05 s total
[ 2023-10-08 02:11:12 ] Completed Epoch: 13 batch 745: backward pass 73.584 ms, 12.12 s total
[ 2023-10-08 02:11:12 ] Completed Epoch: 13 batch 745: computing loss 129.046 ms, 12.25 s total
EPOCH: [13], BATCH: [745/889], loss: 0.403, loss_box_reg: 0.124, loss_classifier: 0.104, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 745
[ 2023-10-08 02:11:13 ] Completed saving temp checkpoint 984.171 ms, 13.24 s total
[ 2023-10-08 02:11:13 ] Completed replacing temp checkpoint with checkpoint 67.019 ms, 13.30 s total
[ 2023-10-08 02:11:13 ] Completed Epoch: 13 batch 746: moving batch data to device 10.130 ms, 13.31 s total
[ 2023-10-08 02:11:13 ] Completed Epoch: 13 batch 746: forward pass 109.449 ms, 13.42 s total
[ 2023-10-08 02:11:13 ] Completed Epoch: 13 batch 746: backward pass 34.267 ms, 13.46 s total
[ 2023-10-08 02:11:13 ] Completed Epoch: 13 batch 746: computing loss 171.769 ms, 13.63 s total
EPOCH: [13], BATCH: [746/889], loss: 0.402, loss_box_reg: 0.123, loss_classifier: 0.103, loss_mask: 0.138, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 746
[ 2023-10-08 02:11:15 ] Completed saving temp checkpoint 1,160.410 ms, 14.79 s total
[ 2023-10-08 02:11:15 ] Completed replacing temp checkpoint with checkpoint 68.228 ms, 14.86 s total
[ 2023-10-08 02:11:15 ] Completed Epoch: 13 batch 747: moving batch data to device 5.502 ms, 14.86 s total
[ 2023-10-08 02:11:15 ] Completed Epoch: 13 batch 747: forward pass 116.338 ms, 14.98 s total
[ 2023-10-08 02:11:15 ] Completed Epoch: 13 batch 747: backward pass 78.229 ms, 15.06 s total
[ 2023-10-08 02:11:15 ] Completed Epoch: 13 batch 747: computing loss 130.884 ms, 15.19 s total
EPOCH: [13], BATCH: [747/889], loss: 0.391, loss_box_reg: 0.121, loss_classifier: 0.098, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 747
[ 2023-10-08 02:11:16 ] Completed saving temp checkpoint 1,026.293 ms, 16.22 s total
[ 2023-10-08 02:11:16 ] Completed replacing temp checkpoint with checkpoint 76.588 ms, 16.29 s total
[ 2023-10-08 02:11:16 ] Completed Epoch: 13 batch 748: moving batch data to device 9.335 ms, 16.30 s total
[ 2023-10-08 02:11:16 ] Completed Epoch: 13 batch 748: forward pass 105.762 ms, 16.41 s total
[ 2023-10-08 02:11:16 ] Completed Epoch: 13 batch 748: backward pass 74.824 ms, 16.48 s total
[ 2023-10-08 02:11:16 ] Completed Epoch: 13 batch 748: computing loss 118.114 ms, 16.60 s total
EPOCH: [13], BATCH: [748/889], loss: 0.368, loss_box_reg: 0.109, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 748
[ 2023-10-08 02:11:18 ] Completed saving temp checkpoint 1,199.234 ms, 17.80 s total
[ 2023-10-08 02:11:18 ] Completed replacing temp checkpoint with checkpoint 49.044 ms, 17.85 s total
[ 2023-10-08 02:11:18 ] Completed Epoch: 13 batch 749: moving batch data to device 5.344 ms, 17.85 s total
[ 2023-10-08 02:11:18 ] Completed Epoch: 13 batch 749: forward pass 101.536 ms, 17.96 s total
[ 2023-10-08 02:11:18 ] Completed Epoch: 13 batch 749: backward pass 83.114 ms, 18.04 s total
[ 2023-10-08 02:11:18 ] Completed Epoch: 13 batch 749: computing loss 207.099 ms, 18.25 s total
EPOCH: [13], BATCH: [749/889], loss: 0.378, loss_box_reg: 0.113, loss_classifier: 0.095, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 749
[ 2023-10-08 02:11:19 ] Completed saving temp checkpoint 1,061.857 ms, 19.31 s total
[ 2023-10-08 02:11:19 ] Completed replacing temp checkpoint with checkpoint 72.088 ms, 19.38 s total
[ 2023-10-08 02:11:19 ] Completed Epoch: 13 batch 750: moving batch data to device 7.396 ms, 19.39 s total
[ 2023-10-08 02:11:19 ] Completed Epoch: 13 batch 750: forward pass 105.763 ms, 19.49 s total
[ 2023-10-08 02:11:19 ] Completed Epoch: 13 batch 750: backward pass 37.947 ms, 19.53 s total
[ 2023-10-08 02:11:19 ] Completed Epoch: 13 batch 750: computing loss 158.970 ms, 19.69 s total
EPOCH: [13], BATCH: [750/889], loss: 0.394, loss_box_reg: 0.115, loss_classifier: 0.104, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 750
[ 2023-10-08 02:11:21 ] Completed saving temp checkpoint 1,170.520 ms, 20.86 s total
[ 2023-10-08 02:11:21 ] Completed replacing temp checkpoint with checkpoint 65.625 ms, 20.93 s total
[ 2023-10-08 02:11:21 ] Completed Epoch: 13 batch 751: moving batch data to device 11.498 ms, 20.94 s total
[ 2023-10-08 02:11:21 ] Completed Epoch: 13 batch 751: forward pass 104.664 ms, 21.04 s total
[ 2023-10-08 02:11:21 ] Completed Epoch: 13 batch 751: backward pass 78.937 ms, 21.12 s total
[ 2023-10-08 02:11:21 ] Completed Epoch: 13 batch 751: computing loss 112.970 ms, 21.23 s total
EPOCH: [13], BATCH: [751/889], loss: 0.392, loss_box_reg: 0.118, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.018, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 751
[ 2023-10-08 02:11:22 ] Completed saving temp checkpoint 1,065.798 ms, 22.30 s total
[ 2023-10-08 02:11:22 ] Completed replacing temp checkpoint with checkpoint 75.845 ms, 22.38 s total
[ 2023-10-08 02:11:22 ] Completed Epoch: 13 batch 752: moving batch data to device 6.684 ms, 22.38 s total
[ 2023-10-08 02:11:22 ] Completed Epoch: 13 batch 752: forward pass 108.230 ms, 22.49 s total
[ 2023-10-08 02:11:22 ] Completed Epoch: 13 batch 752: backward pass 66.332 ms, 22.56 s total
[ 2023-10-08 02:11:23 ] Completed Epoch: 13 batch 752: computing loss 194.185 ms, 22.75 s total
EPOCH: [13], BATCH: [752/889], loss: 0.381, loss_box_reg: 0.119, loss_classifier: 0.096, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 752
[ 2023-10-08 02:11:24 ] Completed saving temp checkpoint 1,327.478 ms, 24.08 s total
[ 2023-10-08 02:11:24 ] Completed replacing temp checkpoint with checkpoint 86.188 ms, 24.16 s total
[ 2023-10-08 02:11:24 ] Completed Epoch: 13 batch 753: moving batch data to device 6.546 ms, 24.17 s total
[ 2023-10-08 02:11:24 ] Completed Epoch: 13 batch 753: forward pass 111.039 ms, 24.28 s total
[ 2023-10-08 02:11:24 ] Completed Epoch: 13 batch 753: backward pass 73.249 ms, 24.36 s total
[ 2023-10-08 02:11:24 ] Completed Epoch: 13 batch 753: computing loss 112.179 ms, 24.47 s total
EPOCH: [13], BATCH: [753/889], loss: 0.347, loss_box_reg: 0.105, loss_classifier: 0.087, loss_mask: 0.124, loss_objectness: 0.011, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 753
[ 2023-10-08 02:11:26 ] Completed saving temp checkpoint 1,403.432 ms, 25.87 s total
[ 2023-10-08 02:11:26 ] Completed replacing temp checkpoint with checkpoint 66.197 ms, 25.94 s total
[ 2023-10-08 02:11:26 ] Completed Epoch: 13 batch 754: moving batch data to device 8.064 ms, 25.95 s total
[ 2023-10-08 02:11:26 ] Completed Epoch: 13 batch 754: forward pass 105.835 ms, 26.05 s total
[ 2023-10-08 02:11:26 ] Completed Epoch: 13 batch 754: backward pass 81.318 ms, 26.13 s total
[ 2023-10-08 02:11:26 ] Completed Epoch: 13 batch 754: computing loss 107.383 ms, 26.24 s total
EPOCH: [13], BATCH: [754/889], loss: 0.403, loss_box_reg: 0.120, loss_classifier: 0.102, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 754
[ 2023-10-08 02:11:28 ] Completed saving temp checkpoint 1,824.914 ms, 28.06 s total
[ 2023-10-08 02:11:28 ] Completed replacing temp checkpoint with checkpoint 80.522 ms, 28.15 s total
[ 2023-10-08 02:11:28 ] Completed Epoch: 13 batch 755: moving batch data to device 7.614 ms, 28.15 s total
[ 2023-10-08 02:11:28 ] Completed Epoch: 13 batch 755: forward pass 116.591 ms, 28.27 s total
[ 2023-10-08 02:11:28 ] Completed Epoch: 13 batch 755: backward pass 79.625 ms, 28.35 s total
[ 2023-10-08 02:11:28 ] Completed Epoch: 13 batch 755: computing loss 109.431 ms, 28.46 s total
EPOCH: [13], BATCH: [755/889], loss: 0.383, loss_box_reg: 0.115, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 755
[ 2023-10-08 02:11:30 ] Completed saving temp checkpoint 1,428.389 ms, 29.89 s total
[ 2023-10-08 02:11:30 ] Completed replacing temp checkpoint with checkpoint 66.854 ms, 29.95 s total
[ 2023-10-08 02:11:30 ] Completed Epoch: 13 batch 756: moving batch data to device 7.697 ms, 29.96 s total
[ 2023-10-08 02:11:30 ] Completed Epoch: 13 batch 756: forward pass 108.665 ms, 30.07 s total
[ 2023-10-08 02:11:30 ] Completed Epoch: 13 batch 756: backward pass 74.063 ms, 30.14 s total
[ 2023-10-08 02:11:30 ] Completed Epoch: 13 batch 756: computing loss 185.797 ms, 30.33 s total
EPOCH: [13], BATCH: [756/889], loss: 0.365, loss_box_reg: 0.110, loss_classifier: 0.091, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 756
[ 2023-10-08 02:11:32 ] Completed saving temp checkpoint 1,588.206 ms, 31.92 s total
[ 2023-10-08 02:11:32 ] Completed replacing temp checkpoint with checkpoint 70.649 ms, 31.99 s total
[ 2023-10-08 02:11:32 ] Completed Epoch: 13 batch 757: moving batch data to device 8.059 ms, 32.00 s total
[ 2023-10-08 02:11:32 ] Completed Epoch: 13 batch 757: forward pass 106.154 ms, 32.10 s total
[ 2023-10-08 02:11:32 ] Completed Epoch: 13 batch 757: backward pass 42.606 ms, 32.15 s total
[ 2023-10-08 02:11:32 ] Completed Epoch: 13 batch 757: computing loss 148.375 ms, 32.29 s total
EPOCH: [13], BATCH: [757/889], loss: 0.372, loss_box_reg: 0.113, loss_classifier: 0.091, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 757
[ 2023-10-08 02:11:33 ] Completed saving temp checkpoint 1,020.062 ms, 33.31 s total
[ 2023-10-08 02:11:33 ] Completed replacing temp checkpoint with checkpoint 71.167 ms, 33.39 s total
[ 2023-10-08 02:11:33 ] Completed Epoch: 13 batch 758: moving batch data to device 7.094 ms, 33.39 s total
[ 2023-10-08 02:11:33 ] Completed Epoch: 13 batch 758: forward pass 104.734 ms, 33.50 s total
[ 2023-10-08 02:11:33 ] Completed Epoch: 13 batch 758: backward pass 75.829 ms, 33.57 s total
[ 2023-10-08 02:11:33 ] Completed Epoch: 13 batch 758: computing loss 115.193 ms, 33.69 s total
EPOCH: [13], BATCH: [758/889], loss: 0.361, loss_box_reg: 0.111, loss_classifier: 0.094, loss_mask: 0.122, loss_objectness: 0.012, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 758
[ 2023-10-08 02:11:35 ] Completed saving temp checkpoint 1,439.964 ms, 35.13 s total
[ 2023-10-08 02:11:35 ] Completed replacing temp checkpoint with checkpoint 86.899 ms, 35.21 s total
[ 2023-10-08 02:11:35 ] Completed Epoch: 13 batch 759: moving batch data to device 8.843 ms, 35.22 s total
[ 2023-10-08 02:11:35 ] Completed Epoch: 13 batch 759: forward pass 103.842 ms, 35.33 s total
[ 2023-10-08 02:11:35 ] Completed Epoch: 13 batch 759: backward pass 81.327 ms, 35.41 s total
[ 2023-10-08 02:11:35 ] Completed Epoch: 13 batch 759: computing loss 115.118 ms, 35.52 s total
EPOCH: [13], BATCH: [759/889], loss: 0.413, loss_box_reg: 0.120, loss_classifier: 0.106, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 759
[ 2023-10-08 02:11:36 ] Completed saving temp checkpoint 1,078.101 ms, 36.60 s total
[ 2023-10-08 02:11:36 ] Completed replacing temp checkpoint with checkpoint 65.840 ms, 36.67 s total
[ 2023-10-08 02:11:36 ] Completed Epoch: 13 batch 760: moving batch data to device 7.645 ms, 36.68 s total
[ 2023-10-08 02:11:37 ] Completed Epoch: 13 batch 760: forward pass 103.870 ms, 36.78 s total
[ 2023-10-08 02:11:37 ] Completed Epoch: 13 batch 760: backward pass 47.543 ms, 36.83 s total
[ 2023-10-08 02:11:37 ] Completed Epoch: 13 batch 760: computing loss 154.569 ms, 36.98 s total
EPOCH: [13], BATCH: [760/889], loss: 0.381, loss_box_reg: 0.110, loss_classifier: 0.100, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 760
[ 2023-10-08 02:11:38 ] Completed saving temp checkpoint 1,563.584 ms, 38.55 s total
[ 2023-10-08 02:11:38 ] Completed replacing temp checkpoint with checkpoint 97.111 ms, 38.64 s total
[ 2023-10-08 02:11:38 ] Completed Epoch: 13 batch 761: moving batch data to device 9.644 ms, 38.65 s total
[ 2023-10-08 02:11:39 ] Completed Epoch: 13 batch 761: forward pass 104.669 ms, 38.76 s total
[ 2023-10-08 02:11:39 ] Completed Epoch: 13 batch 761: backward pass 74.425 ms, 38.83 s total
[ 2023-10-08 02:11:39 ] Completed Epoch: 13 batch 761: computing loss 117.306 ms, 38.95 s total
EPOCH: [13], BATCH: [761/889], loss: 0.410, loss_box_reg: 0.124, loss_classifier: 0.100, loss_mask: 0.134, loss_objectness: 0.019, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 761
[ 2023-10-08 02:11:40 ] Completed saving temp checkpoint 1,276.176 ms, 40.22 s total
[ 2023-10-08 02:11:40 ] Completed replacing temp checkpoint with checkpoint 82.192 ms, 40.31 s total
[ 2023-10-08 02:11:40 ] Completed Epoch: 13 batch 762: moving batch data to device 6.995 ms, 40.31 s total
[ 2023-10-08 02:11:40 ] Completed Epoch: 13 batch 762: forward pass 106.594 ms, 40.42 s total
[ 2023-10-08 02:11:40 ] Completed Epoch: 13 batch 762: backward pass 51.376 ms, 40.47 s total
[ 2023-10-08 02:11:40 ] Completed Epoch: 13 batch 762: computing loss 137.695 ms, 40.61 s total
EPOCH: [13], BATCH: [762/889], loss: 0.431, loss_box_reg: 0.131, loss_classifier: 0.109, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.039
Saving checkpoint at epoch 13 train batch 762
[ 2023-10-08 02:11:42 ] Completed saving temp checkpoint 1,444.797 ms, 42.05 s total
[ 2023-10-08 02:11:42 ] Completed replacing temp checkpoint with checkpoint 54.045 ms, 42.11 s total
[ 2023-10-08 02:11:42 ] Completed Epoch: 13 batch 763: moving batch data to device 4.654 ms, 42.11 s total
[ 2023-10-08 02:11:42 ] Completed Epoch: 13 batch 763: forward pass 101.952 ms, 42.21 s total
[ 2023-10-08 02:11:42 ] Completed Epoch: 13 batch 763: backward pass 49.850 ms, 42.26 s total
[ 2023-10-08 02:11:42 ] Completed Epoch: 13 batch 763: computing loss 141.349 ms, 42.41 s total
EPOCH: [13], BATCH: [763/889], loss: 0.387, loss_box_reg: 0.121, loss_classifier: 0.094, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 763
[ 2023-10-08 02:11:43 ] Completed saving temp checkpoint 1,294.020 ms, 43.70 s total
[ 2023-10-08 02:11:43 ] Completed replacing temp checkpoint with checkpoint 45.333 ms, 43.75 s total
[ 2023-10-08 02:11:44 ] Completed Epoch: 13 batch 764: moving batch data to device 5.280 ms, 43.75 s total
[ 2023-10-08 02:11:44 ] Completed Epoch: 13 batch 764: forward pass 105.047 ms, 43.86 s total
[ 2023-10-08 02:11:44 ] Completed Epoch: 13 batch 764: backward pass 73.262 ms, 43.93 s total
[ 2023-10-08 02:11:44 ] Completed Epoch: 13 batch 764: computing loss 118.395 ms, 44.05 s total
EPOCH: [13], BATCH: [764/889], loss: 0.362, loss_box_reg: 0.112, loss_classifier: 0.097, loss_mask: 0.121, loss_objectness: 0.012, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 764
[ 2023-10-08 02:11:45 ] Completed saving temp checkpoint 1,458.861 ms, 45.51 s total
[ 2023-10-08 02:11:45 ] Completed replacing temp checkpoint with checkpoint 102.191 ms, 45.61 s total
[ 2023-10-08 02:11:45 ] Completed Epoch: 13 batch 765: moving batch data to device 7.723 ms, 45.62 s total
[ 2023-10-08 02:11:45 ] Completed Epoch: 13 batch 765: forward pass 104.582 ms, 45.72 s total
[ 2023-10-08 02:11:46 ] Completed Epoch: 13 batch 765: backward pass 71.119 ms, 45.79 s total
[ 2023-10-08 02:11:46 ] Completed Epoch: 13 batch 765: computing loss 123.602 ms, 45.92 s total
EPOCH: [13], BATCH: [765/889], loss: 0.366, loss_box_reg: 0.106, loss_classifier: 0.089, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 765
[ 2023-10-08 02:11:47 ] Completed saving temp checkpoint 1,413.756 ms, 47.33 s total
[ 2023-10-08 02:11:47 ] Completed replacing temp checkpoint with checkpoint 91.981 ms, 47.42 s total
[ 2023-10-08 02:11:47 ] Completed Epoch: 13 batch 766: moving batch data to device 7.573 ms, 47.43 s total
[ 2023-10-08 02:11:47 ] Completed Epoch: 13 batch 766: forward pass 104.198 ms, 47.53 s total
[ 2023-10-08 02:11:47 ] Completed Epoch: 13 batch 766: backward pass 75.280 ms, 47.61 s total
[ 2023-10-08 02:11:47 ] Completed Epoch: 13 batch 766: computing loss 92.962 ms, 47.70 s total
EPOCH: [13], BATCH: [766/889], loss: 0.363, loss_box_reg: 0.107, loss_classifier: 0.089, loss_mask: 0.122, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 766
[ 2023-10-08 02:11:50 ] Completed saving temp checkpoint 2,055.116 ms, 49.76 s total
[ 2023-10-08 02:11:50 ] Completed replacing temp checkpoint with checkpoint 104.747 ms, 49.86 s total
[ 2023-10-08 02:11:50 ] Completed Epoch: 13 batch 767: moving batch data to device 7.231 ms, 49.87 s total
[ 2023-10-08 02:11:50 ] Completed Epoch: 13 batch 767: forward pass 101.711 ms, 49.97 s total
[ 2023-10-08 02:11:50 ] Completed Epoch: 13 batch 767: backward pass 43.830 ms, 50.01 s total
[ 2023-10-08 02:11:50 ] Completed Epoch: 13 batch 767: computing loss 142.833 ms, 50.16 s total
EPOCH: [13], BATCH: [767/889], loss: 0.398, loss_box_reg: 0.127, loss_classifier: 0.097, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 767
[ 2023-10-08 02:11:51 ] Completed saving temp checkpoint 1,293.653 ms, 51.45 s total
[ 2023-10-08 02:11:51 ] Completed replacing temp checkpoint with checkpoint 89.140 ms, 51.54 s total
[ 2023-10-08 02:11:51 ] Completed Epoch: 13 batch 768: moving batch data to device 9.564 ms, 51.55 s total
[ 2023-10-08 02:11:51 ] Completed Epoch: 13 batch 768: forward pass 102.134 ms, 51.65 s total
[ 2023-10-08 02:11:51 ] Completed Epoch: 13 batch 768: backward pass 73.674 ms, 51.72 s total
[ 2023-10-08 02:11:52 ] Completed Epoch: 13 batch 768: computing loss 216.438 ms, 51.94 s total
EPOCH: [13], BATCH: [768/889], loss: 0.410, loss_box_reg: 0.122, loss_classifier: 0.100, loss_mask: 0.143, loss_objectness: 0.018, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 768
[ 2023-10-08 02:11:53 ] Completed saving temp checkpoint 1,666.371 ms, 53.61 s total
[ 2023-10-08 02:11:53 ] Completed replacing temp checkpoint with checkpoint 93.203 ms, 53.70 s total
[ 2023-10-08 02:11:53 ] Completed Epoch: 13 batch 769: moving batch data to device 7.233 ms, 53.71 s total
[ 2023-10-08 02:11:54 ] Completed Epoch: 13 batch 769: forward pass 104.423 ms, 53.81 s total
[ 2023-10-08 02:11:54 ] Completed Epoch: 13 batch 769: backward pass 35.036 ms, 53.85 s total
[ 2023-10-08 02:11:54 ] Completed Epoch: 13 batch 769: computing loss 157.190 ms, 54.00 s total
EPOCH: [13], BATCH: [769/889], loss: 0.364, loss_box_reg: 0.108, loss_classifier: 0.091, loss_mask: 0.124, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 769
[ 2023-10-08 02:11:55 ] Completed saving temp checkpoint 1,262.791 ms, 55.27 s total
[ 2023-10-08 02:11:55 ] Completed replacing temp checkpoint with checkpoint 84.186 ms, 55.35 s total
[ 2023-10-08 02:11:55 ] Completed Epoch: 13 batch 770: moving batch data to device 8.123 ms, 55.36 s total
[ 2023-10-08 02:11:55 ] Completed Epoch: 13 batch 770: forward pass 103.825 ms, 55.46 s total
[ 2023-10-08 02:11:55 ] Completed Epoch: 13 batch 770: backward pass 43.409 ms, 55.51 s total
[ 2023-10-08 02:11:55 ] Completed Epoch: 13 batch 770: computing loss 139.025 ms, 55.65 s total
EPOCH: [13], BATCH: [770/889], loss: 0.373, loss_box_reg: 0.110, loss_classifier: 0.100, loss_mask: 0.123, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 770
[ 2023-10-08 02:11:57 ] Completed saving temp checkpoint 1,332.848 ms, 56.98 s total
[ 2023-10-08 02:11:57 ] Completed replacing temp checkpoint with checkpoint 72.744 ms, 57.05 s total
[ 2023-10-08 02:11:57 ] Completed Epoch: 13 batch 771: moving batch data to device 7.265 ms, 57.06 s total
[ 2023-10-08 02:11:57 ] Completed Epoch: 13 batch 771: forward pass 111.886 ms, 57.17 s total
[ 2023-10-08 02:11:57 ] Completed Epoch: 13 batch 771: backward pass 39.940 ms, 57.21 s total
[ 2023-10-08 02:11:57 ] Completed Epoch: 13 batch 771: computing loss 153.842 ms, 57.36 s total
EPOCH: [13], BATCH: [771/889], loss: 0.428, loss_box_reg: 0.135, loss_classifier: 0.112, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 771
[ 2023-10-08 02:11:58 ] Completed saving temp checkpoint 1,205.849 ms, 58.57 s total
[ 2023-10-08 02:11:58 ] Completed replacing temp checkpoint with checkpoint 62.969 ms, 58.63 s total
[ 2023-10-08 02:11:58 ] Completed Epoch: 13 batch 772: moving batch data to device 4.590 ms, 58.64 s total
[ 2023-10-08 02:11:58 ] Completed Epoch: 13 batch 772: forward pass 106.324 ms, 58.74 s total
[ 2023-10-08 02:11:59 ] Completed Epoch: 13 batch 772: backward pass 67.030 ms, 58.81 s total
[ 2023-10-08 02:11:59 ] Completed Epoch: 13 batch 772: computing loss 102.838 ms, 58.91 s total
EPOCH: [13], BATCH: [772/889], loss: 0.362, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 772
[ 2023-10-08 02:12:00 ] Completed saving temp checkpoint 1,315.294 ms, 60.23 s total
[ 2023-10-08 02:12:00 ] Completed replacing temp checkpoint with checkpoint 82.634 ms, 60.31 s total
[ 2023-10-08 02:12:00 ] Completed Epoch: 13 batch 773: moving batch data to device 8.534 ms, 60.32 s total
[ 2023-10-08 02:12:00 ] Completed Epoch: 13 batch 773: forward pass 108.068 ms, 60.43 s total
[ 2023-10-08 02:12:00 ] Completed Epoch: 13 batch 773: backward pass 50.130 ms, 60.48 s total
[ 2023-10-08 02:12:00 ] Completed Epoch: 13 batch 773: computing loss 119.269 ms, 60.60 s total
EPOCH: [13], BATCH: [773/889], loss: 0.397, loss_box_reg: 0.122, loss_classifier: 0.098, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 773
[ 2023-10-08 02:12:02 ] Completed saving temp checkpoint 1,195.751 ms, 61.79 s total
[ 2023-10-08 02:12:02 ] Completed replacing temp checkpoint with checkpoint 56.804 ms, 61.85 s total
[ 2023-10-08 02:12:02 ] Completed Epoch: 13 batch 774: moving batch data to device 9.166 ms, 61.86 s total
[ 2023-10-08 02:12:02 ] Completed Epoch: 13 batch 774: forward pass 110.497 ms, 61.97 s total
[ 2023-10-08 02:12:02 ] Completed Epoch: 13 batch 774: backward pass 78.826 ms, 62.05 s total
[ 2023-10-08 02:12:02 ] Completed Epoch: 13 batch 774: computing loss 113.015 ms, 62.16 s total
EPOCH: [13], BATCH: [774/889], loss: 0.377, loss_box_reg: 0.109, loss_classifier: 0.096, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 774
[ 2023-10-08 02:12:03 ] Completed saving temp checkpoint 1,295.906 ms, 63.46 s total
[ 2023-10-08 02:12:03 ] Completed replacing temp checkpoint with checkpoint 89.839 ms, 63.55 s total
[ 2023-10-08 02:12:03 ] Completed Epoch: 13 batch 775: moving batch data to device 8.607 ms, 63.56 s total
[ 2023-10-08 02:12:03 ] Completed Epoch: 13 batch 775: forward pass 107.256 ms, 63.66 s total
[ 2023-10-08 02:12:03 ] Completed Epoch: 13 batch 775: backward pass 58.819 ms, 63.72 s total
[ 2023-10-08 02:12:04 ] Completed Epoch: 13 batch 775: computing loss 128.359 ms, 63.85 s total
EPOCH: [13], BATCH: [775/889], loss: 0.416, loss_box_reg: 0.129, loss_classifier: 0.107, loss_mask: 0.135, loss_objectness: 0.017, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 775
[ 2023-10-08 02:12:05 ] Completed saving temp checkpoint 1,161.136 ms, 65.01 s total
[ 2023-10-08 02:12:05 ] Completed replacing temp checkpoint with checkpoint 71.325 ms, 65.08 s total
[ 2023-10-08 02:12:05 ] Completed Epoch: 13 batch 776: moving batch data to device 8.555 ms, 65.09 s total
[ 2023-10-08 02:12:05 ] Completed Epoch: 13 batch 776: forward pass 107.145 ms, 65.20 s total
[ 2023-10-08 02:12:05 ] Completed Epoch: 13 batch 776: backward pass 75.477 ms, 65.27 s total
[ 2023-10-08 02:12:05 ] Completed Epoch: 13 batch 776: computing loss 111.426 ms, 65.39 s total
EPOCH: [13], BATCH: [776/889], loss: 0.374, loss_box_reg: 0.111, loss_classifier: 0.094, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 776
[ 2023-10-08 02:12:06 ] Completed saving temp checkpoint 1,275.099 ms, 66.66 s total
[ 2023-10-08 02:12:06 ] Completed replacing temp checkpoint with checkpoint 73.391 ms, 66.73 s total
[ 2023-10-08 02:12:06 ] Completed Epoch: 13 batch 777: moving batch data to device 7.041 ms, 66.74 s total
[ 2023-10-08 02:12:07 ] Completed Epoch: 13 batch 777: forward pass 109.685 ms, 66.85 s total
[ 2023-10-08 02:12:07 ] Completed Epoch: 13 batch 777: backward pass 80.707 ms, 66.93 s total
[ 2023-10-08 02:12:07 ] Completed Epoch: 13 batch 777: computing loss 118.612 ms, 67.05 s total
EPOCH: [13], BATCH: [777/889], loss: 0.415, loss_box_reg: 0.129, loss_classifier: 0.109, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 777
[ 2023-10-08 02:12:08 ] Completed saving temp checkpoint 1,158.376 ms, 68.21 s total
[ 2023-10-08 02:12:08 ] Completed replacing temp checkpoint with checkpoint 81.735 ms, 68.29 s total
[ 2023-10-08 02:12:08 ] Completed Epoch: 13 batch 778: moving batch data to device 7.978 ms, 68.30 s total
[ 2023-10-08 02:12:08 ] Completed Epoch: 13 batch 778: forward pass 101.564 ms, 68.40 s total
[ 2023-10-08 02:12:08 ] Completed Epoch: 13 batch 778: backward pass 75.771 ms, 68.48 s total
[ 2023-10-08 02:12:08 ] Completed Epoch: 13 batch 778: computing loss 118.249 ms, 68.59 s total
EPOCH: [13], BATCH: [778/889], loss: 0.375, loss_box_reg: 0.112, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 778
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 02:25:25 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:25:25 ] Completed importing Timer 0.019 ms, 0.00 s total
[ 2023-10-08 02:25:25 ] Completed importing everything else 592.101 ms, 0.59 s total
[ 2023-10-08 02:25:25 ] Completed defined other functions 0.032 ms, 0.59 s total
| distributed init (rank 5): env://
| distributed init (rank 4): env://
| distributed init (rank 3): env://
| distributed init (rank 0): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 02:25:28 ] Completed main preliminaries 2,685.988 ms, 3.28 s total
loading annotations into memory...
Done (t=10.36s)
creating index...
index created!
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
[ 2023-10-08 02:25:40 ] Completed loading data 12,072.906 ms, 15.35 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 02:25:40 ] Completed creating data samplers 92.705 ms, 15.44 s total
[ 2023-10-08 02:25:40 ] Completed creating data loaders 0.197 ms, 15.44 s total
[ 2023-10-08 02:25:41 ] Completed creating model and .to(device) 668.984 ms, 16.11 s total
[ 2023-10-08 02:25:43 ] Completed preparing model for distributed training 2,134.598 ms, 18.25 s total
[ 2023-10-08 02:25:43 ] Completed optimizer and scaler 0.631 ms, 18.25 s total
[ 2023-10-08 02:25:43 ] Completed learning rate schedulers 0.233 ms, 18.25 s total
[ 2023-10-08 02:25:44 ] Completed init coco evaluator 942.040 ms, 19.19 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 02:25:45 ] Completed retrieving checkpoint 956.142 ms, 20.15 s total
EPOCH :: 13
[ 2023-10-08 02:25:45 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:25:45 ] Completed training preliminaries 1.030 ms, 0.00 s total
Training / resuming epoch 13 from training step 778
[ 2023-10-08 02:25:46 ] Completed Epoch: 13 batch 778: moving batch data to device 613.527 ms, 0.61 s total
[ 2023-10-08 02:25:46 ] Completed Epoch: 13 batch 778: forward pass 976.563 ms, 1.59 s total
[ 2023-10-08 02:25:47 ] Completed Epoch: 13 batch 778: backward pass 179.536 ms, 1.77 s total
[ 2023-10-08 02:25:47 ] Completed Epoch: 13 batch 778: computing loss 171.064 ms, 1.94 s total
EPOCH: [13], BATCH: [778/889], loss: 0.373, loss_box_reg: 0.111, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 778
[ 2023-10-08 02:25:48 ] Completed saving temp checkpoint 1,315.139 ms, 3.26 s total
[ 2023-10-08 02:25:48 ] Completed replacing temp checkpoint with checkpoint 153.297 ms, 3.41 s total
[ 2023-10-08 02:25:48 ] Completed Epoch: 13 batch 779: moving batch data to device 4.381 ms, 3.41 s total
[ 2023-10-08 02:25:49 ] Completed Epoch: 13 batch 779: forward pass 213.054 ms, 3.63 s total
[ 2023-10-08 02:25:49 ] Completed Epoch: 13 batch 779: backward pass 106.319 ms, 3.73 s total
[ 2023-10-08 02:25:49 ] Completed Epoch: 13 batch 779: computing loss 132.626 ms, 3.87 s total
EPOCH: [13], BATCH: [779/889], loss: 0.368, loss_box_reg: 0.109, loss_classifier: 0.091, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 779
[ 2023-10-08 02:25:50 ] Completed saving temp checkpoint 1,561.580 ms, 5.43 s total
[ 2023-10-08 02:25:50 ] Completed replacing temp checkpoint with checkpoint 70.778 ms, 5.50 s total
[ 2023-10-08 02:25:50 ] Completed Epoch: 13 batch 780: moving batch data to device 3.737 ms, 5.50 s total
[ 2023-10-08 02:25:51 ] Completed Epoch: 13 batch 780: forward pass 111.343 ms, 5.61 s total
[ 2023-10-08 02:25:51 ] Completed Epoch: 13 batch 780: backward pass 71.289 ms, 5.69 s total
[ 2023-10-08 02:25:51 ] Completed Epoch: 13 batch 780: computing loss 144.886 ms, 5.83 s total
EPOCH: [13], BATCH: [780/889], loss: 0.423, loss_box_reg: 0.129, loss_classifier: 0.112, loss_mask: 0.135, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 780
[ 2023-10-08 02:25:52 ] Completed saving temp checkpoint 1,590.175 ms, 7.42 s total
[ 2023-10-08 02:25:52 ] Completed replacing temp checkpoint with checkpoint 52.320 ms, 7.47 s total
[ 2023-10-08 02:25:52 ] Completed Epoch: 13 batch 781: moving batch data to device 3.761 ms, 7.48 s total
[ 2023-10-08 02:25:52 ] Completed Epoch: 13 batch 781: forward pass 107.998 ms, 7.58 s total
[ 2023-10-08 02:25:53 ] Completed Epoch: 13 batch 781: backward pass 81.426 ms, 7.67 s total
[ 2023-10-08 02:25:53 ] Completed Epoch: 13 batch 781: computing loss 134.976 ms, 7.80 s total
EPOCH: [13], BATCH: [781/889], loss: 0.384, loss_box_reg: 0.116, loss_classifier: 0.103, loss_mask: 0.128, loss_objectness: 0.017, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 781
[ 2023-10-08 02:25:54 ] Completed saving temp checkpoint 1,197.258 ms, 9.00 s total
[ 2023-10-08 02:25:54 ] Completed replacing temp checkpoint with checkpoint 83.386 ms, 9.08 s total
[ 2023-10-08 02:25:54 ] Completed Epoch: 13 batch 782: moving batch data to device 10.285 ms, 9.09 s total
[ 2023-10-08 02:25:54 ] Completed Epoch: 13 batch 782: forward pass 109.694 ms, 9.20 s total
[ 2023-10-08 02:25:54 ] Completed Epoch: 13 batch 782: backward pass 77.309 ms, 9.28 s total
[ 2023-10-08 02:25:54 ] Completed Epoch: 13 batch 782: computing loss 133.509 ms, 9.41 s total
EPOCH: [13], BATCH: [782/889], loss: 0.394, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 782
[ 2023-10-08 02:25:55 ] Completed saving temp checkpoint 973.232 ms, 10.39 s total
[ 2023-10-08 02:25:55 ] Completed replacing temp checkpoint with checkpoint 49.549 ms, 10.44 s total
[ 2023-10-08 02:25:55 ] Completed Epoch: 13 batch 783: moving batch data to device 4.255 ms, 10.44 s total
[ 2023-10-08 02:25:56 ] Completed Epoch: 13 batch 783: forward pass 180.644 ms, 10.62 s total
[ 2023-10-08 02:25:56 ] Completed Epoch: 13 batch 783: backward pass 78.696 ms, 10.70 s total
[ 2023-10-08 02:25:56 ] Completed Epoch: 13 batch 783: computing loss 127.323 ms, 10.83 s total
EPOCH: [13], BATCH: [783/889], loss: 0.387, loss_box_reg: 0.117, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 783
[ 2023-10-08 02:25:57 ] Completed saving temp checkpoint 1,153.341 ms, 11.98 s total
[ 2023-10-08 02:25:57 ] Completed replacing temp checkpoint with checkpoint 67.041 ms, 12.05 s total
[ 2023-10-08 02:25:57 ] Completed Epoch: 13 batch 784: moving batch data to device 2.320 ms, 12.05 s total
[ 2023-10-08 02:25:57 ] Completed Epoch: 13 batch 784: forward pass 107.866 ms, 12.16 s total
[ 2023-10-08 02:25:57 ] Completed Epoch: 13 batch 784: backward pass 82.200 ms, 12.24 s total
[ 2023-10-08 02:25:57 ] Completed Epoch: 13 batch 784: computing loss 130.671 ms, 12.37 s total
EPOCH: [13], BATCH: [784/889], loss: 0.386, loss_box_reg: 0.121, loss_classifier: 0.095, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 784
[ 2023-10-08 02:25:58 ] Completed saving temp checkpoint 1,003.810 ms, 13.37 s total
[ 2023-10-08 02:25:58 ] Completed replacing temp checkpoint with checkpoint 41.369 ms, 13.41 s total
[ 2023-10-08 02:25:58 ] Completed Epoch: 13 batch 785: moving batch data to device 5.031 ms, 13.42 s total
[ 2023-10-08 02:25:58 ] Completed Epoch: 13 batch 785: forward pass 161.500 ms, 13.58 s total
[ 2023-10-08 02:25:59 ] Completed Epoch: 13 batch 785: backward pass 69.364 ms, 13.65 s total
[ 2023-10-08 02:25:59 ] Completed Epoch: 13 batch 785: computing loss 122.374 ms, 13.77 s total
EPOCH: [13], BATCH: [785/889], loss: 0.388, loss_box_reg: 0.116, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 785
[ 2023-10-08 02:26:00 ] Completed saving temp checkpoint 1,142.941 ms, 14.92 s total
[ 2023-10-08 02:26:00 ] Completed replacing temp checkpoint with checkpoint 78.098 ms, 14.99 s total
[ 2023-10-08 02:26:00 ] Completed Epoch: 13 batch 786: moving batch data to device 6.438 ms, 15.00 s total
[ 2023-10-08 02:26:00 ] Completed Epoch: 13 batch 786: forward pass 109.226 ms, 15.11 s total
[ 2023-10-08 02:26:00 ] Completed Epoch: 13 batch 786: backward pass 92.472 ms, 15.20 s total
[ 2023-10-08 02:26:00 ] Completed Epoch: 13 batch 786: computing loss 108.613 ms, 15.31 s total
EPOCH: [13], BATCH: [786/889], loss: 0.406, loss_box_reg: 0.132, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 786
[ 2023-10-08 02:26:01 ] Completed saving temp checkpoint 1,018.567 ms, 16.33 s total
[ 2023-10-08 02:26:01 ] Completed replacing temp checkpoint with checkpoint 46.951 ms, 16.38 s total
[ 2023-10-08 02:26:01 ] Completed Epoch: 13 batch 787: moving batch data to device 8.672 ms, 16.38 s total
[ 2023-10-08 02:26:01 ] Completed Epoch: 13 batch 787: forward pass 107.121 ms, 16.49 s total
[ 2023-10-08 02:26:01 ] Completed Epoch: 13 batch 787: backward pass 80.870 ms, 16.57 s total
[ 2023-10-08 02:26:02 ] Completed Epoch: 13 batch 787: computing loss 116.795 ms, 16.69 s total
EPOCH: [13], BATCH: [787/889], loss: 0.380, loss_box_reg: 0.113, loss_classifier: 0.092, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 787
[ 2023-10-08 02:26:03 ] Completed saving temp checkpoint 1,204.313 ms, 17.89 s total
[ 2023-10-08 02:26:03 ] Completed replacing temp checkpoint with checkpoint 86.591 ms, 17.98 s total
[ 2023-10-08 02:26:03 ] Completed Epoch: 13 batch 788: moving batch data to device 6.830 ms, 17.99 s total
[ 2023-10-08 02:26:03 ] Completed Epoch: 13 batch 788: forward pass 109.409 ms, 18.10 s total
[ 2023-10-08 02:26:03 ] Completed Epoch: 13 batch 788: backward pass 79.059 ms, 18.18 s total
[ 2023-10-08 02:26:03 ] Completed Epoch: 13 batch 788: computing loss 162.441 ms, 18.34 s total
EPOCH: [13], BATCH: [788/889], loss: 0.368, loss_box_reg: 0.107, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.018, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 788
[ 2023-10-08 02:26:04 ] Completed saving temp checkpoint 904.875 ms, 19.24 s total
[ 2023-10-08 02:26:04 ] Completed replacing temp checkpoint with checkpoint 60.148 ms, 19.30 s total
[ 2023-10-08 02:26:04 ] Completed Epoch: 13 batch 789: moving batch data to device 6.771 ms, 19.31 s total
[ 2023-10-08 02:26:04 ] Completed Epoch: 13 batch 789: forward pass 104.449 ms, 19.41 s total
[ 2023-10-08 02:26:04 ] Completed Epoch: 13 batch 789: backward pass 72.062 ms, 19.49 s total
[ 2023-10-08 02:26:05 ] Completed Epoch: 13 batch 789: computing loss 119.655 ms, 19.61 s total
EPOCH: [13], BATCH: [789/889], loss: 0.342, loss_box_reg: 0.104, loss_classifier: 0.082, loss_mask: 0.124, loss_objectness: 0.012, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 789
[ 2023-10-08 02:26:06 ] Completed saving temp checkpoint 1,030.929 ms, 20.64 s total
[ 2023-10-08 02:26:06 ] Completed replacing temp checkpoint with checkpoint 78.208 ms, 20.72 s total
[ 2023-10-08 02:26:06 ] Completed Epoch: 13 batch 790: moving batch data to device 7.274 ms, 20.72 s total
[ 2023-10-08 02:26:06 ] Completed Epoch: 13 batch 790: forward pass 109.990 ms, 20.83 s total
[ 2023-10-08 02:26:06 ] Completed Epoch: 13 batch 790: backward pass 77.281 ms, 20.91 s total
[ 2023-10-08 02:26:06 ] Completed Epoch: 13 batch 790: computing loss 113.462 ms, 21.02 s total
EPOCH: [13], BATCH: [790/889], loss: 0.417, loss_box_reg: 0.127, loss_classifier: 0.105, loss_mask: 0.136, loss_objectness: 0.018, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 790
[ 2023-10-08 02:26:07 ] Completed saving temp checkpoint 1,018.498 ms, 22.04 s total
[ 2023-10-08 02:26:07 ] Completed replacing temp checkpoint with checkpoint 75.502 ms, 22.12 s total
[ 2023-10-08 02:26:07 ] Completed Epoch: 13 batch 791: moving batch data to device 6.926 ms, 22.12 s total
[ 2023-10-08 02:26:07 ] Completed Epoch: 13 batch 791: forward pass 110.271 ms, 22.23 s total
[ 2023-10-08 02:26:07 ] Completed Epoch: 13 batch 791: backward pass 75.043 ms, 22.31 s total
[ 2023-10-08 02:26:07 ] Completed Epoch: 13 batch 791: computing loss 122.963 ms, 22.43 s total
EPOCH: [13], BATCH: [791/889], loss: 0.396, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 791
[ 2023-10-08 02:26:09 ] Completed saving temp checkpoint 1,586.945 ms, 24.02 s total
[ 2023-10-08 02:26:09 ] Completed replacing temp checkpoint with checkpoint 90.792 ms, 24.11 s total
[ 2023-10-08 02:26:09 ] Completed Epoch: 13 batch 792: moving batch data to device 7.698 ms, 24.12 s total
[ 2023-10-08 02:26:09 ] Completed Epoch: 13 batch 792: forward pass 107.002 ms, 24.22 s total
[ 2023-10-08 02:26:09 ] Completed Epoch: 13 batch 792: backward pass 79.864 ms, 24.30 s total
[ 2023-10-08 02:26:09 ] Completed Epoch: 13 batch 792: computing loss 119.083 ms, 24.42 s total
EPOCH: [13], BATCH: [792/889], loss: 0.374, loss_box_reg: 0.115, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 792
[ 2023-10-08 02:26:11 ] Completed saving temp checkpoint 1,402.829 ms, 25.83 s total
[ 2023-10-08 02:26:11 ] Completed replacing temp checkpoint with checkpoint 60.979 ms, 25.89 s total
[ 2023-10-08 02:26:11 ] Completed Epoch: 13 batch 793: moving batch data to device 4.929 ms, 25.89 s total
[ 2023-10-08 02:26:11 ] Completed Epoch: 13 batch 793: forward pass 109.623 ms, 26.00 s total
[ 2023-10-08 02:26:11 ] Completed Epoch: 13 batch 793: backward pass 69.783 ms, 26.07 s total
[ 2023-10-08 02:26:11 ] Completed Epoch: 13 batch 793: computing loss 128.275 ms, 26.20 s total
EPOCH: [13], BATCH: [793/889], loss: 0.392, loss_box_reg: 0.120, loss_classifier: 0.100, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 793
[ 2023-10-08 02:26:13 ] Completed saving temp checkpoint 1,805.598 ms, 28.01 s total
[ 2023-10-08 02:26:13 ] Completed replacing temp checkpoint with checkpoint 72.384 ms, 28.08 s total
[ 2023-10-08 02:26:13 ] Completed Epoch: 13 batch 794: moving batch data to device 4.696 ms, 28.08 s total
[ 2023-10-08 02:26:13 ] Completed Epoch: 13 batch 794: forward pass 105.159 ms, 28.19 s total
[ 2023-10-08 02:26:13 ] Completed Epoch: 13 batch 794: backward pass 69.920 ms, 28.26 s total
[ 2023-10-08 02:26:13 ] Completed Epoch: 13 batch 794: computing loss 130.483 ms, 28.39 s total
EPOCH: [13], BATCH: [794/889], loss: 0.388, loss_box_reg: 0.115, loss_classifier: 0.097, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 794
[ 2023-10-08 02:26:14 ] Completed saving temp checkpoint 1,061.192 ms, 29.45 s total
[ 2023-10-08 02:26:14 ] Completed replacing temp checkpoint with checkpoint 46.991 ms, 29.50 s total
[ 2023-10-08 02:26:14 ] Completed Epoch: 13 batch 795: moving batch data to device 5.901 ms, 29.50 s total
[ 2023-10-08 02:26:15 ] Completed Epoch: 13 batch 795: forward pass 106.036 ms, 29.61 s total
[ 2023-10-08 02:26:15 ] Completed Epoch: 13 batch 795: backward pass 87.228 ms, 29.70 s total
[ 2023-10-08 02:26:15 ] Completed Epoch: 13 batch 795: computing loss 106.971 ms, 29.80 s total
EPOCH: [13], BATCH: [795/889], loss: 0.406, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.137, loss_objectness: 0.017, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 795
[ 2023-10-08 02:26:16 ] Completed saving temp checkpoint 1,217.055 ms, 31.02 s total
[ 2023-10-08 02:26:16 ] Completed replacing temp checkpoint with checkpoint 61.465 ms, 31.08 s total
[ 2023-10-08 02:26:16 ] Completed Epoch: 13 batch 796: moving batch data to device 5.154 ms, 31.09 s total
[ 2023-10-08 02:26:16 ] Completed Epoch: 13 batch 796: forward pass 107.304 ms, 31.19 s total
[ 2023-10-08 02:26:16 ] Completed Epoch: 13 batch 796: backward pass 47.777 ms, 31.24 s total
[ 2023-10-08 02:26:16 ] Completed Epoch: 13 batch 796: computing loss 144.752 ms, 31.39 s total
EPOCH: [13], BATCH: [796/889], loss: 0.382, loss_box_reg: 0.118, loss_classifier: 0.094, loss_mask: 0.124, loss_objectness: 0.016, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 796
[ 2023-10-08 02:26:17 ] Completed saving temp checkpoint 1,040.598 ms, 32.43 s total
[ 2023-10-08 02:26:17 ] Completed replacing temp checkpoint with checkpoint 50.519 ms, 32.48 s total
[ 2023-10-08 02:26:17 ] Completed Epoch: 13 batch 797: moving batch data to device 6.844 ms, 32.48 s total
[ 2023-10-08 02:26:17 ] Completed Epoch: 13 batch 797: forward pass 104.774 ms, 32.59 s total
[ 2023-10-08 02:26:18 ] Completed Epoch: 13 batch 797: backward pass 54.265 ms, 32.64 s total
[ 2023-10-08 02:26:18 ] Completed Epoch: 13 batch 797: computing loss 144.447 ms, 32.79 s total
EPOCH: [13], BATCH: [797/889], loss: 0.371, loss_box_reg: 0.110, loss_classifier: 0.094, loss_mask: 0.128, loss_objectness: 0.013, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 797
[ 2023-10-08 02:26:19 ] Completed saving temp checkpoint 1,207.208 ms, 34.00 s total
[ 2023-10-08 02:26:19 ] Completed replacing temp checkpoint with checkpoint 66.175 ms, 34.06 s total
[ 2023-10-08 02:26:19 ] Completed Epoch: 13 batch 798: moving batch data to device 6.521 ms, 34.07 s total
[ 2023-10-08 02:26:19 ] Completed Epoch: 13 batch 798: forward pass 107.656 ms, 34.18 s total
[ 2023-10-08 02:26:19 ] Completed Epoch: 13 batch 798: backward pass 56.871 ms, 34.23 s total
[ 2023-10-08 02:26:19 ] Completed Epoch: 13 batch 798: computing loss 130.064 ms, 34.36 s total
EPOCH: [13], BATCH: [798/889], loss: 0.402, loss_box_reg: 0.120, loss_classifier: 0.096, loss_mask: 0.139, loss_objectness: 0.016, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 798
[ 2023-10-08 02:26:20 ] Completed saving temp checkpoint 1,004.668 ms, 35.37 s total
[ 2023-10-08 02:26:20 ] Completed replacing temp checkpoint with checkpoint 34.458 ms, 35.40 s total
[ 2023-10-08 02:26:20 ] Completed Epoch: 13 batch 799: moving batch data to device 5.552 ms, 35.41 s total
[ 2023-10-08 02:26:20 ] Completed Epoch: 13 batch 799: forward pass 106.868 ms, 35.51 s total
[ 2023-10-08 02:26:20 ] Completed Epoch: 13 batch 799: backward pass 45.985 ms, 35.56 s total
[ 2023-10-08 02:26:21 ] Completed Epoch: 13 batch 799: computing loss 145.736 ms, 35.71 s total
EPOCH: [13], BATCH: [799/889], loss: 0.420, loss_box_reg: 0.130, loss_classifier: 0.104, loss_mask: 0.138, loss_objectness: 0.020, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 799
[ 2023-10-08 02:26:22 ] Completed saving temp checkpoint 1,138.065 ms, 36.84 s total
[ 2023-10-08 02:26:22 ] Completed replacing temp checkpoint with checkpoint 60.114 ms, 36.90 s total
[ 2023-10-08 02:26:22 ] Completed Epoch: 13 batch 800: moving batch data to device 4.718 ms, 36.91 s total
[ 2023-10-08 02:26:22 ] Completed Epoch: 13 batch 800: forward pass 103.390 ms, 37.01 s total
[ 2023-10-08 02:26:22 ] Completed Epoch: 13 batch 800: backward pass 43.151 ms, 37.06 s total
[ 2023-10-08 02:26:22 ] Completed Epoch: 13 batch 800: computing loss 125.583 ms, 37.18 s total
EPOCH: [13], BATCH: [800/889], loss: 0.372, loss_box_reg: 0.110, loss_classifier: 0.091, loss_mask: 0.128, loss_objectness: 0.016, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 800
[ 2023-10-08 02:26:23 ] Completed saving temp checkpoint 1,068.080 ms, 38.25 s total
[ 2023-10-08 02:26:23 ] Completed replacing temp checkpoint with checkpoint 72.516 ms, 38.32 s total
[ 2023-10-08 02:26:23 ] Completed Epoch: 13 batch 801: moving batch data to device 8.333 ms, 38.33 s total
[ 2023-10-08 02:26:23 ] Completed Epoch: 13 batch 801: forward pass 101.339 ms, 38.43 s total
[ 2023-10-08 02:26:23 ] Completed Epoch: 13 batch 801: backward pass 74.927 ms, 38.51 s total
[ 2023-10-08 02:26:24 ] Completed Epoch: 13 batch 801: computing loss 111.953 ms, 38.62 s total
EPOCH: [13], BATCH: [801/889], loss: 0.385, loss_box_reg: 0.115, loss_classifier: 0.098, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 801
[ 2023-10-08 02:26:25 ] Completed saving temp checkpoint 1,529.212 ms, 40.15 s total
[ 2023-10-08 02:26:25 ] Completed replacing temp checkpoint with checkpoint 75.241 ms, 40.22 s total
[ 2023-10-08 02:26:25 ] Completed Epoch: 13 batch 802: moving batch data to device 8.201 ms, 40.23 s total
[ 2023-10-08 02:26:25 ] Completed Epoch: 13 batch 802: forward pass 105.243 ms, 40.34 s total
[ 2023-10-08 02:26:25 ] Completed Epoch: 13 batch 802: backward pass 82.027 ms, 40.42 s total
[ 2023-10-08 02:26:25 ] Completed Epoch: 13 batch 802: computing loss 89.041 ms, 40.51 s total
EPOCH: [13], BATCH: [802/889], loss: 0.393, loss_box_reg: 0.122, loss_classifier: 0.099, loss_mask: 0.132, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 802
[ 2023-10-08 02:26:27 ] Completed saving temp checkpoint 1,418.284 ms, 41.93 s total
[ 2023-10-08 02:26:27 ] Completed replacing temp checkpoint with checkpoint 83.941 ms, 42.01 s total
[ 2023-10-08 02:26:27 ] Completed Epoch: 13 batch 803: moving batch data to device 8.324 ms, 42.02 s total
[ 2023-10-08 02:26:27 ] Completed Epoch: 13 batch 803: forward pass 107.593 ms, 42.12 s total
[ 2023-10-08 02:26:27 ] Completed Epoch: 13 batch 803: backward pass 68.038 ms, 42.19 s total
[ 2023-10-08 02:26:27 ] Completed Epoch: 13 batch 803: computing loss 129.408 ms, 42.32 s total
EPOCH: [13], BATCH: [803/889], loss: 0.339, loss_box_reg: 0.101, loss_classifier: 0.084, loss_mask: 0.115, loss_objectness: 0.013, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 803
[ 2023-10-08 02:26:29 ] Completed saving temp checkpoint 1,907.435 ms, 44.23 s total
[ 2023-10-08 02:26:29 ] Completed replacing temp checkpoint with checkpoint 99.468 ms, 44.33 s total
[ 2023-10-08 02:26:29 ] Completed Epoch: 13 batch 804: moving batch data to device 7.710 ms, 44.34 s total
[ 2023-10-08 02:26:29 ] Completed Epoch: 13 batch 804: forward pass 103.227 ms, 44.44 s total
[ 2023-10-08 02:26:29 ] Completed Epoch: 13 batch 804: backward pass 37.987 ms, 44.48 s total
[ 2023-10-08 02:26:30 ] Completed Epoch: 13 batch 804: computing loss 155.717 ms, 44.63 s total
EPOCH: [13], BATCH: [804/889], loss: 0.354, loss_box_reg: 0.106, loss_classifier: 0.092, loss_mask: 0.122, loss_objectness: 0.012, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 804
[ 2023-10-08 02:26:31 ] Completed saving temp checkpoint 1,479.754 ms, 46.11 s total
[ 2023-10-08 02:26:31 ] Completed replacing temp checkpoint with checkpoint 97.524 ms, 46.21 s total
[ 2023-10-08 02:26:31 ] Completed Epoch: 13 batch 805: moving batch data to device 7.410 ms, 46.22 s total
[ 2023-10-08 02:26:31 ] Completed Epoch: 13 batch 805: forward pass 109.655 ms, 46.33 s total
[ 2023-10-08 02:26:31 ] Completed Epoch: 13 batch 805: backward pass 73.832 ms, 46.40 s total
[ 2023-10-08 02:26:31 ] Completed Epoch: 13 batch 805: computing loss 123.169 ms, 46.53 s total
EPOCH: [13], BATCH: [805/889], loss: 0.366, loss_box_reg: 0.108, loss_classifier: 0.089, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 805
[ 2023-10-08 02:26:33 ] Completed saving temp checkpoint 1,479.033 ms, 48.00 s total
[ 2023-10-08 02:26:33 ] Completed replacing temp checkpoint with checkpoint 89.870 ms, 48.09 s total
[ 2023-10-08 02:26:33 ] Completed Epoch: 13 batch 806: moving batch data to device 6.765 ms, 48.10 s total
[ 2023-10-08 02:26:33 ] Completed Epoch: 13 batch 806: forward pass 106.857 ms, 48.21 s total
[ 2023-10-08 02:26:33 ] Completed Epoch: 13 batch 806: backward pass 66.236 ms, 48.27 s total
[ 2023-10-08 02:26:33 ] Completed Epoch: 13 batch 806: computing loss 133.700 ms, 48.41 s total
EPOCH: [13], BATCH: [806/889], loss: 0.388, loss_box_reg: 0.119, loss_classifier: 0.094, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 806
[ 2023-10-08 02:26:34 ] Completed saving temp checkpoint 943.396 ms, 49.35 s total
[ 2023-10-08 02:26:34 ] Completed replacing temp checkpoint with checkpoint 46.622 ms, 49.40 s total
[ 2023-10-08 02:26:34 ] Completed Epoch: 13 batch 807: moving batch data to device 8.149 ms, 49.41 s total
[ 2023-10-08 02:26:34 ] Completed Epoch: 13 batch 807: forward pass 106.590 ms, 49.51 s total
[ 2023-10-08 02:26:34 ] Completed Epoch: 13 batch 807: backward pass 79.566 ms, 49.59 s total
[ 2023-10-08 02:26:35 ] Completed Epoch: 13 batch 807: computing loss 108.155 ms, 49.70 s total
EPOCH: [13], BATCH: [807/889], loss: 0.399, loss_box_reg: 0.115, loss_classifier: 0.102, loss_mask: 0.128, loss_objectness: 0.019, loss_rpn_box_reg: 0.036
Saving checkpoint at epoch 13 train batch 807
[ 2023-10-08 02:26:36 ] Completed saving temp checkpoint 1,050.638 ms, 50.75 s total
[ 2023-10-08 02:26:36 ] Completed replacing temp checkpoint with checkpoint 82.395 ms, 50.83 s total
[ 2023-10-08 02:26:36 ] Completed Epoch: 13 batch 808: moving batch data to device 7.323 ms, 50.84 s total
[ 2023-10-08 02:26:36 ] Completed Epoch: 13 batch 808: forward pass 104.392 ms, 50.94 s total
[ 2023-10-08 02:26:36 ] Completed Epoch: 13 batch 808: backward pass 66.547 ms, 51.01 s total
[ 2023-10-08 02:26:36 ] Completed Epoch: 13 batch 808: computing loss 119.460 ms, 51.13 s total
EPOCH: [13], BATCH: [808/889], loss: 0.403, loss_box_reg: 0.129, loss_classifier: 0.095, loss_mask: 0.139, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 808
[ 2023-10-08 02:26:37 ] Completed saving temp checkpoint 972.568 ms, 52.10 s total
[ 2023-10-08 02:26:37 ] Completed replacing temp checkpoint with checkpoint 50.536 ms, 52.15 s total
[ 2023-10-08 02:26:37 ] Completed Epoch: 13 batch 809: moving batch data to device 4.591 ms, 52.16 s total
[ 2023-10-08 02:26:37 ] Completed Epoch: 13 batch 809: forward pass 132.686 ms, 52.29 s total
[ 2023-10-08 02:26:37 ] Completed Epoch: 13 batch 809: backward pass 71.356 ms, 52.36 s total
[ 2023-10-08 02:26:37 ] Completed Epoch: 13 batch 809: computing loss 110.183 ms, 52.47 s total
EPOCH: [13], BATCH: [809/889], loss: 0.393, loss_box_reg: 0.123, loss_classifier: 0.095, loss_mask: 0.127, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 809
[ 2023-10-08 02:26:38 ] Completed saving temp checkpoint 1,103.840 ms, 53.58 s total
[ 2023-10-08 02:26:39 ] Completed replacing temp checkpoint with checkpoint 60.823 ms, 53.64 s total
[ 2023-10-08 02:26:39 ] Completed Epoch: 13 batch 810: moving batch data to device 5.635 ms, 53.64 s total
[ 2023-10-08 02:26:39 ] Completed Epoch: 13 batch 810: forward pass 105.186 ms, 53.75 s total
[ 2023-10-08 02:26:39 ] Completed Epoch: 13 batch 810: backward pass 37.261 ms, 53.79 s total
[ 2023-10-08 02:26:39 ] Completed Epoch: 13 batch 810: computing loss 151.887 ms, 53.94 s total
EPOCH: [13], BATCH: [810/889], loss: 0.410, loss_box_reg: 0.126, loss_classifier: 0.105, loss_mask: 0.134, loss_objectness: 0.018, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 810
[ 2023-10-08 02:26:40 ] Completed saving temp checkpoint 1,016.028 ms, 54.95 s total
[ 2023-10-08 02:26:40 ] Completed replacing temp checkpoint with checkpoint 69.132 ms, 55.02 s total
[ 2023-10-08 02:26:40 ] Completed Epoch: 13 batch 811: moving batch data to device 5.990 ms, 55.03 s total
[ 2023-10-08 02:26:40 ] Completed Epoch: 13 batch 811: forward pass 101.741 ms, 55.13 s total
[ 2023-10-08 02:26:40 ] Completed Epoch: 13 batch 811: backward pass 69.973 ms, 55.20 s total
[ 2023-10-08 02:26:40 ] Completed Epoch: 13 batch 811: computing loss 129.029 ms, 55.33 s total
EPOCH: [13], BATCH: [811/889], loss: 0.399, loss_box_reg: 0.121, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.020, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 811
[ 2023-10-08 02:26:41 ] Completed saving temp checkpoint 1,128.455 ms, 56.46 s total
[ 2023-10-08 02:26:41 ] Completed replacing temp checkpoint with checkpoint 32.398 ms, 56.49 s total
[ 2023-10-08 02:26:41 ] Completed Epoch: 13 batch 812: moving batch data to device 7.400 ms, 56.50 s total
[ 2023-10-08 02:26:42 ] Completed Epoch: 13 batch 812: forward pass 107.533 ms, 56.61 s total
[ 2023-10-08 02:26:42 ] Completed Epoch: 13 batch 812: backward pass 75.779 ms, 56.68 s total
[ 2023-10-08 02:26:42 ] Completed Epoch: 13 batch 812: computing loss 120.474 ms, 56.80 s total
EPOCH: [13], BATCH: [812/889], loss: 0.382, loss_box_reg: 0.116, loss_classifier: 0.096, loss_mask: 0.132, loss_objectness: 0.013, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 812
[ 2023-10-08 02:26:43 ] Completed saving temp checkpoint 1,009.791 ms, 57.81 s total
[ 2023-10-08 02:26:43 ] Completed replacing temp checkpoint with checkpoint 49.024 ms, 57.86 s total
[ 2023-10-08 02:26:43 ] Completed Epoch: 13 batch 813: moving batch data to device 3.616 ms, 57.86 s total
[ 2023-10-08 02:26:43 ] Completed Epoch: 13 batch 813: forward pass 100.764 ms, 57.96 s total
[ 2023-10-08 02:26:43 ] Completed Epoch: 13 batch 813: backward pass 79.968 ms, 58.04 s total
[ 2023-10-08 02:26:43 ] Completed Epoch: 13 batch 813: computing loss 110.746 ms, 58.16 s total
EPOCH: [13], BATCH: [813/889], loss: 0.413, loss_box_reg: 0.124, loss_classifier: 0.104, loss_mask: 0.143, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 813
[ 2023-10-08 02:26:44 ] Completed saving temp checkpoint 1,151.599 ms, 59.31 s total
[ 2023-10-08 02:26:44 ] Completed replacing temp checkpoint with checkpoint 82.688 ms, 59.39 s total
[ 2023-10-08 02:26:44 ] Completed Epoch: 13 batch 814: moving batch data to device 8.034 ms, 59.40 s total
[ 2023-10-08 02:26:44 ] Completed Epoch: 13 batch 814: forward pass 104.091 ms, 59.50 s total
[ 2023-10-08 02:26:44 ] Completed Epoch: 13 batch 814: backward pass 38.700 ms, 59.54 s total
[ 2023-10-08 02:26:45 ] Completed Epoch: 13 batch 814: computing loss 157.299 ms, 59.70 s total
EPOCH: [13], BATCH: [814/889], loss: 0.393, loss_box_reg: 0.123, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.016, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 814
[ 2023-10-08 02:26:46 ] Completed saving temp checkpoint 1,260.269 ms, 60.96 s total
[ 2023-10-08 02:26:46 ] Completed replacing temp checkpoint with checkpoint 52.901 ms, 61.01 s total
[ 2023-10-08 02:26:46 ] Completed Epoch: 13 batch 815: moving batch data to device 5.539 ms, 61.02 s total
[ 2023-10-08 02:26:46 ] Completed Epoch: 13 batch 815: forward pass 109.539 ms, 61.13 s total
[ 2023-10-08 02:26:46 ] Completed Epoch: 13 batch 815: backward pass 77.919 ms, 61.20 s total
[ 2023-10-08 02:26:46 ] Completed Epoch: 13 batch 815: computing loss 114.431 ms, 61.32 s total
EPOCH: [13], BATCH: [815/889], loss: 0.368, loss_box_reg: 0.112, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 815
[ 2023-10-08 02:26:48 ] Completed saving temp checkpoint 1,983.797 ms, 63.30 s total
[ 2023-10-08 02:26:48 ] Completed replacing temp checkpoint with checkpoint 82.546 ms, 63.38 s total
[ 2023-10-08 02:26:48 ] Completed Epoch: 13 batch 816: moving batch data to device 7.523 ms, 63.39 s total
[ 2023-10-08 02:26:48 ] Completed Epoch: 13 batch 816: forward pass 104.256 ms, 63.50 s total
[ 2023-10-08 02:26:48 ] Completed Epoch: 13 batch 816: backward pass 66.891 ms, 63.56 s total
[ 2023-10-08 02:26:49 ] Completed Epoch: 13 batch 816: computing loss 120.948 ms, 63.68 s total
EPOCH: [13], BATCH: [816/889], loss: 0.405, loss_box_reg: 0.120, loss_classifier: 0.108, loss_mask: 0.131, loss_objectness: 0.019, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 816
[ 2023-10-08 02:26:50 ] Completed saving temp checkpoint 1,547.277 ms, 65.23 s total
[ 2023-10-08 02:26:50 ] Completed replacing temp checkpoint with checkpoint 39.171 ms, 65.27 s total
[ 2023-10-08 02:26:50 ] Completed Epoch: 13 batch 817: moving batch data to device 5.060 ms, 65.28 s total
[ 2023-10-08 02:26:50 ] Completed Epoch: 13 batch 817: forward pass 107.186 ms, 65.38 s total
[ 2023-10-08 02:26:50 ] Completed Epoch: 13 batch 817: backward pass 82.853 ms, 65.47 s total
[ 2023-10-08 02:26:50 ] Completed Epoch: 13 batch 817: computing loss 111.663 ms, 65.58 s total
EPOCH: [13], BATCH: [817/889], loss: 0.382, loss_box_reg: 0.113, loss_classifier: 0.098, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 817
[ 2023-10-08 02:26:52 ] Completed saving temp checkpoint 1,914.849 ms, 67.49 s total
[ 2023-10-08 02:26:52 ] Completed replacing temp checkpoint with checkpoint 95.059 ms, 67.59 s total
[ 2023-10-08 02:26:52 ] Completed Epoch: 13 batch 818: moving batch data to device 7.368 ms, 67.59 s total
[ 2023-10-08 02:26:53 ] Completed Epoch: 13 batch 818: forward pass 106.965 ms, 67.70 s total
[ 2023-10-08 02:26:53 ] Completed Epoch: 13 batch 818: backward pass 71.189 ms, 67.77 s total
[ 2023-10-08 02:26:53 ] Completed Epoch: 13 batch 818: computing loss 96.764 ms, 67.87 s total
EPOCH: [13], BATCH: [818/889], loss: 0.337, loss_box_reg: 0.095, loss_classifier: 0.084, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 818
[ 2023-10-08 02:26:54 ] Completed saving temp checkpoint 1,034.467 ms, 68.90 s total
[ 2023-10-08 02:26:54 ] Completed replacing temp checkpoint with checkpoint 63.347 ms, 68.97 s total
[ 2023-10-08 02:26:54 ] Completed Epoch: 13 batch 819: moving batch data to device 5.018 ms, 68.97 s total
[ 2023-10-08 02:26:54 ] Completed Epoch: 13 batch 819: forward pass 101.666 ms, 69.07 s total
[ 2023-10-08 02:26:54 ] Completed Epoch: 13 batch 819: backward pass 48.865 ms, 69.12 s total
[ 2023-10-08 02:26:54 ] Completed Epoch: 13 batch 819: computing loss 136.163 ms, 69.26 s total
EPOCH: [13], BATCH: [819/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.103, loss_mask: 0.136, loss_objectness: 0.015, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 819
[ 2023-10-08 02:26:55 ] Completed saving temp checkpoint 1,111.956 ms, 70.37 s total
[ 2023-10-08 02:26:55 ] Completed replacing temp checkpoint with checkpoint 49.617 ms, 70.42 s total
[ 2023-10-08 02:26:55 ] Completed Epoch: 13 batch 820: moving batch data to device 6.188 ms, 70.43 s total
[ 2023-10-08 02:26:55 ] Completed Epoch: 13 batch 820: forward pass 103.665 ms, 70.53 s total
[ 2023-10-08 02:26:55 ] Completed Epoch: 13 batch 820: backward pass 43.804 ms, 70.57 s total
[ 2023-10-08 02:26:56 ] Completed Epoch: 13 batch 820: computing loss 143.786 ms, 70.72 s total
EPOCH: [13], BATCH: [820/889], loss: 0.364, loss_box_reg: 0.109, loss_classifier: 0.087, loss_mask: 0.131, loss_objectness: 0.016, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 820
[ 2023-10-08 02:26:57 ] Completed saving temp checkpoint 1,010.743 ms, 71.73 s total
[ 2023-10-08 02:26:57 ] Completed replacing temp checkpoint with checkpoint 71.493 ms, 71.80 s total
[ 2023-10-08 02:26:57 ] Completed Epoch: 13 batch 821: moving batch data to device 7.793 ms, 71.81 s total
[ 2023-10-08 02:26:57 ] Completed Epoch: 13 batch 821: forward pass 101.827 ms, 71.91 s total
[ 2023-10-08 02:26:57 ] Completed Epoch: 13 batch 821: backward pass 79.941 ms, 71.99 s total
[ 2023-10-08 02:26:57 ] Completed Epoch: 13 batch 821: computing loss 87.822 ms, 72.08 s total
EPOCH: [13], BATCH: [821/889], loss: 0.406, loss_box_reg: 0.122, loss_classifier: 0.104, loss_mask: 0.138, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 821
[ 2023-10-08 02:26:58 ] Completed saving temp checkpoint 1,510.073 ms, 73.59 s total
[ 2023-10-08 02:26:59 ] Completed replacing temp checkpoint with checkpoint 61.659 ms, 73.65 s total
[ 2023-10-08 02:26:59 ] Completed Epoch: 13 batch 822: moving batch data to device 5.408 ms, 73.65 s total
[ 2023-10-08 02:26:59 ] Completed Epoch: 13 batch 822: forward pass 109.814 ms, 73.76 s total
[ 2023-10-08 02:26:59 ] Completed Epoch: 13 batch 822: backward pass 72.975 ms, 73.84 s total
[ 2023-10-08 02:26:59 ] Completed Epoch: 13 batch 822: computing loss 117.050 ms, 73.95 s total
EPOCH: [13], BATCH: [822/889], loss: 0.388, loss_box_reg: 0.118, loss_classifier: 0.105, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 822
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 02:40:24 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:40:24 ] Completed importing Timer 0.025 ms, 0.00 s total
[ 2023-10-08 02:40:25 ] Completed importing everything else 528.271 ms, 0.53 s total
[ 2023-10-08 02:40:25 ] Completed defined other functions 0.025 ms, 0.53 s total
| distributed init (rank 5): env://
| distributed init (rank 3): env://
| distributed init (rank 4): env://
| distributed init (rank 2): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 02:40:27 ] Completed main preliminaries 2,791.290 ms, 3.32 s total
loading annotations into memory...
Done (t=11.34s)
creating index...
index created!
loading annotations into memory...
Done (t=0.28s)
creating index...
index created!
[ 2023-10-08 02:40:41 ] Completed loading data 13,291.573 ms, 16.61 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 02:40:41 ] Completed creating data samplers 106.787 ms, 16.72 s total
[ 2023-10-08 02:40:41 ] Completed creating data loaders 0.260 ms, 16.72 s total
[ 2023-10-08 02:40:41 ] Completed creating model and .to(device) 648.079 ms, 17.37 s total
[ 2023-10-08 02:40:42 ] Completed preparing model for distributed training 746.923 ms, 18.11 s total
[ 2023-10-08 02:40:42 ] Completed optimizer and scaler 0.638 ms, 18.11 s total
[ 2023-10-08 02:40:42 ] Completed learning rate schedulers 0.239 ms, 18.11 s total
[ 2023-10-08 02:40:43 ] Completed init coco evaluator 971.612 ms, 19.09 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 02:40:44 ] Completed retrieving checkpoint 862.084 ms, 19.95 s total
EPOCH :: 13
[ 2023-10-08 02:40:44 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:40:44 ] Completed training preliminaries 0.903 ms, 0.00 s total
Training / resuming epoch 13 from training step 822
[ 2023-10-08 02:40:45 ] Completed Epoch: 13 batch 822: moving batch data to device 545.213 ms, 0.55 s total
[ 2023-10-08 02:40:46 ] Completed Epoch: 13 batch 822: forward pass 1,098.499 ms, 1.64 s total
[ 2023-10-08 02:40:46 ] Completed Epoch: 13 batch 822: backward pass 175.768 ms, 1.82 s total
[ 2023-10-08 02:40:46 ] Completed Epoch: 13 batch 822: computing loss 341.749 ms, 2.16 s total
EPOCH: [13], BATCH: [822/889], loss: 0.387, loss_box_reg: 0.118, loss_classifier: 0.104, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 822
[ 2023-10-08 02:40:47 ] Completed saving temp checkpoint 985.608 ms, 3.15 s total
[ 2023-10-08 02:40:47 ] Completed replacing temp checkpoint with checkpoint 164.616 ms, 3.31 s total
[ 2023-10-08 02:40:47 ] Completed Epoch: 13 batch 823: moving batch data to device 3.042 ms, 3.32 s total
[ 2023-10-08 02:40:47 ] Completed Epoch: 13 batch 823: forward pass 110.312 ms, 3.43 s total
[ 2023-10-08 02:40:47 ] Completed Epoch: 13 batch 823: backward pass 71.621 ms, 3.50 s total
[ 2023-10-08 02:40:48 ] Completed Epoch: 13 batch 823: computing loss 150.115 ms, 3.65 s total
EPOCH: [13], BATCH: [823/889], loss: 0.383, loss_box_reg: 0.118, loss_classifier: 0.096, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 823
[ 2023-10-08 02:40:49 ] Completed saving temp checkpoint 891.374 ms, 4.54 s total
[ 2023-10-08 02:40:49 ] Completed replacing temp checkpoint with checkpoint 67.526 ms, 4.61 s total
[ 2023-10-08 02:40:49 ] Completed Epoch: 13 batch 824: moving batch data to device 3.458 ms, 4.61 s total
[ 2023-10-08 02:40:49 ] Completed Epoch: 13 batch 824: forward pass 113.747 ms, 4.72 s total
[ 2023-10-08 02:40:49 ] Completed Epoch: 13 batch 824: backward pass 115.117 ms, 4.84 s total
[ 2023-10-08 02:40:49 ] Completed Epoch: 13 batch 824: computing loss 103.215 ms, 4.94 s total
EPOCH: [13], BATCH: [824/889], loss: 0.400, loss_box_reg: 0.120, loss_classifier: 0.106, loss_mask: 0.135, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 824
[ 2023-10-08 02:40:50 ] Completed saving temp checkpoint 984.908 ms, 5.93 s total
[ 2023-10-08 02:40:50 ] Completed replacing temp checkpoint with checkpoint 80.284 ms, 6.01 s total
[ 2023-10-08 02:40:50 ] Completed Epoch: 13 batch 825: moving batch data to device 5.979 ms, 6.01 s total
[ 2023-10-08 02:40:50 ] Completed Epoch: 13 batch 825: forward pass 112.484 ms, 6.13 s total
[ 2023-10-08 02:40:50 ] Completed Epoch: 13 batch 825: backward pass 66.146 ms, 6.19 s total
[ 2023-10-08 02:40:50 ] Completed Epoch: 13 batch 825: computing loss 148.216 ms, 6.34 s total
EPOCH: [13], BATCH: [825/889], loss: 0.386, loss_box_reg: 0.117, loss_classifier: 0.099, loss_mask: 0.125, loss_objectness: 0.017, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 825
[ 2023-10-08 02:40:51 ] Completed saving temp checkpoint 755.174 ms, 7.10 s total
[ 2023-10-08 02:40:51 ] Completed replacing temp checkpoint with checkpoint 45.923 ms, 7.14 s total
[ 2023-10-08 02:40:51 ] Completed Epoch: 13 batch 826: moving batch data to device 14.290 ms, 7.16 s total
[ 2023-10-08 02:40:51 ] Completed Epoch: 13 batch 826: forward pass 104.859 ms, 7.26 s total
[ 2023-10-08 02:40:51 ] Completed Epoch: 13 batch 826: backward pass 93.229 ms, 7.35 s total
[ 2023-10-08 02:40:51 ] Completed Epoch: 13 batch 826: computing loss 121.982 ms, 7.48 s total
EPOCH: [13], BATCH: [826/889], loss: 0.379, loss_box_reg: 0.118, loss_classifier: 0.098, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 826
[ 2023-10-08 02:40:52 ] Completed saving temp checkpoint 755.437 ms, 8.23 s total
[ 2023-10-08 02:40:52 ] Completed replacing temp checkpoint with checkpoint 54.361 ms, 8.29 s total
[ 2023-10-08 02:40:52 ] Completed Epoch: 13 batch 827: moving batch data to device 3.854 ms, 8.29 s total
[ 2023-10-08 02:40:52 ] Completed Epoch: 13 batch 827: forward pass 109.785 ms, 8.40 s total
[ 2023-10-08 02:40:52 ] Completed Epoch: 13 batch 827: backward pass 70.907 ms, 8.47 s total
[ 2023-10-08 02:40:53 ] Completed Epoch: 13 batch 827: computing loss 142.617 ms, 8.61 s total
EPOCH: [13], BATCH: [827/889], loss: 0.428, loss_box_reg: 0.133, loss_classifier: 0.109, loss_mask: 0.141, loss_objectness: 0.018, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 827
[ 2023-10-08 02:40:53 ] Completed saving temp checkpoint 773.102 ms, 9.39 s total
[ 2023-10-08 02:40:53 ] Completed replacing temp checkpoint with checkpoint 54.793 ms, 9.44 s total
[ 2023-10-08 02:40:53 ] Completed Epoch: 13 batch 828: moving batch data to device 4.061 ms, 9.44 s total
[ 2023-10-08 02:40:54 ] Completed Epoch: 13 batch 828: forward pass 106.652 ms, 9.55 s total
[ 2023-10-08 02:40:54 ] Completed Epoch: 13 batch 828: backward pass 69.128 ms, 9.62 s total
[ 2023-10-08 02:40:54 ] Completed Epoch: 13 batch 828: computing loss 140.697 ms, 9.76 s total
EPOCH: [13], BATCH: [828/889], loss: 0.408, loss_box_reg: 0.125, loss_classifier: 0.099, loss_mask: 0.135, loss_objectness: 0.018, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 828
[ 2023-10-08 02:40:55 ] Completed saving temp checkpoint 1,215.317 ms, 10.98 s total
[ 2023-10-08 02:40:55 ] Completed replacing temp checkpoint with checkpoint 71.579 ms, 11.05 s total
[ 2023-10-08 02:40:55 ] Completed Epoch: 13 batch 829: moving batch data to device 6.765 ms, 11.05 s total
[ 2023-10-08 02:40:55 ] Completed Epoch: 13 batch 829: forward pass 105.732 ms, 11.16 s total
[ 2023-10-08 02:40:55 ] Completed Epoch: 13 batch 829: backward pass 75.999 ms, 11.24 s total
[ 2023-10-08 02:40:55 ] Completed Epoch: 13 batch 829: computing loss 120.688 ms, 11.36 s total
EPOCH: [13], BATCH: [829/889], loss: 0.354, loss_box_reg: 0.105, loss_classifier: 0.093, loss_mask: 0.118, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 829
[ 2023-10-08 02:40:57 ] Completed saving temp checkpoint 1,478.132 ms, 12.83 s total
[ 2023-10-08 02:40:57 ] Completed replacing temp checkpoint with checkpoint 103.960 ms, 12.94 s total
[ 2023-10-08 02:40:57 ] Completed Epoch: 13 batch 830: moving batch data to device 3.460 ms, 12.94 s total
[ 2023-10-08 02:40:57 ] Completed Epoch: 13 batch 830: forward pass 106.327 ms, 13.05 s total
[ 2023-10-08 02:40:57 ] Completed Epoch: 13 batch 830: backward pass 72.184 ms, 13.12 s total
[ 2023-10-08 02:40:57 ] Completed Epoch: 13 batch 830: computing loss 128.551 ms, 13.25 s total
EPOCH: [13], BATCH: [830/889], loss: 0.391, loss_box_reg: 0.113, loss_classifier: 0.099, loss_mask: 0.127, loss_objectness: 0.020, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 830
[ 2023-10-08 02:40:59 ] Completed saving temp checkpoint 1,793.859 ms, 15.04 s total
[ 2023-10-08 02:40:59 ] Completed replacing temp checkpoint with checkpoint 108.564 ms, 15.15 s total
[ 2023-10-08 02:40:59 ] Completed Epoch: 13 batch 831: moving batch data to device 7.232 ms, 15.16 s total
[ 2023-10-08 02:40:59 ] Completed Epoch: 13 batch 831: forward pass 106.903 ms, 15.27 s total
[ 2023-10-08 02:40:59 ] Completed Epoch: 13 batch 831: backward pass 79.642 ms, 15.35 s total
[ 2023-10-08 02:41:00 ] Completed Epoch: 13 batch 831: computing loss 307.734 ms, 15.65 s total
EPOCH: [13], BATCH: [831/889], loss: 0.363, loss_box_reg: 0.111, loss_classifier: 0.090, loss_mask: 0.126, loss_objectness: 0.013, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 831
[ 2023-10-08 02:41:01 ] Completed saving temp checkpoint 1,034.901 ms, 16.69 s total
[ 2023-10-08 02:41:01 ] Completed replacing temp checkpoint with checkpoint 61.668 ms, 16.75 s total
[ 2023-10-08 02:41:01 ] Completed Epoch: 13 batch 832: moving batch data to device 5.433 ms, 16.76 s total
[ 2023-10-08 02:41:01 ] Completed Epoch: 13 batch 832: forward pass 105.231 ms, 16.86 s total
[ 2023-10-08 02:41:01 ] Completed Epoch: 13 batch 832: backward pass 79.684 ms, 16.94 s total
[ 2023-10-08 02:41:01 ] Completed Epoch: 13 batch 832: computing loss 118.137 ms, 17.06 s total
EPOCH: [13], BATCH: [832/889], loss: 0.368, loss_box_reg: 0.107, loss_classifier: 0.093, loss_mask: 0.125, loss_objectness: 0.017, loss_rpn_box_reg: 0.026
Saving checkpoint at epoch 13 train batch 832
[ 2023-10-08 02:41:02 ] Completed saving temp checkpoint 1,194.090 ms, 18.25 s total
[ 2023-10-08 02:41:02 ] Completed replacing temp checkpoint with checkpoint 63.585 ms, 18.32 s total
[ 2023-10-08 02:41:02 ] Completed Epoch: 13 batch 833: moving batch data to device 6.940 ms, 18.32 s total
[ 2023-10-08 02:41:02 ] Completed Epoch: 13 batch 833: forward pass 101.426 ms, 18.42 s total
[ 2023-10-08 02:41:02 ] Completed Epoch: 13 batch 833: backward pass 29.383 ms, 18.45 s total
[ 2023-10-08 02:41:03 ] Completed Epoch: 13 batch 833: computing loss 157.103 ms, 18.61 s total
EPOCH: [13], BATCH: [833/889], loss: 0.386, loss_box_reg: 0.111, loss_classifier: 0.093, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 833
[ 2023-10-08 02:41:04 ] Completed saving temp checkpoint 1,820.994 ms, 20.43 s total
[ 2023-10-08 02:41:04 ] Completed replacing temp checkpoint with checkpoint 63.041 ms, 20.49 s total
[ 2023-10-08 02:41:04 ] Completed Epoch: 13 batch 834: moving batch data to device 6.277 ms, 20.50 s total
[ 2023-10-08 02:41:05 ] Completed Epoch: 13 batch 834: forward pass 106.916 ms, 20.61 s total
[ 2023-10-08 02:41:05 ] Completed Epoch: 13 batch 834: backward pass 76.387 ms, 20.68 s total
[ 2023-10-08 02:41:05 ] Completed Epoch: 13 batch 834: computing loss 118.289 ms, 20.80 s total
EPOCH: [13], BATCH: [834/889], loss: 0.392, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.016, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 834
[ 2023-10-08 02:41:06 ] Completed saving temp checkpoint 1,453.132 ms, 22.26 s total
[ 2023-10-08 02:41:06 ] Completed replacing temp checkpoint with checkpoint 68.477 ms, 22.32 s total
[ 2023-10-08 02:41:06 ] Completed Epoch: 13 batch 835: moving batch data to device 6.913 ms, 22.33 s total
[ 2023-10-08 02:41:06 ] Completed Epoch: 13 batch 835: forward pass 106.877 ms, 22.44 s total
[ 2023-10-08 02:41:06 ] Completed Epoch: 13 batch 835: backward pass 73.181 ms, 22.51 s total
[ 2023-10-08 02:41:07 ] Completed Epoch: 13 batch 835: computing loss 122.793 ms, 22.63 s total
EPOCH: [13], BATCH: [835/889], loss: 0.361, loss_box_reg: 0.108, loss_classifier: 0.092, loss_mask: 0.123, loss_objectness: 0.013, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 835
[ 2023-10-08 02:41:08 ] Completed saving temp checkpoint 1,084.111 ms, 23.72 s total
[ 2023-10-08 02:41:08 ] Completed replacing temp checkpoint with checkpoint 46.109 ms, 23.76 s total
[ 2023-10-08 02:41:08 ] Completed Epoch: 13 batch 836: moving batch data to device 7.955 ms, 23.77 s total
[ 2023-10-08 02:41:08 ] Completed Epoch: 13 batch 836: forward pass 104.733 ms, 23.88 s total
[ 2023-10-08 02:41:08 ] Completed Epoch: 13 batch 836: backward pass 80.728 ms, 23.96 s total
[ 2023-10-08 02:41:08 ] Completed Epoch: 13 batch 836: computing loss 113.490 ms, 24.07 s total
EPOCH: [13], BATCH: [836/889], loss: 0.401, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.133, loss_objectness: 0.015, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 836
[ 2023-10-08 02:41:09 ] Completed saving temp checkpoint 1,201.190 ms, 25.27 s total
[ 2023-10-08 02:41:09 ] Completed replacing temp checkpoint with checkpoint 66.010 ms, 25.34 s total
[ 2023-10-08 02:41:09 ] Completed Epoch: 13 batch 837: moving batch data to device 6.065 ms, 25.34 s total
[ 2023-10-08 02:41:09 ] Completed Epoch: 13 batch 837: forward pass 106.865 ms, 25.45 s total
[ 2023-10-08 02:41:10 ] Completed Epoch: 13 batch 837: backward pass 77.581 ms, 25.53 s total
[ 2023-10-08 02:41:10 ] Completed Epoch: 13 batch 837: computing loss 118.921 ms, 25.65 s total
EPOCH: [13], BATCH: [837/889], loss: 0.404, loss_box_reg: 0.120, loss_classifier: 0.106, loss_mask: 0.133, loss_objectness: 0.018, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 837
[ 2023-10-08 02:41:11 ] Completed saving temp checkpoint 1,079.150 ms, 26.73 s total
[ 2023-10-08 02:41:11 ] Completed replacing temp checkpoint with checkpoint 84.861 ms, 26.81 s total
[ 2023-10-08 02:41:11 ] Completed Epoch: 13 batch 838: moving batch data to device 7.596 ms, 26.82 s total
[ 2023-10-08 02:41:11 ] Completed Epoch: 13 batch 838: forward pass 104.497 ms, 26.92 s total
[ 2023-10-08 02:41:11 ] Completed Epoch: 13 batch 838: backward pass 73.936 ms, 27.00 s total
[ 2023-10-08 02:41:11 ] Completed Epoch: 13 batch 838: computing loss 123.188 ms, 27.12 s total
EPOCH: [13], BATCH: [838/889], loss: 0.410, loss_box_reg: 0.125, loss_classifier: 0.107, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 838
[ 2023-10-08 02:41:13 ] Completed saving temp checkpoint 1,683.567 ms, 28.80 s total
[ 2023-10-08 02:41:13 ] Completed replacing temp checkpoint with checkpoint 72.235 ms, 28.88 s total
[ 2023-10-08 02:41:13 ] Completed Epoch: 13 batch 839: moving batch data to device 7.313 ms, 28.88 s total
[ 2023-10-08 02:41:13 ] Completed Epoch: 13 batch 839: forward pass 103.041 ms, 28.99 s total
[ 2023-10-08 02:41:13 ] Completed Epoch: 13 batch 839: backward pass 69.082 ms, 29.06 s total
[ 2023-10-08 02:41:13 ] Completed Epoch: 13 batch 839: computing loss 174.886 ms, 29.23 s total
EPOCH: [13], BATCH: [839/889], loss: 0.361, loss_box_reg: 0.108, loss_classifier: 0.090, loss_mask: 0.126, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 839
[ 2023-10-08 02:41:15 ] Completed saving temp checkpoint 1,718.643 ms, 30.95 s total
[ 2023-10-08 02:41:15 ] Completed replacing temp checkpoint with checkpoint 105.196 ms, 31.06 s total
[ 2023-10-08 02:41:15 ] Completed Epoch: 13 batch 840: moving batch data to device 7.403 ms, 31.06 s total
[ 2023-10-08 02:41:15 ] Completed Epoch: 13 batch 840: forward pass 105.783 ms, 31.17 s total
[ 2023-10-08 02:41:15 ] Completed Epoch: 13 batch 840: backward pass 54.329 ms, 31.22 s total
[ 2023-10-08 02:41:15 ] Completed Epoch: 13 batch 840: computing loss 136.513 ms, 31.36 s total
EPOCH: [13], BATCH: [840/889], loss: 0.377, loss_box_reg: 0.112, loss_classifier: 0.094, loss_mask: 0.134, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 840
[ 2023-10-08 02:41:17 ] Completed saving temp checkpoint 2,057.722 ms, 33.42 s total
[ 2023-10-08 02:41:17 ] Completed replacing temp checkpoint with checkpoint 102.556 ms, 33.52 s total
[ 2023-10-08 02:41:18 ] Completed Epoch: 13 batch 841: moving batch data to device 7.729 ms, 33.53 s total
[ 2023-10-08 02:41:18 ] Completed Epoch: 13 batch 841: forward pass 111.006 ms, 33.64 s total
[ 2023-10-08 02:41:18 ] Completed Epoch: 13 batch 841: backward pass 88.302 ms, 33.73 s total
[ 2023-10-08 02:41:18 ] Completed Epoch: 13 batch 841: computing loss 109.186 ms, 33.84 s total
EPOCH: [13], BATCH: [841/889], loss: 0.377, loss_box_reg: 0.117, loss_classifier: 0.097, loss_mask: 0.127, loss_objectness: 0.014, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 841
[ 2023-10-08 02:41:20 ] Completed saving temp checkpoint 1,801.240 ms, 35.64 s total
[ 2023-10-08 02:41:20 ] Completed replacing temp checkpoint with checkpoint 92.218 ms, 35.73 s total
[ 2023-10-08 02:41:20 ] Completed Epoch: 13 batch 842: moving batch data to device 7.358 ms, 35.74 s total
[ 2023-10-08 02:41:20 ] Completed Epoch: 13 batch 842: forward pass 102.076 ms, 35.84 s total
[ 2023-10-08 02:41:20 ] Completed Epoch: 13 batch 842: backward pass 70.585 ms, 35.91 s total
[ 2023-10-08 02:41:20 ] Completed Epoch: 13 batch 842: computing loss 130.372 ms, 36.04 s total
EPOCH: [13], BATCH: [842/889], loss: 0.395, loss_box_reg: 0.123, loss_classifier: 0.104, loss_mask: 0.128, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 842
[ 2023-10-08 02:41:22 ] Completed saving temp checkpoint 1,581.807 ms, 37.62 s total
[ 2023-10-08 02:41:22 ] Completed replacing temp checkpoint with checkpoint 78.916 ms, 37.70 s total
[ 2023-10-08 02:41:22 ] Completed Epoch: 13 batch 843: moving batch data to device 7.227 ms, 37.71 s total
[ 2023-10-08 02:41:22 ] Completed Epoch: 13 batch 843: forward pass 107.600 ms, 37.82 s total
[ 2023-10-08 02:41:22 ] Completed Epoch: 13 batch 843: backward pass 85.833 ms, 37.90 s total
[ 2023-10-08 02:41:22 ] Completed Epoch: 13 batch 843: computing loss 85.449 ms, 37.99 s total
EPOCH: [13], BATCH: [843/889], loss: 0.354, loss_box_reg: 0.104, loss_classifier: 0.088, loss_mask: 0.124, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 843
[ 2023-10-08 02:41:23 ] Completed saving temp checkpoint 1,353.618 ms, 39.34 s total
[ 2023-10-08 02:41:23 ] Completed replacing temp checkpoint with checkpoint 58.132 ms, 39.40 s total
[ 2023-10-08 02:41:23 ] Completed Epoch: 13 batch 844: moving batch data to device 5.445 ms, 39.40 s total
[ 2023-10-08 02:41:23 ] Completed Epoch: 13 batch 844: forward pass 101.375 ms, 39.50 s total
[ 2023-10-08 02:41:24 ] Completed Epoch: 13 batch 844: backward pass 70.743 ms, 39.58 s total
[ 2023-10-08 02:41:24 ] Completed Epoch: 13 batch 844: computing loss 123.645 ms, 39.70 s total
EPOCH: [13], BATCH: [844/889], loss: 0.359, loss_box_reg: 0.106, loss_classifier: 0.090, loss_mask: 0.126, loss_objectness: 0.014, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 844
[ 2023-10-08 02:41:25 ] Completed saving temp checkpoint 1,531.075 ms, 41.23 s total
[ 2023-10-08 02:41:25 ] Completed replacing temp checkpoint with checkpoint 67.891 ms, 41.30 s total
[ 2023-10-08 02:41:25 ] Completed Epoch: 13 batch 845: moving batch data to device 8.298 ms, 41.31 s total
[ 2023-10-08 02:41:25 ] Completed Epoch: 13 batch 845: forward pass 102.856 ms, 41.41 s total
[ 2023-10-08 02:41:25 ] Completed Epoch: 13 batch 845: backward pass 65.025 ms, 41.47 s total
[ 2023-10-08 02:41:26 ] Completed Epoch: 13 batch 845: computing loss 130.913 ms, 41.61 s total
EPOCH: [13], BATCH: [845/889], loss: 0.398, loss_box_reg: 0.125, loss_classifier: 0.101, loss_mask: 0.130, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 845
[ 2023-10-08 02:41:27 ] Completed saving temp checkpoint 1,366.414 ms, 42.97 s total
[ 2023-10-08 02:41:27 ] Completed replacing temp checkpoint with checkpoint 97.167 ms, 43.07 s total
[ 2023-10-08 02:41:27 ] Completed Epoch: 13 batch 846: moving batch data to device 7.931 ms, 43.08 s total
[ 2023-10-08 02:41:27 ] Completed Epoch: 13 batch 846: forward pass 106.414 ms, 43.18 s total
[ 2023-10-08 02:41:27 ] Completed Epoch: 13 batch 846: backward pass 73.278 ms, 43.26 s total
[ 2023-10-08 02:41:27 ] Completed Epoch: 13 batch 846: computing loss 115.888 ms, 43.37 s total
EPOCH: [13], BATCH: [846/889], loss: 0.393, loss_box_reg: 0.118, loss_classifier: 0.101, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 846
[ 2023-10-08 02:41:29 ] Completed saving temp checkpoint 1,552.495 ms, 44.92 s total
[ 2023-10-08 02:41:29 ] Completed replacing temp checkpoint with checkpoint 104.921 ms, 45.03 s total
[ 2023-10-08 02:41:29 ] Completed Epoch: 13 batch 847: moving batch data to device 7.652 ms, 45.04 s total
[ 2023-10-08 02:41:29 ] Completed Epoch: 13 batch 847: forward pass 106.008 ms, 45.14 s total
[ 2023-10-08 02:41:29 ] Completed Epoch: 13 batch 847: backward pass 82.103 ms, 45.23 s total
[ 2023-10-08 02:41:29 ] Completed Epoch: 13 batch 847: computing loss 107.136 ms, 45.33 s total
EPOCH: [13], BATCH: [847/889], loss: 0.368, loss_box_reg: 0.110, loss_classifier: 0.093, loss_mask: 0.131, loss_objectness: 0.011, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 847
[ 2023-10-08 02:41:31 ] Completed saving temp checkpoint 1,425.781 ms, 46.76 s total
[ 2023-10-08 02:41:31 ] Completed replacing temp checkpoint with checkpoint 96.268 ms, 46.85 s total
[ 2023-10-08 02:41:31 ] Completed Epoch: 13 batch 848: moving batch data to device 9.449 ms, 46.86 s total
[ 2023-10-08 02:41:31 ] Completed Epoch: 13 batch 848: forward pass 106.714 ms, 46.97 s total
[ 2023-10-08 02:41:31 ] Completed Epoch: 13 batch 848: backward pass 80.140 ms, 47.05 s total
[ 2023-10-08 02:41:31 ] Completed Epoch: 13 batch 848: computing loss 112.575 ms, 47.16 s total
EPOCH: [13], BATCH: [848/889], loss: 0.406, loss_box_reg: 0.127, loss_classifier: 0.100, loss_mask: 0.140, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 848
[ 2023-10-08 02:41:33 ] Completed saving temp checkpoint 1,647.153 ms, 48.81 s total
[ 2023-10-08 02:41:33 ] Completed replacing temp checkpoint with checkpoint 72.359 ms, 48.88 s total
[ 2023-10-08 02:41:33 ] Completed Epoch: 13 batch 849: moving batch data to device 4.879 ms, 48.89 s total
[ 2023-10-08 02:41:33 ] Completed Epoch: 13 batch 849: forward pass 107.096 ms, 49.00 s total
[ 2023-10-08 02:41:33 ] Completed Epoch: 13 batch 849: backward pass 77.321 ms, 49.07 s total
[ 2023-10-08 02:41:33 ] Completed Epoch: 13 batch 849: computing loss 114.650 ms, 49.19 s total
EPOCH: [13], BATCH: [849/889], loss: 0.375, loss_box_reg: 0.112, loss_classifier: 0.096, loss_mask: 0.127, loss_objectness: 0.017, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 849
[ 2023-10-08 02:41:35 ] Completed saving temp checkpoint 1,395.223 ms, 50.58 s total
[ 2023-10-08 02:41:35 ] Completed replacing temp checkpoint with checkpoint 84.898 ms, 50.67 s total
[ 2023-10-08 02:41:35 ] Completed Epoch: 13 batch 850: moving batch data to device 7.643 ms, 50.67 s total
[ 2023-10-08 02:41:35 ] Completed Epoch: 13 batch 850: forward pass 111.407 ms, 50.79 s total
[ 2023-10-08 02:41:35 ] Completed Epoch: 13 batch 850: backward pass 78.779 ms, 50.87 s total
[ 2023-10-08 02:41:35 ] Completed Epoch: 13 batch 850: computing loss 109.389 ms, 50.97 s total
EPOCH: [13], BATCH: [850/889], loss: 0.373, loss_box_reg: 0.110, loss_classifier: 0.097, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 850
[ 2023-10-08 02:41:37 ] Completed saving temp checkpoint 1,600.079 ms, 52.57 s total
[ 2023-10-08 02:41:37 ] Completed replacing temp checkpoint with checkpoint 79.330 ms, 52.65 s total
[ 2023-10-08 02:41:37 ] Completed Epoch: 13 batch 851: moving batch data to device 7.268 ms, 52.66 s total
[ 2023-10-08 02:41:37 ] Completed Epoch: 13 batch 851: forward pass 108.193 ms, 52.77 s total
[ 2023-10-08 02:41:37 ] Completed Epoch: 13 batch 851: backward pass 54.649 ms, 52.82 s total
[ 2023-10-08 02:41:37 ] Completed Epoch: 13 batch 851: computing loss 138.169 ms, 52.96 s total
EPOCH: [13], BATCH: [851/889], loss: 0.356, loss_box_reg: 0.103, loss_classifier: 0.088, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 851
[ 2023-10-08 02:41:39 ] Completed saving temp checkpoint 1,631.381 ms, 54.59 s total
[ 2023-10-08 02:41:39 ] Completed replacing temp checkpoint with checkpoint 48.380 ms, 54.64 s total
[ 2023-10-08 02:41:39 ] Completed Epoch: 13 batch 852: moving batch data to device 7.231 ms, 54.65 s total
[ 2023-10-08 02:41:39 ] Completed Epoch: 13 batch 852: forward pass 109.790 ms, 54.76 s total
[ 2023-10-08 02:41:39 ] Completed Epoch: 13 batch 852: backward pass 65.554 ms, 54.82 s total
[ 2023-10-08 02:41:39 ] Completed Epoch: 13 batch 852: computing loss 124.638 ms, 54.95 s total
EPOCH: [13], BATCH: [852/889], loss: 0.440, loss_box_reg: 0.131, loss_classifier: 0.106, loss_mask: 0.141, loss_objectness: 0.019, loss_rpn_box_reg: 0.044
Saving checkpoint at epoch 13 train batch 852
[ 2023-10-08 02:41:41 ] Completed saving temp checkpoint 2,002.150 ms, 56.95 s total
[ 2023-10-08 02:41:41 ] Completed replacing temp checkpoint with checkpoint 96.762 ms, 57.05 s total
[ 2023-10-08 02:41:41 ] Completed Epoch: 13 batch 853: moving batch data to device 7.616 ms, 57.06 s total
[ 2023-10-08 02:41:41 ] Completed Epoch: 13 batch 853: forward pass 107.976 ms, 57.16 s total
[ 2023-10-08 02:41:41 ] Completed Epoch: 13 batch 853: backward pass 75.869 ms, 57.24 s total
[ 2023-10-08 02:41:41 ] Completed Epoch: 13 batch 853: computing loss 120.108 ms, 57.36 s total
EPOCH: [13], BATCH: [853/889], loss: 0.384, loss_box_reg: 0.114, loss_classifier: 0.099, loss_mask: 0.131, loss_objectness: 0.015, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 853
[ 2023-10-08 02:41:43 ] Completed saving temp checkpoint 1,397.791 ms, 58.76 s total
[ 2023-10-08 02:41:43 ] Completed replacing temp checkpoint with checkpoint 95.555 ms, 58.85 s total
[ 2023-10-08 02:41:43 ] Completed Epoch: 13 batch 854: moving batch data to device 7.382 ms, 58.86 s total
[ 2023-10-08 02:41:43 ] Completed Epoch: 13 batch 854: forward pass 105.391 ms, 58.97 s total
[ 2023-10-08 02:41:43 ] Completed Epoch: 13 batch 854: backward pass 80.915 ms, 59.05 s total
[ 2023-10-08 02:41:43 ] Completed Epoch: 13 batch 854: computing loss 102.261 ms, 59.15 s total
EPOCH: [13], BATCH: [854/889], loss: 0.392, loss_box_reg: 0.123, loss_classifier: 0.097, loss_mask: 0.129, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 854
[ 2023-10-08 02:41:45 ] Completed saving temp checkpoint 1,937.740 ms, 61.09 s total
[ 2023-10-08 02:41:45 ] Completed replacing temp checkpoint with checkpoint 82.431 ms, 61.17 s total
[ 2023-10-08 02:41:45 ] Completed Epoch: 13 batch 855: moving batch data to device 5.779 ms, 61.17 s total
[ 2023-10-08 02:41:45 ] Completed Epoch: 13 batch 855: forward pass 100.034 ms, 61.27 s total
[ 2023-10-08 02:41:45 ] Completed Epoch: 13 batch 855: backward pass 73.366 ms, 61.35 s total
[ 2023-10-08 02:41:45 ] Completed Epoch: 13 batch 855: computing loss 112.731 ms, 61.46 s total
EPOCH: [13], BATCH: [855/889], loss: 0.361, loss_box_reg: 0.109, loss_classifier: 0.087, loss_mask: 0.128, loss_objectness: 0.014, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 855
[ 2023-10-08 02:41:47 ] Completed saving temp checkpoint 1,198.864 ms, 62.66 s total
[ 2023-10-08 02:41:47 ] Completed replacing temp checkpoint with checkpoint 88.377 ms, 62.75 s total
[ 2023-10-08 02:41:47 ] Completed Epoch: 13 batch 856: moving batch data to device 7.070 ms, 62.76 s total
[ 2023-10-08 02:41:47 ] Completed Epoch: 13 batch 856: forward pass 109.910 ms, 62.87 s total
[ 2023-10-08 02:41:47 ] Completed Epoch: 13 batch 856: backward pass 74.744 ms, 62.94 s total
[ 2023-10-08 02:41:47 ] Completed Epoch: 13 batch 856: computing loss 118.726 ms, 63.06 s total
EPOCH: [13], BATCH: [856/889], loss: 0.390, loss_box_reg: 0.119, loss_classifier: 0.092, loss_mask: 0.134, loss_objectness: 0.013, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 856
[ 2023-10-08 02:41:48 ] Completed saving temp checkpoint 1,303.259 ms, 64.36 s total
[ 2023-10-08 02:41:48 ] Completed replacing temp checkpoint with checkpoint 69.103 ms, 64.43 s total
[ 2023-10-08 02:41:48 ] Completed Epoch: 13 batch 857: moving batch data to device 8.068 ms, 64.44 s total
[ 2023-10-08 02:41:49 ] Completed Epoch: 13 batch 857: forward pass 112.232 ms, 64.55 s total
[ 2023-10-08 02:41:49 ] Completed Epoch: 13 batch 857: backward pass 36.399 ms, 64.59 s total
[ 2023-10-08 02:41:49 ] Completed Epoch: 13 batch 857: computing loss 154.290 ms, 64.74 s total
EPOCH: [13], BATCH: [857/889], loss: 0.398, loss_box_reg: 0.124, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.032
Saving checkpoint at epoch 13 train batch 857
[ 2023-10-08 02:41:50 ] Completed saving temp checkpoint 1,163.714 ms, 65.91 s total
[ 2023-10-08 02:41:50 ] Completed replacing temp checkpoint with checkpoint 89.530 ms, 66.00 s total
[ 2023-10-08 02:41:50 ] Completed Epoch: 13 batch 858: moving batch data to device 6.809 ms, 66.00 s total
[ 2023-10-08 02:41:50 ] Completed Epoch: 13 batch 858: forward pass 101.696 ms, 66.10 s total
[ 2023-10-08 02:41:50 ] Completed Epoch: 13 batch 858: backward pass 66.704 ms, 66.17 s total
[ 2023-10-08 02:41:50 ] Completed Epoch: 13 batch 858: computing loss 131.091 ms, 66.30 s total
EPOCH: [13], BATCH: [858/889], loss: 0.398, loss_box_reg: 0.122, loss_classifier: 0.102, loss_mask: 0.134, loss_objectness: 0.016, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 858
[ 2023-10-08 02:41:52 ] Completed saving temp checkpoint 1,261.359 ms, 67.56 s total
[ 2023-10-08 02:41:52 ] Completed replacing temp checkpoint with checkpoint 80.531 ms, 67.64 s total
[ 2023-10-08 02:41:52 ] Completed Epoch: 13 batch 859: moving batch data to device 7.190 ms, 67.65 s total
[ 2023-10-08 02:41:52 ] Completed Epoch: 13 batch 859: forward pass 106.145 ms, 67.76 s total
[ 2023-10-08 02:41:52 ] Completed Epoch: 13 batch 859: backward pass 65.293 ms, 67.82 s total
[ 2023-10-08 02:41:52 ] Completed Epoch: 13 batch 859: computing loss 129.365 ms, 67.95 s total
EPOCH: [13], BATCH: [859/889], loss: 0.366, loss_box_reg: 0.108, loss_classifier: 0.091, loss_mask: 0.129, loss_objectness: 0.015, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 859
[ 2023-10-08 02:41:53 ] Completed saving temp checkpoint 1,377.400 ms, 69.33 s total
[ 2023-10-08 02:41:53 ] Completed replacing temp checkpoint with checkpoint 74.328 ms, 69.40 s total
[ 2023-10-08 02:41:53 ] Completed Epoch: 13 batch 860: moving batch data to device 7.759 ms, 69.41 s total
[ 2023-10-08 02:41:53 ] Completed Epoch: 13 batch 860: forward pass 104.613 ms, 69.52 s total
[ 2023-10-08 02:41:54 ] Completed Epoch: 13 batch 860: backward pass 43.648 ms, 69.56 s total
[ 2023-10-08 02:41:54 ] Completed Epoch: 13 batch 860: computing loss 150.584 ms, 69.71 s total
EPOCH: [13], BATCH: [860/889], loss: 0.419, loss_box_reg: 0.124, loss_classifier: 0.109, loss_mask: 0.136, loss_objectness: 0.019, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 860
[ 2023-10-08 02:41:55 ] Completed saving temp checkpoint 1,712.039 ms, 71.42 s total
[ 2023-10-08 02:41:55 ] Completed replacing temp checkpoint with checkpoint 76.037 ms, 71.50 s total
[ 2023-10-08 02:41:55 ] Completed Epoch: 13 batch 861: moving batch data to device 7.066 ms, 71.51 s total
[ 2023-10-08 02:41:56 ] Completed Epoch: 13 batch 861: forward pass 105.576 ms, 71.61 s total
[ 2023-10-08 02:41:56 ] Completed Epoch: 13 batch 861: backward pass 72.186 ms, 71.68 s total
[ 2023-10-08 02:41:56 ] Completed Epoch: 13 batch 861: computing loss 127.944 ms, 71.81 s total
EPOCH: [13], BATCH: [861/889], loss: 0.365, loss_box_reg: 0.104, loss_classifier: 0.091, loss_mask: 0.125, loss_objectness: 0.016, loss_rpn_box_reg: 0.029
Saving checkpoint at epoch 13 train batch 861
[ 2023-10-08 02:41:57 ] Completed saving temp checkpoint 1,442.723 ms, 73.25 s total
[ 2023-10-08 02:41:57 ] Completed replacing temp checkpoint with checkpoint 66.759 ms, 73.32 s total
[ 2023-10-08 02:41:57 ] Completed Epoch: 13 batch 862: moving batch data to device 7.845 ms, 73.33 s total
[ 2023-10-08 02:41:57 ] Completed Epoch: 13 batch 862: forward pass 105.749 ms, 73.43 s total
[ 2023-10-08 02:41:57 ] Completed Epoch: 13 batch 862: backward pass 42.272 ms, 73.48 s total
[ 2023-10-08 02:41:58 ] Completed Epoch: 13 batch 862: computing loss 127.388 ms, 73.60 s total
EPOCH: [13], BATCH: [862/889], loss: 0.403, loss_box_reg: 0.120, loss_classifier: 0.099, loss_mask: 0.137, loss_objectness: 0.019, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 862
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 02:55:12 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:55:12 ] Completed importing Timer 0.021 ms, 0.00 s total
[ 2023-10-08 02:55:13 ] Completed importing everything else 558.017 ms, 0.56 s total
[ 2023-10-08 02:55:13 ] Completed defined other functions 0.032 ms, 0.56 s total
| distributed init (rank 4): env://
| distributed init (rank 3): env://
| distributed init (rank 2): env://
| distributed init (rank 0): env://
| distributed init (rank 5): env://
| distributed init (rank 1): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 02:55:20 ] Completed main preliminaries 7,531.250 ms, 8.09 s total
loading annotations into memory...
Done (t=10.56s)
creating index...
index created!
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
[ 2023-10-08 02:55:32 ] Completed loading data 12,260.571 ms, 20.35 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 02:55:33 ] Completed creating data samplers 93.827 ms, 20.44 s total
[ 2023-10-08 02:55:33 ] Completed creating data loaders 0.206 ms, 20.44 s total
[ 2023-10-08 02:55:33 ] Completed creating model and .to(device) 651.415 ms, 21.10 s total
[ 2023-10-08 02:55:35 ] Completed preparing model for distributed training 2,006.848 ms, 23.10 s total
[ 2023-10-08 02:55:35 ] Completed optimizer and scaler 0.630 ms, 23.10 s total
[ 2023-10-08 02:55:35 ] Completed learning rate schedulers 0.257 ms, 23.10 s total
[ 2023-10-08 02:55:36 ] Completed init coco evaluator 953.271 ms, 24.06 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 02:55:37 ] Completed retrieving checkpoint 880.884 ms, 24.94 s total
EPOCH :: 13
[ 2023-10-08 02:55:37 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:55:37 ] Completed training preliminaries 0.924 ms, 0.00 s total
Training / resuming epoch 13 from training step 862
[ 2023-10-08 02:55:37 ] Completed Epoch: 13 batch 862: moving batch data to device 470.412 ms, 0.47 s total
[ 2023-10-08 02:55:39 ] Completed Epoch: 13 batch 862: forward pass 1,195.002 ms, 1.67 s total
[ 2023-10-08 02:55:39 ] Completed Epoch: 13 batch 862: backward pass 168.974 ms, 1.84 s total
[ 2023-10-08 02:55:39 ] Completed Epoch: 13 batch 862: computing loss 183.903 ms, 2.02 s total
EPOCH: [13], BATCH: [862/889], loss: 0.406, loss_box_reg: 0.121, loss_classifier: 0.101, loss_mask: 0.137, loss_objectness: 0.018, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 862
[ 2023-10-08 02:55:40 ] Completed saving temp checkpoint 1,033.535 ms, 3.05 s total
[ 2023-10-08 02:55:40 ] Completed replacing temp checkpoint with checkpoint 175.965 ms, 3.23 s total
[ 2023-10-08 02:55:40 ] Completed Epoch: 13 batch 863: moving batch data to device 50.340 ms, 3.28 s total
[ 2023-10-08 02:55:40 ] Completed Epoch: 13 batch 863: forward pass 112.870 ms, 3.39 s total
[ 2023-10-08 02:55:40 ] Completed Epoch: 13 batch 863: backward pass 82.227 ms, 3.47 s total
[ 2023-10-08 02:55:41 ] Completed Epoch: 13 batch 863: computing loss 217.741 ms, 3.69 s total
EPOCH: [13], BATCH: [863/889], loss: 0.409, loss_box_reg: 0.122, loss_classifier: 0.104, loss_mask: 0.133, loss_objectness: 0.016, loss_rpn_box_reg: 0.033
Saving checkpoint at epoch 13 train batch 863
[ 2023-10-08 02:55:42 ] Completed saving temp checkpoint 1,079.712 ms, 4.77 s total
[ 2023-10-08 02:55:42 ] Completed replacing temp checkpoint with checkpoint 53.007 ms, 4.82 s total
[ 2023-10-08 02:55:42 ] Completed Epoch: 13 batch 864: moving batch data to device 6.150 ms, 4.83 s total
[ 2023-10-08 02:55:42 ] Completed Epoch: 13 batch 864: forward pass 107.105 ms, 4.94 s total
[ 2023-10-08 02:55:42 ] Completed Epoch: 13 batch 864: backward pass 92.955 ms, 5.03 s total
[ 2023-10-08 02:55:42 ] Completed Epoch: 13 batch 864: computing loss 127.873 ms, 5.16 s total
EPOCH: [13], BATCH: [864/889], loss: 0.349, loss_box_reg: 0.104, loss_classifier: 0.088, loss_mask: 0.125, loss_objectness: 0.012, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 864
[ 2023-10-08 02:55:43 ] Completed saving temp checkpoint 999.740 ms, 6.16 s total
[ 2023-10-08 02:55:43 ] Completed replacing temp checkpoint with checkpoint 67.743 ms, 6.23 s total
[ 2023-10-08 02:55:43 ] Completed Epoch: 13 batch 865: moving batch data to device 3.425 ms, 6.23 s total
[ 2023-10-08 02:55:43 ] Completed Epoch: 13 batch 865: forward pass 130.505 ms, 6.36 s total
[ 2023-10-08 02:55:43 ] Completed Epoch: 13 batch 865: backward pass 45.068 ms, 6.41 s total
[ 2023-10-08 02:55:44 ] Completed Epoch: 13 batch 865: computing loss 213.918 ms, 6.62 s total
EPOCH: [13], BATCH: [865/889], loss: 0.363, loss_box_reg: 0.107, loss_classifier: 0.090, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 865
[ 2023-10-08 02:55:45 ] Completed saving temp checkpoint 1,103.682 ms, 7.72 s total
[ 2023-10-08 02:55:45 ] Completed replacing temp checkpoint with checkpoint 72.869 ms, 7.80 s total
[ 2023-10-08 02:55:45 ] Completed Epoch: 13 batch 866: moving batch data to device 4.344 ms, 7.80 s total
[ 2023-10-08 02:55:45 ] Completed Epoch: 13 batch 866: forward pass 110.813 ms, 7.91 s total
[ 2023-10-08 02:55:45 ] Completed Epoch: 13 batch 866: backward pass 82.114 ms, 7.99 s total
[ 2023-10-08 02:55:45 ] Completed Epoch: 13 batch 866: computing loss 132.278 ms, 8.13 s total
EPOCH: [13], BATCH: [866/889], loss: 0.382, loss_box_reg: 0.113, loss_classifier: 0.099, loss_mask: 0.127, loss_objectness: 0.016, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 866
[ 2023-10-08 02:55:46 ] Completed saving temp checkpoint 974.616 ms, 9.10 s total
[ 2023-10-08 02:55:46 ] Completed replacing temp checkpoint with checkpoint 61.430 ms, 9.16 s total
[ 2023-10-08 02:55:46 ] Completed Epoch: 13 batch 867: moving batch data to device 13.207 ms, 9.17 s total
[ 2023-10-08 02:55:46 ] Completed Epoch: 13 batch 867: forward pass 106.532 ms, 9.28 s total
[ 2023-10-08 02:55:46 ] Completed Epoch: 13 batch 867: backward pass 120.244 ms, 9.40 s total
[ 2023-10-08 02:55:47 ] Completed Epoch: 13 batch 867: computing loss 91.187 ms, 9.49 s total
EPOCH: [13], BATCH: [867/889], loss: 0.394, loss_box_reg: 0.118, loss_classifier: 0.102, loss_mask: 0.131, loss_objectness: 0.017, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 867
[ 2023-10-08 02:55:48 ] Completed saving temp checkpoint 1,061.323 ms, 10.55 s total
[ 2023-10-08 02:55:48 ] Completed replacing temp checkpoint with checkpoint 74.484 ms, 10.63 s total
[ 2023-10-08 02:55:48 ] Completed Epoch: 13 batch 868: moving batch data to device 3.620 ms, 10.63 s total
[ 2023-10-08 02:55:48 ] Completed Epoch: 13 batch 868: forward pass 107.625 ms, 10.74 s total
[ 2023-10-08 02:55:48 ] Completed Epoch: 13 batch 868: backward pass 37.824 ms, 10.78 s total
[ 2023-10-08 02:55:48 ] Completed Epoch: 13 batch 868: computing loss 161.016 ms, 10.94 s total
EPOCH: [13], BATCH: [868/889], loss: 0.363, loss_box_reg: 0.108, loss_classifier: 0.091, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 868
[ 2023-10-08 02:55:49 ] Completed saving temp checkpoint 937.239 ms, 11.88 s total
[ 2023-10-08 02:55:49 ] Completed replacing temp checkpoint with checkpoint 67.957 ms, 11.94 s total
[ 2023-10-08 02:55:49 ] Completed Epoch: 13 batch 869: moving batch data to device 8.251 ms, 11.95 s total
[ 2023-10-08 02:55:49 ] Completed Epoch: 13 batch 869: forward pass 106.959 ms, 12.06 s total
[ 2023-10-08 02:55:49 ] Completed Epoch: 13 batch 869: backward pass 41.414 ms, 12.10 s total
[ 2023-10-08 02:55:49 ] Completed Epoch: 13 batch 869: computing loss 153.956 ms, 12.25 s total
EPOCH: [13], BATCH: [869/889], loss: 0.418, loss_box_reg: 0.125, loss_classifier: 0.106, loss_mask: 0.135, loss_objectness: 0.019, loss_rpn_box_reg: 0.034
Saving checkpoint at epoch 13 train batch 869
[ 2023-10-08 02:55:51 ] Completed saving temp checkpoint 1,463.701 ms, 13.72 s total
[ 2023-10-08 02:55:51 ] Completed replacing temp checkpoint with checkpoint 93.267 ms, 13.81 s total
[ 2023-10-08 02:55:51 ] Completed Epoch: 13 batch 870: moving batch data to device 4.940 ms, 13.82 s total
[ 2023-10-08 02:55:51 ] Completed Epoch: 13 batch 870: forward pass 104.950 ms, 13.92 s total
[ 2023-10-08 02:55:51 ] Completed Epoch: 13 batch 870: backward pass 53.033 ms, 13.97 s total
[ 2023-10-08 02:55:51 ] Completed Epoch: 13 batch 870: computing loss 136.424 ms, 14.11 s total
EPOCH: [13], BATCH: [870/889], loss: 0.370, loss_box_reg: 0.113, loss_classifier: 0.093, loss_mask: 0.127, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 870
[ 2023-10-08 02:55:53 ] Completed saving temp checkpoint 1,563.056 ms, 15.67 s total
[ 2023-10-08 02:55:53 ] Completed replacing temp checkpoint with checkpoint 97.868 ms, 15.77 s total
[ 2023-10-08 02:55:53 ] Completed Epoch: 13 batch 871: moving batch data to device 7.551 ms, 15.78 s total
[ 2023-10-08 02:55:53 ] Completed Epoch: 13 batch 871: forward pass 106.629 ms, 15.89 s total
[ 2023-10-08 02:55:53 ] Completed Epoch: 13 batch 871: backward pass 39.484 ms, 15.92 s total
[ 2023-10-08 02:55:53 ] Completed Epoch: 13 batch 871: computing loss 251.439 ms, 16.18 s total
EPOCH: [13], BATCH: [871/889], loss: 0.396, loss_box_reg: 0.126, loss_classifier: 0.100, loss_mask: 0.132, loss_objectness: 0.017, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 871
[ 2023-10-08 02:55:55 ] Completed saving temp checkpoint 1,681.177 ms, 17.86 s total
[ 2023-10-08 02:55:55 ] Completed replacing temp checkpoint with checkpoint 110.883 ms, 17.97 s total
[ 2023-10-08 02:55:55 ] Completed Epoch: 13 batch 872: moving batch data to device 6.834 ms, 17.98 s total
[ 2023-10-08 02:55:55 ] Completed Epoch: 13 batch 872: forward pass 105.078 ms, 18.08 s total
[ 2023-10-08 02:55:55 ] Completed Epoch: 13 batch 872: backward pass 67.977 ms, 18.15 s total
[ 2023-10-08 02:55:55 ] Completed Epoch: 13 batch 872: computing loss 122.443 ms, 18.27 s total
EPOCH: [13], BATCH: [872/889], loss: 0.375, loss_box_reg: 0.111, loss_classifier: 0.094, loss_mask: 0.126, loss_objectness: 0.016, loss_rpn_box_reg: 0.028
Saving checkpoint at epoch 13 train batch 872
[ 2023-10-08 02:55:57 ] Completed saving temp checkpoint 1,466.420 ms, 19.74 s total
[ 2023-10-08 02:55:57 ] Completed replacing temp checkpoint with checkpoint 75.728 ms, 19.81 s total
[ 2023-10-08 02:55:57 ] Completed Epoch: 13 batch 873: moving batch data to device 5.494 ms, 19.82 s total
[ 2023-10-08 02:55:57 ] Completed Epoch: 13 batch 873: forward pass 103.595 ms, 19.92 s total
[ 2023-10-08 02:55:57 ] Completed Epoch: 13 batch 873: backward pass 41.272 ms, 19.96 s total
[ 2023-10-08 02:55:57 ] Completed Epoch: 13 batch 873: computing loss 150.226 ms, 20.11 s total
EPOCH: [13], BATCH: [873/889], loss: 0.378, loss_box_reg: 0.113, loss_classifier: 0.094, loss_mask: 0.135, loss_objectness: 0.015, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 873
[ 2023-10-08 02:55:59 ] Completed saving temp checkpoint 1,534.362 ms, 21.65 s total
[ 2023-10-08 02:55:59 ] Completed replacing temp checkpoint with checkpoint 93.873 ms, 21.74 s total
[ 2023-10-08 02:55:59 ] Completed Epoch: 13 batch 874: moving batch data to device 6.975 ms, 21.75 s total
[ 2023-10-08 02:55:59 ] Completed Epoch: 13 batch 874: forward pass 105.375 ms, 21.85 s total
[ 2023-10-08 02:55:59 ] Completed Epoch: 13 batch 874: backward pass 68.161 ms, 21.92 s total
[ 2023-10-08 02:55:59 ] Completed Epoch: 13 batch 874: computing loss 126.179 ms, 22.05 s total
EPOCH: [13], BATCH: [874/889], loss: 0.380, loss_box_reg: 0.111, loss_classifier: 0.099, loss_mask: 0.130, loss_objectness: 0.014, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 874
[ 2023-10-08 02:56:00 ] Completed saving temp checkpoint 1,009.542 ms, 23.06 s total
[ 2023-10-08 02:56:00 ] Completed replacing temp checkpoint with checkpoint 57.433 ms, 23.12 s total
[ 2023-10-08 02:56:00 ] Completed Epoch: 13 batch 875: moving batch data to device 7.985 ms, 23.12 s total
[ 2023-10-08 02:56:00 ] Completed Epoch: 13 batch 875: forward pass 104.050 ms, 23.23 s total
[ 2023-10-08 02:56:00 ] Completed Epoch: 13 batch 875: backward pass 80.793 ms, 23.31 s total
[ 2023-10-08 02:56:00 ] Completed Epoch: 13 batch 875: computing loss 170.539 ms, 23.48 s total
EPOCH: [13], BATCH: [875/889], loss: 0.416, loss_box_reg: 0.127, loss_classifier: 0.109, loss_mask: 0.133, loss_objectness: 0.017, loss_rpn_box_reg: 0.030
Saving checkpoint at epoch 13 train batch 875
[ 2023-10-08 02:56:02 ] Completed saving temp checkpoint 1,136.823 ms, 24.62 s total
[ 2023-10-08 02:56:02 ] Completed replacing temp checkpoint with checkpoint 80.415 ms, 24.70 s total
[ 2023-10-08 02:56:02 ] Completed Epoch: 13 batch 876: moving batch data to device 7.565 ms, 24.70 s total
[ 2023-10-08 02:56:02 ] Completed Epoch: 13 batch 876: forward pass 102.977 ms, 24.81 s total
[ 2023-10-08 02:56:02 ] Completed Epoch: 13 batch 876: backward pass 75.389 ms, 24.88 s total
[ 2023-10-08 02:56:02 ] Completed Epoch: 13 batch 876: computing loss 110.172 ms, 24.99 s total
EPOCH: [13], BATCH: [876/889], loss: 0.409, loss_box_reg: 0.123, loss_classifier: 0.101, loss_mask: 0.137, loss_objectness: 0.021, loss_rpn_box_reg: 0.027
Saving checkpoint at epoch 13 train batch 876
[ 2023-10-08 02:56:03 ] Completed saving temp checkpoint 987.229 ms, 25.98 s total
[ 2023-10-08 02:56:03 ] Completed replacing temp checkpoint with checkpoint 47.654 ms, 26.03 s total
[ 2023-10-08 02:56:03 ] Completed Epoch: 13 batch 877: moving batch data to device 7.673 ms, 26.03 s total
[ 2023-10-08 02:56:03 ] Completed Epoch: 13 batch 877: forward pass 100.335 ms, 26.14 s total
[ 2023-10-08 02:56:03 ] Completed Epoch: 13 batch 877: backward pass 77.758 ms, 26.21 s total
[ 2023-10-08 02:56:03 ] Completed Epoch: 13 batch 877: computing loss 111.754 ms, 26.32 s total
EPOCH: [13], BATCH: [877/889], loss: 0.393, loss_box_reg: 0.120, loss_classifier: 0.096, loss_mask: 0.136, loss_objectness: 0.017, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 877
[ 2023-10-08 02:56:05 ] Completed saving temp checkpoint 1,173.579 ms, 27.50 s total
[ 2023-10-08 02:56:05 ] Completed replacing temp checkpoint with checkpoint 84.847 ms, 27.58 s total
[ 2023-10-08 02:56:05 ] Completed Epoch: 13 batch 878: moving batch data to device 8.185 ms, 27.59 s total
[ 2023-10-08 02:56:05 ] Completed Epoch: 13 batch 878: forward pass 105.848 ms, 27.70 s total
[ 2023-10-08 02:56:05 ] Completed Epoch: 13 batch 878: backward pass 51.418 ms, 27.75 s total
[ 2023-10-08 02:56:05 ] Completed Epoch: 13 batch 878: computing loss 143.287 ms, 27.89 s total
EPOCH: [13], BATCH: [878/889], loss: 0.386, loss_box_reg: 0.117, loss_classifier: 0.100, loss_mask: 0.129, loss_objectness: 0.017, loss_rpn_box_reg: 0.023
Saving checkpoint at epoch 13 train batch 878
[ 2023-10-08 02:56:06 ] Completed saving temp checkpoint 1,108.064 ms, 29.00 s total
[ 2023-10-08 02:56:06 ] Completed replacing temp checkpoint with checkpoint 45.825 ms, 29.05 s total
[ 2023-10-08 02:56:06 ] Completed Epoch: 13 batch 879: moving batch data to device 11.600 ms, 29.06 s total
[ 2023-10-08 02:56:06 ] Completed Epoch: 13 batch 879: forward pass 110.700 ms, 29.17 s total
[ 2023-10-08 02:56:06 ] Completed Epoch: 13 batch 879: backward pass 74.012 ms, 29.24 s total
[ 2023-10-08 02:56:06 ] Completed Epoch: 13 batch 879: computing loss 122.604 ms, 29.36 s total
EPOCH: [13], BATCH: [879/889], loss: 0.418, loss_box_reg: 0.130, loss_classifier: 0.107, loss_mask: 0.131, loss_objectness: 0.019, loss_rpn_box_reg: 0.031
Saving checkpoint at epoch 13 train batch 879
[ 2023-10-08 02:56:08 ] Completed saving temp checkpoint 1,164.365 ms, 30.53 s total
[ 2023-10-08 02:56:08 ] Completed replacing temp checkpoint with checkpoint 54.965 ms, 30.58 s total
[ 2023-10-08 02:56:08 ] Completed Epoch: 13 batch 880: moving batch data to device 4.831 ms, 30.59 s total
[ 2023-10-08 02:56:08 ] Completed Epoch: 13 batch 880: forward pass 101.522 ms, 30.69 s total
[ 2023-10-08 02:56:08 ] Completed Epoch: 13 batch 880: backward pass 52.775 ms, 30.74 s total
[ 2023-10-08 02:56:08 ] Completed Epoch: 13 batch 880: computing loss 139.454 ms, 30.88 s total
EPOCH: [13], BATCH: [880/889], loss: 0.398, loss_box_reg: 0.123, loss_classifier: 0.105, loss_mask: 0.134, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 880
[ 2023-10-08 02:56:09 ] Completed saving temp checkpoint 1,074.830 ms, 31.96 s total
[ 2023-10-08 02:56:09 ] Completed replacing temp checkpoint with checkpoint 46.695 ms, 32.00 s total
[ 2023-10-08 02:56:09 ] Completed Epoch: 13 batch 881: moving batch data to device 6.141 ms, 32.01 s total
[ 2023-10-08 02:56:09 ] Completed Epoch: 13 batch 881: forward pass 107.303 ms, 32.12 s total
[ 2023-10-08 02:56:09 ] Completed Epoch: 13 batch 881: backward pass 78.679 ms, 32.20 s total
[ 2023-10-08 02:56:09 ] Completed Epoch: 13 batch 881: computing loss 110.465 ms, 32.31 s total
EPOCH: [13], BATCH: [881/889], loss: 0.360, loss_box_reg: 0.107, loss_classifier: 0.091, loss_mask: 0.123, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 881
[ 2023-10-08 02:56:11 ] Completed saving temp checkpoint 1,230.609 ms, 33.54 s total
[ 2023-10-08 02:56:11 ] Completed replacing temp checkpoint with checkpoint 83.552 ms, 33.62 s total
[ 2023-10-08 02:56:11 ] Completed Epoch: 13 batch 882: moving batch data to device 8.926 ms, 33.63 s total
[ 2023-10-08 02:56:11 ] Completed Epoch: 13 batch 882: forward pass 103.895 ms, 33.73 s total
[ 2023-10-08 02:56:11 ] Completed Epoch: 13 batch 882: backward pass 62.530 ms, 33.80 s total
[ 2023-10-08 02:56:11 ] Completed Epoch: 13 batch 882: computing loss 109.247 ms, 33.91 s total
EPOCH: [13], BATCH: [882/889], loss: 0.370, loss_box_reg: 0.114, loss_classifier: 0.092, loss_mask: 0.129, loss_objectness: 0.012, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 882
[ 2023-10-08 02:56:12 ] Completed saving temp checkpoint 1,439.795 ms, 35.35 s total
[ 2023-10-08 02:56:12 ] Completed replacing temp checkpoint with checkpoint 71.803 ms, 35.42 s total
[ 2023-10-08 02:56:12 ] Completed Epoch: 13 batch 883: moving batch data to device 6.398 ms, 35.42 s total
[ 2023-10-08 02:56:13 ] Completed Epoch: 13 batch 883: forward pass 103.889 ms, 35.53 s total
[ 2023-10-08 02:56:13 ] Completed Epoch: 13 batch 883: backward pass 78.984 ms, 35.61 s total
[ 2023-10-08 02:56:13 ] Completed Epoch: 13 batch 883: computing loss 91.017 ms, 35.70 s total
EPOCH: [13], BATCH: [883/889], loss: 0.361, loss_box_reg: 0.111, loss_classifier: 0.090, loss_mask: 0.124, loss_objectness: 0.013, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 883
[ 2023-10-08 02:56:15 ] Completed saving temp checkpoint 1,909.669 ms, 37.61 s total
[ 2023-10-08 02:56:15 ] Completed replacing temp checkpoint with checkpoint 93.062 ms, 37.70 s total
[ 2023-10-08 02:56:15 ] Completed Epoch: 13 batch 884: moving batch data to device 7.969 ms, 37.71 s total
[ 2023-10-08 02:56:15 ] Completed Epoch: 13 batch 884: forward pass 106.843 ms, 37.81 s total
[ 2023-10-08 02:56:15 ] Completed Epoch: 13 batch 884: backward pass 53.372 ms, 37.87 s total
[ 2023-10-08 02:56:15 ] Completed Epoch: 13 batch 884: computing loss 128.459 ms, 38.00 s total
EPOCH: [13], BATCH: [884/889], loss: 0.348, loss_box_reg: 0.105, loss_classifier: 0.083, loss_mask: 0.125, loss_objectness: 0.013, loss_rpn_box_reg: 0.022
Saving checkpoint at epoch 13 train batch 884
[ 2023-10-08 02:56:16 ] Completed saving temp checkpoint 1,341.290 ms, 39.34 s total
[ 2023-10-08 02:56:16 ] Completed replacing temp checkpoint with checkpoint 48.099 ms, 39.39 s total
[ 2023-10-08 02:56:16 ] Completed Epoch: 13 batch 885: moving batch data to device 6.728 ms, 39.39 s total
[ 2023-10-08 02:56:17 ] Completed Epoch: 13 batch 885: forward pass 106.381 ms, 39.50 s total
[ 2023-10-08 02:56:17 ] Completed Epoch: 13 batch 885: backward pass 46.268 ms, 39.55 s total
[ 2023-10-08 02:56:17 ] Completed Epoch: 13 batch 885: computing loss 138.231 ms, 39.68 s total
EPOCH: [13], BATCH: [885/889], loss: 0.351, loss_box_reg: 0.103, loss_classifier: 0.087, loss_mask: 0.125, loss_objectness: 0.015, loss_rpn_box_reg: 0.021
Saving checkpoint at epoch 13 train batch 885
[ 2023-10-08 02:56:19 ] Completed saving temp checkpoint 1,931.988 ms, 41.62 s total
[ 2023-10-08 02:56:19 ] Completed replacing temp checkpoint with checkpoint 98.443 ms, 41.71 s total
[ 2023-10-08 02:56:19 ] Completed Epoch: 13 batch 886: moving batch data to device 8.433 ms, 41.72 s total
[ 2023-10-08 02:56:19 ] Completed Epoch: 13 batch 886: forward pass 113.442 ms, 41.84 s total
[ 2023-10-08 02:56:19 ] Completed Epoch: 13 batch 886: backward pass 80.379 ms, 41.92 s total
[ 2023-10-08 02:56:19 ] Completed Epoch: 13 batch 886: computing loss 109.302 ms, 42.03 s total
EPOCH: [13], BATCH: [886/889], loss: 0.365, loss_box_reg: 0.111, loss_classifier: 0.088, loss_mask: 0.129, loss_objectness: 0.014, loss_rpn_box_reg: 0.024
Saving checkpoint at epoch 13 train batch 886
[ 2023-10-08 02:56:20 ] Completed saving temp checkpoint 1,210.242 ms, 43.24 s total
[ 2023-10-08 02:56:20 ] Completed replacing temp checkpoint with checkpoint 49.097 ms, 43.28 s total
[ 2023-10-08 02:56:20 ] Completed Epoch: 13 batch 887: moving batch data to device 5.494 ms, 43.29 s total
[ 2023-10-08 02:56:20 ] Completed Epoch: 13 batch 887: forward pass 105.226 ms, 43.40 s total
[ 2023-10-08 02:56:20 ] Completed Epoch: 13 batch 887: backward pass 85.913 ms, 43.48 s total
[ 2023-10-08 02:56:21 ] Completed Epoch: 13 batch 887: computing loss 104.786 ms, 43.59 s total
EPOCH: [13], BATCH: [887/889], loss: 0.317, loss_box_reg: 0.095, loss_classifier: 0.072, loss_mask: 0.117, loss_objectness: 0.013, loss_rpn_box_reg: 0.020
Saving checkpoint at epoch 13 train batch 887
[ 2023-10-08 02:56:22 ] Completed saving temp checkpoint 1,153.446 ms, 44.74 s total
[ 2023-10-08 02:56:22 ] Completed replacing temp checkpoint with checkpoint 85.360 ms, 44.83 s total
[ 2023-10-08 02:56:22 ] Completed Epoch: 13 batch 888: moving batch data to device 7.453 ms, 44.83 s total
[ 2023-10-08 02:56:22 ] Completed Epoch: 13 batch 888: forward pass 106.699 ms, 44.94 s total
[ 2023-10-08 02:56:22 ] Completed Epoch: 13 batch 888: backward pass 80.609 ms, 45.02 s total
[ 2023-10-08 02:56:22 ] Completed Epoch: 13 batch 888: computing loss 108.982 ms, 45.13 s total
EPOCH: [13], BATCH: [888/889], loss: 0.358, loss_box_reg: 0.106, loss_classifier: 0.087, loss_mask: 0.126, loss_objectness: 0.014, loss_rpn_box_reg: 0.025
Saving checkpoint at epoch 13 train batch 888
[ 2023-10-08 02:56:24 ] Completed saving temp checkpoint 1,389.753 ms, 46.52 s total
[ 2023-10-08 02:56:24 ] Completed replacing temp checkpoint with checkpoint 64.811 ms, 46.58 s total
[ 2023-10-08 02:56:24 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 02:56:24 ] Completed starting evaluation routine 0.116 ms, 0.00 s total
[ 2023-10-08 02:56:24 ] Completed evaluation preliminaries 17.236 ms, 0.02 s total
Evaluating / resuming epoch 13 from eval step 0
[ 2023-10-08 02:56:24 ] Completed Epoch 13 batch: 0 moving to device 254.961 ms, 0.27 s total
[ 2023-10-08 02:56:24 ] Completed Epoch 13 batch: 0 forward through model 93.982 ms, 0.37 s total
[ 2023-10-08 02:56:24 ] Completed Epoch 13 batch: 0 outputs back to cpu 3.281 ms, 0.37 s total
[ 2023-10-08 02:56:24 ] Completed Epoch 13 batch: 0 update evaluator 17.399 ms, 0.39 s total
Saving checkpoint at epoch 13 eval batch 0
[ 2023-10-08 02:56:26 ] Completed saving temp checkpoint 1,833.747 ms, 2.22 s total
[ 2023-10-08 02:56:26 ] Completed replacing temp checkpoint with checkpoint 110.866 ms, 2.33 s total
[ 2023-10-08 02:56:26 ] Completed Epoch 13 batch: 1 moving to device 9.863 ms, 2.34 s total
[ 2023-10-08 02:56:26 ] Completed Epoch 13 batch: 1 forward through model 112.951 ms, 2.45 s total
[ 2023-10-08 02:56:26 ] Completed Epoch 13 batch: 1 outputs back to cpu 34.341 ms, 2.49 s total
[ 2023-10-08 02:56:26 ] Completed Epoch 13 batch: 1 update evaluator 63.790 ms, 2.55 s total
Saving checkpoint at epoch 13 eval batch 1
[ 2023-10-08 02:56:27 ] Completed saving temp checkpoint 997.910 ms, 3.55 s total
[ 2023-10-08 02:56:27 ] Completed replacing temp checkpoint with checkpoint 71.666 ms, 3.62 s total
[ 2023-10-08 02:56:27 ] Completed Epoch 13 batch: 2 moving to device 5.771 ms, 3.63 s total
[ 2023-10-08 02:56:27 ] Completed Epoch 13 batch: 2 forward through model 119.182 ms, 3.75 s total
[ 2023-10-08 02:56:27 ] Completed Epoch 13 batch: 2 outputs back to cpu 33.820 ms, 3.78 s total
[ 2023-10-08 02:56:28 ] Completed Epoch 13 batch: 2 update evaluator 76.541 ms, 3.86 s total
Saving checkpoint at epoch 13 eval batch 2
[ 2023-10-08 02:56:29 ] Completed saving temp checkpoint 1,106.024 ms, 4.96 s total
[ 2023-10-08 02:56:29 ] Completed replacing temp checkpoint with checkpoint 90.148 ms, 5.05 s total
[ 2023-10-08 02:56:29 ] Completed Epoch 13 batch: 3 moving to device 1.613 ms, 5.06 s total
[ 2023-10-08 02:56:29 ] Completed Epoch 13 batch: 3 forward through model 76.913 ms, 5.13 s total
[ 2023-10-08 02:56:29 ] Completed Epoch 13 batch: 3 outputs back to cpu 0.632 ms, 5.13 s total
[ 2023-10-08 02:56:29 ] Completed Epoch 13 batch: 3 update evaluator 11.017 ms, 5.14 s total
Saving checkpoint at epoch 13 eval batch 3
[ 2023-10-08 02:56:30 ] Completed saving temp checkpoint 961.210 ms, 6.10 s total
[ 2023-10-08 02:56:30 ] Completed replacing temp checkpoint with checkpoint 78.496 ms, 6.18 s total
[ 2023-10-08 02:56:30 ] Completed Epoch 13 batch: 4 moving to device 1.717 ms, 6.19 s total
[ 2023-10-08 02:56:30 ] Completed Epoch 13 batch: 4 forward through model 44.572 ms, 6.23 s total
[ 2023-10-08 02:56:30 ] Completed Epoch 13 batch: 4 outputs back to cpu 1.907 ms, 6.23 s total
[ 2023-10-08 02:56:30 ] Completed Epoch 13 batch: 4 update evaluator 10.184 ms, 6.24 s total
Saving checkpoint at epoch 13 eval batch 4
[ 2023-10-08 02:56:31 ] Completed saving temp checkpoint 1,055.995 ms, 7.30 s total
[ 2023-10-08 02:56:31 ] Completed replacing temp checkpoint with checkpoint 56.911 ms, 7.35 s total
[ 2023-10-08 02:56:31 ] Completed Epoch 13 batch: 5 moving to device 1.952 ms, 7.36 s total
[ 2023-10-08 02:56:31 ] Completed Epoch 13 batch: 5 forward through model 96.128 ms, 7.45 s total
[ 2023-10-08 02:56:31 ] Completed Epoch 13 batch: 5 outputs back to cpu 10.011 ms, 7.46 s total
[ 2023-10-08 02:56:31 ] Completed Epoch 13 batch: 5 update evaluator 18.029 ms, 7.48 s total
Saving checkpoint at epoch 13 eval batch 5
[ 2023-10-08 02:56:32 ] Completed saving temp checkpoint 1,055.148 ms, 8.54 s total
[ 2023-10-08 02:56:32 ] Completed replacing temp checkpoint with checkpoint 69.606 ms, 8.61 s total
[ 2023-10-08 02:56:32 ] Completed Epoch 13 batch: 6 moving to device 3.607 ms, 8.61 s total
[ 2023-10-08 02:56:32 ] Completed Epoch 13 batch: 6 forward through model 49.807 ms, 8.66 s total
[ 2023-10-08 02:56:32 ] Completed Epoch 13 batch: 6 outputs back to cpu 1.794 ms, 8.66 s total
[ 2023-10-08 02:56:32 ] Completed Epoch 13 batch: 6 update evaluator 15.729 ms, 8.68 s total
Saving checkpoint at epoch 13 eval batch 6
[ 2023-10-08 02:56:34 ] Completed saving temp checkpoint 1,157.452 ms, 9.83 s total
[ 2023-10-08 02:56:34 ] Completed replacing temp checkpoint with checkpoint 62.869 ms, 9.90 s total
[ 2023-10-08 02:56:34 ] Completed Epoch 13 batch: 7 moving to device 1.237 ms, 9.90 s total
[ 2023-10-08 02:56:34 ] Completed Epoch 13 batch: 7 forward through model 71.608 ms, 9.97 s total
[ 2023-10-08 02:56:34 ] Completed Epoch 13 batch: 7 outputs back to cpu 0.495 ms, 9.97 s total
[ 2023-10-08 02:56:34 ] Completed Epoch 13 batch: 7 update evaluator 5.886 ms, 9.98 s total
Saving checkpoint at epoch 13 eval batch 7
[ 2023-10-08 02:56:35 ] Completed saving temp checkpoint 1,058.776 ms, 11.03 s total
[ 2023-10-08 02:56:35 ] Completed replacing temp checkpoint with checkpoint 62.169 ms, 11.10 s total
[ 2023-10-08 02:56:35 ] Completed Epoch 13 batch: 8 moving to device 1.170 ms, 11.10 s total
[ 2023-10-08 02:56:35 ] Completed Epoch 13 batch: 8 forward through model 55.248 ms, 11.15 s total
[ 2023-10-08 02:56:35 ] Completed Epoch 13 batch: 8 outputs back to cpu 13.509 ms, 11.17 s total
[ 2023-10-08 02:56:35 ] Completed Epoch 13 batch: 8 update evaluator 23.607 ms, 11.19 s total
Saving checkpoint at epoch 13 eval batch 8
[ 2023-10-08 02:56:36 ] Completed saving temp checkpoint 1,370.896 ms, 12.56 s total
[ 2023-10-08 02:56:36 ] Completed replacing temp checkpoint with checkpoint 60.258 ms, 12.62 s total
[ 2023-10-08 02:56:36 ] Completed Epoch 13 batch: 9 moving to device 2.620 ms, 12.62 s total
[ 2023-10-08 02:56:36 ] Completed Epoch 13 batch: 9 forward through model 59.546 ms, 12.68 s total
[ 2023-10-08 02:56:36 ] Completed Epoch 13 batch: 9 outputs back to cpu 23.720 ms, 12.71 s total
[ 2023-10-08 02:56:36 ] Completed Epoch 13 batch: 9 update evaluator 45.755 ms, 12.75 s total
Saving checkpoint at epoch 13 eval batch 9
[ 2023-10-08 02:56:38 ] Completed saving temp checkpoint 1,240.061 ms, 13.99 s total
[ 2023-10-08 02:56:38 ] Completed replacing temp checkpoint with checkpoint 74.170 ms, 14.07 s total
[ 2023-10-08 02:56:38 ] Completed Epoch 13 batch: 10 moving to device 3.015 ms, 14.07 s total
[ 2023-10-08 02:56:38 ] Completed Epoch 13 batch: 10 forward through model 61.416 ms, 14.13 s total
[ 2023-10-08 02:56:38 ] Completed Epoch 13 batch: 10 outputs back to cpu 25.763 ms, 14.16 s total
[ 2023-10-08 02:56:38 ] Completed Epoch 13 batch: 10 update evaluator 50.560 ms, 14.21 s total
Saving checkpoint at epoch 13 eval batch 10
[ 2023-10-08 02:56:40 ] Completed saving temp checkpoint 1,844.147 ms, 16.05 s total
[ 2023-10-08 02:56:40 ] Completed replacing temp checkpoint with checkpoint 106.231 ms, 16.16 s total
[ 2023-10-08 02:56:40 ] Completed Epoch 13 batch: 11 moving to device 4.444 ms, 16.16 s total
[ 2023-10-08 02:56:40 ] Completed Epoch 13 batch: 11 forward through model 82.146 ms, 16.25 s total
[ 2023-10-08 02:56:40 ] Completed Epoch 13 batch: 11 outputs back to cpu 0.585 ms, 16.25 s total
[ 2023-10-08 02:56:40 ] Completed Epoch 13 batch: 11 update evaluator 10.273 ms, 16.26 s total
Saving checkpoint at epoch 13 eval batch 11
[ 2023-10-08 02:56:41 ] Completed saving temp checkpoint 1,109.043 ms, 17.37 s total
[ 2023-10-08 02:56:41 ] Completed replacing temp checkpoint with checkpoint 53.566 ms, 17.42 s total
[ 2023-10-08 02:56:41 ] Completed Epoch 13 batch: 12 moving to device 3.107 ms, 17.42 s total
[ 2023-10-08 02:56:41 ] Completed Epoch 13 batch: 12 forward through model 49.196 ms, 17.47 s total
[ 2023-10-08 02:56:41 ] Completed Epoch 13 batch: 12 outputs back to cpu 2.761 ms, 17.47 s total
[ 2023-10-08 02:56:41 ] Completed Epoch 13 batch: 12 update evaluator 15.456 ms, 17.49 s total
Saving checkpoint at epoch 13 eval batch 12
[ 2023-10-08 02:56:43 ] Completed saving temp checkpoint 1,693.807 ms, 19.18 s total
[ 2023-10-08 02:56:43 ] Completed replacing temp checkpoint with checkpoint 113.595 ms, 19.30 s total
[ 2023-10-08 02:56:43 ] Completed Epoch 13 batch: 13 moving to device 4.056 ms, 19.30 s total
[ 2023-10-08 02:56:43 ] Completed Epoch 13 batch: 13 forward through model 51.970 ms, 19.35 s total
[ 2023-10-08 02:56:43 ] Completed Epoch 13 batch: 13 outputs back to cpu 9.337 ms, 19.36 s total
[ 2023-10-08 02:56:43 ] Completed Epoch 13 batch: 13 update evaluator 19.254 ms, 19.38 s total
Saving checkpoint at epoch 13 eval batch 13
[ 2023-10-08 02:56:45 ] Completed saving temp checkpoint 1,536.467 ms, 20.92 s total
[ 2023-10-08 02:56:45 ] Completed replacing temp checkpoint with checkpoint 90.446 ms, 21.01 s total
[ 2023-10-08 02:56:45 ] Completed Epoch 13 batch: 14 moving to device 3.544 ms, 21.01 s total
[ 2023-10-08 02:56:45 ] Completed Epoch 13 batch: 14 forward through model 54.640 ms, 21.07 s total
[ 2023-10-08 02:56:45 ] Completed Epoch 13 batch: 14 outputs back to cpu 12.330 ms, 21.08 s total
[ 2023-10-08 02:56:45 ] Completed Epoch 13 batch: 14 update evaluator 26.097 ms, 21.10 s total
Saving checkpoint at epoch 13 eval batch 14
[ 2023-10-08 02:56:46 ] Completed saving temp checkpoint 1,496.570 ms, 22.60 s total
[ 2023-10-08 02:56:46 ] Completed replacing temp checkpoint with checkpoint 75.320 ms, 22.68 s total
[ 2023-10-08 02:56:46 ] Completed Epoch 13 batch: 15 moving to device 4.263 ms, 22.68 s total
[ 2023-10-08 02:56:46 ] Completed Epoch 13 batch: 15 forward through model 102.364 ms, 22.78 s total
[ 2023-10-08 02:56:47 ] Completed Epoch 13 batch: 15 outputs back to cpu 32.962 ms, 22.82 s total
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[ 2023-10-08 03:10:00 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 03:10:00 ] Completed importing Timer 0.021 ms, 0.00 s total
[ 2023-10-08 03:10:01 ] Completed importing everything else 590.312 ms, 0.59 s total
[ 2023-10-08 03:10:01 ] Completed defined other functions 0.021 ms, 0.59 s total
| distributed init (rank 3): env://
| distributed init (rank 0): env://
| distributed init (rank 4): env://
| distributed init (rank 2): env://
| distributed init (rank 5): env://
| distributed init (rank 1): env://
Namespace(data_path='/mnt/.node1/Open-Datasets/coco', dataset='coco', model='maskrcnn_resnet101_fpn', device='cuda', batch_size=2, epochs=26, workers=4, opt='sgd', lr=0.02, momentum=0.9, weight_decay=0.0001, norm_weight_decay=None, lr_scheduler='multisteplr', lr_step_size=8, lr_steps=[16, 22], lr_gamma=0.1, print_freq=1, output_dir='.', start_epoch=0, resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/checkpoint.isc', prev_resume='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/c0d07169-cc2a-45f7-a55c-1cfc3cb731b4/checkpoint.isc', tboard_path='/mnt/Client/Adamstn3rh22tykvgyhdkclook3rnk7q/adaadam4qalumfvjdstjpx7zyvlebh2u/outputs/maskrcnn_resnet101_fpn/f8fa15ea-dcc3-49f2-b57e-184f6a7bb619/tb', aspect_ratio_group_factor=3, rpn_score_thresh=None, trainable_backbone_layers=None, data_augmentation='hflip', sync_bn=False, test_only=False, use_deterministic_algorithms=False, world_size=66, dist_url='env://', weights=None, weights_backbone=None, amp=False, use_copypaste=False, backend='pil', use_v2=False, rank=0, gpu=0, distributed=True, dist_backend='nccl')
[ 2023-10-08 03:10:08 ] Completed main preliminaries 7,424.900 ms, 8.02 s total
loading annotations into memory...
Done (t=10.19s)
creating index...
index created!
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
[ 2023-10-08 03:10:20 ] Completed loading data 11,899.787 ms, 19.92 s total
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
[ 2023-10-08 03:10:20 ] Completed creating data samplers 96.352 ms, 20.01 s total
[ 2023-10-08 03:10:20 ] Completed creating data loaders 0.206 ms, 20.01 s total
[ 2023-10-08 03:10:21 ] Completed creating model and .to(device) 645.497 ms, 20.66 s total
[ 2023-10-08 03:10:23 ] Completed preparing model for distributed training 2,345.412 ms, 23.00 s total
[ 2023-10-08 03:10:23 ] Completed optimizer and scaler 0.607 ms, 23.00 s total
[ 2023-10-08 03:10:23 ] Completed learning rate schedulers 0.258 ms, 23.00 s total
[ 2023-10-08 03:10:24 ] Completed init coco evaluator 951.904 ms, 23.96 s total
RESUMING FROM CURRENT JOB
[ 2023-10-08 03:10:25 ] Completed retrieving checkpoint 988.624 ms, 24.94 s total
EPOCH :: 13
[ 2023-10-08 03:10:25 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 03:10:25 ] Completed training preliminaries 0.853 ms, 0.00 s total
Training / resuming epoch 13 from training step 889
[ 2023-10-08 03:10:26 ] Completed Start 0.000 ms, 0.00 s total
[ 2023-10-08 03:10:26 ] Completed starting evaluation routine 0.099 ms, 0.00 s total
[ 2023-10-08 03:10:26 ] Completed evaluation preliminaries 38.742 ms, 0.04 s total
Evaluating / resuming epoch 13 from eval step 15
[ 2023-10-08 03:10:26 ] Completed Epoch 13 batch: 15 moving to device 317.782 ms, 0.36 s total
[ 2023-10-08 03:10:27 ] Completed Epoch 13 batch: 15 forward through model 680.338 ms, 1.04 s total
[ 2023-10-08 03:10:27 ] Completed Epoch 13 batch: 15 outputs back to cpu 34.189 ms, 1.07 s total
[ 2023-10-08 03:10:27 ] Completed Epoch 13 batch: 15 update evaluator 60.929 ms, 1.13 s total
Saving checkpoint at epoch 13 eval batch 15
[ 2023-10-08 03:10:28 ] Completed saving temp checkpoint 880.177 ms, 2.01 s total
[ 2023-10-08 03:10:28 ] Completed replacing temp checkpoint with checkpoint 188.812 ms, 2.20 s total
[ 2023-10-08 03:10:28 ] Completed Epoch 13 batch: 16 moving to device 2.228 ms, 2.20 s total
[ 2023-10-08 03:10:28 ] Completed Epoch 13 batch: 16 forward through model 116.323 ms, 2.32 s total
[ 2023-10-08 03:10:28 ] Completed Epoch 13 batch: 16 outputs back to cpu 28.430 ms, 2.35 s total
[ 2023-10-08 03:10:28 ] Completed Epoch 13 batch: 16 update evaluator 59.894 ms, 2.41 s total
Saving checkpoint at epoch 13 eval batch 16
[ 2023-10-08 03:10:30 ] Completed saving temp checkpoint 1,429.590 ms, 3.84 s total
[ 2023-10-08 03:10:30 ] Completed replacing temp checkpoint with checkpoint 64.579 ms, 3.90 s total
[ 2023-10-08 03:10:30 ] Completed Epoch 13 batch: 17 moving to device 1.801 ms, 3.90 s total
[ 2023-10-08 03:10:30 ] Completed Epoch 13 batch: 17 forward through model 123.185 ms, 4.03 s total
[ 2023-10-08 03:10:30 ] Completed Epoch 13 batch: 17 outputs back to cpu 23.601 ms, 4.05 s total
[ 2023-10-08 03:10:30 ] Completed Epoch 13 batch: 17 update evaluator 57.694 ms, 4.11 s total
Saving checkpoint at epoch 13 eval batch 17
[ 2023-10-08 03:10:31 ] Completed saving temp checkpoint 1,328.422 ms, 5.44 s total
[ 2023-10-08 03:10:31 ] Completed replacing temp checkpoint with checkpoint 91.615 ms, 5.53 s total
[ 2023-10-08 03:10:31 ] Completed Epoch 13 batch: 18 moving to device 8.407 ms, 5.54 s total
[ 2023-10-08 03:10:32 ] Completed Epoch 13 batch: 18 forward through model 55.460 ms, 5.59 s total
[ 2023-10-08 03:10:32 ] Completed Epoch 13 batch: 18 outputs back to cpu 38.000 ms, 5.63 s total
[ 2023-10-08 03:10:32 ] Completed Epoch 13 batch: 18 update evaluator 77.280 ms, 5.71 s total
Saving checkpoint at epoch 13 eval batch 18
[ 2023-10-08 03:10:33 ] Completed saving temp checkpoint 1,681.993 ms, 7.39 s total
[ 2023-10-08 03:10:33 ] Completed replacing temp checkpoint with checkpoint 70.107 ms, 7.46 s total
[ 2023-10-08 03:10:33 ] Completed Epoch 13 batch: 19 moving to device 1.701 ms, 7.46 s total
[ 2023-10-08 03:10:34 ] Completed Epoch 13 batch: 19 forward through model 92.426 ms, 7.55 s total
[ 2023-10-08 03:10:34 ] Completed Epoch 13 batch: 19 outputs back to cpu 2.035 ms, 7.56 s total
[ 2023-10-08 03:10:34 ] Completed Epoch 13 batch: 19 update evaluator 18.178 ms, 7.57 s total
Saving checkpoint at epoch 13 eval batch 19
[ 2023-10-08 03:10:35 ] Completed saving temp checkpoint 1,016.275 ms, 8.59 s total
[ 2023-10-08 03:10:35 ] Completed replacing temp checkpoint with checkpoint 75.480 ms, 8.67 s total
[ 2023-10-08 03:10:35 ] Completed Epoch 13 batch: 20 moving to device 1.634 ms, 8.67 s total
[ 2023-10-08 03:10:35 ] Completed Epoch 13 batch: 20 forward through model 85.484 ms, 8.75 s total
[ 2023-10-08 03:10:35 ] Completed Epoch 13 batch: 20 outputs back to cpu 0.236 ms, 8.75 s total
[ 2023-10-08 03:10:35 ] Completed Epoch 13 batch: 20 update evaluator 3.921 ms, 8.76 s total
Saving checkpoint at epoch 13 eval batch 20
[ 2023-10-08 03:10:36 ] Completed saving temp checkpoint 1,176.232 ms, 9.93 s total
[ 2023-10-08 03:10:36 ] Completed replacing temp checkpoint with checkpoint 67.784 ms, 10.00 s total
[ 2023-10-08 03:10:36 ] Completed Epoch 13 batch: 21 moving to device 1.686 ms, 10.00 s total
[ 2023-10-08 03:10:36 ] Completed Epoch 13 batch: 21 forward through model 51.000 ms, 10.05 s total
[ 2023-10-08 03:10:36 ] Completed Epoch 13 batch: 21 outputs back to cpu 2.648 ms, 10.06 s total
[ 2023-10-08 03:10:36 ] Completed Epoch 13 batch: 21 update evaluator 18.715 ms, 10.08 s total
Saving checkpoint at epoch 13 eval batch 21
[ 2023-10-08 03:10:37 ] Completed saving temp checkpoint 1,094.010 ms, 11.17 s total
[ 2023-10-08 03:10:37 ] Completed replacing temp checkpoint with checkpoint 72.610 ms, 11.24 s total
[ 2023-10-08 03:10:37 ] Completed Epoch 13 batch: 22 moving to device 2.736 ms, 11.24 s total
[ 2023-10-08 03:10:37 ] Completed Epoch 13 batch: 22 forward through model 53.906 ms, 11.30 s total
[ 2023-10-08 03:10:37 ] Completed Epoch 13 batch: 22 outputs back to cpu 0.532 ms, 11.30 s total
[ 2023-10-08 03:10:37 ] Completed Epoch 13 batch: 22 update evaluator 7.192 ms, 11.31 s total
Saving checkpoint at epoch 13 eval batch 22
[ 2023-10-08 03:10:38 ] Completed saving temp checkpoint 1,226.853 ms, 12.53 s total
[ 2023-10-08 03:10:39 ] Completed replacing temp checkpoint with checkpoint 65.240 ms, 12.60 s total
[ 2023-10-08 03:10:39 ] Completed Epoch 13 batch: 23 moving to device 2.618 ms, 12.60 s total
[ 2023-10-08 03:10:39 ] Completed Epoch 13 batch: 23 forward through model 54.172 ms, 12.65 s total
[ 2023-10-08 03:10:39 ] Completed Epoch 13 batch: 23 outputs back to cpu 18.088 ms, 12.67 s total
[ 2023-10-08 03:10:39 ] Completed Epoch 13 batch: 23 update evaluator 28.648 ms, 12.70 s total
Saving checkpoint at epoch 13 eval batch 23
[ 2023-10-08 03:10:40 ] Completed saving temp checkpoint 1,077.279 ms, 13.78 s total
[ 2023-10-08 03:10:40 ] Completed replacing temp checkpoint with checkpoint 79.412 ms, 13.86 s total
[ 2023-10-08 03:10:40 ] Completed Epoch 13 batch: 24 moving to device 3.492 ms, 13.86 s total
[ 2023-10-08 03:10:40 ] Completed Epoch 13 batch: 24 forward through model 74.582 ms, 13.94 s total
[ 2023-10-08 03:10:40 ] Completed Epoch 13 batch: 24 outputs back to cpu 33.964 ms, 13.97 s total
[ 2023-10-08 03:10:40 ] Completed Epoch 13 batch: 24 update evaluator 84.066 ms, 14.05 s total
Saving checkpoint at epoch 13 eval batch 24
[ 2023-10-08 03:10:41 ] Completed saving temp checkpoint 1,207.352 ms, 15.26 s total
[ 2023-10-08 03:10:41 ] Completed replacing temp checkpoint with checkpoint 85.369 ms, 15.35 s total
[ 2023-10-08 03:10:41 ] Completed Epoch 13 batch: 25 moving to device 4.297 ms, 15.35 s total
[ 2023-10-08 03:10:41 ] Completed Epoch 13 batch: 25 forward through model 49.299 ms, 15.40 s total
[ 2023-10-08 03:10:41 ] Completed Epoch 13 batch: 25 outputs back to cpu 1.788 ms, 15.40 s total
[ 2023-10-08 03:10:41 ] Completed Epoch 13 batch: 25 update evaluator 19.332 ms, 15.42 s total
Saving checkpoint at epoch 13 eval batch 25
[ 2023-10-08 03:10:42 ] Completed saving temp checkpoint 1,090.093 ms, 16.51 s total
[ 2023-10-08 03:10:43 ] Completed replacing temp checkpoint with checkpoint 76.838 ms, 16.59 s total
[ 2023-10-08 03:10:43 ] Completed Epoch 13 batch: 26 moving to device 3.862 ms, 16.59 s total
[ 2023-10-08 03:10:43 ] Completed Epoch 13 batch: 26 forward through model 52.686 ms, 16.65 s total
[ 2023-10-08 03:10:43 ] Completed Epoch 13 batch: 26 outputs back to cpu 15.129 ms, 16.66 s total
[ 2023-10-08 03:10:43 ] Completed Epoch 13 batch: 26 update evaluator 30.085 ms, 16.69 s total
Saving checkpoint at epoch 13 eval batch 26
[ 2023-10-08 03:10:44 ] Completed saving temp checkpoint 1,292.559 ms, 17.98 s total
[ 2023-10-08 03:10:44 ] Completed replacing temp checkpoint with checkpoint 84.476 ms, 18.07 s total
[ 2023-10-08 03:10:44 ] Completed Epoch 13 batch: 27 moving to device 4.664 ms, 18.07 s total
[ 2023-10-08 03:10:44 ] Completed Epoch 13 batch: 27 forward through model 55.312 ms, 18.13 s total
[ 2023-10-08 03:10:44 ] Completed Epoch 13 batch: 27 outputs back to cpu 15.942 ms, 18.14 s total
[ 2023-10-08 03:10:44 ] Completed Epoch 13 batch: 27 update evaluator 30.231 ms, 18.17 s total
Saving checkpoint at epoch 13 eval batch 27
[ 2023-10-08 03:10:46 ] Completed saving temp checkpoint 1,672.633 ms, 19.85 s total
[ 2023-10-08 03:10:46 ] Completed replacing temp checkpoint with checkpoint 51.342 ms, 19.90 s total
[ 2023-10-08 03:10:46 ] Completed Epoch 13 batch: 28 moving to device 4.533 ms, 19.90 s total
[ 2023-10-08 03:10:46 ] Completed Epoch 13 batch: 28 forward through model 92.320 ms, 19.99 s total
[ 2023-10-08 03:10:46 ] Completed Epoch 13 batch: 28 outputs back to cpu 9.571 ms, 20.00 s total
[ 2023-10-08 03:10:46 ] Completed Epoch 13 batch: 28 update evaluator 20.959 ms, 20.03 s total
Saving checkpoint at epoch 13 eval batch 28
[ 2023-10-08 03:10:48 ] Completed saving temp checkpoint 1,763.604 ms, 21.79 s total
[ 2023-10-08 03:10:48 ] Completed replacing temp checkpoint with checkpoint 92.948 ms, 21.88 s total
[ 2023-10-08 03:10:48 ] Completed Epoch 13 batch: 29 moving to device 2.655 ms, 21.88 s total
[ 2023-10-08 03:10:48 ] Completed Epoch 13 batch: 29 forward through model 47.961 ms, 21.93 s total
[ 2023-10-08 03:10:48 ] Completed Epoch 13 batch: 29 outputs back to cpu 1.459 ms, 21.93 s total
[ 2023-10-08 03:10:48 ] Completed Epoch 13 batch: 29 update evaluator 14.077 ms, 21.95 s total
Saving checkpoint at epoch 13 eval batch 29
[ 2023-10-08 03:10:49 ] Completed saving temp checkpoint 1,244.354 ms, 23.19 s total
[ 2023-10-08 03:10:49 ] Completed replacing temp checkpoint with checkpoint 60.819 ms, 23.25 s total
[ 2023-10-08 03:10:49 ] Completed Epoch 13 batch: 30 moving to device 2.840 ms, 23.26 s total
[ 2023-10-08 03:10:49 ] Completed Epoch 13 batch: 30 forward through model 76.834 ms, 23.33 s total
[ 2023-10-08 03:10:49 ] Completed Epoch 13 batch: 30 outputs back to cpu 5.922 ms, 23.34 s total
[ 2023-10-08 03:10:49 ] Completed Epoch 13 batch: 30 update evaluator 17.096 ms, 23.36 s total
Saving checkpoint at epoch 13 eval batch 30
[ 2023-10-08 03:10:51 ] Completed saving temp checkpoint 1,436.404 ms, 24.79 s total
[ 2023-10-08 03:10:51 ] Completed replacing temp checkpoint with checkpoint 80.917 ms, 24.87 s total
[ 2023-10-08 03:10:51 ] Completed Epoch 13 batch: 31 moving to device 3.006 ms, 24.88 s total
[ 2023-10-08 03:10:51 ] Completed Epoch 13 batch: 31 forward through model 43.127 ms, 24.92 s total
[ 2023-10-08 03:10:51 ] Completed Epoch 13 batch: 31 outputs back to cpu 17.243 ms, 24.94 s total
[ 2023-10-08 03:10:51 ] Completed Epoch 13 batch: 31 update evaluator 28.655 ms, 24.97 s total
Saving checkpoint at epoch 13 eval batch 31
[ 2023-10-08 03:10:54 ] Completed saving temp checkpoint 2,732.500 ms, 27.70 s total
[ 2023-10-08 03:10:54 ] Completed replacing temp checkpoint with checkpoint 66.451 ms, 27.76 s total
[ 2023-10-08 03:10:54 ] Completed Epoch 13 batch: 32 moving to device 2.839 ms, 27.77 s total
[ 2023-10-08 03:10:54 ] Completed Epoch 13 batch: 32 forward through model 54.000 ms, 27.82 s total
[ 2023-10-08 03:10:54 ] Completed Epoch 13 batch: 32 outputs back to cpu 3.098 ms, 27.82 s total
[ 2023-10-08 03:10:54 ] Completed Epoch 13 batch: 32 update evaluator 25.493 ms, 27.85 s total
Saving checkpoint at epoch 13 eval batch 32
[ 2023-10-08 03:10:55 ] Completed saving temp checkpoint 1,447.735 ms, 29.30 s total
[ 2023-10-08 03:10:55 ] Completed replacing temp checkpoint with checkpoint 53.265 ms, 29.35 s total
[ 2023-10-08 03:10:55 ] Completed Epoch 13 batch: 33 moving to device 2.733 ms, 29.35 s total
[ 2023-10-08 03:10:55 ] Completed Epoch 13 batch: 33 forward through model 45.863 ms, 29.40 s total
[ 2023-10-08 03:10:55 ] Completed Epoch 13 batch: 33 outputs back to cpu 7.476 ms, 29.41 s total
[ 2023-10-08 03:10:55 ] Completed Epoch 13 batch: 33 update evaluator 17.130 ms, 29.42 s total
Saving checkpoint at epoch 13 eval batch 33
[ 2023-10-08 03:10:57 ] Completed saving temp checkpoint 1,319.541 ms, 30.74 s total
[ 2023-10-08 03:10:57 ] Completed replacing temp checkpoint with checkpoint 88.119 ms, 30.83 s total
[ 2023-10-08 03:10:57 ] Completed Epoch 13 batch: 34 moving to device 3.924 ms, 30.84 s total
[ 2023-10-08 03:10:57 ] Completed Epoch 13 batch: 34 forward through model 63.677 ms, 30.90 s total
[ 2023-10-08 03:10:57 ] Completed Epoch 13 batch: 34 outputs back to cpu 33.674 ms, 30.93 s total
[ 2023-10-08 03:10:57 ] Completed Epoch 13 batch: 34 update evaluator 61.421 ms, 30.99 s total
Saving checkpoint at epoch 13 eval batch 34
[ 2023-10-08 03:10:58 ] Completed saving temp checkpoint 1,440.214 ms, 32.43 s total
[ 2023-10-08 03:10:58 ] Completed replacing temp checkpoint with checkpoint 90.246 ms, 32.52 s total
[ 2023-10-08 03:10:58 ] Completed Epoch 13 batch: 35 moving to device 4.571 ms, 32.53 s total
[ 2023-10-08 03:10:59 ] Completed Epoch 13 batch: 35 forward through model 50.663 ms, 32.58 s total
[ 2023-10-08 03:10:59 ] Completed Epoch 13 batch: 35 outputs back to cpu 1.841 ms, 32.58 s total
[ 2023-10-08 03:10:59 ] Completed Epoch 13 batch: 35 update evaluator 21.221 ms, 32.60 s total
Saving checkpoint at epoch 13 eval batch 35
[ 2023-10-08 03:11:00 ] Completed saving temp checkpoint 1,319.335 ms, 33.92 s total
[ 2023-10-08 03:11:00 ] Completed replacing temp checkpoint with checkpoint 61.727 ms, 33.98 s total
[ 2023-10-08 03:11:00 ] Completed Epoch 13 batch: 36 moving to device 4.079 ms, 33.99 s total
[ 2023-10-08 03:11:00 ] Completed Epoch 13 batch: 36 forward through model 61.349 ms, 34.05 s total
[ 2023-10-08 03:11:00 ] Completed Epoch 13 batch: 36 outputs back to cpu 23.733 ms, 34.07 s total
[ 2023-10-08 03:11:00 ] Completed Epoch 13 batch: 36 update evaluator 40.745 ms, 34.11 s total
Saving checkpoint at epoch 13 eval batch 36
[ 2023-10-08 03:11:02 ] Completed saving temp checkpoint 1,798.649 ms, 35.91 s total
[ 2023-10-08 03:11:02 ] Completed replacing temp checkpoint with checkpoint 71.565 ms, 35.98 s total
[ 2023-10-08 03:11:02 ] Completed Epoch 13 batch: 37 moving to device 3.534 ms, 35.99 s total
[ 2023-10-08 03:11:02 ] Completed Epoch 13 batch: 37 forward through model 71.762 ms, 36.06 s total
[ 2023-10-08 03:11:02 ] Completed Epoch 13 batch: 37 outputs back to cpu 23.564 ms, 36.08 s total
[ 2023-10-08 03:11:02 ] Completed Epoch 13 batch: 37 update evaluator 55.991 ms, 36.14 s total
Saving checkpoint at epoch 13 eval batch 37
[ 2023-10-08 03:11:03 ] Completed saving temp checkpoint 1,340.792 ms, 37.48 s total
[ 2023-10-08 03:11:04 ] Completed replacing temp checkpoint with checkpoint 93.663 ms, 37.57 s total
[ 2023-10-08 03:11:04 ] Completed Epoch 13 batch: 38 moving to device 4.057 ms, 37.58 s total
[ 2023-10-08 03:11:04 ] Completed Epoch 13 batch: 38 forward through model 50.859 ms, 37.63 s total
[ 2023-10-08 03:11:04 ] Completed Epoch 13 batch: 38 outputs back to cpu 15.756 ms, 37.64 s total
[ 2023-10-08 03:11:04 ] Completed Epoch 13 batch: 38 update evaluator 31.196 ms, 37.68 s total
Saving checkpoint at epoch 13 eval batch 38
[ 2023-10-08 03:11:05 ] Completed saving temp checkpoint 1,482.176 ms, 39.16 s total
[ 2023-10-08 03:11:05 ] Completed replacing temp checkpoint with checkpoint 74.504 ms, 39.23 s total
[ 2023-10-08 03:11:05 ] Completed Epoch 13 batch: 39 moving to device 4.184 ms, 39.24 s total
[ 2023-10-08 03:11:05 ] Completed Epoch 13 batch: 39 forward through model 46.789 ms, 39.28 s total
[ 2023-10-08 03:11:05 ] Completed Epoch 13 batch: 39 outputs back to cpu 14.758 ms, 39.30 s total
[ 2023-10-08 03:11:05 ] Completed Epoch 13 batch: 39 update evaluator 26.291 ms, 39.32 s total
Saving checkpoint at epoch 13 eval batch 39
[ 2023-10-08 03:11:07 ] Completed saving temp checkpoint 1,756.377 ms, 41.08 s total
[ 2023-10-08 03:11:07 ] Completed replacing temp checkpoint with checkpoint 75.036 ms, 41.16 s total
[ 2023-10-08 03:11:07 ] Completed Epoch 13 batch: 40 moving to device 3.187 ms, 41.16 s total
[ 2023-10-08 03:11:07 ] Completed Epoch 13 batch: 40 forward through model 42.730 ms, 41.20 s total
[ 2023-10-08 03:11:07 ] Completed Epoch 13 batch: 40 outputs back to cpu 0.812 ms, 41.20 s total
[ 2023-10-08 03:11:07 ] Completed Epoch 13 batch: 40 update evaluator 10.331 ms, 41.21 s total
Saving checkpoint at epoch 13 eval batch 40
[ 2023-10-08 03:11:09 ] Completed saving temp checkpoint 1,837.132 ms, 43.05 s total
[ 2023-10-08 03:11:09 ] Completed replacing temp checkpoint with checkpoint 71.679 ms, 43.12 s total
[ 2023-10-08 03:11:09 ] Completed Epoch 13 batch: 41 moving to device 3.599 ms, 43.12 s total
[ 2023-10-08 03:11:09 ] Completed Epoch 13 batch: 41 forward through model 44.046 ms, 43.17 s total
[ 2023-10-08 03:11:09 ] Completed Epoch 13 batch: 41 outputs back to cpu 0.888 ms, 43.17 s total
[ 2023-10-08 03:11:09 ] Completed Epoch 13 batch: 41 update evaluator 9.602 ms, 43.18 s total
Saving checkpoint at epoch 13 eval batch 41
[ 2023-10-08 03:11:11 ] Completed saving temp checkpoint 1,381.114 ms, 44.56 s total
[ 2023-10-08 03:11:11 ] Completed replacing temp checkpoint with checkpoint 87.767 ms, 44.65 s total
[ 2023-10-08 03:11:11 ] Completed Epoch 13 batch: 42 moving to device 3.751 ms, 44.65 s total
[ 2023-10-08 03:11:11 ] Completed Epoch 13 batch: 42 forward through model 58.221 ms, 44.71 s total
[ 2023-10-08 03:11:11 ] Completed Epoch 13 batch: 42 outputs back to cpu 22.767 ms, 44.73 s total
[ 2023-10-08 03:11:11 ] Completed Epoch 13 batch: 42 update evaluator 38.435 ms, 44.77 s total
Saving checkpoint at epoch 13 eval batch 42
[ 2023-10-08 03:11:12 ] Completed saving temp checkpoint 1,475.712 ms, 46.25 s total
[ 2023-10-08 03:11:12 ] Completed replacing temp checkpoint with checkpoint 77.050 ms, 46.32 s total
[ 2023-10-08 03:11:12 ] Completed Epoch 13 batch: 43 moving to device 4.458 ms, 46.33 s total
[ 2023-10-08 03:11:12 ] Completed Epoch 13 batch: 43 forward through model 41.484 ms, 46.37 s total
[ 2023-10-08 03:11:12 ] Completed Epoch 13 batch: 43 outputs back to cpu 1.729 ms, 46.37 s total
[ 2023-10-08 03:11:12 ] Completed Epoch 13 batch: 43 update evaluator 16.873 ms, 46.39 s total
Saving checkpoint at epoch 13 eval batch 43
[ 2023-10-08 03:11:14 ] Completed saving temp checkpoint 1,356.612 ms, 47.75 s total
[ 2023-10-08 03:11:14 ] Completed replacing temp checkpoint with checkpoint 65.703 ms, 47.81 s total
[ 2023-10-08 03:11:14 ] Completed Epoch 13 batch: 44 moving to device 2.640 ms, 47.81 s total
[ 2023-10-08 03:11:14 ] Completed Epoch 13 batch: 44 forward through model 44.760 ms, 47.86 s total
[ 2023-10-08 03:11:14 ] Completed Epoch 13 batch: 44 outputs back to cpu 1.067 ms, 47.86 s total
[ 2023-10-08 03:11:14 ] Completed Epoch 13 batch: 44 update evaluator 9.824 ms, 47.87 s total
Saving checkpoint at epoch 13 eval batch 44
[ 2023-10-08 03:11:15 ] Completed saving temp checkpoint 1,424.392 ms, 49.29 s total
[ 2023-10-08 03:11:15 ] Completed replacing temp checkpoint with checkpoint 60.214 ms, 49.35 s total
[ 2023-10-08 03:11:15 ] Completed Epoch 13 batch: 45 moving to device 2.624 ms, 49.36 s total
[ 2023-10-08 03:11:15 ] Completed Epoch 13 batch: 45 forward through model 92.847 ms, 49.45 s total
[ 2023-10-08 03:11:15 ] Completed Epoch 13 batch: 45 outputs back to cpu 24.480 ms, 49.47 s total
[ 2023-10-08 03:11:15 ] Completed Epoch 13 batch: 45 update evaluator 43.413 ms, 49.52 s total
Saving checkpoint at epoch 13 eval batch 45
[ 2023-10-08 03:11:17 ] Completed saving temp checkpoint 1,534.886 ms, 51.05 s total
[ 2023-10-08 03:11:17 ] Completed replacing temp checkpoint with checkpoint 70.143 ms, 51.12 s total
[ 2023-10-08 03:11:17 ] Completed Epoch 13 batch: 46 moving to device 4.401 ms, 51.13 s total
[ 2023-10-08 03:11:17 ] Completed Epoch 13 batch: 46 forward through model 52.652 ms, 51.18 s total
[ 2023-10-08 03:11:17 ] Completed Epoch 13 batch: 46 outputs back to cpu 10.537 ms, 51.19 s total
[ 2023-10-08 03:11:17 ] Completed Epoch 13 batch: 46 update evaluator 23.979 ms, 51.21 s total
Saving checkpoint at epoch 13 eval batch 46
[ 2023-10-08 03:11:19 ] Completed saving temp checkpoint 1,734.566 ms, 52.95 s total
[ 2023-10-08 03:11:19 ] Completed replacing temp checkpoint with checkpoint 67.724 ms, 53.02 s total
[ 2023-10-08 03:11:19 ] Completed Epoch 13 batch: 47 moving to device 4.365 ms, 53.02 s total
[ 2023-10-08 03:11:19 ] Completed Epoch 13 batch: 47 forward through model 44.784 ms, 53.07 s total
[ 2023-10-08 03:11:19 ] Completed Epoch 13 batch: 47 outputs back to cpu 1.993 ms, 53.07 s total