@jackdh
Created January 26, 2019 21:56
INFO:sagemaker:Creating training-job with name: ss-notebook-demo-2019-01-26-21-30-09-076
2019-01-26 21:30:09 Starting - Starting the training job...
2019-01-26 21:30:25 Starting - Launching requested ML instances......
2019-01-26 21:31:24 Starting - Preparing the instances for training......
2019-01-26 21:32:36 Downloading - Downloading input data...
2019-01-26 21:32:51 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
[01/26/2019 21:33:17 INFO 139737907083072] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/default-input.json: {u'gamma2': u'0.9', u'gamma1': u'0.9', u'early_stopping_min_epochs': u'5', u'epochs': u'10', u'_workers': u'16', u'_num_kv_servers': u'auto', u'weight_decay': u'0.0001', u'crop_size': u'240', u'use_pretrained_model': u'True', u'_aux_weight': u'0.5', u'_hybrid': u'False', u'_augmentation_type': u'default', u'lr_scheduler': u'poly', u'early_stopping_patience': u'4', u'momentum': u'0.9', u'optimizer': u'sgd', u'early_stopping_tolerance': u'0.0', u'learning_rate': u'0.001', u'backbone': u'resnet-50', u'validation_mini_batch_size': u'16', u'_aux_loss': u'True', u'mini_batch_size': u'16', u'_precision_dtype': u'float32', u'early_stopping': u'False', u'algorithm': u'fcn', u'_logging_frequency': u'20', u'num_training_samples': u'8', u'_kvstore': u'device', u'_syncbn': u'False'}
[01/26/2019 21:33:17 INFO 139737907083072] Reading provided configuration from /opt/ml/input/config/hyperparameters.json: {u'learning_rate': u'0.0001', u'optimizer': u'rmsprop', u'algorithm': u'psp', u'lr_scheduler': u'poly', u'use_pretrained_model': u'True', u'backbone': u'resnet-50', u'early_stopping_min_epochs': u'10', u'epochs': u'10', u'validation_mini_batch_size': u'16', u'num_training_samples': u'367', u'num_classes': u'21', u'mini_batch_size': u'16', u'early_stopping_patience': u'2', u'early_stopping': u'True', u'crop_size': u'240'}
[01/26/2019 21:33:17 INFO 139737907083072] Final configuration: {u'gamma2': u'0.9', u'gamma1': u'0.9', u'early_stopping_min_epochs': u'10', u'epochs': u'10', u'_workers': u'16', u'_num_kv_servers': u'auto', u'weight_decay': u'0.0001', u'crop_size': u'240', u'use_pretrained_model': u'True', u'_aux_weight': u'0.5', u'_hybrid': u'False', u'_augmentation_type': u'default', u'lr_scheduler': u'poly', u'num_classes': u'21', u'early_stopping_patience': u'2', u'momentum': u'0.9', u'optimizer': u'rmsprop', u'early_stopping_tolerance': u'0.0', u'learning_rate': u'0.0001', u'backbone': u'resnet-50', u'validation_mini_batch_size': u'16', u'_aux_loss': u'True', u'mini_batch_size': u'16', u'_precision_dtype': u'float32', u'early_stopping': u'True', u'algorithm': u'psp', u'_logging_frequency': u'20', u'num_training_samples': u'367', u'_kvstore': u'device', u'_syncbn': u'False'}
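The final configuration is consistent with the provided hyperparameters overriding the defaults key by key. For example, `optimizer` goes from `sgd` to `rmsprop`, and `num_classes`, which has no default, is simply added. Below is a minimal sketch of that overlay using a subset of the logged keys. `merge_config` and `diff_config` are illustrative helpers, not the algorithm's actual code. All values are kept as strings, as they appear in the log.

```python
# Subset of the defaults logged from default-input.json (values are strings, as in the log).
DEFAULTS = {
    "algorithm": "fcn",
    "optimizer": "sgd",
    "learning_rate": "0.001",
    "epochs": "10",
    "early_stopping": "False",
    "early_stopping_patience": "4",
    "num_training_samples": "8",
}

# Subset of the hyperparameters logged from hyperparameters.json.
PROVIDED = {
    "algorithm": "psp",
    "optimizer": "rmsprop",
    "learning_rate": "0.0001",
    "epochs": "10",
    "early_stopping": "True",
    "early_stopping_patience": "2",
    "num_training_samples": "367",
    "num_classes": "21",
}


def merge_config(defaults, provided):
    """Overlay provided values on the defaults without mutating either dict.

    Keys that only exist in `provided` (here num_classes) are added as-is.
    """
    final = dict(defaults)
    final.update(provided)
    return final


def diff_config(defaults, final):
    """Return {key: (default, final)} for every key the overlay changed or added."""
    return {k: (defaults.get(k), v) for k, v in final.items() if defaults.get(k) != v}


final = merge_config(DEFAULTS, PROVIDED)
for key, (old, new) in sorted(diff_config(DEFAULTS, final).items()):
    print(f"{key}: {old} -> {new}")
```

Diffing the final configuration against the defaults makes it easy to see which settings a job actually changed. In this sketch, `epochs` is absent from the diff because the provided value matches the default.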
[01/26/2019 21:33:17 INFO 139737907083072] Using default worker.
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator application/x-image for content type ('application/x-image', '1.0')
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator application/x-recordio for content type ('application/x-recordio', '1.0')
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator image/png for content type ('image/png', '1.0')
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator application/json for content type ('application/json', '1.0')
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator image/jpeg for content type ('image/jpeg', '1.0')
[01/26/2019 21:33:17 WARNING 139737907083072] /opt/ml/input/data/train/_annotation is not a readable image file
[01/26/2019 21:33:17 WARNING 139737907083072] label maps not provided, using defaults.
[01/26/2019 21:33:17 INFO 139737907083072] #label_map train :{'scale': 1}
[01/26/2019 21:33:17 WARNING 139737907083072] /opt/ml/input/data/validation/_annotation is not a readable image file
[01/26/2019 21:33:17 WARNING 139737907083072] label maps not provided, using defaults.
[01/26/2019 21:33:17 INFO 139737907083072] #label_map validation :{'scale': 1}
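The `/opt/ml/input/data/train/_annotation` and `/opt/ml/input/data/validation/_annotation` warnings are consistent with S3 prefix matching. If a channel's S3 URI had no trailing slash (for example `s3://bucket/demo/train`), a sibling `train_annotation/` folder also matches that prefix as a plain string, so it would be downloaded into the train channel as `_annotation`. This is an inference from the paths, not something the log confirms. The sketch below models that mapping. The bucket layout, the keys, and the `local_paths` helper are all hypothetical.

```python
# Hypothetical object keys: images under demo/train/, masks under demo/train_annotation/.
KEYS = [
    "demo/train/2007_000032.jpg",
    "demo/train_annotation/2007_000032.png",
]


def local_paths(keys, channel_prefix, channel_name):
    """Model where S3Prefix-matched objects land inside the container (assumed behavior).

    Every key that starts with `channel_prefix` as a plain string is matched; the
    remainder of the key becomes the path under /opt/ml/input/data/<channel_name>/.
    """
    base = "/opt/ml/input/data/" + channel_name + "/"
    paths = []
    for key in keys:
        if key.startswith(channel_prefix):
            paths.append(base + key[len(channel_prefix):].lstrip("/"))
    return paths


# Without a trailing slash, the annotation folder is pulled into the train channel:
# ['/opt/ml/input/data/train/2007_000032.jpg', '/opt/ml/input/data/train/_annotation/2007_000032.png']
print(local_paths(KEYS, "demo/train", "train"))
# With a trailing slash, only the images match:
# ['/opt/ml/input/data/train/2007_000032.jpg']
print(local_paths(KEYS, "demo/train/", "train"))
```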
[01/26/2019 21:33:17 INFO 139737907083072] nvidia-smi took: 0.0755198001862 secs to identify 1 gpus
[01/26/2019 21:33:17 INFO 139737907083072] Number of GPUs being used: 1
[01/26/2019 21:33:17 INFO 139737907083072] Number of GPUs being used: 1
Model file is not found. Downloading.
Downloading /root/.mxnet/models/resnet50_v1s-25a187fa.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet50_v1s-25a187fa.zip...
57418KB [00:02, 19318.52KB/s]
/opt/amazon/lib/python2.7/site-packages/mxnet/gluon/block.py:421: UserWarning: load_params is deprecated. Please use load_parameters.
warnings.warn("load_params is deprecated. Please use load_parameters.")
2019-01-26 21:33:13 Training - Training image download completed. Training in progress.
('self.crop_size', 240)
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Batches Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Records Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Batches Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Records Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Max Records Seen Between Resets": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Reset Count": {"count": 1, "max": 0, "sum": 0.0, "min": 0}}, "EndTime": 1548538414.995646, "Dimensions": {"Host": "algo-1", "Meta": "init_train_data_iter", "Operation": "training", "Algorithm": "AWS/Semantic Segmentation"}, "StartTime": 1548538414.99557}
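Each `#metrics` line is a marker followed by a JSON object containing a `Metrics` map (count/max/sum/min for each metric), a `Dimensions` map, and `StartTime`/`EndTime` epoch timestamps. The sketch below decodes one such line with the standard `json` module. The sample is abbreviated from the line above, and `parse_metrics_line` and `summarize` are hypothetical helpers.

```python
import json

METRICS_PREFIX = "#metrics "

# Abbreviated copy of the init_train_data_iter line from this log.
SAMPLE = ('#metrics {"Metrics": {"Total Records Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, '
          '"Reset Count": {"count": 1, "max": 0, "sum": 0.0, "min": 0}}, '
          '"EndTime": 1548538414.995646, '
          '"Dimensions": {"Host": "algo-1", "Meta": "init_train_data_iter", '
          '"Operation": "training", "Algorithm": "AWS/Semantic Segmentation"}, '
          '"StartTime": 1548538414.99557}')


def parse_metrics_line(line):
    """Return the decoded JSON payload of a '#metrics {...}' line, or None for other lines."""
    if not line.startswith(METRICS_PREFIX):
        return None
    return json.loads(line[len(METRICS_PREFIX):])


def summarize(payload):
    """Flatten each metric to its 'sum', plus the phase name and elapsed seconds."""
    summary = {name: stats["sum"] for name, stats in payload["Metrics"].items()}
    summary["_phase"] = payload["Dimensions"]["Meta"]
    summary["_duration_s"] = payload["EndTime"] - payload["StartTime"]
    return summary


print(summarize(parse_metrics_line(SAMPLE)))
```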
[21:33:37] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.3.x.178.0/AL2012/generic-flavor/src/src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[01/26/2019 21:33:51 INFO 139737907083072] #progress_notice. epoch: 0, iterations: 20 speed: 47.2825722267 samples/sec
[01/26/2019 21:33:54 INFO 139737907083072] #quality_metric. host: algo-1, epoch: 0, train loss: 1.6545900036891301 .
[01/26/2019 21:33:54 INFO 139737907083072] #throughput_metric. host: algo-1, epoch: 0, train throughput: 28.7913721381 samples/sec.
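The per-epoch `#quality_metric` and `#throughput_metric` lines have a fixed shape, so they can be extracted from a saved log with regular expressions (for example, to plot training loss across epochs). The sketch below assumes the line format stays exactly as shown here. The patterns and the `extract_epoch_metrics` helper are illustrative and are not part of SageMaker.

```python
import re

# Patterns written against the line format seen in this log (an assumption for other versions).
QUALITY_RE = re.compile(r"#quality_metric\. host: (\S+), epoch: (\d+), train loss: ([0-9.eE+-]+)")
THROUGHPUT_RE = re.compile(
    r"#throughput_metric\. host: (\S+), epoch: (\d+), train throughput: ([0-9.eE+-]+) samples/sec"
)

LOG_LINES = [
    "[01/26/2019 21:33:51 INFO 139737907083072] #progress_notice. epoch: 0, iterations: 20 speed: 47.2825722267 samples/sec",
    "[01/26/2019 21:33:54 INFO 139737907083072] #quality_metric. host: algo-1, epoch: 0, train loss: 1.6545900036891301 .",
    "[01/26/2019 21:33:54 INFO 139737907083072] #throughput_metric. host: algo-1, epoch: 0, train throughput: 28.7913721381 samples/sec.",
]


def extract_epoch_metrics(lines):
    """Collect {epoch: {'train_loss': float, 'throughput': float}} from matching lines."""
    metrics = {}
    for line in lines:
        m = QUALITY_RE.search(line)
        if m:
            metrics.setdefault(int(m.group(2)), {})["train_loss"] = float(m.group(3))
            continue
        m = THROUGHPUT_RE.search(line)
        if m:
            metrics.setdefault(int(m.group(2)), {})["throughput"] = float(m.group(3))
    return metrics


print(extract_epoch_metrics(LOG_LINES))
```

Lines that match neither pattern, such as the `#progress_notice` line, are skipped.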
Process Process-23:
Traceback (most recent call last):
  File "/opt/amazon/python2.7/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/opt/amazon/python2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/amazon/lib/python2.7/site-packages/mxnet/gluon/data/dataloader.py", line 169, in worker_loop
    batch = batchify_fn([dataset[i] for i in samples])
  File "/opt/amazon/lib/python2.7/site-packages/algorithm/dataset.py", line 172, in __getitem__
    self._mask_check(mask)
  File "/opt/amazon/lib/python2.7/site-packages/algorithm/dataset.py", line 137, in _mask_check
    'number of classes.'.format(int(mask.max().asscalar())))
CustomerError: Annotation value 21 found in labels. This is greater than number of classes.
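The job was configured with `num_classes` set to 21, yet `_mask_check` rejected the value 21. Because 21 equals `num_classes` rather than exceeding it, the check appears to require every label to fall in the range 0 to `num_classes - 1` (0 to 20 here). The sketch below scans masks for out-of-range values before uploading them. The function names, the toy masks, and the optional `ignore_value` parameter are hypothetical. This log does not show whether the algorithm treats any value as "ignore".

```python
def find_invalid_labels(pixels, num_classes, ignore_value=None):
    """Return the sorted label values outside 0..num_classes - 1.

    pixels: any iterable of integer class indices (e.g. the values of one
    single-channel annotation PNG). ignore_value is optional and hypothetical:
    this log does not say whether the algorithm skips any reserved value.
    """
    invalid = set()
    for value in pixels:
        if ignore_value is not None and value == ignore_value:
            continue
        if value < 0 or value >= num_classes:
            invalid.add(value)
    return sorted(invalid)


def check_masks(masks, num_classes, ignore_value=None):
    """Map mask name -> offending values, listing only masks that fail."""
    report = {}
    for name, pixels in masks.items():
        bad = find_invalid_labels(pixels, num_classes, ignore_value)
        if bad:
            report[name] = bad
    return report


# Toy masks flattened to pixel lists; the names are hypothetical.
masks = {
    "ok.png": [0, 1, 20, 20],
    "bad.png": [0, 21, 21, 5],
}
print(check_masks(masks, num_classes=21))  # {'bad.png': [21]}
print(check_masks(masks, num_classes=22))  # {}
```

To run this on real annotations, one option (assuming Pillow is installed and the masks are 8-bit single-channel or palette PNGs) is to pass `list(Image.open(path).getdata())` for each file. If 21 is a genuine class, `num_classes` would need to be at least 22. Otherwise, the masks need to be remapped into the range 0 to 20.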