Created
January 26, 2019 21:56
INFO:sagemaker:Creating training-job with name: ss-notebook-demo-2019-01-26-21-30-09-076
2019-01-26 21:30:09 Starting - Starting the training job...
2019-01-26 21:30:25 Starting - Launching requested ML instances......
2019-01-26 21:31:24 Starting - Preparing the instances for training......
2019-01-26 21:32:36 Downloading - Downloading input data...
2019-01-26 21:32:51 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
[01/26/2019 21:33:17 INFO 139737907083072] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/default-input.json: {u'gamma2': u'0.9', u'gamma1': u'0.9', u'early_stopping_min_epochs': u'5', u'epochs': u'10', u'_workers': u'16', u'_num_kv_servers': u'auto', u'weight_decay': u'0.0001', u'crop_size': u'240', u'use_pretrained_model': u'True', u'_aux_weight': u'0.5', u'_hybrid': u'False', u'_augmentation_type': u'default', u'lr_scheduler': u'poly', u'early_stopping_patience': u'4', u'momentum': u'0.9', u'optimizer': u'sgd', u'early_stopping_tolerance': u'0.0', u'learning_rate': u'0.001', u'backbone': u'resnet-50', u'validation_mini_batch_size': u'16', u'_aux_loss': u'True', u'mini_batch_size': u'16', u'_precision_dtype': u'float32', u'early_stopping': u'False', u'algorithm': u'fcn', u'_logging_frequency': u'20', u'num_training_samples': u'8', u'_kvstore': u'device', u'_syncbn': u'False'}
[01/26/2019 21:33:17 INFO 139737907083072] Reading provided configuration from /opt/ml/input/config/hyperparameters.json: {u'learning_rate': u'0.0001', u'optimizer': u'rmsprop', u'algorithm': u'psp', u'lr_scheduler': u'poly', u'use_pretrained_model': u'True', u'backbone': u'resnet-50', u'early_stopping_min_epochs': u'10', u'epochs': u'10', u'validation_mini_batch_size': u'16', u'num_training_samples': u'367', u'num_classes': u'21', u'mini_batch_size': u'16', u'early_stopping_patience': u'2', u'early_stopping': u'True', u'crop_size': u'240'}
[01/26/2019 21:33:17 INFO 139737907083072] Final configuration: {u'gamma2': u'0.9', u'gamma1': u'0.9', u'early_stopping_min_epochs': u'10', u'epochs': u'10', u'_workers': u'16', u'_num_kv_servers': u'auto', u'weight_decay': u'0.0001', u'crop_size': u'240', u'use_pretrained_model': u'True', u'_aux_weight': u'0.5', u'_hybrid': u'False', u'_augmentation_type': u'default', u'lr_scheduler': u'poly', u'num_classes': u'21', u'early_stopping_patience': u'2', u'momentum': u'0.9', u'optimizer': u'rmsprop', u'early_stopping_tolerance': u'0.0', u'learning_rate': u'0.0001', u'backbone': u'resnet-50', u'validation_mini_batch_size': u'16', u'_aux_loss': u'True', u'mini_batch_size': u'16', u'_precision_dtype': u'float32', u'early_stopping': u'True', u'algorithm': u'psp', u'_logging_frequency': u'20', u'num_training_samples': u'367', u'_kvstore': u'device', u'_syncbn': u'False'}
[01/26/2019 21:33:17 INFO 139737907083072] Using default worker.
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator application/x-image for content type ('application/x-image', '1.0')
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator application/x-recordio for content type ('application/x-recordio', '1.0')
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator image/png for content type ('image/png', '1.0')
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator application/json for content type ('application/json', '1.0')
[01/26/2019 21:33:17 INFO 139737907083072] Loaded iterator creator image/jpeg for content type ('image/jpeg', '1.0')
[01/26/2019 21:33:17 WARNING 139737907083072] /opt/ml/input/data/train/_annotation is not a readable image file
[01/26/2019 21:33:17 WARNING 139737907083072] label maps not provided, using defaults.
[01/26/2019 21:33:17 INFO 139737907083072] #label_map train :{'scale': 1}
[01/26/2019 21:33:17 WARNING 139737907083072] /opt/ml/input/data/validation/_annotation is not a readable image file
[01/26/2019 21:33:17 WARNING 139737907083072] label maps not provided, using defaults.
[01/26/2019 21:33:17 INFO 139737907083072] #label_map validation :{'scale': 1}
[01/26/2019 21:33:17 INFO 139737907083072] nvidia-smi took: 0.0755198001862 secs to identify 1 gpus
[01/26/2019 21:33:17 INFO 139737907083072] Number of GPUs being used: 1
[01/26/2019 21:33:17 INFO 139737907083072] Number of GPUs being used: 1
Model file is not found. Downloading.
Downloading /root/.mxnet/models/resnet50_v1s-25a187fa.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet50_v1s-25a187fa.zip...
57418KB [00:02, 19318.52KB/s]
/opt/amazon/lib/python2.7/site-packages/mxnet/gluon/block.py:421: UserWarning: load_params is deprecated. Please use load_parameters.
  warnings.warn("load_params is deprecated. Please use load_parameters.")
2019-01-26 21:33:13 Training - Training image download completed. Training in progress.
('self.crop_size', 240)
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Batches Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Records Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Batches Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Records Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Max Records Seen Between Resets": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Reset Count": {"count": 1, "max": 0, "sum": 0.0, "min": 0}}, "EndTime": 1548538414.995646, "Dimensions": {"Host": "algo-1", "Meta": "init_train_data_iter", "Operation": "training", "Algorithm": "AWS/Semantic Segmentation"}, "StartTime": 1548538414.99557}
[21:33:37] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.3.x.178.0/AL2012/generic-flavor/src/src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[01/26/2019 21:33:51 INFO 139737907083072] #progress_notice. epoch: 0, iterations: 20 speed: 47.2825722267 samples/sec
[01/26/2019 21:33:54 INFO 139737907083072] #quality_metric. host: algo-1, epoch: 0, train loss: 1.6545900036891301 .
[01/26/2019 21:33:54 INFO 139737907083072] #throughput_metric. host: algo-1, epoch: 0, train throughput: 28.7913721381 samples/sec.
Process Process-23:
Traceback (most recent call last):
  File "/opt/amazon/python2.7/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/opt/amazon/python2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/amazon/lib/python2.7/site-packages/mxnet/gluon/data/dataloader.py", line 169, in worker_loop
    batch = batchify_fn([dataset[i] for i in samples])
  File "/opt/amazon/lib/python2.7/site-packages/algorithm/dataset.py", line 172, in __getitem__
    self._mask_check(mask)
  File "/opt/amazon/lib/python2.7/site-packages/algorithm/dataset.py", line 137, in _mask_check
    'number of classes.'.format(int(mask.max().asscalar())))
CustomerError: Annotation value 21 found in labels. This is greater than number of classes.
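
The CustomerError above reports an annotation value of 21 while the final configuration sets num_classes to 21. A plausible reading, assumed here and not confirmed by the log, is that label values must lie in the range 0 to num_classes - 1, so 21 falls just outside it. The sketch below shows one way such a check could be run locally before starting a training job. The function name out_of_range_labels and the toy mask are hypothetical and are not part of the SageMaker algorithm.

```python
import numpy as np


def out_of_range_labels(mask, num_classes):
    """Return the sorted label values in `mask` that are >= num_classes.

    Assumes valid labels are 0 .. num_classes - 1 (an assumption inferred
    from the error message, not taken from the algorithm's source).
    """
    values = np.unique(np.asarray(mask))
    return [int(v) for v in values if v >= num_classes]


# Toy 2x3 mask standing in for a decoded annotation image (hypothetical data).
toy_mask = np.array([[0, 5, 20], [21, 21, 3]], dtype=np.uint8)

bad = out_of_range_labels(toy_mask, num_classes=21)
print(bad)  # [21]
```

For real data, each annotation PNG would first need to be decoded into an array (for example with an image library of your choice) and passed to the same function. If out-of-range values turn up, likely remedies are raising num_classes to cover the largest label or remapping the labels; the log also notes that no label map was provided, which may be relevant.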