@armandmcqueen
Created March 8, 2019 18:47
2019-03-08 18:46:29.848043: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-08 18:46:29.854419: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-08 18:46:29.988655: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-08 18:46:30.011434: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-08 18:46:30.159645: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcudnn.so.7 locally
2019-03-08 18:46:30.162392: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-08 18:46:30.256675: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcudnn.so.7 locally
2019-03-08 18:46:30.288976: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcudnn.so.7 locally
2019-03-08 18:46:30.404790: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcudnn.so.7 locally
2019-03-08 18:46:30.416447: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcudnn.so.7 locally
2019-03-08 18:46:30.586828: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcudnn.so.7 locally
2019-03-08 18:46:30.625139: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-08 18:46:31.067166: I tensorflow/stream_executor/platform/default/dso_loader.cc:161] successfully opened CUDA library libcudnn.so.7 locally
2019-03-08 18:46:32.602704: E tensorflow/stream_executor/cuda/cuda_blas.cc:694] failed to run cuBLAS routine cublasSgemmEx: CUBLAS_STATUS_EXECUTION_FAILED
2019-03-08 18:46:32.603513: I tensorflow/stream_executor/stream.cc:4825] [stream=0x25c66050,impl=0x25c660f0] did not memzero GPU location; source: 0x7ff0107fac00
2019-03-08 18:46:32.603558: I tensorflow/stream_executor/stream.cc:315] did not allocate timer: 0x7ff0107fabf0
2019-03-08 18:46:32.603567: I tensorflow/stream_executor/stream.cc:1826] [stream=0x25c66050,impl=0x25c660f0] did not enqueue 'start timer': 0x7ff0107fabf0
2019-03-08 18:46:32.603584: I tensorflow/stream_executor/stream.cc:1838] [stream=0x25c66050,impl=0x25c660f0] did not enqueue 'stop timer': 0x7ff0107fabf0
2019-03-08 18:46:32.603591: F tensorflow/stream_executor/gpu/gpu_timer.cc:65] Check failed: start_event_ != nullptr && stop_event_ != nullptr
[f1f0059c329e:24414] *** Process received signal ***
[f1f0059c329e:24414] Signal: Aborted (6)
[f1f0059c329e:24414] Signal code: (-6)
[f1f0059c329e:24414] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7ff14b09a390]
[f1f0059c329e:24414] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7ff14a5e4428]
[f1f0059c329e:24414] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7ff14a5e602a]
[f1f0059c329e:24414] [ 3] 2019-03-08 18:46:32.604949: I tensorflow/stream_executor/stream.cc:4787] [stream=0x25c66050,impl=0x25c660f0] did not memcpy device-to-host; source: 0x7feaa8149d00
2019-03-08 18:46:32.605023: I tensorflow/stream_executor/stream.cc:4787] [stream=0x25c66050,impl=0x25c660f0] did not memcpy device-to-host; source: 0x7feaa8149d00
/usr/local/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so(+0x6c5ce04)[0x7ff095afee04]
[f1f0059c329e:24414] [ 4] 2019-03-08 18:46:32.605319: I tensorflow/stream_executor/stream.cc:4825] [stream=0x25c66050,impl=0x25c660f0] did not memzero GPU location; source: 0x7ff010ffbc00
2019-03-08 18:46:32.605336: I tensorflow/stream_executor/stream.cc:315] did not allocate timer: 0x7ff010ffbbf0
2019-03-08 18:46:32.605343: I tensorflow/stream_executor/stream.cc:1826] [stream=0x25c66050,impl=0x25c660f0] did not enqueue 'start timer': 0x7ff010ffbbf0
2019-03-08 18:46:32.605360: I tensorflow/stream_executor/stream.cc:1838] [stream=0x25c66050,impl=0x25c660f0] did not enqueue 'stop timer': 0x7ff010ffbbf0
2019-03-08 18:46:32.605367: F tensorflow/stream_executor/gpu/gpu_timer.cc:65] Check failed: start_event_ != nullptr && stop_event_ != nullptr
2019-03-08 18:46:32.657870: E tensorflow/stream_executor/cuda/cuda_blas.cc:694] failed to run cuBLAS routine cublasSgemmEx: CUBLAS_STATUS_EXECUTION_FAILED
2019-03-08 18:46:32.658628: I tensorflow/stream_executor/stream.cc:4787] [stream=0x273089f0,impl=0x27308a90] did not memcpy device-to-host; source: 0x7f9492b15f00
2019-03-08 18:46:32.658759: I tensorflow/stream_executor/stream.cc:4787] [stream=0x273089f0,impl=0x27308a90] did not memcpy device-to-host; source: 0x7f9492b15f00
2019-03-08 18:46:32.658961: I tensorflow/stream_executor/stream.cc:4787] [stream=0x273089f0,impl=0x27308a90] did not memcpy device-to-host; source: 0x7f9492b15e00
2019-03-08 18:46:32.659006: I tensorflow/stream_executor/stream.cc:4787] [stream=0x273089f0,impl=0x27308a90] did not memcpy device-to-host; source: 0x7f9492b15e00
2019-03-08 18:46:32.659094: I tensorflow/stream_executor/stream.cc:4787] [stream=0x273089f0,impl=0x27308a90] did not memcpy device-to-host; source: 0x7f9492a95d00
2019-03-08 18:46:32.659135: I tensorflow/stream_executor/stream.cc:4787] [stream=0x273089f0,impl=0x27308a90] did not memcpy device-to-host; source: 0x7f9492a95d00
2019-03-08 18:46:32.738744: E tensorflow/stream_executor/cuda/cuda_blas.cc:694] failed to run cuBLAS routine cublasSgemmEx: CUBLAS_STATUS_EXECUTION_FAILED
2019-03-08 18:46:32.754039: E tensorflow/stream_executor/cuda/cuda_blas.cc:694] failed to run cuBLAS routine cublasSgemmEx: CUBLAS_STATUS_EXECUTION_FAILED
2019-03-08 18:46:32.755964: I tensorflow/stream_executor/stream.cc:4787] [stream=0x277fa330,impl=0x277fce50] did not memcpy device-to-host; source: 0x7f3dcafdbf00
2019-03-08 18:46:32.756091: I tensorflow/stream_executor/stream.cc:4787] [stream=0x277fa330,impl=0x277fce50] did not memcpy device-to-host; source: 0x7f3dcaebbd00
2019-03-08 18:46:32.756130: I tensorflow/stream_executor/stream.cc:4787] [stream=0x277fa330,impl=0x277fce50] did not memcpy device-to-host; source: 0x7f3dcaebbd00
2019-03-08 18:46:32.812762: E tensorflow/stream_executor/cuda/cuda_blas.cc:694] failed to run cuBLAS routine cublasSgemmEx: CUBLAS_STATUS_EXECUTION_FAILED
2019-03-08 18:46:32.812868: I tensorflow/stream_executor/stream.cc:1852] [stream=0x27263c00,impl=0x27263ca0] did not wait for [stream=0x3f4718c0,impl=0x272635c0]
2019-03-08 18:46:32.812896: I tensorflow/stream_executor/stream.cc:4800] [stream=0x27263c00,impl=0x27263ca0] did not memcpy host-to-device; source: 0x7f0bc207b200
2019-03-08 18:46:32.812940: F tensorflow/core/common_runtime/gpu/gpu_util.cc:339] CPU->GPU Memcpy failed
[f1f0059c329e:24413] *** Process received signal ***
[f1f0059c329e:24413] Signal: Aborted (6)
[f1f0059c329e:24413] Signal code: (-6)
[f1f0059c329e:24413] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f0de86be390]
[f1f0059c329e:24413] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f0de7c08428]
[f1f0059c329e:24413] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f0de7c0a02a]
[f1f0059c329e:24413] [ 3] 2019-03-08 18:46:32.813422: I tensorflow/stream_executor/stream.cc:4787] [stream=0x3f4718c0,impl=0x272635c0] did not memcpy device-to-host; source: 0x7f0b8893b700
2019-03-08 18:46:32.814065: I tensorflow/stream_executor/stream.cc:4825] [stream=0x3f4718c0,impl=0x272635c0] did not memzero GPU location; source: 0x7f0cad7fcc00
2019-03-08 18:46:32.814085: I tensorflow/stream_executor/stream.cc:315] did not allocate timer: 0x7f0cad7fcbf0
2019-03-08 18:46:32.814094: I tensorflow/stream_executor/stream.cc:1826] [stream=0x3f4718c0,impl=0x272635c0] did not enqueue 'start timer': 0x7f0cad7fcbf0
2019-03-08 18:46:32.814109: I tensorflow/stream_executor/stream.cc:1838] [stream=0x3f4718c0,impl=0x272635c0] did not enqueue 'stop timer': 0x7f0cad7fcbf0
2019-03-08 18:46:32.814116: F tensorflow/stream_executor/gpu/gpu_timer.cc:65] Check failed: start_event_ != nullptr && stop_event_ != nullptr
2019-03-08 18:46:32.937764: E tensorflow/stream_executor/cuda/cuda_blas.cc:694] failed to run cuBLAS routine cublasSgemmEx: CUBLAS_STATUS_EXECUTION_FAILED
2019-03-08 18:46:32.938853: I tensorflow/stream_executor/stream.cc:4787] [stream=0x3b514fb0,impl=0x3b515050] did not memcpy device-to-host; source: 0x7f968f2c7600
2019-03-08 18:46:32.938926: I tensorflow/stream_executor/stream.cc:4787] [stream=0x3b514fb0,impl=0x3b515050] did not memcpy device-to-host; source: 0x7f968f2c7600
[0308 18:46:33 @input_source.py:176] EnqueueThread QueueInput/input_queue Exited.
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1320, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1408, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(20643840, 1), b.shape=(1, 4), m=20643840, n=4, k=1
[[{{node fpn/fpn/upsample_lat3/Tensordot/MatMul}}]]
[[gradients/rpn_losses_batch/level3/boolean_mask_31/GatherV2_grad/Shape/_8415]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tensorpack-mask-rcnn/MaskRCNN/train.py", line 651, in <module>
launch_train_with_config(traincfg, trainer)
File "/tensorpack-mask-rcnn/tensorpack/train/interface.py", line 94, in launch_train_with_config
extra_callbacks=config.extra_callbacks)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 343, in train_with_defaults
steps_per_epoch, starting_epoch, max_epoch)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 315, in train
self.main_loop(steps_per_epoch, starting_epoch, max_epoch)
File "/tensorpack-mask-rcnn/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 280, in main_loop
self.run_step() # implemented by subclass
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 180, in run_step
self.hooked_sess.run(self.train_op)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 694, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1189, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1288, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1273, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1345, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1109, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(20643840, 1), b.shape=(1, 4), m=20643840, n=4, k=1
[[node fpn/fpn/upsample_lat3/Tensordot/MatMul (defined at tensorpack-mask-rcnn/tensorpack/models/pool.py:130) ]]
[[gradients/rpn_losses_batch/level3/boolean_mask_31/GatherV2_grad/Shape/_8415]]
Original stack trace for 'fpn/fpn/upsample_lat3/Tensordot/MatMul':
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 651, in <module>
launch_train_with_config(traincfg, trainer)
File "tensorpack-mask-rcnn/tensorpack/train/interface.py", line 84, in launch_train_with_config
model._build_graph_get_cost, model.get_optimizer)
File "tensorpack-mask-rcnn/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1320, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1408, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(20643840, 1), b.shape=(1, 4), m=20643840, n=4, k=1
[[{{node fpn/fpn/upsample_lat3/Tensordot/MatMul}}]]
[[gradients/GatherV2_7_grad/Shape/_8953]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tensorpack-mask-rcnn/MaskRCNN/train.py", line 651, in <module>
launch_train_with_config(traincfg, trainer)
File "/tensorpack-mask-rcnn/tensorpack/train/interface.py", line 94, in launch_train_with_config
extra_callbacks=config.extra_callbacks)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 343, in train_with_defaults
steps_per_epoch, starting_epoch, max_epoch)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 315, in train
self.main_loop(steps_per_epoch, starting_epoch, max_epoch)
File "/tensorpack-mask-rcnn/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 280, in main_loop
self.run_step() # implemented by subclass
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 180, in run_step
self.hooked_sess.run(self.train_op)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 694, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1189, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1288, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1273, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1345, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1109, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(20643840, 1), b.shape=(1, 4), m=20643840, n=4, k=1
[[node fpn/fpn/upsample_lat3/Tensordot/MatMul (defined at tensorpack-mask-rcnn/tensorpack/models/pool.py:130) ]]
[[gradients/GatherV2_7_grad/Shape/_8953]]
Original stack trace for 'fpn/fpn/upsample_lat3/Tensordot/MatMul':
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 651, in <module>
launch_train_with_config(traincfg, trainer)
File "tensorpack-mask-rcnn/tensorpack/train/interface.py", line 84, in launch_train_with_config
model._build_graph_get_cost, model.get_optimizer)
File "tensorpack-mask-rcnn/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 216, in setup_gTraceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1320, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1408, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(20643840, 1), b.shape=(1, 4), m=20643840, n=4, k=1
[[{{node fpn/fpn/upsample_lat3/Tensordot/MatMul}}]]
[[gradients/rpn_losses_batch/level3/boolean_mask_31/GatherV2_grad/Shape/_8415]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tensorpack-mask-rcnn/MaskRCNN/train.py", line 651, in <module>
launch_train_with_config(traincfg, trainer)
File "/tensorpack-mask-rcnn/tensorpack/train/interface.py", line 94, in launch_train_with_config
extra_callbacks=config.extra_callbacks)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 343, in train_with_defaults
steps_per_epoch, starting_epoch, max_epoch)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 315, in train
self.main_loop(steps_per_epoch, starting_epoch, max_epoch)
File "/tensorpack-mask-rcnn/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 280, in main_loop
self.run_step() # implemented by subclass
File "/tensorpack-mask-rcnn/tensorpack/train/base.py", line 180, in run_step
self.hooked_sess.run(self.train_op)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 694, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1189, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1288, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1273, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1345, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1109, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(20643840, 1), b.shape=(1, 4), m=20643840, n=4, k=1
[[node fpn/fpn/upsample_lat3/Tensordot/MatMul (defined at tensorpack-mask-rcnn/tensorpack/models/pool.py:130) ]]
[[gradients/rpn_losses_batch/level3/boolean_mask_31/GatherV2_grad/Shape/_8415]]
Original stack trace for 'fpn/fpn/upsample_lat3/Tensordot/MatMul':
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 651, in <module>
launch_train_with_config(traincfg, trainer)
File "tensorpack-mask-rcnn/tensorpack/train/interface.py", line 84, in launch_train_with_config
model._build_graph_get_cost, model.get_optimizer)
File "tensorpack-mask-rcnn/tensorpack/utils/argtools.py", line 176, in wrapper
return func(*args, **kwargs)
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 216, in setup_graph
train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
File "tensorpack-mask-rcnn/tensorpack/train/trainers.py", line 410, in _setup_graph
grads = self._make_get_grad_fn(input, get_cost_fn, get_opt_fn)()
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 283, in get_grad_fn
return compute_grad_from_inputs(*inputs)
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 247, in compute_grad_from_inputs
cost = get_cost_fn(*inputs)
File "tensorpack-mask-rcnn/tensorpack/tfutils/tower.py", line 286, in __call__
output = self._tower_fn(*args)
File "tensorpack-mask-rcnn/tensorpack/graph_builder/model_desc.py", line 262, in _build_graph_get_cost
ret = self.build_graph(*inputs)
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 124, in build_graph
features = self.backbone(images)
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 193, in backbone
p23456 = fpn_model('fpn', c2345, fp16=self.fp16)
File "tensorpack-mask-rcnn/tensorpack/models/registry.py", line 128, in wrapped_func
outputs = func(*args, **actual_args)
File "tensorpack-mask-rcnn/MaskRCNN/model_fpn.py", line 80, in fpn_model
lat = lat + upsample2x('upsample_lat{}'.format(6 - idx), lat_sum_5432[-1])
File "tensorpack-mask-rcnn/MaskRCNN/model_fpn.py", line 57, in upsample2x
data_format='channels_first')
File "tensorpack-mask-rcnn/tensorpack/models/registry.py", line 128, in wrapped_func
outputs = func(*args, **actual_args)
File "tensorpack-mask-rcnn/tensorpack/models/pool.py", line 130, in FixedUnPooling
ret = tf.tensordot(x, mat, axes=1) # bxcxhxwxshxsw
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 3641, in tensordot
ab_matmul = matmul(a_reshape, b_reshape)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2513, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5675, in mat_mul
name=name)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 800, in _apply_op_helper
op_def=op_def)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3473, in create_op
op_def=op_def)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1961, in __init__
self._traceback = tf_stack.extract_stack()
[0308 18:46:33 @input_source.py:176] EnqueueThread QueueInput/input_queue Exited.
[0308 18:46:33 @input_source.py:176] EnqueueThread QueueInput/input_queue Exited.
raph
train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
File "tensorpack-mask-rcnn/tensorpack/train/trainers.py", line 410, in _setup_graph
grads = self._make_get_grad_fn(input, get_cost_fn, get_opt_fn)()
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 283, in get_grad_fn
return compute_grad_from_inputs(*inputs)
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 247, in compute_grad_from_inputs
cost = get_cost_fn(*inputs)
File "tensorpack-mask-rcnn/tensorpack/tfutils/tower.py", line 286, in __call__
output = self._tower_fn(*args)
File "tensorpack-mask-rcnn/tensorpack/graph_builder/model_desc.py", line 262, in _build_graph_get_cost
ret = self.build_graph(*inputs)
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 124, in build_graph
features = self.backbone(images)
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 193, in backbone
p23456 = fpn_model('fpn', c2345, fp16=self.fp16)
File "tensorpack-mask-rcnn/tensorpack/models/registry.py", line 128, in wrapped_func
outputs = func(*args, **actual_args)
File "tensorpack-mask-rcnn/MaskRCNN/model_fpn.py", line 80, in fpn_model
lat = lat + upsample2x('upsample_lat{}'.format(6 - idx), lat_sum_5432[-1])
File "tensorpack-mask-rcnn/MaskRCNN/model_fpn.py", line 57, in upsample2x
data_format='channels_first')
File "tensorpack-mask-rcnn/tensorpack/models/registry.py", line 128, in wrapped_func
outputs = func(*args, **actual_args)
File "tensorpack-mask-rcnn/tensorpack/models/pool.py", line 130, in FixedUnPooling
ret = tf.tensordot(x, mat, axes=1) # bxcxhxwxshxsw
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 3641, in tensordot
ab_matmul = matmul(a_reshape, b_reshape)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2513, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5675, in mat_mul
name=name)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 800, in _apply_op_helper
op_def=op_def)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3473, in create_op
op_def=op_def)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1961, in __init__
self._traceback = tf_stack.extract_stack()
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 216, in setup_graph
train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
File "tensorpack-mask-rcnn/tensorpack/train/trainers.py", line 410, in _setup_graph
grads = self._make_get_grad_fn(input, get_cost_fn, get_opt_fn)()
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 283, in get_grad_fn
return compute_grad_from_inputs(*inputs)
File "tensorpack-mask-rcnn/tensorpack/train/tower.py", line 247, in compute_grad_from_inputs
cost = get_cost_fn(*inputs)
File "tensorpack-mask-rcnn/tensorpack/tfutils/tower.py", line 286, in __call__
output = self._tower_fn(*args)
File "tensorpack-mask-rcnn/tensorpack/graph_builder/model_desc.py", line 262, in _build_graph_get_cost
ret = self.build_graph(*inputs)
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 124, in build_graph
features = self.backbone(images)
File "tensorpack-mask-rcnn/MaskRCNN/train.py", line 193, in backbone
p23456 = fpn_model('fpn', c2345, fp16=self.fp16)
File "tensorpack-mask-rcnn/tensorpack/models/registry.py", line 128, in wrapped_func
outputs = func(*args, **actual_args)
File "tensorpack-mask-rcnn/MaskRCNN/model_fpn.py", line 80, in fpn_model
lat = lat + upsample2x('upsample_lat{}'.format(6 - idx), lat_sum_5432[-1])
File "tensorpack-mask-rcnn/MaskRCNN/model_fpn.py", line 57, in upsample2x
data_format='channels_first')
File "tensorpack-mask-rcnn/tensorpack/models/registry.py", line 128, in wrapped_func
outputs = func(*args, **actual_args)
File "tensorpack-mask-rcnn/tensorpack/models/pool.py", line 130, in FixedUnPooling
ret = tf.tensordot(x, mat, axes=1) # bxcxhxwxshxsw
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 3641, in tensordot
ab_matmul = matmul(a_reshape, b_reshape)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2513, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5675, in mat_mul
name=name)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 800, in _apply_op_helper
op_def=op_def)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3473, in create_op
op_def=op_def)
File "usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1961, in __init__
self._traceback = tf_stack.extract_stack()
--------------------------------------------------------------------------
mpirun.real noticed that process rank 7 with PID 0 on node f1f0059c329e exited on signal 6 (Aborted).
--------------------------------------------------------------------------