Created August 27, 2019 08:27
Error log when pretraining on VCR
2019-08-26 21:17:42.954423: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 2 Chunks of size 29364224 totalling 56.01MiB | |
2019-08-26 21:17:42.954434: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 29425664 totalling 28.06MiB | |
2019-08-26 21:17:42.954446: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 32751616 totalling 31.23MiB | |
2019-08-26 21:17:42.954458: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 6 Chunks of size 125018112 totalling 715.36MiB | |
2019-08-26 21:17:42.954469: I tensorflow/core/common_runtime/bfc_allocator.cc:658] Sum Total of in-use chunks: 10.14GiB | |
2019-08-26 21:17:42.954485: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Stats: | |
Limit: 10895235482 | |
InUse: 10891294208 | |
MaxInUse: 10891294208 | |
NumAllocs: 4208 | |
MaxAllocSize: 125018112 | |
2019-08-26 21:17:42.954655: W tensorflow/core/common_runtime/bfc_allocator.cc:275] **************************************************************************************************** | |
2019-08-26 21:17:42.954734: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[1024,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc | |
INFO:tensorflow:Error recorded from training_loop: OOM when allocating tensor with shape[1024,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc | |
[[{{node bert/encoder/layer_15/intermediate/dense/truediv}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_15/intermediate/dense/BiasAdd, ConstantFolding/gradients/bert/encoder/layer_0/intermediate/dense/truediv_grad/RealDiv_recip)]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
[[{{node add_1/_9593}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6797_add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
Caused by op 'bert/encoder/layer_15/intermediate/dense/truediv', defined at: | |
File "pretrain_on_vcr.py", line 467, in <module> | |
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train | |
saving_listeners=saving_listeners | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train | |
loss = self._train_model(input_fn, hooks, saving_listeners) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model | |
return self._train_model_default(input_fn, hooks, saving_listeners) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1211, in _train_model_default | |
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2186, in _call_model_fn | |
features, labels, mode, config) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn | |
model_fn_results = self._model_fn(features=features, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2470, in _model_fn | |
features, labels, is_export_mode=is_export_mode) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1250, in call_without_tpu | |
return self._call_model_fn(features, labels, is_export_mode=is_export_mode) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1524, in _call_model_fn | |
estimator_spec = self._model_fn(features=features, **kwargs) | |
File "pretrain_on_vcr.py", line 148, in model_fn | |
use_one_hot_embeddings=use_one_hot_embeddings) | |
File "/data1/cx/r2c/data/get_bert_embeddings/modeling.py", line 216, in __init__ | |
do_return_all_layers=True) | |
File "/data1/cx/r2c/data/get_bert_embeddings/modeling.py", line 879, in transformer_model | |
kernel_initializer=create_initializer(initializer_range)) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/layers/core.py", line 184, in dense | |
return layer.apply(inputs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 828, in apply | |
return self.__call__(inputs, *args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 364, in __call__ | |
outputs = super(Layer, self).__call__(inputs, *args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 769, in __call__ | |
outputs = self.call(inputs, *args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py", line 951, in call | |
return self.activation(outputs) # pylint: disable=not-callable | |
File "/data1/cx/r2c/data/get_bert_embeddings/modeling.py", line 276, in gelu | |
cdf = 0.5 * (1.0 + tf.erf(input_tensor / tf.sqrt(2.0))) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 862, in binary_op_wrapper | |
return func(x, y, name=name) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 970, in _truediv_python3 | |
return gen_math_ops.real_div(x, y, name=name) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5989, in real_div | |
"RealDiv", x=x, y=y, name=name) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper | |
op_def=op_def) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func | |
return func(*args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op | |
op_def=op_def) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__ | |
self._traceback = tf_stack.extract_stack() | |
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1024,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc | |
[[{{node bert/encoder/layer_15/intermediate/dense/truediv}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_15/intermediate/dense/BiasAdd, ConstantFolding/gradients/bert/encoder/layer_0/intermediate/dense/truediv_grad/RealDiv_recip)]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
[[{{node add_1/_9593}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6797_add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
INFO:tensorflow:training_loop marked as finished | |
WARNING:tensorflow:Reraising captured error | |
Traceback (most recent call last): | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call | |
return fn(*args) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn | |
options, feed_dict, fetch_list, target_list, run_metadata) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun | |
run_metadata) | |
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1024,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc | |
[[{{node bert/encoder/layer_15/intermediate/dense/truediv}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_15/intermediate/dense/BiasAdd, ConstantFolding/gradients/bert/encoder/layer_0/intermediate/dense/truediv_grad/RealDiv_recip)]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
[[{{node add_1/_9593}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6797_add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
During handling of the above exception, another exception occurred: | |
Traceback (most recent call last): | |
File "pretrain_on_vcr.py", line 467, in <module> | |
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2400, in train | |
rendezvous.raise_errors() | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors | |
six.reraise(typ, value, traceback) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/six.py", line 693, in reraise | |
raise value | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train | |
saving_listeners=saving_listeners | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train | |
loss = self._train_model(input_fn, hooks, saving_listeners) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model | |
return self._train_model_default(input_fn, hooks, saving_listeners) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1215, in _train_model_default | |
saving_listeners) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1409, in _train_with_estimator_spec | |
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss]) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run | |
run_metadata=run_metadata) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1148, in run | |
run_metadata=run_metadata) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1239, in run | |
raise six.reraise(*original_exc_info) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/six.py", line 693, in reraise | |
raise value | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1224, in run | |
return self._sess.run(*args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1296, in run | |
run_metadata=run_metadata) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1076, in run | |
return self._sess.run(*args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 887, in run | |
run_metadata_ptr) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1110, in _run | |
feed_dict_tensor, options, run_metadata) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run | |
run_metadata) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call | |
raise type(e)(node_def, op, message) | |
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1024,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc | |
[[{{node bert/encoder/layer_15/intermediate/dense/truediv}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_15/intermediate/dense/BiasAdd, ConstantFolding/gradients/bert/encoder/layer_0/intermediate/dense/truediv_grad/RealDiv_recip)]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
[[{{node add_1/_9593}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6797_add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
Caused by op 'bert/encoder/layer_15/intermediate/dense/truediv', defined at: | |
File "pretrain_on_vcr.py", line 467, in <module> | |
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train | |
saving_listeners=saving_listeners | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train | |
loss = self._train_model(input_fn, hooks, saving_listeners) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model | |
return self._train_model_default(input_fn, hooks, saving_listeners) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1211, in _train_model_default | |
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2186, in _call_model_fn | |
features, labels, mode, config) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn | |
model_fn_results = self._model_fn(features=features, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2470, in _model_fn | |
features, labels, is_export_mode=is_export_mode) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1250, in call_without_tpu | |
return self._call_model_fn(features, labels, is_export_mode=is_export_mode) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1524, in _call_model_fn | |
estimator_spec = self._model_fn(features=features, **kwargs) | |
File "pretrain_on_vcr.py", line 148, in model_fn | |
use_one_hot_embeddings=use_one_hot_embeddings) | |
File "/data1/cx/r2c/data/get_bert_embeddings/modeling.py", line 216, in __init__ | |
do_return_all_layers=True) | |
File "/data1/cx/r2c/data/get_bert_embeddings/modeling.py", line 879, in transformer_model | |
kernel_initializer=create_initializer(initializer_range)) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/layers/core.py", line 184, in dense | |
return layer.apply(inputs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 828, in apply | |
return self.__call__(inputs, *args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 364, in __call__ | |
outputs = super(Layer, self).__call__(inputs, *args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 769, in __call__ | |
outputs = self.call(inputs, *args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py", line 951, in call | |
return self.activation(outputs) # pylint: disable=not-callable | |
File "/data1/cx/r2c/data/get_bert_embeddings/modeling.py", line 276, in gelu | |
cdf = 0.5 * (1.0 + tf.erf(input_tensor / tf.sqrt(2.0))) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 862, in binary_op_wrapper | |
return func(x, y, name=name) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 970, in _truediv_python3 | |
return gen_math_ops.real_div(x, y, name=name) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5989, in real_div | |
"RealDiv", x=x, y=y, name=name) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper | |
op_def=op_def) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func | |
return func(*args, **kwargs) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op | |
op_def=op_def) | |
File "/home/yuweijiang/anaconda3/envs/vcr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__ | |
self._traceback = tf_stack.extract_stack() | |
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1024,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc | |
[[{{node bert/encoder/layer_15/intermediate/dense/truediv}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_15/intermediate/dense/BiasAdd, ConstantFolding/gradients/bert/encoder/layer_0/intermediate/dense/truediv_grad/RealDiv_recip)]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. | |
[[{{node add_1/_9593}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6797_add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] | |
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. |
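
A quick sanity check on the numbers in the log above, as a minimal sketch rather than a diagnosis. It assumes "type float" means 4-byte float32 (TensorFlow's DT_FLOAT) and compares the size of the tensor that failed to allocate against the gap between `Limit` and `InUse` in the allocator stats. The helper name `tensor_bytes` is made up for illustration. The gap is only an upper bound on what could be allocated, since the BFC allocator also needs a contiguous free chunk.

```python
# Figures copied from the allocator stats and the OOM message in the log above.
LIMIT_BYTES = 10895235482    # "Limit"
IN_USE_BYTES = 10891294208   # "InUse"
FAILED_SHAPE = (1024, 4096)  # shape of the tensor that could not be allocated
BYTES_PER_FLOAT32 = 4        # assumption: "type float" is float32 (DT_FLOAT)

MIB = 1024 ** 2


def tensor_bytes(shape, bytes_per_element):
    """Return the size in bytes of a dense tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


requested = tensor_bytes(FAILED_SHAPE, BYTES_PER_FLOAT32)
headroom = LIMIT_BYTES - IN_USE_BYTES

print(f"requested: {requested / MIB:.2f} MiB")  # 16.00 MiB
print(f"headroom:  {headroom / MIB:.2f} MiB")   # 3.76 MiB
print(f"fits:      {requested <= headroom}")    # False
```

The 16 MiB request exceeds the roughly 3.76 MiB left under the limit, so the GPU was essentially full by the time the graph reached `layer_15`. If the 1024 rows are batch size times sequence length (an assumption, the log does not say), reducing either one in the training flags would be the usual first thing to try.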