AndreasMadsen/gist:91e49e13f0085ececbef0f80c830c5af

## gistfile1.txt
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x118af890
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x118b36a0
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x118b74e0

INFO:tensorflow:0314:14:49:41.625:sg_train.py:312] Training started from epoch[001]-step[0].

W tensorflow/core/common_runtime/bfc_allocator.cc:274] **************************************************************************************_**********xxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 368.75MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[4096,400,1,59]
W tensorflow/core/common_runtime/bfc_allocator.cc:274] **************************************************************************************_**********xxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 368.75MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[4096,400,1,59]
W tensorflow/core/common_runtime/bfc_allocator.cc:274] **************************************************************************************_**********xxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 368.75MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[4096,400,1,59]
W tensorflow/core/common_runtime/bfc_allocator.cc:274] **************************************************************************************_**********xxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 368.75MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[4096,400,1,59]
Exception in thread Thread-441:
Traceback (most recent call last):
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/appl/python/3.6.0/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
	 [[Node: random_shuffle_queue_1_EnqueueMany = QueueEnqueueManyV2[Tcomponents=[DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](random_shuffle_queue_1, fraction_of_32_full_DequeueMany)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/appl/python/3.6.0/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/appl/python/3.6.0/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 250, in _run
    coord.request_stop(e)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 211, in request_stop
    six.reraise(*sys.exc_info())
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/six.py", line 686, in reraise
    raise value
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 234, in _run
    sess.run(enqueue_op)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
	 [[Node: random_shuffle_queue_1_EnqueueMany = QueueEnqueueManyV2[Tcomponents=[DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](random_shuffle_queue_1, fraction_of_32_full_DequeueMany)]]

Caused by op 'random_shuffle_queue_1_EnqueueMany', defined at:
  File "code/script/bytenet_europarl_train.py", line 21, in <module>
    validate=True)
  File "/zhome/ff/2/77654/workspace/kandidat/code/dataset/wmt_bilingual_news.py", line 117, in __init__
    **kwargs
  File "/zhome/ff/2/77654/workspace/kandidat/code/dataset/abstract/text_dataset.py", line 75, in __init__
    **kwargs)
  File "/zhome/ff/2/77654/workspace/kandidat/code/dataset/abstract/dataset.py", line 80, in __init__
    length, source, target = self.queue.read()
  File "/zhome/ff/2/77654/workspace/kandidat/code/dataset/util/sequence_queue/sequence_queue_memory.py", line 53, in read
    seed=self.seed
  File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/queue/shuffle_tensor_index.py", line 13, in shuffle_tensor_index
    shuffle_enqueue = shuffle_queue.enqueue_many([dequeue_op])
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 378, in enqueue_many
    self._queue_ref, vals, name=scope)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1538, in _queue_enqueue_many_v2
    name=name)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

CancelledError (see above for traceback): Enqueue operation was cancelled
	 [[Node: random_shuffle_queue_1_EnqueueMany = QueueEnqueueManyV2[Tcomponents=[DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](random_shuffle_queue_1, fraction_of_32_full_DequeueMany)]]


INFO:tensorflow:0315:04:23:36.454:sg_train.py:415] Training finished at epoch[1]-step[21334].
Traceback (most recent call last):
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/appl/python/3.6.0/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4096,400,1,59]
	 [[Node: train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:1"](train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Shape, bytenet-model/decoder/decoder-res-block.2.3.16/conv-dilated/W/read/_22785, train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d/BatchToSpaceND_grad/SpaceToBatchND)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_main.py", line 527, in sg_queue_context
    yield
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_train.py", line 340, in wrapper
    batch_loss = func(sess, opt)
  File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/train/basic_train.py", line 44, in train_func
    run_metadata=run_metadata)[0]
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4096,400,1,59]
	 [[Node: train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:1"](train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Shape, bytenet-model/decoder/decoder-res-block.2.3.16/conv-dilated/W/read/_22785, train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d/BatchToSpaceND_grad/SpaceToBatchND)]]

Caused by op 'train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Conv2DBackpropInput', defined at:
  File "code/script/bytenet_europarl_train.py", line 27, in <module>
    model.train(max_ep=60, lr=0.0001)
  File "/zhome/ff/2/77654/workspace/kandidat/code/model/abstract/model.py", line 73, in train
    update = self._update_model(losses, lr=lr, **kwargs)
  File "/zhome/ff/2/77654/workspace/kandidat/code/model/abstract/model.py", line 87, in _update_model
    return tower_optim(losses, **kwargs)
  File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/parallel/tower_optim.py", line 16, in tower_optim
    gradient = tower_gradient(opt, losses, var_list)
  File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/parallel/tower_gradient.py", line 12, in tower_gradient
    tower_grads.append(opt.compute_gradients(loss, var_list=var_list))
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 354, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 482, in gradients
    in_grads = grad_fn(op, *out_grads)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/nn_grad.py", line 359, in _Conv2DGrad
    op.get_attr("data_format")),
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 489, in conv2d_backprop_input
    data_format=data_format, name=name)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

...which was originally created as op 'bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d', defined at:
  File "code/script/bytenet_europarl_train.py", line 27, in <module>
    model.train(max_ep=60, lr=0.0001)
  File "/zhome/ff/2/77654/workspace/kandidat/code/model/abstract/model.py", line 51, in train
    loss, losses = self.train_model(reuse=reuse)
  File "/zhome/ff/2/77654/workspace/kandidat/code/model/abstract/model.py", line 40, in train_model
    **kwargs)
  File "/zhome/ff/2/77654/workspace/kandidat/code/model/bytenet.py", line 74, in loss_model
    name="bytenet-model"
  File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/bytenet_translator/bytenet_supervised_translator.py", line 54, in bytenet_supervised_translator
    name="decoder")
  File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/bytenet_decoder/parallel_bytenet_decoder.py", line 21, in parallel_bytenet_decoder
    name=f'decoder-res-block.{i}.{size}.{rate_i}'
  File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/decoder_residual_block/parallel_decoder_residual_block.py", line 77, in parallel_decoder_residual_block
    name="conv-dilated")
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_main.py", line 235, in wrapper
    out = func(tensor, opt)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_layer.py", line 197, in sg_aconv1d
    out = tf.nn.atrous_conv2d(x, w, rate=opt.rate, padding='VALID') + b
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 994, in atrous_conv2d
    name=name)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
    data_format=data_format, name=name)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096,400,1,59]
	 [[Node: train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:1"](train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Shape, bytenet-model/decoder/decoder-res-block.2.3.16/conv-dilated/W/read/_22785, train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d/BatchToSpaceND_grad/SpaceToBatchND)]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code/script/bytenet_europarl_train.py", line 27, in <module>
    model.train(max_ep=60, lr=0.0001)
  File "/zhome/ff/2/77654/workspace/kandidat/code/model/abstract/model.py", line 84, in train
    lr=lr)
  File "/zhome/ff/2/77654/workspace/kandidat/code/model/abstract/model.py", line 92, in _train_loop
    basic_train(loss, update_op, **kwargs)
  File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/train/basic_train.py", line 55, in basic_train
    train_func(save_dir=save_dir, **kwargs)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_train.py", line 404, in wrapper
    break
  File "/appl/python/3.6.0/lib/python3.6/contextlib.py", line 100, in __exit__
    self.gen.throw(type, value, traceback)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_main.py", line 532, in sg_queue_context
    coord.join(threads)
  File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 390, in join
    " ".join(stragglers))
RuntimeError: Coordinator stopped with threads still running: Thread-104 Thread-116 Thread-140 Thread-119 Thread-143 Thread-118 Thread-142 Thread-108 Thread-120 Thread-144 Thread-123 Thread-147 Thread-122 Thread-146 Thread-149 Thread-125 Thread-112 Thread-124 Thread-148 Thread-115 Thread-127 Thread-151 Thread-114 Thread-126 Thread-150 Thread-261 Thread-260 Thread-262 Thread-37 Thread-264 Thread-36 Thread-267 Thread-39 Thread-266 Thread-38 Thread-268 Thread-43 Thread-270 Thread-42 Thread-129 Thread-128 Thread-131 Thread-130 Thread-137 Thread-136 Thread-139 Thread-138 Thread-273 Thread-272 Thread-274 Thread-286 Thread-276 Thread-288 Thread-290 Thread-278 Thread-293 Thread-280 Thread-292 Thread-16 Thread-282 Thread-294 Thread-7 Thread-83 Thread-82 Thread-154 Thread-9 Thread-8 Thread-85 Thread-157 Thread-84 Thread-156 Thread-10 Thread-159 Thread-86 Thread-158 Thread-13 Thread-12 Thread-89 Thread-161 Thread-88 Thread-160 Thread-15 Thread-14 Thread-91 Thread-163 Thread-90 Thread-162 Thread-17 Thread-20 Thread-68 Thread-308 Thread-71 Thread-23 Thread-22 Thread-70 Thread-73 Thread-313 Thread-312 Thread-72 Thread-24 Thread-315 Thread-27 Thread-75 Thread-314 Thread-74 Thread-26 Thread-29 Thread-316 Thread-76 Thread-28 Thread-31 Thread-318 Thread-30 Thread-297 Thread-92 Thread-296 Thread-299 Thread-95 Thread-94 Thread-298 Thread-97 Thread-96 Thread-99 Thread-303 Thread-98 Thread-302 Thread-101 Thread-100 Thread-304 Thread-306 Thread-102 Thread-213 Thread-357 Thread-333 Thread-56 Thread-57 Thread-188 Thread-212 Thread-320 Thread-332 Thread-383 Thread-382 Thread-335 Thread-215 Thread-322 Thread-334 Thread-58 Thread-190 Thread-214 Thread-325 Thread-385 Thread-360 Thread-384 Thread-193 Thread-337 Thread-217 Thread-216 Thread-324 Thread-60 Thread-192 Thread-327 Thread-363 Thread-362 Thread-219 Thread-338 Thread-62 Thread-194 Thread-218 Thread-389 Thread-388 Thread-221 Thread-328 Thread-340 Thread-197 Thread-64 Thread-196 Thread-220 Thread-391 Thread-366 Thread-390 Thread-222 
Thread-342 Thread-223 Thread-330 Thread-66 Thread-198 Thread-177 Thread-176 Thread-179 Thread-178 Thread-181 Thread-180 Thread-442 Thread-441 Thread-182 Thread-444 Thread-443 Thread-185 Thread-184 Thread-445 Thread-187 Thread-186 Thread-448 Thread-447 Thread-345 Thread-369 Thread-393 Thread-344 Thread-368 Thread-392 Thread-461 Thread-371 Thread-395 Thread-346 Thread-370 Thread-394 Thread-464 Thread-463 Thread-349 Thread-373 Thread-397 Thread-348 Thread-372 Thread-396 Thread-466 Thread-465 Thread-351 Thread-399 Thread-350 Thread-398 Thread-468 Thread-467 Thread-353 Thread-401 Thread-352 Thread-376 Thread-400 Thread-470 Thread-469 Thread-379 Thread-403 Thread-354 Thread-378 Thread-402 Thread-472 Thread-471 Thread-225 Thread-201 Thread-224 Thread-200 Thread-474 Thread-498 Thread-449 Thread-473 Thread-497 Thread-227 Thread-226 Thread-476 Thread-488 Thread-531 Thread-451 Thread-475 Thread-487 Thread-228 Thread-204 Thread-454 Thread-453 Thread-477 Thread-489 Thread-504 Thread-207 Thread-206 Thread-230 Thread-456 Thread-480 Thread-492 Thread-455 Thread-479 Thread-491 Thread-233 Thread-232 Thread-208 Thread-494 Thread-457 Thread-493 Thread-211 Thread-235 Thread-210 Thread-234 Thread-496 Thread-459 Thread-495 Thread-237 Thread-236 Thread-59 Thread-238 Thread-241 Thread-240 Thread-243 Thread-242 Thread-336 Thread-244 Thread-246 Thread-18 Thread-326 Thread-164 Thread-166 Thread-169 Thread-168 Thread-356 Thread-171 Thread-170 Thread-173 Thread-172 Thread-358 Thread-331 Thread-67 Thread-45 Thread-44 Thread-248 Thread-251 Thread-191 Thread-250 Thread-253 Thread-49 Thread-252 Thread-48 Thread-50 Thread-254 Thread-256 Thread-52 Thread-364 Thread-55 Thread-258 Thread-54 Thread-386
	name="conv-dilated")
	File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_main.py", line 235, in wrapper
	out = func(tensor, opt)
	File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_layer.py", line 197, in sg_aconv1d
	out = tf.nn.atrous_conv2d(x, w, rate=opt.rate, padding='VALID') + b
	File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 994, in atrous_conv2d
	name=name)
	File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
	data_format=data_format, name=name)
	File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
	op_def=op_def)

	ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096,400,1,59]
	[[Node: train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:1"](train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d_grad/Shape, bytenet-model/decoder/decoder-res-block.2.3.16/conv-dilated/W/read/_22785, train/gradients_1/bytenet-model_1/decoder/bytenet-decoder-depth-2/decoder-res-block.2.3.16/conv-dilated/atrous_conv2d/BatchToSpaceND_grad/SpaceToBatchND)]]


	During handling of the above exception, another exception occurred:

	Traceback (most recent call last):
	File "code/script/bytenet_europarl_train.py", line 27, in <module>
	model.train(max_ep=60, lr=0.0001)
	File "/zhome/ff/2/77654/workspace/kandidat/code/model/abstract/model.py", line 84, in train
	lr=lr)
	File "/zhome/ff/2/77654/workspace/kandidat/code/model/abstract/model.py", line 92, in _train_loop
	basic_train(loss, update_op, **kwargs)
	File "/zhome/ff/2/77654/workspace/kandidat/code/tf_operator/train/basic_train.py", line 55, in basic_train
	train_func(save_dir=save_dir, **kwargs)
	File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_train.py", line 404, in wrapper
	break
	File "/appl/python/3.6.0/lib/python3.6/contextlib.py", line 100, in __exit__
	self.gen.throw(type, value, traceback)
	File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/sugartensor/sg_main.py", line 532, in sg_queue_context
	coord.join(threads)
	File "/zhome/ff/2/77654/stdpy3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 390, in join
	" ".join(stragglers))
	RuntimeError: Coordinator stopped with threads still running: Thread-104 Thread-116 Thread-140 Thread-119 Thread-143 Thread-118 Thread-142 Thread-108 Thread-120 Thread-144 Thread-123 Thread-147 Thread-122 Thread-146 Thread-149 Thread-125 Thread-112 Thread-124 Thread-148 Thread-115 Thread-127 Thread-151 Thread-114 Thread-126 Thread-150 Thread-261 Thread-260 Thread-262 Thread-37 Thread-264 Thread-36 Thread-267 Thread-39 Thread-266 Thread-38 Thread-268 Thread-43 Thread-270 Thread-42 Thread-129 Thread-128 Thread-131 Thread-130 Thread-137 Thread-136 Thread-139 Thread-138 Thread-273 Thread-272 Thread-274 Thread-286 Thread-276 Thread-288 Thread-290 Thread-278 Thread-293 Thread-280 Thread-292 Thread-16 Thread-282 Thread-294 Thread-7 Thread-83 Thread-82 Thread-154 Thread-9 Thread-8 Thread-85 Thread-157 Thread-84 Thread-156 Thread-10 Thread-159 Thread-86 Thread-158 Thread-13 Thread-12 Thread-89 Thread-161 Thread-88 Thread-160 Thread-15 Thread-14 Thread-91 Thread-163 Thread-90 Thread-162 Thread-17 Thread-20 Thread-68 Thread-308 Thread-71 Thread-23 Thread-22 Thread-70 Thread-73 Thread-313 Thread-312 Thread-72 Thread-24 Thread-315 Thread-27 Thread-75 Thread-314 Thread-74 Thread-26 Thread-29 Thread-316 Thread-76 Thread-28 Thread-31 Thread-318 Thread-30 Thread-297 Thread-92 Thread-296 Thread-299 Thread-95 Thread-94 Thread-298 Thread-97 Thread-96 Thread-99 Thread-303 Thread-98 Thread-302 Thread-101 Thread-100 Thread-304 Thread-306 Thread-102 Thread-213 Thread-357 Thread-333 Thread-56 Thread-57 Thread-188 Thread-212 Thread-320 Thread-332 Thread-383 Thread-382 Thread-335 Thread-215 Thread-322 Thread-334 Thread-58 Thread-190 Thread-214 Thread-325 Thread-385 Thread-360 Thread-384 Thread-193 Thread-337 Thread-217 Thread-216 Thread-324 Thread-60 Thread-192 Thread-327 Thread-363 Thread-362 Thread-219 Thread-338 Thread-62 Thread-194 Thread-218 Thread-389 Thread-388 Thread-221 Thread-328 Thread-340 Thread-197 Thread-64 Thread-196 Thread-220 Thread-391 Thread-366 Thread-390 Thread-222 
Thread-342 Thread-223 Thread-330 Thread-66 Thread-198 Thread-177 Thread-176 Thread-179 Thread-178 Thread-181 Thread-180 Thread-442 Thread-441 Thread-182 Thread-444 Thread-443 Thread-185 Thread-184 Thread-445 Thread-187 Thread-186 Thread-448 Thread-447 Thread-345 Thread-369 Thread-393 Thread-344 Thread-368 Thread-392 Thread-461 Thread-371 Thread-395 Thread-346 Thread-370 Thread-394 Thread-464 Thread-463 Thread-349 Thread-373 Thread-397 Thread-348 Thread-372 Thread-396 Thread-466 Thread-465 Thread-351 Thread-399 Thread-350 Thread-398 Thread-468 Thread-467 Thread-353 Thread-401 Thread-352 Thread-376 Thread-400 Thread-470 Thread-469 Thread-379 Thread-403 Thread-354 Thread-378 Thread-402 Thread-472 Thread-471 Thread-225 Thread-201 Thread-224 Thread-200 Thread-474 Thread-498 Thread-449 Thread-473 Thread-497 Thread-227 Thread-226 Thread-476 Thread-488 Thread-531 Thread-451 Thread-475 Thread-487 Thread-228 Thread-204 Thread-454 Thread-453 Thread-477 Thread-489 Thread-504 Thread-207 Thread-206 Thread-230 Thread-456 Thread-480 Thread-492 Thread-455 Thread-479 Thread-491 Thread-233 Thread-232 Thread-208 Thread-494 Thread-457 Thread-493 Thread-211 Thread-235 Thread-210 Thread-234 Thread-496 Thread-459 Thread-495 Thread-237 Thread-236 Thread-59 Thread-238 Thread-241 Thread-240 Thread-243 Thread-242 Thread-336 Thread-244 Thread-246 Thread-18 Thread-326 Thread-164 Thread-166 Thread-169 Thread-168 Thread-356 Thread-171 Thread-170 Thread-173 Thread-172 Thread-358 Thread-331 Thread-67 Thread-45 Thread-44 Thread-248 Thread-251 Thread-191 Thread-250 Thread-253 Thread-49 Thread-252 Thread-48 Thread-50 Thread-254 Thread-256 Thread-52 Thread-364 Thread-55 Thread-258 Thread-54 Thread-386