(lasagne) lulu@lulus ➜ theano git:(master) ✗ CUDA_LAUNCH_BLOCKING=1 THEANO_FLAGS='device=gpu1,exception_verbosity=high,optimizer=fast_compile,allow_gc=False' cuda-memcheck --tool racecheck python grad2.py
========= CUDA-MEMCHECK
Using gpu device 1: GeForce GTX 970 (CNMeM is disabled, CuDNN 4007)
========= Race reported between Read access at 0x00000490 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*)
========= and Write access at 0x000004b8 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [60 hazards]
========= and Write access at 0x00000488 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [60 hazards]
=========
========= Race reported between Read access at 0x00000418 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*)
========= and Write access at 0x00000450 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [48 hazards]
========= and Write access at 0x00000410 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [48 hazards]
=========
========= Race reported between Write access at 0x00000410 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*)
========= and Read access at 0x00000418 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [48 hazards]
========= and Read access at 0x000003d8 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [32 hazards]
=========
========= Race reported between Write access at 0x000003d0 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*)
========= and Read access at 0x000003d8 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [32 hazards]
=========
========= Race reported between Read access at 0x00000458 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*)
========= and Write access at 0x00000488 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [56 hazards]
========= and Write access at 0x00000450 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [56 hazards]
=========
========= Race reported between Write access at 0x00000370 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*)
========= and Read access at 0x000003a8 in kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0(unsigned int, float const *, float*) [64 hazards]
=========
/home/lulu/envs/lasagne/src/theano/theano/tensor/signal/downsample.py:5: UserWarning: downsample module has been moved to the pool module.
warnings.warn("downsample module has been moved to the pool module.")
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xb0e04b600 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x7f26418d8270, self->devata=0xb0e04b600)
Traceback (most recent call last):
File "/home/lulu/envs/lasagne/src/theano/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
RuntimeError: GpuDnnConv: error doing operation: CUDNN_STATUS_EXECUTION_FAILED
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "grad2.py", line 27, in <module>
print(f(X, X))
File "/home/lulu/envs/lasagne/src/theano/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/lulu/envs/lasagne/src/theano/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/lib/python3.5/site-packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/home/lulu/envs/lasagne/src/theano/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
RuntimeError: GpuDnnConv: error doing operation: CUDNN_STATUS_EXECUTION_FAILED
Apply node that caused the error: GpuDnnConv{algo='small', inplace=False}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode=(1, 1), subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 87
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D), <theano.gof.type.CDataType object at 0x7f26423f0a90>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(1, 1, 5, 5), (64, 1, 3, 3), (1, 64, 5, 5), 'No shapes', (), ()]
Inputs strides: [(0, 0, 5, 1), (9, 0, 3, 1), (0, 25, 5, 1), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7f26401228d0>, 1.0, 0.0]
Inputs name: ('image', 'kernel', 'output', 'descriptor', 'alpha', 'beta')
Outputs clients: [[GpuDownsampleFactorMaxGradGrad{ds=(3, 3), ignore_border=True}(GpuDnnConv{algo='small', inplace=False}.0, GpuDnnPool{mode='max'}.0, GpuDnnConv{algo='small', inplace=False}.0)]]
Debugprint of the apply node:
GpuDnnConv{algo='small', inplace=False} [id A] <CudaNdarrayType(float32, 4D)> ''
|GpuContiguous [id B] <CudaNdarrayType(float32, 4D)> ''
| |GpuElemwise{mul,no_inplace} [id C] <CudaNdarrayType(float32, 4D)> ''
| |GpuElemwise{mul,no_inplace} [id D] <CudaNdarrayType(float32, 4D)> ''
| | |GpuElemwise{second,no_inplace} [id E] <CudaNdarrayType(float32, 4D)> ''
| | | |GpuElemwise{pow,no_inplace} [id F] <CudaNdarrayType(float32, 4D)> ''
| | | | |GpuElemwise{sub,no_inplace} [id G] <CudaNdarrayType(float32, 4D)> ''
| | | | | |GpuDnnConvGradI{algo='none', inplace=False} [id H] <CudaNdarrayType(float32, 4D)> ''
| | | | | | |GpuContiguous [id I] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | |W [id J] <CudaNdarrayType(float32, 4D)>
| | | | | | |GpuContiguous [id K] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | |GpuDnnPoolGrad{mode='max'} [id L] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | |GpuContiguous [id M] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | |GpuDnnConv{algo='small', inplace=False} [id N] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | |GpuContiguous [id O] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | | |GpuFromHost [id P] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | | |X [id Q] <TensorType(float32, 4D)>
| | | | | | | | |GpuContiguous [id I] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | |GpuAllocEmpty [id R] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | | |Shape_i{0} [id S] <TensorType(int64, scalar)> ''
| | | | | | | | | | |X [id Q] <TensorType(float32, 4D)>
| | | | | | | | | |Shape_i{0} [id T] <TensorType(int64, scalar)> ''
| | | | | | | | | | |W [id J] <CudaNdarrayType(float32, 4D)>
| | | | | | | | | |Elemwise{add,no_inplace} [id U] <TensorType(int64, scalar)> ''
| | | | | | | | | | |Elemwise{int_div,no_inplace} [id V] <TensorType(int64, scalar)> ''
| | | | | | | | | | | |Elemwise{sub,no_inplace} [id W] <TensorType(int64, scalar)> ''
| | | | | | | | | | | | |Elemwise{add,no_inplace} [id X] <TensorType(int64, scalar)> ''
| | | | | | | | | | | | | |Shape_i{2} [id Y] <TensorType(int64, scalar)> ''
| | | | | | | | | | | | | | |X [id Q] <TensorType(float32, 4D)>
| | | | | | | | | | | | | |TensorConstant{2} [id Z] <TensorType(int8, scalar)>
| | | | | | | | | | | | |Shape_i{2} [id BA] <TensorType(int64, scalar)> ''
| | | | | | | | | | | | |W [id J] <CudaNdarrayType(float32, 4D)>
| | | | | | | | | | | |TensorConstant{1} [id BB] <TensorType(int8, scalar)>
| | | | | | | | | | |TensorConstant{1} [id BB] <TensorType(int8, scalar)>
| | | | | | | | | |Elemwise{add,no_inplace} [id BC] <TensorType(int64, scalar)> ''
| | | | | | | | | |Elemwise{int_div,no_inplace} [id BD] <TensorType(int64, scalar)> ''
| | | | | | | | | | |Elemwise{sub,no_inplace} [id BE] <TensorType(int64, scalar)> ''
| | | | | | | | | | | |Elemwise{add,no_inplace} [id BF] <TensorType(int64, scalar)> ''
| | | | | | | | | | | | |Shape_i{3} [id BG] <TensorType(int64, scalar)> ''
| | | | | | | | | | | | | |X [id Q] <TensorType(float32, 4D)>
| | | | | | | | | | | | |TensorConstant{2} [id Z] <TensorType(int8, scalar)>
| | | | | | | | | | | |Shape_i{3} [id BH] <TensorType(int64, scalar)> ''
| | | | | | | | | | | |W [id J] <CudaNdarrayType(float32, 4D)>
| | | | | | | | | | |TensorConstant{1} [id BB] <TensorType(int8, scalar)>
| | | | | | | | | |TensorConstant{1} [id BB] <TensorType(int8, scalar)>
| | | | | | | | |GpuDnnConvDesc{border_mode=(1, 1), subsample=(1, 1), conv_mode='conv', precision='float32'} [id BI] <CDataType{cudnnConvolutionDescriptor_t}> ''
| | | | | | | | | |MakeVector{dtype='int64'} [id BJ] <TensorType(int64, vector)> ''
| | | | | | | | | | |Shape_i{0} [id S] <TensorType(int64, scalar)> ''
| | | | | | | | | | |Shape_i{1} [id BK] <TensorType(int64, scalar)> ''
| | | | | | | | | | | |X [id Q] <TensorType(float32, 4D)>
| | | | | | | | | | |Shape_i{2} [id Y] <TensorType(int64, scalar)> ''
| | | | | | | | | | |Shape_i{3} [id BG] <TensorType(int64, scalar)> ''
| | | | | | | | | |MakeVector{dtype='int64'} [id BL] <TensorType(int64, vector)> ''
| | | | | | | | | |Shape_i{0} [id T] <TensorType(int64, scalar)> ''
| | | | | | | | | |Shape_i{1} [id BM] <TensorType(int64, scalar)> ''
| | | | | | | | | | |W [id J] <CudaNdarrayType(float32, 4D)>
| | | | | | | | | |Shape_i{2} [id BA] <TensorType(int64, scalar)> ''
| | | | | | | | | |Shape_i{3} [id BH] <TensorType(int64, scalar)> ''
| | | | | | | | |Constant{1.0} [id BN] <float32>
| | | | | | | | |Constant{0.0} [id BO] <float32>
| | | | | | | |GpuContiguous [id BP] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | |GpuDnnPool{mode='max'} [id BQ] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | |GpuContiguous [id M] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | |TensorConstant{(2,) of 3} [id BR] <TensorType(int64, vector)>
| | | | | | | | |TensorConstant{(2,) of 2} [id BS] <TensorType(int64, vector)>
| | | | | | | | |TensorConstant{(2,) of 0} [id BT] <TensorType(int64, vector)>
| | | | | | | |GpuContiguous [id BU] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | |GpuReshape{4} [id BV] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | |GpuDot22 [id BW] <CudaNdarrayType(float32, matrix)> ''
| | | | | | | | | |GpuDot22 [id BX] <CudaNdarrayType(float32, matrix)> ''
| | | | | | | | | | |GpuReshape{2} [id BY] <CudaNdarrayType(float32, matrix)> ''
| | | | | | | | | | | |GpuDnnPool{mode='max'} [id BQ] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | | | | |MakeVector{dtype='int64'} [id BZ] <TensorType(int64, vector)> ''
| | | | | | | | | | | |Shape_i{0} [id S] <TensorType(int64, scalar)> ''
| | | | | | | | | | | |TensorConstant{-1} [id CA] <TensorType(int64, scalar)>
| | | | | | | | | | |W [id CB] <CudaNdarrayType(float32, matrix)>
| | | | | | | | | |GpuDimShuffle{1,0} [id CC] <CudaNdarrayType(float32, matrix)> ''
| | | | | | | | | |W [id CB] <CudaNdarrayType(float32, matrix)>
| | | | | | | | |MakeVector{dtype='int64'} [id CD] <TensorType(int64, vector)> ''
| | | | | | | | |Shape_i{0} [id S] <TensorType(int64, scalar)> ''
| | | | | | | | |TensorConstant{64} [id CE] <TensorType(int64, scalar)>
| | | | | | | | |TensorConstant{2} [id CF] <TensorType(int64, scalar)>
| | | | | | | | |TensorConstant{2} [id CG] <TensorType(int64, scalar)>
| | | | | | | |TensorConstant{(2,) of 3} [id BR] <TensorType(int64, vector)>
| | | | | | | |TensorConstant{(2,) of 2} [id BS] <TensorType(int64, vector)>
| | | | | | | |TensorConstant{(2,) of 0} [id BT] <TensorType(int64, vector)>
| | | | | | |GpuAllocEmpty [id CH] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | |Shape_i{0} [id S] <TensorType(int64, scalar)> ''
| | | | | | | |Shape_i{1} [id BM] <TensorType(int64, scalar)> ''
| | | | | | | |Subtensor{int64} [id CI] <TensorType(int64, scalar)> ''
| | | | | | | | |Subtensor{int64::} [id CJ] <TensorType(int64, vector)> ''
| | | | | | | | | |GpuShape [id CK] <TensorType(int64, vector)> ''
| | | | | | | | | | |GpuFromHost [id P] <CudaNdarrayType(float32, 4D)> ''
| | | | | | | | | |Constant{-2} [id CL] <int64>
| | | | | | | | |Constant{0} [id CM] <int64>
| | | | | | | |Subtensor{int64} [id CN] <TensorType(int64, scalar)> ''
| | | | | | | |Subtensor{int64::} [id CJ] <TensorType(int64, vector)> ''
| | | | | | | |Constant{1} [id CO] <int64>
| | | | | | |GpuDnnConvDesc{border_mode=(1, 1), subsample=(1, 1), conv_mode='conv', precision='float32'} [id CP] <CDataType{cudnnConvolutionDescriptor_t}> ''
| | | | | | | |MakeVector{dtype='int64'} [id CQ] <TensorType(int64, vector)> ''
| | | | | | | | |Shape_i{0} [id S] <TensorType(int64, scalar)> ''
| | | | | | | | |Shape_i{1} [id BM] <TensorType(int64, scalar)> ''
| | | | | | | | |Subtensor{int64} [id CI] <TensorType(int64, scalar)> ''
| | | | | | | | |Subtensor{int64} [id CN] <TensorType(int64, scalar)> ''
| | | | | | | |MakeVector{dtype='int64'} [id BL] <TensorType(int64, vector)> ''
| | | | | | |Constant{1.0} [id BN] <float32>
| | | | | | |Constant{0.0} [id BO] <float32>
| | | | | |GpuFromHost [id CR] <CudaNdarrayType(float32, 4D)> ''
| | | | | |X2 [id CS] <TensorType(float32, 4D)>
| | | | |CudaNdarrayConstant{error while transferring the value: error (an illegal memory access was encountered)copying data to host} [id CT] <CudaNdarrayType(float32, (True, True, True, True))>
| | | |GpuDimShuffle{x,x,x,x} [id CU] <CudaNdarrayType(float32, (True, True, True, True))> ''
| | | |GpuElemwise{true_div,no_inplace} [id CV] <CudaNdarrayType(float32, scalar)> ''
| | | |GpuElemwise{true_div,no_inplace} [id CW] <CudaNdarrayType(float32, scalar)> ''
| | | | |GpuElemwise{true_div,no_inplace} [id CX] <CudaNdarrayType(float32, scalar)> ''
| | | | | |GpuElemwise{true_div,no_inplace} [id CY] <CudaNdarrayType(float32, scalar)> ''
| | | | | | |CudaNdarrayConstant{error while transferring the value: error (an illegal memory access was encountered)copying data to host} [id CZ] <CudaNdarrayType(float32, scalar)>
| | | | | | |GpuSubtensor{int64} [id DA] <CudaNdarrayType(float32, scalar)> ''
| | | | | | |GpuFromHost [id DB] <CudaNdarrayType(float32, vector)> ''
| | | | | | | |Elemwise{Cast{float32}} [id DC] <TensorType(float32, vector)> ''
| | | | | | | |MakeVector{dtype='int64'} [id DD] <TensorType(int64, vector)> ''
| | | | | | | |Shape_i{0} [id S] <TensorType(int64, scalar)> ''
| | | | | | | |TensorConstant{1} [id DE] <TensorType(int64, scalar)>
| | | | | | | |TensorConstant{5} [id DF] <TensorType(int64, scalar)>
| | | | | | | |TensorConstant{5} [id DF] <TensorType(int64, scalar)>
| | | | | | |Constant{3} [id DG] <int64>
| | | | | |GpuSubtensor{int64} [id DH] <CudaNdarrayType(float32, scalar)> ''
| | | | | |GpuFromHost [id DB] <CudaNdarrayType(float32, vector)> ''
| | | | | |Constant{2} [id DI] <int64>
| | | | |GpuSubtensor{int64} [id DJ] <CudaNdarrayType(float32, scalar)> ''
| | | | |GpuFromHost [id DB] <CudaNdarrayType(float32, vector)> ''
| | | | |Constant{1} [id CO] <int64>
| | | |GpuSubtensor{int64} [id DK] <CudaNdarrayType(float32, scalar)> ''
| | | |GpuFromHost [id DB] <CudaNdarrayType(float32, vector)> ''
| | | |Constant{0} [id CM] <int64>
| | |CudaNdarrayConstant{error while transferring the value: error (an illegal memory access was encountered)copying data to host} [id CT] <CudaNdarrayType(float32, (True, True, True, True))>
| |GpuElemwise{pow,no_inplace} [id DL] <CudaNdarrayType(float32, 4D)> ''
| |GpuElemwise{sub,no_inplace} [id G] <CudaNdarrayType(float32, 4D)> ''
| |CudaNdarrayConstant{error while transferring the value: error (an illegal memory access was encountered)copying data to host} [id DM] <CudaNdarrayType(float32, (True, True, True, True))>
|GpuContiguous [id I] <CudaNdarrayType(float32, 4D)> ''
|GpuAllocEmpty [id DN] <CudaNdarrayType(float32, 4D)> ''
| |Shape_i{0} [id S] <TensorType(int64, scalar)> ''
| |Shape_i{0} [id T] <TensorType(int64, scalar)> ''
| |Elemwise{add,no_inplace} [id DO] <TensorType(int64, scalar)> ''
| | |Elemwise{int_div,no_inplace} [id DP] <TensorType(int64, scalar)> ''
| | | |Elemwise{sub,no_inplace} [id DQ] <TensorType(int64, scalar)> ''
| | | | |TensorConstant{7} [id DR] <TensorType(int64, scalar)>
| | | | |Shape_i{2} [id BA] <TensorType(int64, scalar)> ''
| | | |TensorConstant{1} [id BB] <TensorType(int8, scalar)>
| | |TensorConstant{1} [id BB] <TensorType(int8, scalar)>
| |Elemwise{add,no_inplace} [id DS] <TensorType(int64, scalar)> ''
| |Elemwise{int_div,no_inplace} [id DT] <TensorType(int64, scalar)> ''
| | |Elemwise{sub,no_inplace} [id DU] <TensorType(int64, scalar)> ''
| | | |TensorConstant{7} [id DR] <TensorType(int64, scalar)>
| | | |Shape_i{3} [id BH] <TensorType(int64, scalar)> ''
| | |TensorConstant{1} [id BB] <TensorType(int8, scalar)>
| |TensorConstant{1} [id BB] <TensorType(int8, scalar)>
|GpuDnnConvDesc{border_mode=(1, 1), subsample=(1, 1), conv_mode='conv', precision='float32'} [id DV] <CDataType{cudnnConvolutionDescriptor_t}> ''
| |MakeVector{dtype='int64'} [id DD] <TensorType(int64, vector)> ''
| |MakeVector{dtype='int64'} [id BL] <TensorType(int64, vector)> ''
|Constant{1.0} [id BN] <float32>
|Constant{0.0} [id BO] <float32>
Storage map footprint:
- GpuDimShuffle{1,0}.0, Shape: (128, 256), ElemSize: 4 Byte(s), TotalSize: 131072 Byte(s)
- W, Shared Input, Shape: (256, 128), ElemSize: 4 Byte(s), TotalSize: 131072 Byte(s)
- GpuElemwise{second,no_inplace}.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuAllocEmpty.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuDnnConv{algo='small', inplace=False}.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuElemwise{add,no_inplace}.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuContiguous.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuAllocEmpty.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuDnnPoolGrad{mode='max'}.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuDnnPoolGrad{mode='max'}.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuContiguous.0, Shape: (1, 64, 5, 5), ElemSize: 4 Byte(s), TotalSize: 6400 Byte(s)
- GpuContiguous.0, Shape: (64, 1, 3, 3), ElemSize: 4 Byte(s), TotalSize: 2304 Byte(s)
- W, Shared Input, Shape: (64, 1, 3, 3), ElemSize: 4 Byte(s), TotalSize: 2304 Byte(s)
- GpuDot22.0, Shape: (1, 256), ElemSize: 4 Byte(s), TotalSize: 1024 Byte(s)
- GpuContiguous.0, Shape: (1, 64, 2, 2), ElemSize: 4 Byte(s), TotalSize: 1024 Byte(s)
- GpuElemwise{second,no_inplace}.0, Shape: (1, 64, 2, 2), ElemSize: 4 Byte(s), TotalSize: 1024 Byte(s)
- GpuReshape{4}.0, Shape: (1, 64, 2, 2), ElemSize: 4 Byte(s), TotalSize: 1024 Byte(s)
- GpuContiguous.0, Shape: (1, 64, 2, 2), ElemSize: 4 Byte(s), TotalSize: 1024 Byte(s)
- GpuContiguous.0, Shape: (1, 64, 2, 2), ElemSize: 4 Byte(s), TotalSize: 1024 Byte(s)
- GpuReshape{2}.0, Shape: (1, 256), ElemSize: 4 Byte(s), TotalSize: 1024 Byte(s)
- GpuDnnPool{mode='max'}.0, Shape: (1, 64, 2, 2), ElemSize: 4 Byte(s), TotalSize: 1024 Byte(s)
- GpuDot22.0, Shape: (1, 128), ElemSize: 4 Byte(s), TotalSize: 512 Byte(s)
- GpuElemwise{second,no_inplace}.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuContiguous.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuContiguous.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuElemwise{mul,no_inplace}.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuFromHost.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- X, Input, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- X2, Input, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuDnnConvGradI{algo='none', inplace=False}.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuFromHost.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuElemwise{mul,no_inplace}.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuElemwise{sub,no_inplace}.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuElemwise{pow,no_inplace}.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuElemwise{pow,no_inplace}.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- GpuAllocEmpty.0, Shape: (1, 1, 5, 5), ElemSize: 4 Byte(s), TotalSize: 100 Byte(s)
- MakeVector{dtype='int64'}.0, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- MakeVector{dtype='int64'}.0, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- MakeVector{dtype='int64'}.0, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- MakeVector{dtype='int64'}.0, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- MakeVector{dtype='int64'}.0, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- GpuShape.0, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
- TensorConstant{(2,) of 0}, Shape: (2,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
- Subtensor{int64::}.0, Shape: (2,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
- TensorConstant{(2,) of 3}, Shape: (2,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
- MakeVector{dtype='int64'}.0, Shape: (2,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
- TensorConstant{(2,) of 2}, Shape: (2,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
- Elemwise{Cast{float32}}.0, Shape: (4,), ElemSize: 4 Byte(s), TotalSize: 16 Byte(s)
- GpuFromHost.0, Shape: (4,), ElemSize: 4 Byte(s), TotalSize: 16 Byte(s)
- TensorConstant{2}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{5}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Shape_i{2}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Shape_i{1}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{int_div,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Shape_i{3}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{2}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Shape_i{3}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{2}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Shape_i{2}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{sub,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{-1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{-2}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{int_div,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Shape_i{1}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{add,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{sub,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Shape_i{0}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Shape_i{0}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Subtensor{int64}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{sub,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{int_div,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{3}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{int_div,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{7}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Subtensor{int64}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{64}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Elemwise{sub,no_inplace}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- GpuElemwise{true_div,no_inplace}.0, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- GpuSubtensor{int64}.0, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- GpuSubtensor{int64}.0, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- CudaNdarrayConstant{error while transferring the value: error (an illegal memory access was encountered)copying data to host}, Shape: (1, 1, 1, 1), ElemSize: 4 Byte(s), TotalSize: 4 Byte(s)
- GpuElemwise{true_div,no_inplace}.0, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- Constant{1.0}, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- GpuSubtensor{int64}.0, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- CudaNdarrayConstant{error while transferring the value: error (an illegal memory access was encountered)copying data to host}, Shape: (1, 1, 1, 1), ElemSize: 4 Byte(s), TotalSize: 4 Byte(s)
- GpuSubtensor{int64}.0, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- GpuDimShuffle{x,x,x,x}.0, Shape: (1, 1, 1, 1), ElemSize: 4 Byte(s), TotalSize: 4 Byte(s)
- Constant{0.0}, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- CudaNdarrayConstant{error while transferring the value: error (an illegal memory access was encountered)copying data to host}, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- CudaNdarrayConstant{error while transferring the value: error (an illegal memory access was encountered)copying data to host}, Shape: (1, 1, 1, 1), ElemSize: 4 Byte(s), TotalSize: 4 Byte(s)
- GpuElemwise{true_div,no_inplace}.0, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- GpuElemwise{true_div,no_inplace}.0, Shape: (), ElemSize: 4 Byte(s), TotalSize: 4.0 Byte(s)
- TensorConstant{1}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
- TensorConstant{2}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
- TensorConstant{-1}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
TotalSize: 183575.0 Byte(s) 0.000 GB
TotalSize inputs: 133747.0 Byte(s) 0.000 GB
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
========= RACECHECK SUMMARY: 6 hazards displayed (6 errors, 0 warnings)
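
For context, below is a minimal, hypothetical sketch of what a script like grad2.py might contain, reconstructed from the debugprint above: a 3x3 convolution with pad 1, a 3x3/stride-2 max pooling, a 256-to-128 dense layer, and a second cost built on the gradient w.r.t. the input image ("double backprop"), whose own gradient produces the GpuDnnConv node named in the traceback. All variable names, layer sizes, nonlinearities, and the exact cost are assumptions; only the overall structure is inferred from the trace.

# Hypothetical reconstruction of grad2.py, inferred from the debugprint above.
# Layer shapes are taken from the reported input shapes and storage map
# ((1, 1, 5, 5) image, (64, 1, 3, 3) conv kernel, (256, 128) dense weights);
# everything else is an assumption.
import numpy as np
import theano
import theano.tensor as T
import lasagne

X = T.tensor4('X')
X2 = T.tensor4('X2')

l_in = lasagne.layers.InputLayer((None, 1, 5, 5), input_var=X)
l_conv = lasagne.layers.Conv2DLayer(l_in, num_filters=64, filter_size=(3, 3), pad=1)
l_pool = lasagne.layers.MaxPool2DLayer(l_conv, pool_size=(3, 3), stride=2)
l_out = lasagne.layers.DenseLayer(l_pool, num_units=128)

cost = lasagne.layers.get_output(l_out).sum()

# First-order gradient of the cost w.r.t. the input image (the
# GpuDnnConvGradI node in the debugprint) ...
grad_x = theano.grad(cost, X)

# ... and a second cost built on that gradient; differentiating it again
# through the conv and pooling layers is what triggers the failing
# GpuDnnConv / GpuDownsampleFactorMaxGradGrad nodes in the trace.
cost2 = T.mean((grad_x - X2) ** 2)
grad2 = theano.grad(cost2, [l_conv.W, l_out.W])

f = theano.function([X, X2], grad2)

x = np.random.rand(1, 1, 5, 5).astype(np.float32)
print(f(x, x))  # the original traceback shows f(X, X) at grad2.py line 27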