|
GpuDnnBatchNorm has properties: ['inplace_running_mean', 'inplace_running_var', 'inplace_output']
|
|
|
#################### |
|
debugprint before compilation: out and grads[0] |
|
GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=False, inplace_running_var=False, inplace_output=False}.0 [id A] ''
 |GpuContiguous [id B] ''
 | |GpuFromHost<None> [id C] ''
 |   |x [id D]
 |GpuContiguous [id E] ''
 | |GpuFromHost<None> [id F] ''
 |   |scale [id G]
 |GpuContiguous [id H] ''
 | |GpuFromHost<None> [id I] ''
 |   |bias [id J]
 |Constant{0.005} [id K]
 |Constant{0.3} [id L]
 |GpuContiguous [id M] ''
 | |GpuFromHost<None> [id N] ''
 |   |running_mean [id O]
 |GpuContiguous [id P] ''
   |GpuFromHost<None> [id Q] ''
     |running_var [id R]
|
HostFromGpu(gpuarray) [id S] ''
 |GpuDnnBatchNormGrad{mode='per-activation'}.0 [id T] ''
   |GpuContiguous [id B] ''
   |GpuFromHost<None> [id U] ''
   | |dy [id V]
   |GpuContiguous [id E] ''
   |GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=False, inplace_running_var=False, inplace_output=False}.1 [id W] ''
   | |GpuContiguous [id B] ''
   | |GpuContiguous [id E] ''
   | |GpuContiguous [id H] ''
   | |Constant{0.005} [id K]
   | |Constant{0.3} [id L]
   | |GpuContiguous [id M] ''
   | |GpuContiguous [id P] ''
   |GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=False, inplace_running_var=False, inplace_output=False}.2 [id W] ''
   |Constant{0.005} [id K]
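
For reference, the semantics of the GpuDnnBatchNorm node above can be sketched in plain NumPy. This is an illustrative model of what the Op computes, not the cuDNN implementation; the function name is mine, and `epsilon=0.005` / `factor=0.3` match the two Constants in the graph. The running-variance correction factor follows the documented behavior of Theano's `dnn_batch_normalization_train` (unbiased variance), which is an assumption worth double-checking.

```python
import numpy as np

def batchnorm_train(x, scale, bias, running_mean, running_var,
                    epsilon=0.005, factor=0.3):
    """Rough NumPy model of GpuDnnBatchNorm (training, per-activation):
    normalize over the batch axis and update the running averages."""
    m = x.shape[0]
    mean = x.mean(axis=0, keepdims=True)            # output .1
    var = x.var(axis=0, keepdims=True)
    invstd = 1.0 / np.sqrt(var + epsilon)           # output .2
    out = (x - mean) * invstd * scale + bias        # output .0
    # These two updates are what inplace_running_mean / inplace_running_var
    # let the Op perform directly on its own input buffers.
    # (Assumed: the running variance uses the unbiased estimate, m/(m-1).)
    new_mean = (1 - factor) * running_mean + factor * mean
    new_var = (1 - factor) * running_var + factor * var * m / (m - 1)
    return out, mean, invstd, new_mean, new_var
```

With `scale=1` and `bias=0`, the normalized output has zero mean per activation, and the running mean moves 30% of the way toward the batch mean, as the `Constant{0.3}` input dictates.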
|
|
|
#################### |
|
compiling |
|
|
|
#################### |
|
debugprint after compilation |
|
Elemwise{Add}[(0, 0)] [id A] '' 17
 |InplaceDimShuffle{x,0,1,2,3,4} [id B] '' 14
 | |HostFromGpu(gpuarray) [id C] '' 12
 |   |GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=True, inplace_running_var=True, inplace_output=False}.0 [id D] '' 11
 |     |GpuContiguous [id E] '' 10
 |     | |GpuFromHost<None> [id F] '' 4
 |     |   |x [id G]
 |     |GpuContiguous [id H] '' 9
 |     | |GpuFromHost<None> [id I] '' 3
 |     |   |scale [id J]
 |     |GpuContiguous [id K] '' 8
 |     | |GpuFromHost<None> [id L] '' 2
 |     |   |bias [id M]
 |     |Constant{0.005} [id N]
 |     |Constant{0.3} [id O]
 |     |GpuContiguous [id P] '' 7
 |     | |GpuFromHost<None> [id Q] '' 1
 |     |   |running_mean [id R]
 |     |GpuContiguous [id S] '' 6
 |       |GpuFromHost<None> [id T] '' 0
 |         |running_var [id U]
 |InplaceDimShuffle{x,0,1,2,3,4} [id V] '' 16
   |HostFromGpu(gpuarray) [id W] '' 15
     |GpuDnnBatchNormGrad{mode='per-activation'}.0 [id X] '' 13
       |GpuContiguous [id E] '' 10
       |GpuFromHost<None> [id Y] '' 5
       | |dy [id Z]
       |GpuContiguous [id H] '' 9
       |GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=True, inplace_running_var=True, inplace_output=False}.1 [id D] '' 11
       |GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=True, inplace_running_var=True, inplace_output=False}.2 [id D] '' 11
       |Constant{0.005} [id N]
|
|
|
#################### |
|
toposort order after compilation |
|
0: GpuFromHost<None>(running_var)
1: GpuFromHost<None>(running_mean)
2: GpuFromHost<None>(bias)
3: GpuFromHost<None>(scale)
4: GpuFromHost<None>(x)
5: GpuFromHost<None>(dy)
6: GpuContiguous(GpuFromHost<None>.0)
7: GpuContiguous(GpuFromHost<None>.0)
8: GpuContiguous(GpuFromHost<None>.0)
9: GpuContiguous(GpuFromHost<None>.0)
10: GpuContiguous(GpuFromHost<None>.0)
11: GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=True, inplace_running_var=True, inplace_output=False}(GpuContiguous.0, GpuContiguous.0, GpuContiguous.0, Constant{0.005}, Constant{0.3}, GpuContiguous.0, GpuContiguous.0)
12: HostFromGpu(gpuarray)(GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=True, inplace_running_var=True, inplace_output=False}.0)
13: GpuDnnBatchNormGrad{mode='per-activation'}(GpuContiguous.0, GpuFromHost<None>.0, GpuContiguous.0, GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=True, inplace_running_var=True, inplace_output=False}.1, GpuDnnBatchNorm{mode='per-activation', running_averages=True, inplace_running_mean=True, inplace_running_var=True, inplace_output=False}.2, Constant{0.005})
14: InplaceDimShuffle{x,0,1,2,3,4}(HostFromGpu(gpuarray).0)
15: HostFromGpu(gpuarray)(GpuDnnBatchNormGrad{mode='per-activation'}.0)
16: InplaceDimShuffle{x,0,1,2,3,4}(HostFromGpu(gpuarray).0)
17: Elemwise{Add}[(0, 0)](InplaceDimShuffle{x,0,1,2,3,4}.0, InplaceDimShuffle{x,0,1,2,3,4}.0)
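
The scheduling guarantee that matters for the inplace flags is that node 11 executes before node 13, which consumes its outputs .1 and .2; that follows from topological ordering alone. A toy sketch with Kahn's algorithm over a hand-transcribed dependency map of the compiled graph (the `deps` dict is my reading of the toposort above, not Theano output):

```python
from collections import defaultdict

# node -> the nodes it consumes, keyed by the toposort numbers above
deps = {
    0: [], 1: [], 2: [], 3: [], 4: [], 5: [],
    6: [0], 7: [1], 8: [2], 9: [3], 10: [4],
    11: [10, 9, 8, 7, 6],   # GpuDnnBatchNorm(x, scale, bias, ..., rm, rv)
    12: [11],               # HostFromGpu of output .0
    13: [10, 5, 9, 11],     # GpuDnnBatchNormGrad reads outputs .1/.2 of 11
    14: [12], 15: [13], 16: [15],
    17: [14, 16],
}

def toposort(deps):
    """Kahn's algorithm: repeatedly emit nodes whose inputs are all done."""
    indegree = {n: len(d) for n, d in deps.items()}
    consumers = defaultdict(list)
    for n, d in deps.items():
        for m in d:
            consumers[m].append(n)
    ready = sorted(n for n, k in indegree.items() if k == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for c in consumers[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order

order = toposort(deps)
assert order.index(11) < order.index(13)  # batch norm runs before its grad
```

Any valid topological order must place 11 before 13, so the grad node always sees the batch-norm outputs after they have been produced.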
|
|
|
#################### |
|
some analysis |
|
GpuDnnBatchNorm has properties: ['inplace_running_mean', 'inplace_running_var', 'inplace_output']

- There is one GpuDnnBatchNorm Op and one GpuDnnBatchNormGrad Op.
- Both nodes share the same input[0] (GpuContiguous [id E], the contiguous copy of x).
- The GpuDnnBatchNorm Op does not work inplace on this input[0] (inplace_output=False); only the running-average inputs are overwritten (inplace_running_mean=True, inplace_running_var=True).
- The GpuDnnBatchNorm Op runs before the GpuDnnBatchNormGrad Op (positions 11 and 13 in the toposort).

Problem? No, this should work: the grad Op reads input[0], scale, and the batch-norm outputs .1/.2 (saved mean and inverse standard deviation), none of which are clobbered by the inplace running-average updates.
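
One way to convince yourself of the conclusion above, using hypothetical NumPy buffers standing in for the GPU arrays (not Theano code): apply the in-place running-average updates first, then check that every buffer the grad node would read is untouched.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 4))          # shared input[0] of both nodes
scale = np.ones((1, 4))
running_mean = np.zeros((1, 4))
running_var = np.ones((1, 4))

# Forward pass: outputs .1/.2 live in fresh buffers...
mean = x.mean(axis=0, keepdims=True)
invstd = 1.0 / np.sqrt(x.var(axis=0, keepdims=True) + 0.005)

snapshot = (x.copy(), scale.copy(), mean.copy(), invstd.copy())

# ...while the in-place updates destroy ONLY the running-average buffers
# (this is what inplace_running_mean=True / inplace_running_var=True permit).
running_mean *= 0.7
running_mean += 0.3 * mean
running_var *= 0.7
running_var += 0.3 * x.var(axis=0, keepdims=True)

# The grad node runs afterwards and reads x, scale, mean and invstd --
# all still bit-identical to their pre-update values:
for now, before in zip((x, scale, mean, invstd), snapshot):
    assert np.array_equal(now, before)
```

If the optimizer had instead set inplace_output=True while another consumer of input[0] was still pending, this check would fail; that is the case the inplace optimization has to rule out.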