@Mezzano
Created July 29, 2016 11:56
This file has been truncated.
args: ./deepcl_unittests --gtest_filter=-DATA*:SLOW*
Note: Google Test filter = -DATA*:SLOW*
[==========] Running 158 tests from 29 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from testClBlas
[ RUN ] testClBlas.basic
DEBUG TANGUY: 18200632
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
clblas teardown
unknown file: Failure
C++ exception with description "clblasSgemm() failed with -11" thrown in the test body.
[ FAILED ] testClBlas.basic (77 ms)
[ RUN ] testClBlas.transA
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
1 2 9
3 7 5
initializing clblas
clblas teardown
unknown file: Failure
C++ exception with description "clblasSgemm() failed with -11" thrown in the test body.
[ FAILED ] testClBlas.transA (52 ms)
[ RUN ] testClBlas.transB
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
3
-1
initializing clblas
clblas teardown
unknown file: Failure
C++ exception with description "clblasSgemm() failed with -11" thrown in the test body.
[ FAILED ] testClBlas.transB (55 ms)
[ RUN ] testClBlas.colMajor
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
clblas teardown
unknown file: Failure
C++ exception with description "clblasSgemm() failed with -11" thrown in the test body.
[ FAILED ] testClBlas.colMajor (50 ms)
[ RUN ] testClBlas.colMajor2
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
clblas teardown
unknown file: Failure
C++ exception with description "clblasSgemm() failed with -11" thrown in the test body.
[ FAILED ] testClBlas.colMajor2 (48 ms)
[ RUN ] testClBlas.colMajorTransA
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
clblas teardown
unknown file: Failure
C++ exception with description "clblasSgemm() failed with -11" thrown in the test body.
[ FAILED ] testClBlas.colMajorTransA (43 ms)
[ RUN ] testClBlas.colMajorTransB
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
clblas teardown
unknown file: Failure
C++ exception with description "clblasSgemm() failed with -11" thrown in the test body.
[ FAILED ] testClBlas.colMajorTransB (51 ms)
[----------] 7 tests from testClBlas (377 ms total)
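The status codes in these failures are raw OpenCL error values. -11 is CL_BUILD_PROGRAM_FAILURE; clBLAS's clblasStatus values reuse the CL codes, so clblasSgemm() returning -11 most likely means clBLAS could not build one of its own kernels for this device (an inference, not something the log states). The -45 that clCreateKernel reports further down is CL_INVALID_PROGRAM_EXECUTABLE, i.e. there was no successfully built program to create a kernel from. A minimal lookup covering only the codes relevant here; the function name clErrorName is hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps the few OpenCL status codes relevant to this log
// to their symbolic names from CL/cl.h. Anything else is reported as unknown.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        default:  return "unknown (" + std::to_string(code) + ")";
    }
}
```

For any other value, the full table is in the CL/cl.h header.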
[----------] 1 test from testDeepCL
[ RUN ] testDeepCL.basic
unknown file: Failure
C++ exception with description "No devices found" thrown in the test body.
[ FAILED ] testDeepCL.basic (0 ms)
[----------] 1 test from testDeepCL (0 ms total)
[----------] 23 tests from testupdateweights
[ RUN ] testupdateweights.conv1
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
layer 0:InputLayer{ outputPlanes=2 outputSize=5 }
layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
layer 2:SquareLossLayer{}
layer 0:InputLayer{ outputPlanes=2 outputSize=5 }
layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
layer 2:SquareLossLayer{}
batchSize: 4
inputtotalsize=200 outputTotalSize=72
layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
weightsize=36 biassize=0
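The sizes printed for testupdateweights.conv1 follow directly from the layer dimensions. A small sketch that reproduces them, assuming the unpadded rule from the kernel comments (new imagesize = oldimagesize - filtersize + 1) and ignoring skip, which is 0 in this test; the ConvGeometry struct is hypothetical and is not DeepCL's LayerDimensions:

```cpp
#include <cassert>

// Hypothetical geometry helper for an unpadded (padZeros=0), skip=0 convolution.
struct ConvGeometry {
    int batchSize, inputPlanes, inputSize, numFilters, filterSize;

    // "valid" convolution: the filter must fit entirely inside the image
    int outputSize() const { return inputSize - filterSize + 1; }
    // floats in the whole input batch: [example][plane][row][col]
    int inputTotalSize() const { return batchSize * inputPlanes * inputSize * inputSize; }
    // floats in the whole output batch: [example][filter][row][col]
    int outputTotalSize() const {
        int o = outputSize();
        return batchSize * numFilters * o * o;
    }
    // one filterSize x filterSize kernel per (filter, input plane) pair
    int weightSize() const { return numFilters * inputPlanes * filterSize * filterSize; }
};
```

With batchSize=4, inputPlanes=2, inputSize=5, numFilters=2 and filterSize=3 this gives outputSize 3, inputTotalSize 200, outputTotalSize 72 and weightSize 36, matching the log.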
statefultimer v0.7
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
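To check what the forward1 kernel should produce once it builds, a sequential CPU reference can help. This sketch covers only the configuration in this test (gPadZeros=0 and an odd filter, so gEven=0), reuses the kernel's globalId decomposition, and substitutes i = u + gHalfFilterSize, j = v + gHalfFilterSize for the kernel's centred loop variables. The function forwardReference is hypothetical and not part of DeepCL:

```cpp
#include <cassert>
#include <vector>

// CPU reference for convolve_imagecubes_float2 with gPadZeros=0, gEven=0.
// Layouts follow the kernel comments:
//   inputs  [example][inPlane][row][col]
//   filters [filter][inPlane][filterRow][filterCol]
//   output  [example][filter][outRow][outCol]
// Each outer iteration plays the role of one work-item (globalId).
std::vector<float> forwardReference(const std::vector<float>& inputs,
                                    const std::vector<float>& filters,
                                    int numExamples, int numInputPlanes, int inputSize,
                                    int numFilters, int filterSize) {
    const int outputSize = inputSize - filterSize + 1;
    const int outputSizeSquared = outputSize * outputSize;
    const int inputSizeSquared = inputSize * inputSize;
    const int filterSizeSquared = filterSize * filterSize;
    std::vector<float> output(numExamples * numFilters * outputSizeSquared, 0.0f);
    for (int globalId = 0; globalId < (int)output.size(); globalId++) {
        // same index decomposition as the kernel
        int outputImage2Id = globalId / outputSizeSquared;
        int exampleId = outputImage2Id / numFilters;
        int filterId = outputImage2Id % numFilters;
        int localId = globalId % outputSizeSquared;
        int outputRow = localId / outputSize;
        int outputCol = localId % outputSize;
        float sum = 0;
        for (int p = 0; p < numInputPlanes; p++) {
            const float* inputPlane = &inputs[(exampleId * numInputPlanes + p) * inputSizeSquared];
            const float* filterPlane = &filters[(filterId * numInputPlanes + p) * filterSizeSquared];
            for (int i = 0; i < filterSize; i++)
                for (int j = 0; j < filterSize; j++)
                    sum += inputPlane[(outputRow + i) * inputSize + (outputCol + j)]
                         * filterPlane[i * filterSize + j];
        }
        output[globalId] = sum;
    }
    return output;
}
```

Like the kernel, this computes a cross-correlation: the filter is not flipped.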
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 1: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
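Every kernel build here fails on the compiler option string itself, before any source line is reported. One untested hypothesis is that this driver's option parser rejects the separated "-D name=value" form (the OpenCL specification documents "-D name" and "-D name=definition" as valid build options) or the leading space. A sketch of a workaround that joins each -D with its argument and normalises whitespace; joinDefines is a hypothetical helper, and nothing in this log confirms that it fixes the error:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical workaround: rewrite "-D name=value" as "-Dname=value" and drop
// leading/duplicate whitespace. Untested against the Vivante driver.
std::string joinDefines(const std::string& options) {
    std::istringstream in(options);
    std::string token, out;
    bool pendingD = false;  // saw a bare "-D", waiting for its argument
    while (in >> token) {
        if (token == "-D") { pendingD = true; continue; }
        if (!out.empty()) out += ' ';
        out += pendingD ? "-D" + token : token;
        pendingD = false;
    }
    if (pendingD) {  // a trailing bare "-D" is kept as-is
        if (!out.empty()) out += ' ';
        out += "-D";
    }
    return out;
}
```

Tokens already in joined form, such as the "-DgWorkgroupSize=32" passed to forward2, pass through unchanged.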
forward try kernel 2
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
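copyLocal() spreads a global-to-local copy of N floats across the workgroup in strided rounds of gWorkgroupSize. The sequential simulation below, a sketch in which simulateCopyLocal is hypothetical, runs each local id in turn and counts writes per element to show that the range is covered exactly once. On the device the work-items run concurrently, which is why the kernel places barrier(CLK_LOCAL_MEM_FENCE) between the copy and the reads:

```cpp
#include <cassert>
#include <vector>

// Sequential simulation of copyLocal(): work-item localId copies elements
// localId, localId + W, localId + 2W, ... for W = workgroupSize.
// Returns how many times each element was written, to check coverage.
std::vector<int> simulateCopyLocal(std::vector<float>& target, const std::vector<float>& source,
                                   int N, int workgroupSize) {
    std::vector<int> writes(N, 0);
    int numLoops = (N + workgroupSize - 1) / workgroupSize;  // ceil(N / W)
    for (int localId = 0; localId < workgroupSize; localId++) {
        for (int loop = 0; loop < numLoops; loop++) {
            int offset = loop * workgroupSize + localId;
            if (offset < N) {
                target[offset] = source[offset];
                writes[offset]++;
            }
        }
    }
    return writes;
}
```

With this test's gInputSizeSquared=25 and gWorkgroupSize=32, a single round suffices and seven work-items copy nothing.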
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
ForwardAuto: kernel 2: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
... not valid
forward try kernel 3
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
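The local-memory estimates in the forward2/forward3 comments are plain arithmetic on 4-byte floats, and forward_3_by_n_outplane assigns one workgroup per (example, output plane) pair. A sketch that reproduces both; the helper names are hypothetical:

```cpp
#include <cassert>

// Local buffers per workgroup, assuming 4-byte floats:
//   _upstreamImage: inputSize^2 floats
//   _filterCube:    inputPlanes * filterSize^2 floats
constexpr long floatBytes = 4;
constexpr long filterCubeBytes(long filterSize, long inputPlanes) {
    return filterSize * filterSize * inputPlanes * floatBytes;
}
constexpr long upstreamImageBytes(long inputSize) {
    return inputSize * inputSize * floatBytes;
}

// Workgroup id -> (example n, output plane), as in forward_3_by_n_outplane.
struct WorkgroupCoords { int n, outPlane; };
WorkgroupCoords decodeWorkgroup(int workgroupId, int numFilters) {
    return { workgroupId / numFilters, workgroupId % numFilters };
}
```

For the 5x5 filters with 32 input planes used in the comments this gives 3,200 bytes per filter cube and 102,400 bytes for all 32 cubes; for this test's geometry each workgroup needs only 100 bytes for _upstreamImage and 72 bytes for _filterCube.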
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 3: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
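The forward3 kernel above assigns each workgroup to one [n][outPlane] and each local id to one [outRow][outCol]. It then loops over input planes and filter rows and columns. In this run none of the GPU kernels build, so a plain C reference of the same arithmetic can help check expected outputs on the host. This is a sketch and not DeepCL code. `forward_reference` is a hypothetical name, and the sketch assumes stride 1. It also bounds-checks the input row and column directly instead of precomputing the kernel's `minu`/`maxu`/`minv`/`maxv`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for the convolution computed by forward3.
 * Layouts, from the kernel comments:
 *   images  [n][inPlane][row][col]
 *   filters [filter][inPlane][filterRow][filterCol]
 *   output  [n][filter][outRow][outCol]
 * Assumes stride 1. `even` plays the role of gEven. */
void forward_reference(int batchSize, int inputPlanes, int inputSize,
                       int numFilters, int filterSize, int padZeros,
                       const float *images, const float *filters, float *output) {
    int half = filterSize / 2;
    int even = filterSize % 2 == 0;
    int outputSize = padZeros ? (even ? inputSize + 1 : inputSize)
                              : inputSize - filterSize + 1;
    assert(outputSize > 0);
    for (int n = 0; n < batchSize; n++) {
        for (int f = 0; f < numFilters; f++) {
            for (int outRow = 0; outRow < outputSize; outRow++) {
                for (int outCol = 0; outCol < outputSize; outCol++) {
                    float sum = 0;
                    for (int p = 0; p < inputPlanes; p++) {
                        const float *plane = images + ((size_t)n * inputPlanes + p) * inputSize * inputSize;
                        const float *filterPlane = filters + ((size_t)f * inputPlanes + p) * filterSize * filterSize;
                        for (int u = -half; u <= half - even; u++) {
                            /* same row mapping as the kernel's inputRow */
                            int inRow = padZeros ? outRow + u : outRow + u + half;
                            if (inRow < 0 || inRow >= inputSize) continue; /* implicit zero padding */
                            for (int v = -half; v <= half - even; v++) {
                                int inCol = padZeros ? outCol + v : outCol + v + half;
                                if (inCol < 0 || inCol >= inputSize) continue;
                                sum += plane[inRow * inputSize + inCol]
                                     * filterPlane[(u + half) * filterSize + (v + half)];
                            }
                        }
                    }
                    output[(((size_t)n * numFilters + f) * outputSize + outRow) * outputSize + outCol] = sum;
                }
            }
        }
    }
}
```

Like the kernel, this computes a cross-correlation (the filter is not flipped). With a 3x3 all-ones filter, `padZeros=1`, and an input of 1..9, the corner output is 1+2+4+5 = 12.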
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
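Every kernel build in this run fails on a compiler option string of the same shape. `clCreateKernel` then reports -45 (`CL_INVALID_PROGRAM_EXECUTABLE`), which is consistent with there being no successfully built program. The log does not say which token the Vivante compiler rejects. To reproduce the failure outside DeepCL, the string can be rebuilt from the layer dimensions.

The sketch below uses a hypothetical name, `build_options`, and copies the macro names and their order from the log. It makes three assumptions:

- `gMargin` equals `gHalfFilterSize` when `padZeros` is set.
- `gNumOutputPlanes` equals `numFilters`.
- `gOutputSize` uses stride 1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h> /* strcmp, for comparing against the logged string */

/* Hypothetical helper: rebuilds the " -D name=value" string seen in the build logs.
 * Assumptions (not confirmed by the log): gMargin = half filter size when padding,
 * gNumOutputPlanes = numFilters, stride 1 for gOutputSize, gSkip passed through. */
int build_options(char *buf, size_t len, int inputPlanes, int inputSize,
                  int numFilters, int filterSize, int padZeros, int skip) {
    int half = filterSize / 2;
    int even = filterSize % 2 == 0;
    int outputSize = padZeros ? (even ? inputSize + 1 : inputSize)
                              : inputSize - filterSize + 1;
    int margin = padZeros ? half : 0;
    int written = snprintf(buf, len,
        " -D gNumInputPlanes=%d -D gInputPlanes=%d -D gInputSize=%d -D gInputSizeSquared=%d"
        " -D gNumFilters=%d -D gFilterSize=%d -D gHalfFilterSize=%d -D gFilterSizeSquared=%d"
        " -D gNumOutputPlanes=%d -D gOutputPlanes=%d -D gOutputSize=%d -D gOutputSizeSquared=%d"
        " -D gPadZeros=%d -D gMargin=%d -D gEven=%d -D gSkip=%d",
        inputPlanes, inputPlanes, inputSize, inputSize * inputSize,
        numFilters, filterSize, half, filterSize * filterSize,
        numFilters, numFilters, outputSize, outputSize * outputSize,
        padZeros, margin, even, skip);
    assert(written >= 0 && (size_t)written < len); /* output was not truncated */
    return written;
}
```

Passing the result to `clBuildProgram` on the same device, with a trivial kernel, would show whether the option string alone triggers the error.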
forward try kernel 4
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
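In `forward_4_by_n_outplane_smallercache`, one output plane can span several workgroups. There are `gPixelsPerThread` of them per plane: `pixel` selects the slice, and `effectiveLocalId` gives the position within the whole plane. The same index arithmetic in C can be used to check which [n][outPlane][row][col] a given work-item handles. This is a sketch: the `ThreadCoords` struct and both function names are hypothetical. The rounding-up rule in `pixels_per_thread` is an assumption about how the host picks `gPixelsPerThread`; the log only shows `gPixelsPerThread=1` for a 3x3 output with `gWorkgroupSize=32`.

```c
#include <assert.h>

/* Coordinates one work-item of forward4 is responsible for. */
typedef struct {
    int n, outPlane, outputRow, outputCol;
    int active; /* 0 for spare work-items past the end of the plane */
} ThreadCoords;

/* Assumption: enough workgroups of workgroupSize to cover one output plane. */
int pixels_per_thread(int outputSizeSquared, int workgroupSize) {
    assert(workgroupSize > 0);
    return (outputSizeSquared + workgroupSize - 1) / workgroupSize;
}

/* Mirrors the kernel's effectiveWorkgroupId / pixel / effectiveLocalId arithmetic. */
ThreadCoords forward4_coords(int workgroupId, int localId, int workgroupSize,
                             int pixelsPerThread, int numFilters, int outputSize) {
    ThreadCoords c;
    int effectiveWorkgroupId = workgroupId / pixelsPerThread;
    int pixel = workgroupId % pixelsPerThread;
    int effectiveLocalId = localId + pixel * workgroupSize;
    c.n = effectiveWorkgroupId / numFilters;
    c.outPlane = effectiveWorkgroupId % numFilters;
    c.outputRow = effectiveLocalId / outputSize;
    c.outputCol = effectiveLocalId % outputSize;
    c.active = effectiveLocalId < outputSize * outputSize;
    return c;
}
```

For example, take a 10x10 output with 32 work-items per group, which gives four groups per plane. Workgroup 5, local id 3 lands at `n=0`, `outPlane=1`, row 3, column 5.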
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 4: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
forward try kernel 5
ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
... not valid
forward try kernel 6
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
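`forward_byinputplane` writes one partial sum per input plane, laid out as [n][filterId][outRow][outCol][inputPlane]. Its comments say the output must later be reduced over [inputPlane]. Because inputPlane is the innermost index, that reduction is a sum over contiguous runs of `numInputPlanes` floats. Below is a host-side C sketch; `reduce_over_input_planes` is a hypothetical name, and the log does not show where DeepCL performs this step.

```c
#include <assert.h>
#include <stddef.h>

/* partial: [n][filterId][outRow][outCol][inputPlane], as written by the kernel
 * output:  [n][filterId][outRow][outCol], the layout of the other forward kernels */
void reduce_over_input_planes(int batchSize, int numFilters, int outputSize,
                              int numInputPlanes, const float *partial, float *output) {
    assert(numInputPlanes > 0);
    size_t numOutputs = (size_t)batchSize * numFilters * outputSize * outputSize;
    for (size_t i = 0; i < numOutputs; i++) {
        /* the kernel's resultIndex equals i * numInputPlanes + inputPlaneId,
         * so the partial sums for output i are contiguous */
        float sum = 0;
        for (int p = 0; p < numInputPlanes; p++) {
            sum += partial[i * numInputPlanes + p];
        }
        output[i] = sum;
    }
}
```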
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 6: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
forward try kernel 7
... seems valid
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 3
12: //#define gFilterSize 3
13: //#define gSize 5
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 3;
25: index /= 3;
26: int h_out = index % 3;
27: int channel_in = index / 3;
28: int channel_out = channel_in * 3 * 3;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 3 + h_out) * 3 + w_out;
32: data_im += (channel_in * 5 + h_in) * 5 + w_in;
33: for (int i = 0; i < 3; ++i) {
34: for (int j = 0; j < 3; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 5 && w < 5) ?
38: data_im[i * 5 + j] : 0;
39: data_col += 3 * 3;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 5 + 0;
54: int h = (index / 5) % 5 + 0;
55: int c = index / (5 * 5);
56: // compute the start and end of the output
57: int w_col_start = (w < 3) ? 0 : (w - 3) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 3);
59: int h_col_start = (h < 3) ? 0 : (h - 3) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 3);
61:
62: int offset = (c * 3 * 3 + h * 3 + w) * 3 * 3;
63: int coeff_h_col = (1 - 1 * 3 * 3) * 3;
64: int coeff_w_col = (1 - 1 * 3 * 3);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
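The `im2col` kernel is Caffe's unfold with the sizes substituted as literals. Going by the commented-out defines, those are `gSize=5`, `gFilterSize=3`, `gColSize=3`, padding 0, and stride 1. The build log points at line 19, the `global float const * im_data` parameter.

Separately, the kernel advances `data_col`, `data_im`, and even the loop variable `index` in place inside `CL_KERNEL_LOOP`. That looks correct only if each work-item handles at most one index. The sequential C sketch below (`im2col_ref` is a hypothetical name) restores the sizes as parameters and recomputes every offset from scratch for each index. It produces the same `[channel*k*k + i*k + j][h_out][w_out]` layout.

```c
#include <assert.h>

/* Sequential im2col: data_im is [channels][size][size],
 * data_col is [channels * k * k][colSize][colSize]. */
void im2col_ref(const float *data_im, int channels, int size, int k,
                int pad, int stride, float *data_col) {
    int colSize = (size + 2 * pad - k) / stride + 1;
    assert(colSize > 0);
    int n = channels * colSize * colSize; /* one iteration per (channel, h_out, w_out) */
    for (int index = 0; index < n; index++) {
        int w_out = index % colSize;
        int h_out = (index / colSize) % colSize;
        int channel_in = index / (colSize * colSize);
        int h_in = h_out * stride - pad;
        int w_in = w_out * stride - pad;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                int h = h_in + i;
                int w = w_in + j;
                int row = (channel_in * k + i) * k + j; /* channel_out + i*k + j */
                data_col[(row * colSize + h_out) * colSize + w_out] =
                    (h >= 0 && w >= 0 && h < size && w < size)
                        ? data_im[(channel_in * size + h) * size + w]
                        : 0.0f;
            }
        }
    }
}
```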
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
ForwardAuto: kernel 7 this instance cant be used:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 3
12: //#define gFilterSize 3
13: //#define gSize 5
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 3;
25: index /= 3;
26: int h_out = index % 3;
27: int channel_in = index / 3;
28: int channel_out = channel_in * 3 * 3;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 3 + h_out) * 3 + w_out;
32: data_im += (channel_in * 5 + h_in) * 5 + w_in;
33: for (int i = 0; i < 3; ++i) {
34: for (int j = 0; j < 3; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 5 && w < 5) ?
38: data_im[i * 5 + j] : 0;
39: data_col += 3 * 3;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 5 + 0;
54: int h = (index / 5) % 5 + 0;
55: int c = index / (5 * 5);
56: // compute the start and end of the output
57: int w_col_start = (w < 3) ? 0 : (w - 3) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 3);
59: int h_col_start = (h < 3) ? 0 : (h - 3) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 3);
61:
62: int offset = (c * 3 * 3 + h * 3 + w) * 3 * 3;
63: int coeff_h_col = (1 - 1 * 3 * 3) * 3;
64: int coeff_w_col = (1 - 1 * 3 * 3);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
forward kernel 0: cannot be used
forward kernel 1: cannot be used
forward kernel 2: cannot be used
forward kernel 3: cannot be used
forward kernel 4: cannot be used
forward kernel 5: cannot be used
forward kernel 6: cannot be used
forward kernel 7: cannot be used
clblas teardown
unknown file: Failure
C++ exception with description "No valid forward implementations found" thrown in the test body.
[ FAILED ] testupdateweights.conv1 (147 ms)
[ RUN ] testupdateweights.conv1z
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
layer 0:InputLayer{ outputPlanes=2 outputSize=3 }
layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
layer 2:SquareLossLayer{}
layer 0:InputLayer{ outputPlanes=2 outputSize=3 }
layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
layer 2:SquareLossLayer{}
batchSize: 4
inputtotalsize=72 outputTotalSize=72
layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
weightsize=36 biassize=0
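The sizes printed for conv1z follow directly from the LayerDimensions line:

- 4 * 2 * 3 * 3 = 72 input floats
- 4 * 2 * 3 * 3 = 72 output floats
- 2 * 2 * 3 * 3 = 36 weights

Below is a small C sketch of that arithmetic; `conv_sizes` is a hypothetical name. It assumes stride 1 (`skip=0`) and one bias per filter when biased. The output-size rule follows the notes in cl/forward1.cl:

- Without padding, the output is inputSize - filterSize + 1.
- With padding, the size is kept for odd filters and grows by 1 for even ones.

```c
#include <assert.h>

typedef struct {
    int outputSize, inputTotalSize, outputTotalSize, weightSize, biasSize;
} ConvSizes;

/* Assumes stride 1; biasSize = numFilters when biased is an assumption. */
ConvSizes conv_sizes(int batchSize, int inputPlanes, int inputSize,
                     int numFilters, int filterSize, int padZeros, int biased) {
    ConvSizes s;
    assert(filterSize > 0);
    s.outputSize = padZeros ? (filterSize % 2 == 0 ? inputSize + 1 : inputSize)
                            : inputSize - filterSize + 1;
    s.inputTotalSize = batchSize * inputPlanes * inputSize * inputSize;
    s.outputTotalSize = batchSize * numFilters * s.outputSize * s.outputSize;
    s.weightSize = numFilters * inputPlanes * filterSize * filterSize;
    s.biasSize = biased ? numFilters : 0;
    return s;
}
```

The same rule gives `gOutputSize=3` for the conv1 build above (inputSize 5, filterSize 3, padZeros 0).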
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
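In convolve_imagecubes_float2 above, each work item computes one output value. Its flat get_global_id(0) is split by division and modulo into the [imageid][outplane][outrow][outcol] layout named in the kernel comments. The host-side C++ sketch below mirrors that arithmetic so it can be checked against the sizes in this log's option string (gNumFilters=2, gOutputSize=3). OutputCoord, decomposeGlobalId and composeGlobalId are hypothetical names, not DeepCL functions.

```cpp
#include <cassert>

// Hypothetical host-side mirror of the index math in convolve_imagecubes_float2.
struct OutputCoord {
    int exampleId;
    int filterId;
    int outputRow;
    int outputCol;
};

// Splits a flat global id laid out as [example][filter][row][col].
OutputCoord decomposeGlobalId(int globalId, int numFilters, int outputSize) {
    assert(globalId >= 0 && numFilters > 0 && outputSize > 0);
    const int outputSizeSquared = outputSize * outputSize;
    const int outputImage2Id = globalId / outputSizeSquared;  // which (example, filter) plane
    const int localId = globalId % outputSizeSquared;         // position inside that plane
    OutputCoord c;
    c.exampleId = outputImage2Id / numFilters;
    c.filterId = outputImage2Id % numFilters;
    c.outputRow = localId / outputSize;
    c.outputCol = localId % outputSize;
    return c;
}

// Inverse mapping: the index the kernel writes with output[globalId] = sum.
int composeGlobalId(const OutputCoord &c, int numFilters, int outputSize) {
    return ((c.exampleId * numFilters + c.filterId) * outputSize + c.outputRow) * outputSize
           + c.outputCol;
}
```

Because composeGlobalId inverts the split, each work item writes to its own [example][filter][row][col] slot, and no two work items write the same output element.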
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 1: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
... not valid
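The build log shows the device compiler rejecting the option string itself, so no program executable is produced. clCreateKernel then returns -45, which is CL_INVALID_PROGRAM_EXECUTABLE in the OpenCL headers. The -11 from clblasSgemm earlier in the log is CL_BUILD_PROGRAM_FAILURE. The OpenCL specification allows both `-D name` and `-D name=definition` with a space after `-D`. One hypothesis, which this log does not confirm, is that the Vivante compiler only accepts the compact `-Dname=value` form or trips over the leading space. The C++ sketch below shows a host-side rewrite that could be used to test that hypothesis. normalizeBuildOptions is a hypothetical helper, not part of DeepCL or EasyCL.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: rewrites "-D name=value" pairs as "-Dname=value" and collapses
// whitespace, including the leading space seen in this log. Whether the Vivante
// compiler accepts the result is an untested assumption.
std::string normalizeBuildOptions(const std::string &options) {
    std::istringstream in(options);
    std::string token;
    std::string result;
    bool pendingDefine = false;  // true right after a bare "-D" token
    while (in >> token) {
        if (token == "-D") {
            pendingDefine = true;
            continue;
        }
        if (!result.empty()) {
            result += ' ';
        }
        if (pendingDefine) {
            result += "-D";
            pendingDefine = false;
        }
        result += token;
    }
    if (pendingDefine) {
        // Keep a dangling "-D" so the compiler still sees (and reports) the malformed input.
        if (!result.empty()) {
            result += ' ';
        }
        result += "-D";
    }
    return result;
}
```

Tokens already in compact form, such as the `-DgWorkgroupSize=32` that forward2 appends, pass through unchanged.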
forward try kernel 2
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
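copyLocal in forward2.cl splits an N-element copy across the work group. Work item `localId` copies offsets localId, localId + gWorkgroupSize, and so on, for (N + gWorkgroupSize - 1) / gWorkgroupSize loops. The C++ simulation below assumes gWorkgroupSize=32, as in this log's option string, and checks that every element is written exactly once. simulateCopyLocal is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Simulates every work item of one work group running copyLocal(target, source, N)
// and returns how many times each target element was written.
std::vector<int> simulateCopyLocal(int n, int workgroupSize) {
    assert(n >= 0 && workgroupSize > 0);
    std::vector<int> writes(n, 0);
    const int numLoops = (n + workgroupSize - 1) / workgroupSize;  // ceil(n / workgroupSize)
    for (int localId = 0; localId < workgroupSize; ++localId) {
        for (int loop = 0; loop < numLoops; ++loop) {
            const int offset = loop * workgroupSize + localId;
            if (offset < n) {
                ++writes[offset];  // stands in for target[offset] = source[offset]
            }
        }
    }
    return writes;
}
```

Each work item writes a disjoint set of offsets, so no synchronization is needed during the copy. The barrier(CLK_LOCAL_MEM_FENCE) after it is still required before any work item reads another's writes.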
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
ForwardAuto: kernel 2: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
... not valid
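With gPadZeros=1, forward2 clamps the filter offsets (minu..maxu and minv..maxv) so that outputRow + u and outputCol + v never leave the image. This holds for an odd filter, where gInputSize equals gOutputSize. The clamping replaces the explicit rowOk/process checks used in forward1. The C++ sketch below shows that clamping for one axis, using the log's gHalfFilterSize=1 and gOutputSize=3. Range and filterOffsetRange are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inclusive range of filter offsets (u or v) visited by one output position.
struct Range {
    int minOffset;
    int maxOffset;
};

// Mirrors forward2's minu/maxu (or minv/maxv) for a single axis.
Range filterOffsetRange(int outputPos, int halfFilterSize, int outputSize, bool padZeros, int even) {
    assert(halfFilterSize >= 0 && outputPos >= 0 && outputPos < outputSize);
    if (padZeros) {
        // Clamp so that outputPos + offset stays inside [0, outputSize - 1].
        return Range{std::max(-halfFilterSize, -outputPos),
                     std::min(halfFilterSize, outputSize - 1 - outputPos) - even};
    }
    // Without padding the kernel adds halfFilterSize to the input index, so the full window is used.
    return Range{-halfFilterSize, halfFilterSize - even};
}
```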
forward try kernel 3
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
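The forward3 comments size local memory by hand. One 5x5 filter plane is 5 * 5 * 4 = 100 bytes, and one filter cube over 32 input planes is 3,200 bytes. All 32 cubes together would be 102,400 bytes, which is too much for local memory. The C++ sketch below redoes that arithmetic and adds the _upstreamImage buffer. It assumes the host sizes _upstreamImage to one input plane (gInputSizeSquared floats) and _filterCube to one filter cube (gInputPlanes * gFilterSizeSquared floats), which matches how the kernel indexes them. The result could then be compared against the device's CL_DEVICE_LOCAL_MEM_SIZE, queried with clGetDeviceInfo. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for one filter cube: gInputPlanes * gFilterSizeSquared floats.
std::size_t filterCubeBytes(int inputPlanes, int filterSize) {
    assert(inputPlanes > 0 && filterSize > 0);
    return static_cast<std::size_t>(inputPlanes) * filterSize * filterSize * sizeof(float);
}

// Assumed total local memory for forward3: one input plane in _upstreamImage
// (gInputSizeSquared floats) plus one filter cube in _filterCube.
std::size_t forward3LocalBytes(int inputSize, int inputPlanes, int filterSize) {
    assert(inputSize > 0);
    const std::size_t upstreamImageBytes =
        static_cast<std::size_t>(inputSize) * inputSize * sizeof(float);
    return upstreamImageBytes + filterCubeBytes(inputPlanes, filterSize);
}
```

With this log's sizes (gInputSize=3, gInputPlanes=2, gFilterSize=3) the two buffers need only 108 bytes. Local memory capacity is therefore not the reason these kernels are rejected here.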
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 3: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
... not valid
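The notes at the top of forward1.cl summarize how the output size follows from the input size, filter size and gPadZeros when there is no striding. Without padding it is inputSize - filterSize + 1. With padding it stays at inputSize for an odd filter and grows by one for an even filter. The small C++ sketch below implements those rules and ignores gSkip. outputSize is a hypothetical helper. For this log's option string (gInputSize=3, gFilterSize=3, gPadZeros=1) it returns 3, which matches gOutputSize=3.

```cpp
#include <cassert>

// Output size per the forward1.cl notes, for stride 1 (gSkip=0 in this log).
int outputSize(int inputSize, int filterSize, bool padZeros) {
    assert(inputSize > 0 && filterSize > 0);
    if (!padZeros) {
        return inputSize - filterSize + 1;  // the filter must fit entirely inside the image
    }
    return filterSize % 2 == 0 ? inputSize + 1  // even filter: image grows by one
                               : inputSize;     // odd filter: size preserved
}
```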
forward try kernel 4
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
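As its comments describe, forward4 handles planes with more output pixels than fit in one work group. It gives each output plane gPixelsPerThread consecutive work groups and rebuilds a plane-wide effectiveLocalId. The C++ sketch below mirrors that id arithmetic. It assumes the host picks gPixelsPerThread as the ceiling of gOutputSizeSquared / gWorkgroupSize. That agrees with this log (9 output pixels, gWorkgroupSize=32, gPixelsPerThread=1), but the host code is not shown here. computePixelsPerThread, Forward4Coord and forward4Coord are hypothetical names.

```cpp
#include <cassert>

// Hypothetical result of forward4's id arithmetic for one work item.
struct Forward4Coord {
    int n;                 // example index
    int outPlane;          // filter / output plane index
    int effectiveLocalId;  // position within the whole output plane
};

// Assumed host-side choice: enough work groups per output plane to cover every pixel.
int computePixelsPerThread(int outputSize, int workgroupSize) {
    assert(outputSize > 0 && workgroupSize > 0);
    const int outputSizeSquared = outputSize * outputSize;
    return (outputSizeSquared + workgroupSize - 1) / workgroupSize;
}

// Mirrors the id math at the start of forward_4_by_n_outplane_smallercache.
Forward4Coord forward4Coord(int workgroupId, int localId, int pixelsPerThread,
                            int workgroupSize, int numFilters) {
    const int effectiveWorkgroupId = workgroupId / pixelsPerThread;
    const int pixel = workgroupId % pixelsPerThread;
    Forward4Coord c;
    c.effectiveLocalId = localId + pixel * workgroupSize;
    c.n = effectiveWorkgroupId / numFilters;
    c.outPlane = effectiveWorkgroupId % numFilters;
    return c;
}
```

For a 28x28 output and a 512-wide work group, each plane gets two work groups. Their effectiveLocalId values run from 0 to 1023, and the kernel's `effectiveLocalId < gOutputSizeSquared` guard discards the ids from 784 upward.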
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
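Note on the `copyLocal` helper at the top of forward4.cl. Each work-item copies every `get_local_size(0)`-th element, starting at its own local id. That means ceil(N / local size) passes are enough to cover all N elements. The C sketch below replays that loop on the host and counts the writes to confirm that each element is copied exactly once. Some parts are illustrative only: `copy_local_sim` is a hypothetical name, and the sequential outer loop stands in for the parallel work-items. Barriers are not modelled.

```c
#include <assert.h>

/* Host-side replay of copyLocal: work-item `localId` copies offsets
   localId, localId + localSize, localId + 2*localSize, ... that are < N.
   Returns the total number of element writes performed. */
static int copy_local_sim(float *target, const float *source, int N, int localSize) {
    assert(localSize > 0);
    const int numLoops = (N + localSize - 1) / localSize; /* ceil(N / localSize) */
    int writes = 0;
    /* Sequential stand-in for the parallel work-items of one workgroup. */
    for (int localId = 0; localId < localSize; localId++) {
        for (int loop = 0; loop < numLoops; loop++) {
            const int offset = loop * localSize + localId;
            if (offset < N) {
                target[offset] = source[offset];
                writes++;
            }
        }
    }
    return writes;
}
```

If the write count equals N and every target slot holds its source value, then no element was skipped or copied twice.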
ForwardAuto: kernel 4: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
... not valid
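Note on the pixels-per-thread indexing in forward4.cl. The kernel splits `get_group_id(0)` into two parts: an output plane (`effectiveWorkgroupId`, then `n` and `outPlane`) and a tile (`pixel`). From these it rebuilds a flat `effectiveLocalId` over the whole output plane. The C sketch below mirrors lines 62-69 and 104 of the dumped source and checks that every output element is written exactly once. It rests on some assumptions:

- `gPixelsPerThread` equals ceil(gOutputSizeSquared / gWorkgroupSize).
- The launch uses batchSize * gNumFilters * gPixelsPerThread workgroups of `gWorkgroupSize` items each.
- The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the -D defines used by forward4.cl (hypothetical host struct). */
typedef struct {
    int workgroupSize;   /* gWorkgroupSize */
    int pixelsPerThread; /* gPixelsPerThread: workgroups per output plane */
    int numFilters;      /* gNumFilters */
    int outputSize;      /* gOutputSize */
} Forward4Params;

typedef struct {
    int n, outPlane, outputRow, outputCol;
    bool active;     /* effectiveLocalId < gOutputSizeSquared */
    int resultIndex; /* meaningful only when active */
} Forward4Coords;

/* Same arithmetic as kernel lines 62-69 and the resultIndex macro. */
static Forward4Coords forward4_coords(const Forward4Params *p, int workgroupId, int localId) {
    Forward4Coords c;
    const int outputSizeSquared = p->outputSize * p->outputSize;
    const int effectiveWorkgroupId = workgroupId / p->pixelsPerThread;
    const int pixel = workgroupId % p->pixelsPerThread;
    const int effectiveLocalId = localId + pixel * p->workgroupSize;
    c.n = effectiveWorkgroupId / p->numFilters;
    c.outPlane = effectiveWorkgroupId % p->numFilters;
    c.outputRow = effectiveLocalId / p->outputSize;
    c.outputCol = effectiveLocalId % p->outputSize;
    c.active = effectiveLocalId < outputSizeSquared;
    c.resultIndex = (c.n * p->numFilters + c.outPlane) * outputSizeSquared + effectiveLocalId;
    return c;
}

/* Returns 1 if every [n][filter][row][col] output is written exactly once. */
static int forward4_covers_output(const Forward4Params *p, int batchSize) {
    enum { MAX_OUT = 4096 };
    int hits[MAX_OUT] = {0};
    const int outputSizeSquared = p->outputSize * p->outputSize;
    const int total = batchSize * p->numFilters * outputSizeSquared;
    assert(total <= MAX_OUT);
    const int numWorkgroups = batchSize * p->numFilters * p->pixelsPerThread;
    for (int wg = 0; wg < numWorkgroups; wg++) {
        for (int lid = 0; lid < p->workgroupSize; lid++) {
            Forward4Coords c = forward4_coords(p, wg, lid);
            if (c.active) hits[c.resultIndex]++;
        }
    }
    for (int i = 0; i < total; i++) {
        if (hits[i] != 1) return 0;
    }
    return 1;
}
```

With the values from this run (gWorkgroupSize=32, gOutputSize=3), one workgroup per output plane is enough. Suppose instead gPixelsPerThread were too small, for example 2 tiles of 4 items for a 3x3 output. Then `forward4_covers_output` returns 0, because the last output element is never written.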
forward try kernel 5
ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, padzeros must be disabled
... not valid
forward try kernel 6
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 6: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0"
... not valid
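Note on forward_byinputplane. Each workgroup handles one input plane and writes partial sums in the layout [n][filterId][outRow][outCol][inputPlane]. That result still has to be reduced over inputPlane (source lines 17-18 and 89-93). Because inputPlane is the innermost index, the reduction is just a sum over contiguous runs of gNumInputPlanes floats. Below is a host-side C sketch, assuming a plain float buffer. DeepCL's own reduction step does not appear in this log, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same index as resultIndex in forward_byinputplane (lines 89-93). */
static int byinputplane_index(int n, int filterId, int outRow, int outCol, int inputPlaneId,
                              int numFilters, int outputSize, int numInputPlanes) {
    return (((n * numFilters + filterId) * outputSize + outRow) * outputSize + outCol)
               * numInputPlanes + inputPlaneId;
}

/* Sums the innermost [inputPlane] axis:
   partial[n][f][r][c][plane] -> out[n][f][r][c]. */
static void reduce_over_input_planes(const float *partial, float *out,
                                     int batchSize, int numFilters,
                                     int outputSize, int numInputPlanes) {
    assert(partial != NULL && out != NULL);
    const size_t numOutputs = (size_t)batchSize * numFilters * outputSize * outputSize;
    for (size_t i = 0; i < numOutputs; i++) {
        float sum = 0.0f;
        for (int plane = 0; plane < numInputPlanes; plane++) {
            sum += partial[i * numInputPlanes + plane]; /* contiguous run per output */
        }
        out[i] = sum;
    }
}
```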
forward try kernel 7
... seems valid
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 1
10: //#define gStride 1
11: //#define gColSize 3
12: //#define gFilterSize 3
13: //#define gSize 3
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 3;
25: index /= 3;
26: int h_out = index % 3;
27: int channel_in = index / 3;
28: int channel_out = channel_in * 3 * 3;
29: int h_in = h_out * 1 - 1;
30: int w_in = w_out * 1 - 1;
31: data_col += (channel_out * 3 + h_out) * 3 + w_out;
32: data_im += (channel_in * 3 + h_in) * 3 + w_in;
33: for (int i = 0; i < 3; ++i) {
34: for (int j = 0; j < 3; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 3 && w < 3) ?
38: data_im[i * 3 + j] : 0;
39: data_col += 3 * 3;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 3 + 1;
54: int h = (index / 3) % 3 + 1;
55: int c = index / (3 * 3);
56: // compute the start and end of the output
57: int w_col_start = (w < 3) ? 0 : (w - 3) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 3);
59: int h_col_start = (h < 3) ? 0 : (h - 3) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 3);
61:
62: int offset = (c * 3 * 3 + h * 3 + w) * 3 * 3;
63: int coeff_h_col = (1 - 1 * 3 * 3) * 3;
64: int coeff_w_col = (1 - 1 * 3 * 3);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
ForwardAuto: kernel 7 this instance cant be used:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 1
10: //#define gStride 1
11: //#define gColSize 3
12: //#define gFilterSize 3
13: //#define gSize 3
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 3;
25: index /= 3;
26: int h_out = index % 3;
27: int channel_in = index / 3;
28: int channel_out = channel_in * 3 * 3;
29: int h_in = h_out * 1 - 1;
30: int w_in = w_out * 1 - 1;
31: data_col += (channel_out * 3 + h_out) * 3 + w_out;
32: data_im += (channel_in * 3 + h_in) * 3 + w_in;
33: for (int i = 0; i < 3; ++i) {
34: for (int j = 0; j < 3; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 3 && w < 3) ?
38: data_im[i * 3 + j] : 0;
39: data_col += 3 * 3;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 3 + 1;
54: int h = (index / 3) % 3 + 1;
55: int c = index / (3 * 3);
56: // compute the start and end of the output
57: int w_col_start = (w < 3) ? 0 : (w - 3) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 3);
59: int h_col_start = (h < 3) ? 0 : (h - 3) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 3);
61:
62: int offset = (c * 3 * 3 + h * 3 + w) * 3 * 3;
63: int coeff_h_col = (1 - 1 * 3 * 3) * 3;
64: int coeff_w_col = (1 - 1 * 3 * 3);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
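Note on ForwardIm2Col.cl. The generated im2col kernel has its sizes baked in: a 3x3 image, a 3x3 filter, padding 1 and stride 1, which gives a 3x3 column grid. To check its output on a device where it does build, a CPU reference in the same Caffe-style layout, [channel][filterRow][filterCol][outRow][outCol], may help. The C sketch below covers square inputs only, and its names are hypothetical.

There is also a separate issue in the kernel body. It reassigns `index` (`index /= 3`) and advances `data_col`/`data_im` inside the grid-stride loop. So it appears to be correct only when each work-item processes a single index, that is, when the global size is at least `n`.

```c
#include <assert.h>

/* CPU reference for Caffe-style im2col on a square image and square filter.
   col must hold channels * filterSize^2 * colSize^2 floats. */
static void im2col_ref(const float *im, int channels, int size, int filterSize,
                       int padding, int stride, float *col) {
    assert(stride > 0);
    const int colSize = (size + 2 * padding - filterSize) / stride + 1;
    for (int c = 0; c < channels; c++) {
        for (int i = 0; i < filterSize; i++) {
            for (int j = 0; j < filterSize; j++) {
                const int row = (c * filterSize + i) * filterSize + j; /* column-matrix row */
                for (int h = 0; h < colSize; h++) {
                    for (int w = 0; w < colSize; w++) {
                        const int hIn = h * stride - padding + i;
                        const int wIn = w * stride - padding + j;
                        const int inside = hIn >= 0 && hIn < size && wIn >= 0 && wIn < size;
                        /* Out-of-bounds taps read as zero padding, as in the kernel. */
                        col[(row * colSize + h) * colSize + w] =
                            inside ? im[(c * size + hIn) * size + wIn] : 0.0f;
                    }
                }
            }
        }
    }
}
```

With size 3, filter 3, padding 1 and stride 1, the centre tap (i = j = 1) reproduces the input image unchanged.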
forward kernel 0: cannot be used
forward kernel 1: cannot be used
forward kernel 2: cannot be used
forward kernel 3: cannot be used
forward kernel 4: cannot be used
forward kernel 5: cannot be used
forward kernel 6: cannot be used
forward kernel 7: cannot be used
clblas teardown
unknown file: Failure
C++ exception with description "No valid forward implementations found" thrown in the test body.
[ FAILED ] testupdateweights.conv1z (141 ms)
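Note on the error codes in this log. The numbers are standard OpenCL status values. -45 is CL_INVALID_PROGRAM_EXECUTABLE, which clCreateKernel returns when the program has no successfully built executable. Each -45 here is therefore most likely a consequence of the rejected compiler option string reported just before it, rather than a separate fault. clBLAS status codes reuse the OpenCL values, so the earlier "clblasSgemm() failed with -11" most likely means CL_BUILD_PROGRAM_FAILURE.

Below is a small lookup for the codes relevant here. The values are copied from the Khronos CL/cl.h header; real host code should include that header instead of redefining them. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Subset of OpenCL status codes; values match the Khronos CL/cl.h header. */
static const char *cl_error_name(int code) {
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";      /* clBuildProgram failed */
    case -43: return "CL_INVALID_BUILD_OPTIONS";      /* option string rejected */
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE"; /* no built executable for the device */
    case -46: return "CL_INVALID_KERNEL_NAME";        /* kernel name not found in program */
    default:  return "unknown (see CL/cl.h)";
    }
}

/* Convenience check, e.g. cl_error_is(err, "CL_INVALID_PROGRAM_EXECUTABLE"). */
static int cl_error_is(int code, const char *name) {
    assert(name != NULL);
    return strcmp(cl_error_name(code), name) == 0;
}
```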
[ RUN ] testupdateweights.numericallytest
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest (56 ms)
[ RUN ] testupdateweights.numericallytest_imagesize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize3 (66 ms)
[ RUN ] testupdateweights.numericallytest_imagesize5
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=5 -DgOutputSizeSquared=25 -DgInputSize=5 -DgInputSizeSquared=25 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=5 -DgOutputSizeSquared=25 -DgInputSize=5 -DgInputSizeSquared=25 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=5 -DgOutputSizeSquared=25 -DgInputSize=5 -DgInputSizeSquared=25 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize5 (66 ms)
[ RUN ] testupdateweights.numericallytest_imagesize9
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=9 -DgOutputSizeSquared=81 -DgInputSize=9 -DgInputSizeSquared=81 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=9 -DgOutputSizeSquared=81 -DgInputSize=9 -DgInputSizeSquared=81 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=9 -DgOutputSizeSquared=81 -DgInputSize=9 -DgInputSizeSquared=81 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize9 (57 ms)
[ RUN ] testupdateweights.numericallytest_imagesize9_filtersize9
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize9_filtersize9 (56 ms)
[ RUN ] testupdateweights.numericallytest_imagesize9_filtersize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=7 -DgOutputSizeSquared=49 -DgInputSize=7 -DgInputSizeSquared=49 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=7 -DgOutputSizeSquared=49 -DgInputSize=7 -DgInputSizeSquared=49 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=7 -DgOutputSizeSquared=49 -DgInputSize=7 -DgInputSizeSquared=49 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize9_filtersize3 (67 ms)
[ RUN ] testupdateweights.numericallytest_imagesize3_filtersize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize3_filtersize3 (68 ms)
[ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize5_filtersize3 (68 ms)
[ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_batchsize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize5_filtersize3_batchsize3 (69 ms)
[ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3 (70 ms)
[ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3_batchsize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=1 -D TANH"
" thrown in the test body.
[ FAILED ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3_batchsize3 (71 ms)
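The dumped cl/activate.cl picks one `ACTIVATION_FUNCTION` through a preprocessor define and applies it elementwise in `activate` (in place) and `forwardNaive` (out-of-place). On a device where the kernel does build, a host-side reference makes it easy to check kernel results.

The sketch below mirrors each macro branch in C++. The `Activation` enum and both function names are hypothetical. The dumped source writes `#elif SIGMOID` rather than `#elif defined SIGMOID`. That still selects the sigmoid branch when the flag is passed as `-D SIGMOID`, because `-D name` defines `name` as 1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side mirror of the ACTIVATION_FUNCTION branches in
// cl/activate.cl, for checking kernel results on the CPU.
enum class Activation { Tanh, ScaledTanh, Sigmoid, Relu, Elu, Linear };

float activateReference(Activation kind, float x) {
    switch (kind) {
        case Activation::Tanh:       return std::tanh(x);
        case Activation::ScaledTanh: return 1.7159f * std::tanh(0.66667f * x);
        case Activation::Sigmoid:    return 1.0f / (1.0f + std::exp(-x));
        case Activation::Relu:       return x > 0 ? x : 0.0f;
        case Activation::Elu:        return x > 0 ? x : std::exp(x) - 1.0f;
        case Activation::Linear:     return x;
    }
    return x;  // not reached for valid enum values
}

// Counterpart of the forwardNaive kernel: out[i] = f(in[i]) for every element.
std::vector<float> forwardNaiveReference(Activation kind, const std::vector<float> &in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = activateReference(kind, in[i]);
    }
    return out;
}
```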
[ RUN ] testupdateweights.backprop_weights_2
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2 (25 ms)
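The comment block in the dumped cl/BackpropWeightsScratchLarge.cl gives the formulas for the striped image buffer sizes. The sketch below reproduces them on the host. Three parts are assumptions not stated in those comments:

- `gHalfFilterSize` is taken as `filterSize / 2`.
- `gOutputStripeNumRows` is taken as `gOutputSize / gNumStripes`, rounded up. The kernel's `outRow < gOutputSize` guard suggests the last stripe may overshoot.
- `gOutputStripeSize` is taken as `gOutputStripeNumRows * gOutputSize`.

`StripeGeometry` and `computeStripeGeometry` are hypothetical names.

```cpp
// Hypothetical host-side computation of the stripe defines documented in
// cl/BackpropWeightsScratchLarge.cl.
struct StripeGeometry {
    int inputStripeMarginRows;
    int inputStripeInnerNumRows;
    int inputStripeOuterNumRows;
    int inputStripeInnerSize;
    int inputStripeOuterSize;
    int inputStripeMarginSize;
    int outputStripeNumRows;
    int outputStripeSize;
};

StripeGeometry computeStripeGeometry(int inputSize, int outputSize,
                                     int filterSize, int numStripes) {
    const int halfFilterSize = filterSize / 2;  // assumed to match gHalfFilterSize
    StripeGeometry g{};
    g.inputStripeMarginRows = halfFilterSize;
    g.inputStripeInnerNumRows = inputSize / numStripes;
    // Inner rows plus a half-filter margin above and below.
    g.inputStripeOuterNumRows = g.inputStripeInnerNumRows + 2 * halfFilterSize;
    g.inputStripeInnerSize = g.inputStripeInnerNumRows * inputSize;
    g.inputStripeOuterSize = g.inputStripeOuterNumRows * inputSize;
    g.inputStripeMarginSize = g.inputStripeMarginRows * inputSize;
    // Assumption: round up so every output row belongs to some stripe.
    g.outputStripeNumRows = (outputSize + numStripes - 1) / numStripes;
    g.outputStripeSize = g.outputStripeNumRows * outputSize;
    return g;
}
```

For instance, `computeStripeGeometry(2, 2, 1, 1)` yields an outer input stripe size of 4 and an output stripe size of 4. These match the `options:` line logged for `backprop_weights_2_upstreamimagesize2`.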
[ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize2
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=4 -DgInputStripeOuterSize=4 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=4 -DgInputStripeOuterSize=4 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=4 -DgInputStripeOuterSize=4 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=4 -DgInputStripeOuterSize=4 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2_upstreamimagesize2 (30 ms)
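The header comment of cl/BackpropWeightsScratchLarge.cl gives formulas for the input-stripe defines (gInputStripeInnerNumRows, gInputStripeOuterNumRows, and so on). The C++ below is a hedged sketch that evaluates those formulas on the host. `StripeDims` and `computeStripeDims` are hypothetical names, not DeepCL functions, and the margin is taken to be exactly gHalfFilterSize rows, as the comment states.

```cpp
#include <cassert>

// Hypothetical host-side helper (not part of DeepCL's API): evaluates the
// stripe formulas from the header comment of BackpropWeightsScratchLarge.cl.
// The margin follows the comment (gHalfFilterSize rows); the options logged
// in these tests use a larger margin, so this is illustrative only.
struct StripeDims {
    int innerNumRows;  // gInputStripeInnerNumRows
    int outerNumRows;  // gInputStripeOuterNumRows
    int innerSize;     // gInputStripeInnerSize
    int outerSize;     // gInputStripeOuterSize
    int marginSize;    // gInputStripeMarginSize
};

StripeDims computeStripeDims(int inputSize, int halfFilterSize, int numStripes) {
    assert(numStripes > 0 && inputSize > 0 && halfFilterSize >= 0);
    const int marginRows = halfFilterSize;             // gInputStripeMarginRows, per the comment
    StripeDims d{};
    d.innerNumRows = inputSize / numStripes;           // rows owned by one stripe
    d.outerNumRows = d.innerNumRows + 2 * marginRows;  // plus one margin above and one below
    d.innerSize = d.innerNumRows * inputSize;          // floats in the inner stripe
    d.outerSize = d.outerNumRows * inputSize;          // floats loaded into _imageStripe
    d.marginSize = marginRows * inputSize;             // floats in one margin
    return d;
}
```

With inputSize=4, halfFilterSize=1 and one stripe this gives 4 inner rows, 6 outer rows, an inner size of 16, an outer size of 24 and a margin size of 4. The options logged for backprop_weights_2_upstreamimagesize4_filtersize3 show gInputStripeMarginRows=2 instead, hence 8 outer rows, an outer size of 32 and a margin size of 8, so the host code that generates these defines evidently uses a wider margin than the comment describes.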
[ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1"
kernel build error:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize3 (25 ms)
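In each of these failures the build log blames the compiler option string, not the kernel body, and clCreateKernel then returns -45 (CL_INVALID_PROGRAM_EXECUTABLE, meaning there is no successfully built program to create a kernel from). One visible irregularity in that string is that the general defines are written as `-D name=value` while the stripe defines are written as `-Dname=value`. That the Vivante compiler rejects the unspaced form is an assumption this log does not prove. The hypothetical helper `normalizeDefineOptions` below is one way to test it: rewrite the options into the spaced form before they are passed to clBuildProgram and see whether the error goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical diagnostic helper (not part of DeepCL): rewrites each
// "-DNAME..." token as "-D NAME..." so every define uses the spaced form that
// the first defines in the logged option string already use.
std::string normalizeDefineOptions(const std::string &options) {
    std::string out;
    out.reserve(options.size() + 32);
    for (std::size_t i = 0; i < options.size(); ++i) {
        out += options[i];
        const bool tokenStart = (i == 0) || options[i - 1] == ' ';
        const bool unspacedDefine = options[i] == '-' && tokenStart
            && i + 2 < options.size()
            && options[i + 1] == 'D' && options[i + 2] != ' ';
        if (unspacedDefine) {
            out += "D ";  // emit the 'D' plus the missing space
            ++i;          // and skip the original 'D'
        }
    }
    assert(out.size() >= options.size());  // normalizing only ever adds characters
    return out;
}
```

For example, `normalizeDefineOptions(" -D gMargin=0 -DgNumStripes=1")` returns `" -D gMargin=0 -D gNumStripes=1"`; tokens that already have the space are left unchanged.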
[ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize4_filtersize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=8 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=8 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=8 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=8 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4"
kernel build error:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=8 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=8 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=8 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=8 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2_upstreamimagesize4_filtersize3 (34 ms)
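The `-45` that `clCreateKernel` reports is `CL_INVALID_PROGRAM_EXECUTABLE`. OpenCL returns this code when the program has no successfully built executable for the device. That matches the build log just above: the Vivante compiler rejected the option string, so no kernel could be created from the program.

A small lookup makes these numeric codes easier to read. `clErrorName` is a hypothetical helper and is not part of DeepCL or EasyCL. The constants are copied from the values defined in `CL/cl.h`, so the sketch compiles without OpenCL headers.

```cpp
#include <string>

// Hypothetical helper (not part of DeepCL or EasyCL): map a few OpenCL status
// codes relevant to program building to their symbolic names. The values are
// copied from CL/cl.h so the sketch builds without OpenCL headers.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -44: return "CL_INVALID_PROGRAM";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -46: return "CL_INVALID_KERNEL_NAME";
        default:  return "UNKNOWN_CL_ERROR(" + std::to_string(code) + ")";
    }
}
```

`-43` (`CL_INVALID_BUILD_OPTIONS`) is the code `clBuildProgram` itself is specified to return for invalid options. Checking that return value directly would surface the problem one call earlier than `clCreateKernel` does.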
[ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize5_filtersize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9"
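The rejected string mixes two spellings of the same option. The sixteen convolution-geometry macros (`gNumInputPlanes` through `gSkip`) use `-D name=value` with a space. The nine stripe macros appended at the end use `-Dname=value` with no space. The OpenCL `clBuildProgram` documentation describes the spaced `-D name=definition` form.

The log does not say which token the Vivante compiler objects to, so treating the mixed spelling as the cause is an assumption. One cheap experiment is to generate every define in the documented form. `buildDefineOptions` below is a hypothetical helper and is not part of DeepCL or EasyCL.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of DeepCL or EasyCL): emit every macro in the
// "-D name=value" form described for clBuildProgram, separated by single
// spaces, with no leading or trailing whitespace.
std::string buildDefineOptions(const std::vector<std::pair<std::string, int>>& defines) {
    std::string options;
    for (const auto& define : defines) {
        if (!options.empty()) {
            options += ' ';
        }
        options += "-D " + define.first + "=" + std::to_string(define.second);
    }
    return options;
}
```

For example, `buildDefineOptions({{"gNumStripes", 1}, {"gInputStripeMarginRows", 2}})` yields `-D gNumStripes=1 -D gInputStripeMarginRows=2`. If the build still fails with a uniform string, the mixed spelling can be ruled out.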
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9"
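The stripe macros in these option strings can be reproduced from the formulas in the kernel's header comment. One detail does not match that comment. `gInputStripeMarginRows` is described as "basically equal to gHalfFilterSize", yet the logged value is 2 for `gFilterSize=3` (where `gHalfFilterSize=1`) and 0 for `gFilterSize=1`. For that reason, the sketch below takes the margin as a parameter rather than deriving it.

Two further assumptions apply. First, the output-stripe formula is inferred: the comment names `gOutputStripeNumRows` without defining it. Second, the sketch only accepts sizes that divide evenly by the stripe count, because the log does not show how remainders are handled.

```cpp
#include <cassert>

// Derived stripe sizes, mirroring the -D values passed to the striped kernel.
struct StripeGeometry {
    int inputStripeInnerNumRows;
    int inputStripeOuterNumRows;
    int inputStripeInnerSize;
    int inputStripeOuterSize;
    int inputStripeMarginSize;
    int outputStripeNumRows;
    int outputStripeSize;
};

// marginRows is an input, not derived from the filter size (see note above).
// Assumption: inputSize and outputSize are both multiples of numStripes.
StripeGeometry computeStripeGeometry(int inputSize, int outputSize,
                                     int numStripes, int marginRows) {
    assert(numStripes > 0);
    assert(inputSize % numStripes == 0 && outputSize % numStripes == 0);
    StripeGeometry g{};
    g.inputStripeInnerNumRows = inputSize / numStripes;
    // Outer stripe = inner rows plus one margin above and one below.
    g.inputStripeOuterNumRows = g.inputStripeInnerNumRows + 2 * marginRows;
    g.inputStripeInnerSize = g.inputStripeInnerNumRows * inputSize;
    g.inputStripeOuterSize = g.inputStripeOuterNumRows * inputSize;
    g.inputStripeMarginSize = marginRows * inputSize;
    // Assumed: output rows are split evenly across stripes, without margins.
    g.outputStripeNumRows = outputSize / numStripes;
    g.outputStripeSize = g.outputStripeNumRows * outputSize;
    return g;
}
```

`computeStripeGeometry(5, 3, 1, 2)` reproduces the values logged for `upstreamimagesize5_filtersize3`: inner and outer rows 5 and 9, inner and outer sizes 25 and 45, a margin size of 10, and an output stripe of 3 rows (9 elements).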
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2_upstreamimagesize5_filtersize3 (50 ms)
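Since the kernel never builds in these runs, none of the tests get as far as comparing numbers. A plain CPU reference for the same weight gradient is still useful for checking the kernel once it compiles.

The sketch below is hypothetical (it is not DeepCL's own test code) and ignores striping and local memory entirely. For each `(outPlane, upstreamPlane, filterRow, filterCol)`, it sums `image * gradOutput` over the batch and over every output position whose upstream pixel lies inside the image. It then scales the sum by `learningRateMultiplier`. This mirrors the kernel's `proceed` test and its final write. The bias term is omitted.

```cpp
#include <vector>

// Hypothetical CPU reference (not part of DeepCL). Layouts follow the kernel:
//   images      [batch][inputPlanes][inputSize][inputSize]
//   gradOutput  [batch][numFilters][outputSize][outputSize]
//   gradWeights [numFilters][inputPlanes][filterSize][filterSize]
std::vector<float> referenceGradWeights(
        const std::vector<float>& images, const std::vector<float>& gradOutput,
        int batchSize, int inputPlanes, int numFilters, int inputSize,
        int outputSize, int filterSize, int margin, float learningRateMultiplier) {
    std::vector<float> gradWeights(numFilters * inputPlanes * filterSize * filterSize, 0.0f);
    for (int outPlane = 0; outPlane < numFilters; outPlane++) {
        for (int upstreamPlane = 0; upstreamPlane < inputPlanes; upstreamPlane++) {
            for (int filterRow = 0; filterRow < filterSize; filterRow++) {
                for (int filterCol = 0; filterCol < filterSize; filterCol++) {
                    float sum = 0.0f;
                    for (int n = 0; n < batchSize; n++) {
                        for (int outRow = 0; outRow < outputSize; outRow++) {
                            int upstreamRow = outRow - margin + filterRow;
                            for (int outCol = 0; outCol < outputSize; outCol++) {
                                int upstreamCol = outCol - margin + filterCol;
                                // Same bounds check as the kernel's `proceed`.
                                if (upstreamRow < 0 || upstreamCol < 0 ||
                                    upstreamRow >= inputSize || upstreamCol >= inputSize) {
                                    continue;
                                }
                                float error = gradOutput[((n * numFilters + outPlane) * outputSize
                                                          + outRow) * outputSize + outCol];
                                float upstream = images[((n * inputPlanes + upstreamPlane) * inputSize
                                                         + upstreamRow) * inputSize + upstreamCol];
                                sum += upstream * error;
                            }
                        }
                    }
                    int workgroupId = outPlane * inputPlanes + upstreamPlane;
                    gradWeights[(workgroupId * filterSize + filterRow) * filterSize + filterCol] =
                        learningRateMultiplier * sum;
                }
            }
        }
    }
    return gradWeights;
}
```

With a 2x2 image `{1, 2, 3, 4}`, a 2x2 filter, and a single output of `10`, every filter tap sees exactly one pixel. The result is therefore `{10, 20, 30, 40}`, which makes a layout or offset bug easy to spot.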
[ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize1
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=3 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=9 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=3 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=9 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9"
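The kernel assigns one workgroup to each `(outPlane, upstreamPlane)` pair and one work-item to each filter tap. As a result, the write to `gradWeights[workgroupId * gFilterSizeSquared + localId]` lands exactly on `gradWeights[outPlane][upstreamPlane][filterRow][filterCol]`. Work-items with `localId >= gFilterSizeSquared` only help load the stripes into local memory.

The host-side mirror below makes that decomposition explicit. Its names (`WorkItemCoords`, `decompose`, `gradWeightsIndex`) are hypothetical.

```cpp
// Hypothetical mirror of the kernel's index decomposition (not DeepCL code).
struct WorkItemCoords {
    int outPlane;
    int upstreamPlane;
    int filterRow;
    int filterCol;
};

// Same arithmetic as the kernel: workgroupId -> planes, localId -> filter tap.
WorkItemCoords decompose(int workgroupId, int localId, int inputPlanes, int filterSize) {
    return {workgroupId / inputPlanes, workgroupId % inputPlanes,
            localId / filterSize, localId % filterSize};
}

// Flat offset of gradWeights[outPlane][upstreamPlane][filterRow][filterCol].
int gradWeightsIndex(const WorkItemCoords& c, int inputPlanes, int filterSize) {
    return ((c.outPlane * inputPlanes + c.upstreamPlane) * filterSize + c.filterRow)
           * filterSize + c.filterCol;
}
```

Expanding `gradWeightsIndex` gives `workgroupId * filterSize * filterSize + filterRow * filterSize + filterCol`, and the last two terms equal `localId` whenever `localId < filterSize * filterSize`. This is why the kernel can use the short form.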
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=3 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=9 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=3 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=9 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize1 (52 ms)
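Each option string in these build logs spells the define flag two ways: `-D gInputSize=3` (with a space) and `-DgNumStripes=1` (without one). The Vivante compiler reports a syntax error somewhere in that string. The follow-up `clCreateKernel` code -45 is `CL_INVALID_PROGRAM_EXECUTABLE`, which is what you would expect when the program never built. The log does not say which token the parser rejects, so blaming the unspaced `-Dname` form is only a hypothesis. The sketch below works under that assumption and emits every define in the spaced form. `append_define` is a hypothetical helper, not a DeepCL function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of DeepCL): append " -D name=value" to the
   NUL-terminated string in buf, always with a space after -D, so every
   define in the options string uses the same spelling.
   Returns 0 on success, or -1 (leaving buf unchanged) if the result would
   not fit in cap bytes. */
static int append_define(char *buf, size_t cap, const char *name, int value) {
    assert(buf != NULL && cap > 0);
    size_t used = strlen(buf);
    int n = snprintf(buf + used, cap - used, " -D %s=%d", name, value);
    if (n < 0 || (size_t)n >= cap - used) {
        buf[used] = '\0'; /* roll back any truncated partial write */
        return -1;
    }
    return 0;
}
```

Appending `gInputSize` = 3 and then `gNumStripes` = 1 to an empty buffer yields ` -D gInputSize=3 -D gNumStripes=1`. This keeps the leading-space style of the logged strings. If the Vivante build still fails on a uniformly spaced string, the hypothesis does not hold.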
[ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize16_filtersize1
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=16 -D gInputSizeSquared=256 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=16 -D gOutputSizeSquared=256 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=8 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=32 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=32
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=16 -D gInputSizeSquared=256 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=16 -D gOutputSizeSquared=256 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=8 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=32 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=16 -D gInputSizeSquared=256 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=16 -D gOutputSizeSquared=256 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=8 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=32 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=32"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=16 -D gInputSizeSquared=256 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=16 -D gOutputSizeSquared=256 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=8 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=32 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=32"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2_upstreamimagesize16_filtersize1 (48 ms)
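The stripe defines in the option strings can be checked against the formulas in the kernel's header comment, such as `gInputStripeInnerNumRows = gInputSize / gNumStripes`. The sketch below reproduces that arithmetic under two assumptions:

- The margin rows equal `gHalfFilterSize`. The comment only says "basically equal".
- The output stripe height is rounded up. The comment gives no formula for `gOutputStripeNumRows`, but ceiling division reproduces all three logged option strings.

`StripeParams` and `compute_stripes` are hypothetical names.

```c
#include <assert.h>

/* Stripe parameters, named after the kernel's -D options. */
typedef struct {
    int inputStripeMarginRows;
    int inputStripeInnerNumRows;
    int inputStripeOuterNumRows;
    int inputStripeInnerSize;
    int inputStripeOuterSize;
    int inputStripeMarginSize;
    int outputStripeNumRows;
    int outputStripeSize;
} StripeParams;

/* Input side: follows the formulas in the kernel's header comment.
   Output side (ASSUMPTION): ceiling division, inferred from the logged
   option strings rather than from any documented formula. */
static StripeParams compute_stripes(int inputSize, int outputSize,
                                    int halfFilterSize, int numStripes) {
    assert(numStripes > 0);
    StripeParams p;
    p.inputStripeMarginRows = halfFilterSize;
    p.inputStripeInnerNumRows = inputSize / numStripes; /* truncating */
    p.inputStripeOuterNumRows = p.inputStripeInnerNumRows + 2 * halfFilterSize;
    p.inputStripeInnerSize = p.inputStripeInnerNumRows * inputSize;
    p.inputStripeOuterSize = p.inputStripeOuterNumRows * inputSize;
    p.inputStripeMarginSize = p.inputStripeMarginRows * inputSize;
    p.outputStripeNumRows = (outputSize + numStripes - 1) / numStripes;
    p.outputStripeSize = p.outputStripeNumRows * outputSize;
    return p;
}
```

`compute_stripes(17, 17, 0, 16)` gives one-row input stripes (`gInputStripeInnerSize=17`) and two-row output stripes (`gOutputStripeSize=34`), matching the options logged for the 17x17 test. With truncating division, the 16 one-row input stripes span only 16 of the 17 input rows. The 16 two-row output stripes span all 17 output rows.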
[ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1
LayerDimensions{ inputPlanes=1 inputSize=17 numFilters=1 filterSize=1 outputSize=17 padZeros=0 biased=0 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=17 -DgInputStripeOuterSize=17 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=34
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=17 -DgInputStripeOuterSize=17 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=34"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=17 -DgInputStripeOuterSize=17 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=34"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=17 -DgInputStripeOuterSize=17 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=34"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1 (54 ms)
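The build log rejects the option string itself. The `-45` returned by clCreateKernel is CL_INVALID_PROGRAM_EXECUTABLE, which follows from the failed build and is not a separate fault. The logged string mixes `-D name=value` and `-Dname=value` forms and begins with a space. The OpenCL specification accepts both `-D` forms, so blaming the mix for the Vivante compiler's error is only a hypothesis.

Below is a minimal sketch for testing that hypothesis. It assumes the options can be rewritten on the host before clBuildProgram is called. `normalizeDefineOptions` is a hypothetical helper, not part of DeepCL.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: rewrite an OpenCL build-option string so that every
// macro definition uses one uniform form ("-D name=value" when spaceAfterD is
// true, "-Dname=value" otherwise). Tokens end up separated by single spaces,
// with no leading space. Assumption: the Vivante "syntax error in compiler
// option string" is caused by the mixed forms; this is untested.
// Limitation: values containing spaces (e.g. quoted strings) are split apart.
std::string normalizeDefineOptions(const std::string &options, bool spaceAfterD) {
    const std::string prefix = spaceAfterD ? "-D " : "-D";
    std::istringstream in(options);
    std::string token;
    std::string result;
    bool pendingDefine = false;  // previous token was a bare "-D"
    while (in >> token) {
        std::string piece;
        if (pendingDefine) {
            piece = prefix + token;  // "-D" "gFoo=1" -> chosen form
            pendingDefine = false;
        } else if (token == "-D") {
            pendingDefine = true;    // name follows in the next token
            continue;
        } else if (token.size() > 2 && token.compare(0, 2, "-D") == 0) {
            piece = prefix + token.substr(2);  // "-DgFoo=1" -> chosen form
        } else {
            piece = token;  // any other option passes through unchanged
        }
        if (!result.empty()) result += ' ';
        result += piece;
    }
    if (pendingDefine) {  // a trailing bare "-D" is kept as-is
        if (!result.empty()) result += ' ';
        result += "-D";
    }
    return result;
}
```

Calling it once with `spaceAfterD = true` and once with `false` on the string from the build log gives two uniform variants to try on the device.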
[ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1_moredata
expectedresult: -958.715
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=17 -DgInputStripeOuterSize=17 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=34
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=17 -DgInputStripeOuterSize=17 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=34"
kernel build error:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=17 -DgInputStripeOuterSize=17 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=34"
unknown file: Failure
C++ exception with description "
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=17 -DgInputStripeOuterSize=17 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=34"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1_moredata (57 ms)
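The stripe-related defines in the `options:` lines can be cross-checked against the relationships in the kernel's header comment. Below is a minimal sketch of that check, under two assumptions. First, the output stripe height is rounded up. Second, the margin rows are passed in explicitly, because the logged margins (0 for filter size 1, 5 for filter size 6) match `filterSize - 1` rather than `gHalfFilterSize` as the comment suggests. That mapping is inferred from this log only. `StripeGeometry` and `computeStripeGeometry` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical mirror of the -DgInputStripe* / -DgOutputStripe* values.
struct StripeGeometry {
    int inputStripeMarginRows;
    int inputStripeInnerNumRows;
    int inputStripeOuterNumRows;
    int inputStripeInnerSize;
    int inputStripeOuterSize;
    int inputStripeMarginSize;
    int outputStripeNumRows;
    int outputStripeSize;
};

// Follows the kernel comment's formulas. marginRows is supplied by the caller
// (assumed, from the log, to be filterSize - 1).
StripeGeometry computeStripeGeometry(int inputSize, int outputSize,
                                     int numStripes, int marginRows) {
    StripeGeometry g;
    g.inputStripeMarginRows = marginRows;
    // Integer division, as in "gInputSize / gNumStripes".
    g.inputStripeInnerNumRows = inputSize / numStripes;
    g.inputStripeOuterNumRows = g.inputStripeInnerNumRows + 2 * marginRows;
    g.inputStripeInnerSize = g.inputStripeInnerNumRows * inputSize;
    g.inputStripeOuterSize = g.inputStripeOuterNumRows * inputSize;
    g.inputStripeMarginSize = marginRows * inputSize;
    // Output rows per stripe, rounded up so every output row is covered.
    g.outputStripeNumRows = (outputSize + numStripes - 1) / numStripes;
    g.outputStripeSize = g.outputStripeNumRows * outputSize;
    return g;
}
```

For the 17x17 input with 16 stripes, this reproduces the logged values (inner rows 1, output stripe size 34). For the 96x96 input with 512 stripes, the integer division gives `gInputStripeInnerNumRows=0`, matching the log. The image stripe offset therefore never advances while the output stripe does. The kernel would then appear to read beyond the rows loaded into the 960-float local image stripe. This looks like a separate problem from the option-string error.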
[ RUN ] testupdateweights.backprop_instance3_smaller2
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
numweights: 36
options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=96 -D gInputSizeSquared=9216 -D gNumFilters=1 -D gFilterSize=6 -D gHalfFilterSize=3 -D gFilterSizeSquared=36 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=91 -D gOutputSizeSquared=8281 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=512 -DgInputStripeMarginRows=5 -DgInputStripeInnerNumRows=0 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=0 -DgInputStripeOuterSize=960 -DgInputStripeMarginSize=480 -DgOutputStripeNumRows=1 -DgOutputStripeSize=91
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=96 -D gInputSizeSquared=9216 -D gNumFilters=1 -D gFilterSize=6 -D gHalfFilterSize=3 -D gFilterSizeSquared=36 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=91 -D gOutputSizeSquared=8281 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=512 -DgInputStripeMarginRows=5 -DgInputStripeInnerNumRows=0 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=0 -DgInputStripeOuterSize=960 -DgInputStripeMarginSize=480 -DgOutputStripeNumRows=1 -DgOutputStripeSize=91"
kernel build error:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=96 -D gInputSizeSquared=9216 -D gNumFilters=1 -D gFilterSize=6 -D gHalfFilterSize=3 -D gFilterSizeSquared=36 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=91 -D gOutputSizeSquared=8281 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=512 -DgInputStripeMarginRows=5 -DgInputStripeInnerNumRows=0 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=0 -DgInputStripeOuterSize=960 -DgInputStripeMarginSize=480 -DgOutputStripeNumRows=1 -DgOutputStripeSize=91"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // BIASED (or not)
9:
10: // workgroupId: [outputPlane][inputPlane]
11: // localId: [filterRow][filterCol]
12: // per-thread iteration: [n][outputRow][outputCol]
13: // local: errorimage: outputSize * outputSize
14: // imageimage: inputSize * inputSize
15: // specific characteristic: load one stripe of each image at a time,
16: // so we dont run out of memory
17: // number of stripes set in: gNumStripes
18: // note that whilst we can stripe the gradOutput simply,
19: // we actually need to add a half-filter widthed additional few rows
20: // onto the images stripe, otherwise we will be missing data
21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
22: // the outersize, including the two margins is: gInputStripeOuterSize
23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
24: // corresponding outer margin would be
25: void kernel backprop_floats_withscratch_dobias_striped(
26: const float learningRateMultiplier, const int batchSize,
27: global const float *gradOutput, global const float *images,
28: global float *gradWeights,
29: #ifdef BIASED
30: global float *gradBiasWeights,
31: #endif
32: local float *_errorStripe, local float *_imageStripe
33: ) {
34: // gHalfFilterSize
35: // gInputSize
36: //
37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
40: // if we just added gFilterSize)
41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
44: //
45: // gOutputStripeNumRows
46: // gOutputStripeSize
47:
48: const int globalId = get_global_id(0);
49: const int localId = get_local_id(0);
50: const int workgroupId = get_group_id(0);
51: const int workgroupSize = get_local_size(0);
52:
53: const int filterRow = localId / gFilterSize;
54: const int filterCol = localId % gFilterSize;
55:
56: const int outPlane = workgroupId / gInputPlanes;
57: const int upstreamPlane = workgroupId % gInputPlanes;
58:
59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
60: // aggregate over: [outRow][outCol][n]
61: float thiswchange = 0;
62: #ifdef BIASED
63: float thisbiaschange = 0;
64: #endif
65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
67: for (int n = 0; n < batchSize; n++) {
68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
75: // need to fetch the image, but it's bigger than us, so will need to loop...
76: barrier(CLK_LOCAL_MEM_FENCE);
77: for (int i = 0; i < numLoopsForImageStripe; i++) {
78: int thisOffset = i * workgroupSize + localId;
79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
80: bool process = thisOffset < gInputStripeOuterSize
81: && thisGlobalImagesOffset >= imageImageGlobalOffset
82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
83: if (process) {
84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
85: }
86: }
87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
89: int thisOffset = i * workgroupSize + localId;
90: int globalErrorsOffset = errorStripeOffset + thisOffset;
91: bool process = thisOffset < gOutputStripeSize
92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
93: if (process) {
94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
95: }
96: }
97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
99: barrier(CLK_LOCAL_MEM_FENCE);
100: // if (localId == 13) {
101: // for (int i = 0; i < 12; i++) {
102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
103: // }
104: // for (int i = 0; i < 20; i++) {
105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
106: // }
107: // }
108: if (localId < gFilterSizeSquared) {
109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
110: int upstreamRow = outRow - gMargin + filterRow;
111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
112: int upstreamCol = outCol - gMargin + filterCol;
113: bool proceed =
114: upstreamRow >= 0 && upstreamCol >= 0
115: && upstreamRow < gInputSize && upstreamCol < gInputSize
116: && outRow < gOutputSize;
117: if (proceed) {
118: int resultIndex = outRow * gOutputSize + outCol;
119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
122: - stripe * gInputStripeInnerSize ];
123: thiswchange += upstreamResult * error;
124: #ifdef BIASED
125: thisbiaschange += error;
126: #endif
127: }
128: }
129: }
130: }
131: }
132: }
133: if (localId < gFilterSizeSquared) {
134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
136: }
137: #ifdef BIASED
138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
139: if (writeBias) {
140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
141: }
142: #endif
143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
144: // aggregate over: [outRow][outCol][n]
145: }
146:
147:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/BackpropWeightsScratchLarge.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=96 -D gInputSizeSquared=9216 -D gNumFilters=1 -D gFilterSize=6 -D gHalfFilterSize=3 -D gFilterSizeSquared=36 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=91 -D gOutputSizeSquared=8281 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=512 -DgInputStripeMarginRows=5 -DgInputStripeInnerNumRows=0 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=0 -DgInputStripeOuterSize=960 -DgInputStripeMarginSize=480 -DgOutputStripeNumRows=1 -DgOutputStripeSize=91"
" thrown in the test body.
[ FAILED ] testupdateweights.backprop_instance3_smaller2 (63 ms)
[----------] 23 tests from testupdateweights (1443 ms total)
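The `-45` reported by `clCreateKernel` is `CL_INVALID_PROGRAM_EXECUTABLE`. Per the OpenCL specification, this means the program has no successfully built executable for the device. It is a downstream symptom. The root cause is the build-log line just before it: the Vivante compiler rejected the `-D ...` option string. This log does not show which status `clBuildProgram` itself returned. It could plausibly be `CL_INVALID_BUILD_OPTIONS` (-43) or `CL_BUILD_PROGRAM_FAILURE` (-11).

The minimal C sketch below decodes the handful of codes relevant here. The numeric values come from the Khronos `CL/cl.h` header. The helper names (`cl_error_name`, `is_build_related`, `check_cl_error_table`) are hypothetical and are not part of DeepCL or the OpenCL API.

```c
#include <assert.h>
#include <string.h>

/* Map the OpenCL status codes relevant to this log to their symbolic names.
   Values follow the Khronos CL/cl.h header; this is NOT a complete table. */
static const char *cl_error_name(int code) {
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    default:  return "(code not in this sketch's table)";
    }
}

/* Hypothetical helper: nonzero when the code means no usable program binary
   exists, so the real cause has to be read from the program build log. */
static int is_build_related(int code) {
    return code == -11 || code == -43 || code == -45;
}

/* Quick self-check of the table above. */
static void check_cl_error_table(void) {
    assert(strcmp(cl_error_name(-45), "CL_INVALID_PROGRAM_EXECUTABLE") == 0);
    assert(is_build_related(-45) && !is_build_related(0));
}
```

In practice, when `clCreateKernel` returns -45, the next step is to read the build log for the same program and device. DeepCL already prints this log above as `cl/....cl build log:`.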
[----------] 17 tests from testforward
[ RUN ] testforward.imagesize2_nopadzeros
expected number of output: 4
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=2 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=2 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=2 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.imagesize2_nopadzeros (75 ms)
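Once the option-string problem is resolved, the kernels' numeric results still need checking. The C sketch below is a CPU reference for what `forward_3_by_n_outplane` and `forward_2_by_outplane` compute on the `gPadZeros == 0` path. It is derived from the kernel source above, not taken from DeepCL's test code.

With no padding, the kernel reads `inputRow = outputRow + u + gHalfFilterSize` for `u` in `[-gHalfFilterSize, gHalfFilterSize - gEven]`. For both odd and even filter sizes, that covers exactly `filterSize` taps, starting at `outputRow`. The result is a "valid" cross-correlation (no filter flip) with `outputSize = inputSize - filterSize + 1`.

The function name and argument order are hypothetical.

```c
#include <assert.h>

/* CPU reference (sketch) for the gPadZeros == 0 path of forward2/forward3:
   a "valid" cross-correlation, no filter flip.
   Layouts, as described in the kernel comments:
     images  [batch][inPlanes][inSize][inSize]
     filters [numFilters][inPlanes][fs][fs]
     output  [batch][numFilters][outSize][outSize], outSize = inSize - fs + 1 */
static void forward_valid_ref(int batchSize, int inPlanes, int inSize,
                              int numFilters, int fs,
                              const float *images, const float *filters,
                              float *output) {
    assert(fs >= 1 && fs <= inSize);
    const int outSize = inSize - fs + 1;
    for (int n = 0; n < batchSize; n++)
        for (int f = 0; f < numFilters; f++)
            for (int r = 0; r < outSize; r++)
                for (int c = 0; c < outSize; c++) {
                    float sum = 0.0f;
                    /* Sum over every input plane and every filter tap. */
                    for (int p = 0; p < inPlanes; p++)
                        for (int i = 0; i < fs; i++)
                            for (int j = 0; j < fs; j++)
                                sum += images[((n * inPlanes + p) * inSize + r + i) * inSize + c + j]
                                     * filters[((f * inPlanes + p) * fs + i) * fs + j];
                    output[((n * numFilters + f) * outSize + r) * outSize + c] = sum;
                }
}
```

Consider the shape used by `testforward.imagesize2_nopadzeros`: 1 input plane, `gInputSize=2`, `gNumFilters=2`, `gFilterSize=2`, `gOutputSize=1`. Each filter yields a single value, the dot product of the whole 2x2 image with that filter.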
[ RUN ] testforward.imagesize2_padzeros
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=2 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=1 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=2 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=1 -D gSkip=0 -DgWorkgroupSize=32"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=2 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=1 -D gMargin=1 -D gEven=1 -D gSkip=0 -DgWorkgroupSize=32"
" thrown in the test body.
[ FAILED ] testforward.imagesize2_padzeros (49 ms)
[ RUN ] testforward.imagesize3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
" thrown in the test body.
[ FAILED ] testforward.imagesize3 (91 ms)
[ RUN ] testforward.test2
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.test2 (102 ms)
[ RUN ] testforward.test3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
" thrown in the test body.
[ FAILED ] testforward.test3 (50 ms)
[ RUN ] testforward.compare_0_1_biased_nopad
LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=15 -D gOutputSizeSquared=225 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=15 -D gOutputSizeSquared=225 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=15 -D gOutputSizeSquared=225 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.compare_0_1_biased_nopad (101 ms)
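Two numeric status codes recur in this run. The -45 returned by clCreateKernel is CL_INVALID_PROGRAM_EXECUTABLE, meaning there is no successfully built executable for the device. That follows from the build failure reported just before it. If clBLAS reuses the OpenCL numbering for its status codes (an assumption here), the -11 from clblasSgemm earlier in the run would be CL_BUILD_PROGRAM_FAILURE. The lookup below is a hypothetical helper, not part of DeepCL or EasyCL. The numeric values are the ones defined in the Khronos cl.h header.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not DeepCL/EasyCL API): map the OpenCL status codes
// relevant to this log to their cl.h names for readable error messages.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -44: return "CL_INVALID_PROGRAM";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -46: return "CL_INVALID_KERNEL_NAME";
        default:  return "UNKNOWN_CL_ERROR(" + std::to_string(code) + ")";
    }
}
```

A message such as "clCreateKernel failed: " + clErrorName(-45) points straight at the failed build rather than at kernel creation itself.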
[ RUN ] testforward.compare_0_1_biased_pad
LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=19 -D gOutputSizeSquared=361 -D gPadZeros=1 -D gMargin=2 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=19 -D gOutputSizeSquared=361 -D gPadZeros=1 -D gMargin=2 -D gEven=0 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source: [same 117-line cl/forward1.cl listing as printed above]
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=19 -D gOutputSizeSquared=361 -D gPadZeros=1 -D gMargin=2 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.compare_0_1_biased_pad (47 ms)
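Every forward1.cl build in this run fails with the same Vivante message: "syntax error in compiler option string". The OpenCL specification permits the spaced form "-D name=definition", so the string is not obviously malformed. One workaround worth trying rests on an unverified assumption: that this compiler only accepts the joined form. Under that assumption, you would attach each -D to its macro and drop the leading space before the string reaches clBuildProgram. joinDefineFlags is a hypothetical name and has not been tested against the Vivante driver.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical workaround (untested on Vivante): rewrite
// " -D NAME=VALUE -D FLAG" as "-DNAME=VALUE -DFLAG".
// Other options pass through unchanged; whitespace is normalized.
std::string joinDefineFlags(const std::string &options) {
    std::istringstream in(options);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) {
        tokens.push_back(tok);  // splitting on whitespace drops the leading space
    }
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); i++) {
        std::string piece = tokens[i];
        // A bare "-D" followed by another token: glue the macro onto it.
        if (piece == "-D" && i + 1 < tokens.size()) {
            piece += tokens[++i];
        }
        if (!out.empty()) {
            out += ' ';
        }
        out += piece;
    }
    return out;
}
```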
[ RUN ] testforward.compare_1_n_biased_nopad
instance: 2
LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=15 -D gOutputSizeSquared=225 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source: [same 117-line cl/forward1.cl listing as printed above]
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=15 -D gOutputSizeSquared=225 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source: [same 117-line cl/forward1.cl listing as printed above]
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=15 -D gOutputSizeSquared=225 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.compare_1_n_biased_nopad (61 ms)
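The LayerDimensions line and the -D values in each option string follow a few simple rules. The output-size rule comes from the comments in cl/forward1.cl:

- Without padding, the output is size - filter + 1.
- With padding, odd filters keep the input size and even filters add one.

Those comments are not fully consistent about the even, padded case, so treat that branch as the least certain. gMargin and gEven are inferred from the logged values and the kernel's loop bounds, not taken from DeepCL's source. skip is assumed to be 0, as in every test here. DerivedDims and deriveDims are hypothetical names.

```cpp
#include <cassert>

// Sketch: recompute the derived -D values seen in this log (skip assumed 0).
struct DerivedDims {
    int halfFilterSize;     // gHalfFilterSize
    int filterSizeSquared;  // gFilterSizeSquared
    int outputSize;         // gOutputSize
    int outputSizeSquared;  // gOutputSizeSquared
    int margin;             // gMargin (inferred: half filter size when padding)
    int even;               // gEven (inferred: 1 for even filter sizes)
};

DerivedDims deriveDims(int inputSize, int filterSize, bool padZeros) {
    assert(inputSize > 0 && filterSize > 0);
    DerivedDims d;
    d.halfFilterSize = filterSize / 2;
    d.filterSizeSquared = filterSize * filterSize;
    d.even = (filterSize % 2 == 0) ? 1 : 0;
    // Padding adds 2*half to size - filter + 1: odd filters keep the size,
    // even filters grow it by one.
    d.outputSize = padZeros
        ? inputSize - filterSize + 1 + 2 * d.halfFilterSize
        : inputSize - filterSize + 1;
    d.outputSizeSquared = d.outputSize * d.outputSize;
    d.margin = padZeros ? d.halfFilterSize : 0;
    return d;
}
```

For inputSize=19 and filterSize=5, this gives gOutputSize=15 and gMargin=0 without padding, and gOutputSize=19 and gMargin=2 with it. filterSize=19 without padding gives gOutputSize=1. All of these match the option strings above.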
[ RUN ] testforward.compare_1_n_biased_pad
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
instance: 2
LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=19 -D gOutputSizeSquared=361 -D gPadZeros=1 -D gMargin=2 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source: [same 117-line cl/forward1.cl listing as printed above]
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=19 -D gOutputSizeSquared=361 -D gPadZeros=1 -D gMargin=2 -D gEven=0 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source: [same 117-line cl/forward1.cl listing as printed above]
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=19 -D gOutputSizeSquared=361 -D gPadZeros=1 -D gMargin=2 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.compare_1_n_biased_pad (145 ms)
[ RUN ] testforward.compare_1_5_biased_nopad
LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=19 outputSize=1 padZeros=0 biased=1 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=19 -D gHalfFilterSize=9 -D gFilterSizeSquared=361 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=19 -D gHalfFilterSize=9 -D gFilterSizeSquared=361 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=8 -D gInputPlanes=8 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=8 -D gFilterSize=19 -D gHalfFilterSize=9 -D gFilterSizeSquared=361 -D gNumOutputPlanes=8 -D gOutputPlanes=8 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.compare_1_5_biased_nopad (50 ms)
[ RUN ] testforward.compare_1_4_fcscenario
LayerDimensions{ inputPlanes=10 inputSize=24 numFilters=10 filterSize=24 outputSize=1 padZeros=0 biased=1 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=24 -D gInputSizeSquared=576 -D gNumFilters=10 -D gFilterSize=24 -D gHalfFilterSize=12 -D gFilterSizeSquared=576 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=24 -D gInputSizeSquared=576 -D gNumFilters=10 -D gFilterSize=24 -D gHalfFilterSize=12 -D gFilterSizeSquared=576 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=24 -D gInputSizeSquared=576 -D gNumFilters=10 -D gFilterSize=24 -D gHalfFilterSize=12 -D gFilterSizeSquared=576 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.compare_1_4_fcscenario (59 ms)
[ RUN ] testforward.compare_break1_0_1
LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=33 -D gInputSizeSquared=1089 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=33 -D gOutputSizeSquared=1089 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=33 -D gInputSizeSquared=1089 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=33 -D gOutputSizeSquared=1089 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=33 -D gInputSizeSquared=1089 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=33 -D gOutputSizeSquared=1089 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.compare_break1_0_1 (101 ms)
[ RUN ] testforward.compare_break1_0_4
LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=545 -D gPixelsPerThread=2 -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=33 -D gInputSizeSquared=1089 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=33 -D gOutputSizeSquared=1089 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=545 -D gPixelsPerThread=2 -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=33 -D gInputSizeSquared=1089 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=33 -D gOutputSizeSquared=1089 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=545 -D gPixelsPerThread=2 -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=33 -D gInputSizeSquared=1089 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=33 -D gOutputSizeSquared=1089 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.compare_break1_0_4 (53 ms)
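The comment block in `forward4.cl` explains how an output plane is split across `gPixelsPerThread` workgroups when the plane has more pixels than a single workgroup can hold. This build log uses `gWorkgroupSize=545`, `gPixelsPerThread=2` and `gOutputSizeSquared=1089`. Two workgroups of 545 threads therefore provide 1090 slots for 1089 pixels, and the `effectiveLocalId < gOutputSizeSquared` guard leaves the single extra thread idle. The Python sketch below mirrors the kernel's index arithmetic so the mapping can be checked on the host. The function names, and the batch sizes used in the check, are illustrative assumptions.

```python
def forward4_coords(workgroup_id, local_id, workgroup_size, pixels_per_thread,
                    num_filters, output_size):
    """Mirror the index arithmetic at the top of forward_4_by_n_outplane_smallercache."""
    effective_workgroup_id = workgroup_id // pixels_per_thread
    pixel = workgroup_id % pixels_per_thread              # which slice of the plane
    effective_local_id = local_id + pixel * workgroup_size
    n = effective_workgroup_id // num_filters             # example index in the batch
    out_plane = effective_workgroup_id % num_filters      # filter / output plane
    output_row = effective_local_id // output_size
    output_col = effective_local_id % output_size
    return n, out_plane, effective_local_id, output_row, output_col


def covered_pixels(batch_size, num_filters, output_size, workgroup_size, pixels_per_thread):
    """List every (n, plane, row, col) written by the launch, applying the kernel's guard."""
    seen = []
    num_workgroups = batch_size * num_filters * pixels_per_thread
    for wg in range(num_workgroups):
        for lid in range(workgroup_size):
            n, plane, eff, row, col = forward4_coords(
                wg, lid, workgroup_size, pixels_per_thread, num_filters, output_size)
            if eff < output_size * output_size:  # same check as the kernel
                seen.append((n, plane, row, col))
    return seen


# Values from the build log above, with a hypothetical batch of one example.
pixels = covered_pixels(1, 1, 33, 545, 2)
print(len(pixels), len(set(pixels)))  # 1089 1089: every pixel written exactly once
```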
[ RUN ] testforward.comparespecific_break2
LayerDimensions{ inputPlanes=64 inputSize=19 numFilters=64 filterSize=19 outputSize=1 padZeros=0 biased=0 skip=0}
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=64 -D gInputPlanes=64 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=64 -D gFilterSize=19 -D gHalfFilterSize=9 -D gFilterSizeSquared=361 -D gNumOutputPlanes=64 -D gOutputPlanes=64 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=64 -D gInputPlanes=64 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=64 -D gFilterSize=19 -D gHalfFilterSize=9 -D gFilterSizeSquared=361 -D gNumOutputPlanes=64 -D gOutputPlanes=64 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=64 -D gInputPlanes=64 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=64 -D gFilterSize=19 -D gHalfFilterSize=9 -D gFilterSizeSquared=361 -D gNumOutputPlanes=64 -D gOutputPlanes=64 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.comparespecific_break2 (138 ms)
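The derived defines in these option strings follow from the `LayerDimensions` line printed at the start of each test. The `forward1.cl` comments give two size rules. Without padding, output size = input size - filter size + 1. With padding and an even filter, output size = input size + 1. The sketch below also assumes that an odd filter with padding keeps the input size. That third rule is inferred from the kernel's `outputRow + u` indexing and is not stated in the comment. Under these rules the sketch reproduces the values printed for the layers in this section. `derived_dims` is a hypothetical helper, not DeepCL's code.

```python
def derived_dims(input_size, filter_size, pad_zeros):
    """Compute a few of the derived -D values seen in the build logs (sketch)."""
    half = filter_size // 2                       # gHalfFilterSize
    even = 1 if filter_size % 2 == 0 else 0       # gEven
    if not pad_zeros:
        output_size = input_size - filter_size + 1
    elif even:
        output_size = input_size + 1              # per the forward1.cl comment
    else:
        output_size = input_size                  # assumption: odd filter, padded
    return {
        "gHalfFilterSize": half,
        "gEven": even,
        "gOutputSize": output_size,
        "gOutputSizeSquared": output_size * output_size,
    }


print(derived_dims(19, 19, False))  # matches testforward.comparespecific_break2
```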
[ RUN ] testforward.softmax
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
output[0]=0.0320586
output[1]=0.0871443
output[2]=0.643914
output[3]=0.236883
loss 0.44019
loss 3.44019
loss 2.44019
loss 1.44019
[ OK ] testforward.softmax (25 ms)
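The softmax test prints four probabilities and four losses. Each loss is the negative natural log of one of the printed probabilities. For example, -ln 0.643914 ≈ 0.44019, which is the cross-entropy (negative log-likelihood) for a single target label. The log does not show the test's inputs, so the logits below are hypothetical. The values [0, 1, 3, 2] were chosen because they reproduce the printed probabilities. Any constant shift of these logits would work equally well, because softmax is unchanged when the same constant is added to every logit.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll_loss(probs, label):
    """Negative log-likelihood of the target label."""
    return -math.log(probs[label])


probs = softmax([0.0, 1.0, 3.0, 2.0])  # hypothetical logits
for label in range(len(probs)):
    print(f"output[{label}]={probs[label]:.6g} loss if label={label}: {nll_loss(probs, label):.6g}")
```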
[ RUN ] testforward.softmax_byplane
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
output[0]=0.0320586
output[1]=0.0871443
output[2]=0.643914
output[3]=0.236883
loss 0.44019
loss 3.44019
loss 2.44019
loss 1.44019
[ OK ] testforward.softmax_byplane (17 ms)
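Every kernel failure in this run comes from the same Vivante compiler message, `syntax error in compiler option string`, and the log does not say which token the compiler rejects. The option strings begin with a space and use `-D name=value`, with a space after `-D`. One hypothesis worth testing is that this compiler only accepts the compact `-Dname=value` form. This log does not confirm that. The sketch below uses hypothetical helpers that are not part of DeepCL's API. It parses an option string as printed here and re-renders it in either form, so both variants can be passed to `clBuildProgram` and compared.

```python
def parse_log_options(option_string):
    """Recover name -> value pairs from a '-D name=value ...' option string."""
    tokens = option_string.split()
    defines = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-D" and i + 1 < len(tokens):
            body = tokens[i + 1]          # '-D name=value' (space after -D)
            i += 2
        elif tok.startswith("-D") and len(tok) > 2:
            body = tok[2:]                # '-Dname=value' (compact form)
            i += 1
        else:
            i += 1                        # ignore anything that is not a define
            continue
        name, _, value = body.partition("=")
        defines[name] = value
    return defines


def build_options(defines, space_after_d=False):
    """Render defines without a leading space, optionally with a space after -D."""
    sep = " " if space_after_d else ""
    return " ".join(f"-D{sep}{name}={value}" for name, value in defines.items())


logged = " -D gNumInputPlanes=1 -D gInputSize=33 -D gEven=0"
print(build_options(parse_log_options(logged)))  # compact form to try on the device
```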
[ RUN ] testforward.crash_from_jm
-D gNumInputPlanes=32 -D gInputPlanes=32 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=20 -D gFilterSize=28 -D gHalfFilterSize=14 -D gFilterSizeSquared=784 -D gNumOutputPlanes=20 -D gOutputPlanes=20 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=32 -D gInputPlanes=32 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=20 -D gFilterSize=28 -D gHalfFilterSize=14 -D gFilterSizeSquared=784 -D gNumOutputPlanes=20 -D gOutputPlanes=20 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=32 -D gInputPlanes=32 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=20 -D gFilterSize=28 -D gHalfFilterSize=14 -D gFilterSizeSquared=784 -D gNumOutputPlanes=20 -D gOutputPlanes=20 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=32 -D gInputPlanes=32 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=20 -D gFilterSize=28 -D gHalfFilterSize=14 -D gFilterSizeSquared=784 -D gNumOutputPlanes=20 -D gOutputPlanes=20 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0"
" thrown in the test body.
[ FAILED ] testforward.crash_from_jm (157 ms)
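The comments in the dumped cl/forward1.cl give the output-size rules. Without zero padding, new imagesize = oldimagesize - filtersize + 1. With zero padding, odd filters keep the size and even filters grow it by 1.

The sketch below derives gOutputSize, gHalfFilterSize and gEven from those rules. It is a hypothetical helper, not DeepCL's API, and it ignores gSkip because every run in this log uses gSkip=0. With crash_from_jm's values (gInputSize=28, gFilterSize=28, gPadZeros=0), it reproduces the gOutputSize=1, gHalfFilterSize=14 and gEven=1 shown in the option string above.

```cpp
#include <cassert>

// Hypothetical helper (not DeepCL's API): derives the geometry defines seen in
// the cl/forward1.cl option strings, following the kernel's own comments.
// Stride (gSkip) is deliberately ignored.
struct ConvGeometry {
    int outputSize;      // gOutputSize
    int halfFilterSize;  // gHalfFilterSize
    int even;            // gEven: 1 if filterSize is even, else 0
};

ConvGeometry convGeometry(int inputSize, int filterSize, bool padZeros) {
    assert(inputSize > 0 && filterSize > 0);
    ConvGeometry g;
    g.halfFilterSize = filterSize / 2;
    g.even = (filterSize % 2 == 0) ? 1 : 0;
    if (padZeros) {
        // Odd filters keep the size; even filters pad bottom/right by one, so +1.
        g.outputSize = inputSize + g.even;
    } else {
        // No padding: the filter must fit entirely inside the image.
        g.outputSize = inputSize - filterSize + 1;
    }
    return g;
}
```

For example, `convGeometry(1, 1, false)` gives outputSize 1, halfFilterSize 0 and even 0. These match the defines logged later for testsimpleconvolvenet.imagesize1_2planes_filtersize1.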
[----------] 17 tests from testforward (1322 ms total)
[----------] 2 tests from testfilehelper
[ RUN ] testfilehelper.testfilehelper
[ OK ] testfilehelper.testfilehelper (19 ms)
[ RUN ] testfilehelper.testreadchunk
[ OK ] testfilehelper.testreadchunk (11 ms)
[----------] 2 tests from testfilehelper (30 ms total)
[----------] 12 tests from testsimpleconvolvenet
[ RUN ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh (77 ms)
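cl/activate.cl selects one ACTIVATION_FUNCTION at compile time, chosen by a single define (TANH, SCALEDTANH, SIGMOID, RELU, ELU or LINEAR). A CPU reference of the same formulas can help check results once the kernel builds.

In the sketch below, the string selector is a hypothetical stand-in for those defines. One detail of the dumped source: the SIGMOID branch is written `#elif SIGMOID`, whereas the other branches use `#elif defined X`. It still works with `-D SIGMOID`, because that defines the macro as 1, but the spelling is inconsistent.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// CPU reference for the ACTIVATION_FUNCTION variants in cl/activate.cl.
// `kind` is a hypothetical stand-in for the -D TANH / -D RELU ... defines.
float activate(const std::string &kind, float x) {
    if (kind == "TANH")       return std::tanh(x);
    if (kind == "SCALEDTANH") return 1.7159f * std::tanh(0.66667f * x);
    if (kind == "SIGMOID")    return 1.0f / (1.0f + std::exp(-x));
    if (kind == "RELU")       return x > 0 ? x : 0.0f;
    if (kind == "ELU")        return x > 0 ? x : std::exp(x) - 1.0f;
    if (kind == "LINEAR")     return x;
    // The kernel compiles no activate() at all in this case (#ifdef guard).
    throw std::invalid_argument("unknown activation: " + kind);
}
```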
[ RUN ] testsimpleconvolvenet.imagesize1_planes2_filters2_tanh
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize1_planes2_filters2_tanh (77 ms)
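Every rejected option string in this log has a leading space and at least one `-D` followed by a space (`-D TANH`, `-D gNumInputPlanes=32`, `-D BIASED`). The OpenCL specification accepts both `-D name` and `-Dname`. If the Vivante option parser is what trips here, that would be a driver quirk, and nothing in the log confirms this.

One cheap experiment is to pass the compact form instead. The helper below is hypothetical and untested on Vivante hardware. It only rewrites the string; it does not touch DeepCL's build path.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical workaround (untested on Vivante): rewrite "-D name=value" into
// "-Dname=value" and drop leading/duplicate whitespace, in case the driver's
// option parser is what rejects these strings.
std::string compactDefines(const std::string &options) {
    std::istringstream in(options);
    std::ostringstream out;
    std::string tok;
    bool first = true;
    while (in >> tok) {
        if (tok == "-D") {
            std::string def;
            if (!(in >> def)) break;  // dangling "-D" at the end: drop it
            tok = "-D" + def;
        }
        if (!first) out << ' ';
        out << tok;
        first = false;
    }
    return out.str();
}
```

Applied to the activate.cl options above, this would yield `-DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -DTANH`.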
[ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_tanh
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D TANH"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize3_n4_filtersize3_tanh (78 ms)
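The failures above all report OpenCL error code -45 from clCreateKernel. In the standard cl.h numbering, -45 is CL_INVALID_PROGRAM_EXECUTABLE, meaning there is no successfully built executable for the device. That is consistent with the option-string build error printed just before it.

The -11 returned by clblasSgemm earlier in the log corresponds to CL_BUILD_PROGRAM_FAILURE, since clBLAS status codes reuse the OpenCL values. A small decoder covering only the codes relevant here:

```cpp
#include <cassert>
#include <string>

// Names from the standard OpenCL cl.h header; only codes relevant to this
// log are listed, anything else is reported numerically.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -44: return "CL_INVALID_PROGRAM";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -46: return "CL_INVALID_KERNEL_NAME";
        default:  return "unknown OpenCL error " + std::to_string(code);
    }
}
```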
[ RUN ] testsimpleconvolvenet.imagesize1_2planes_filtersize1
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 1: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
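convolve_imagecubes_float2 assigns one work item per output element. It decomposes globalId into [imageid][filterid][outrow][outcol] and sums over every input plane and filter tap.

Below is a sequential CPU sketch that mirrors that loop, which may serve as a reference once the kernel compiles. The Dims struct is hypothetical: the kernel receives these values as -D defines. The kernel's `exampleId < numExamples` guard is unnecessary here because the loop runs over exactly the output size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical parameter bundle; the kernel gets these as -D defines.
struct Dims {
    int numInputPlanes, inputSize, numFilters, filterSize, outputSize;
    bool padZeros;
};

// CPU sketch of convolve_imagecubes_float2: one outer iteration per work item.
// inputs [example][inPlane][row][col], filters [filter][inPlane][row][col],
// output [example][filter][row][col], indexed by globalId as in the kernel.
std::vector<float> convolveReference(const Dims &d, int numExamples,
                                     const std::vector<float> &inputs,
                                     const std::vector<float> &filters) {
    const int half = d.filterSize / 2;
    const int even = d.filterSize % 2 == 0 ? 1 : 0;
    const int inSq = d.inputSize * d.inputSize;
    const int fSq = d.filterSize * d.filterSize;
    const int outSq = d.outputSize * d.outputSize;
    std::vector<float> output(numExamples * d.numFilters * outSq, 0.0f);
    for (int globalId = 0; globalId < (int)output.size(); globalId++) {
        // Same decomposition as the kernel.
        int outputImage2Id = globalId / outSq;
        int exampleId = outputImage2Id / d.numFilters;
        int filterId = outputImage2Id % d.numFilters;
        int localId = globalId % outSq;
        int outputRow = localId / d.outputSize;
        int outputCol = localId % d.outputSize;
        float sum = 0;
        for (int p = 0; p < d.numInputPlanes; p++) {
            int inBase = (exampleId * d.numInputPlanes + p) * inSq;
            int fBase = (filterId * d.numInputPlanes + p) * fSq;
            for (int u = -half; u <= half - even; u++) {
                // Without padding, shift by half so the filter stays inside the image.
                int inRow = outputRow + u + (d.padZeros ? 0 : half);
                if (inRow < 0 || inRow >= d.inputSize) continue;
                for (int v = -half; v <= half - even; v++) {
                    int inCol = outputCol + v + (d.padZeros ? 0 : half);
                    if (inCol < 0 || inCol >= d.inputSize) continue;
                    sum += inputs[inBase + inRow * d.inputSize + inCol] *
                           filters[fBase + (u + half) * d.filterSize + (v + half)];
                }
            }
        }
        output[globalId] = sum;
    }
    return output;
}
```

The filter index `(u + half) * filterSize + (v + half)` is the kernel's `filterRow[v]` with filterRow already offset by `(u+gHalfFilterSize) * gFilterSize + gHalfFilterSize`.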
forward try kernel 2
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
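In forward_2_by_outplane, copyLocal stages global data into local memory. Each of the gWorkgroupSize work items copies elements localId, localId + gWorkgroupSize, and so on, for ceil(N / gWorkgroupSize) rounds. The kernel comments size the local filter cube as filtersize * filterSize * inputPlanes * 4 bytes, which is 3.2KB for 5x5 filters over 32 planes.

The sequential simulation below is illustrative only; its outer loop stands in for parallel work items. Note that crash_from_jm earlier in this log uses gFilterSize=28 with gInputPlanes=32. That is exactly the "28 * 28 * 32 * 4 = 100KB" case the comments flag as too large.

```cpp
#include <cassert>
#include <vector>

// Sequential simulation of copyLocal() from cl/forward2.cl. The outer loop over
// localId stands in for the workgroupSize work items running in parallel.
void copyLocalSim(std::vector<float> &target, const std::vector<float> &source,
                  int N, int workgroupSize) {
    int numLoops = (N + workgroupSize - 1) / workgroupSize;  // ceil(N / workgroupSize)
    for (int localId = 0; localId < workgroupSize; localId++) {
        for (int loop = 0; loop < numLoops; loop++) {
            int offset = loop * workgroupSize + localId;  // strided, so accesses coalesce
            if (offset < N) target[offset] = source[offset];
        }
    }
}

// Local memory for one filter cube, per the kernel comments (4-byte floats).
int filterCubeBytes(int filterSize, int inputPlanes) {
    return filterSize * filterSize * inputPlanes * (int)sizeof(float);
}
```

filterCubeBytes(5, 32) gives 3200 bytes. filterCubeBytes(28, 32) gives 100352 bytes, well above the roughly 3KB the kernel comments assume.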
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
ForwardAuto: kernel 2: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
... not valid
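In `copyLocal` from forward2.cl, each work-item in the workgroup copies the elements at `localId`, `localId + gWorkgroupSize`, `localId + 2*gWorkgroupSize`, and so on, stopping once it passes `N`. The sketch below simulates that strided cooperative copy serially in Python. It shows that every one of the `N` elements is written exactly once. The function names are illustrative and do not come from DeepCL.

```python
def copy_local_offsets(local_id, workgroup_size, n):
    """Offsets written by one work-item, following copyLocal's loop in forward2.cl."""
    num_loops = (n + workgroup_size - 1) // workgroup_size  # ceil(n / workgroup_size)
    offsets = []
    for loop in range(num_loops):
        offset = loop * workgroup_size + local_id
        if offset < n:  # the final pass can run past n for high local ids
            offsets.append(offset)
    return offsets


def simulate_copy(workgroup_size, n):
    """Run every work-item of one workgroup serially; count writes per element."""
    writes = [0] * n
    for local_id in range(workgroup_size):
        for offset in copy_local_offsets(local_id, workgroup_size, n):
            writes[offset] += 1
    return writes


print(copy_local_offsets(local_id=3, workgroup_size=32, n=100))  # [3, 35, 67, 99]
print(simulate_copy(workgroup_size=32, n=100) == [1] * 100)      # True
```

On each pass, neighbouring work-items read neighbouring addresses. This layout therefore tends to favour coalesced global-memory reads on GPUs.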
forward try kernel 3
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
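The row and column loops in forward3.cl use clamped bounds. When `gPadZeros` is set, `minu`/`maxu` are clipped so that `outputRow + u` never leaves the image, and the skipped taps behave like multiplying by zero padding. Without padding, the full filter range is used and the input position is shifted by `gHalfFilterSize`. The minimal Python sketch below covers the row case; columns work the same way. The function name is illustrative. The padded examples assume an odd filter (`gEven = 0`) and equal input and output sizes, since the kernel appears to assume this when it clamps against `gOutputSize`.

```python
def row_taps(output_row, output_size, half_filter, pad_zeros, even=0):
    """Input rows read for one output row, following forward3.cl's minu/maxu logic."""
    if pad_zeros:
        minu = max(-half_filter, -output_row)
        maxu = min(half_filter - even, output_size - 1 - output_row - even)
    else:
        minu = -half_filter
        maxu = half_filter - even
    rows = []
    for u in range(minu, maxu + 1):
        input_row = output_row + u
        if not pad_zeros:
            input_row += half_filter  # unpadded: output row 0 starts half_filter into the input
        rows.append(input_row)
    return rows


# 5x5 filter (half_filter = 2), padded 5x5 image: the top row only sees rows 0..2.
print(row_taps(0, 5, 2, pad_zeros=True))   # [0, 1, 2]
# Same filter, unpadded 5x5 input, so a 1x1 output: all five input rows are read.
print(row_taps(0, 1, 2, pad_zeros=False))  # [0, 1, 2, 3, 4]
```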
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 3: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
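In forward3.cl, each workgroup handles one (image, filter) pair: `n = workgroupId / gNumFilters` and `outPlane = workgroupId % gNumFilters`. Each work-item then stores to `(n * gNumFilters + outPlane) * gOutputSizeSquared + localId`. The sketch below enumerates that mapping serially and counts the writes to each element of the `[imageid][filterid][row][col]` output. The batch size and workgroup size in the example are arbitrary choices, not values taken from this log.

```python
def forward3_writes(batch_size, num_filters, output_size, workgroup_size):
    """Count writes per output element under forward3.cl's workgroup/work-item mapping."""
    out_sq = output_size * output_size
    writes = [0] * (batch_size * num_filters * out_sq)
    for workgroup_id in range(batch_size * num_filters):  # one workgroup per (n, outPlane)
        n = workgroup_id // num_filters
        out_plane = workgroup_id % num_filters
        for local_id in range(workgroup_size):
            if local_id < out_sq:  # same guard as the kernel's final store
                result_index = (n * num_filters + out_plane) * out_sq + local_id
                writes[result_index] += 1
    return writes


print(forward3_writes(batch_size=4, num_filters=2, output_size=1, workgroup_size=32))
# [1, 1, 1, 1, 1, 1, 1, 1]
```

Because `localId` maps directly to an output pixel, this scheme only covers a plane when the workgroup holds at least `gOutputSizeSquared` work-items. For example, `forward3_writes(1, 1, 6, 32)` leaves 4 of the 36 entries unwritten. forward4.cl's comment describes this same limitation as its motivation.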
forward try kernel 4
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
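Every forward kernel in this run fails on its option string before any source is compiled, and the Vivante build log does not say which token it rejects. Two untested guesses are the space between `-D` and the macro name, and the leading space at the start of the string. Neither should be invalid in principle, because the OpenCL specification documents the `-D name` and `-D name=definition` forms. The sketch below builds the same kind of defines in the compact `-Dname=value` form, separated by single spaces and with no leading space, which could be tried as a workaround. The helper name is hypothetical, and nothing in this log confirms that the change would help.

```python
def build_options(defines):
    """Join (name, value) pairs into an OpenCL option string of '-Dname=value' tokens.

    A value of None emits a bare '-Dname', which OpenCL predefines as 1.
    The result has no leading or trailing whitespace.
    """
    parts = []
    for name, value in defines:
        parts.append("-D%s" % name if value is None else "-D%s=%s" % (name, value))
    return " ".join(parts)


opts = build_options([("BIASED", None), ("gNumInputPlanes", 1), ("gInputSize", 1), ("gPadZeros", 0)])
print(opts)  # -DBIASED -DgNumInputPlanes=1 -DgInputSize=1 -DgPadZeros=0
```

The resulting string would be passed as the `options` argument of `clBuildProgram`.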
ForwardAuto: kernel 4: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
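forward4.cl's comment explains how the kernel handles output planes larger than a workgroup. Each plane is treated as a 1-D run of pixels and split across `gPixelsPerThread` workgroups. The kernel then recovers `pixel = workgroupId % gPixelsPerThread` and `effectiveLocalId = localId + pixel * gWorkgroupSize`. The Python sketch below replays that mapping and checks that every output element is written exactly once. It assumes that the host sets `gPixelsPerThread` to `ceil(gOutputSizeSquared / gWorkgroupSize)`. That assumption is consistent with the `gPixelsPerThread=1`, `gWorkgroupSize=32`, 1x1-output options in this log, but it is not shown directly.

```python
def forward4_writes(batch_size, num_filters, output_size, workgroup_size):
    """Count writes per output element under forward4.cl's pixels-per-thread mapping."""
    out_sq = output_size * output_size
    # Assumed host-side choice: enough workgroups per plane to cover every pixel.
    pixels_per_thread = (out_sq + workgroup_size - 1) // workgroup_size
    writes = [0] * (batch_size * num_filters * out_sq)
    for workgroup_id in range(batch_size * num_filters * pixels_per_thread):
        effective_workgroup_id = workgroup_id // pixels_per_thread
        pixel = workgroup_id % pixels_per_thread
        n = effective_workgroup_id // num_filters
        out_plane = effective_workgroup_id % num_filters
        for local_id in range(workgroup_size):
            effective_local_id = local_id + pixel * workgroup_size
            if effective_local_id < out_sq:  # kernel's guard on the final store
                result_index = (n * num_filters + out_plane) * out_sq + effective_local_id
                writes[result_index] += 1
    return writes


writes = forward4_writes(batch_size=2, num_filters=3, output_size=6, workgroup_size=32)
print(len(writes), min(writes), max(writes))  # 216 1 1
```

A 6x6 plane (36 pixels) with a workgroup of 32 needs two workgroups per plane. This is the case that forward3.cl's one-workgroup-per-plane layout cannot cover.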
forward try kernel 5
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: kernel void reduce_segments(const int numSegments, const int segmentLength,
8: global float const *in, global float* out) {
9: const int globalId = get_global_id(0);
10: const int segmentId = globalId;
11:
12: if (segmentId >= numSegments) {
13: return;
14: }
15:
16: float sum = 0;
17: global const float *segment = in + segmentId * segmentLength;
18: for (int i = 0; i < segmentLength; i++) {
19: sum += segment[i];
20: }
21: out[segmentId] = sum;
22: }
23:
24:
25:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
ForwardAuto: kernel 5: this instance cant be used:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
... not valid
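The Vivante compiler stops at line 8 of reduce_segments.cl, on the parameter `global float const *in`. Placing `const` after the type name is valid C99 and therefore valid OpenCL C, so this looks like a limitation of this particular compiler. Rewriting the parameter as `global const float *in` might avoid the error, but this log does not show that being tried. The kernel itself is a plain segmented sum: work-item `segmentId` adds up `segmentLength` consecutive floats starting at `segmentId * segmentLength`. A serial Python reference for checking its output on the host could look like this (the function name is illustrative):

```python
def reduce_segments(values, num_segments, segment_length):
    """Serial reference for reduce_segments.cl: out[s] = sum of segment s.

    The kernel's early return for segmentId >= numSegments only matters when the
    global work size is rounded up; a serial loop just stops at num_segments.
    """
    out = []
    for segment_id in range(num_segments):
        start = segment_id * segment_length
        out.append(sum(values[start:start + segment_length]))
    return out


print(reduce_segments([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], num_segments=2, segment_length=3))  # [6.0, 15.0]
```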
forward try kernel 6
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
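forward_byinputplane writes one partial sum per input plane, laid out as `[n][filterId][outRow][outCol][inputPlane]`. Its header comment notes that this output must later be reduced over `[inputPlane]`. Because `inputPlane` is the innermost (fastest-varying) index, that reduction is a segmented sum with segment length `gNumInputPlanes`, the same shape of computation as reduce_segments.cl. The sketch below fills a flat buffer using the kernel's index formula, reduces it, and compares the result with a direct per-pixel sum over all planes. It assumes no zero padding (`gPadZeros = 0`, so `outputSize = inputSize - filterSize + 1`) and nested-list inputs, and all function names are hypothetical.

```python
import random


def plane_dot(image_plane, filter_plane, r, c):
    """Unpadded cross-correlation of one filter plane with one image plane at (r, c)."""
    k = len(filter_plane)
    return sum(image_plane[r + fr][c + fc] * filter_plane[fr][fc]
               for fr in range(k) for fc in range(k))


def byinputplane_partials(images, filters):
    """Flat partials at forward_byinputplane's [n][filterId][outRow][outCol][inputPlane] index."""
    batch, planes, in_size = len(images), len(images[0]), len(images[0][0])
    num_filters, k = len(filters), len(filters[0][0])
    out_size = in_size - k + 1  # gPadZeros == 0
    partials = [0.0] * (batch * num_filters * out_size * out_size * planes)
    for n in range(batch):
        for f in range(num_filters):
            for r in range(out_size):
                for c in range(out_size):
                    for p in range(planes):
                        idx = (((n * num_filters + f) * out_size + r) * out_size + c) * planes + p
                        partials[idx] = plane_dot(images[n][p], filters[f][p], r, c)
    return partials


def reduce_over_input_planes(partials, planes):
    """Sum each run of `planes` consecutive partials (the innermost index)."""
    return [sum(partials[i:i + planes]) for i in range(0, len(partials), planes)]


def direct_forward(images, filters):
    """Reference output in [n][filterId][outRow][outCol] order."""
    planes, in_size, k = len(images[0]), len(images[0][0]), len(filters[0][0])
    out_size = in_size - k + 1
    return [sum(plane_dot(images[n][p], filters[f][p], r, c) for p in range(planes))
            for n in range(len(images)) for f in range(len(filters))
            for r in range(out_size) for c in range(out_size)]


def rand4(a, b, c, d):
    return [[[[random.uniform(-1, 1) for _ in range(d)] for _ in range(c)]
             for _ in range(b)] for _ in range(a)]


random.seed(0)
images = rand4(2, 3, 5, 5)   # [n][inputPlane][row][col]
filters = rand4(4, 3, 3, 3)  # [filterId][inputPlane][row][col]
reduced = reduce_over_input_planes(byinputplane_partials(images, filters), planes=3)
expected = direct_forward(images, filters)
print(len(reduced), all(abs(a - b) < 1e-9 for a, b in zip(reduced, expected)))  # 72 True
```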
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 6: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
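Editor's note on kernel 6: the comments in forward_byinputplane say each work-group writes one partial sum per input plane, in the layout `[n][filterId][outRow][outCol][inputPlane]`. They also say that this output still has to be reduced over `[inputPlane]`. Below is a minimal C++ sketch of that index arithmetic, plus a host-side reduction you could use to check the kernel's output once it builds. Both function names are hypothetical, and DeepCL's own reduction step may be implemented differently. In this run `gNumInputPlanes=1`, so the reduction is trivial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat index into the per-input-plane partial sums written by
// forward_byinputplane: layout [n][filterId][outRow][outCol][inputPlane].
int byInputPlaneIndex(int n, int filterId, int outRow, int outCol, int inputPlane,
                      int numFilters, int outputSize, int numInputPlanes) {
    assert(filterId < numFilters && outRow < outputSize && outCol < outputSize &&
           inputPlane < numInputPlanes);
    return (((n * numFilters + filterId) * outputSize + outRow) * outputSize + outCol)
           * numInputPlanes + inputPlane;
}

// Sums over the innermost [inputPlane] axis, leaving [n][filterId][outRow][outCol].
std::vector<float> reduceOverInputPlanes(const std::vector<float>& partial,
                                         int numInputPlanes) {
    assert(numInputPlanes > 0);
    const std::size_t planes = static_cast<std::size_t>(numInputPlanes);
    assert(partial.size() % planes == 0);
    std::vector<float> reduced(partial.size() / planes, 0.0f);
    for (std::size_t i = 0; i < partial.size(); ++i) {
        reduced[i / planes] += partial[i];  // consecutive entries share one output cell
    }
    return reduced;
}
```

Because `inputPlane` is the innermost axis, each output cell's partial sums sit next to each other in memory. That keeps the reduction a simple contiguous sweep.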
forward try kernel 7
... seems valid
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 1
12: //#define gFilterSize 1
13: //#define gSize 1
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 1;
25: index /= 1;
26: int h_out = index % 1;
27: int channel_in = index / 1;
28: int channel_out = channel_in * 1 * 1;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 1 + h_out) * 1 + w_out;
32: data_im += (channel_in * 1 + h_in) * 1 + w_in;
33: for (int i = 0; i < 1; ++i) {
34: for (int j = 0; j < 1; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 1 && w < 1) ?
38: data_im[i * 1 + j] : 0;
39: data_col += 1 * 1;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 1 + 0;
54: int h = (index / 1) % 1 + 0;
55: int c = index / (1 * 1);
56: // compute the start and end of the output
57: int w_col_start = (w < 1) ? 0 : (w - 1) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 1);
59: int h_col_start = (h < 1) ? 0 : (h - 1) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 1);
61:
62: int offset = (c * 1 * 1 + h * 1 + w) * 1 * 1;
63: int coeff_h_col = (1 - 1 * 1 * 1) * 1;
64: int coeff_w_col = (1 - 1 * 1 * 1);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
ForwardAuto: kernel 7 this instance cant be used:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 1
12: //#define gFilterSize 1
13: //#define gSize 1
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 1;
25: index /= 1;
26: int h_out = index % 1;
27: int channel_in = index / 1;
28: int channel_out = channel_in * 1 * 1;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 1 + h_out) * 1 + w_out;
32: data_im += (channel_in * 1 + h_in) * 1 + w_in;
33: for (int i = 0; i < 1; ++i) {
34: for (int j = 0; j < 1; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 1 && w < 1) ?
38: data_im[i * 1 + j] : 0;
39: data_col += 1 * 1;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 1 + 0;
54: int h = (index / 1) % 1 + 0;
55: int c = index / (1 * 1);
56: // compute the start and end of the output
57: int w_col_start = (w < 1) ? 0 : (w - 1) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 1);
59: int h_col_start = (h < 1) ? 0 : (h - 1) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 1);
61:
62: int offset = (c * 1 * 1 + h * 1 + w) * 1 * 1;
63: int coeff_h_col = (1 - 1 * 1 * 1) * 1;
64: int coeff_w_col = (1 - 1 * 1 * 1);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
forward kernel 0: cannot be used
forward kernel 1: cannot be used
forward kernel 2: cannot be used
forward kernel 3: cannot be used
forward kernel 4: cannot be used
forward kernel 5: cannot be used
forward kernel 6: cannot be used
forward kernel 7: cannot be used
clblas teardown
unknown file: Failure
C++ exception with description "No valid forward implementations found" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize1_2planes_filtersize1 (186 ms)
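Editor's note: every `syntax error in compiler option string` above has two things in common. Each options string starts with a space, and each contains at least one `-D NAME` token with a space after `-D` (for example `-D BIASED` or `-D RELU`). By contrast, the ForwardIm2Col.cl build reported no option error and failed later, while parsing the source. Because the program never builds, `clCreateKernel` then returns -45 (`CL_INVALID_PROGRAM_EXECUTABLE`).

One untested hypothesis is that this Vivante compiler rejects the spaced or leading-space forms, even though the OpenCL `clBuildProgram` documentation lists `-D name` and `-D name=definition`. The sketch below emits the options in the compact `-DNAME=VALUE` form with no leading space. `buildDefines` is a hypothetical helper, not DeepCL's API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Builds "-DNAME=VALUE -DNAME ..." with no leading space and no space
// between "-D" and the macro name. An empty value emits a bare "-DNAME".
std::string buildDefines(const std::vector<std::pair<std::string, std::string>>& defines) {
    std::string options;
    for (const auto& d : defines) {
        assert(!d.first.empty() && d.first.find(' ') == std::string::npos);
        if (!options.empty()) {
            options += ' ';  // separator only between tokens, never leading
        }
        options += "-D" + d.first;
        if (!d.second.empty()) {
            options += '=' + d.second;
        }
    }
    return options;
}
```

If the hypothesis holds, passing this string to `clBuildProgram` should get past the option parser. If it does not, the same error will simply reappear.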
[ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_relu
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize3_n4_filtersize3_relu (70 ms)
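Editor's note: this test fails before any arithmetic runs, because cl/activate.cl hits the same option-string error and never builds. For checking `activate` and `forwardNaive` results on the host once it does build, here is a minimal C++ reference of the six `ACTIVATION_FUNCTION` variants. The constants are copied from the kernel source above. The `Activation` enum and `activate` function are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cmath>

enum class Activation { Tanh, ScaledTanh, Sigmoid, Relu, Elu, Linear };

// Host-side mirror of the ACTIVATION_FUNCTION macros in cl/activate.cl.
float activate(Activation kind, float x) {
    switch (kind) {
        case Activation::Tanh:       return std::tanh(x);
        case Activation::ScaledTanh: return 1.7159f * std::tanh(0.66667f * x);
        case Activation::Sigmoid:    return 1.0f / (1.0f + std::exp(-x));
        case Activation::Relu:       return x > 0 ? x : 0.0f;
        case Activation::Elu:        return x > 0 ? x : std::exp(x) - 1.0f;
        case Activation::Linear:     return x;
    }
    assert(false && "unknown activation");
    return x;
}
```

Device results may differ from these in the last few bits for `tanh` and `exp`, so compare with a tolerance rather than exact equality.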
[ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_linear
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 1: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
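Editor's note on kernel 1: `convolve_imagecubes_float2` gives each work-item one output value. Its global id follows the output layout `[imageid][outplane][outrow][outcol]`, and the kernel recovers the coordinates with divisions and remainders. Below is a minimal C++ sketch of that decomposition, with hypothetical names. The kernel also guards with `exampleId < numExamples`, presumably because the launched global size can exceed the number of outputs.

```cpp
#include <cassert>

struct OutputCoord {
    int exampleId;
    int filterId;
    int outputRow;
    int outputCol;
};

// Mirrors the id arithmetic at the top of convolve_imagecubes_float2.
OutputCoord decomposeGlobalId(int globalId, int numFilters, int outputSize) {
    assert(globalId >= 0 && numFilters > 0 && outputSize > 0);
    const int outputSizeSquared = outputSize * outputSize;
    const int outputImage2Id = globalId / outputSizeSquared;  // [imageid][outplane]
    const int localId = globalId % outputSizeSquared;         // [outrow][outcol]
    return {outputImage2Id / numFilters, outputImage2Id % numFilters,
            localId / outputSize, localId % outputSize};
}
```

With this run's sizes (`gNumFilters=2`, `gOutputSize=1`), each image contributes two consecutive global ids, one per filter.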
forward try kernel 2
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
ForwardAuto: kernel 2: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
... not valid
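A note on reading the "-45" lines: in the standard OpenCL headers, -45 is `CL_INVALID_PROGRAM_EXECUTABLE`. The OpenCL specification says `clCreateKernel` returns it when the program has no successfully built executable. Each "-45" line is therefore most likely a knock-on effect of the build failure reported in the build log just above it, not a separate fault. The small lookup below is a sketch for decoding such numbers. The function names are hypothetical, and only a few codes are listed.

```c
#include <assert.h>
#include <string.h>

/* Symbolic names for a few OpenCL status codes. Values follow the standard
   OpenCL 1.x header (CL/cl.h); only a small subset is listed here. */
static const char *cl_status_name(int status) {
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    default:  return "unknown OpenCL status";
    }
}

/* Quick self-check of the table entry for the code reported in this log. */
static void cl_status_name_check(void) {
    assert(strcmp(cl_status_name(-45), "CL_INVALID_PROGRAM_EXECUTABLE") == 0);
}
```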
forward try kernel 3
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 3: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
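The same "syntax error in compiler option string" appears for forward2, forward3, forward4 and forward_byinputplane. In every case the compiler complains about the option string itself rather than about a source line. Each of these strings starts with a space and writes nearly every definition as `-D name` or `-D name=value`, with a space after `-D`. The OpenCL specification documents that spaced form, so it should be legal. A hypothesis worth testing on this Vivante driver is that its option parser only accepts the attached form (`-Dname=value`) or rejects the leading space. This log does not confirm either explanation. The sketch below builds the attached form with no leading space, so the two variants can be compared. `struct cl_define` and `build_options` are hypothetical names, not part of DeepCL or the OpenCL API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: one preprocessor definition for a kernel build. */
struct cl_define {
    const char *name;
    const char *value; /* NULL for a bare flag such as BIASED */
};

/* Writes each definition as "-Dname" or "-Dname=value", separated by single
   spaces and with no leading space. Returns 0 on success, or -1 if buf is too
   small (buf is then a truncated but NUL-terminated string). */
static int build_options(const struct cl_define *defs, size_t count,
                         char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n;
        if (defs[i].value != NULL)
            n = snprintf(buf + used, cap - used, "%s-D%s=%s",
                         used ? " " : "", defs[i].name, defs[i].value);
        else
            n = snprintf(buf + used, cap - used, "%s-D%s",
                         used ? " " : "", defs[i].name);
        if (n < 0 || (size_t)n >= cap - used)
            return -1; /* output did not fit */
        used += (size_t)n;
    }
    return 0;
}
```

If a string built this way compiles where the logged one fails, the spacing is the likely culprit. If it still fails, the individual definitions are the next thing to bisect.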
forward try kernel 4
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 4: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
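The forward4 kernel comment describes how an output plane larger than one workgroup is split across `gPixelsPerThread` consecutive workgroups. Within that set, `pixel` picks the slice, and `effectiveLocalId` is the position in the flattened plane. The host-side sketch below reproduces that index arithmetic, which makes it easy to check which output pixel a given (workgroup, local id) pair would write. `struct out_coord` and `forward4_coord` are hypothetical names, and the parameters mirror the kernel's `g*` defines.

```c
#include <assert.h>

/* Output position one work-item of forward_4_by_n_outplane_smallercache
   handles, following the kernel's own index arithmetic. */
struct out_coord {
    int n;         /* image index within the batch */
    int outPlane;  /* filter / output plane */
    int outputRow;
    int outputCol;
    int active;    /* 0 if this work-item lies past the end of the plane */
};

static struct out_coord forward4_coord(int workgroupId, int localId,
                                       int pixelsPerThread, int workgroupSize,
                                       int numFilters, int outputSize) {
    assert(pixelsPerThread > 0 && workgroupSize > 0);
    assert(numFilters > 0 && outputSize > 0);
    struct out_coord c;
    int effectiveWorkgroupId = workgroupId / pixelsPerThread;
    int pixel = workgroupId % pixelsPerThread; /* slice of the output plane */
    int effectiveLocalId = localId + pixel * workgroupSize;
    c.n = effectiveWorkgroupId / numFilters;
    c.outPlane = effectiveWorkgroupId % numFilters;
    c.outputRow = effectiveLocalId / outputSize;
    c.outputCol = effectiveLocalId % outputSize;
    c.active = effectiveLocalId < outputSize * outputSize;
    return c;
}
```

The failing test above uses `gPixelsPerThread=1`, `gWorkgroupSize=32`, `gNumFilters=2` and `gOutputSize=1`. With those values each workgroup maps to one (image, filter) pair, and only local id 0 is active.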
forward try kernel 5
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: kernel void reduce_segments(const int numSegments, const int segmentLength,
8: global float const *in, global float* out) {
9: const int globalId = get_global_id(0);
10: const int segmentId = globalId;
11:
12: if (segmentId >= numSegments) {
13: return;
14: }
15:
16: float sum = 0;
17: global const float *segment = in + segmentId * segmentLength;
18: for (int i = 0; i < segmentLength; i++) {
19: sum += segment[i];
20: }
21: out[segmentId] = sum;
22: }
23:
24:
25:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
ForwardAuto: kernel 5: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: kernel void reduce_segments(const int numSegments, const int segmentLength,
8: global float const *in, global float* out) {
9: const int globalId = get_global_id(0);
10: const int segmentId = globalId;
11:
12: if (segmentId >= numSegments) {
13: return;
14: }
15:
16: float sum = 0;
17: global const float *segment = in + segmentId * segmentLength;
18: for (int i = 0; i < segmentLength; i++) {
19: sum += segment[i];
20: }
21: out[segmentId] = sum;
22: }
23:
24:
25:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
... not valid
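Unlike the other kernels, reduce_segments does not fail on the option string. Both errors point at line 8, column 0, `global float const *in`. One plausible reading is that this compiler does not accept `const` placed after the element type when an address space qualifier precedes it. The forward kernels in this log write their pointer parameters as `global const float *`, which means the same thing in OpenCL C, so rewriting line 8 as `global const float *in` may be worth trying. This is unverified: the compiler may simply stop at the first error it meets. Once the kernel builds, a host-side reference such as the sketch below can check its output. `reduce_segments_ref` is a hypothetical name; the logic follows the kernel body, summing each contiguous run of `segmentLength` values into `out`.

```c
#include <assert.h>

/* Host-side reference for reduce_segments: out[s] is the sum of the s-th
   contiguous run of segmentLength floats in `in`. */
static void reduce_segments_ref(int numSegments, int segmentLength,
                                const float *in, float *out) {
    assert(numSegments >= 0 && segmentLength >= 0);
    for (int s = 0; s < numSegments; s++) {
        float sum = 0.0f;
        const float *segment = in + s * segmentLength;
        for (int i = 0; i < segmentLength; i++)
            sum += segment[i];
        out[s] = sum;
    }
}
```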
forward try kernel 6
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 6: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
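The `forward_byinputplane` kernel comments above describe a two-stage scheme:

1. Each workgroup handles one input plane and writes partial sums in `[n][filterId][outRow][outCol][inputPlane]` order.
2. Those partial sums are reduced over `inputPlane` afterwards.

The C++ below is a minimal, hypothetical CPU reference for that scheme. It could help check kernel output on a device like this Vivante one. It assumes `gPadZeros == 0`, stride 1, and square planes. The function name `forwardByInputPlane` is illustrative and not part of DeepCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the "by input plane" forward strategy.
// Assumes no zero padding (gPadZeros == 0), stride 1, square images and filters.
std::vector<float> forwardByInputPlane(const std::vector<float>& images,   // [n][inputPlane][row][col]
                                       const std::vector<float>& filters,  // [filter][inputPlane][row][col]
                                       int batchSize, int numInputPlanes, int inputSize,
                                       int numFilters, int filterSize) {
    assert(filterSize <= inputSize);
    const int outputSize = inputSize - filterSize + 1;
    const std::size_t numOutputs =
        static_cast<std::size_t>(batchSize) * numFilters * outputSize * outputSize;

    // Stage 1: one partial sum per input plane, laid out like the kernel's output buffer:
    // [n][filterId][outRow][outCol][inputPlane].
    std::vector<float> partial(numOutputs * numInputPlanes, 0.0f);
    for (int n = 0; n < batchSize; ++n)
        for (int plane = 0; plane < numInputPlanes; ++plane)
            for (int filterId = 0; filterId < numFilters; ++filterId)
                for (int outRow = 0; outRow < outputSize; ++outRow)
                    for (int outCol = 0; outCol < outputSize; ++outCol) {
                        float sum = 0.0f;
                        for (int fr = 0; fr < filterSize; ++fr)
                            for (int fc = 0; fc < filterSize; ++fc) {
                                const int inRow = outRow + fr;
                                const int inCol = outCol + fc;
                                sum += images[((static_cast<std::size_t>(n) * numInputPlanes + plane) * inputSize + inRow) * inputSize + inCol]
                                     * filters[((static_cast<std::size_t>(filterId) * numInputPlanes + plane) * filterSize + fr) * filterSize + fc];
                            }
                        const std::size_t outIndex =
                            ((static_cast<std::size_t>(n) * numFilters + filterId) * outputSize + outRow) * outputSize + outCol;
                        partial[outIndex * numInputPlanes + plane] = sum;
                    }

    // Stage 2: the "reduce afterwards" step, summing over the innermost inputPlane axis.
    std::vector<float> output(numOutputs, 0.0f);
    for (std::size_t i = 0; i < numOutputs; ++i)
        for (int plane = 0; plane < numInputPlanes; ++plane)
            output[i] += partial[i * numInputPlanes + plane];
    return output;
}
```

Keeping `inputPlane` as the innermost axis mirrors the kernel's `resultIndex`. Because of that layout, the reduction reads each output's partial sums from contiguous memory.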
forward try kernel 7
... seems valid
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 1
12: //#define gFilterSize 3
13: //#define gSize 3
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 1;
25: index /= 1;
26: int h_out = index % 1;
27: int channel_in = index / 1;
28: int channel_out = channel_in * 3 * 3;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 1 + h_out) * 1 + w_out;
32: data_im += (channel_in * 3 + h_in) * 3 + w_in;
33: for (int i = 0; i < 3; ++i) {
34: for (int j = 0; j < 3; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 3 && w < 3) ?
38: data_im[i * 3 + j] : 0;
39: data_col += 1 * 1;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 3 + 0;
54: int h = (index / 3) % 3 + 0;
55: int c = index / (3 * 3);
56: // compute the start and end of the output
57: int w_col_start = (w < 3) ? 0 : (w - 3) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 1);
59: int h_col_start = (h < 3) ? 0 : (h - 3) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 1);
61:
62: int offset = (c * 3 * 3 + h * 3 + w) * 1 * 1;
63: int coeff_h_col = (1 - 1 * 3 * 1) * 1;
64: int coeff_w_col = (1 - 1 * 1 * 1);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
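The generated `im2col` kernel above has its sizes baked in as literals. According to the commented-out defines, these are size 3, filter size 3, padding 0, stride 1 and column size 1.

For cross-checking on the host, the C++ below is a hypothetical sketch of the same Caffe-style unfold. It produces a `[channel*filterSize*filterSize][colSize][colSize]` buffer, assumes square inputs, and uses the illustrative name `im2colReference`.

Separately, the build log flags line 19 (`global float const * im_data`). One unverified guess is that this compiler only accepts the `global const float *` ordering that line 21 uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch of Caffe-style im2col (square image and filter assumed).
// Row index of the result = (channel * filterSize + i) * filterSize + j,
// column index = hOut * colSize + wOut; out-of-bounds taps read as 0 (zero padding).
std::vector<float> im2colReference(const std::vector<float>& im, int channels, int size,
                                   int filterSize, int padding, int stride) {
    assert(im.size() == static_cast<std::size_t>(channels) * size * size);
    const int colSize = (size + 2 * padding - filterSize) / stride + 1;
    std::vector<float> col(static_cast<std::size_t>(channels) * filterSize * filterSize * colSize * colSize);
    for (int c = 0; c < channels; ++c)
        for (int i = 0; i < filterSize; ++i)
            for (int j = 0; j < filterSize; ++j) {
                const int row = (c * filterSize + i) * filterSize + j;
                for (int hOut = 0; hOut < colSize; ++hOut)
                    for (int wOut = 0; wOut < colSize; ++wOut) {
                        const int h = hOut * stride - padding + i;
                        const int w = wOut * stride - padding + j;
                        const bool inside = h >= 0 && w >= 0 && h < size && w < size;
                        col[(static_cast<std::size_t>(row) * colSize + hOut) * colSize + wOut] =
                            inside ? im[(static_cast<std::size_t>(c) * size + h) * size + w] : 0.0f;
                    }
            }
    return col;
}
```

With the baked-in parameters (1 channel, size 3, filter 3, padding 0, stride 1), `colSize` is 1 and the unfolded buffer is simply the 9 input values in order.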
ForwardAuto: kernel 7 this instance cant be used:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 1
12: //#define gFilterSize 3
13: //#define gSize 3
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 1;
25: index /= 1;
26: int h_out = index % 1;
27: int channel_in = index / 1;
28: int channel_out = channel_in * 3 * 3;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 1 + h_out) * 1 + w_out;
32: data_im += (channel_in * 3 + h_in) * 3 + w_in;
33: for (int i = 0; i < 3; ++i) {
34: for (int j = 0; j < 3; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 3 && w < 3) ?
38: data_im[i * 3 + j] : 0;
39: data_col += 1 * 1;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 3 + 0;
54: int h = (index / 3) % 3 + 0;
55: int c = index / (3 * 3);
56: // compute the start and end of the output
57: int w_col_start = (w < 3) ? 0 : (w - 3) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 1);
59: int h_col_start = (h < 3) ? 0 : (h - 3) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 1);
61:
62: int offset = (c * 3 * 3 + h * 3 + w) * 1 * 1;
63: int coeff_h_col = (1 - 1 * 3 * 1) * 1;
64: int coeff_w_col = (1 - 1 * 1 * 1);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
forward kernel 0: cannot be used
forward kernel 1: cannot be used
forward kernel 2: cannot be used
forward kernel 3: cannot be used
forward kernel 4: cannot be used
forward kernel 5: cannot be used
forward kernel 6: cannot be used
forward kernel 7: cannot be used
clblas teardown
unknown file: Failure
C++ exception with description "No valid forward implementations found" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize3_n4_filtersize3_linear (190 ms)
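Error -45 from `clCreateKernel` is `CL_INVALID_PROGRAM_EXECUTABLE`: the program never built, so no kernel could be created from it.

Every option string rejected in this log starts with a leading space and contains `-D NAME` with a space after `-D`. The OpenCL specification does document the `-D name` form. That suggests a quirk of the Vivante compiler's option parser, but this is a hypothesis the log does not confirm.

The C++ below is a hypothetical helper, not part of DeepCL, for trying the compact `-DNAME=VALUE` spelling instead.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of DeepCL): joins macro definitions into an OpenCL
// build-option string using the compact "-DNAME" / "-DNAME=VALUE" spelling, with no
// leading space and no space after -D. An empty value emits a bare "-DNAME",
// which OpenCL treats as defining NAME to 1.
std::string buildDefineOptions(const std::vector<std::pair<std::string, std::string>>& defines) {
    std::string options;
    for (const auto& define : defines) {
        assert(!define.first.empty() && "macro name must not be empty");
        if (!options.empty()) options += ' ';
        options += "-D" + define.first;
        if (!define.second.empty()) options += "=" + define.second;
    }
    return options;
}
```

The resulting string would be passed as the `options` argument of `clBuildProgram`. The `cl/activate.cl` failures below are consistent with this guess: their option strings fail even though only the final `-D RELU` uses the spaced form.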
[ RUN ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased (79 ms)
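`cl/activate.cl` selects its activation through a preprocessor define, and here the kernel body itself is not what fails; the option string is. The C++ below is a hypothetical host-side mirror of the six `ACTIVATION_FUNCTION` variants, useful as a reference when comparing device output. The name `activateReference` is illustrative.

One more observation on the kernel source: line 14 uses `#elif SIGMOID` rather than `#elif defined SIGMOID`. That still works when SIGMOID is passed as `-D SIGMOID`, which defines it to 1. It could fail to preprocess if SIGMOID were ever defined with an empty body.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical host-side mirror of cl/activate.cl's ACTIVATION_FUNCTION variants.
// The name argument plays the role of the -D define that selects the kernel branch.
float activateReference(const std::string& name, float x) {
    if (name == "TANH") return std::tanh(x);
    if (name == "SCALEDTANH") return 1.7159f * std::tanh(0.66667f * x);
    if (name == "SIGMOID") return 1.0f / (1.0f + std::exp(-x));
    if (name == "RELU") return x > 0.0f ? x : 0.0f;
    if (name == "ELU") return x > 0.0f ? x : std::exp(x) - 1.0f;
    if (name == "LINEAR") return x;
    // The kernel silently compiles no kernels when no define matches; fail loudly instead.
    throw std::invalid_argument("unknown activation: " + name);
}
```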
[ RUN ] testsimpleconvolvenet.imagesize1_n2_2layers_biased
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize1_n2_2layers_biased (83 ms)
[ RUN ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=4 -DgOutputSizeSquared=16 -DgInputSize=4 -DgInputSizeSquared=16 -DgNumPlanes=3 -D RELU"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=4 -DgOutputSizeSquared=16 -DgInputSize=4 -DgInputSizeSquared=16 -DgNumPlanes=3 -D RELU"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=4 -DgOutputSizeSquared=16 -DgInputSize=4 -DgInputSizeSquared=16 -DgNumPlanes=3 -D RELU"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3 (76 ms)
[ RUN ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=4 -DgOutputSizeSquared=16 -DgInputSize=4 -DgInputSizeSquared=16 -DgNumPlanes=3 -D RELU"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=4 -DgOutputSizeSquared=16 -DgInputSize=4 -DgInputSizeSquared=16 -DgNumPlanes=3 -D RELU"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=4 -DgOutputSizeSquared=16 -DgInputSize=4 -DgInputSizeSquared=16 -DgNumPlanes=3 -D RELU"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6 (84 ms)
[ RUN ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n6
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=3 -D RELU"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=3 -D RELU"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=3 -D RELU"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n6 (75 ms)
[ RUN ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=3 -D RELU"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=3 -D RELU"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=3 -DgOutputSizeSquared=9 -DgInputSize=3 -DgInputSizeSquared=9 -DgNumPlanes=3 -D RELU"
" thrown in the test body.
[ FAILED ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18 (86 ms)
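The `ACTIVATION_FUNCTION` macros in `cl/activate.cl` are simple enough to mirror on the host, which can help when you check kernel output against a CPU result. The sketch below is an illustrative C++ reference and is not DeepCL's API. The `Activation` enum and the `activate` function are hypothetical names. The scaled tanh uses the same constants as the kernel, 1.7159 and 0.66667.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical enum mirroring the kernel's expected defines.
enum class Activation { Tanh, ScaledTanh, Sigmoid, Relu, Elu, Linear };

// Host-side reference for the ACTIVATION_FUNCTION macro in cl/activate.cl.
float activate(Activation kind, float x) {
    switch (kind) {
        case Activation::Tanh:       return std::tanh(x);
        case Activation::ScaledTanh: return 1.7159f * std::tanh(0.66667f * x);
        case Activation::Sigmoid:    return 1.0f / (1.0f + std::exp(-x));
        case Activation::Relu:       return x > 0 ? x : 0.0f;
        case Activation::Elu:        return x > 0 ? x : std::exp(x) - 1.0f;
        case Activation::Linear:     return x;
    }
    assert(false && "unknown activation");
    return x;
}
```

The kernel itself tests `#elif SIGMOID` by value rather than with `defined`. That test still selects the branch for a bare `-D SIGMOID`, because a build option of that form defines the macro as 1.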
[----------] 12 tests from testsimpleconvolvenet (1163 ms total)
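The numeric codes in these messages are standard OpenCL status values from `CL/cl.h`. A lookup such as the hypothetical `clErrorName` below makes logs like this easier to read; it covers only a handful of codes. In this log, -45 is `CL_INVALID_PROGRAM_EXECUTABLE`. `clCreateKernel` returns that status when the program has no successfully built executable for the device. That fits the build-log errors printed just before each occurrence.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: names for a few OpenCL status codes (values as defined in CL/cl.h).
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -44: return "CL_INVALID_PROGRAM";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -46: return "CL_INVALID_KERNEL_NAME";
        default:  return "UNKNOWN_CL_ERROR(" + std::to_string(code) + ")";
    }
}
```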
[----------] 3 tests from testlogicaloperators
[ RUN ] testlogicaloperators.Convolve_1layer_biased_And
And
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
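The comments at the top of `cl/forward1.cl` describe the output image size for each padding case:

- Without zero padding, the new size is `inputSize - filterSize + 1`.
- With zero padding and an even filter size, the new size is `inputSize + 1`.
- With zero padding and an odd filter size, the size stays the same. The comments only draw this case and do not give a formula.

The sketch below encodes that reading. It also computes `gHalfFilterSize` and `gEven` the way they appear to be derived. The struct and function names are hypothetical, and the sketch ignores the stride option `gSkip`. It agrees with the option string above, where `gInputSize=1` and `gFilterSize=1` give `gOutputSize=1` and `gHalfFilterSize=0`.

```cpp
#include <cassert>

// Hypothetical host-side helper following the comments in cl/forward1.cl
// (stride / gSkip is ignored).
struct ConvGeometry {
    int halfFilterSize;  // gHalfFilterSize
    int even;            // gEven: 1 if the filter size is even, else 0
    int outputSize;      // gOutputSize
};

ConvGeometry convGeometry(int inputSize, int filterSize, bool padZeros) {
    assert(inputSize > 0 && filterSize > 0);
    ConvGeometry g;
    g.halfFilterSize = filterSize / 2;
    g.even = (filterSize % 2 == 0) ? 1 : 0;
    if (!padZeros) {
        g.outputSize = inputSize - filterSize + 1;  // filter must fit inside the image
    } else {
        g.outputSize = g.even ? inputSize + 1 : inputSize;
    }
    return g;
}
```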
ForwardAuto: kernel 1: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
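`convolve_imagecubes_float2` in `cl/forward1.cl` gives each work-item one output value and sums over input planes and filter taps. It uses the layouts stated in its comments:

- inputs: `[imageId][plane][row][col]`
- filters: `[filterId][inplane][filterrow][filtercol]`
- output: `[imageId][filterId][row][col]`

The sketch below is a plain C++ loop nest over the same layouts. It could serve as a host-side reference when such a kernel does build. It handles odd filter sizes only, and `convolveReference` is a hypothetical name. It applies no bias, because the kernel body shown has no bias term even though `-D BIASED` is passed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for the forward1 kernel (odd filter sizes only).
// inputs: [n][inPlane][row][col], filters: [filter][inPlane][frow][fcol],
// returns: [n][filter][outRow][outCol]
std::vector<float> convolveReference(const std::vector<float> &inputs,
                                     const std::vector<float> &filters,
                                     int batchSize, int numInputPlanes, int inputSize,
                                     int numFilters, int filterSize, bool padZeros) {
    assert(filterSize % 2 == 1);
    const int half = filterSize / 2;
    const int outputSize = padZeros ? inputSize : inputSize - filterSize + 1;
    assert(outputSize > 0);
    std::vector<float> output(static_cast<std::size_t>(batchSize) * numFilters * outputSize * outputSize, 0.0f);
    for (int n = 0; n < batchSize; n++) {
        for (int f = 0; f < numFilters; f++) {
            for (int outRow = 0; outRow < outputSize; outRow++) {
                for (int outCol = 0; outCol < outputSize; outCol++) {
                    float sum = 0.0f;
                    for (int p = 0; p < numInputPlanes; p++) {
                        for (int u = -half; u <= half; u++) {
                            // same offsets as the kernel's inputRowIdx / inputColIdx macros
                            const int inRow = outRow + u + (padZeros ? 0 : half);
                            for (int v = -half; v <= half; v++) {
                                const int inCol = outCol + v + (padZeros ? 0 : half);
                                if (inRow < 0 || inRow >= inputSize || inCol < 0 || inCol >= inputSize) {
                                    continue;  // zero padding: out-of-range taps add nothing
                                }
                                const float in = inputs[((n * numInputPlanes + p) * inputSize + inRow) * inputSize + inCol];
                                const float w = filters[((f * numInputPlanes + p) * filterSize + (u + half)) * filterSize + (v + half)];
                                sum += in * w;
                            }
                        }
                    }
                    output[((n * numFilters + f) * outputSize + outRow) * outputSize + outCol] = sum;
                }
            }
        }
    }
    return output;
}
```

For the configuration in this log (two input planes, `gInputSize=1`, two filters, `gFilterSize=1`), each output reduces to a dot product across the two input planes.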
forward try kernel 2
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
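`copyLocal` in `cl/forward2.cl` spreads a copy of N floats across a work-group. Each of the `gWorkgroupSize` work-items handles offsets `localId`, `localId + gWorkgroupSize`, and so on, for ceil(N / gWorkgroupSize) rounds, and skips any offset at or past N. The C++ sketch below runs that schedule serially and counts the writes to each element, which lets you check that every element is written exactly once. The function name is hypothetical, and on the device the work-items run in parallel.

```cpp
#include <cassert>
#include <vector>

// Serial simulation of copyLocal's strided schedule: returns how many times
// each of the N elements is written when workgroupSize work-items cooperate.
std::vector<int> copyLocalWriteCounts(int N, int workgroupSize) {
    assert(N >= 0 && workgroupSize > 0);
    std::vector<int> writes(N, 0);
    const int numLoops = (N + workgroupSize - 1) / workgroupSize;  // ceiling division
    for (int localId = 0; localId < workgroupSize; localId++) {
        for (int loop = 0; loop < numLoops; loop++) {
            const int offset = loop * workgroupSize + localId;
            if (offset < N) {
                writes[offset]++;  // target[offset] = source[offset] on the device
            }
        }
    }
    return writes;
}
```

Each work-item later reads elements that other work-items copied. That is why the kernel places `barrier(CLK_LOCAL_MEM_FENCE)` around the input-plane copy.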
ForwardAuto: kernel 2: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
... not valid
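The sizing comments in `cl/forward2.cl`, repeated in `cl/forward3.cl` and `cl/forward4.cl`, assume 4-byte floats, a 5x5 filter and 32 input planes. The hypothetical host-side helper below reproduces that arithmetic:

- One filter plane takes 100 bytes.
- One filter cube takes 3,200 bytes.
- All 32 cubes together take 102,400 bytes.
- A 28x28 filter would need 100,352 bytes per cube.

The limit that actually applies on a given device is the value `clGetDeviceInfo` reports for `CL_DEVICE_LOCAL_MEM_SIZE`. The comments' "about 3KB" figure is a rule of thumb.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: local-memory bytes for one filter cube, i.e.
// filterSize * filterSize * numInputPlanes floats (see the comments in cl/forward2.cl).
std::size_t filterCubeBytes(std::size_t filterSize, std::size_t numInputPlanes) {
    assert(filterSize > 0 && numInputPlanes > 0);
    return filterSize * filterSize * numInputPlanes * sizeof(float);
}
```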
forward try kernel 3
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
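In `forward_3_by_n_outplane`, each work-group computes one output plane for one example. The kernel's index arithmetic works as follows:

- The group id splits into `n = workgroupId / gNumFilters` and `outPlane = workgroupId % gNumFilters`.
- Each work-item takes one output pixel from its local id.
- The result goes to `(n * gNumFilters + outPlane) * gOutputSizeSquared + localId`.

The sketch below reproduces that arithmetic on the host; the struct and function names are hypothetical. The test assumes `batchSize * gNumFilters` work-groups, each at least `gOutputSizeSquared` wide. Under that assumption, the writes cover the output buffer exactly once.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host mirror of the index arithmetic in forward_3_by_n_outplane.
struct Forward3Ids {
    int n;            // example index
    int outPlane;     // filter / output plane index
    int outputRow;
    int outputCol;
    int resultIndex;  // -1 when the work-item writes nothing
};

Forward3Ids forward3Ids(int workgroupId, int localId, int numFilters, int outputSize) {
    assert(numFilters > 0 && outputSize > 0);
    const int outputSizeSquared = outputSize * outputSize;
    Forward3Ids ids;
    ids.n = workgroupId / numFilters;
    ids.outPlane = workgroupId % numFilters;
    ids.outputRow = localId / outputSize;
    ids.outputCol = localId % outputSize;
    // Output layout is [n][filterId][row][col]; surplus work-items
    // (localId >= outputSizeSquared) are guarded out, as in the kernel.
    ids.resultIndex = localId < outputSizeSquared
        ? (ids.n * numFilters + ids.outPlane) * outputSizeSquared + localId
        : -1;
    return ids;
}
```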
ForwardAuto: kernel 3: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
forward try kernel 4
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 4: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
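The comment block in `forward4.cl` explains how an output plane larger than one workgroup is split across `gPixelsPerThread` workgroups. The host-side sketch below reproduces the kernel's index arithmetic, so the mapping can be checked without a working device. `forward4_coords` and `pixels_per_thread` are hypothetical names. The ceiling-division rule for `gPixelsPerThread` is an assumption; it merely agrees with this log (`gOutputSize=1`, `gWorkgroupSize=32`, `gPixelsPerThread=1`).

```c
#include <assert.h>

/* Where one work-item of forward_4_by_n_outplane_smallercache writes,
 * recomputed on the host. Struct and field names are hypothetical. */
typedef struct {
    int n;          /* image index within the batch */
    int outPlane;   /* filter / output plane */
    int outputRow;
    int outputCol;
    int valid;      /* 0 for spare threads past the end of the plane */
} Forward4Coords;

/* Assumed host-side rule: enough workgroups per plane to cover every pixel. */
int pixels_per_thread(int outputSize, int workgroupSize) {
    return (outputSize * outputSize + workgroupSize - 1) / workgroupSize;
}

/* Same arithmetic as the kernel's effectiveWorkgroupId / pixel /
 * effectiveLocalId computation. */
Forward4Coords forward4_coords(int workgroupId, int localId,
                               int pixelsPerThread, int workgroupSize,
                               int numFilters, int outputSize) {
    assert(pixelsPerThread > 0 && numFilters > 0 && outputSize > 0);
    Forward4Coords c;
    int effectiveWorkgroupId = workgroupId / pixelsPerThread;
    int pixel = workgroupId % pixelsPerThread;
    int effectiveLocalId = localId + pixel * workgroupSize;
    c.n = effectiveWorkgroupId / numFilters;
    c.outPlane = effectiveWorkgroupId % numFilters;
    c.outputRow = effectiveLocalId / outputSize;
    c.outputCol = effectiveLocalId % outputSize;
    c.valid = effectiveLocalId < outputSize * outputSize;
    return c;
}
```

For example, with a 28x28 output and 512-thread workgroups, two workgroups share each plane. In the second one, threads whose effective id exceeds 783 write nothing, matching the kernel's `effectiveLocalId < gOutputSizeSquared` guard.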
forward try kernel 5
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: kernel void reduce_segments(const int numSegments, const int segmentLength,
8: global float const *in, global float* out) {
9: const int globalId = get_global_id(0);
10: const int segmentId = globalId;
11:
12: if (segmentId >= numSegments) {
13: return;
14: }
15:
16: float sum = 0;
17: global const float *segment = in + segmentId * segmentLength;
18: for (int i = 0; i < segmentLength; i++) {
19: sum += segment[i];
20: }
21: out[segmentId] = sum;
22: }
23:
24:
25:
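Kernel 5's attempt builds `cl/reduce_segments.cl`, in which each work-item sums one contiguous segment of `segmentLength` floats. A plain C reference like the one below gives something to compare the kernel's output against once it compiles. It is a sketch, and `reduce_segments_ref` is a hypothetical name.

```c
#include <assert.h>

/* Host-side reference for reduce_segments: out[s] is the sum of the s-th
 * run of segmentLength consecutive floats in `in`. Each iteration of the
 * outer loop plays the role of one work-item (globalId == s). */
void reduce_segments_ref(int numSegments, int segmentLength,
                         const float *in, float *out) {
    assert(numSegments >= 0 && segmentLength >= 0);
    for (int s = 0; s < numSegments; s++) {
        const float *segment = in + s * segmentLength;
        float sum = 0.0f;
        for (int i = 0; i < segmentLength; i++) {
            sum += segment[i];
        }
        out[s] = sum;
    }
}
```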
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
ForwardAuto: kernel 5: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: kernel void reduce_segments(const int numSegments, const int segmentLength,
8: global float const *in, global float* out) {
9: const int globalId = get_global_id(0);
10: const int segmentId = globalId;
11:
12: if (segmentId >= numSegments) {
13: return;
14: }
15:
16: float sum = 0;
17: global const float *segment = in + segmentId * segmentLength;
18: for (int i = 0; i < segmentLength; i++) {
19: sum += segment[i];
20: }
21: out[segmentId] = sum;
22: }
23:
24:
25:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
... not valid
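`cl/reduce_segments.cl` fails at line 8, and `ForwardIm2Col.cl` fails at line 19. In both cases the failing line declares a parameter as `global float const *`. One plausible reading, not confirmed by this log, is that the Vivante compiler only accepts the qualifier before the type in parameter lists (`global const float *`, which means the same thing). If so, the sources could be patched before `clBuildProgram`. The sketch below shows a hypothetical `reorder_float_const` helper that swaps the token order. It leaves alone longer identifiers that merely start with `const`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical workaround (not part of DeepCL): return a newly allocated
 * copy of `src` with each standalone "float const" rewritten as
 * "const float". Both spellings denote the same type. Caller frees. */
char *reorder_float_const(const char *src) {
    static const char from[] = "float const";
    static const char to[] = "const float";
    const size_t len = sizeof from - 1;
    assert(sizeof from == sizeof to);  /* in-place swap needs equal lengths */
    char *out = malloc(strlen(src) + 1);
    if (out == NULL) return NULL;
    strcpy(out, src);
    for (char *p = strstr(out, from); p != NULL; p = strstr(p + len, from)) {
        int startOk = (p == out) || !is_ident_char(p[-1]);
        int endOk = !is_ident_char(p[len]);  /* p[len] may be the NUL */
        if (startOk && endOk) {
            memcpy(p, to, len);
        }
    }
    return out;
}
```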
forward try kernel 6
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 6: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
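The `forward_byinputplane.cl` comments describe a two-stage scheme. Each workgroup handles one input plane and writes partial sums in the layout `[n][filterId][outRow][outCol][inputPlane]`, and these must then be reduced over `inputPlane`. Because `inputPlane` is the innermost index, the partials for one output value are contiguous, so the reduction is a segmented sum. The host-side sketch below spells out both the index and the reduction. The helper names are hypothetical.

```c
#include <assert.h>

/* Index into the partial-sum buffer, matching the kernel's resultIndex:
 * layout [n][filterId][outRow][outCol][inputPlane]. */
int byinputplane_index(int n, int filterId, int outRow, int outCol,
                       int inputPlane, int numFilters, int outputSize,
                       int numInputPlanes) {
    return (((n * numFilters + filterId) * outputSize + outRow)
            * outputSize + outCol) * numInputPlanes + inputPlane;
}

/* The "reduce afterwards" step: each of the numOutputs values (layout
 * [n][filterId][outRow][outCol]) is the sum of numInputPlanes
 * consecutive partials. */
void reduce_over_input_planes(const float *partial, float *output,
                              int numOutputs, int numInputPlanes) {
    assert(numOutputs >= 0 && numInputPlanes > 0);
    for (int o = 0; o < numOutputs; o++) {
        float sum = 0.0f;
        for (int p = 0; p < numInputPlanes; p++) {
            sum += partial[o * numInputPlanes + p];
        }
        output[o] = sum;
    }
}
```

With this test's sizes (`gNumFilters=2`, `gOutputSize=1`, `gNumInputPlanes=2`), the partial for filter 1 and input plane 1 of image 0 lands at index 3.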
forward try kernel 7
... seems valid
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 1
12: //#define gFilterSize 1
13: //#define gSize 1
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 1;
25: index /= 1;
26: int h_out = index % 1;
27: int channel_in = index / 1;
28: int channel_out = channel_in * 1 * 1;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 1 + h_out) * 1 + w_out;
32: data_im += (channel_in * 1 + h_in) * 1 + w_in;
33: for (int i = 0; i < 1; ++i) {
34: for (int j = 0; j < 1; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 1 && w < 1) ?
38: data_im[i * 1 + j] : 0;
39: data_col += 1 * 1;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 1 + 0;
54: int h = (index / 1) % 1 + 0;
55: int c = index / (1 * 1);
56: // compute the start and end of the output
57: int w_col_start = (w < 1) ? 0 : (w - 1) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 1);
59: int h_col_start = (h < 1) ? 0 : (h - 1) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 1);
61:
62: int offset = (c * 1 * 1 + h * 1 + w) * 1 * 1;
63: int coeff_h_col = (1 - 1 * 1 * 1) * 1;
64: int coeff_w_col = (1 - 1 * 1 * 1);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
ForwardAuto: kernel 7 this instance cant be used:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 1
12: //#define gFilterSize 1
13: //#define gSize 1
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 1;
25: index /= 1;
26: int h_out = index % 1;
27: int channel_in = index / 1;
28: int channel_out = channel_in * 1 * 1;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 1 + h_out) * 1 + w_out;
32: data_im += (channel_in * 1 + h_in) * 1 + w_in;
33: for (int i = 0; i < 1; ++i) {
34: for (int j = 0; j < 1; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 1 && w < 1) ?
38: data_im[i * 1 + j] : 0;
39: data_col += 1 * 1;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 1 + 0;
54: int h = (index / 1) % 1 + 0;
55: int c = index / (1 * 1);
56: // compute the start and end of the output
57: int w_col_start = (w < 1) ? 0 : (w - 1) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 1);
59: int h_col_start = (h < 1) ? 0 : (h - 1) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 1);
61:
62: int offset = (c * 1 * 1 + h * 1 + w) * 1 * 1;
63: int coeff_h_col = (1 - 1 * 1 * 1) * 1;
64: int coeff_w_col = (1 - 1 * 1 * 1);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
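In the `ForwardIm2Col.cl` source printed above, every size has already been substituted (all ones and zeros for this 1x1 test), which hides the general indexing. The commented-out `gPadding`, `gStride`, `gColSize`, `gFilterSize` and `gSize` lines hint at the parameters involved. The sketch below is a CPU reference for square images, using the Caffe-style layout the kernel follows: `[channel][filterRow][filterCol][outRow][outCol]`. `im2col_ref` and `im2col_col_size` are hypothetical names. The column-size formula is the usual Caffe one and does not appear in this log.

```c
#include <assert.h>

/* Output ("column") size for a square image, as in Caffe's im2col. */
int im2col_col_size(int size, int filterSize, int padding, int stride) {
    return (size + 2 * padding - filterSize) / stride + 1;
}

/* CPU reference of im2col for one square, multi-channel image.
 * data_col is a (channels*k*k) x (colSize*colSize) matrix; row
 * (c*k + i)*k + j holds filter tap (i, j) of channel c at every output
 * position. Taps that fall in the zero padding are written as 0. */
void im2col_ref(const float *data_im, int channels, int size,
                int filterSize, int padding, int stride, float *data_col) {
    assert(stride > 0 && filterSize > 0);
    int colSize = im2col_col_size(size, filterSize, padding, stride);
    for (int c = 0; c < channels; c++) {
        for (int i = 0; i < filterSize; i++) {
            for (int j = 0; j < filterSize; j++) {
                int row = (c * filterSize + i) * filterSize + j;
                for (int hOut = 0; hOut < colSize; hOut++) {
                    for (int wOut = 0; wOut < colSize; wOut++) {
                        int h = hOut * stride - padding + i;
                        int w = wOut * stride - padding + j;
                        int inside = h >= 0 && w >= 0 && h < size && w < size;
                        data_col[(row * colSize + hOut) * colSize + wOut] =
                            inside ? data_im[(c * size + h) * size + w] : 0.0f;
                    }
                }
            }
        }
    }
}
```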
forward kernel 0: cannot be used
forward kernel 1: cannot be used
forward kernel 2: cannot be used
forward kernel 3: cannot be used
forward kernel 4: cannot be used
forward kernel 5: cannot be used
forward kernel 6: cannot be used
forward kernel 7: cannot be used
clblas teardown
unknown file: Failure
C++ exception with description "No valid forward implementations found" thrown in the test body.
[ FAILED ] testlogicaloperators.Convolve_1layer_biased_And (182 ms)
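Every `clCreateKernel` failure in this test reports code -45. In the Khronos `CL/cl.h` header, that value is `CL_INVALID_PROGRAM_EXECUTABLE`, which `clCreateKernel` returns when the program has no successfully built executable. So -45 is a symptom. The real causes are the build-log errors printed just before it. Likewise, the `-11` from `clblasSgemm()` at the start of this run matches `CL_BUILD_PROGRAM_FAILURE`. A small lookup such as the hypothetical `cl_status_name` below makes logs like this easier to read. Its values are copied from `cl.h`.

```c
#include <assert.h>
#include <string.h>

/* A few status values as defined in the Khronos CL/cl.h header, copied
 * here so this sketch compiles without an OpenCL SDK. */
struct cl_status_entry { int code; const char *name; };

static const struct cl_status_entry kClStatus[] = {
    {   0, "CL_SUCCESS" },
    { -11, "CL_BUILD_PROGRAM_FAILURE" },
    { -43, "CL_INVALID_BUILD_OPTIONS" },
    { -44, "CL_INVALID_PROGRAM" },
    { -45, "CL_INVALID_PROGRAM_EXECUTABLE" },
    { -46, "CL_INVALID_KERNEL_NAME" },
};

/* Returns the symbolic name for `code`, or "unknown" if it is not listed. */
const char *cl_status_name(int code) {
    size_t count = sizeof kClStatus / sizeof kClStatus[0];
    for (size_t i = 0; i < count; i++) {
        if (kClStatus[i].code == code) return kClStatus[i].name;
    }
    return "unknown";
}
```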
[ RUN ] testlogicaloperators.Convolve_1layerbiased_Or
Or, convolve
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 1: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
forward try kernel 2
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
ForwardAuto: kernel 2: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
... not valid
forward try kernel 3
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 3: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
forward try kernel 4
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 4: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
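The comments in cl/forward4.cl describe how the kernel handles an output plane that is larger than one workgroup. The plane is flattened to 1-D and cut into workgroup-sized chunks, so `gPixelsPerThread` workgroups share each [n][filterId] pair. The C++ sketch below reproduces only the kernel's index arithmetic on the host, which can help sanity-check a launch configuration without a working device. The struct and function names are hypothetical. Computing gPixelsPerThread as ceil(outputSize² / workgroupSize) is also an assumption, although it agrees with the `gWorkgroupSize=32`, `gPixelsPerThread=1`, `gOutputSizeSquared=1` values in the option string above.

```cpp
#include <cassert>

// Hypothetical host-side mirror of the index arithmetic in
// forward_4_by_n_outplane_smallercache (cl/forward4.cl).
struct Forward4Index {
    int n;                 // image index within the batch
    int outPlane;          // filter / output plane index
    int effectiveLocalId;  // position within the flattened output plane
    int outputRow;
    int outputCol;
    bool valid;            // false for threads past the end of the plane
    int resultIndex;       // index into output[n][filterId][row][col]
};

// Assumption: the host sizes pixelsPerThread as ceil(outputSize^2 / workgroupSize).
inline int computePixelsPerThread(int outputSize, int workgroupSize) {
    const int squared = outputSize * outputSize;
    return (squared + workgroupSize - 1) / workgroupSize;
}

inline Forward4Index forward4Index(int workgroupId, int localId, int workgroupSize,
                                   int pixelsPerThread, int numFilters, int outputSize) {
    assert(workgroupSize > 0 && pixelsPerThread > 0 && numFilters > 0 && outputSize > 0);
    const int effectiveWorkgroupId = workgroupId / pixelsPerThread;
    const int pixel = workgroupId % pixelsPerThread;  // which chunk of the plane
    Forward4Index idx{};
    idx.effectiveLocalId = localId + pixel * workgroupSize;
    idx.n = effectiveWorkgroupId / numFilters;
    idx.outPlane = effectiveWorkgroupId % numFilters;
    idx.outputRow = idx.effectiveLocalId / outputSize;
    idx.outputCol = idx.effectiveLocalId % outputSize;
    const int outputSizeSquared = outputSize * outputSize;
    idx.valid = idx.effectiveLocalId < outputSizeSquared;  // same guard as the kernel
    idx.resultIndex = (idx.n * numFilters + idx.outPlane) * outputSizeSquared
                      + idx.effectiveLocalId;
    return idx;
}
```

For example, with a 10×10 output and a workgroup size of 32, four workgroups cover each plane. Thread 10 of the fourth one maps to flattened position 106, which the `effectiveLocalId < gOutputSizeSquared` guard then discards.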
forward try kernel 5
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: kernel void reduce_segments(const int numSegments, const int segmentLength,
8: global float const *in, global float* out) {
9: const int globalId = get_global_id(0);
10: const int segmentId = globalId;
11:
12: if (segmentId >= numSegments) {
13: return;
14: }
15:
16: float sum = 0;
17: global const float *segment = in + segmentId * segmentLength;
18: for (int i = 0; i < segmentLength; i++) {
19: sum += segment[i];
20: }
21: out[segmentId] = sum;
22: }
23:
24:
25:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
ForwardAuto: kernel 5: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: kernel void reduce_segments(const int numSegments, const int segmentLength,
8: global float const *in, global float* out) {
9: const int globalId = get_global_id(0);
10: const int segmentId = globalId;
11:
12: if (segmentId >= numSegments) {
13: return;
14: }
15:
16: float sum = 0;
17: global const float *segment = in + segmentId * segmentLength;
18: for (int i = 0; i < segmentLength; i++) {
19: sum += segment[i];
20: }
21: out[segmentId] = sum;
22: }
23:
24:
25:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/reduce_segments.cl build log:
(8:0) : error : invalid global address space qualifier specified for parameter type
(8:0) : error : syntax error at 'const'
... not valid
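Kernel 5 is rejected because cl/reduce_segments.cl does not build. The build log points at line 8, `global float const *in`. ForwardIm2Col.cl produces the same error at line 19, which uses the same `global float const *` spelling. One untested guess is that this compiler does not accept `const` after the type when an address-space qualifier is present. If so, the equivalent declaration `global const float *in` would be worth trying. Whatever the device does, the kernel's semantics are easy to reproduce on the host. Here is a minimal C++ reference, with a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for reduce_segments (cl/reduce_segments.cl):
// out[s] = sum of the s-th run of segmentLength consecutive values.
// Each outer iteration plays the role of one work-item (globalId == segmentId).
inline std::vector<float> reduceSegmentsRef(const std::vector<float>& in, int numSegments,
                                            int segmentLength) {
    assert(numSegments >= 0 && segmentLength >= 0);
    assert(in.size() >= static_cast<std::size_t>(numSegments) * segmentLength);
    std::vector<float> out(numSegments, 0.0f);
    for (int segmentId = 0; segmentId < numSegments; ++segmentId) {
        const float* segment = in.data() + segmentId * segmentLength;
        float sum = 0.0f;
        for (int i = 0; i < segmentLength; ++i) {
            sum += segment[i];
        }
        out[segmentId] = sum;
    }
    return out;
}
```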
forward try kernel 6
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 6: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept:
8: // - load same input plane from each image
9: // - hold filter plane for this input plane, for all filters
10: // - reduce afterwards
11: // local memory for one plane from each filter of 64c7 = 64 * 7 * 7 * 4 = 12.5KB
12: // local memory for one single input plane = 19 * 19 * 4 = 1.4KB
13: // => seems ok?
14: // workgroupid: [inputPlaneId]
15: // localid: [filterId][outRow] (if this is more than workgroupsize, we should reuse some threads...)
16: // iterate over: [n][outCol]
17: // output: [n][filterId][outRow][outCol][inputPlane]
18: // need to later reduce output over: [inputPlane]
19: void kernel forward_byinputplane(const int batchSize,
20: global const float *images, global const float *filters,
21: global float *output,
22: local float *_inputPlane, local float *_filterPlanes) {
23: // const int evenPadding = gFilterSize % 2 == 0 ? 1 : 0;
24:
25: const int globalId = get_global_id(0);
26: const int workgroupId = get_group_id(0);
27: const int workgroupSize = get_local_size(0);
28: const int localId = get_local_id(0);
29:
30: const int inputPlaneId = workgroupId;
31: const int numLoops = (gNumFilters * gOutputSize + workgroupSize - 1) / workgroupSize;
32: const int numFilterCopyLoops = (gFilterSizeSquared + gOutputSize - 1) / gOutputSize;
33: const int numImageCopyLoops = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
34: for (int loop = 0; loop < numLoops; loop++) {
35: const int loopLocalId = localId + loop * workgroupSize;
36: const int filterId = loopLocalId / gOutputSize;
37: const int outRow = loopLocalId % gOutputSize;
38:
39: // copy down our filter, we have gOutputSize threads to do this
40: global float const *globalFilterPlane = filters +
41: (filterId * gNumInputPlanes + inputPlaneId) * gFilterSizeSquared;
42: local float *_localFilterPlane = _filterPlanes + filterId * gFilterSizeSquared;
43: barrier(CLK_LOCAL_MEM_FENCE);
44: for (int i = 0; i < numFilterCopyLoops; i++) {
45: const int offset = i * gOutputSize + outRow;
46: bool process = filterId < gNumFilters && offset < gFilterSizeSquared;
47: if (process) {
48: _localFilterPlane[ offset ] = globalFilterPlane[ offset ];
49: }
50: }
51: // loop over n ...
52: for (int n = 0; n < batchSize; n++) {
53: // copy down our imageplane, we have workgroupSize threads to do this
54: barrier(CLK_LOCAL_MEM_FENCE);
55: global float const *globalImagePlane = images +
56: (n * gNumInputPlanes + inputPlaneId) * gInputSizeSquared;
57: for (int i = 0; i< numImageCopyLoops; i++) {
58: const int offset = i * workgroupSize + localId;
59: if (offset < gInputSizeSquared) {
60: _inputPlane[ offset ] = globalImagePlane[ offset ];
61: }
62: }
63: barrier(CLK_LOCAL_MEM_FENCE);
64: // calc output for each [outrow][outcol]
65: bool filterPlaneOk = filterId < gNumFilters;
66: for (int outCol = 0; outCol < gOutputSize; outCol++) {
67: float sum = 0;
68: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
69: int inRow = outRow + filterRow;
70: #if gPadZeros == 1
71: inRow -= gHalfFilterSize;
72: #endif
73: bool rowOk = filterPlaneOk && inRow >= 0 && inRow < gInputSize;
74: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
75: int inCol = outCol + filterCol;
76: #if gPadZeros == 1
77: inCol -= gHalfFilterSize;
78: #endif
79: bool process = rowOk && inCol >= 0 && inCol < gInputSize;
80: if (process) {
81: float imageValue = _inputPlane[ inRow * gInputSize + inCol ];
82: float filterValue = _localFilterPlane[ filterRow * gFilterSize + filterCol ];
83: sum += imageValue * filterValue;
84: }
85: }
86: }
87: if (filterId < gNumFilters) {
88: // [n][filterId][outRow][outCol][inputPlane]
89: int resultIndex = (( (n
90: * gNumFilters + filterId)
91: * gOutputSize + outRow)
92: * gOutputSize + outCol)
93: * gNumInputPlanes + inputPlaneId;
94: output[resultIndex] = sum;
95: //if (globalId == 2) output[0] = resultIndex;
96: // output[resultIndex] = outRow;
97: }
98: // output[localId] = _localFilterPlane[localId];
99: }
100: }
101: }
102: }
103:
104:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward_byinputplane.cl build log:
error : syntax error in compiler option string " -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
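forward_byinputplane writes one partial sum per input plane, laid out as [n][filterId][outRow][outCol][inputPlane], and its header comment says the result must later be reduced over [inputPlane]. Because inputPlane is the innermost index, that reduction sums runs of gNumInputPlanes consecutive values. That is the same shape of work reduce_segments performs, although whether DeepCL pairs the two kernels this way is an assumption here. A host-side sketch, with hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Partial-sum index used by forward_byinputplane, copied from the kernel's
// resultIndex: layout [n][filterId][outRow][outCol][inputPlane].
inline int byInputPlaneIndex(int n, int filterId, int outRow, int outCol, int inputPlane,
                             int numFilters, int outputSize, int numInputPlanes) {
    return (((n * numFilters + filterId) * outputSize + outRow) * outputSize + outCol)
               * numInputPlanes + inputPlane;
}

// Collapse the innermost [inputPlane] dimension, leaving [n][filterId][outRow][outCol].
inline std::vector<float> reduceInputPlanes(const std::vector<float>& partials,
                                            int numInputPlanes) {
    assert(numInputPlanes > 0 && partials.size() % numInputPlanes == 0);
    std::vector<float> output(partials.size() / numInputPlanes, 0.0f);
    for (std::size_t i = 0; i < output.size(); ++i) {
        for (int p = 0; p < numInputPlanes; ++p) {
            output[i] += partials[i * numInputPlanes + p];
        }
    }
    return output;
}
```

Dividing a partial-sum index by numInputPlanes recovers the [n][filterId][outRow][outCol] index, which is the output layout that forward4.cl writes.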
forward try kernel 7
... seems valid
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
kernel build error:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 1
12: //#define gFilterSize 1
13: //#define gSize 1
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 1;
25: index /= 1;
26: int h_out = index % 1;
27: int channel_in = index / 1;
28: int channel_out = channel_in * 1 * 1;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 1 + h_out) * 1 + w_out;
32: data_im += (channel_in * 1 + h_in) * 1 + w_in;
33: for (int i = 0; i < 1; ++i) {
34: for (int j = 0; j < 1; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 1 && w < 1) ?
38: data_im[i * 1 + j] : 0;
39: data_col += 1 * 1;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 1 + 0;
54: int h = (index / 1) % 1 + 0;
55: int c = index / (1 * 1);
56: // compute the start and end of the output
57: int w_col_start = (w < 1) ? 0 : (w - 1) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 1);
59: int h_col_start = (h < 1) ? 0 : (h - 1) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 1);
61:
62: int offset = (c * 1 * 1 + h * 1 + w) * 1 * 1;
63: int coeff_h_col = (1 - 1 * 1 * 1) * 1;
64: int coeff_w_col = (1 - 1 * 1 * 1);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
ForwardAuto: kernel 7 this instance cant be used:
kernel source:
1: // from SpatialConvolutionMM.cu:
2:
3: // CL: grid stride looping
4: #define CL_KERNEL_LOOP(i, n) \
5: for (int i = get_group_id(0) * get_local_size(0) + get_local_id(0); \
6: i < (n); \
7: i += get_local_size(0) * get_num_groups(0))
8:
9: //#define gPadding 0
10: //#define gStride 1
11: //#define gColSize 1
12: //#define gFilterSize 1
13: //#define gSize 1
14:
15: // Kernel for fast unfold+copy
16: // (adapted from Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu)
17: kernel void im2col(
18: const int n,
19: global float const * im_data, int im_offset,
20: global float* data_col) {
21: global const float *data_im = im_data + im_offset;
22:
23: CL_KERNEL_LOOP(index, n) {
24: int w_out = index % 1;
25: index /= 1;
26: int h_out = index % 1;
27: int channel_in = index / 1;
28: int channel_out = channel_in * 1 * 1;
29: int h_in = h_out * 1 - 0;
30: int w_in = w_out * 1 - 0;
31: data_col += (channel_out * 1 + h_out) * 1 + w_out;
32: data_im += (channel_in * 1 + h_in) * 1 + w_in;
33: for (int i = 0; i < 1; ++i) {
34: for (int j = 0; j < 1; ++j) {
35: int h = h_in + i;
36: int w = w_in + j;
37: *data_col = (h >= 0 && w >= 0 && h < 1 && w < 1) ?
38: data_im[i * 1 + j] : 0;
39: data_col += 1 * 1;
40: }
41: }
42: }
43: }
44:
45: kernel void col2im(
46: const int n,
47: global float const *data_col,
48: global float* im_data, int im_offset) {
49: global float *data_im = im_data + im_offset;
50:
51: for (int index = get_group_id(0) * get_local_size(0) + get_local_id(0); index < (n); index += get_local_size(0) * get_num_groups(0)) {
52: float val = 0;
53: int w = index % 1 + 0;
54: int h = (index / 1) % 1 + 0;
55: int c = index / (1 * 1);
56: // compute the start and end of the output
57: int w_col_start = (w < 1) ? 0 : (w - 1) / 1 + 1;
58: int w_col_end = min(w / 1 + 1, 1);
59: int h_col_start = (h < 1) ? 0 : (h - 1) / 1 + 1;
60: int h_col_end = min(h / 1 + 1, 1);
61:
62: int offset = (c * 1 * 1 + h * 1 + w) * 1 * 1;
63: int coeff_h_col = (1 - 1 * 1 * 1) * 1;
64: int coeff_w_col = (1 - 1 * 1 * 1);
65: for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
66: for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
67: val += data_col[offset + h_col * coeff_h_col + w_col * coeff_w_col];
68: }
69: }
70: data_im[index] = val;
71: }
72: }
73:
74:
Something went wrong with clCreateKernel, OpenCL erorr code -45
ForwardIm2Col.cl build log:
(19:0) : error : invalid global address space qualifier specified for parameter type
(19:0) : error : syntax error at 'const'
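The ForwardIm2Col.cl source in this log appears to have every size constant already substituted with 1: a 1×1 input, a 1×1 filter, stride 1 and no padding. That would explain expressions such as `index % 1` and `(1 - 1 * 1 * 1) * 1`. The kernel comment credits Caffe's im2col. For comparison, here is a generic CPU im2col in the same style. It is a sketch that assumes a square filter and the same padding and stride along both axes, and the function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU im2col in the style of Caffe's im2col_cpu (square kernel, same padding and
// stride in both directions). Returns a buffer shaped
// [channels * ksize * ksize][heightCol * widthCol].
inline std::vector<float> im2colRef(const std::vector<float>& im, int channels, int height,
                                    int width, int ksize, int pad, int stride) {
    assert(ksize > 0 && stride > 0);
    assert(im.size() == static_cast<std::size_t>(channels * height * width));
    const int heightCol = (height + 2 * pad - ksize) / stride + 1;
    const int widthCol = (width + 2 * pad - ksize) / stride + 1;
    assert(heightCol > 0 && widthCol > 0);
    const int channelsCol = channels * ksize * ksize;
    std::vector<float> col(static_cast<std::size_t>(channelsCol * heightCol * widthCol), 0.0f);
    for (int c = 0; c < channelsCol; ++c) {
        const int wOffset = c % ksize;           // filter column of this row of col
        const int hOffset = (c / ksize) % ksize; // filter row
        const int cIm = c / ksize / ksize;       // input channel
        for (int h = 0; h < heightCol; ++h) {
            for (int w = 0; w < widthCol; ++w) {
                const int hIn = h * stride - pad + hOffset;
                const int wIn = w * stride - pad + wOffset;
                if (hIn >= 0 && hIn < height && wIn >= 0 && wIn < width) {
                    col[(c * heightCol + h) * widthCol + w] =
                        im[(cIm * height + hIn) * width + wIn];
                }
                // out-of-bounds taps stay 0, which implements zero padding
            }
        }
    }
    return col;
}
```

With the 1×1 configuration from the log (two planes, 1×1 filter, no padding), the column buffer is simply a copy of the input.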
forward kernel 0: cannot be used
forward kernel 1: cannot be used
forward kernel 2: cannot be used
forward kernel 3: cannot be used
forward kernel 4: cannot be used
forward kernel 5: cannot be used
forward kernel 6: cannot be used
forward kernel 7: cannot be used
clblas teardown
unknown file: Failure
C++ exception with description "No valid forward implementations found" thrown in the test body.
[ FAILED ] testlogicaloperators.Convolve_1layerbiased_Or (193 ms)
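Every forward kernel tried in this run is rejected. Several of the failures are the same `syntax error in compiler option string`: forward4.cl and forward_byinputplane.cl here, and activate.cl in the next test. Each rejected string starts with a space and contains at least one `-D NAME` with a space after `-D`. That includes activate.cl's string, which otherwise uses the joined `-DNAME=value` form and only ends with `-D RELU`. The OpenCL specification documents the spaced form, so what follows is only a hypothesis to test on the Vivante driver. The sketch rewrites the options into the joined form and trims the leading blank. The helper name is hypothetical and not part of DeepCL.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical workaround helper: rewrite "-D NAME[=value]" as "-DNAME[=value]" and
// drop leading whitespace, for compilers that may only accept the joined form.
inline std::string joinDefineOptions(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size() && std::isspace(static_cast<unsigned char>(in[i]))) ++i;
    while (i < in.size()) {
        const bool atTokenStart =
            out.empty() || std::isspace(static_cast<unsigned char>(out.back()));
        if (atTokenStart && in.compare(i, 2, "-D") == 0) {
            out += "-D";
            i += 2;
            // skip the blanks between -D and the macro name
            while (i < in.size() && std::isspace(static_cast<unsigned char>(in[i]))) ++i;
            continue;
        }
        out += in[i++];
    }
    return out;
}
```

Options that do not start with `-D`, such as `-cl-fast-relaxed-math`, pass through unchanged.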
[ RUN ] testlogicaloperators.Convolve_2layers_relu_Xor
Xor, convolve
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
clblas teardown
unknown file: Failure
C++ exception with description "
kernel source:
1: // Copyright Hugh Perkins 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // expected defines:
8: // one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]
9:
10: #ifdef TANH
11: #define ACTIVATION_FUNCTION(output) (tanh(output))
12: #elif defined SCALEDTANH
13: #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
14: #elif SIGMOID
15: #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
16: #elif defined RELU
17: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
18: #elif defined ELU
19: #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
20: #elif defined LINEAR
21: #define ACTIVATION_FUNCTION(output) (output)
22: #endif
23:
24: #ifdef ACTIVATION_FUNCTION // protect against not defined
25: kernel void activate(const int N, global float *inout) {
26: const int globalId = get_global_id(0);
27: if (globalId >= N) {
28: return;
29: }
30: inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
31: }
32: #endif
33:
34: #ifdef ACTIVATION_FUNCTION // protect against not defined
35: kernel void forwardNaive(const int N, global float *out, global const float *in) {
36: const int globalId = get_global_id(0);
37: if (globalId >= N) {
38: return;
39: }
40: out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
41: }
42: #endif
43:
44:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/activate.cl build log:
error : syntax error in compiler option string " -DgOutputSize=1 -DgOutputSizeSquared=1 -DgInputSize=1 -DgInputSizeSquared=1 -DgNumPlanes=2 -D RELU"
" thrown in the test body.
[ FAILED ] testlogicaloperators.Convolve_2layers_relu_Xor (85 ms)
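cl/activate.cl selects one ACTIVATION_FUNCTION at build time from a `-D` option. A host-side mirror of those macros is handy for checking kernel output on the CPU. In the sketch below, the enum is a hypothetical stand-in for the build options, and the constants are copied from the kernel. Line 14 of the kernel uses `#elif SIGMOID` rather than `#elif defined SIGMOID`. It still works when the option is `-D SIGMOID`, which defines the macro as 1, but it is inconsistent with the other branches.

```cpp
#include <cassert>
#include <cmath>

// Host-side mirror of the ACTIVATION_FUNCTION variants in cl/activate.cl.
// The enum is a hypothetical stand-in for the -D TANH / -D RELU / ... options.
enum class Activation { Tanh, ScaledTanh, Sigmoid, Relu, Elu, Linear };

inline float activateRef(Activation kind, float x) {
    switch (kind) {
    case Activation::Tanh:       return std::tanh(x);
    case Activation::ScaledTanh: return 1.7159f * std::tanh(0.66667f * x);
    case Activation::Sigmoid:    return 1.0f / (1.0f + std::exp(-x));
    case Activation::Relu:       return x > 0 ? x : 0.0f;
    case Activation::Elu:        return x > 0 ? x : std::exp(x) - 1.0f;
    case Activation::Linear:     return x;
    }
    assert(false && "unknown activation");
    return x;
}
```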
[----------] 3 tests from testlogicaloperators (460 ms total)
[----------] 12 tests from testbackward
[ RUN ] testbackward.squareloss
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
layer 2:SquareLossLayer{}
inputtotalsize=2400 outputTotalSize=2400
layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
layer 2:SquareLossLayer{}
Parameters overview: (skipping 3 layers with 0 params)
TOTAL : params=0
idx=44 predicted losschange=-0.000912508 actual=-0.000976562
idx=2245 predicted losschange=0.00785823 actual=0.00805664
idx=648 predicted losschange=0.00965759 actual=0.00976562
idx=586 predicted losschange=0.0136895 actual=0.0136719
idx=730 predicted losschange=0.00117897 actual=0.00146484
idx=611 predicted losschange=0.00152302 actual=0.00195312
idx=1130 predicted losschange=0.0159167 actual=0.0161133
idx=15 predicted losschange=0.0434798 actual=0.0439453
idx=1923 predicted losschange=-0.00790002 actual=-0.0078125
idx=670 predicted losschange=0.0335141 actual=0.0336914
[ OK ] testbackward.squareloss (64 ms)
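The testbackward lines compare a predicted loss change with the actual change after nudging one input. Below is a minimal finite-difference check in that spirit. It assumes a square loss of 0.5·Σ(x - t)², whose gradient is x - t. DeepCL's SquareLossLayer may be defined differently, and all names here are hypothetical. For this quadratic loss the gap between the two numbers is exactly 0.5·δ², so a small step keeps them close.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical finite-difference check: assumes square loss 0.5 * sum((x - t)^2),
// whose gradient with respect to x[i] is (x[i] - t[i]).
struct LossChange {
    double predicted;  // first-order estimate: gradient[idx] * delta
    double actual;     // loss(x with x[idx] += delta) - loss(x)
};

inline double squareLoss(const std::vector<double>& x, const std::vector<double>& t) {
    double loss = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - t[i];
        loss += 0.5 * d * d;
    }
    return loss;
}

inline LossChange checkSquareLoss(std::vector<double> x, const std::vector<double>& t,
                                  std::size_t idx, double delta) {
    assert(x.size() == t.size() && idx < x.size());
    const double before = squareLoss(x, t);
    const double gradient = x[idx] - t[idx];
    x[idx] += delta;  // perturb one input (x is a local copy)
    const double after = squareLoss(x, t);
    return LossChange{gradient * delta, after - before};
}
```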
[ RUN ] testbackward.crossentropyloss
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
layer 2:Layer{}
inputtotalsize=300 outputTotalSize=300
layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
layer 2:Layer{}
Parameters overview: (skipping 3 layers with 0 params)
TOTAL : params=0
idx=44 predicted losschange=0.000274935 actual=0.000274658
idx=145 predicted losschange=-0.000885784 actual=-0.00088501
idx=48 predicted losschange=-0.000859834 actual=-0.000854492
idx=286 predicted losschange=0.00713042 actual=0.00717163
idx=130 predicted losschange=-0.000264829 actual=-0.000244141
idx=11 predicted losschange=-1.98163e-05 actual=0
idx=230 predicted losschange=-0.000594819 actual=-0.000610352
idx=15 predicted losschange=-0.0006499 actual=-0.000640869
idx=123 predicted losschange=-0.000846121 actual=-0.000823975
idx=70 predicted losschange=0.000790196 actual=0.000793457
[ OK ] testbackward.crossentropyloss (53 ms)
[ RUN ] testbackward.softmaxloss
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
inputtotalsize=10 outputTotalSize=10
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
Parameters overview: (skipping 3 layers with 0 params)
TOTAL : params=0
idx=4 predicted losschange=0.000113075 actual=0.00011301
idx=5 predicted losschange=0.000145627 actual=0.000145674
idx=8 predicted losschange=3.16699e-05 actual=3.19481e-05
idx=6 predicted losschange=4.89271e-06 actual=5.24521e-06
idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
idx=1 predicted losschange=-8.26119e-05 actual=-8.27312e-05
idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
idx=5 predicted losschange=0.000145627 actual=0.000145674
idx=3 predicted losschange=-5.50179e-05 actual=-5.50747e-05
idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
[ OK ] testbackward.softmaxloss (50 ms)
[ RUN ] testbackward.squareloss2
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:SquareLossLayer{}
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:SquareLossLayer{}
batchSize: 32
inputtotalsize=160 outputTotalSize=160
layer SquareLossLayer{}
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:SquareLossLayer{}
Parameters overview: (skipping 3 layers with 0 params)
TOTAL : params=0
idx=44 predicted losschange=0.000126406 actual=0.000125885
idx=5 predicted losschange=0.00461891 actual=0.00464439
idx=8 predicted losschange=0.000356787 actual=0.000356674
idx=106 predicted losschange=0.00716324 actual=0.00719643
idx=90 predicted losschange=0.000474759 actual=0.000480652
idx=131 predicted losschange=0.000979017 actual=0.000984192
idx=10 predicted losschange=0.000660134 actual=0.000663757
idx=15 predicted losschange=0.00961313 actual=0.00965118
idx=3 predicted losschange=0.00264732 actual=0.00267029
idx=30 predicted losschange=0.00865312 actual=0.00868607
[ OK ] testbackward.squareloss2 (60 ms)
[ RUN ] testbackward.crossentropy2
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:Layer{}
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:Layer{}
batchSize: 2
inputtotalsize=10 outputTotalSize=10
layer Layer{}
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:Layer{}
Parameters overview: (skipping 3 layers with 0 params)
TOTAL : params=0
idx=4 predicted losschange=0.00258649 actual=nan
idx=5 predicted losschange=0.0227095 actual=nan
idx=8 predicted losschange=-0.00202714 actual=nan
idx=6 predicted losschange=-0.000846508 actual=nan
idx=0 predicted losschange=-0.000424821 actual=nan
idx=1 predicted losschange=-0.00171216 actual=nan
idx=0 predicted losschange=-0.000424821 actual=nan
idx=5 predicted losschange=0.0227095 actual=nan
idx=3 predicted losschange=0.0123444 actual=nan
idx=0 predicted losschange=-0.000424821 actual=nan
[ OK ] testbackward.crossentropy2 (21 ms)
[ RUN ] testbackward.softmax2
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
batchSize: 2
inputtotalsize=10 outputTotalSize=10
layer SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
Parameters overview: (skipping 3 layers with 0 params)
TOTAL : params=0
idx=4 predicted losschange=0.00035729 actual=0.000357628
idx=5 predicted losschange=0.0015055 actual=0.00151086
idx=8 predicted losschange=-5.63632e-05 actual=-5.65052e-05
idx=6 predicted losschange=-1.48864e-05 actual=-1.4782e-05
idx=0 predicted losschange=1.96542e-05 actual=1.95503e-05
idx=1 predicted losschange=-0.000287167 actual=-0.000287056
idx=0 predicted losschange=1.96542e-05 actual=1.95503e-05
idx=5 predicted losschange=0.0015055 actual=0.00151086
idx=3 predicted losschange=-0.000152824 actual=-0.00014782
idx=0 predicted losschange=1.96542e-05 actual=1.95503e-05
[ OK ] testbackward.softmax2 (20 ms)
[ RUN ] testbackward.conv1
Couldnt find OpenCL-enabled GPU: No OpenCL-enabled GPUs found
Trying for OpenCL-enabled CPU
Using Vivante Corporation , OpenCL platform: Vivante OpenCL Platform
Using OpenCL device: Vivante OpenCL Device
initializing clblas
layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
layer 3:SquareLossLayer{}
layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
layer 3:SquareLossLayer{}
batchSize: 4
inputtotalsize=128 outputTotalSize=32
layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
forward try kernel 0
... not plausibly optimal, skipping
forward try kernel 1
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 1: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // notes on non-odd filtersizes:
8: // for odd, imagesize and filtersize 3, padZeros = 0:
9: // output is a single square
10: // m and n should vary between -1,0,1
11: // for even, imagesize and filtersize 2, padzeros = 0
12: // output is a single square, which we can position at topleft or bottomrigth
13: // lets position it in bottomright
14: // then m and n should vary as -1,0
15: //
16: // for even, imagesize and filtersize 2, padzeros = 1
17: // output is 2 by 2
18: // well... if it is even:
19: // - if we are not padding zeros, then we simply move our filter around the image somehow
20: // - if we are padding zeros, then we conceptually pad the bottom and right edge of the image with zeros by 1
21: // filtersize remains the same
22: // m will vary as -1,0,1
23: // outputrow is fixed by globalid
24: // inputrow should be unchanged...
25: // padzeros = 0:
26: // x x . . . .
27: // x x . . x x
28: // . . . . x x
29: // when filtersize even:
30: // new imagesize = oldimagesize - filtersize + 1
31: // when filtersize odd:
32: // x x x .
33: // x x x .
34: // x x x .
35: // . . . .
36: // new imagesize = oldimagesize - filtersize + 1
37: // padzeros = 1:
38: // x x
39: // x x . . x x . . . . . . .
40: // . . . x x . . x x . . .
41: // . . . . . . . x x . . x x
42: // outrow=0 outrow=1 outrow=2 x x
43: // outcol=0 outcol=1 outcol=2 outrow=3
44: // outcol=3
45: // when filtersize is even, and padzeros, imagesize grows by 1 each time...
46: // imagesize = oldimagesize + 1
47: // when filtersize is odd
48: // x x x
49: // x x x . x x x . . .
50: // x x x . x x x . x x x
51: // . . . x x x . x x x
52: // x x x
53:
54: // images are organized like [imageId][plane][row][col]
55: // filters are organized like [filterid][inplane][filterrow][filtercol]
56: // output are organized like [imageid][filterid][row][col]
57: // global id is organized like output, ie: [imageid][outplane][outrow][outcol]
58: // - no local memory used currently
59: // - each thread:
60: // - loads a whole upstream cube
61: // - loads a whole filter cube
62: // - writes one output...
63: void kernel convolve_imagecubes_float2(
64: const int numExamples,
65: global const float *inputs, global const float *filters,
66: global float *output) {
67: int globalId = get_global_id(0);
68:
69: int outputImage2Id = globalId / gOutputSizeSquared;
70: int exampleId = outputImage2Id / gNumFilters;
71: int filterId = outputImage2Id % gNumFilters;
72:
73: // intraimage coords
74: int localid = globalId % gOutputSizeSquared;
75: int outputRow = localid / gOutputSize;
76: int outputCol = localid % gOutputSize;
77:
78: global float const*inputCube = inputs + exampleId * gNumInputPlanes * gInputSizeSquared;
79: global float const*filterCube = filters + filterId * gNumInputPlanes * gFilterSizeSquared;
80:
81: float sum = 0;
82: if (exampleId < numExamples) {
83: for (int inputPlaneIdx = 0; inputPlaneIdx < gNumInputPlanes; inputPlaneIdx++) {
84: global float const*inputPlane = inputCube + inputPlaneIdx * gInputSizeSquared;
85: global float const*filterPlane = filterCube + inputPlaneIdx * gFilterSizeSquared;
86: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
87: // trying to reduce register pressure...
88: #if gPadZeros == 1
89: #define inputRowIdx (outputRow + u)
90: #else
91: #define inputRowIdx (outputRow + u + gHalfFilterSize)
92: #endif
93: global float const *inputRow = inputPlane + inputRowIdx * gInputSize;
94: global float const *filterRow = filterPlane + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
95: bool rowOk = inputRowIdx >= 0 && inputRowIdx < gInputSize;
96: #pragma unroll
97: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
98: #if gPadZeros == 1
99: #define inputColIdx (outputCol + v)
100: #else
101: #define inputColIdx (outputCol + v + gHalfFilterSize)
102: #endif
103: bool process = rowOk && inputColIdx >= 0 && inputColIdx < gInputSize;
104: if (process) {
105: sum += inputRow[inputColIdx] * filterRow[v];
106: }
107: }
108: }
109: }
110: }
111:
112: if (exampleId < numExamples) {
113: output[globalId] = sum;
114: }
115: }
116:
117:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward1.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
forward try kernel 2
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
ForwardAuto: kernel 2: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, const int N) {
8: int numLoops = (N + gWorkgroupSize - 1) / gWorkgroupSize;
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * gWorkgroupSize + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [outplane]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [imageid][upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [imageid][filterid][row][col]
26: // assumes filter is small, so filtersize * filterSize * inputPlanes * 4 < about 3KB
27: // eg 5 * 5 * 32 * 4 = 3.2KB => ok :-)
28: // but 28 * 28 * 32 * 4 = 100KB => less good :-P
29: void kernel forward_2_by_outplane(
30: const int batchSize,
31: global const float *images, global const float *filters,
32: global float *output,
33: local float *_inputPlane, local float *_filterCube) {
34: const int globalId = get_global_id(0);
35:
36: const int workgroupId = get_group_id(0);
37: const int workgroupSize = get_local_size(0);
38: const int outPlane = workgroupId;
39:
40: const int localId = get_local_id(0);
41: const int outputRow = localId / gOutputSize;
42: const int outputCol = localId % gOutputSize;
43:
44: #if gPadZeros == 1
45: const int minu = max(-gHalfFilterSize, -outputRow);
46: const int maxu = min(gHalfFilterSize, gOutputSize - 1 - outputRow) - gEven;
47: const int minv = max(-gHalfFilterSize, -outputCol);
48: const int maxv = min(gHalfFilterSize, gOutputSize - 1 - outputCol) - gEven;
49: #else
50: const int minu = -gHalfFilterSize;
51: const int maxu = gHalfFilterSize - gEven;
52: const int minv = -gHalfFilterSize;
53: const int maxv = gHalfFilterSize - gEven;
54: #endif
55:
56: {
57: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
58: copyLocal(_filterCube,
59: filters + outPlane * filterCubeLength,
60: filterCubeLength);
61: }
62: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
63:
64: for (int n = 0; n < batchSize; n++) {
65: float sum = 0;
66: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
67: barrier(CLK_LOCAL_MEM_FENCE);
68: copyLocal(_inputPlane,
69: images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared,
70: gInputSizeSquared);
71: barrier(CLK_LOCAL_MEM_FENCE);
72: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
73: if (localId < gOutputSizeSquared) {
74: for (int u = minu; u <= maxu; u++) {
75: int inputRow = outputRow + u;
76: #if gPadZeros == 0
77: inputRow += gHalfFilterSize;
78: #endif
79: int inputimagerowoffset = inputRow * gInputSize;
80: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
81: for (int v = minv; v <= maxv; v++) {
82: int inputCol = outputCol + v;
83: #if gPadZeros == 0
84: inputCol += gHalfFilterSize;
85: #endif
86: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
87: }
88: }
89: }
90: }
91: // output are organized like [imageid][filterid][row][col]
92: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
93: if (localId < gOutputSizeSquared) {
94: output[resultIndex ] = sum;
95: }
96: }
97: }
98: #endif
99:
100:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward2.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgWorkgroupSize=32"
... not valid
forward try kernel 3
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 3: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: // concept: each workgroup handles convolving one input example with one filtercube
8: // and writing out one single output plane
9: //
10: // workgroup id organized like: [imageid][outplane]
11: // local id organized like: [outrow][outcol]
12: // each thread iterates over: [upstreamplane][filterrow][filtercol]
13: // number workgroups = 32
14: // one filter plane takes up 5 * 5 * 4 = 100 bytes
15: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
16: // all filter cubes = 3.2KB * 32 = 102KB (too big)
17: // output are organized like [imageid][filterid][row][col]
18: void kernel forward_3_by_n_outplane(const int batchSize,
19: global const float *images, global const float *filters,
20: global float *output,
21: local float *_upstreamImage, local float *_filterCube) {
22: const int globalId = get_global_id(0);
23:
24: const int workgroupId = get_group_id(0);
25: const int workgroupSize = get_local_size(0);
26: const int n = workgroupId / gNumFilters;
27: const int outPlane = workgroupId % gNumFilters;
28:
29: const int localId = get_local_id(0);
30: const int outputRow = localId / gOutputSize;
31: const int outputCol = localId % gOutputSize;
32:
33: const int minu = gPadZeros ? max(-gHalfFilterSize, -outputRow) : -gHalfFilterSize;
34: const int maxu = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputRow - gEven) : gHalfFilterSize - gEven;
35: const int minv = gPadZeros ? max(-gHalfFilterSize, -outputCol) : - gHalfFilterSize;
36: const int maxv = gPadZeros ? min(gHalfFilterSize - gEven, gOutputSize - 1 - outputCol - gEven) : gHalfFilterSize - gEven;
37:
38: const int numUpstreamsPerThread = (gInputSizeSquared + workgroupSize - 1) / workgroupSize;
39:
40: const int filterCubeLength = gInputPlanes * gFilterSizeSquared;
41: const int filterCubeGlobalOffset = outPlane * filterCubeLength;
42: const int numPixelsPerThread = (filterCubeLength + workgroupSize - 1) / workgroupSize;
43: for (int i = 0; i < numPixelsPerThread; i++) {
44: int thisOffset = localId + i * workgroupSize;
45: if (thisOffset < filterCubeLength) {
46: _filterCube[thisOffset] = filters[filterCubeGlobalOffset + thisOffset];
47: }
48: }
49: // dont need a barrier, since we'll just run behind the barrier from the upstream image download
50:
51: float sum = 0;
52: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
53: int thisUpstreamImageOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
54: barrier(CLK_LOCAL_MEM_FENCE);
55: for (int i = 0; i < numUpstreamsPerThread; i++) {
56: int thisOffset = workgroupSize * i + localId;
57: if (thisOffset < gInputSizeSquared) {
58: _upstreamImage[ thisOffset ] = images[ thisUpstreamImageOffset + thisOffset ];
59: }
60: }
61: barrier(CLK_LOCAL_MEM_FENCE);
62: int filterImageOffset = upstreamPlane * gFilterSizeSquared;
63: for (int u = minu; u <= maxu; u++) {
64: int inputRow = outputRow + u;
65: #if gPadZeros == 0
66: inputRow += gHalfFilterSize;
67: #endif
68: int inputimagerowoffset = inputRow * gInputSize;
69: int filterrowoffset = filterImageOffset + (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
70: for (int v = minv; v <= maxv; v++) {
71: int inputCol = outputCol + v;
72: #if gPadZeros == 0
73: inputCol += gHalfFilterSize;
74: #endif
75: if (localId < gOutputSizeSquared) {
76: sum += _upstreamImage[ inputimagerowoffset + inputCol] * _filterCube[ filterrowoffset + v ];
77: }
78: }
79: }
80: }
81:
82: // output are organized like [imageid][filterid][row][col]
83: int resultIndex = (n * gNumFilters + outPlane) * gOutputSizeSquared + localId;
84: if (localId < gOutputSizeSquared) {
85: output[resultIndex ] = sum;
86: }
87: }
88:
89:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward3.cl build log:
error : syntax error in compiler option string " -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
... not valid
forward try kernel 4
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
kernel build error:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamPlane < gInputPlanes; upstreamPlane++) {
73: barrier(CLK_LOCAL_MEM_FENCE);
74: copyLocal(_inputPlane, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
75: copyLocal(_filterPlane, filters + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
76: barrier(CLK_LOCAL_MEM_FENCE);
77:
78: if (effectiveLocalId < gOutputSizeSquared) {
79: for (int u = -gHalfFilterSize; u <= gHalfFilterSize - gEven; u++) {
80: // trying to reduce register pressure...
81: #if gPadZeros == 1
82: #define inputRow (outputRow + u)
83: #else
84: #define inputRow (outputRow + u + gHalfFilterSize)
85: #endif
86: int inputimagerowoffset = inputRow * gInputSize;
87: int filterrowoffset = (u+gHalfFilterSize) * gFilterSize + gHalfFilterSize;
88: bool rowOk = inputRow >= 0 && inputRow < gInputSize;
89: for (int v = -gHalfFilterSize; v <= gHalfFilterSize - gEven; v++) {
90: #if gPadZeros == 1
91: #define inputCol (outputCol + v)
92: #else
93: #define inputCol (outputCol + v + gHalfFilterSize)
94: #endif
95: bool process = rowOk && inputCol >= 0 && inputCol < gInputSize;
96: if (process) {
97: sum += _inputPlane[ inputimagerowoffset + inputCol] * _filterPlane[ filterrowoffset + v ];
98: }
99: }
100: }
101: }
102: }
103: // output are organized like [imageid][filterid][row][col]
104: #define resultIndex (( n * gNumFilters + outPlane) * gOutputSizeSquared + effectiveLocalId)
105: if (effectiveLocalId < gOutputSizeSquared) {
106: output[resultIndex ] = sum;
107: }
108: }
109: #endif
110:
111:
Something went wrong with clCreateKernel, OpenCL erorr code -45
cl/forward4.cl build log:
error : syntax error in compiler option string " -D gWorkgroupSize=32 -D gPixelsPerThread=1 -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0"
ForwardAuto: kernel 4: this instance cant be used:
kernel source:
1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
2: //
3: // This Source Code Form is subject to the terms of the Mozilla Public License,
4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
5: // obtain one at http://mozilla.org/MPL/2.0/.
6:
7: void copyLocal(local float *target, global float const *source, int N) {
8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
9: for (int loop = 0; loop < numLoops; loop++) {
10: int offset = loop * get_local_size(0) + get_local_id(0);
11: if (offset < N) {
12: target[offset] = source[offset];
13: }
14: }
15: }
16:
17: #ifdef gOutputSize // for previous tests that dont define it
18: // workgroup id organized like: [n][filterid]
19: // local id organized like: [outrow][outcol]
20: // each thread iterates over: [upstreamplane][filterrow][filtercol]
21: // number workgroups = 32
22: // one filter plane takes up 5 * 5 * 4 = 100 bytes
23: // one filter cube (corresponding to one outplane) = 5*5 * 32 * 4 = 3.2KB (ok)
24: // all filter cubes = 3.2KB * 32 = 102KB (too big)
25: // output are organized like [n][filterid][outrow][outcol]
26: // the pixels per thread thing... :
27: // - we have one thread (~= cuda core) per output value,
28: // ie one thread for each combination of [outrow][outcol]
29: // - however, the number of threads is typically limited on a gpu,
30: // eg to 512 (eg Intel HD), or 1024 (eg nVidia K520)
31: // - so what happens if the number of output points is larger than
32: // the maximum workgroup size?
33: // - then we have several possibilities really:
34: // - we can divide the image into blocks, and process each block
35: // separately. This is probably a good option, but fair amount of
36: // work
37: // - we can get each thread to handle more than one output
38: // pixel, by looping
39: // - we can consider the output image in 1d, by putting the rows
40: // one after another, and assign each contiguous workgroup-size
41: // block to one workgroup
42: // => this is how this kernel works
43: // basically, it's a hack, so larger images actually run, without
44: // crashing, and we can probably improve it a lot :-)
45: //
46: // So, when outputSize * outputSize > workgroupSize, then
47: // multiple workgroups will be created for each output plane
48: // the number of such workgroups is given by: `gPixelsPerThread`
49: // the id of our workgroup within such a set of workgroups is calculated
50: // as `pixel`
51: // effectiveLocalId is our local id if we had one enormous workgroup
52: // containing the whole output image plane
53: void kernel forward_4_by_n_outplane_smallercache(const int batchSize,
54: global const float *images, global const float *filters,
55: global float *output,
56: local float *_inputPlane, local float *_filterPlane) {
57: #define globalId (get_global_id(0))
58:
59: #define localId (get_local_id(0))
60: #define workgroupId (get_group_id(0))
61: // const int workgroupSize = get_local_size(0);
62: const int effectiveWorkgroupId = workgroupId / gPixelsPerThread;
63: const int pixel = workgroupId % gPixelsPerThread;
64: const int effectiveLocalId = localId + pixel * gWorkgroupSize;
65: const int n = effectiveWorkgroupId / gNumFilters;
66: const int outPlane = effectiveWorkgroupId % gNumFilters;
67:
68: const int outputRow = effectiveLocalId / gOutputSize;
69: const int outputCol = effectiveLocalId % gOutputSize;
70:
71: float sum = 0;
72: for (int upstreamPlane = 0; upstreamP