@rygorous
Created October 1, 2013 03:50
This is gonna be fun...
Platform found : Advanced Micro Devices, Inc.
Device 0: Kalindi
Build: DEBUG (**NOTE: this is compiled as "release" and definitely no -O3, not sure where this comes from)
Pixel format: CL_RGBA / CL_UNSIGNED_INT32
Image width: 1024
Image height: 1024
Image origin: 0 0
Image region: 1024 1024
Image size in bytes: 16777216
Region size in bytes: 16777216
Global work items X: 1024
Global work items Y: 8
Total work items: 8192
Pixels per thread: 128
Local work items X: 16
Local work items Y: 8
Number of groups: 64
Timing loops: 10
Repeats: 1
Kernel loops: 100
Kernel launches: 1
inputImage: CL_MEM_READ_ONLY
outputImage: CL_MEM_WRITE_ONLY
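
The derived figures in the configuration above (image size, total work items, pixels per thread, number of groups) can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not the benchmark's own code: it assumes CL_RGBA / CL_UNSIGNED_INT32 means 4 channels of 4 bytes each, and the variable names are made up for illustration.

```python
# Values copied from the log above.
width, height = 1024, 1024
global_x, global_y = 1024, 8
local_x, local_y = 16, 8

# Assumption: CL_RGBA = 4 channels, CL_UNSIGNED_INT32 = 4 bytes per channel.
bytes_per_pixel = 4 * 4

image_bytes = width * height * bytes_per_pixel
total_work_items = global_x * global_y
pixels_per_thread = (width * height) // total_work_items
num_groups = (global_x // local_x) * (global_y // local_y)

print(image_bytes, total_work_items, pixels_per_thread, num_groups)
# prints: 16777216 8192 128 64
```
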
Host baseline (single thread, naive):
Timer resolution 257 ns
Page fault 4228 ns
CPU read 4.07 GB/s
memcpy() 4.11 GB/s
memset(,1,) 4.42 GB/s
memset(,0,) 4.40 GB/s
AVERAGES (over loops 2 - 9, use -l for complete log)
--------
1. Host mapped write to inputImage
clEnqueueMapImage(WRITE): 0.007238 s [ 2.32 GB/s ]
memset(): 0.010836 s 1.55 GB/s
clEnqueueUnmapMemObject(): 0.005763 s [ 2.91 GB/s ]
2. GPU kernel read of inputImage
clEnqueueNDRangeKernel(): 0.154124 s 10.89 GB/s
verification ok
3. GPU kernel write to outputImage
clEnqueueNDRangeKernel(): 0.203785 s 8.23 GB/s
4. Host mapped read of outputImage
clEnqueueMapImage(READ): 0.006493 s [ 2.58 GB/s ]
CPU read: 0.011416 s 1.47 GB/s
verification ok
clEnqueueUnmapMemObject(): 0.000049 s [ 342.67 GB/s ]
Passed!
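
The GB/s figures above appear to follow from the region size and the printed times. The sketch below reproduces them under two assumptions that are inferred from the numbers, not stated by the tool: 1 GB is taken as 10^9 bytes, and the kernel timings cover all 100 kernel loops, so they move 100 times the region size. The helper name `gb_per_s` is hypothetical.

```python
IMAGE_BYTES = 16777216   # "Region size in bytes" from the log
KERNEL_LOOPS = 100       # "Kernel loops" from the log

def gb_per_s(seconds, passes=1):
    """Bandwidth in GB/s, assuming 1 GB = 1e9 bytes."""
    return IMAGE_BYTES * passes / seconds / 1e9

# (label, seconds from the log, passes over the image)
rows = [
    ("clEnqueueMapImage(WRITE)", 0.007238, 1),
    ("memset()", 0.010836, 1),
    ("clEnqueueUnmapMemObject()", 0.005763, 1),
    ("kernel read of inputImage", 0.154124, KERNEL_LOOPS),
    ("kernel write to outputImage", 0.203785, KERNEL_LOOPS),
    ("clEnqueueMapImage(READ)", 0.006493, 1),
    ("CPU read", 0.011416, 1),
]

for label, seconds, passes in rows:
    print(f"{label:30s} {gb_per_s(seconds, passes):6.2f} GB/s")
```

The final clEnqueueUnmapMemObject() line (0.000049 s, 342.67 GB/s) does not reproduce exactly this way, presumably because the printed time is rounded to the microsecond. A 49-microsecond unmap of a 16 MB image is also far faster than any actual copy measured in this run.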