@woachk
Created March 17, 2024 21:19
clpeak on Adreno 690 (Snapdragon 8cx Gen 3)
Platform: QUALCOMM Snapdragon(TM)
Device: QUALCOMM Adreno(TM) 690
Driver version : OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.06.00 (Win64)
Compute units : 8
Clock frequency : 1 MHz
Global memory bandwidth (GBPS)
float : 51.47
float2 : 52.57
float4 : 57.94
float8 : 57.77
float16 : 26.09
Single-precision compute (GFLOPS)
float : 1943.67
float2 : 2313.79
float4 : 2214.62
float8 : 2582.35
float16 : 2525.76
Half-precision compute (GFLOPS)
half : 3120.89
half2 : 3322.86
half4 : 3963.98
half8 : 3764.43
half16 : 4021.50
No double precision support! Skipped
Integer compute (GIOPS)
int : 661.56
int2 : 586.14
int4 : 581.94
int8 : 528.93
int16 : 478.01
Integer compute Fast 24bit (GIOPS)
int : 1963.71
int2 : 1147.33
int4 : 1629.27
int8 : 1628.86
int16 : 1700.51
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 11.82
enqueueReadBuffer : 12.17
enqueueWriteBuffer non-blocking : 11.71
enqueueReadBuffer non-blocking : 12.39
enqueueMapBuffer(for read) : 364.66
memcpy from mapped ptr : 12.73
enqueueUnmap(after write) : 21220.19
memcpy to mapped ptr : 12.02
Kernel launch latency : -744550848.00 us
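
A negative kernel launch latency cannot be a real measurement, so anyone post-processing this log may want to filter out such values before comparing devices. The following is a minimal sketch, assuming the output has been saved as plain text in the flat "name : value" layout shown here. The names `parse_clpeak`, `suspicious`, and `SAMPLE` are hypothetical helpers introduced for illustration and are not part of clpeak.

```python
import re

# Hypothetical excerpt: a few lines copied from the log above.
SAMPLE = """\
Compute units : 8
Single-precision compute (GFLOPS)
float : 1943.67
float16 : 2525.76
Half-precision compute (GFLOPS)
half : 3120.89
half16 : 4021.50
Kernel launch latency : -744550848.00 us
"""

# Section headers end in one of the unit tags used by this log.
HEADER = re.compile(r"^\s*(?P<name>.+?)\s*\((?:GBPS|GFLOPS|GIOPS)\)\s*$")
# Entries look like "name : number" with an optional trailing unit.
ENTRY = re.compile(
    r"^\s*(?P<key>.+?)\s*:\s*(?P<val>-?\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)?\s*$"
)


def parse_clpeak(text):
    """Return {(section, key): value}; section is None for standalone entries."""
    results = {}
    section = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            section = header.group("name")
            continue
        entry = ENTRY.match(line)
        if entry:
            # Assumption: an entry carrying its own unit (e.g. "us") is
            # standalone and does not belong to the preceding section.
            owner = None if entry.group("unit") else section
            results[(owner, entry.group("key"))] = float(entry.group("val"))
    return results


def suspicious(results):
    """List entries whose value is negative, which no throughput or latency can be."""
    return [key for key, value in results.items() if value < 0]


if __name__ == "__main__":
    parsed = parse_clpeak(SAMPLE)
    for key in suspicious(parsed):
        print("implausible value:", key, parsed[key])
```

Keying on the (section, name) pair keeps entries such as `float` distinct across the bandwidth and compute sections, since the same label appears in more than one of them.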