patrickmmartin/OPENCL_NOTES.md

## OPENCL_NOTES.md

      
    Notes on OpenCL

Implementations

See references at
https://www.khronos.org/opencl/
Overview

The laptop seems to be kicking the desktop's arse on the basic properties (bandwidth and clock speed), even though there are only 24 units on the laptop versus the graphics card's ~1000.
Resources

http://cpuboss.com/ -> very good for basic facts, including OpenCL throughput
Setup

pyopencl
https://gist.github.com/patrickmmartin/e1313dde7b908e8d009f2a13c3cd164b
tricks


rename the .icd files of broken drivers to avoid annoyances
sudo updatedb and locate are amazing for tracking the .icd files down

example - clean-ish set of .icd after Nvidia install and beignet


Starred lines (*...*) mark a possible problem with the nvidia .icd.

$ locate .icd
*/etc/OpenCL/vendors/intel-beignet.icd*
*/etc/OpenCL/vendors/nvidia.icd*
/home/patrick/src/C/beignet/intel-beignet.icd.in
/home/patrick/src/C/beignet/build/intel-beignet.icd

$ cat `locate .icd`
*/usr/local/lib/beignet//libcl.so*
*libnvidia-opencl.so.1*
@BEIGNET_INSTALL_DIR@/libcl.so
/usr/local/lib/beignet//libcl.so

$ cat `locate .icd`| xargs -n1 ls -larth
*-rw-r--r-- 1 root root 1.8M May 23 00:29 /usr/local/lib/beignet//libcl.so*
*ls: cannot access 'libnvidia-opencl.so.1': No such file or directory*
ls: cannot access '@BEIGNET_INSTALL_DIR@/libcl.so': No such file or directory
-rw-r--r-- 1 root root 1.8M May 23 00:29 /usr/local/lib/beignet//libcl.so
 
$ locate libnvidia-opencl
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.1
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.375.39
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.375.39
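
A minimal sketch of the same check in Python, assuming the vendor files live in /etc/OpenCL/vendors as in the listing above. The function name `check_icd_files` and the status labels are made up for illustration. It separates entries that contain a path (which can be checked on disk) from bare library names such as libnvidia-opencl.so.1, which `ls` cannot find because they are presumably resolved through the dynamic loader's search path rather than relative to the current directory.

```python
import glob
import os


def check_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Return (icd_file, library, status) for every .icd file in vendors_dir."""
    results = []
    for icd_path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(icd_path) as f:
            library = f.read().strip()
        if os.sep in library:
            # The entry is a path, so it either exists on disk or it does not.
            status = "ok" if os.path.exists(library) else "missing"
        else:
            # A bare name is not a file path; it is left to the dynamic
            # loader's search path (assumption), so `ls` reports an error.
            status = "bare name"
        results.append((icd_path, library, status))
    return results


if __name__ == "__main__":
    for icd_path, library, status in check_icd_files():
        print("%-10s %s -> %s" % (status, icd_path, library))
```

If that assumption holds, the starred `ls` error above is not by itself proof of a broken nvidia .icd, since locate does find libnvidia-opencl.so.1 under /usr/lib/x86_64-linux-gnu.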


Hardware support results

Comparisons
Dell XPS 13


i7-7500U -> OpenCL capable
Intel(R) HD Graphics Kabylake ULT GT2 -> OpenCL capable

Windows 10

OpenCL implementation bundled with the Windows drivers


LuxMark 3.1 passes all tests and registers the on-board GPU and CPU as rendering targets
pyopencl tests: NOT RUN



Linux (Ubuntu 16.04)


Needs an OpenCL implementation

OpenCL implementation:  https://cgit.freedesktop.org/beignet/tree/docs/Beignet.mdwn


installation was from source, but straightforward enough :P


LuxMark 3.1 passes only the ball tests and registers only the on-board GPU as a rendering target


pyopencl tests


$ python benchmark.py 
Execution time of test without OpenCL:  0.025046110153198242 s
===============================================================
Platform name: Intel Gen OCL Driver
Platform profile: FULL_PROFILE
Platform vendor: Intel
Platform version: OpenCL 1.2 beignet 1.4 (git-448f8f7)
---------------------------------------------------------------
Device name: Intel(R) HD Graphics Kabylake ULT GT2
Device type: GPU
Device memory:  3932 MB
Device max clock speed: 1000 MHz
Device compute units: 24
Device max work group size: 512
Device max work item sizes: [512, 512, 512]
Data points: 8388608
Workers: 256
Preferred work group size multiple: 16
Execution time of test: 0.00440888 s
Results OK
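
A rough sketch of how the platform and device lines above can be reproduced with pyopencl alone, for checking a driver install without the full benchmark. The helper name `describe_device` is hypothetical, and the conversion of device memory to MB (bytes divided by 1024*1024) is an assumption about how benchmark.py reports it. The pyopencl import sits inside `main()` so the formatting helper still loads on a machine without OpenCL.

```python
def describe_device(platform, device, type_name):
    """Format one platform/device pair in the style of the benchmark output."""
    return "\n".join([
        "Platform name: %s" % platform.name,
        "Platform version: %s" % platform.version,
        "Device name: %s" % device.name,
        "Device type: %s" % type_name,
        # Assumption: MB here means bytes / (1024 * 1024).
        "Device memory: %d MB" % (device.global_mem_size // (1024 * 1024)),
        "Device max clock speed: %d MHz" % device.max_clock_frequency,
        "Device compute units: %d" % device.max_compute_units,
        "Device max work group size: %d" % device.max_work_group_size,
    ])


def main():
    try:
        import pyopencl as cl
    except ImportError:
        print("pyopencl is not installed")
        return
    try:
        platforms = cl.get_platforms()
    except cl.Error as exc:
        # Typically raised when no usable .icd file is found.
        print("no OpenCL platform available: %s" % exc)
        return
    for platform in platforms:
        for device in platform.get_devices():
            type_name = cl.device_type.to_string(device.type)
            print(describe_device(platform, device, type_name))
            print("-" * 63)


if __name__ == "__main__":
    main()
```

An empty result or an error here points at the driver or .icd setup rather than at the benchmark script.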


$ python dump-performance.py   
float32 add: 1828.97 GOps/s  
bandwidth @ 1073741824 bytes: 7.59742 GB/s  
DeviceToHostTransfer  
bandwidth @ 1073741824 bytes: 9.58943 GB/s  
DeviceToDeviceTransfer  
bandwidth @ 1073741824 bytes: 6.81554 GB/s  

Desktop


Core(TM)2 Quad CPU    Q8200  @ 2.33GHz <- NOT OpenCL capable
Nvidia GT 730                          <- OpenCL capable ?

Linux (Ubuntu 16.04)

OpenCL implementation:  https://cgit.freedesktop.org/beignet/tree/docs/Beignet.mdwn

Does not appear to work? - utest_run
OpenCL implementation: nvidia-340 ?

OpenCL implementation: nvidia-375

sudo apt-get install nvidia-375


lots of dependencies


dependencies only install with Python 2 selected via update-alternatives


still no joy from clinfo -> reboot

_errors were seen from the X server (vnc4server) resulting from the moved beignet files (whoops)_
X server is needed for access to OpenCL (yes?!), so getting the X server working is the first step
reboot and local login now seems to work
clinfo works
many examples work, like mandelbrot, particles
python demo_mandelbrot.py
python gl_particle_animation.py

possibly fixed by a cleaner setup of Ubuntu 17 ? *

Unfortunately we see a lot of this - some examples don't mind, others blow up
X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument  
Assuming 131072kB available aperture size.  
May lead to reduced performance or incorrect rendering.  
get chip id failed: -1 [22]  
param: 4, val: 0  


$ python benchmark.py 
Execution time of test without OpenCL:  0.119845867157 s
===============================================================
Platform name: NVIDIA CUDA
Platform profile: FULL_PROFILE
Platform vendor: NVIDIA Corporation
Platform version: OpenCL 1.2 CUDA 8.0.0
---------------------------------------------------------------
Device name: GeForce GT 730
Device type: GPU
Device memory:  979 MB
Device max clock speed: 901 MHz
Device compute units: 2
Device max work group size: 1024
Device max work item sizes: [1024, 1024, 64]
Data points: 8388608
Workers: 256
Preferred work group size multiple: 32
Execution time of test: 0.00897843 s
Results OK
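
The two benchmark.py runs can be put side by side with a little arithmetic. This is only a sketch: the timings are copied from the outputs above, the dictionary keys and the `speedup` helper are made up for illustration, and single runs like these are noisy, so the ratios are indicative at best.

```python
# Timings in seconds, copied from the two benchmark.py outputs.
runs = {
    "laptop": {"cpu": 0.025046110153198242, "gpu": 0.00440888},
    "desktop": {"cpu": 0.119845867157, "gpu": 0.00897843},
}


def speedup(run):
    """How many times faster the OpenCL run was than the plain Python run."""
    return run["cpu"] / run["gpu"]


for name, run in runs.items():
    print("%-8s OpenCL speedup over CPU: %.1fx" % (name, speedup(run)))

# Direct comparison of the two OpenCL timings (same 8388608 data points).
gpu_ratio = runs["desktop"]["gpu"] / runs["laptop"]["gpu"]
print("laptop GPU vs desktop GPU: %.1fx faster" % gpu_ratio)
```

The desktop gains more from OpenCL relative to its own CPU, but in absolute terms the laptop GPU finishes the same test in roughly half the time.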


$ python dump-performance.py  
4194304 20171356894.8 0 
float32 add: 10085.7 GOps/s  
HostToDeviceTransfer  
latency: 3.27519e-05 s  
bandwidth @ 268435456 bytes: 1.39221 GB/s  
DeviceToHostTransfer  
latency: 3.89906e-05 s  
bandwidth @ 268435456 bytes: 1.41215 GB/s  
DeviceToDeviceTransfer  
latency: 3.98391e-05 s  
bandwidth @ 268435456 bytes: 5.3896 GB/s  

but this required the following patch to examples/dump-performance.py
--- a/examples/dump-performance.py  
+++ b/examples/dump-performance.py  
@@ -27,7 +27,7 @@ def main():  
   
         print("latency: %g s" % perf.transfer_latency(queue, tx_type))  
-        for i in range(6, 31, 2):  
+        for i in range(6, 30, 2):  
             bs = 1 << i  
             print("bandwidth @ %d bytes: %g GB/s" % (  
                     bs, perf.transfer_bandwidth(queue, tx_type, bs)/1e9))  
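
To see what the patch changes, the loop bounds can be expanded by hand. The sketch below lists the buffer sizes each version of the loop visits; the `buffer_sizes` helper is made up for illustration. The likely reason the patch was needed (an assumption, not confirmed by any error output in these notes) is that the largest buffer in the original loop is 1 GiB, which exceeds the 979 MB reported for the GT 730.

```python
def buffer_sizes(stop):
    """Buffer sizes in bytes visited by `for i in range(6, stop, 2)`."""
    return [1 << i for i in range(6, stop, 2)]


original = buffer_sizes(31)  # loop before the patch
patched = buffer_sizes(30)   # loop after the patch

MIB = 1024 * 1024
# "Device memory" from the GT 730 benchmark output; MB assumed to mean MiB.
gt730_memory_mib = 979

print("largest buffer before patch: %d bytes (%d MiB)"
      % (original[-1], original[-1] // MIB))
print("largest buffer after patch:  %d bytes (%d MiB)"
      % (patched[-1], patched[-1] // MIB))
print("fits on GT 730 before patch:", original[-1] // MIB <= gt730_memory_mib)
print("fits on GT 730 after patch: ", patched[-1] // MIB <= gt730_memory_mib)
```

This matches the outputs above: the laptop run reports bandwidth at 1073741824 bytes, the patched desktop run at 268435456 bytes, so the bandwidth figures of the two machines were measured at different buffer sizes.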


LuxMark
did not work

logging in via ssh

This works, but possibly only because there is a functioning X server waiting for a login
TODO

remove beignet and see whether the pure NVIDIA driver removes the problems with LuxMark, etc.