- Using CUDA makes sense for massively parallelizable code, like matrix multiplication.
- But copying data from host memory (RAM) to GPU is slow.
- MATLAB has many CUDA-aware functions. For testing, you can use MATLAB on sunna (nodes 01-08 only, please).
- Generating random numbers on the GPU and copying them back to host RAM was observed to be about 3 times faster with CUDA.
- But "raw" CUDA C/C++ requires a lot of boilerplate code. Hecke tells me that this is changing/has changed a lot in newer versions of CUDA.
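To illustrate the boilerplate: below is a minimal sketch of a CUDA C vector addition using the runtime API of that era. The device allocations, the host-to-device and device-to-host copies, and the launch-geometry arithmetic are all required even for this trivial computation; the file name and values are made up for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host allocations and initialization.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Boilerplate: allocate device memory and copy inputs host -> device.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch with enough blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result device -> host. As noted above, these host<->device
    // transfers are the slow part and can dominate small workloads.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```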
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/nld/gcc-4.6.3/bin:$PATH
export LD_LIBRARY_PATH=/usr/nld/gcc-4.6.3/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/nld/gcc-4.6.3/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/nld/mpc-0.9/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/nld/mpfr-3.1.1/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/nld/gmp-5.0.5/lib:$LD_LIBRARY_PATH
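With those exports in place, compiling and running a CUDA source file is just a matter of invoking nvcc, which picks up the gcc 4.6.3 host compiler from PATH. A sketch, assuming a source file named vecadd.cu (the file name and the sm_20 architecture flag are assumptions; the right -arch value depends on the GPU in the node):

```shell
# Compile with the CUDA toolchain configured by the exports above;
# nvcc uses the gcc found on PATH as its host compiler.
nvcc -arch=sm_20 vecadd.cu -o vecadd
./vecadd
```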
dmanik@sunna02:~> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221