@soareschen
Last active August 8, 2018 01:04
CUDA setup on Ubuntu 16.04 and LXD

This gist explains the steps required to install CUDA on Ubuntu 16.04, as well as how to enable it inside LXD containers.

The setup assumes GTX 10 series hardware, tested with my GTX 1070.

Driver Installation

Download the latest Nvidia driver at http://www.nvidia.com/Download/index.aspx.

On 64-bit systems, install the 32-bit OpenGL libraries first so that the driver can also install its 32-bit compatibility libraries. This is required for programs such as Steam to work.

host# apt-get install libgl1-mesa-dri:i386

To install the driver, you have to shut down the desktop GUI and install through the terminal.

Switch to a text console with Ctrl + Alt + F1 and run the following commands with root permission:

host# service lightdm stop
host# init 3
host# sh ./NVIDIA-Linux-x86_64-367.35.run
host# shutdown -r now
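
After the reboot, a quick sanity check is to run nvidia-smi; it should list the GPU and the installed driver version:

host$ nvidia-smi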

CUDA Host Installation

We will use CUDA 7.5, as there is a driver conflict in the CUDA 8 RC that causes a blank screen after restart.

Download CUDA Toolkit at https://developer.nvidia.com/cuda-downloads. Choose Ubuntu 15.04 with runfile (local).

It is recommended to download the .run installer instead of the .deb package. With the runfile we can customize some options, such as not installing the Nvidia driver again during CUDA installation.

Run the installer with the --override option, as our Ubuntu and GCC versions are not officially supported.

host# sh ./cuda_7.5.18_linux.run --override

Remember to choose no when asked whether to install the driver.

After installation, restart and verify that the desktop can still be loaded as usual.
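
The installer will typically also remind you to add CUDA to your environment. A minimal sketch for ~/.bashrc, assuming the default install prefix of /usr/local/cuda-7.5:

host$ echo 'export PATH=/usr/local/cuda-7.5/bin:$PATH' >> ~/.bashrc
host$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
host$ source ~/.bashrc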

To compile any code, we need to force CUDA to work with the latest GCC 5.4; officially, CUDA only supports GCC up to version 4.9.

Edit the file /usr/local/cuda-7.5/include/host_config.h, search for the following line and comment it out:

// before:
#error -- unsupported GNU version! gcc versions later than 4.9 are not supported!

// after:
// #error -- unsupported GNU version! gcc versions later than 4.9 are not supported!
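
Alternatively, if an older GCC is also installed, nvcc can be pointed at it with the -ccbin (--compiler-bindir) flag instead of patching the header. A sketch, assuming gcc-4.9 is available at /usr/bin/gcc-4.9:

host$ nvcc -ccbin /usr/bin/gcc-4.9 vectorAdd.cu -o vectorAdd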

Try compiling and running the CUDA example code to verify that CUDA is working properly.

host$ cd NVIDIA_CUDA-7.5_Samples/0_Simple/vectorAdd
host$ make
host$ ./vectorAdd
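
If everything is working, the sample should report success with output roughly like the following (exact wording may differ between sample versions):

[Vector addition of 50000 elements]
...
Test PASSED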

LXD CUDA Installation

Now that CUDA is working on the host, we can set up an Ubuntu LXD container and install CUDA inside it.

First, set up LXD by following this tutorial: http://insights.ubuntu.com/2016/03/14/the-lxd-2-0-story-prologue/

Make sure that the following three device files exist on your system:

$ ls /dev/nvidia*
/dev/nvidia0
/dev/nvidiactl
/dev/nvidia-uvm

In my experience, /dev/nvidia-uvm is missing on Ubuntu 16.04 with the latest driver. The workaround, according to the CUDA installation guide, is to execute the following commands:

host# /sbin/modprobe nvidia-uvm
host# D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
host# mknod -m 666 /dev/nvidia-uvm c $D 0
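
Note that the device node created this way does not survive a reboot. One way to persist the workaround, assuming the stock /etc/rc.local that Ubuntu 16.04 still ships, is to add the same commands there:

# in /etc/rc.local, before the final "exit 0" line
/sbin/modprobe nvidia-uvm
D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
[ -e /dev/nvidia-uvm ] || mknod -m 666 /dev/nvidia-uvm c $D 0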

Next, initialize an LXD container with the Nvidia devices mounted into the container.

host$ CONTAINER=ubuntu-cuda
host$ lxc init ubuntu: $CONTAINER
host$ lxc config device add $CONTAINER nvidia0 unix-char path=/dev/nvidia0
host$ lxc config device add $CONTAINER nvidiactl unix-char path=/dev/nvidiactl
host$ lxc config device add $CONTAINER nvidia-uvm unix-char path=/dev/nvidia-uvm
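
You can list the configured devices to confirm everything was added correctly:

host$ lxc config device show $CONTAINER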

Also, to make file sharing easier, I usually mount a shared directory into the container to access the installer and example files.

host$ lxc config set $CONTAINER security.privileged true
host$ lxc config device add $CONTAINER shareName disk source=/home/$USER/share path=/share
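
With the share in place, copy the installers into it so the container can reach them later (the download location is an assumption; adjust to wherever you saved the files):

host$ mkdir -p /home/$USER/share
host$ cp ~/Downloads/NVIDIA-Linux-x86_64-367.35.run ~/Downloads/cuda_7.5.18_linux.run /home/$USER/share/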

Exec into the container and install the driver first. Note that we need to install the driver without the kernel module, as the module is installed on the host OS already.

host$ lxc start $CONTAINER
host$ lxc exec $CONTAINER bash
container# sh /share/NVIDIA-Linux-x86_64-367.35.run --no-kernel-module

After that, install CUDA in the container using the same steps as above. You should now be able to access CUDA both on the host and in the container.
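
As a final check, nvidia-smi and the vectorAdd sample should also work inside the container, assuming you installed the CUDA samples there as well:

container$ nvidia-smi
container$ cd NVIDIA_CUDA-7.5_Samples/0_Simple/vectorAdd
container$ make
container$ ./vectorAdd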

@wangruohui

I have just tried the same thing on my server, with the host running Ubuntu 17.04 and the container initialized with Ubuntu 16.04. I used the latest Nvidia driver (currently 381.22) and CUDA 8.0.61. Some alternative approaches, for reference:
1. The nvidia-uvm device will appear after running some GPU program, for example the deviceQuery sample from the CUDA samples.
2. With LXD later than version 2.5, lxc config device add $CONTAINER gpu gpu will mount /dev/nvidia0 (and any other GPU devices) and /dev/nvidiactl into the container, but it seems nvidia-uvm still has to be mounted into the container manually.
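
For reference, the shortcut from point 2 combined with the manual nvidia-uvm mount would look roughly like this:

host$ lxc config device add $CONTAINER gpu gpu
host$ lxc config device add $CONTAINER nvidia-uvm unix-char path=/dev/nvidia-uvm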

@fungtion commented Aug 8, 2018

Why must I install CUDA on both the host and the container?
