kuanghan/athena_cuda.md

## athena_cuda.md

      
          
Installing NVIDIA Driver & CUDA inside an LXC container running Ubuntu 16.04 on a neuroscience computing server.

Introduction: I was trying to run some neuroscience image processing commands that use an NVIDIA GPU. The challenge is that most of our computation runs inside an LXC container running Ubuntu 16.04 (the host runs Ubuntu 16.04 as well). Installing the NVIDIA driver on the host is not so hard, but doing it inside the LXC container is much more challenging.
I already have an unprivileged container running, so I will not repeat the steps to create an LXC container here.
Our graphics card is NVIDIA GeForce GTX 1080 Ti.
Here are the main steps:

Install NVIDIA driver on the host
Install NVIDIA driver in the container. The driver version in the container has to be exactly the same as the one on the host.
Install CUDA & other GPU-related libraries in the container.

I found this page
https://blog.nelsonliu.me/2017/04/29/installing-and-updating-gtx-1080-ti-cuda-drivers-on-ubuntu/
which mostly followed the instructions on this page:
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#verify-you-have-supported-version-of-linux
(see Section 4, Runfile Installation).
And here is what I did (mostly following steps in Section 4.2 from the link above, but I'm listing the steps that I actually did below):

Install gcc and other essential packages on the host:

sudo apt install build-essential software-properties-common


Download the CUDA Toolkit run file cuda_9.0.176_384.81_linux-run
Follow the instructions in Section 4.3.5 to blacklist the nouveau driver.
Reboot athena into text-only mode, as described on this page: https://askubuntu.com/questions/870221/booting-into-text-mode-in-16-04/870226 (From here on, I use the KVM environment.)
Running the installer with sudo failed with this error:

huangk04@athena:~/Downloads$ sudo sh cuda_9.0.176_384.81_linux-run
[sudo] password for huangk04:
Sorry, user huangk04 is not allowed to execute '/bin/sh cuda_9.0.176_384.81_linux-run' as root on athena.mssm.edu.


Got past that error by switching to root with sudo su, after which the installer ran.
The root partition /dev/mapper/vg01-lv.root is too small for CUDA, so I installed CUDA in /data/cuda-9.0 with a symbolic link at /usr/local/cuda-9.0, and installed the CUDA samples at /data/cuda-9.0/samples/. The installer's temp directory also had to be moved off the root partition because of the same disk space issue, so here is the command I executed:

sh cuda_9.0.176_384.81_linux-run --tmpdir=/data/tmp

(Mmm... I probably did not need to install CUDA Toolkit on the host...)
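The relocation described above (install under /data, then link back into /usr/local) could be sketched roughly as follows. The helper name link_cuda is made up for illustration, and the sketch assumes the installer has already placed the toolkit in the target directory. It is not something the NVIDIA runfile does for you.

```shell
# Hypothetical helper: expose a toolkit installed on a larger partition
# at the path where other tools expect to find it.
link_cuda() {
    target=$1   # where CUDA actually lives, e.g. /data/cuda-9.0
    link=$2     # where it should appear, e.g. /usr/local/cuda-9.0
    if [ ! -d "$target" ]; then
        echo "missing install directory: $target" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link, -n: do not follow it
    ln -sfn "$target" "$link"
}

# Example (as root on the host):
#   df -h / /data          # confirm which partition has room
#   link_cuda /data/cuda-9.0 /usr/local/cuda-9.0
```

Refusing to link when the target directory is missing avoids leaving a dangling /usr/local/cuda-9.0 behind if the install step failed.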

Add the graphics driver PPA (verified that the driver version 384.98 is supported on Ubuntu 16.04), update, and then install driver version 384:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-384


Reboot, and then (still on the host) type nvidia-smi to confirm that the driver version is indeed 384.98.
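Since the container driver later has to match the host driver exactly, it can help to script this check instead of reading the nvidia-smi banner by eye. Below is a minimal sketch. The function name driver_matches and the variable EXPECTED_DRIVER are illustrative, and the same snippet should work unchanged inside the container once the driver is installed there.

```shell
# The version this write-up settled on; adjust for other setups.
EXPECTED_DRIVER=384.98

# Succeeds only if the reported version string equals the expected one.
driver_matches() {
    [ "$1" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query just the driver version (one line per GPU) and take the first.
    actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if driver_matches "$actual" "$EXPECTED_DRIVER"; then
        echo "driver OK: $actual"
    else
        echo "driver mismatch: got '$actual', expected $EXPECTED_DRIVER" >&2
    fi
fi
```

The comparison is a plain string match on purpose: a "close enough" version is not good enough here.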

To set up GPU driver in the container, Prantik forwarded this tutorial to me: https://medium.com/@MARatsimbazafy/journey-to-deep-learning-nvidia-gpu-passthrough-to-lxc-container-97d0bc474957

On the host, edit the file /etc/modules-load.d/modules.conf and add the following lines (not sure if this is necessary):

nvidia
nvidia_uvm


Update initramfs:

sudo update-initramfs -u


Set the login runlevel back to graphical.target (and another reboot is required):

sudo systemctl set-default graphical.target


Edit the file /home/huangk04/.local/share/lxc/athena_box/config and add the following lines to it:

# GPU Passthrough config
lxc.cgroup.devices.allow = c 195:* rwm
lxc.cgroup.devices.allow = c 243:* rwm
lxc.mount.entry = /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry = /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry = /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry = /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
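
The major numbers in the two lxc.cgroup.devices.allow lines above have to match the device nodes on the host. As far as I understand, 195 is the fixed major for the nvidia devices, while the one for nvidia-uvm is assigned dynamically, so 243 may differ on another machine. A small sketch for looking them up (the function name nvidia_majors is made up for illustration):

```shell
# Print "major name" for every NVIDIA entry in /proc/devices-style input.
nvidia_majors() {
    awk '$2 ~ /^nvidia/ { print $1, $2 }'
}

# On the host; prints nothing if the NVIDIA modules are not loaded.
if [ -r /proc/devices ]; then
    nvidia_majors < /proc/devices
fi

# Cross-check against the device nodes themselves, if present:
#   ls -l /dev/nvidia*
```

Whatever majors this prints are the ones to put in the `c <major>:* rwm` lines.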

However, passing the GPU device in the LXC config file led to the following error when I tried to start the container:
huangk04@athena:~$ lxc-start -n athena_box -d
lxc-start: tools/lxc_start.c: main: 366 The container failed to start.
lxc-start: tools/lxc_start.c: main: 368 To get more details, run the container in foreground mode.
lxc-start: tools/lxc_start.c: main: 370 Additional information can be obtained by setting the --logfile and --logpriority options.

I find that I can't start my container if I specify anything in the config file that tries to modify the cgroup settings, like trying to get access to the /dev/nvidia* devices on the host.
It looks like a cgroups issue with LXC on Ubuntu 16.04 (maybe somehow related to systemd, but I don't really understand what that means). What's more confusing is that Ubuntu has a package cgmanager that manages cgroups (by being a wrapper to send calls to dbus?), but when I tried to install it by typing
sudo apt update
sudo apt install cgmanager

apt reported that it installed version 0.39-2ubuntu5. But the version that cgm itself reports is
huangk04@athena:~$ cgm --version
0.29

Seems like a bug in cgmanager. Anyway, I found some instructions (e.g., https://www.berthon.eu/2015/lxc-unprivileged-containers-on-ubuntu-14-04-lts/) saying that I could move all processes in my current shell to a specific cgroup with access to the devices that I need, after which I might be able to start the container. So here is what I tried:
sudo cgm create all $USER
sudo cgm chown all $USER $(id -u) $(id -g)
sudo cgm movepid all $USER $$

In fact, the second and third commands threw errors. But these commands did have an effect on what I see in /proc/self/cgroup. Before these commands, it looks like this:
huangk04@athena:~$ cat /proc/self/cgroup
11:cpuset:/
10:net_cls,net_prio:/
9:cpu,cpuacct:/user.slice
8:perf_event:/
7:memory:/user/huangk04/0
6:devices:/user.slice
5:freezer:/user/huangk04/0
4:hugetlb:/
3:blkio:/user.slice
2:pids:/user.slice/user-10354.slice
1:name=systemd:/user.slice/user-10354.slice/session-6.scope

and after the three commands above (the first one probably only needs to be run once), I see
huangk04@athena:~$ cat /proc/self/cgroup
11:cpuset:/huangk04
10:net_cls,net_prio:/huangk04
9:cpu,cpuacct:/user.slice/huangk04
8:perf_event:/huangk04
7:memory:/user/huangk04/0
6:devices:/user.slice/huangk04
5:freezer:/user/huangk04/0
4:hugetlb:/huangk04
3:blkio:/user.slice/huangk04
2:pids:/user.slice/user-10354.slice/huangk04
1:name=systemd:/user.slice/user-10354.slice/session-6.scope

and now the container starts. I suspect that a bug in cgmanager makes these commands report errors even though they worked, which could be related to the inconsistent version numbers reported by apt and by cgm. Also, the effects of sudo cgm chown and sudo cgm movepid are not persistent, meaning that I need to run these commands again whenever I restart the container (most likely from a different shell).
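Because the cgroup move has to be repeated in whichever shell starts the container, the steps above could be bundled into one small script. This is only a sketch: the cgm and lxc-start invocations are the ones used above, while the function names, the CONTAINER variable, and the DRY_RUN switch are my own additions for illustration.

```shell
#!/bin/sh
# Sketch: redo the cgroup placement for this process, then start the container.
# Set DRY_RUN=1 to print the commands instead of running them.
CONTAINER=${CONTAINER:-athena_box}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        # cgm printed errors here even when the move took effect, so a
        # non-zero exit status is reported but not treated as fatal.
        "$@" || echo "warning: '$*' exited with status $?" >&2
    fi
}

start_with_cgroup() {
    user=${USER:-$(id -un)}
    run sudo cgm create all "$user"
    run sudo cgm chown all "$user" "$(id -u)" "$(id -g)"
    # $$ is the PID of this shell; lxc-start below inherits its cgroups.
    run sudo cgm movepid all "$user" "$$"
    run lxc-start -n "$CONTAINER" -d
}

# Uncomment to actually run it:
# start_with_cgroup
```

Running it first with DRY_RUN=1 shows the exact commands before anything touches the cgroups.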

Install NVIDIA driver in container as well (so we have the nvidia-smi command in the container): First, download the driver runfile NVIDIA-Linux-x86_64-384.98.run (again, the version in the container must match the version on the host, 384.98). Then do the following (courtesy of this website: https://qiita.com/yanoshi/items/75b0fc6b65df49fc2263)

cd ~/Downloads  # or wherever the runfile is
chmod a+x NVIDIA-Linux-x86_64-384.98.run
sudo ./NVIDIA-Linux-x86_64-384.98.run --no-kernel-module

And then follow the prompts to install the driver. After that, I can see the GPU info in the container by typing nvidia-smi:
root@xenial:/usr/local/cuda-9.0/samples/1_Utilities/deviceQuery# nvidia-smi
Tue Nov 21 02:35:05 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98                 Driver Version: 384.98                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   38C    P0    59W / 250W |      0MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


Install CUDA in the container, following the same procedure as on the host (steps 2 to 7 above).
Run a CUDA test: go to the directory /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery, type make to compile an executable, and run the executable ./deviceQuery, which produced the following output:

root@xenial:/usr/local/cuda-9.0/samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11172 MBytes (11715084288 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1582 MHz (1.58 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
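
The build-and-run test above can also be scripted so that it gives a yes/no answer, for example after a driver change. This is a rough sketch that assumes the samples live at the path used above. The function name device_query_passed and the variable SAMPLE_DIR are illustrative.

```shell
# Succeeds if deviceQuery-style output on stdin reports a pass.
device_query_passed() {
    grep -q 'Result = PASS'
}

SAMPLE_DIR=/usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
if [ -d "$SAMPLE_DIR" ]; then
    # Build and run in a subshell so the caller's working directory is unchanged.
    if (cd "$SAMPLE_DIR" && make >/dev/null && ./deviceQuery) | device_query_passed; then
        echo "CUDA runtime can see the GPU"
    else
        echo "deviceQuery did not report PASS" >&2
    fi
fi
```

It only checks the final verdict line, so the full output is still worth reading when something fails.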


It is best to remove the graphics driver PPA so that a future apt upgrade won't replace the driver with a newer, incompatible version:

sudo add-apt-repository --remove ppa:graphics-drivers/ppa

This should be done both on the host and in the container.
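To confirm that the PPA is really gone in both places, something like the following could be run on the host and in the container. It is a sketch that assumes apt keeps PPA entries as .list files under /etc/apt/sources.list.d, which is the usual layout on Ubuntu 16.04. The function name ppa_enabled is made up.

```shell
# Succeeds if an active (uncommented) deb line mentions the graphics-drivers PPA.
ppa_enabled() {
    dir=${1:-/etc/apt/sources.list.d}
    # -r: search all files in the directory, -q: no output, -s: ignore missing files
    grep -rqsE '^[[:space:]]*deb.*graphics-drivers' "$dir"
}

if ppa_enabled; then
    echo "graphics-drivers PPA is still enabled"
else
    echo "graphics-drivers PPA is not enabled"
fi
```

Commented-out entries are deliberately ignored, since a disabled entry may be left behind in the file after removal.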
To be added: run more tests using the GPU
BTW...

Here is the page where one can download the CUDA- and OpenMP-enabled versions of eddy from FSL, in case I forget:
https://fsl.fmrib.ox.ac.uk/fsldownloads/patches/eddy-patch-fsl-5.0.9/centos6/
Also, here are some nice instructions on how to install CUDA 7.5 on Ubuntu 16.04:
http://www.xgerrmann.com/uncategorized/building-cuda-7-5-on-ubuntu-16-04/
In the end, eddy_cuda7.5 still doesn't run properly inside the container. I wonder if it's because I installed CUDA 9.0 before installing CUDA 7.5, even though I created an environment module file for CUDA 7.5 and loaded it before running eddy_cuda7.5 (and eddy_cuda7.5 seems to be able to find the correct libraries). I'll need to experiment more with this later.