ingo-m/debian_install_nvidia_docker.md

## debian_install_nvidia_docker.md

      
    Install nvidia docker for GPU-enabled tensorflow on Debian 10

Overview


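1. Install the proprietary nvidia driver
2. Install nvidia-container-toolkit (via apt-get); the CUDA libraries themselves are provided by the container images, so only the driver is needed on the host
3. Install the tensorflow docker container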

(1) Install proprietary nvidia driver

Instructions from https://linuxconfig.org/how-to-install-nvidia-driver-on-debian-10-buster-linux
Preparations:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install nvidia-detect
Check recommended driver:
nvidia-detect
Possible output (depending on your graphics card model):
Detected NVIDIA GPUs:
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] [10de:1c03] (rev a1)

Checking card:  NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
Your card is supported by the default drivers and legacy driver series 390.
It is recommended to install the
    nvidia-driver
package.
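If the recommendation is needed in a script, it can be pulled out of the nvidia-detect output. The following is a minimal sketch that assumes the output format shown above (other versions of nvidia-detect may word it differently); `recommended_package` is a hypothetical helper name.

```shell
# Print the package name that nvidia-detect recommends.
# Reads the nvidia-detect output from stdin and assumes the format shown above:
# the package name is on the line after "It is recommended to install the".
recommended_package() {
  awk '
    /It is recommended to install the/ {
      getline                           # move to the line holding the package name
      gsub(/^[ \t]+|[ \t]+$/, "")       # strip surrounding whitespace
      print
      exit
    }
  '
}

# Usage:
#   nvidia-detect | recommended_package
```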

Download the recommended nvidia driver installer (a .run file) from https://www.nvidia.com/en-us/drivers/unix/
There seem to be two options:

The latest compatible driver, which at time of writing was version 430.50 (filename NVIDIA-Linux-x86_64-430.50.run)
The legacy driver, version 390.129 (filename: NVIDIA-Linux-x86_64-390.129.run)

The legacy driver (version 390.129) is not compatible with CUDA 10, so the newer version (430.50) is needed. Save the file in an accessible location; you will need to navigate to that folder on the command line later.
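The choice between the two drivers comes down to a version comparison. Here is a minimal sketch, assuming a minimum driver version of 418.39 for CUDA 10.1. That number comes from NVIDIA's CUDA compatibility table, not from this guide, so check it against the CUDA release you plan to use. `version_ge` is a hypothetical helper.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Versions are compared field by field as dot-separated numbers.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0     # missing fields count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0                              # all fields equal
  }'
}

# Assumption: minimum driver version for CUDA 10.1 on Linux.
MIN_DRIVER="418.39"

# Usage:
#   version_ge 430.50 "$MIN_DRIVER" && echo "430.50 is new enough"
#   version_ge 390.129 "$MIN_DRIVER" || echo "390.129 is too old"
```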
Install linux-headers (may already be installed anyway):
sudo apt-get install linux-headers-$(uname -r) build-essential
Disable the default nouveau driver:
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nvidia-nouveau.conf
Some more preparations (perhaps unnecessary?):
sudo apt-get install module-assistant
sudo m-a prepare
sudo update-initramfs -u
Source: https://unix.stackexchange.com/a/424603
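The blacklist entry only takes effect once the initramfs has been rebuilt and the machine has been rebooted. After the reboot, you may want to confirm that nouveau is really gone. Below is a minimal sketch; `nouveau_loaded` is a hypothetical helper that reads a `/proc/modules`-style listing, with an optional file argument included only to make it easy to try out.

```shell
# Return 0 if the nouveau kernel module appears in the module listing.
# $1 (optional): path to a /proc/modules-style file, defaults to /proc/modules.
nouveau_loaded() {
  modules_file="${1:-/proc/modules}"
  # Each line of /proc/modules starts with the module name followed by a space.
  grep -q '^nouveau ' "$modules_file"
}

# Usage (after the reboot):
#   if nouveau_loaded; then
#     echo "nouveau is still loaded"
#   else
#     echo "nouveau is not loaded"
#   fi
```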
Reboot to multi-user runlevel. This will disable the GUI after reboot:
sudo systemctl set-default multi-user.target
sudo systemctl reboot
Here is some more background information on the previous step:

For systemd, the concept of runlevels is replaced by the term “targets”.
There is a “mapping” between the init runlevels and systemd targets:

multi-user.target: analogous to runlevel 3, Text mode
graphical.target: analogous to runlevel 5, GUI mode with X server
Source: https://www.systutorials.com/239880/change-systemd-boot-target-linux/
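The mapping above can be written as a small lookup. This sketch covers only the two runlevels mentioned here; `runlevel_to_target` is a hypothetical helper, while `systemctl get-default` is the standard command for printing the current default target.

```shell
# Map an init runlevel to the corresponding systemd target.
# Only the two runlevels discussed above are covered.
runlevel_to_target() {
  case "$1" in
    3) echo "multi-user.target" ;;   # text mode
    5) echo "graphical.target" ;;    # GUI mode with X server
    *) echo "unknown"; return 1 ;;
  esac
}

# Usage:
#   runlevel_to_target 3      # prints multi-user.target
#   systemctl get-default     # prints the target the system currently boots into
```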


Log in as the root user, cd to the directory containing the installation file
(NVIDIA-Linux-x86_64-430.50.run), and install the nvidia driver by running:
bash NVIDIA-Linux-x86_64-430.50.run
During the installation you may be asked the following set of questions:

Register kernel module sources with DKMS?

--> Yes

The CC version check failed:
The kernel was built with gcc version 8.2.0 (Debian 8.2.0-14), but the current compiler version is cc (Debian 8.3.0-2) 8.3.0.
This may lead to subtle problems; if you are not certain whether the mismatched compiler will be compatible with your kernel, you may wish to abort installation, set the CC environment variable to the name of the compiler used to compile your kernel, and restart installation.

--> Ignore CC version check

Install NVIDIA's 32-bit compatibility libraries?

--> Yes

An incomplete installation of libglvnd was found. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries.

--> Install and overwrite existing files

Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X?  Any pre-existing X configuration file will be backed up.

--> Yes
Reboot the system, back into GUI mode:
systemctl set-default graphical.target
systemctl reboot
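Once the system is back up, it is worth checking which driver version is actually in use. Here is a minimal sketch that relies on the CSV query mode of `nvidia-smi`; `parse_driver_version` is a hypothetical helper, and the sample line in the usage comment is illustrative, not captured output.

```shell
# Print the driver version from nvidia-smi CSV output read on stdin.
# Expects lines of the form "<driver_version>, <gpu name>" and uses the first one.
parse_driver_version() {
  head -n 1 | cut -d',' -f1 | tr -d ' '
}

# Usage:
#   nvidia-smi --query-gpu=driver_version,name --format=csv,noheader | parse_driver_version
#
# Illustrative input line: "430.50, GeForce GTX 1060 6GB"
```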
(2) Install nvidia-container-toolkit

Instructions from: https://github.com/NVIDIA/nvidia-docker
Add the package repositories:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
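The `distribution` variable is built by sourcing `/etc/os-release` and concatenating `ID` and `VERSION_ID`, which yields `debian10` on Debian 10. The sketch below does the same thing as a function, so you can inspect the value before it is substituted into the repository URL. `distribution_id` is a hypothetical helper; the optional file argument exists only so the function can be tried against a sample file.

```shell
# Print ID and VERSION_ID from an os-release file, concatenated.
# $1 (optional): path to the os-release file, defaults to /etc/os-release.
distribution_id() {
  os_release="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "$os_release" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Usage:
#   distribution=$(distribution_id)
#   echo "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list"
```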
Install the toolkit:
sudo apt-get update
sudo apt-get install nvidia-container-toolkit nvidia-docker2
sudo systemctl restart docker
Reboot system.
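Before running a GPU container, it can help to confirm that the `nvidia` runtime is registered with docker. This sketch assumes that the nvidia-docker2 package registers the runtime in `/etc/docker/daemon.json` under a `"nvidia"` key; if your setup configures docker differently, the check will not apply. `has_nvidia_runtime` is a hypothetical helper.

```shell
# Return 0 if the docker daemon configuration mentions an "nvidia" runtime.
# $1 (optional): path to daemon.json, defaults to /etc/docker/daemon.json.
has_nvidia_runtime() {
  daemon_json="${1:-/etc/docker/daemon.json}"
  [ -f "$daemon_json" ] && grep -q '"nvidia"[[:space:]]*:' "$daemon_json"
}

# Usage:
#   if has_nvidia_runtime; then
#     echo "nvidia runtime is configured"
#   else
#     echo "nvidia runtime not found in daemon.json"
#   fi
```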
Test the setup by running nvidia-smi inside an nvidia cuda docker container (the image is pulled on first use):
docker run --runtime=nvidia --rm nvidia/cuda:10.1-devel nvidia-smi
(3) Install tensorflow docker container

Instructions from: https://www.tensorflow.org/install/docker
Pull docker image:
docker pull tensorflow/tensorflow:2.3.1-gpu
Test the docker image (pass --gpus all so that the container can access the GPU):
docker run --gpus all -it --rm tensorflow/tensorflow:2.3.1-gpu bash
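To check that tensorflow inside the container actually sees the GPU, you can ask it for the number of visible GPU devices. The following is a minimal sketch, assuming Docker 19.03 or newer (for the `--gpus` flag; with older versions `--runtime=nvidia` would be used instead). `tf_gpu_count` and the `DRY_RUN` switch are hypothetical names introduced here; `tf.config.list_physical_devices` is the tensorflow call that does the work.

```shell
IMAGE="tensorflow/tensorflow:2.3.1-gpu"
# Python one-liner that prints the number of GPUs tensorflow can see.
PY_CHECK="import tensorflow as tf; print(len(tf.config.list_physical_devices('GPU')))"

# Run the check in a throwaway container.
# With DRY_RUN=1 the docker command is printed instead of executed.
tf_gpu_count() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker run --gpus all --rm $IMAGE python -c \"$PY_CHECK\""
  else
    docker run --gpus all --rm "$IMAGE" python -c "$PY_CHECK"
  fi
}

# Usage:
#   DRY_RUN=1 tf_gpu_count   # show the command
#   tf_gpu_count             # run it; a result of 0 means no GPU is visible
```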
Sources:

https://github.com/tensorflow/docs/pull/979/commits/5f768cc3f06d31b59260405d8a7b37b08abb2905
https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements
NVIDIA/nvidia-docker#1056
https://www.tensorflow.org/install/docker