@ingo-m
Last active May 8, 2024 14:21
Install nvidia docker for GPU-enabled tensorflow on Debian 10

Overview

  1. Install proprietary nvidia driver
  2. Install nvidia-container-toolkit via apt-get (it exposes the host driver to containers; the CUDA libraries themselves come from the container image)
  3. Install tensorflow docker container

(1) Install proprietary nvidia driver

Instructions from https://linuxconfig.org/how-to-install-nvidia-driver-on-debian-10-buster-linux

Preparations:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install nvidia-detect

Check recommended driver:

nvidia-detect

Possible output (depending on your graphics card model):

Detected NVIDIA GPUs:
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] [10de:1c03] (rev a1)

Checking card:  NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
Your card is supported by the default drivers and legacy driver series 390.
It is recommended to install the
    nvidia-driver
package.
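
If the recommendation is needed in a script, the package name can be extracted from the output. This is only a minimal sketch, assuming the output layout shown above; `recommended_driver_package` is a hypothetical helper name, not part of nvidia-detect:

```shell
# Hypothetical helper: print the package name that nvidia-detect recommends.
# Reads nvidia-detect output on stdin and assumes the package name is on the
# line directly after "It is recommended to install the".
recommended_driver_package() {
    awk '/It is recommended to install the/ {
        getline                       # move to the line holding the package name
        gsub(/^[ \t]+|[ \t]+$/, "")   # trim surrounding whitespace
        print
        exit
    }'
}

# Usage:
#   nvidia-detect | recommended_driver_package
```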

Download the recommended nvidia driver installer (a .run file) from https://www.nvidia.com/en-us/drivers/unix/

There seem to be two options:

  • The latest compatible driver, which at time of writing was version 430.50 (filename NVIDIA-Linux-x86_64-430.50.run)
  • The legacy driver, version 390.129 (filename: NVIDIA-Linux-x86_64-390.129.run)

The legacy driver (i.e. version 390.129) is not compatible with CUDA 10, so the newer version (i.e. version 430.50) is needed. Save the file somewhere easy to reach; you will need to navigate to that folder from the command line later.
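
Whether a given driver version is new enough can be checked with a version-aware sort. The sketch below assumes GNU `sort -V` (available on Debian); `driver_at_least` is a hypothetical helper name. The minimum of 418.39 in the usage lines is what NVIDIA's release notes appear to list for CUDA 10.1 on Linux; treat it as an assumption and verify it for the CUDA version you need:

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on GNU sort -V to compare dotted version numbers.
driver_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (418.39 is an assumed minimum, see the note above):
#   driver_at_least 430.50 418.39 && echo "430.50 is new enough"
#   driver_at_least 390.129 418.39 || echo "390.129 is too old"
```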

Install linux-headers (may already be installed anyway):

sudo apt-get install linux-headers-$(uname -r) build-essential

Disable the default nouveau driver:

echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nvidia-nouveau.conf
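
After the reboot further below, it can be useful to confirm that nouveau is no longer loaded. A minimal sketch, assuming the usual `lsmod` output format (module name in the first column); `nouveau_loaded` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the module list on stdin (lsmod format)
# contains a module named exactly "nouveau".
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded"
#   else
#       echo "nouveau is not loaded"
#   fi
```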

Some more preparations (perhaps unnecessary?):

sudo apt-get install module-assistant
sudo m-a prepare
sudo update-initramfs -u

Source: https://unix.stackexchange.com/a/424603

Reboot to multi-user runlevel. This will disable the GUI after reboot:

sudo systemctl set-default multi-user.target
sudo systemctl reboot

Here is some more background information on the previous step:

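For systemd, the concept of runlevels is replaced by the term “targets”. There is a “mapping” between the init runlevels and systemd targets:

  • runlevel 0: poweroff.target
  • runlevel 1: rescue.target
  • runlevels 2, 3, 4: multi-user.target
  • runlevel 5: graphical.target
  • runlevel 6: reboot.target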

Log in as the root user. cd to the directory containing the installation file (NVIDIA-Linux-x86_64-430.50.run) and install the nvidia driver by running:

bash NVIDIA-Linux-x86_64-430.50.run

During the installation you may be asked the following set of questions:

Register kernel module sources with DKMS?

--> Yes

The CC version check failed: The kernel was built with gcc version 8.2.0 (Debian 8.2.0-14), but the current compiler version is cc (Debian 8.3.0-2) 8.3.0. This may lead to subtle problems; if you are not certain whether the mismatched compiler will be compatible with your kernel, you may wish to abort installation, set the CC environment variable to the name of the compiler used to compile your kernel, and restart installation.

--> Ignore CC version check

Install NVIDIA's 32-bit compatibility libraries?

--> Yes

An incomplete installation of libglvnd was found. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries.

--> Install and overwrite existing files

Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.

--> Yes

Reboot the system, back into GUI mode:

systemctl set-default graphical.target
systemctl reboot
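
Once the system is back up, the driver installation can be checked with `nvidia-smi`, which is installed together with the driver. A minimal sketch; `installed_driver_version` is a hypothetical helper name:

```shell
# Hypothetical helper: print the installed nvidia driver version,
# or fail with a message if nvidia-smi is not on the PATH.
installed_driver_version() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One line per GPU; the driver version is the same for all of them.
        nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
    else
        echo "nvidia-smi not found; the driver may not be installed" >&2
        return 1
    fi
}

# Usage:
#   installed_driver_version
```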

(2) Install nvidia-container-toolkit

Instructions from: https://github.com/NVIDIA/nvidia-docker

Add the package repositories:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
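
The `distribution` variable is built by sourcing /etc/os-release and joining `ID` and `VERSION_ID`, which should give `debian10` on Debian 10. To see which list URL the commands above will request before running them, something like the following sketch can help; `nvidia_docker_list_url` is a hypothetical helper name:

```shell
# Hypothetical helper: print the repository list URL for the distribution
# described by an os-release file.
# $1 = path to the os-release file (defaults to /etc/os-release)
nvidia_docker_list_url() {
    (
        # Source in a subshell so ID and VERSION_ID do not leak into the caller.
        . "${1:-/etc/os-release}"
        echo "https://nvidia.github.io/nvidia-docker/${ID}${VERSION_ID}/nvidia-docker.list"
    )
}

# Usage:
#   nvidia_docker_list_url
```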

Install the toolkit:

sudo apt-get update
sudo apt-get install nvidia-container-toolkit nvidia-docker2
sudo systemctl restart docker
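
After the restart, docker should list an `nvidia` runtime next to the default `runc`. A minimal sketch for checking this, assuming `docker info` prints a `Runtimes:` line as it does in the plain-text output; `has_nvidia_runtime` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the "docker info" output on stdin
# lists a runtime named "nvidia" on its Runtimes line.
has_nvidia_runtime() {
    grep -Eq 'Runtimes:.*[[:space:]]nvidia([[:space:]]|$)'
}

# Usage:
#   docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"
```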

Reboot system.

Test the setup by running nvidia-smi inside the nvidia CUDA docker container (the image is pulled on first use):

docker run --runtime=nvidia --rm nvidia/cuda:10.1-devel nvidia-smi

(3) Install tensorflow docker container

Instructions from: https://www.tensorflow.org/install/docker

Pull docker image:

docker pull tensorflow/tensorflow:2.3.1-gpu

Test the docker image (the --runtime=nvidia flag gives the container access to the GPU):

docker run --runtime=nvidia -it --rm tensorflow/tensorflow:2.3.1-gpu bash
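
To check non-interactively that tensorflow can see the GPU, the container can run a one-line Python check instead of bash. This is a sketch under a few assumptions: the nvidia runtime from step (2) is installed, `tf_gpu_count` is a hypothetical helper name, and the `DOCKER` variable is a hypothetical override that only exists to make dry runs possible:

```shell
# Hypothetical helper: print the number of GPUs tensorflow sees in the container.
# $1 = image to test (defaults to the image pulled above)
# DOCKER = command to use instead of "docker" (hypothetical, e.g. "echo" for a dry run)
tf_gpu_count() {
    "${DOCKER:-docker}" run --runtime=nvidia --rm "${1:-tensorflow/tensorflow:2.3.1-gpu}" \
        python -c "import tensorflow as tf; print(len(tf.config.list_physical_devices('GPU')))"
}

# Usage:
#   tf_gpu_count               # a result of 1 or more means the GPU is visible
#   DOCKER=echo tf_gpu_count   # dry run: print the docker arguments only
```

A result of 0 usually means the container was started without GPU access or the driver is not working on the host.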
