@kuang-da
Last active February 15, 2024 21:04
Install nvidia-docker2 in Pop!_OS

Introduction

This gist is a note on installing nvidia-docker on Pop!_OS 20.10. nvidia-docker lets Docker containers run computations on the GPU.

The basic installation is covered in NVIDIA's official documentation, but a few tweaks are needed to make it work on Pop!_OS 20.10.

Setting up Docker

No surprises here. Following the official documentation should work.

Setting up NVIDIA Container Toolkit

Adding NVIDIA Source

Pop!_OS is an "Unsupported distribution" in NVIDIA's package source, and Ubuntu 20.10 is not supported there yet either. So we need to set the distribution to ubuntu20.04 when adding the sources. For instance,

distribution="ubuntu20.04" \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
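When the `distribution` value is derived from `/etc/os-release` instead of being hardcoded, Pop!_OS yields a string such as `pop20.10`, which the NVIDIA source does not recognise. Below is a minimal sketch of a fallback. The helper name `nvidia_distribution` is hypothetical, and mapping every Pop!_OS release to ubuntu20.04 is an assumption taken from this note, not an official table.

```shell
# Hypothetical helper: map an os-release ID and VERSION_ID to a distribution
# string for the NVIDIA source. The Pop!_OS / Ubuntu 20.10 fallback is an
# assumption based on this note.
nvidia_distribution() {
    case "$1$2" in
        pop*|ubuntu20.10) echo "ubuntu20.04" ;;  # fall back to the nearest LTS
        *)                echo "$1$2" ;;         # pass other values through
    esac
}

# Usage: read the real values from /etc/os-release when it exists.
if [ -r /etc/os-release ]; then
    distribution="$(. /etc/os-release; nvidia_distribution "$ID" "$VERSION_ID")"
    echo "Using distribution: $distribution"
fi
```

The resulting `$distribution` can then be used in the `curl` commands above in place of the hardcoded value.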

Reference:

Install nvidia-docker2

While installing nvidia-docker2, I got the following error:

(base) ➜  ~ sudo apt-get install -y nvidia-docker2              
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.5.0) but 3.4.0-1pop1~1601325114~20.10~2880fc6 is to be installed
E: Unable to correct problems, you have held broken packages.

This happens because Pop!_OS's own source for NVIDIA packages has a higher priority than NVIDIA's official source, but its nvidia-container-runtime version lags behind the one nvidia-docker2 requires. To fix that, we can give the NVIDIA Docker source a higher priority as follows.

sudo vi /etc/apt/preferences.d/nvidia-docker-pin-1002
with the following content:
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
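
The same pin file can also be written without an editor. This is a sketch only: the helper name `nvidia_pin_content` is hypothetical, and the commented usage lines assume the file path used in this note.

```shell
# Hypothetical helper: print the pin stanza shown above.
nvidia_pin_content() {
    printf '%s\n' \
        'Package: *' \
        'Pin: origin nvidia.github.io' \
        'Pin-Priority: 1002'
}

# Usage (commented out because it needs root and changes apt configuration):
#   nvidia_pin_content | sudo tee /etc/apt/preferences.d/nvidia-docker-pin-1002
#   sudo apt-get update
#   apt-cache policy nvidia-container-runtime   # inspect which candidate apt now selects
```

If the pin took effect, the candidate version reported by `apt-cache policy` should come from nvidia.github.io rather than from the Pop!_OS source.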

Then follow the official documentation by running the following commands. The last one launches a container with GPU access.

sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Reference:

@illtellyoulater

Hey everyone, I have noticed that there are two main methods for installing nvidia-docker2 on Pop!_OS 22.04. One is described in the System76 support article (updated in March 2023), and the other is outlined in this gist. As a non-expert, I was curious about the differences between these two methods, and what the advantages and disadvantages of each might be. So I asked ChatGPT-4.0 to explain the differences and here's the comprehensive response it provided:

The first method, as outlined in the System76 support article, involves using the nvidia-container-toolkit package and executes the following instructions:

sudo apt update
sudo apt full-upgrade
sudo apt install nvidia-container-toolkit docker.io
sudo usermod -aG docker $USER
sudo kernelstub --add-options "systemd.unified_cgroup_hierarchy=0"
[reboot...]
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
[done]

This approach appears straightforward and may be easier for novice users to follow. In order, the commands update the system, install the necessary packages, add the current user to the docker group (allowing Docker commands to be run without sudo), add a kernel boot option that disables systemd's unified cgroup hierarchy, and, after a reboot, configure Docker to use the NVIDIA container runtime and restart Docker. However, it might not support the most recent versions of CUDA if the nvidia-container-toolkit package hasn't been updated recently. Notably, this method disables the unified cgroup hierarchy feature of systemd, a significant system component, which could lead to compatibility issues with future software that expects this feature to be enabled.
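
To see whether the unified cgroup hierarchy is currently active before or after changing the kernel option, one can inspect the filesystem type mounted at `/sys/fs/cgroup`. This is a rough sketch: the helper name `cgroup_mode` is hypothetical, and it assumes a `stat` that supports `-f -c %T` (as in GNU coreutils).

```shell
# Hypothetical helper: translate the filesystem type of /sys/fs/cgroup
# into a cgroup mode label.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "unified (cgroup v2)" ;;
        tmpfs)     echo "legacy or hybrid (cgroup v1)" ;;
        *)         echo "unknown" ;;
    esac
}

# Usage on a Linux host: print the mode of the running system.
if [ -d /sys/fs/cgroup ]; then
    cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
fi
```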

The second method, as described in this gist, involves using the nvidia-docker2 package and executes the following instructions:

sudo apt install nvidia-docker2
# set no-cgroups = true in /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker
vi /etc/apt/preferences.d/nvidia-docker-pin-1002
  Package: *nvidia*
  Pin: origin nvidia.github.io
  Pin-Priority: 1002

This method seems more complex and may require a deeper understanding of Docker and Linux systems. However, it can be considered less invasive as it avoids modifying system components, and more flexible as it might support more recent versions of CUDA if the nvidia-docker2 package has been recently updated.
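
The `no-cgroups` step can be scripted as a text filter. This is a sketch under assumptions: the helper name `enable_no_cgroups` is hypothetical, the file is taken to be `/etc/nvidia-container-runtime/config.toml`, and it is assumed to already contain a `no-cgroups` line (possibly commented out). If no such line exists, the filter changes nothing.

```shell
# Hypothetical helper: read a config from stdin and rewrite any existing
# (possibly commented-out) no-cgroups line to "no-cgroups = true".
enable_no_cgroups() {
    sed 's/^[[:space:]]*#*[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/'
}

# Usage (commented out because it needs root and edits system configuration):
#   cfg=/etc/nvidia-container-runtime/config.toml
#   enable_no_cgroups < "$cfg" | sudo tee "$cfg.new" >/dev/null
#   sudo mv "$cfg.new" "$cfg"
#   sudo systemctl restart docker
```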

When considering the benefits and drawbacks of the two methods, the first option may be simpler to execute and more reliable due to its direct support from System76, the developers of Pop!_OS. However, it may not be compatible with the most current versions of CUDA. In contrast, the second option may support newer versions of CUDA, but it may be more challenging to implement and less stable since it lacks direct support from System76.

In order to determine the most suitable method for your needs, it is necessary to consider various factors such as your technical proficiency, the version of CUDA you plan to use, and your specific requirements. Which of the two methods you choose will ultimately depend on these considerations.

@kuang-da
Author

kuang-da commented Jun 7, 2023

Hi @illtellyoulater,

The primary distinction between my gist and System76's tutorial is that my gist adds NVIDIA's own package source. This ensures that the nvidia-container-toolkit is the most up-to-date version. System76's tutorial, on the other hand, installs nvidia-container-toolkit from their own channel. Both approaches are likely to be effective.

PS: Prior to creating this gist, I encountered issues with System76's tutorial, although I can't recall the exact reasons now. I'm pleasantly surprised that this gist continues to be helpful to others even after two years. 😆
