Setup podman and nvidia-container-runtime

How to set up nvidia-container-runtime and podman/runc

Podman is such a cool project! However, there is no easy way to set up nvidia-container-runtime with podman so that we can run unprivileged containers on a GPU host. This is especially useful in environments where multiple people access the same host (strong isolation between containers!).

Steps to set up the whole system

  1. Install podman and friends (buildah and skopeo)

    Ubuntu: add-apt-repository -y ppa:projectatomic/ppa && apt install podman buildah skopeo

    Fedora: yum install podman buildah skopeo

  2. Install the nvidia-container-runtime. Instructions at https://nvidia.github.io/libnvidia-container/

  3. Install the OCI hook configuration (oci-nvidia-hook.json)

cat <<EOF > /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
{
  "hook": "/usr/bin/nvidia-container-runtime-hook",
  "arguments": ["prestart"],
  "annotations": ["sandbox"],
  "stage": [ "prestart" ]
}
EOF
  4. Configure the nvidia-container-runtime. The key setting for rootless containers is no-cgroups = true, since an unprivileged user cannot manage cgroups (a sanity-check sketch follows these steps).

cat <<EOF > /etc/nvidia-container-runtime/config.toml
disable-require = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-runtime-hook.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"
EOF
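
Before running a real workload, it can help to sanity-check the files created above. A minimal sketch, assuming python3 is available on the host (any JSON validator works):

# Validate the hook JSON; a parse error here means the hook file needs fixing
python3 -m json.tool /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json

# Confirm the hook binary referenced in the JSON exists and is executable
test -x /usr/bin/nvidia-container-runtime-hook && echo "hook binary OK"

# Confirm the host driver itself is loaded; the hook injects the driver, it does not install it
nvidia-smi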

How to use the system?

Just use podman as before; the nvidia-container-runtime-hook will inject the NVIDIA driver into the container at startup. It Just Works™

podman run -it --rm nvidia/cuda nvidia-smi
Tue May  7 14:09:49 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce MX150       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   33C    P8    N/A /  N/A |      0MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
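
Because the whole point of this setup is unprivileged use, the same command should also work as a regular (non-root) user. Two hedged variations that may be useful (both rely on standard podman/NVIDIA mechanisms, not anything specific to this gist): podman's global --hooks-dir option points at the hook directory explicitly if a rootless session does not pick it up, and the NVIDIA_VISIBLE_DEVICES environment variable, which the hook reads, restricts a container to specific GPUs on a multi-GPU host:

# Point podman at the hook directory explicitly (path from the steps above)
podman --hooks-dir /usr/share/containers/oci/hooks.d run -it --rm nvidia/cuda nvidia-smi

# Expose only GPU 0 to the container (device index depends on your host)
podman run -it --rm -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda nvidia-smi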

mrivard commented Aug 21, 2021

The URL of the nvidia-container-runtime GitHub project is incorrect. The correct URL is https://github.com/NVIDIA/nvidia-container-runtime/ .


gmat commented Oct 27, 2021

An article on the same subject, tested on CentOS 8: https://www.redhat.com/en/blog/how-use-gpus-containers-bare-metal-rhel-8


jwmelto commented May 8, 2023

This doesn't appear to work in the context of buildah run?

$ podman run --rm nvidia/cuda:12.1.0-devel-ubi9 bash -c 'ls -lL /usr/lib64/libnvidia*'

==========
== CUDA ==
==========

CUDA Version 12.1.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

-rwxr-xr-x. 1 nobody nobody   160488 Feb 22 03:43 /usr/lib64/libnvidia-allocator.so.1
-rwxr-xr-x. 1 nobody nobody   160488 Feb 22 03:43 /usr/lib64/libnvidia-allocator.so.530.30.02
-rwxr-xr-x. 1 nobody nobody   262616 Feb 22 03:42 /usr/lib64/libnvidia-cfg.so.1
-rwxr-xr-x. 1 nobody nobody   262616 Feb 22 03:42 /usr/lib64/libnvidia-cfg.so.530.30.02
-rwxr-xr-x. 1 nobody nobody 56305912 Feb 22 04:18 /usr/lib64/libnvidia-compiler.so.530.30.02
-rwxr-xr-x. 1 nobody nobody  1806968 Feb 22 03:44 /usr/lib64/libnvidia-ml.so.1
-rwxr-xr-x. 1 nobody nobody  1806968 Feb 22 03:44 /usr/lib64/libnvidia-ml.so.530.30.02
-rwxr-xr-x. 1 nobody nobody 85979712 Feb 22 04:45 /usr/lib64/libnvidia-nvvm.so.4
-rwxr-xr-x. 1 nobody nobody 85979712 Feb 22 04:45 /usr/lib64/libnvidia-nvvm.so.530.30.02
-rwxr-xr-x. 1 nobody nobody 24199760 Feb 22 04:09 /usr/lib64/libnvidia-opencl.so.1
-rwxr-xr-x. 1 nobody nobody 24199760 Feb 22 04:09 /usr/lib64/libnvidia-opencl.so.530.30.02
-rwxr-xr-x. 1 nobody nobody 21784224 Feb 22 04:02 /usr/lib64/libnvidia-ptxjitcompiler.so.1
-rwxr-xr-x. 1 nobody nobody 21784224 Feb 22 04:02 /usr/lib64/libnvidia-ptxjitcompiler.so.530.30.02
$ nv=$(buildah from nvidia/cuda:12.1.0-devel-ubi9)
$ buildah run -- $nv -- bash -c 'ls -lL /usr/lib64/libnv*'
ls: cannot access '/usr/lib64/libnv*': No such file or directory
error while running runtime: exit status 2
