
@jabbany
Last active June 28, 2024 11:54
Setting up podman + nvidia on F37 - F40 (Self notes)

Follow these instructions:

  1. Install the C compiler and build tools: sudo dnf group install "C Development Tools and Libraries"
  2. Install the kernel headers: sudo dnf install kernel-devel
  3. Download the latest driver (replace the URL with the latest one from the NVIDIA website)
    wget https://us.download.nvidia.com/XFree86/Linux-x86_64/550.54.14/NVIDIA-Linux-x86_64-550.54.14.run
    chmod a+x NVIDIA-Linux-x86_64-550.54.14.run
    sudo ./NVIDIA-Linux-x86_64-550.54.14.run
    
  4. The first run will not complete while nouveau is still loaded, but the installer writes a config that disables nouveau.
  5. Run sudo dracut --force to rebuild the initramfs so it no longer includes nouveau
  6. Run the installer again; this time it should complete: sudo ./NVIDIA-Linux-x86_64-550.54.14.run
  7. Confirm that you can run nvidia-smi now
  8. Add the container toolkit repos
    sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
    sudo dnf clean expire-cache
    sudo dnf install -y nvidia-container-toolkit-base
    
    Alternative:
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
    sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    sudo dnf install -y nvidia-container-toolkit
    
  9. Make sure the toolkit is installed: nvidia-ctk --version
  10. Generate the CDI spec: sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
  11. Make sure the CDI spec is world readable: sudo chmod a+r /etc/cdi/nvidia.yaml (if not done, you later get Error: setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=all)
  12. Make sure the devices were discovered: grep " name:" /etc/cdi/nvidia.yaml. The output should list the GPU indices plus name: all
  13. Install podman: sudo dnf install podman
  14. Fix the SELinux context so containers can access the devices: sudo chcon -t container_file_t /dev/nvidia* (if not done, you later get Failed to initialize NVML: Insufficient Permissions) NOTE: This needs to be rerun after every reboot; see the systemd sketch after this list for one way to automate it.
  15. Run a test container to check: podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L. You should get: GPU 0: NVIDIA (...) (UUID: ...). A combined check script also follows below.
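The checks from steps 7, 9, 12 and 15 can be strung together so the setup can be re-verified quickly (e.g. after a driver or kernel update). This is only a convenience sketch; the script name and the echo banners are my own additions, but every command is taken verbatim from the steps above.

#!/usr/bin/env bash
# gpu-check.sh (hypothetical name): rerun the verification commands from the steps above.
# set -e makes the script stop at the first failing check.
set -euo pipefail

echo "== driver (step 7) =="
nvidia-smi -L

echo "== container toolkit (step 9) =="
nvidia-ctk --version

echo "== CDI device discovery (step 12) =="
grep " name:" /etc/cdi/nvidia.yaml

echo "== GPU visible inside a container (step 15) =="
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L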
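Since the chcon in step 14 does not survive a reboot, one way to automate it is a small systemd oneshot unit that reruns the same command at boot. This is only a sketch under a couple of assumptions: the unit name nvidia-dev-chcon.service is made up, and the After=multi-user.target ordering assumes the /dev/nvidia* nodes already exist by then (adjust it if they are created later on your system).

# Sketch: oneshot unit that relabels the NVIDIA device nodes at boot.
# The unit name is arbitrary; the chcon command is the one from step 14.
sudo tee /etc/systemd/system/nvidia-dev-chcon.service <<'EOF'
[Unit]
Description=Set SELinux context on /dev/nvidia* for containers
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'chcon -t container_file_t /dev/nvidia*'

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now nvidia-dev-chcon.service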
@gaquinozh

That was so helpful! Thank you a thousand times.

@alecco

alecco commented Oct 5, 2023

Nice one.

The next step is to look for images in cuda/doc/supported-tags.md

podman pull docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04    # download devel 12.2 image
podman run --rm --device=nvidia.com/gpu=all -it docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04 nvidia-smi -L  # check container sees GPU, output should be "GPU 0: NVIDIA ..."
podman run --rm --device=nvidia.com/gpu=all -it docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04 bash
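
If you want to build your own image on top of one of those CUDA bases, the same --device flag works for the result. A minimal sketch (the cuda-test tag and the nvcc smoke test are my own choices, not from this gist):

# Sketch: build a small image FROM the CUDA devel base and run it with the GPU attached.
cat > Containerfile <<'EOF'
FROM docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04
# the -devel images ship the CUDA toolchain, so nvcc serves as a smoke test
CMD ["nvcc", "--version"]
EOF
podman build -t cuda-test .
podman run --rm --device=nvidia.com/gpu=all cuda-test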

@alecco

alecco commented Oct 7, 2023

FWIW if after an upgrade the container stops seeing the driver:

#  ls -la /dev/nvidia*
ls: cannot access '/dev/nvidia-modeset': Permission denied
ls: cannot access '/dev/nvidia-uvm': Permission denied
ls: cannot access '/dev/nvidia-uvm-tools': Permission denied
ls: cannot access '/dev/nvidia0': Permission denied
ls: cannot access '/dev/nvidiactl': Permission denied

Add --security-opt label=disable to podman run or podman create. But mind the security hole if you are running 3rd party stuff.
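
For example, reusing the image from the earlier comment (just an illustration of where the flag goes):

podman run --rm --device=nvidia.com/gpu=all --security-opt label=disable \
  docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04 nvidia-smi -L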

@otavio-silva

It's working nicely, thank you!

@ThomasHalwax

Thanks, got it up and running!
