
@jabbany
Last active June 28, 2024 11:54
Setting up podman + nvidia on F37 - F40 (Self notes)

Follow these instructions:

  1. Install the C compiler and build tools: sudo dnf group install "C Development Tools and Libraries"
  2. Install the kernel headers: sudo dnf install kernel-devel
  3. Download the latest driver (replace the URL with the latest one from the NVIDIA website)
    wget https://us.download.nvidia.com/XFree86/Linux-x86_64/550.54.14/NVIDIA-Linux-x86_64-550.54.14.run
    chmod a+x NVIDIA-Linux-x86_64-550.54.14.run
    sudo ./NVIDIA-Linux-x86_64-550.54.14.run
    
  4. The first run will not complete while nouveau is still loaded, but the installer writes a config that disables nouveau.
  5. Run sudo dracut --force to rebuild the initramfs so it no longer includes nouveau
  6. Run the installer again; this time it should complete: sudo ./NVIDIA-Linux-x86_64-550.54.14.run
  7. Confirm that you can run nvidia-smi now
  8. Add the container toolkit repos
    sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
    sudo dnf clean expire-cache
    sudo dnf install -y nvidia-container-toolkit-base
    
    Alternative:
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
    sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    sudo dnf install -y nvidia-container-toolkit
    
  9. Make sure the toolkit is installed: nvidia-ctk --version
  10. Generate the CDI spec: sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
  11. Make sure the CDI spec is world readable: sudo chmod a+r /etc/cdi/nvidia.yaml (if not done, you later get Error: setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=all)
  12. Make sure the devices were discovered: grep " name:" /etc/cdi/nvidia.yaml. The output should list the GPU indices plus name: all
  13. Install podman: sudo dnf install podman
  14. Fix the SELinux context so containers can access the devices: sudo chcon -t container_file_t /dev/nvidia* (if not done, you later get Failed to initialize NVML: Insufficient Permissions) NOTE: This needs to be rerun after every reboot; see the systemd sketch after this list for one way to automate it.
  15. Run a test container to check: podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L. You should get: GPU 0: NVIDIA (...) (UUID: ...). A combined check script also follows below.
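The checks from steps 7, 9, 12 and 15 can be strung together so the setup can be re-verified quickly (e.g. after a driver or kernel update). This is only a convenience sketch; the script name and the echo banners are my own additions, but every command is taken verbatim from the steps above.

#!/usr/bin/env bash
# gpu-check.sh (hypothetical name): rerun the verification commands from the steps above.
# set -e makes the script stop at the first failing check.
set -euo pipefail

echo "== driver (step 7) =="
nvidia-smi -L

echo "== container toolkit (step 9) =="
nvidia-ctk --version

echo "== CDI device discovery (step 12) =="
grep " name:" /etc/cdi/nvidia.yaml

echo "== GPU visible inside a container (step 15) =="
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L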
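Since the chcon in step 14 does not survive a reboot, one way to automate it is a small systemd oneshot unit that reruns the same command at boot. This is only a sketch under a couple of assumptions: the unit name nvidia-dev-chcon.service is made up, and the After=multi-user.target ordering assumes the /dev/nvidia* nodes already exist by then (adjust it if they are created later on your system).

# Sketch: oneshot unit that relabels the NVIDIA device nodes at boot.
# The unit name is arbitrary; the chcon command is the one from step 14.
sudo tee /etc/systemd/system/nvidia-dev-chcon.service <<'EOF'
[Unit]
Description=Set SELinux context on /dev/nvidia* for containers
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'chcon -t container_file_t /dev/nvidia*'

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now nvidia-dev-chcon.service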
@gaquinozh

That was so helpful! Thank you a thousand times.

@alecco

alecco commented Oct 5, 2023

Nice one.

The next step is to look for images in cuda/doc/supported-tags.md

podman pull docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04    # download devel 12.2 image
podman run --rm --device=nvidia.com/gpu=all -it docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04 nvidia-smi -L  # check container sees GPU, output should be "GPU 0: NVIDIA ..."
podman run --rm --device=nvidia.com/gpu=all -it docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04 bash
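
If you want to build your own image on top of one of those CUDA bases, the same --device flag works for the result. A minimal sketch (the cuda-test tag and the nvcc smoke test are my own choices, not from this gist):

# Sketch: build a small image FROM the CUDA devel base and run it with the GPU attached.
cat > Containerfile <<'EOF'
FROM docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04
# the -devel images ship the CUDA toolchain, so nvcc serves as a smoke test
CMD ["nvcc", "--version"]
EOF
podman build -t cuda-test .
podman run --rm --device=nvidia.com/gpu=all cuda-test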

@alecco

alecco commented Oct 7, 2023

FWIW if after an upgrade the container stops seeing the driver:

#  ls -la /dev/nvidia*
ls: cannot access '/dev/nvidia-modeset': Permission denied
ls: cannot access '/dev/nvidia-uvm': Permission denied
ls: cannot access '/dev/nvidia-uvm-tools': Permission denied
ls: cannot access '/dev/nvidia0': Permission denied
ls: cannot access '/dev/nvidiactl': Permission denied

Add --security-opt label=disable to podman run or podman create. But mind the security hole if you are running 3rd party stuff.
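
For example, reusing the image from the earlier comment (just an illustration of where the flag goes):

podman run --rm --device=nvidia.com/gpu=all --security-opt label=disable \
  docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04 nvidia-smi -L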

@otavio-silva

It's working nicely, thank you!

@ThomasHalwax

Thanks, got it up and running!
