
Nvidia Quadro P400 Passthrough on Proxmox - Reddit Post

Source: https://www.reddit.com/r/jellyfin/comments/cig9kh/nvidia_quadro_p400_passthrough_on_proxmox/

Nvidia Quadro P400 Passthrough on Proxmox

OK, I finally got it working. Here is a write-up of my steps, but note that they are for Proxmox with an Ubuntu 18.10 container only. If you want to use a different setup, you will have to adapt it yourself.

On the host (Proxmox):

  • Install the Proxmox Linux Headers. The version should match your kernel
    apt install pve-headers-$(uname -r)
  • Download the Nvidia driver, make it executable and run it:
    wget http://us.download.nvidia.com/XFree86/Linux-x86_64/430.34/NVIDIA-Linux-x86_64-430.34.run && chmod +x NVIDIA-Linux-x86_64-430.34.run && ./NVIDIA-Linux-x86_64-430.34.run
    When the installer asks, register the kernel module with DKMS, but do not let it update the X configuration file.
  • Load the Nvidia kernel modules at boot time. To do this, edit the file /etc/modules-load.d/modules.conf and add the lines nvidia and nvidia-uvm. The file should then look something like this:

    # /etc/modules: kernel modules to load at boot time.
    #
    # This file contains the names of kernel modules that should be loaded
    # at boot time, one per line. Lines beginning with "#" are ignored.
    nvidia
    nvidia-uvm
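  • Optionally, load the modules by hand before rebooting to confirm that the driver built and loads cleanly:
    modprobe nvidia
    modprobe nvidia-uvm
    lsmod | grep nvidia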
  • Create a script (I used vim /root/nvidia-dev-node-setup) and fill it with the following bash code:

#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
    # Count the number of NVIDIA controllers found.
    NVDEVS=`lspci | grep -i NVIDIA`
    N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
    NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i
    done
    mknod -m 666 /dev/nvidiactl c 195 255
else
    exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
     # Find out the major device number used by the nvidia-uvm driver
     D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
     mknod -m 666 /dev/nvidia-uvm c $D 0
else
    exit 1
fi

# Make sure any remaining NVIDIA device files exist (nvidia-modprobe creates them as root)
/usr/bin/nvidia-modprobe -u -c 0

# Enable persistence mode so the driver stays initialized while no clients are connected
/usr/bin/nvidia-persistenced --persistence-mode
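  • Make the script executable; the cron job in the next step runs it by path, so it needs the execute bit. Running it once by hand is also a quick way to check that the device nodes get created:
    chmod +x /root/nvidia-dev-node-setup
    /root/nvidia-dev-node-setup
    ls -lah /dev/nvidia*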
  • Edit your crontab with crontab -e and add the following line at the end: @reboot /root/nvidia-dev-node-setup
  • Reboot your Proxmox host. After the reboot, the command ls -lah /dev/nvidia* should show these devices:

root@pve:~# ls -lah /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Jul 28 14:34 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jul 28 14:34 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Jul 28 14:34 /dev/nvidia-modeset
crw-rw-rw- 1 root root 237,   0 Jul 28 14:34 /dev/nvidia-uvm
crw-rw-rw- 1 root root 237,   1 Jul 28 14:34 /dev/nvidia-uvm-tools
  • When executing ls -lah /dev/dri/* you should see something like this:

root@pve:~# ls -lah /dev/dri/*
crw-rw---- 1 root video 226,   0 Jul 28 14:34 /dev/dri/card0
crw-rw---- 1 root video 226,   1 Jul 28 14:34 /dev/dri/card1
crw-rw---- 1 root video 226, 128 Jul 28 14:34 /dev/dri/renderD128
  • Note the major device numbers in the fifth column of both listings (e.g. 195, 237 and 226); you will need them for the container configuration later.
  • Last but not least, the Quadro should be recognized by nvidia-smi:

root@pve:~# nvidia-smi
Sun Jul 28 14:51:13 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P400         On   | 00000000:07:00.0 Off |                  N/A |
| 34%   35C    P8    N/A /  N/A |      1MiB /  2000MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
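Before moving on to the container: the same major numbers can also be read straight from /proc/devices (this is where the script above looks up the nvidia-uvm number), which is a quick way to double-check them:

# drm (226) covers /dev/dri/*; the nvidia entries cover the /dev/nvidia* nodes
grep -E 'nvidia|drm' /proc/devices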

In the container

  • Set up a privileged Ubuntu 18.10 LXC container on Proxmox (I don't know whether an unprivileged container works too)
  • Set up a sudo user: https://linuxize.com/post/how-to-create-a-sudo-user-on-ubuntu/
  • Log in as the new user and install Jellyfin (taken from the official documentation):
    sudo apt install -y apt-transport-https software-properties-common && sudo add-apt-repository universe && wget -O - https://repo.jellyfin.org/ubuntu/jellyfin_team.gpg.key | sudo apt-key add - && echo "deb [arch=$( dpkg --print-architecture )] https://repo.jellyfin.org/ubuntu $( lsb_release -c -s ) main" | sudo tee /etc/apt/sources.list.d/jellyfin.list && sudo apt update && sudo apt install -y jellyfin && sudo systemctl enable jellyfin && sudo reboot
  • After rebooting, download and install the same Nvidia driver as on the host, but without the kernel modules (the container shares the host's kernel, which already has them):
    wget http://us.download.nvidia.com/XFree86/Linux-x86_64/430.34/NVIDIA-Linux-x86_64-430.34.run && chmod +x NVIDIA-Linux-x86_64-430.34.run && sudo ./NVIDIA-Linux-x86_64-430.34.run --no-kernel-module

Again on the host

  • When this is done, shut down the container and edit its conf file. For me it was the container with the id 115:
    vim /etc/pve/nodes/pve/lxc/115.conf
  • Add the following lines to the conf file, replacing the numbers in the first three lines with the major numbers you noted in the ls output above. Use one lxc.cgroup.devices.allow line per major number and adjust the number of lines accordingly:

lxc.cgroup.devices.allow: c 226:* rwm
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 237:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/card1 dev/dri/card1 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.mount.entry: /dev/fb0 dev/fb0 none bind,optional,create=file
  • Reboot the container

Back in the container

  • Run nvidia-smi. The graphics card should be recognized as shown above.
  • Run ls -lah /dev/nvidia* and ls -lah /dev/dri/*. You should see the same device nodes as on the host.
  • Test ffmpeg with the transcoding command below, which I picked up from the Jellyfin logs (I don't know whether it is versatile enough to work with any test file on any machine). You should not see any error messages. A simpler smoke test that needs no input file is sketched after this list.
    /usr/lib/jellyfin-ffmpeg/ffmpeg -c:v h264_cuvid -resize 426x238 -i file:"/path/to/input.mkv" -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_nvenc -force_key_frames "expr:gte(t,n_forced*5)" -copyts -avoid_negative_ts disabled -start_at_zero -pix_fmt yuv420p -preset default -b:v 64000 -maxrate 64000 -bufsize 128000 -profile:v high -vsync -1 -map_metadata -1 -map_chapters -1 -threads 0 -codec:a:0 libmp3lame -ac 2 -ab 128000 -af "volume=2" -y /path/to/output.mkv
  • Go to the Jellyfin web interface, Hamburger Menu/Admin/Dashboard/Transcoding. Choose Nvidia NVENC, check all the boxes that appear (the format and hardware acceleration options), and save the settings.
  • Play something in Jellyfin. nvidia-smi should now show an ffmpeg process (on the host and/or in the container).
  • Congratulations, you have passed your graphics card through to Jellyfin in an LXC container on Proxmox.
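If you don't have a suitable test file at hand, a minimal NVENC smoke test can be run instead. This is just a sketch, assuming jellyfin-ffmpeg includes the lavfi test source; it encodes five seconds of generated video and discards the output:

/usr/lib/jellyfin-ffmpeg/ffmpeg -f lavfi -i testsrc=duration=5:size=1280x720:rate=30 -c:v h264_nvenc -f null -

Note that this only exercises the NVENC encoder, not the cuvid decoder, so the full transcoding command above is still the better end-to-end test.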

Talking points

  • I really struggled with Reddit's markdown when writing this, so if someone wants to structure/format this guide in a better way: be my guest.
  • I ended up passing through everything graphics/Nvidia related to the container. Most likely not all of it is necessary, but when I did some tests removing device nodes, everything stopped working. So I leave it as is, but everyone is invited to optimize it.
  • The device cgroup numbers might change when the host reboots. If this becomes a problem, a script may be needed to check or update the LXC conf file before starting the container (a rough sketch follows below). Or is there a way to pin these values?
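On that last point, here is a rough, untested sketch of such a check. It assumes the container ID 115 from above and only verifies that the nvidia-uvm major number in the kernel still matches an allow rule in the conf file; adapt it if other numbers drift as well:

#!/bin/bash
# Sketch: warn before starting container 115 if the nvidia-uvm major number
# no longer matches any cgroup allow rule in the LXC config.
CONF=/etc/pve/nodes/pve/lxc/115.conf
UVM_MAJOR=$(grep nvidia-uvm /proc/devices | awk '{print $1}' | head -n1)

if ! grep -q "c ${UVM_MAJOR}:\* rwm" "$CONF"; then
    echo "nvidia-uvm now has major ${UVM_MAJOR}; update the allow lines in ${CONF}" >&2
    exit 1
fi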

[Initial post]

Hi guys,

I bought a Quadro P400 for my home server to do some transcoding of 4K videos. I spent the last weekend figuring out how to get transcoding working in my Jellyfin LXC container on Proxmox. I ended up passing through /dev/dri/* (which only seems to be for VAAPI) as well as /dev/nvidia0, /dev/nvidiactl and /dev/nvidia-uvm. Transcoding still didn't work, though.

Does someone know how to setup a Quadro/Nvidia passthrough on Proxmox with Jellyfin?

Thanks!

Posted by FriedrichNietzsche84 (↑ 16 / ↓ 0)


vadimstasiev commented Feb 7, 2023

source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

Docker setup with the Nvidia container toolkit:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt update -y

apt install docker-ce nvidia-docker2

systemctl restart docker

Test that it works:

docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi


Error:

nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown

Fix

source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#step-3-rootless-containers-setup

Allow the Nvidia container runtime to run in a rootless container:
sudo sed -i 's/^#no-cgroups = false/no-cgroups = true/;' /etc/nvidia-container-runtime/config.toml
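After editing config.toml, restart Docker so the change takes effect, then re-run the test container from above; the mount error should be gone:

systemctl restart docker
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi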


vadimstasiev commented Feb 9, 2023

Final working container config; note that I am passing through an Intel iGPU along with the Quadro P400:

arch: amd64
cores: 3
cpulimit: 1
features: nesting=1
hostname: docker-tdar
memory: 8000
net0: name=eth0,bridge=vmbr0,gw=10.10.10.1,hwaddr=56:33:0A:E0:71:16,ip=10.10.10.86/24,type=veth
onboot: 0
ostype: debian
rootfs: GREEN250G:2303/vm-2303-disk-0.raw,size=60G
swap: 0
unprivileged: 1
lxc.cgroup.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file,mode=0666
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.hook.pre-start: sh -c "chown 1000:1000 /dev/dri/renderD128"
lxc.hook.pre-start: sh -c "chown 1000:1000 /dev/dri/card0"
lxc.idmap: u 0 100000 44
lxc.idmap: g 0 100000 44
lxc.idmap: u 44 44 1
lxc.idmap: g 44 44 1
lxc.idmap: u 45 100045 60
lxc.idmap: g 45 100045 60
lxc.idmap: u 105 103 1
lxc.idmap: g 105 103 1
lxc.idmap: u 106 100106 894
lxc.idmap: g 106 100106 894
lxc.idmap: u 1000 1000 1
lxc.idmap: g 1000 1000 1
lxc.idmap: u 1001 101001 64535
lxc.idmap: g 1001 101001 64535
lxc.cgroup.devices.allow: c 226:* rwm
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 237:* rwm
lxc.hook.pre-start: sh -c "chown 1000:1000 /dev/nvidia*"
lxc.hook.pre-start: sh -c "chown 1000:1000 /dev/dri/card1"
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/card1 dev/dri/card1 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.mount.entry: /dev/fb0 dev/fb0 none bind,optional,create=file
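A quick way to check the result from the Proxmox host, assuming the container ID is 2303 (going by the rootfs line above):

pct stop 2303 && pct start 2303   # restart the container so the config is applied
pct exec 2303 -- nvidia-smi       # the P400 should show up inside the container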
