Enabling & Using vGPU Passthrough

This gist is based almost entirely on Derek Seaman's awesome blog:

Proxmox VE 8: Windows 11 vGPU (VT-d) Passthrough with Intel Alder Lake

As such, please refer to that for pictures; here I will capture the command lines I used, as I sequence the commands a little differently so they make more sense to me.

This gist assumes you are not running ZFS and are not passing through any other PCIe devices (as both of these can require additional steps - see Derek's blog for more info).

This gist assumes you are not running Proxmox with UEFI Secure Boot - if you are, please refer entirely to Derek's blog.

ALSO, please refer to the comments section, as folks have found workarounds and probably corrections (if the mistakes remain in my write-up it is because I haven't yet tested the corrections).

Note: I made no changes to the BIOS defaults on the Intel NUC 13th Gen. This just worked as-is.

This gist is part of this series.

Preparation

Install Build Requirements

apt update && apt install pve-headers-$(uname -r)
apt install git sysfsutils dkms build-* unzip -y
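
A quick sanity check before moving on (my own habit, not strictly required): the header version shown by the second command should match the running kernel printed by the first.

uname -r
ls /usr/src | grep -i pve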

Install Other Drivers / Tools

This allows you to run vainfo and intel_gpu_top for testing, and installs the non-free version of the encoding driver - without this you will not, AFAIK, be able to encode with this GPU. This was missed in every guide I saw for this vGPU, and I had terrible issues until I did this.

edit the sources list with nano /etc/apt/sources.list

add the following lines:

#non-free firmwares
deb http://deb.debian.org/debian bookworm non-free-firmware

#non-free drivers and components
deb http://deb.debian.org/debian bookworm non-free

and save the file

apt update && apt install intel-media-va-driver-non-free intel-gpu-tools vainfo

This next step fetches a firmware file missing on Proxmox installs and will remove the -2 error for this file in dmesg.

wget -r -nd -e robots=no -A '*.bin' --accept-regex '/plain/' https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915/adlp_dmc.bin

cp adlp_dmc.bin /lib/firmware/i915/
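
You can verify the firmware file landed where the i915 driver expects it; after the next reboot, dmesg should show the DMC firmware loading instead of the -2 error:

ls -l /lib/firmware/i915/adlp_dmc.bin
dmesg | grep -i dmc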

Compile and Install the new driver

Clone github project

cd ~
git clone https://github.com/strongtz/i915-sriov-dkms.git

modify dkms.conf

cd i915-sriov-dkms
nano dkms.conf

change these two lines as follows:

PACKAGE_NAME="i915-sriov-dkms"
PACKAGE_VERSION="6.5"

save the file

Compile and Install the Driver

cd ~
mv i915-sriov-dkms/ /usr/src/i915-sriov-dkms-6.5
dkms install --force -m i915-sriov-dkms -v 6.5

and use dkms status to verify the module is now installed
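
The output should contain a line similar to this (the exact kernel version will differ on your system):

root@pve2:~# dkms status
i915-sriov-dkms/6.5, 6.5.11-8-pve, x86_64: installed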

Modify grub

edit the grub file with nano /etc/default/grub

change this line in the file

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7"

note: if you have already made modifications to this line in your grub file for other purposes, you should keep those items as well

finally run

update-grub
update-initramfs -u

Find PCIe Bus and update sysfs.conf

use lspci | grep VGA to find the bus number

you should see something like this:

root@pve2:~# lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)

take the number on the far left and add it to /etc/sysfs.conf as follows - note all the leading zeros in the bus path are needed

echo "devices/pci0000:00/0000:00:02.0/sriov_numvfs = 7" > /etc/sysfs.conf

REBOOT
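
After the reboot, a quick sanity check that the kernel picked up the new parameters and created the VFs: the first command should show i915.enable_guc=3 i915.max_vfs=7, and the second should print 7.

cat /proc/cmdline
cat /sys/devices/pci0000:00/0000:00:02.0/sriov_numvfs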

Testing On Host

check devices

check devices with dmesg | grep i915

the last two lines should read as follows:

[    7.591662] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.7 on minor 7
[    7.591818] i915 0000:00:02.0: Enabled 7 VFs

if they don't then check all steps carefully

Validate with VAInfo

Validate with vainfo; you should see no errors (note this needs the drivers and tools I said to install at the top). Then run vainfo --display drm --device /dev/dri/cardN, where N is a number from 0 to 7 - this will show you the acceleration endpoints for each VF.
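
To check every node in one go, a small loop like this works (my shorthand; it assumes the default card0-card7 device names):

for dev in /dev/dri/card*; do
  echo "== $dev =="
  vainfo --display drm --device "$dev" | head -n 20
done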

Check you can monitor the VFs - if not, you have issues.

Monitor any VF renderer in real time with intel_gpu_top -d drm:/dev/dri/renderD128 - there is one render node per VF; to see them all, use ls -l /dev/dri

Configure vGPU Pool in Proxmox

  1. navigate to Datacenter > Resource Mappings
  2. click add in PCI devices
  3. name the pool something like vGPU-Pool
  4. map all 7 VFs for pve1 but NOT the root device, i.e. 0000:00:02.1 through 0000:00:02.7, not 0000:00:02.0
  5. click create
  6. on the created pool, click the plus button next to vGPU-Pool
  7. select mapping on node = pve2, add all devices and click create
  8. repeat for pve3

The pool should now look like this:

[image: the completed vGPU-Pool resource mapping]

Note: machines with PCI passthrough devices cannot be live migrated; they must be shut down, migrated offline to the new node and then started.

EVERY TIME THE KERNEL IS UPDATED IN PROXMOX YOU SHOULD DO THE FOLLOWING:

  1. update the kernel using the Proxmox UI
  2. run dkms install -m i915-sriov-dkms -v 6.5 --force
  3. reboot

How to get working in a privileged container

wow, this one is hard.... you can avoid the ID mapping stuff by using a privileged container...

Assumptions:

  1. you have a Debian 12 container, you added the non-free deb lines and have installed the non-free drivers as per the host instructions
  2. you have run cat /etc/group in the container and noted down the GID for render (let's call that CTRGID) and the GID for video (let's call that CTVGID)
  3. you have run cat /etc/group on the host and noted down the GID for render (let's call that HSTRGID) and the GID for video (let's call that HSTVGID)
  4. you have vainfo fully working

Create Container

  1. create the container (privileged, Debian 12) and start it
  2. apt update, apt upgrade, install the non-free drivers, vainfo and the intel_gpu_top tools
  3. add root to the render and video groups (this will mean when we get to ID mapping you don't need to tart about with user mappings - only group ones):
usermod -a -G render root
usermod -a -G video root
  4. shut down the container
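
Before the shutdown in step 4, you can quickly confirm the group changes from step 3 took effect; id root should now list both render and video:

id root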

Edit container conf file

  1. These are stored in /etc/pve/lxc and have the name VMID.conf
  2. nano /etc/pve/lxc/VMID.conf

Add lxc device mapping

Here you add a line for the card you want and one for the renderer. Note: if you map a VF (card) to a container it is hard mapped; if you have that VF in a pool for VMs, please remove it from the pool (this also means these containers cannot be HA).

In the example below I chose card6, which is renderD134. These are mapped into the container as card0 and renderD128. Change your numbers as per your own VF / card mappings.

lxc.cgroup2.devices.allow: c 226:6 rwm
lxc.mount.entry: /dev/dri/card6 dev/dri/card0 none bind,optional,create=file

lxc.cgroup2.devices.allow: c 226:134 rwm
lxc.mount.entry: /dev/dri/renderD134 dev/dri/renderD128 none bind,optional,create=file
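
If you are unsure which major:minor numbers to put in the cgroup2 allow lines, read them off the host with the command below; the two numbers printed after the group name are the major and minor (226:6 for card6 and 226:134 for renderD134 in my example):

ls -l /dev/dri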

Add ID mapping (only needed in unprivileged)

  1. add the following... and here it gets complex as it will vary based on the numbers you recorded earlier - let me try... the aim is to have a contiguous block of mappings, but the syntax is, um, difficult...
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 CTVGID
lxc.idmap: g CTVGID HSTVGID 1
lxc.idmap: g CTVGID+1 100000+{CTVGID+1} CTRGID-CTVGID-1
lxc.idmap: g CTRGID HSTRGID 1
lxc.idmap: g CTRGID+1 100000+{CTRGID+1} 65536-{CTRGID+1}

so as an example, these are my values:

        host > ct
video:    44 > 44
render:  104 > 106

this is what I added to my VMID.conf file (in my case /etc/pve/lxc/107.conf):

lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 61
lxc.idmap: g 106 104 1
lxc.idmap: g 107 100107 65429
  2. add your two CT values to /etc/subgid with nano /etc/subgid (only needed in unprivileged)

in my case:

root:106:1
root:44:1
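
If you'd rather not do that arithmetic by hand, here is a small bash sketch that generates the six idmap lines from the four GIDs (fill in your own values; the ones below are just my example numbers):

#!/bin/bash
CTVGID=44    # video GID inside the container
CTRGID=106   # render GID inside the container
HSTVGID=44   # video GID on the host
HSTRGID=104  # render GID on the host

echo "lxc.idmap: u 0 100000 65536"
echo "lxc.idmap: g 0 100000 $CTVGID"
echo "lxc.idmap: g $CTVGID $HSTVGID 1"
echo "lxc.idmap: g $((CTVGID+1)) $((100000+CTVGID+1)) $((CTRGID-CTVGID-1))"
echo "lxc.idmap: g $CTRGID $HSTRGID 1"
echo "lxc.idmap: g $((CTRGID+1)) $((100000+CTRGID+1)) $((65536-CTRGID-1))"

With my example values this prints exactly the six lines shown above.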

after this you should be able to start up the container, run vainfo and perform transcoding.

check permissions with ls -la /dev/dri; it should look like this:

root@vGPU-debian-test:~# ls -la /dev/dri
total 0
drwxr-xr-x 2 root   root         80 Oct  7 00:22 .
drwxr-xr-x 7 root   root        500 Oct  7 00:22 ..
crw-rw-rw- 1 nobody video  226,   0 Oct  4 21:42 card0
crw-rw-rw- 1 nobody render 226, 128 Oct  4 21:42 renderD128

if the group names do not say video and render then you did something wrong

**Note: YMMV**

For example, Plex HW transcoded just fine on my system.

Emby, on the other hand, seems to interrogate the kernel driver directly and gets the wrong answers - this is IMHO an issue with their detection logic not supporting this scenario.

Another example is intel_gpu_top, which doesn't seem to work in this mode either - this is because it only works with the PMUs, not the VFs (so someone said).

Or maybe I just have no clue what I am doing, lol.

---work in progress 2023.10.6---

Add vGPU to a Windows 11 or Server 2022 VM

  1. create the VM with CPU set to host - DO NOT CHANGE THIS
  2. boot the VM without the vGPU and with display set to default
  3. install Windows 11
  4. install the VirtIO drivers [as of 4.6.2024 do not install the guest tools - this may cause repair loops]
  5. shut down the VM and change the display to VirtIO-GPU
  6. now add the vGPU pool as a PCI device
  7. when creating a VM, add a PCI device and add the pool as follows:

[image: adding the vGPU pool as a PCI device]

  8. now boot into the VM and install the latest Iris Xe drivers from Intel
  9. you should now have graphics acceleration available to apps whether you connect by web console VNC, SPICE or an RDP client

From @rinze24:

If you follow the guide successfully, in Device Manager you will see:

  • Microsoft Basic Display Adapter - If you use Display in VM Settings
  • Intel iGPU - passthrough

You have 2 options (or more) to use your iGPU, because Windows 11 decides on its own which graphics adapter to use.

  1. Set up a Remote Desktop Connection in Windows 11 and set the display to none in the VM hardware settings.
  • Pro: no configuration per app, responsive connection.
  • Con: no Proxmox console.
  2. Inside Windows, set which graphics preference to use per application in Display Settings -> Graphics Settings.
  • Pro: have the Proxmox console.
  • Con: need to configure per application / program.

If you hit an automatic repair loop at any point, shut down the machine, edit its conf file in /etc/pve/qemu-server and add args: -cpu Cooperlake,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,+vmx

@scyto (Author) commented Apr 4, 2024

I passed one of the 7 VFs to a Win10 VM and it seemed to work as expected; it even installed the Intel driver with no errors. The only thing I noticed is that Sunshine/Moonlight does not work with this setup; I think that's because there is no monitor connected to the HDMI port (that's how I got it to work with the nVidia GPU). I figure that if I add a virtual monitor to the VM from the Proxmox resources it should do the trick!?

I found this recently that might help; I haven't tried it on anything but a real machine though: https://github.com/itsmikethetech/Virtual-Display-Driver

@scyto (Author) commented Apr 5, 2024

@KrzysztofC thanks for the post on the things you did; interesting about the re-install working - I may have to try that. I also have code 43 issues I can no longer resolve (this is all very fragile), and that's even with following David's excellent article. There is definitely something we are all missing on why code 43 appears so much... and seems so random sometimes....

@scyto (Author) commented Apr 5, 2024

This guide didn't fully work for me; I had to get some info from Derek Seaman's blog that was linked.

But more importantly, it seems like Plex HDR tone mapping doesn't work in an LXC with a vGPU right now. Transcoding works fine, but in order to get tone mapping working I had to pass card0 through instead of card1-7.

Yeah, how both Plex and Emby do their enumeration of the GPU is problematic. Also yes, IIRC some features of the GPU are only available on 0; I never found a full list though.

@scyto (Author) commented Apr 7, 2024

Ok, I seem to be getting somewhere.... I changed my instructions; a brief set of things I did and what I think was relevant:

  1. converted to Secure Boot (this didn't seem to change anything)
  2. used an alternate fork of the i915 DKMS drivers (this didn't seem to change anything)
  3. wondered why my vCPU reported as Xeon despite having host set...... found a bunch of bloody args in the VM I had forgotten about - lesson: don't be like scyto and re-install in a VM, make a new VM...
  4. made a new VM, was sure to set host before installing AND did NOT install the guest tools - for me this was causing repair loops. I used the Win11 23H2 March ISO - I have yet to try doing Windows Update....
  5. if you get to this stage, wait until ARC is launchable and works, then you can click X on the installer
  6. I will report back on whether it is the guest tools or the VirtIO driver package install that causes the boot loops
    tl;dr don't set any args, it will break the VM and the vGPU


@scyto (Author) commented Apr 7, 2024

ok, it seems that 'after some time' Windows 11 on a reboot will get stuck in a repair loop - this isn't Windows updates, this isn't installing VirtIO drivers or guest tools - it seems time related, based on one user's finding on the forum. Obviously setting another CPU type fixes this, but at this time, on my machine, that breaks vGPU. This means I can only get what you see above working once, and then it will break.

I note that when I use the repair console, for some reason the virtio-scsi drivers are not loaded; loading them gives access to the disk.

Having read a lot of things on QEMU and Windows, this seems to have been a common issue with Win11 for years - have any of you managed to keep your Win11 vGPU working over many reboots / days?

@scyto (Author) commented Apr 7, 2024

I fixed the infinite automatic repairing when using host by using args: -cpu host,hv_passthrough,level=30,-waitpkg in the 102.conf file.
Unfortunately this seems to break the vGPU virtualization.

I am stumped as to why it works and then after some time doesn't work....

@scyto (Author) commented Apr 8, 2024

ok, to boil this down about Windows / i915 / 13th gen CPUs, it seems one can run as follows:

  1. a Win11 VM that has only been set up with a local account (no AAD or Microsoft account on it) will run perfectly forever; updates seem safe; it must be installed and run with CPU = host, and WSL/Windows Hello must never be enabled
  2. a Win11 VM with WSL, Hyper-V, Windows Hello (AAD join / Microsoft account) will run, but vGPU will stop working after a couple of reboots. CPU needs to be host and may require CPU args to run: args: -cpu host,+svm,-hypervisor,-waitpkg,hv_passthrough,level=30 - this will not get vGPU working, but will get WSL working and allow Windows Hello

If anyone can show me another way, I am all ears; I want to be proven wrong on this....

my next step is to try the new i915 backports drivers when I have time; that might not be for a few weeks

@rjblake commented Apr 10, 2024

(quoting @scyto's "boil this down" summary above)

Thankfully I am on a 12th gen CPU. I have tried both a local account and WSL. In fact, the one I use the most is a VM created using WSL. I have never had any issue with the vGPU stopping working after a period of time or multiple reboots. This is running on an HP Elite Mini 800 G9 (i7-12700). In addition, I have both RDP and a display configured using VirtIO-GPU. Configuration as below:

[images: VM hardware configuration screenshots]

@jeanpaulrh

This is running on an HP Elite Mini 800 G9 (i7-12700). In addition, I have both RDP and a Display configured using VirtIO-GPU

Hi, I have the same generation of HP Elite Mini, only with an i5-12500. I tried some of the tutorials found online and I got a Windows VM to use the GPU, but I wasn't able to get Plex to use the vGPU for transcoding. I set up an Ubuntu VM which loads the drivers; vainfo shows the card but Plex simply uses the CPU. Have you tried to use HW transcoding on a Linux VM with Plex?
Thanks

@rjblake commented Apr 10, 2024

Have you tried to use HW transcoding on a linux VM with Plex? Thanks
No - I run Plex in an unprivileged container (Ubuntu 22.04.4 LTS) with the iGPU passed through. Runs solid and uses the GPU for HW transcoding. Dumb question, but I assume you set the option for HW transcoding in Plex itself?

@scyto (Author) commented Apr 10, 2024

I have never had any issue with the vGPU stopping working after a period of time or multiple reboots.

I am starting to think this is a hardware-specific QEMU/KVM bug - you will find years of folks complaining about Windows 10 and Windows 11 doing this on Proxmox and native QEMU/KVM and not understanding why (irrespective of vGPU). Thanks for sharing the config. It is the same issue that causes the automatic repair for some people when they enable WSL2, too.

@pcmike commented Apr 10, 2024

For anyone running into "install error on pve 8.1 kernel version: 6.5.13-5", go here for the fix: strongtz/i915-sriov-dkms#151

@scyto (Author) commented Apr 11, 2024

@pcmike I deleted your mega post; next time, look at the open issues on the GitHub repo as your first port of call.

@jeanpaulrh

No - I run Plex in an Unprivileged Container (Ubuntu 22.04.4 LTS) with iGPU passed through. Runs solid and uses GPU for HW transcoding. Dumb question, but assume you set the option for HW transcoding in Plex itself?

Yes I did. The last try I got it working for a while, but it was unstable... after a couple of minutes the host deactivated the 7 vGPUs with an error in the log. A reboot was the only solution. I tried to google the error but with no luck :(

@jaxjexjox

Hello,

I currently have Plex hardware transcoding successfully in Docker, in an LXC on Proxmox, with mapped SMB drives.
This is on Intel 7xxx series, old processors, and it was hard to get working, but it's working great.

I've upgraded to 12xxx series processors and again would like to use an LXC, run Docker inside it with Plex, and continue to hardware transcode.

Are these instructions only applicable to VMs or will it work for an LXC as well?
Thanks for the hard work.

@rjblake commented Apr 17, 2024

(quoting @jaxjexjox's question above)

Works in LXC - it is covered in the text here, specifically for a privileged container. I have it running in a number of unprivileged containers.

@rinze24 commented Apr 18, 2024

In the unprivileged id mapping the 5th line lxc.idmap: g CTRGID HSTVGID 1 should be lxc.idmap: g CTRGID HSTRGID 1

@rjblake commented Apr 18, 2024

lxc.idmap: g CTRGID+1 100{CTRGID+1} 65536-{CTRGID-1}

Also, line 6 lxc.idmap: g CTRGID+1 100{CTRGID+1} 65536-{CTRGID+1} should be lxc.idmap: g CTRGID+1 100{CTRGID+1} 65536-{CTRGID-1}

@rjblake commented Apr 24, 2024

Proxmox have just released v8.2 using kernel 6.8. As I understand it, the strongtz DKMS module will NOT work on kernel 6.8, and it seems he has stopped any major updates as he is no longer using SR-IOV. As such, it seems there is quite a lot of work to do to update the code (beyond my ability), and my suggestion would be to pin kernel 6.5 for now. It seems Intel will not have a driver in mainline until at least kernel 6.9 (if even then), so I'd also suggest ensuring you have an archived ISO installer of PVE v8.1.

@blebo commented Apr 25, 2024

The Known Issues section regarding DKMS in the PVE v8.2 release notes also suggests pinning a kernel package: https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_8.2

@TBDuval commented Apr 27, 2024

Can someone help? I literally installed this from this guide a week ago and now it won't work. Running:
Kernel Version: Linux 6.5.11-8-pve (2024-01-30T12:27Z)
Boot Mode: EFI
Manager Version: pve-manager/8.1.4/ec5affc9e41f1d79

I went to build another one today and I am getting this error from dkms: bad exit status: 2

Here is the make.log where the error occurs.
/var/lib/dkms/i915-sriov-dkms/6.5.11-8/build/drivers/gpu/drm/i915/display/intel_dp_mst.c: In function ‘intel_dp_mst_find_vcpi_slots_for_bpp’:
/var/lib/dkms/i915-sriov-dkms/6.5.11-8/build/drivers/gpu/drm/i915/display/intel_dp_mst.c:85:31: error: too few arguments to function ‘drm_dp_calc_pbn_mode’
85 | crtc_state->pbn = drm_dp_calc_pbn_mode(adjusted_mode->crtc_clock,
| ^~~~~~~~~~~~~~~~~~~~
In file included from /var/lib/dkms/i915-sriov-dkms/6.5.11-8/build/drivers/gpu/drm/i915/display/intel_display_types.h:36,
from /var/lib/dkms/i915-sriov-dkms/6.5.11-8/build/drivers/gpu/drm/i915/display/intel_dp_mst.c:40:
./include/drm/display/drm_dp_mst_helper.h:835:5: note: declared here
835 | int drm_dp_calc_pbn_mode(int clock, int bpp, bool dsc);
| ^~~~~~~~~~~~~~~~~~~~
/var/lib/dkms/i915-sriov-dkms/6.5.11-8/build/drivers/gpu/drm/i915/display/intel_dp_mst.c: In function ‘intel_dp_mst_mode_valid_ctx’:
/var/lib/dkms/i915-sriov-dkms/6.5.11-8/build/drivers/gpu/drm/i915/display/intel_dp_mst.c:898:13: error: too few arguments to function ‘drm_dp_calc_pbn_mode’
898 | drm_dp_calc_pbn_mode(mode->clock, min_bpp) > port->full_pbn) {
| ^~~~~~~~~~~~~~~~~~~~
./include/drm/display/drm_dp_mst_helper.h:835:5: note: declared here
835 | int drm_dp_calc_pbn_mode(int clock, int bpp, bool dsc);
| ^~~~~~~~~~~~~~~~~~~~
make[2]: *** [scripts/Makefile.build:251: /var/lib/dkms/i915-sriov-dkms/6.5.11-8/build/drivers/gpu/drm/i915/display/intel_dp_mst.o] Error 1
make[1]: *** [/usr/src/linux-headers-6.5.11-8-pve/Makefile:2039: /var/lib/dkms/i915-sriov-dkms/6.5.11-8/build] Error 2
make: *** [Makefile:234: __sub-make] Error 2

@Nanorulez123 commented May 5, 2024

Anyone know how to fix [ 6.860814] i915 0000:00:02.0: driver does not support SR-IOV configuration via sysfs after updating to 8.2.2?

DKMS is working and hardware transcoding is still working, just with one HW transcoder instead of 7.

00:02.0 VGA compatible controller: Intel Corporation Alder Lake-S GT1 [UHD Graphics 730] (rev 0c)
root@...:~# ls -l /dev/dri
total 0
drwxr-xr-x 2 root root 80 May 5 12:28 by-path
crw-rw---- 1 root video 226, 0 May 5 12:28 card0
crw-rw---- 1 root render 226, 128 May 5 12:28 renderD128

@mm2293 commented May 8, 2024

(quoting @Nanorulez123's SR-IOV sysfs error above)

It's ok, it should still work. I had the same issue.

@mm2293 commented May 8, 2024

Can anybody explain why it's exactly 7 VFs?

And what's the difference between mapping the VFs to a GPU pool instead of adding one directly as a PCI device?

@pcmike commented May 8, 2024 via email

@mm2293 commented May 8, 2024

I don’t know why it’s 7, but mapping all 7 to a pool and referencing the pool means you can get 1 of 7 dynamically allocated to your load.


Ok, so using the GPU pool on different VMs means the VFs get dynamically mapped to the VMs - e.g. VM1 gets VF1, VM2 gets VF2 and so on? Can one single VF use all the GPU resources?

Thanks!

@minez0 commented May 16, 2024

Hey guys - I've worked through the guide on my MS-01 and have been able to activate SR-IOV for the iGPU in Proxmox and can see 7 devices. However, I'm struggling to enable the driver on an Ubuntu guest VM. I can see the passed-through iGPU on the guest:

lspci -nnv | grep VGA
produces
00:10.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-P [Iris Xe Graphics] [8086:a7a0] (rev 04) (prog-if 00 [VGA controller])

but I’m unsure how to install the driver and enable it - the guide on the strongtz/i915-sriov-dkms github page is very limited, and I think I’m missing some key details. Many of the guides online describe using a Windows guest, so I’m a bit stuck.

  • Does anyone have a guide to install/enable the strongtz/i915-sriov-dkms driver in Ubuntu?
  • Does the Kernel version in the guest need to match the proxmox host?
  • Does the Ubuntu guest kernel need to be downgraded from 6.8 (if so, how?), or are the issues with 6.8 just limited to Proxmox?

Thanks all!

@rebebop commented May 16, 2024

@minez0 I don't think strongtz/i915-sriov-dkms works with the latest Proxmox update (8.2), mostly because you need to have the same kernel version on both the host and the guest, and 6.8 is currently not supported. I think they also mention it in the README: https://github.com/strongtz/i915-sriov-dkms?tab=readme-ov-file#pve-82-and-kernel-68

You don't need to downgrade the kernel; if you updated from 6.5 it's probably already installed, otherwise you can install it. Then pin it using proxmox-boot-tool kernel pin on the host, and for the guest VM also make sure that you install a distro with a 6.5 kernel.
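
For reference, pinning looks like this on the host (substitute whichever 6.5 kernel you actually have installed; the version below is only an example):

proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.5.13-3-pve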

@minez0 commented May 16, 2024

Hey @rebebop - thanks for the reply. Maybe my initial post wasn't that clear. I have SR-IOV fully working in Proxmox 8.2 (using a pinned 6.5.13-3 Kernel). In Proxmox, I can see all 7 iGPU devices so it all seems to be working as intended

However, I'm not sure how to configure a guest VM to use one of the iGPU devices. I created an Ubuntu Server VM, but this comes with Kernel 6.8. I'm struggling to downgrade the VM kernel to 6.5, which I think might be causing the strongtz/i915-sriov-dkms driver install to fail.

I have passed through one of the iGPU devices in Proxmox and I can see it within the guest, but without being able to install the drivers, I can't get it to work.

Basically I'm looking for assistance on how to properly set up an Ubuntu Server guest VM to get the drivers installed correctly. Or, if it is as simple as downgrading the Ubuntu Server kernel, how is that done? I think I've tried every online guide today without success!

@mm2293 commented May 19, 2024

Anybody know if Meteor Lake (Intel Ultra 7 155) is supported or will be supported in the future?
Thanks!
