Some random notes on trying (and failing) to get PCI passthrough of the Ryzen 7 5700G APU's integrated GPU working, with Proxmox as the host and an Ubuntu guest VM:
References:
- This one claims to have it working (I have not tested the method):
- Others trying to get this working:
- https://forum.proxmox.com/threads/amd-ryzen-7-renoir-4750g-apu-and-igpu-pass-thru-to-windows-10-guest.84849/
- https://forums.unraid.net/topic/112649-amd-apu-ryzen-5700g-igpu-passthrough-on-692/
- https://forums.unraid.net/topic/100729-help-requested-for-amd-ryzen-5-pro-4650g-passthrough/
- https://forum.level1techs.com/t/got-my-ryzen-4750g-apu-igpu-to-pass-through-to-qemu-kvm-vm-but-display-output-is-pixelated-garbage-after-amd-apu-radeon-driver-install-from-amd-or-windows-update/169903
- https://forum.level1techs.com/t/amd-apu-passthrough-is-it-even-possible/163000
- https://forum.qubes-os.org/t/amd-igpu-passthrough-attempt/6766
- Seems like the AMD / kernel devs are still struggling to make this work too: https://lore.kernel.org/all/CADnq5_PpnoGCxSO95+mEkcXuR7umWU-hTtUQh2G8q5xPNzPzrg@mail.gmail.com/T/
- General passthrough info/guides
- https://pve.proxmox.com/wiki/Pci_passthrough
- https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/
- Passing a GPU to a Proxmox VM with PCI Passthrough - https://i12bretro.github.io/tutorials/0650.html
- Single GPU Passthrough guides
- This approach may have useful info on disabling the GPU on the host, but we most likely want to disable it at boot rather than share/switch the GPU at runtime.
- https://gitlab.com/risingprismtv/single-gpu-passthrough/-/wikis/home
- https://github.com/wabulu/Single-GPU-passthrough-amd-nvidia
Made some progress, but I'm starting to think this may not be possible (at least for now, without more support from AMD?) because the GPU shares memory with the system. Various AMD features (like PSP and TMZ) exist for security, e.g. to ensure programs/the CPU can't read GPU memory. I'm totally unknowledgeable here, so I'm speculating based on various threads. I suspect the stability issues several people mentioned may be due to problems reserving memory for the GPU. Not sure, but the memory sharing seems to happen inside the VM at the driver level (rather than on the host, closer to the hardware): when I give the VM 10GB, I can see the amdgpu driver grab 2GB, leaving only 8GB to the guest OS. So maybe this is all cleanly self-contained in the VM and not really an issue. Not sure.
Anyway, I'm trying Proxmox, and I was able to get an external display working (only on first boot??) from an Ubuntu guest VM. Various 3D apps (vkcube, glxgears) seem to run at high FPS, although I haven't installed any games to test yet, and I'm not sure whether this is really running on the GPU or the CPU, since the graphics driver shows up as llvmpipe (Mesa's software renderer, which runs on the CPU). Additionally, the amdgpu driver still isn't loading and logs some errors, even though the monitor and those 3D apps work. Also, no audio yet either :(.
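A quick way to check which renderer is actually in use is glxinfo -B (from Ubuntu's mesa-utils package), whose "OpenGL renderer string" line names it. The helper below (classify_renderer, a hypothetical name) is a minimal sketch that reads that output on stdin and treats Mesa's software rasterizers llvmpipe and softpipe as CPU rendering; that list is an assumption covering the usual cases, not an exhaustive check.

```shell
# classify_renderer (hypothetical helper): read `glxinfo -B` output on stdin
# and report whether the OpenGL renderer looks like CPU (software) rendering.
classify_renderer() {
  # glxinfo -B prints a line such as:
  #   OpenGL renderer string: llvmpipe (LLVM 12.0.1, 256 bits)
  line=$(grep -m1 'OpenGL renderer string:')
  if [ -z "$line" ]; then
    echo "unknown"            # no renderer line found in the input
    return 1
  fi
  case $line in
    *llvmpipe*|*softpipe*) echo "software" ;;  # assumed list of Mesa CPU rasterizers
    *) echo "hardware" ;;                     # anything else, e.g. a Radeon device name
  esac
}
```

Usage: glxinfo -B | classify_renderer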
In case it helps others solve this, here are some things I've learned:
- On the host, modify /etc/default/grub and then run update-grub:
GRUB_DEFAULT=0
GRUB_TIMEOUT=0
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt video=efifb:off pcie_acs_override=downstream,multifunction amdgpu.exp_hw_support=1 modprobe.blacklist=amdgpu,snd_hda_intel,ccp textonly loglevel=0 silent text nomodeset"
GRUB_TERMINAL=serial
#GRUB_TERMINAL_OUTPUT -- comment this out along with GRUB_TERMINAL_INPUT if included, since GRUB_TERMINAL overrides
GRUB_FORCE_HIDDEN_MENU="true"
- Not sure all of these kernel params are needed, and some may be excessive; most of them are guesses aimed at keeping the host from touching any display during boot. The pcie_acs_override is there to split the PCI IOMMU groups, since by default the GPU's group included USB controllers that I did not want to pass through. I'm not sure whether this is a problem, or whether the full original group (including USB) needs to be passed through.
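After update-grub and a reboot, it's worth confirming the params actually reached the kernel, since /proc/cmdline shows what was really booted (some Proxmox installs boot via systemd-boot rather than GRUB, in which case /etc/default/grub is not used). A minimal sketch, assuming a hypothetical helper named missing_kernel_params that matches whole space-separated words:

```shell
# missing_kernel_params CMDLINE PARAM... (hypothetical helper): print each PARAM
# that is not present as a whole word in CMDLINE; return 1 if any are missing.
missing_kernel_params() {
  cmdline=" $1 "   # pad with spaces so whole-word matching also works at both ends
  shift
  rc=0
  for p in "$@"; do
    case $cmdline in
      *" $p "*) ;;            # present exactly as written
      *) echo "$p"; rc=1 ;;   # missing
    esac
  done
  return $rc
}
```

Usage: missing_kernel_params "$(cat /proc/cmdline)" amd_iommu=on iommu=pt video=efifb:off pcie_acs_override=downstream,multifunction prints nothing (and returns 0) if all of them were applied.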
- On the host, blacklist these modules via a file in /etc/modprobe.d/ (the commands below use pve-blacklist.conf) so the host doesn't initialize the AMD GPU and related devices (snd_hda_intel for audio and, I think, ccp for the PSP security device):
echo "blacklist amdgpu" >> /etc/modprobe.d/pve-blacklist.conf
echo "blacklist snd_hda_intel" >> /etc/modprobe.d/pve-blacklist.conf
echo "blacklist ccp" >> /etc/modprobe.d/pve-blacklist.conf
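Running those echo lines twice adds duplicate entries. A small idempotent variant (blacklist_module is a hypothetical helper, not a Proxmox tool) only appends a line if it isn't already there:

```shell
# blacklist_module FILE MODULE (hypothetical helper): append "blacklist MODULE"
# to FILE unless an identical line already exists, so the step can be re-run.
blacklist_module() {
  file=$1 mod=$2
  if [ -f "$file" ] && grep -qx "blacklist $mod" "$file"; then
    return 0                      # already blacklisted, nothing to do
  fi
  echo "blacklist $mod" >> "$file"
}
```

Usage (as root): for m in amdgpu snd_hda_intel ccp; do blacklist_module /etc/modprobe.d/pve-blacklist.conf "$m"; done. The Proxmox PCI passthrough wiki also recommends refreshing the initramfs with update-initramfs -u -k all after changing anything module related, so the blacklist applies early in boot too.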
- Reboot the host.
- Make sure these modules weren't loaded by checking the boot logs, e.g.:
sudo dmesg | grep amdgpu
- Note: ccp still gets loaded by kvm_amd no matter what I tried (likely because kvm_amd depends on ccp for SEV support). Not sure if this is a blocker.
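Besides dmesg, /proc/modules (what lsmod reads) shows which modules are loaded right now, and lspci -nnk -s 05:00.0 prints a "Kernel driver in use" line for the GPU if anything has claimed it. A sketch of the /proc/modules check, using a hypothetical loaded_modules helper; given the ccp note, I'd expect ccp to still show up:

```shell
# loaded_modules MODULES_FILE MODULE... (hypothetical helper): print which of
# the given modules are loaded, reading /proc/modules-format text, where the
# first column is the module name.
loaded_modules() {
  src=$1; shift
  for m in "$@"; do
    # awk exits 0 only if some line's first field equals the module name
    if awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$src"; then
      echo "$m"
    fi
  done
}
```

Usage: loaded_modules /proc/modules amdgpu snd_hda_intel ccp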
- Confirm your IOMMU groups are split if needed, so you can pass through the GPU+audio on their own, by checking the groups with this script: https://gist.github.com/flungo/428c374c040de1d0a30fd4a593d39040
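The linked gist walks /sys/kernel/iommu_groups and runs lspci -nns on each device. A stripped-down sketch of the same idea (list_iommu_groups is a hypothetical name) that only prints group numbers and PCI addresses, sorted by group, and takes the sysfs root as an optional argument:

```shell
# list_iommu_groups [ROOT] (hypothetical helper): print "IOMMU group N: ADDR"
# for every PCI function under ROOT (default /sys/kernel/iommu_groups).
list_iommu_groups() {
  root=${1:-/sys/kernel/iommu_groups}
  for d in "$root"/*/devices/*; do
    [ -e "$d" ] || continue          # glob matched nothing: no IOMMU groups
    dev=${d##*/}                     # e.g. 0000:05:00.0
    grp=${d%/devices/*}              # e.g. /sys/kernel/iommu_groups/17
    grp=${grp##*/}                   # e.g. 17
    echo "$grp $dev"
  done | sort -k1,1n -k2,2 | while read -r grp dev; do
    echo "IOMMU group $grp: $dev"
  done
}
```

Usage: list_iommu_groups | grep 05:00 shows which groups the APU's functions landed in; for device names, run lspci -nns on each address as the gist does.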
- Install/set up your VM using q35 and OVMF (UEFI) without setting up PCI passthrough yet. For me in Proxmox, this is just for the initial setup in the noVNC console, since the hardware display may not work yet. I had to switch to SeaBIOS after setup was complete to get the physical display output working. I'd hope UEFI can be made to work too, but it didn't seem to work for me at all with this VBIOS.
- Once you have a working VM, install a backup way to get in, such as a regular VNC server or openssh-server. Then shut it down and switch the VM to SeaBIOS. I left the UEFI disk in place so I could switch back and forth as needed.
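In Proxmox the firmware choice is stored as a bios: line in the VM's config (bios: ovmf for UEFI; SeaBIOS is the default when the line is absent), and the usual CLI way to switch is qm set 100 --bios seabios (or --bios ovmf). The sketch below, a hypothetical set_vm_bios helper, makes the same edit directly on a config file mainly to show what changes; it assumes lines from the first [section] onward belong to snapshots and leaves them alone, and it doesn't touch the efidisk0 line, so switching back keeps working. Shut the VM down first.

```shell
# set_vm_bios CONF FW (hypothetical helper): set the "bios:" line of a Proxmox
# VM config to seabios or ovmf, editing only the main (non-snapshot) section.
set_vm_bios() {
  conf=$1 fw=$2
  case $fw in
    seabios|ovmf) ;;
    *) echo "firmware must be seabios or ovmf" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  awk -v fw="$fw" '
    /^\[/ { insec = 1 }                    # first [section] ends the main config
    !insec && /^bios:/ { next }            # drop the old main-section bios line
    { print }
    !insec && !done && !/^#/ && NF > 0 {   # add the new line after the first real key
      print "bios: " fw; done = 1
    }
    END { if (!done) print "bios: " fw }   # no keys at all: append at the end
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf"   # rewrite in place, same file
  rc=$?
  rm -f "$tmp"
  return $rc
}
```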
- To add the PCI passthrough, you'll need to get a VBIOS ROM:
- I found a copy of the 5700G VBIOS from a similar machine here: https://rog.asus.com/us/desktops/mid-tower/rog-strix-g10dk-series/helpdesk_bios
- Extract it using VBiosFinder: https://github.com/coderobe/VBiosFinder
- Install all the dependencies, especially UEFIExtract (binaries) and rom-parser.
- This will output a few ROMs, and vbios_1002_1638_1.rom seemed to work best for me. (Sorry, I'm not sure about the legality of posting the actual ROM file.)
- Copy this ROM onto your host in the appropriate directory. For me on Proxmox, this was /usr/share/kvm.
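Before pointing the VM at the ROM, a quick sanity check is to confirm that the file starts with a legacy PCI option ROM header and that its PCI Data Structure ("PCIR") carries the GPU's vendor:device ID. Per the PCI firmware spec, the image starts with bytes 0x55 0xAA, offset 0x18 holds a little-endian pointer to the PCIR block, and the vendor and device IDs sit at PCIR+4 and PCIR+6. The sketch below (hexbytes, le16 and rom_pci_ids are hypothetical helpers) only inspects the first image in the file; rom-parser reports similar information in more detail.

```shell
# hexbytes FILE OFFSET COUNT (hypothetical helper): COUNT bytes at OFFSET as hex.
hexbytes() {
  od -An -tx1 -v -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# le16 HEX4 (hypothetical helper): swap a little-endian 16-bit value, "0210" -> "1002".
le16() {
  printf '%s%s' "$(echo "$1" | cut -c3-4)" "$(echo "$1" | cut -c1-2)"
}

# rom_pci_ids FILE (hypothetical helper): validate the first option ROM image
# and print the vendor:device ID from its PCIR structure.
rom_pci_ids() {
  rom=$1
  if [ "$(hexbytes "$rom" 0 2)" != "55aa" ]; then      # offset 0x00: signature
    echo "missing 55AA option ROM signature" >&2
    return 1
  fi
  ptr_hex=$(hexbytes "$rom" 24 2)                      # offset 0x18: PCIR pointer
  [ ${#ptr_hex} -eq 4 ] || { echo "ROM too short" >&2; return 1; }
  ptr=$(printf '%d' "0x$(le16 "$ptr_hex")")
  if [ "$(hexbytes "$rom" "$ptr" 4)" != "50434952" ]; then   # ASCII "PCIR"
    echo "missing PCIR structure" >&2
    return 1
  fi
  vid=$(le16 "$(hexbytes "$rom" $((ptr + 4)) 2)")      # PCIR+4: vendor ID
  did=$(le16 "$(hexbytes "$rom" $((ptr + 6)) 2)")      # PCIR+6: device ID
  echo "$vid:$did"
}
```

Usage: rom_pci_ids /usr/share/kvm/vbios_1002_1638_1.rom should print 1002:1638 if the ROM's header matches the 05:00.0 GPU.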
- Get your PCI IDs via lspci -nn on the host (-nn adds the numeric [vendor:device] IDs shown below). Mine are:
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev c8)
05:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
05:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
05:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
05:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
05:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor [1022:15e2] (rev 01)
05:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
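The [vendor:device] pair at the end of each line is what matters for matching the ROM (vbios_1002_1638_1.rom encodes 1002:1638). A small filter, assuming lspci -nn output on stdin (pci_ids is a hypothetical name), that keeps just the address and the last [xxxx:xxxx] bracket on each line; class codes such as [0300] have no colon, so they are skipped:

```shell
# pci_ids (hypothetical helper): read `lspci -nn` lines on stdin and print
# "ADDRESS VENDOR:DEVICE". The greedy .* makes the capture take the LAST
# [xxxx:xxxx] bracket on the line.
pci_ids() {
  sed -nE 's/^([0-9a-fA-F:.]+) .*\[([0-9a-f]{4}:[0-9a-f]{4})\].*$/\1 \2/p'
}
```

Usage: lspci -nn -s 05:00 | pci_ids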
- I think I only want functions #0, #1 and #2 (05:00.0, .1 and .2) passed through (although again, I wonder if the others are required). I'm unsure about #2, the PSP (especially since I disable PSP later), and about the #5/#6 audio devices, which may duplicate the #1 audio. I did confirm that including the USB controllers crashes the host system, so if this is possible at all, you'd need the pcie_acs_override in GRUB to split those into separate IOMMU groups. In your host's terminal, add your PCI IDs to /etc/pve/qemu-server/100.conf (assuming this is your first VM, #100), since you can't add the romfile setting via the Proxmox web UI:
hostpci0: 0000:05:00.0;0000:05:00.1;0000:05:00.2,pcie=1,x-vga=1,romfile=vbios_1002_1638_1.rom
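If you rebuild this line for different functions, the format is the device addresses joined with ';', followed by the options. The sketch below (hostpci_line is a hypothetical helper) prints such a line with the same options; it assumes PCI domain 0000 for short addresses. Per the Proxmox docs, pcie=1 needs the q35 machine type and romfile is looked up in /usr/share/kvm, which matches the setup above.

```shell
# hostpci_line INDEX ROMFILE ADDR... (hypothetical helper): print a Proxmox
# "hostpciN:" config line passing several PCI functions together.
hostpci_line() {
  idx=$1 rom=$2
  shift 2
  ids=""
  for a in "$@"; do
    case $a in
      ????:??:??.?) ;;              # already includes a domain, e.g. 0000:05:00.0
      *) a="0000:$a" ;;             # assume domain 0000 for short forms like 05:00.0
    esac
    if [ -n "$ids" ]; then ids="$ids;$a"; else ids=$a; fi   # join with ';'
  done
  printf 'hostpci%s: %s,pcie=1,x-vga=1,romfile=%s\n' "$idx" "$ids" "$rom"
}
```

Usage: hostpci_line 0 vbios_1002_1638_1.rom 05:00.0 05:00.1 05:00.2 prints the line above. I'd paste the result into 100.conf by hand rather than append it blindly, so it doesn't end up duplicated or inside a snapshot section.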
- Note that x-vga=1 forces this to be the primary display, and Proxmox's built-in noVNC viewer will stop working.
- Make sure you have another way into your VM before this, such as a passed-through USB keyboard/mouse, openssh-server, regular VNC, etc.
- Start your guest VM. You may get video output on your external display (I'm using the HDMI output), along with various errors in the guest's dmesg if you grep for amdgpu.
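To cut the guest's dmesg down to the failures, and to check whether amdgpu actually bound to the GPU, something like the following works (amdgpu_errors and gpu_driver are hypothetical helpers; in my guest the GPU shows up as 0000:01:00.0, per the log further down). gpu_driver reads the driver symlink under /sys/bus/pci/devices; when GPU init fails, I'd expect it to print none.

```shell
# amdgpu_errors (hypothetical helper): read kernel log text on stdin and keep
# only amdgpu lines that look like failures (ERROR / failed / Fatal).
amdgpu_errors() {
  grep 'amdgpu' | grep -Ei 'error|fail|fatal'
}

# gpu_driver ADDR [DEVROOT] (hypothetical helper): print the name of the driver
# bound to a PCI function, or "none" if its driver symlink is absent.
gpu_driver() {
  dev=$1 devroot=${2:-/sys/bus/pci/devices}
  if [ -L "$devroot/$dev/driver" ]; then
    basename "$(readlink "$devroot/$dev/driver")"
  else
    echo "none"
  fi
}
```

Usage: sudo dmesg | amdgpu_errors, and gpu_driver 0000:01:00.0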
- To fix the error complaining that the PSP can't load the ROM: inside the guest (NOT THE HOST!!!), add this kernel boot param (for me via /etc/default/grub and then update-grub): amdgpu.fw_load_type=0. This forces direct firmware loading instead of loading through the PSP, which was failing to read the VBIOS ROM. Not sure if this has other side effects or removes needed functionality, but it makes some of the errors go away.
GRUB_CMDLINE_LINUX_DEFAULT="amdgpu.fw_load_type=0"
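Note that the line above replaces whatever was there (Ubuntu's default is "quiet splash"). A sketch of a hypothetical add_grub_param helper that appends a param to the existing GRUB_CMDLINE_LINUX_DEFAULT value only if it isn't already present; it assumes the value is double-quoted, relies on GNU sed -i (as on Ubuntu), and doesn't handle params containing /, & or \:

```shell
# add_grub_param FILE PARAM (hypothetical helper): append PARAM to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." value in FILE unless it is already there.
add_grub_param() {
  file=$1 param=$2
  grep -q '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file" ||
    { echo "no GRUB_CMDLINE_LINUX_DEFAULT line in $file" >&2; return 1; }
  # Escape regex metacharacters (e.g. the '.' in amdgpu.fw_load_type) for grep -E.
  esc=$(printf '%s' "$param" | sed 's/[][\.*^$+?(){}|]/\\&/g')
  # Already present as a whole word inside the quotes? Then nothing to do.
  if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${esc}( [^\"]*)?\"" "$file"; then
    return 0
  fi
  # Insert before the closing quote, then drop the leading space left behind
  # when the value was empty ("" -> " param" -> "param").
  sed -i -E \
    -e "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" \
    -e 's/^(GRUB_CMDLINE_LINUX_DEFAULT=") /\1/' "$file"
}
```

Usage (as root, since sudo can't run shell functions): add_grub_param /etc/default/grub amdgpu.fw_load_type=0, then update-grub and reboot the guest.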
- I also played with disabling TMZ via the amdgpu.tmz=0 kernel param in the guest's /etc/default/grub. Not sure it did anything, so I left it out.
- Reboot the guest and many of the errors should go away. I was left with:
[ 0.000000] Linux version 5.13.0-27-generic (buildd@lcy02-amd64-014) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #29-Ubuntu SMP Wed Jan 12 17:36:47 UTC 2022 (Ubuntu 5.13.0-27.29-generic 5.13.19)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-27-generic root=/dev/mapper/vgubuntu-root ro amdgpu.fw_load_type=0 amdgpu.dc=1 radeon.cik_support=0 amdgpu.cik_support=1
[ 0.017576] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-27-generic root=/dev/mapper/vgubuntu-root ro amdgpu.fw_load_type=0 amdgpu.dc=1 radeon.cik_support=0 amdgpu.cik_support=1
[ 11.189820] [drm] amdgpu kernel modesetting enabled.
[ 11.189897] amdgpu: CRAT table not found
[ 11.189899] amdgpu: Virtual CRAT table created for CPU
[ 11.189906] amdgpu: Topology: Add CPU node
[ 11.189947] fb0: switching to amdgpudrmfb from VESA VGA
[ 11.190056] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[ 11.190329] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[ 11.197046] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 11.197048] amdgpu: ATOM BIOS: 113-CEZANNE-017
[ 11.229736] amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[ 11.229738] amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 11.229739] amdgpu 0000:01:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[ 11.229791] [drm] amdgpu: 2048M of VRAM memory ready
[ 11.229793] [drm] amdgpu: 3072M of GTT memory ready.
[ 12.708341] amdgpu 0000:01:00.0: amdgpu: SMU is initialized successfully!
[ 12.872471] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 12.872675] [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
[ 12.872825] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v9_0> failed -110
[ 12.872939] amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
[ 12.872942] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[ 12.872993] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
- The amdgpu driver in the guest still doesn't load, as shown above.
- Audio doesn't work, likely the same issue.
- General stability: I haven't tracked this thoroughly, but maybe only the first boot of the VM works. After that, subsequent boots of the VM can freeze, and once even my Proxmox host froze. Again, I think the shared memory allocation may be the problem, but I have no real clue.
- On the host: I think the USB controllers in the original IOMMU group really are tied to the APU, and there's an error trying to reset the extra audio devices after the VM shuts down. I had to remove these devices from the PCI passthrough to get rid of this error:
Jan 23 17:54:55 minispve kernel: usb 1-2: reset low-speed USB device number 2 using xhci_hcd
Jan 23 17:59:30 minispve QEMU[7222]: kvm: vfio: Cannot reset device 0000:05:00.6, depends on group 17 which is not owned.
Jan 23 17:59:30 minispve QEMU[7222]: kvm: vfio: Cannot reset device 0000:05:00.5, depends on group 17 which is not owned.
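My reading of those messages (an assumption, not confirmed) is that vfio refuses the reset because resetting 05:00.5/.6 would also affect devices in IOMMU group 17, which this VM doesn't own. To see which group a function is in, and what else lives in a given group, the sysfs links can be read directly; device_group and group_members are hypothetical helpers:

```shell
# device_group ADDR [DEVROOT] (hypothetical helper): print the IOMMU group
# number of a PCI function by following its iommu_group symlink.
device_group() {
  dev=$1 devroot=${2:-/sys/bus/pci/devices}
  link="$devroot/$dev/iommu_group"
  [ -L "$link" ] || { echo "no IOMMU group for $dev" >&2; return 1; }
  basename "$(readlink "$link")"
}

# group_members N [GROUPROOT] (hypothetical helper): list PCI functions in group N.
group_members() {
  grouproot=${2:-/sys/kernel/iommu_groups}
  for d in "$grouproot/$1/devices"/*; do
    if [ -e "$d" ]; then
      echo "${d##*/}"
    fi
  done
}
```

Usage: device_group 0000:05:00.6 shows whether group 17 is the audio device's own group, and group_members 17 lists what vfio is waiting on.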
- I want to try a Windows 10 VM as well to see if there's any difference.
AMD reset bug??