@mcastelino
Last active June 18, 2024 22:25
QEMU VFIO in Nested VM vIOMMU

How to use VFIO to assign a device to a nested VM

  • Here the vfio-pci device is passed into the L1 VM
  • The L1 VM is set up with kernel_irqchip=split
  • L0 (the host) exposes a virtual IOMMU (vIOMMU) to the L1 VM
qemu-system-x86_64 \
    -machine q35,accel=kvm,kernel_irqchip=split \
    -enable-kvm \
    -bios OVMF.fd \
    -smp sockets=1,cpus=4,cores=2 -cpu host \
    -m 1024 \
    -vga none -nographic \
    -drive file="$IMAGE",if=virtio,aio=threads,format=raw \
    -netdev user,id=mynet0,hostfwd=tcp::${VMN}0022-:22,hostfwd=tcp::${VMN}2375-:2375 \
    -device virtio-net-pci,netdev=mynet0 \
    -device virtio-rng-pci \
    -monitor telnet:127.0.0.1:55555,server,nowait \
    -debugcon file:debug.log -global isa-debugcon.iobase=0x402 "$@" \
    -device intel-iommu,intremap=on,caching-mode=on \
    -device vfio-pci,host=b3:00.0
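
The command above assumes the host device b3:00.0 is already bound to vfio-pci on L0. A minimal sketch of that step, using the standard sysfs driver_override interface, could look like the following. The function name bind_to_vfio and the SYSFS variable are illustrative and not part of the original setup, and the full address 0000:b3:00.0 assumes PCI domain 0000.

```shell
#!/bin/sh
# Sketch: bind a PCI device to vfio-pci through sysfs (run as root on L0).
# SYSFS is overridable only so the logic can be exercised without real hardware.
SYSFS="${SYSFS:-/sys}"

bind_to_vfio() {
    bdf="$1"                                  # full address, e.g. 0000:b3:00.0
    dev="$SYSFS/bus/pci/devices/$bdf"
    [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }

    # Make vfio-pci the only driver allowed to claim this device.
    echo vfio-pci > "$dev/driver_override"

    # Detach whatever driver currently owns the device, if any.
    if [ -e "$dev/driver" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi

    # Ask the PCI core to re-probe; driver_override steers it to vfio-pci.
    echo "$bdf" > "$SYSFS/bus/pci/drivers_probe"
}

# Example (assumed full address for the device used above):
#   modprobe vfio-pci && bind_to_vfio 0000:b3:00.0
```

Every other device in the same host IOMMU group has to be detached from its host driver as well before QEMU can open the group.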

Within the L1 VM you will see:

root@clr-d8a5d96d9a844656bcab094780f420b2 ~ # dmesg | grep -e DMAR -e IOMMU
[    0.000000] ACPI: DMAR 0x000000003E86C000 000048 (v01 BOCHS  BXPCDMAR 00000001 BXPC 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.145746] DMAR: Host address width 39
[    0.145747] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.145769] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 12008c22260286 ecap f00f5a
[    0.145776] DMAR: No RMRR found
[    0.145776] DMAR: No ATSR found
[    0.145825] DMAR: dmar0: Using Queued invalidation
[    0.218192] DMAR: Setting RMRR:
[    0.218193] DMAR: Prepare 0-16MiB unity mapping for LPC
[    0.219038] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    0.257194] DMAR: Intel(R) Virtualization Technology for Directed I/O
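
The "DMAR: IOMMU enabled" line typically shows up when the L1 kernel was booted with intel_iommu=on; kernels built with the Intel IOMMU enabled by default may not need the parameter. If the DMAR lines are missing in L1, checking the kernel command line is a reasonable first step. The helper name below is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if "on" is among the comma-separated intel_iommu= options
# in a kernel command line file (defaults to /proc/cmdline).
has_intel_iommu_on() {
    cmdline_file="${1:-/proc/cmdline}"
    # One parameter per line, then match the whole parameter.
    tr ' ' '\n' < "$cmdline_file" | grep -Eqx 'intel_iommu=(.*,)?on(,.*)?'
}

# Example, inside the L1 VM:
#   has_intel_iommu_on || echo "add intel_iommu=on to the L1 kernel command line"
```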

You will also see IOMMU groups set up within the VM:

root@clr-d8a5d96d9a844656bcab094780f420b2 ~ # lspci -v -s 00:03.0
00:03.0 Serial controller: MosChip Semiconductor Technology Ltd. 4-Port PCIe Serial Adapter (prog-if 02 [16550])
        Subsystem: Device a000:1000
        Flags: bus master, fast devsel, latency 0, IRQ 23
        I/O ports at 60e0 [size=8]
        Memory at 90003000 (32-bit, non-prefetchable) [size=4K]
        Memory at 90002000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Kernel driver in use: serial
root@clr-d8a5d96d9a844656bcab094780f420b2 ~ # find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/5/devices/0000:00:1f.2
/sys/kernel/iommu_groups/5/devices/0000:00:1f.0
/sys/kernel/iommu_groups/5/devices/0000:00:1f.3
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/4/devices/0000:00:04.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
root@clr-d8a5d96d9a844656bcab094780f420b2 ~ # readlink /sys/kernel/iommu_groups/3/devices/0000:00:03.0
../../../../devices/pci0000:00/0000:00:03.0
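
The find output above can be hard to scan when there are many groups. A small sketch that prints one "group N: device" line per entry is shown below; the function name list_iommu_groups and its optional root argument are illustrative.

```shell
#!/bin/sh
# Sketch: print "group <N>: <PCI address>" for every device in every IOMMU group.
# The optional argument overrides the sysfs location (useful for dry runs).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for link in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$link" ] || [ -L "$link" ] || continue
        group="${link#"$root"/}"      # strip the leading root path
        group="${group%%/*}"          # keep only the group number
        echo "group $group: ${link##*/}"
    done
}

# Example, inside the L1 VM:
#   list_iommu_groups
```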

The device we assigned through VFIO (00:03.0 in L1) is now in its own IOMMU group (group 3) and can be assigned from L1 to an L2 VM using VFIO.
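
As a rough sketch of that last step: before handing the device to L2, it helps to confirm in L1 that nothing else shares its IOMMU group, since VFIO works on whole groups. The helper names, the SYSFS variable, the $L2_IMAGE path, and the L2 memory size are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: check that a device is alone in its IOMMU group before assigning it.
SYSFS="${SYSFS:-/sys}"

# Print every device that shares an IOMMU group with the given one.
group_members() {
    ls "$SYSFS/bus/pci/devices/$1/iommu_group/devices"
}

# Succeed only if the device is the sole member of its group.
is_isolated() {
    [ "$(group_members "$1" | wc -l)" -eq 1 ]
}

# Example in L1 (address from the listing above; $L2_IMAGE is hypothetical).
# The device must first be rebound from its current driver to vfio-pci in L1.
#   is_isolated 0000:00:03.0 && qemu-system-x86_64 \
#       -machine q35,accel=kvm -m 512 -nographic \
#       -drive file="$L2_IMAGE",if=virtio,format=raw \
#       -device vfio-pci,host=00:03.0
```

VFIO pins the L2 guest's memory inside L1, so L1 needs enough RAM to cover the L2 guest plus its own needs.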

@mcastelino
Author

> Very nice!
> I tried to do that for an SR-IOV VF device (assigning it to an L2 VM), but I get these errors:
>
> [   95.767571] iavf 0000:05:00.0: Failed to communicate with PF; waiting before retry
> [   48.757624] iavf 0000:05:00.0: Admin queue command never completed
>
> Any idea why?

Where do you see this error? I would be surprised if this happens inside the VM. Are you passing the PF into the VM? Ideally you should pass in a VF that was created.

@staysh

staysh commented Sep 11, 2021

Do you know of a way to pass those qemu command line options via virt-install?

Alternatively could you post XML of the devices created with those arguments?

@ormergi

ormergi commented Nov 22, 2021

> Where do you see this error? I would be surprised if this happens inside the VM. Are you passing the PF into the VM? Ideally you should pass in a VF that was created.

I saw it in the L2 guest VM's dmesg log. Once I increased the VM's RAM, it did not occur again.

@mikeyo

mikeyo commented Feb 5, 2024

Very late to the party with this, but I managed to get this working on an Intel 11900K build running KVM/UNRAID/PROXMOX using the following QEMU args:

<qemu:arg value='-machine'/>
<qemu:arg value='kernel-irqchip=split'/>
<qemu:arg value='-device'/>
<qemu:arg value='intel-iommu,intremap=on,caching-mode=on'/>

However, the same args do not expose the devices for passthrough on my AMD 3950x build.
What do I need to change for AMD?
