
@peppergrayxyz
Last active July 19, 2025 16:53

QEMU with VirtIO GPU Vulkan Support

With its latest release, QEMU added the Venus patches, so virtio-gpu now supports Venus encapsulation for Vulkan. This is one more piece of the puzzle towards full Vulkan support.

A now-outdated blog post on Collabora described in 2021 how to enable 3D acceleration of Vulkan applications in QEMU through Venus, the experimental Vulkan driver for VirtIO-GPU, using a local development environment. Following up on that write-up, this is how it's done today.

Definitions

Let's start with a brief description of the projects mentioned in that post & extend the list:

  • QEMU is a machine emulator
  • VirGL is an OpenGL driver for VirtIO-GPU, available in Mesa.
  • Venus is an experimental Vulkan driver for VirtIO-GPU, also available in Mesa.
  • Virglrenderer is a library that enables hardware acceleration for VM guests, effectively translating commands from the two drivers just mentioned to either OpenGL or Vulkan.
  • libvirt is an API for managing platform virtualization
  • virt-manager is a desktop user interface for managing virtual machines through libvirt

Merged Patches:

Work in progress:

Prerequisites

Make sure you have the proper version installed on the host:

  • linux kernel >= 6.13 built with CONFIG_UDMABUF
  • working Vulkan and kvm setup
  • qemu >= 9.2.0
  • virglrenderer with enabled venus support
  • mesa >= 24.2.0

You can verify this like so:

$ uname -r
6.13.0
$ ls /dev/udmabuf
/dev/udmabuf
$ ls /dev/kvm
/dev/kvm
$ qemu-system-x86_64 --version
QEMU emulator version 9.2.0
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
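
If /dev/udmabuf is missing, you can check whether the running kernel was built with CONFIG_UDMABUF; the config file path below is typical for Debian/Ubuntu and may differ on other distros:

$ grep CONFIG_UDMABUF /boot/config-$(uname -r)
CONFIG_UDMABUF=y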

Check your distro's package sources to see how virglrenderer is built (venus support must be enabled).
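
There is no single universal command for this; as a rough sketch on Debian/Ubuntu (assuming deb-src entries are enabled, and the library path is only an example) you can inspect the packaging and the installed library:

apt-get source libvirglrenderer                   # needs deb-src entries in sources.list
grep -ri venus libvirglrenderer-*/debian/rules    # look for -Dvenus=true among the build options
strings /usr/lib/*/libvirglrenderer.so.1 | grep -ci venus   # heuristic: non-zero suggests venus code is compiled in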

For Vulkan to work you need the proper drivers to be installed for your graphics card.

To verify your setup, install vulkan-tools. Make sure mesa >= 24.2.0 is reported and test vkcube:

$ vulkaninfo --summary | grep driverInfo
	driverInfo         = Mesa 24.2.3-1ubuntu1
	driverInfo         = Mesa 24.2.3-1ubuntu1 (LLVM 19.1.0)
...
$ vkcube
Selected GPU x: ..., type: ...

Building qemu

If your distro doesn't (yet) ship an updated version of qemu, you can build it yourself from source:

wget https://download.qemu.org/qemu-9.2.0.tar.xz
tar xvJf qemu-9.2.0.tar.xz
cd qemu-9.2.0
mkdir build && cd build
../configure --target-list=x86_64-softmmu  \
  --enable-kvm                 \
  --enable-opengl              \
  --enable-virglrenderer       \
  --enable-gtk                 \
  --enable-sdl
make -j4

The configuration step will throw errors if packages are missing. Check the qemu wiki for further info on what to install: https://wiki.qemu.org/Hosts/Linux
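
Once the build finishes, a quick sanity check from the build directory is to list the properties of the virtio-vga-gl device; if venus shows up, Venus support was compiled in (the exact property list may vary between versions):

$ ./qemu-system-x86_64 --version
$ ./qemu-system-x86_64 -device virtio-vga-gl,help | grep -E "venus|blob|hostmem"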

Create and run an image for QEMU

Create an image & fetch the distro of your choice:

Host

ISO=ubuntu-24.10-desktop-amd64.iso  
wget https://releases.ubuntu.com/oracular/ubuntu-24.10-desktop-amd64.iso  

IMG=ubuntu-24-10.qcow2
qemu-img create -f qcow2 $IMG 16G

Run a live version or install the distro

qemu-system-x86_64                                               \
    -enable-kvm                                                  \
    -M q35                                                       \
    -smp 4                                                       \
    -m 4G                                                        \
    -cpu host                                                    \
    -net nic,model=virtio                                        \
    -net user,hostfwd=tcp::2222-:22                              \
    -device virtio-vga-gl,hostmem=4G,blob=true,venus=true        \
    -vga none                                                    \
    -display gtk,gl=on,show-cursor=on                            \
    -usb -device usb-tablet                                      \
    -object memory-backend-memfd,id=mem1,size=4G                 \
    -machine memory-backend=mem1                                 \
    -hda $IMG                                                    \
    -cdrom $ISO                                                  

Adjust the parameters accordingly:

  • smp: number of cpu cores
  • m: RAM
  • hostmem,size: VRAM
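
The hostfwd rule above forwards host port 2222 to the guest's SSH port 22, so once an SSH server is installed in the guest you can log in from the host (replace <user> with your guest username):

$ ssh -p 2222 <user>@localhost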

Guest

Install mesa-utils and vulkan-tools to test the setup:

$ glxinfo -B
$ vkcube
Selected GPU x: ..., type: ...

If the device is llvmpipe, something is wrong. The device should be virgl (...).
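
For Vulkan, vulkaninfo inside the guest should likewise report the Venus driver rather than llvmpipe; the output looks roughly like this (the device name in parentheses depends on the host GPU and Mesa version):

$ vulkaninfo --summary | grep -E "deviceName|driverName"
	deviceName         = Virtio-GPU Venus (...)
	driverName         = venus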

Troubleshooting

  • (host) add -d guest_errors to show error messages from the guest
  • (guest) try installing vulkan virtio drivers and mesa
  • check the original blog post

Ubuntu 24.10

This is how you do it on Ubuntu

kernel

Install mainline: https://github.com/bkw777/mainline

sudo add-apt-repository ppa:cappelikan/ppa
sudo apt update
sudo apt install mainline

Find the latest kernel (>= 6.13); at the time of writing 6.13 is a release candidate, so include those:

$ mainline check --include-rc

Install kernel:

$ sudo mainline install 6.13-rc1

Verify installed kernels:

$ mainline list-installed
mainline 1.4.10
Installed Kernels:
linux-image-6.11.0-13-generic
linux-image-generic-hwe-24.04
linux-image-unsigned-6.13.0-061300rc1-generic
mainline: done

Reboot into the new kernel.

Verify the running kernel:

$ uname -r
6.13.0-061300rc1-generic

virglrenderer

The Ubuntu package is not compiled with the proper flags.

If it is installed, remove it: $ sudo apt-get remove libvirglrenderer-dev

Download, build & install it from source with venus enabled:

wget https://gitlab.freedesktop.org/virgl/virglrenderer/-/archive/1.1.0/virglrenderer-1.1.0.tar.gz
tar xzf virglrenderer-1.1.0.tar.gz
cd virglrenderer-1.1.0
sudo apt-get install python3-full ninja-build libvulkan-dev libva-dev
python3 -m venv venv
venv/bin/pip install meson
venv/bin/meson setup build -Dvideo=true -Dvenus=true
ninja -C build
sudo ninja -C build install
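
Note that meson installs to /usr/local by default. If the qemu configure step later reports virglrenderer as missing or too old, a possible fix (assuming the default prefix and a Debian-style multiarch libdir; adjust paths for your system) is to refresh the linker cache and point pkg-config at the local install:

sudo ldconfig
export PKG_CONFIG_PATH=/usr/local/lib/x86_64-linux-gnu/pkgconfig:$PKG_CONFIG_PATH
pkg-config --modversion virglrenderer    # should print 1.1.0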

qemu

Install qemu >= 9.2.0; at the time of writing Ubuntu has not yet packaged it.

Install build dependencies: https://wiki.qemu.org/Hosts/Linux

sudo apt-get install build-essential pip libslirp-dev slirp
sudo apt-get install git libglib2.0-dev libfdt-dev libpixman-1-dev zlib1g-dev ninja-build
sudo apt-get install git-email
sudo apt-get install libaio-dev libbluetooth-dev libcapstone-dev libbrlapi-dev libbz2-dev
sudo apt-get install libcap-ng-dev libcurl4-gnutls-dev libgtk-3-dev
sudo apt-get install libibverbs-dev libjpeg8-dev libncurses5-dev libnuma-dev
sudo apt-get install librbd-dev librdmacm-dev
sudo apt-get install libsasl2-dev libsdl2-dev libseccomp-dev libsnappy-dev libssh-dev
sudo apt-get install libvde-dev libvdeplug-dev libvte-2.91-dev libxen-dev liblzo2-dev
sudo apt-get install valgrind xfslibs-dev 
sudo apt-get install libnfs-dev libiscsi-dev

Build and run qemu as described above.

virt-manager

-- work in progress --

Currently this is work in progress, so there is no option to add Vulkan support in virt-manager; there are no fields to configure this. Editing the XML directly doesn't work either, because libvirt doesn't know about these options, so XML validation fails. There is, however, an option for QEMU command-line passthrough which bypasses the validation.

If you set up a default machine with 4G of memory, you can add this (note that the <qemu:commandline> block requires the QEMU namespace to be declared on the root element, i.e. <domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" ...>):

  <qemu:commandline>
    <qemu:arg value="-device"/>
    <qemu:arg value="virtio-vga-gl,hostmem=4G,blob=true,venus=true"/>
    <qemu:arg value="-object"/>
    <qemu:arg value="memory-backend-memfd,id=mem1,size=4G"/>
    <qemu:arg value="-machine"/>
    <qemu:arg value="memory-backend=mem1"/>
    <qemu:arg value="-vga"/>
    <qemu:arg value="none"/>
  </qemu:commandline>

Which gives this error:

qemu-system-x86_64: virgl could not be initialized: -1

Changing the number from 4G to 4194304k (same as memory) leads to this error:

qemu-system-x86_64: Spice: ../spice-0.15.2/server/red-qxl.cpp:435:spice_qxl_gl_scanout: condition `qxl_state->gl_draw_cookie == GL_DRAW_COOKIE_INVALID' failed

To be further investigated.

@zzyiwei

zzyiwei commented Apr 11, 2025

Based on the logs for (5), comparing to (3) and (4), your setup does need the patched host kernel to mitigate the EPT PAT issue. Previously the issue is partially hidden behind Venus perf options.

After patching host kvm, at least case (5) would be running fine. Meanwhile, case (6) is the one I'm also curious about with the patched kernel.

No messages were logged in the host dmesg. However, no messages were logged when running without the environment variables either. I have noticed the messages from NVIDIA driver are sporadic and do not correspond to every run of vkcube. Some runs, usually the first two or three after a host reboot, log a message on the host but the rest don't, regardless of environment variables. Rebooting the guest does not seem to have any effect on inducing fresh driver messages on the host.

This is really helpful info! If the same is also observed with the patched kvm, then the issue resides in some level of proper VK->GL external memory import. Another follow-up experiment to do would be redirect x server to use Zink-on-Venus as the gbm backing, so that we can tell whether the brokenness with hw scanout images are due to VirGL or host GL driver issues ; )

@myrslint

myrslint commented Apr 11, 2025

Based on the logs for (5), comparing to (3) and (4), your setup does need the patched host kernel to mitigate the EPT PAT issue. Previously the issue is partially hidden behind Venus perf options.

After patching host kvm, at least case (5) would be running fine. Meanwhile, case (6) is the one I'm also curious about with the patched kernel.

No messages were logged in the host dmesg. However, no messages were logged when running without the environment variables either. I have noticed the messages from NVIDIA driver are sporadic and do not correspond to every run of vkcube. Some runs, usually the first two or three after a host reboot, log a message on the host but the rest don't, regardless of environment variables. Rebooting the guest does not seem to have any effect on inducing fresh driver messages on the host.

This is really helpful info! If the same is also observed with the patched kvm, then the issue resides in some level of proper VK->GL external memory import. Another follow-up experiment to do would be redirect x server to use Zink-on-Venus as the gbm backing, so that we can tell whether the brokenness with hw scanout images are due to VirGL or host GL driver issues ; )

Firstly, I want to thank you for following up on this issue and investing so much time and effort into resolving it. I very much appreciate your help. Below, are my findings based on your pointers.

I built the patched kernel again, installed it on the host, and rebooted the host. This is 6.14.2 kernel built from kernel.org source tarball using the default Arch kernel config, with many unnecessary device drivers disabled to make the build take less time. The only addition was PATCH-5-5-KVM-VMX-Always-honor-guest-PAT-on-CPUs-that-support-self-snoop. For my CPU (i7-3770k) /proc/cpuinfo includes the following line:

flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d

The flag ss here indicates the CPU supports self-snoop.

Installing the patched kernel on the host made the triggering of host dmesg error message consistent. Any run of vkcube on the guest which resulted in the messages MESA-VIRTIO: debug: stuck in fence wait with iter at N on the guest, also resulted in a corresponding error in the host dmesg reading NVRM: Xid (PCI:0000:01:00): 69, pid=1737, name=vkr-ring-9, Class Error: ChId 002d, Class 0000c197, Offset 00000d78, Data 00000024, ErrorCode 0000009c. I confirmed this through numerous runs.

I ran all the commands you provided again and collected the logs in this gist. These can be summarized as:

  1. As per your prediction, commands 3, 4, and 5 do result in a properly spinning cube being rendered. These run rather sluggishly and heavily engage the CPU while leaving the GPU mostly idle (based on nvtop observation). My understanding is that some aspect of the work is being done in software on the CPU, rather than running on the GPU.
  2. Commands 1, 2, and 6 result in the black window with no spinning cube. These produce the usual error message on the guest and also reliably trigger the NVIDIA driver error message in the host dmesg.

All these tests were performed from a terminal (foot) running on the compositor sway. Running gamescope -- vkcube from the console (i.e., using gamescope's DRM backend, which prior to patching the host kernel resulted in the semaphore wait error message) instead resulted in QEMU's GTK frontend showing a blank screen and briefly displaying the error message Display output is not active. From this point on the VM seems to become non-responsive. Pressing [Escape], to exit vkcube if possible, and then blindly attempting a soft reboot does not help. A hard reset of the VM using QEMU facilities is required.

For the follow-up tests, my understanding is that Zink is the name for running OpenGL programs on top of a Vulkan graphics stack with a translation layer from OpenGL to Vulkan in-between. Some online reading and this small wrapper seemed to indicate I should have prefixed the commands with __GLX_VENDOR_LIBRARY_NAME=mesa MESA_LOADER_DRIVER_OVERRIDE=zink GALLIUM_DRIVER=zink LIBGL_KOPPER_DRI2=1 to set those environment variables.
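
For illustration, a prefixed invocation (here with glxgears, one of the clients tested below) looks like this:

__GLX_VENDOR_LIBRARY_NAME=mesa MESA_LOADER_DRIVER_OVERRIDE=zink \
GALLIUM_DRIVER=zink LIBGL_KOPPER_DRI2=1 glxgears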

To test Zink-on-Venus based on above understanding, I tried running sway (from the console, with and without WLR_RENDERER=vulkan, the default renderer being gles2), eglgears_x11, gamescope (from the console), glxgears, and vkgears on the guest with these variables prefixed. The logs, where applicable, are collected in this gist. My description of what happened is:

  1. sway ran but presented only a blank screen with the brief Display output is not active message. This could be recovered from by blindly exiting sway. Upon exiting the console would be displayed again normally.
  2. gamescope ran with the same error as sway and this could not be recovered from in any way other than resetting the VM.
  3. eglgears_x11 run on top of sway (started normally from console) displayed a black window and on the terminal printed the errors seen in the collected logs. This (MESA: error: CreateSwapchainKHR failed with VK_ERROR_OUT_OF_HOST_MEMORY) was, to me, a new type of error message but might be what's underlying the previous errors as well.
  4. glxgears displayed a black window and then quickly exited with the errors seen in the collected logs. These consisted of the same Vulkan error as with eglgears_x11 as well as a more specific GLX error.
  5. vkgears expectedly demonstrated the same problem as other Vulkan demo programs. Zink (OpenGL atop Vulkan) seemingly had no hand in that.
  6. Notably, runs of OpenGL/EGL programs did not trigger NVIDIA driver error messages on the host. Any run of vkgears, however, did do so.

Next, I tried a test not listed above. On the host I installed the Arch Linux vulkan-swrast package and forced QEMU to use the llvmpipe software rasterizer by specifying VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.x86_64.json in the command invoking qemu-system-x86_64. Naturally, this resulted in vulkaninfo on the guest reporting llvmpipe (LLVM 19.1.7, 256 bits) as its GPU. With that change I could run vkcube without any environment variables. This displayed the spinning cube but, in addition to the expected high CPU usage, the cube's spinning was juddery and erratic, and some short-lived, noisy, black-colored artefacts would repeatedly appear and disappear across a relatively fixed rectangle within the vkcube window. In the terminal that QEMU was run from and logging to, this error message could be read: virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200.

As an aside, I think I should mention that guest-to-host OpenGL encapsulation (without Zink) does not work entirely well on my hardware and software combination either. Demo programs such as eglgears_x11 and glxgears run successfully--achieving smooth motion and consistent frame rates--and even glmark2 runs its default benchmark scenes without issues. However, more complex OpenGL benchmarks such as Unigine Heaven and Unigine Valley, while producing commendable frame rates, also produce highly noticeable visual glitches such as momentarily disappearing and reappearing parts of some scenes, occasional vertex explosions, and occasional duplicated and/or misplaced scene objects. These glitches do not appear consistently in every cycle of the same run of the same benchmark on the guest VM, and they don't appear at all if the same benchmark is run on the host. Nonetheless, the glitches have steadily been reduced and the output quality improved over the past few months that I have tried these. It may be of note that although OpenGL rendering is clearly being offloaded to and done by the GPU there still is quite high CPU usage, entirely consuming 3-4 threads of a 4C/8T CPU. In some cases, such as the Refresh2025 benchmark (the OpenGL version since the Vulkan one only stalls as with vkcube) increasing rendered object count above some threshold seems to result in GPU utilization decreasing in proportion to the CPU's inability to keep up i.e., benchmark performance becomes CPU-bound.

@zzyiwei

zzyiwei commented Apr 12, 2025

Firstly, I want to thank you for following up on this issue and investing so much time and effort into resolving it. I very much appreciate your help. Below, are my findings based on your pointers.

I'm the one to say thanks 🙏 I haven't used any nv gpu for years so mostly rely on community folks with nv setups for these sorts of investigations. Thanks again for bearing with me >_<

Installing the patched kernel on the host made the triggering of host dmesg error message consistent. Any run of vkcube on the guest which resulted in the messages...on the guest, also resulted in a corresponding error in the host dmesg...I confirmed this through numerous runs.

No random behaviors then. I was suspecting there existed two issues tangled here: the Intel EPT issue and the wsi issue. Before guest PAT was being honored, sometimes one issue could shield the other due to timing +/- VN_PERF options.

I ran all the commands you provided again and collected the logs in this gist...

As per your prediction, commands 3, 4, and 5 do result in a properly spinning cube being rendered.

Cool! Based on the observations, now I have a rough idea of what has gone wrong. Let me do some more homework before knocking myself out, or proposing any workarounds for nv setup.

...These run rather sluggishly and heavily engage the CPU while leaving the GPU mostly idle (based on nvtop observation). My understanding is that some aspect of the work is being done in software on the CPU, rather than running on the GPU.

That's expected. vkcube uses prerecorded cmds so normally it's just throttled acquire and present calls at runtime from the app side, while the x server is doing composition with the GL driver. The wsi debug option used has engaged a cpu buffer to share with the x server instead of direct/zero-copy device memory sharing, ending up with heavier cpu usage.

@zzyiwei

zzyiwei commented Apr 12, 2025

Could you help apply this hack to your guest mesa, and see if vkcube without any env vars works? It forces venus to take the prime blit path, and might not work...

I suspect the wsi side issue is the proprietary nv vulkan driver not waiting for implicit fence attached to the external memory. It might have such support for native wsi extension, but venus layers wsi atop external memory. The implicit fence not being properly waited is likely the one from host gl sampling from the venus wsi image (guest x server doing composition). That could explain why MESA_VK_WSI_DEBUG=sw,buffer makes things work. Currently I don't have any good way to workaround this because that implicit fence is entirely unknown to guest venus. If forcing prime blit can't hide the issue, I'll draft a workaround in host venus (vkr) to explicitly wait for the implicit fence before submitting to the nv driver.

@thesword53

I have the same issue as @myrslint with vkcube, vkgears and vkmark (MESA-VIRTIO: debug: stuck in fence wait with iter at 1024), but DXVK and VKD3D games are working fine. I can also run GTK4 application with the Vulkan renderer.
GPU: RTX 2080 SUPER
CPU: AMD Ryzen 7 3700X

The following logs on host mean Graphics Engine class error according to https://docs.nvidia.com/deploy/xid-errors/index.html.

NVRM: Xid (PCI:0000:01:00): 69, pid=2608, name=vkr-ring-9, Class Error: ChId 0035, Class 0000c197, Offset 00000d78, Data 00000024, ErrorCode 0000009c
NVRM: Xid (PCI:0000:01:00): 69, pid=2636, name=vkr-ring-9, Class Error: ChId 0035, Class 0000c197, Offset 00000d78, Data 00000024, ErrorCode 0000009c

@zzyiwei

zzyiwei commented Apr 12, 2025

Your amd cpu + nv dgpu setup doesn't have the pat issue so is only affected by the wsi side issue.

...but DXVK and VKD3D games are working fine. I can also run GTK4 application with the Vulkan renderer.

They happen to hide the synchronization issue potentially because they have reasonable frame pacing and they all have certain amount of cpu workloads before making the submission that involves the wsi image, which gives enough time for the implicit fence attached by the compositor to signal.

Two experiments against vkcube/vkmark/etc can be done to confirm the theory:

  1. Override to increase the x11 swapchain length to something much bigger with vk_x11_override_min_image_count env var.
  2. Add some sleep at the end of vn_AcquireNextImage2KHR.

@myrslint

myrslint commented Apr 12, 2025

Could you help apply this hack to your guest mesa, and see if vkcube without any env vars works? It forces venus to take the prime blit path, and might not work...

I suspect the wsi side issue is the proprietary nv vulkan driver not waiting for implicit fence attached to the external memory. It might have such support for native wsi extension, but venus layers wsi atop external memory. The implicit fence not being properly waited is likely the one from host gl sampling from the venus wsi image (guest x server doing composition). That could explain why MESA_VK_WSI_DEBUG=sw,buffer makes things work. Currently I don't have any good way to workaround this because that implicit fence is entirely unknown to guest venus. If forcing prime blit can't hide the issue, I'll draft a workaround in host venus (vkr) to explicitly wait for the implicit fence before submitting to the nv driver.

I recompiled Mesa from latest commit (676e26aed58) on the main branch using the AUR package previously mentioned. Thankfully, the PKGBUILD also has a section that applies patch files so I added this patch from your fork to the sources array as vn-force-prime-blit.patch which applied cleanly against the checked-out tree and compiled fine.

I installed the resulting Mesa package in the guest. The current software configuration consists of latest stock Arch Linux on the host and guest, with a patched kernel on the host and patched Mesa on the guest. With this configuration, vkcube runs on the guest without any environment variables. There is a short period of a black window being displayed followed by the spinning cube being displayed. There are no artefacts but the spinning is somewhat erratic. vkgears and vkmark, however, exhibit the same symptoms as before and print the same error message they did previously. I have uploaded a screen recording of the VM window to give a sense of the erratic motion mentioned. It also contains the terminal window showing the programs being run and the error messages.

@zzyiwei

zzyiwei commented Apr 15, 2025

@myrslint @thesword53 hi, would you like to give https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34516 a try on your nvidia setup? with that, venus should have properly handled the implicit compositor release fence.

@myrslint

@myrslint @thesword53 hi, would you like to give https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34516 a try on your nvidia setup? with that, venus should have properly handled the implicit compositor release fence.

If I have understood your instructions correctly, I recompiled Mesa from the latest commit (b1af5780d13) with only MR #34516 applied as a patch. I installed the resulting package in the guest and ran vkcube. With this, vkcube once again showed only a black window and the same error messages as before were logged in both guest and host.

The log from running VN_DEBUG=all VN_PERF=all vkcube is found in this gist.

@zzyiwei

zzyiwei commented Apr 15, 2025

Thanks! What about together with the prior hack (force buffer blit) and see if the mentioned erratic motion is improved?

There might exist multiple issues on the wsi path. Previously with the PAT fix, (5) working but (6) not suggested host Nvidia driver has issues dealing with linear image import.

@myrslint

Thanks! What about together with the prior hack (force buffer blit) and see if the mentioned erratic motion is improved?

There might exist multiple issues on the wsi path. Previously with the PAT fix, (5) working but (6) not suggested host Nvidia driver has issues dealing with linear image import.

I did guess I had misunderstood your instructions.

This time I recompiled Mesa from latest commit (09896ee79e3 as of when I pulled from FDO) with both vn-force-prime-blit and vn-fix-acquire-fence applied as patches to that commit. Then, I installed the resulting package in the guest. The host kernel is the same patched 6.14.2 Arch kernel as before. The guest kernel is a stock 6.14.2 Arch kernel as before.

In this gist logs from running the commands 1-6 are collected and numbered. Visually, 1-5 result in a spinning cube being drawn. The ones with sw in MESA_VK_WSI_DEBUG generally appear more correct but much slower. The ones without are quicker but demonstrate what appears to be abrupt changes of the spinning speed or skipped frames every so often. They are nonetheless somewhat improved over the previous case of applying only vn-force-prime-blit.

6 still does not result in a spinning cube, only a black window, and triggers the host dmesg error message from NVIDIA driver.

I have also made a screen recording of a VN_DEBUG=all VN_PERF=all vkcube run to hopefully give a sense of the cube's motion.

@zzyiwei

zzyiwei commented Apr 15, 2025

One more question before I go back to do more homework: do you see improvements with just vkcube (w/o any additional env vars)? as compared to the video attached on your previous #gistcomment-5537654?

The video on your latest #gistcomment-5541821 looks fine to me already. The occasional janks likely came from the compositor stack backpressure, but the out-of-order issue is gone per my visual check. Just need to confirm this ; )

@myrslint

One more question before I go back to do more homework: do you see improvements with just vkcube (w/o any additional env vars)? as compared to the video attached on your previous #gistcomment-5537654?

The video on your latest #gistcomment-5541821 looks fine to me already. The occasional janks likely came from the compositor stack backpressure, but the out-of-order issue is gone per my visual check. Just need to confirm this ; )

Yes, it has improved significantly when run without any environment variables. It has gone from no cube and a black window at the very beginning to a proper rendering of the cube and no glaringly erratic motion. The issues that stand out are the following:

  1. vkcube --wsi wayland still stalls with the fence wait error, even though the same command runs fine on the host.
  2. With XCB WSI on the guest, vkcube runs and displays fine but there is a brief display of a black background before the rendering starts. This does not happen on the host with either XCB or Wayland WSI.
  3. The cube's motion, while significantly better than with just the prime blit patch, still is not smooth. As you have pointed out, it does not seem to go back and forth anymore but occasionally speeds up and slows down.

@zzyiwei

zzyiwei commented Apr 16, 2025

For those who encounter stuck in fence wait, adding environment variable

VN_PERF=no_fence_feedback

to /etc/environment might help.

Just in case some folks are affected by this, I happen to realize that the stock mesa driver from stable Debian bookworm is Mesa 22.3.6, which contains an Intel ANV bug hit by the Venus sync feedback optimization path. So if you see the issue with an Intel iGPU setup, you can compile a separate ANV driver from the latest mesa release and the issue will be gone with optimized Venus performance.

@zzyiwei

zzyiwei commented Apr 16, 2025

  1. With XCB WSI on the guest, vkcube runs and displays fine but there is a brief display of a black background before the rendering starts. This does not happen on the host with either XCB or Wayland WSI.

Hi @myrslint , will the black period go away with MESA_SHADER_CACHE_DISABLE=true? if so, that's due to the slow filesystem ops for shader disk cache. The initial loading occurs during VkDevice creation time.

@myrslint

myrslint commented Apr 16, 2025

  1. With XCB WSI on the guest, vkcube runs and displays fine but there is a brief display of a black background before the rendering starts. This does not happen on the host with either XCB or Wayland WSI.

Hi @myrslint , will the black period go away with MESA_SHADER_CACHE_DISABLE=true? if so, that's due to the slow filesystem ops for shader disk cache. The initial loading occurs during VkDevice creation time.

Hello there again 🙂

Adding the environment variable MESA_SHADER_CACHE_DISABLE=true to vkcube runs does not make that period of black window display go away. The VM's storage uses virtio, a relatively fast virtual storage driver, and is backed by a qcow2 file on a relatively fast SSD, so I doubt disk operations would be a bottleneck anywhere.

However, as seen in the debug logs, during the time a black window is displayed before the cube is shown, messages indicate that a swapchain is created three times (in parallel?) and destroyed for some reason before a fourth, successful one is finally created.

@phreer

phreer commented Apr 25, 2025

Hi @zhangyiwei, thanks a lot for your information in this thread!

The workaround VN_PERF=no_fence_feedback seems to cause disordered presentation of the guest images, i.e., an out-of-date image could show up again after presenting the current frame... Is this an expected behavior? IIUC, adding this environment variable should only introduce a performance punishment without any impact on functionality. Am I wrong?

I looked through the communications on this patch set: https://lore.kernel.org/all/20240309010929.1403984-1-seanjc@google.com/, but unfortunately I am still unclear about the reason why this patch (KVM: VMX: Always honor guest PAT on CPUs that support self-snoop) is mandatory for venus to work properly with an Intel CPU. My murky understanding is that without this patch, PAT will not be honored, leading to memory shared by host and guest being mapped as an unexpected cache type (UC?) and data written by the host not being visible to the guest... Could you please share more insights about the cause?

@zzyiwei

zzyiwei commented Apr 25, 2025

Hi @zhangyiwei, thanks a lot for your information in this thread!

The workaround VN_PERF=no_fence_feedback seems to cause disordered presentation of the guest images, i.e., an out-of-date image could show up again after presenting the current frame... Is this an expected behavior? IIUC, adding this environment variable should only introduce a performance punishment without any impact on functionality. Am I wrong?

That's not expected behavior, and very likely it'd be broken with or without on your setup (more details?). You may give https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34516 a try to see if it helps.

I looked through the communications on this patch set: https://lore.kernel.org/all/20240309010929.1403984-1-seanjc@google.com/, but unfortunately I am still unclear about the reason why this patch (KVM: VMX: Always honor guest PAT on CPUs that support self-snoop) is mandatory for venus to work properly with an Intel CPU. My murky understanding is that without this patch, PAT will not be honored, leading to memory shared by host and guest being mapped as an unexpected cache type (UC?) and data written by the host not being visible to the guest... Could you please share more insights about the cause?

No, it's the other way: guest cpu uploads end up not visible to host gpu. The missing background can be found here: https://patchwork.kernel.org/project/dri-devel/cover/20200213213036.207625-1-olvaffe@gmail.com/

@AlxHnr

AlxHnr commented May 8, 2025

The commands in the original gist for passing venus="true" to qemu no longer work in recent versions of libvirt. Below you will find an updated snippet to copy-paste. Note that it is still broken, as I'm getting the same "Display output is not active", which the original author had.

In virt-manager's "Details" view:

  • Memory -> Enable shared memory
  • Display Spice -> Listen type -> None
  • Display Spice -> OpenGL ✔️
  • Video Virtio -> 3d acceleration ✔️
  • Overview -> XML tab (make sure "editable" is enabled in the preferences)

Copy paste this before the closing </domain>:

  <override xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
    <device alias="video0">
      <frontend>
        <property name="blob" type="bool" value="true"/>
        <property name="venus" type="bool" value="true"/>
        <property name="hostmem" type="unsigned" value="4194304"/>
      </frontend>
    </device>
  </override>

@zzyiwei

zzyiwei commented May 9, 2025

Unfortunately I'm only getting

error: kvm run failed Bad address
RAX=0000000000000000 RBX=00007ffcca3ae490 RCX=0000000000000000 RDX=00007fd754f2dff0
...
EFER=0000000000000d01
Code=f3 41 0f 11 07 f3 0f 11 0a 4c 01 e2 f3 0f 11 0a 66 0f ef c9 <f3> 42 0f 11 04 22 8d 14 08 44 01 e8 66 0f ef c0 01 c8 01 d2 01 c0 f3 0f 2a ca 8b 54 24 50

as soon as the desktop environment starts :-(

Any ideas? Followed the tutorial 1:1

This is very likely a transparent huge page issue via KVM that has been fixed in newer kernels (6.13 or later), which have https://lore.kernel.org/all/20241010182427.1434605-1-seanjc@google.com/. Or you can recompile your current host kernel with CONFIG_TRANSPARENT_HUGEPAGE disabled to give it another try.

@amshafer

I'll draft a workaround in host venus (vkr) to explicitly wait for the implicit fence before submitting to the nv driver.

@zhangyiwei Sorry if I missed it in this thread but did you ever try this workaround?

In gdb I see the virtio app issue a WaitForFences, but the vkWaitForFences issued in vkr_dispatch_vkWaitForFences returns immediately and succeeds. I'm a little confused why the app is stuck waiting in vn_update_sync_result if the server side command already completed? I'm guessing I am not understanding something about the command dispatch correctly.

@zzyiwei

zzyiwei commented May 13, 2025

I'll draft a workaround in host venus (vkr) to explicitly wait for the implicit fence before submitting to the nv driver.

@zhangyiwei Sorry if I missed it in this thread but did you ever try this workaround?

In gdb I see the virtio app issue a WaitForFences, but the vkWaitForFences issued in vkr_dispatch_vkWaitForFences returns immediately and succeeds. I'm a little confused why the app is stuck waiting in vn_update_sync_result if the server side command already completed? I'm guessing I am not understanding something about the command dispatch correctly.

It turns out no workaround is needed on the renderer side. The guest virtgpu fence has the correct payload installed. So on drivers without implicit fencing support, the issue can be mitigated with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34516. I just closed that MR since I figured out a better way to do it within venus. It would be similar, but with no need to touch common vk there, so that review can be faster.

@zzyiwei

zzyiwei commented May 14, 2025

@amshafer you still have to force prime blit via venus. Here's the hack at this point. Will make it the default for venus on nv proprietary later.

@pakin1

pakin1 commented May 14, 2025

Works like a charm with QEMU command line.
Now I need to figure out how to use this with virt manager.

EDIT:

LG GRAM 2022 - 17Z90Q-G.AD78Y 
Linux 6.14.5-300.fc42.x86_64
OS:  Aurora Version: latest-42.20250512.2
CPU: 12th Gen Intel(R) Core(TM) i7-1260P
IGPU: Intel Iris Xe Graphics @ 1.40 GHz
MESA : mesa-dri-drivers-25.0.2-1.fc42.x86_64
Virglrenderer: virglrenderer-1.1.0-2.fc42.x86_64

@myrslint anything else I forgot to mention that could be useful for others?
Regards

@myrslint

myrslint commented May 14, 2025

Works like a charm with QEMU command line. 6.14.5-300.fc42.x86_64

Now I need to figure out how to use this with virt manager.

It would be nice if any users who report a working or non-working result also included the relevant parts of their hardware and software configuration e.g., CPU make and model, GPU make and model, Linux distribution on the host and the guest, versions of Mesa and virglrenderer and their patch states.

@myrslint anything else I forgot to mention that could be useful for others? Regards

Looks comprehensive to me. Thank you!

@amshafer

@zhangyiwei unfortunately even with both of those (34516 and force blit) I still see the same "stuck in fence wait" others have reported. Are those known to fix that issue? I'm happy to help test whatever your venus-only follow up solution is if that helps.

@zzyiwei

zzyiwei commented May 14, 2025

@zhangyiwei unfortunately even with both of those (34516 and force blit) I still see the same "stuck in fence wait" others have reported. Are those known to fix that issue? I'm happy to help test whatever your venus-only follow up solution is if that helps.

If the VN_PERF=no_fence_feedback env var can work around that, you might be missing the host kvm bits I mentioned on https://docs.mesa3d.org/drivers/venus.html for NV; for the final shape of the reland, see https://gitlab.freedesktop.org/mesa/mesa/-/issues/12806#note_2861587.

@zzyiwei

zzyiwei commented May 14, 2025

@zhangyiwei unfortunately even with both of those (34516 and force blit) I still see the same "stuck in fence wait" others have reported. Are those known to fix that issue? I'm happy to help test whatever your venus-only follow up solution is if that helps.

In addition, that force blit hack is for x11 only. The proper way to force it via venus for both x11 and wayland has been added in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34965.

@amshafer

I'm actually reproducing this without any QEMU/VM/etc setup at all, running virgl_test_server --venus on the NVIDIA discrete GPU and vkcube --wsi wayland on virtio. Someone from this thread had pointed me at that and I was investigating to see if we have any NVIDIA bugs on our driver side, although so far I don't see anything obvious. I think because I'm not using QEMU I don't need the KVM patch you mentioned?

VN_PERF=no_fence_feedback does not help my case unfortunately. Manually turning on prime blit or using 34965 also did not help.

@zzyiwei

zzyiwei commented May 14, 2025

I'm actually reproducing this without any QEMU/VM/etc setup at all, running virgl_test_server --venus on the NVIDIA discrete GPU and vkcube --wsi wayland on virtio. Someone from this thread had pointed me at that and I was investigating to see if we have any NVIDIA bugs on our driver side, although so far I don't see anything obvious. I think because I'm not using QEMU I don't need the KVM patch you mentioned?

VN_PERF=no_fence_feedback does not help my case unfortunately. Manually turning on prime blit or using 34965 also did not help.

You'd better test with a VM setup instead, because it's purely impl-defined behavior for vtest. Venus over vtest can only work if the host driver supports dma-buf export AND the driver's hook for dma-buf mmap sets the correct pat entry. The reason is that the dma-buf mmap uapi has an explicit flush/invalidate call sequence, so some implementations just map those cached non-coherent since the cache can be handled at the dma-buf flush/invalidate api boundary. E.g. Venus on Intel MTL over vtest won't work for this reason.

There's no plan to extend vtest protocol to support opaque fd export for device memory, so we are unable to steer to use Vulkan vkMapMemory to rely on what vk api guarantees for coherent memory type.

Once you have a VM setup, since you have control over the host driver, you can also try with VK_EXT_external_memory_dma_buf extension enabled/disabled from NV.
