

@abaire
Last active July 5, 2022 18:17
xemu GPU + CPU draws result in incorrect output

Context

xemu's current handling (in v0.7.55 and below, at least) of GPU draws interleaved with direct CPU-based manipulation of (guest) VRAM leads to incorrect results.

See xemu-project/xemu#652

HW behavior

The nv2a uses the concept of two "surfaces" to control the output of 3D GPU-based draws, a color surface and a zeta (generally depth or depth+stencil) surface. These are configured via commands such as NV097_SET_SURFACE_PITCH, NV097_SET_SURFACE_FORMAT, NV097_SET_CONTEXT_DMA_COLOR, and NV097_SET_SURFACE_COLOR_OFFSET (amongst others).

In practice, the surface configuration only affects how actual GPU draws are translated into VRAM, so it may be combined with direct CPU-based manipulation of VRAM. For example, in Pirates: The Legend of Black Kat, the game configures an anti-aliased surface pointing at the backbuffer and does some 3D rendering. It later performs CPU-based writes directly to the backbuffer, copying decoded video frames from the intro FMV. These CPU-based writes match the framebuffer configuration (presumably set via a prior call to AvSetDisplayMode) and do not contain anti-aliased data. Since the display of the framebuffer is governed by AvSetDisplayMode, the resulting output looks correct.

xemu behavior

In an attempt to minimize copying between host RAM and GPU, xemu captures SurfaceBinding instances when performing (guest) GPU-based mutations. These store the surface configuration that was active at the time of a draw and hold the name of the GL buffer that was configured using that configuration.

When applying CPU-based mutations, xemu continues to use this surface configuration, assuming that the content in VRAM matches the format that was configured for GPU draws. This leads to incorrect behavior in most of the tests referenced below, because the format of VRAM is arbitrary: its interpretation is configured via AvSetDisplayMode (for the framebuffer) or, for textures, via the appropriate texture configuration settings (e.g., NV097_SET_TEXTURE_FORMAT, NV097_SET_TEXTURE_IMAGE_RECT, ...).

xemu also uses the SurfaceBinding when performing optimized rendering of the final host output in nv2a_get_framebuffer_surface via pgraph_gl_sync. In the case of #652, this results in an incorrect pitch and size, because the anti-aliasing format used in a previous (now stale and irrelevant) draw is applied to a region of guest VRAM that has since been entirely overwritten by the CPU.
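To make the pitch and size mismatch concrete, here is a small Python model of the arithmetic. It assumes a 640x480, 32bpp framebuffer, and it assumes the anti-aliasing modes scale the stored surface the way xemu appears to (2x horizontally for the 2-sample mode, 2x in both directions for the 4-sample mode). `AAMode`, `SurfaceConfig`, and its methods are hypothetical names for illustration, not xemu code.

```python
from dataclasses import dataclass
from enum import Enum


class AAMode(Enum):
    """Hypothetical stand-ins for the nv2a surface anti-aliasing modes."""
    CENTER_1 = 1         # no supersampling
    CENTER_CORNER_2 = 2  # assumed: stored 2x wider
    SQUARE_OFFSET_4 = 4  # assumed: stored 2x wider and 2x taller


@dataclass(frozen=True)
class SurfaceConfig:
    """Hypothetical description of how a VRAM region is interpreted."""
    width: int            # logical width in pixels
    height: int           # logical height in pixels
    bytes_per_pixel: int
    aa: AAMode = AAMode.CENTER_1

    def stored_dims(self) -> tuple[int, int]:
        """Dimensions actually occupied in VRAM after AA scaling."""
        w, h = self.width, self.height
        if self.aa in (AAMode.CENTER_CORNER_2, AAMode.SQUARE_OFFSET_4):
            w *= 2
        if self.aa is AAMode.SQUARE_OFFSET_4:
            h *= 2
        return w, h

    @property
    def pitch(self) -> int:
        """Bytes per stored row."""
        w, _ = self.stored_dims()
        return w * self.bytes_per_pixel

    @property
    def size(self) -> int:
        """Total bytes covered by this interpretation."""
        _, h = self.stored_dims()
        return self.pitch * h

    def row_offset(self, y: int) -> int:
        """Byte offset of the start of stored row y."""
        return y * self.pitch


# What the CPU actually wrote: a plain 640x480, 32bpp frame.
display = SurfaceConfig(640, 480, 4)
# What a stale binding from an earlier 4x AA draw still describes.
stale = SurfaceConfig(640, 480, 4, AAMode.SQUARE_OFFSET_4)

print(display.pitch, display.size)  # 2560 1228800
print(stale.pitch, stale.size)      # 5120 4915200
```

Under these assumptions, the stale binding claims a region four times the size of the CPU-written frame (3,686,400 extra bytes), and its row y begins where the CPU wrote row 2y, so output derived from the binding both skips rows and reaches past the end of the frame.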

There is an additional failure path (see #1165) where a game can use PVIDEO to render without initializing a 0x97 surface at all. xemu currently detects that no surface exists, falls back to blitting the framebuffer VRAM, and skips rendering the PVIDEO overlay entirely.

Proposed fix

Break dependence on 0x97 surface configuration

xemu has a fallback path (xb_surface_gl_create_texture) that creates a GL texture appropriate for the current framebuffer configuration.

This fallback could be replaced with a direct upload to gl_display_buffer in pgraph_render_display. Specifically, pgraph_render_display would stop returning early here, and the fallback handling in sdl2_gl_refresh would then only be necessary when VGA has not yet been initialized (framebuffer == NULL).

This would also require some minor additional handling to decouple the GL sync from PFIFO activity (see nv2a_get_framebuffer_surface).
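The control-flow change can be summarized with a simplified Python model. It only captures two decisions discussed here: which source feeds the display, and whether the PVIDEO overlay is drawn. `Source`, `DisplayState`, `current_path`, and `proposed_path` are hypothetical names; the real logic lives in pgraph_render_display and sdl2_gl_refresh and is more involved, and the overlay behavior when a surface binding exists is an assumption. The GL sync / PFIFO decoupling is not modeled.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Source(Enum):
    """Where the final display image comes from (hypothetical labels)."""
    VGA_FALLBACK = auto()      # VGA not initialized (framebuffer == NULL)
    SURFACE_BINDING = auto()   # cached 0x97 SurfaceBinding
    FRAMEBUFFER_BLIT = auto()  # current fallback (xb_surface_gl_create_texture)
    DIRECT_UPLOAD = auto()     # proposed upload to gl_display_buffer


@dataclass(frozen=True)
class DisplayState:
    """Hypothetical snapshot of what the display path can observe."""
    framebuffer_initialized: bool
    has_surface_binding: bool
    pvideo_active: bool


def current_path(s: DisplayState) -> tuple[Source, bool]:
    """Current behavior as described above; returns (source, draw_pvideo)."""
    if not s.framebuffer_initialized:
        return Source.VGA_FALLBACK, False
    if s.has_surface_binding:
        # Assumption: the overlay is composited when a binding exists.
        return Source.SURFACE_BINDING, s.pvideo_active
    # No 0x97 surface: blit framebuffer VRAM and drop the overlay (#1165).
    return Source.FRAMEBUFFER_BLIT, False


def proposed_path(s: DisplayState) -> tuple[Source, bool]:
    """Proposed behavior: 0x97 surface state is never consulted."""
    if not s.framebuffer_initialized:
        return Source.VGA_FALLBACK, False
    return Source.DIRECT_UPLOAD, s.pvideo_active


# The #1165 scenario: PVIDEO in use, no surface ever configured.
pvideo_only = DisplayState(framebuffer_initialized=True,
                           has_surface_binding=False, pvideo_active=True)
src, overlay = current_path(pvideo_only)
print(src.name, overlay)  # FRAMEBUFFER_BLIT False
src, overlay = proposed_path(pvideo_only)
print(src.name, overlay)  # DIRECT_UPLOAD True
```

The point of the model is that proposed_path never reads has_surface_binding, so stale 0x97 state can no longer change how the framebuffer is interpreted, and the PVIDEO overlay is drawn whether or not the game ever configured a surface.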

Acceptance tests

  1. Verify that #652 and #1165 are resolved
  2. Verify that the multiframe test in https://github.com/abaire/nxdk_pgraph_tests/blob/main/src/tests/antialiasing_tests.cpp works as expected and does not suffer from frame duplication/skipping

The nxdk_pgraph_tests already contain a test that performs the following:

  1. CPU blit to VRAM
  2. Configure a surface whose format is non-standard (e.g., using anti-aliasing or a pitch != width * bpp)
  3. Do a nop draw (this triggers xemu to create the surface binding and upload guest VRAM to host GPU)
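The test sequence above can be mimicked with a tiny Python model of guest VRAM. It is a sketch under simplifying assumptions: a 4x4 frame at 1 byte per pixel, a hypothetical 2x-horizontal anti-aliased surface whose pitch is twice the display pitch, and a binding-based readback that point-samples one of each pair of horizontal samples instead of performing the real downscale. None of these names correspond to nxdk_pgraph_tests or xemu code.

```python
W, H = 4, 4          # tiny frame for illustration
DISPLAY_PITCH = W    # 1 byte per pixel, pitch == width (display mode)
AA_PITCH = 2 * W     # hypothetical 2x-horizontal AA surface pitch


def cpu_blit(vram: bytearray) -> list[list[int]]:
    """Step 1: CPU writes a frame in the display format; returns the frame."""
    frame = [[y * W + x for x in range(W)] for y in range(H)]
    for y, row in enumerate(frame):
        start = y * DISPLAY_PITCH
        vram[start:start + W] = bytes(row)
    return frame


def read_via_binding(vram: bytes) -> list[list[int]]:
    """Steps 2-3 as xemu treats them: interpret VRAM with the AA surface
    layout, keeping the left sample of each horizontal pair (simplified)."""
    return [[vram[y * AA_PITCH + 2 * x] for x in range(W)] for y in range(H)]


def read_via_display_mode(vram: bytes) -> list[list[int]]:
    """Proposed: interpret VRAM with the display mode, ignoring the surface."""
    return [list(vram[y * DISPLAY_PITCH:y * DISPLAY_PITCH + W]) for y in range(H)]


vram = bytearray(AA_PITCH * H)  # large enough for either interpretation
expected = cpu_blit(vram)
print(read_via_display_mode(vram) == expected)  # True
print(read_via_binding(vram) == expected)       # False
```

In this model, reading through the surface layout yields [0, 2, 4, 6] for the first row instead of [0, 1, 2, 3], and the last two rows come from VRAM the CPU never wrote.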

xemu test results

See xemu-project/xemu#652 (comment) for the output of most of the aforementioned tests in xemu 0.7.55.
