xemu's current handling (at least through v0.7.55) of GPU draws interleaved with direct CPU-based manipulation of (guest) VRAM leads to incorrect results.
The nv2a uses the concept of two "surfaces" to control the output of 3D GPU-based draws: a color surface and a zeta (generally depth or depth+stencil) surface. These are configured via commands such as `NV097_SET_SURFACE_PITCH`, `NV097_SET_SURFACE_FORMAT`, `NV097_SET_CONTEXT_DMA_COLOR`, and `NV097_SET_SURFACE_COLOR_OFFSET` (amongst others).
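As a rough mental model, these commands together describe a small piece of state that the emulator must track. The sketch below is an illustrative host-side model, not xemu's actual structures; the field packing of `NV097_SET_SURFACE_PITCH` (color pitch in the low 16 bits, zeta pitch in the high 16 bits) is stated here as an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the surface state configured by the
 * NV097_SET_SURFACE_* commands. Not xemu's actual representation. */
typedef struct {
    uint32_t color_pitch;  /* bytes per row of the color surface */
    uint32_t zeta_pitch;   /* bytes per row of the depth/stencil surface */
    uint32_t color_offset; /* offset into the DMA context selected via
                              NV097_SET_CONTEXT_DMA_COLOR */
    uint32_t format;       /* raw NV097_SET_SURFACE_FORMAT parameter */
} SurfaceConfig;

/* Assumed packing: color pitch in bits 0-15, zeta pitch in bits 16-31. */
static void apply_set_surface_pitch(SurfaceConfig *s, uint32_t param) {
    s->color_pitch = param & 0xFFFFu;
    s->zeta_pitch = (param >> 16) & 0xFFFFu;
}
```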
In practice, the surface configuration only affects how actual GPU draws are translated into VRAM, and thus may be combined with CPU-based direct manipulation of VRAM. For example, in Pirates: The Legend of Black Kat, the game configures an anti-aliased surface pointing at the backbuffer and does some 3D rendering. It later performs CPU-based writes directly to the backbuffer, copying decoded video frames from the intro FMV. These CPU-based writes match the framebuffer configuration (presumably set via a prior call to `AvSetDisplayMode`) and do not contain anti-aliased data. Since the display of the framebuffer is governed by `AvSetDisplayMode`, the resulting output looks correct.
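The key point is that the CPU-side blit addresses the backbuffer purely via the display-mode pitch, with no knowledge of any nv2a surface configuration. A minimal sketch of that addressing pattern (names and dimensions are illustrative, not taken from the game):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative CPU-side blit: copies a decoded frame into the backbuffer
 * row by row, using only the display pitch established by the display
 * mode. The nv2a surface configuration plays no role here. */
static void cpu_blit_frame(uint8_t *backbuffer, uint32_t display_pitch,
                           const uint8_t *frame, uint32_t frame_pitch,
                           uint32_t width_bytes, uint32_t height) {
    for (uint32_t y = 0; y < height; ++y) {
        memcpy(backbuffer + y * display_pitch,
               frame + y * frame_pitch,
               width_bytes);
    }
}
```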
In an attempt to minimize copying between host RAM and the GPU, xemu captures `SurfaceBinding` instances when performing (guest) GPU-based mutations. These store the surface configuration that was active at the time of a draw, as well as the name of the GL buffer that was created using that configuration.
When applying CPU-based mutations, xemu continues to use this surface configuration, assuming that the content in VRAM matches the format that was configured for GPU draws. This leads to incorrect behavior in most of the aforementioned tests, as the format of VRAM is arbitrary and its interpretation is configured via `AvSetDisplayMode` (for the framebuffer) or the appropriate texture configuration settings (e.g., `NV097_SET_TEXTURE_FORMAT`, `NV097_SET_TEXTURE_IMAGE_RECT`, ...) in the case of textures.
xemu also uses the `SurfaceBinding` when performing optimized rendering of the final host output in `nv2a_get_framebuffer_surface` via `pgraph_gl_sync`.
In the case of #652, this results in an incorrect pitch and size as the anti-aliasing format used in a (now stale and
irrelevant) previous draw is applied to a region of guest VRAM that has been entirely overwritten by the CPU.
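To make the pitch/size mismatch concrete: an anti-aliased surface is supersampled, so it is larger than the framebuffer region it nominally covers. The scale factors below (2x width for `CENTER_CORNER_2`, 2x2 for `SQUARE_OFFSET_4`) reflect my understanding of how these modes are handled and should be treated as an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed supersampling factors for the nv2a anti-aliasing modes. */
enum AntiAliasingMode {
    AA_CENTER_1,        /* no supersampling */
    AA_CENTER_CORNER_2, /* assumed 2x horizontal supersampling */
    AA_SQUARE_OFFSET_4  /* assumed 2x2 supersampling */
};

/* Size in bytes of the surface backing a width x height render target. */
static uint32_t surface_bytes(enum AntiAliasingMode aa, uint32_t width,
                              uint32_t height, uint32_t bytes_per_pixel) {
    uint32_t sx = (aa == AA_CENTER_1) ? 1 : 2;
    uint32_t sy = (aa == AA_SQUARE_OFFSET_4) ? 2 : 1;
    return (width * sx) * bytes_per_pixel * (height * sy);
}
```

A stale binding created with `AA_SQUARE_OFFSET_4` thus describes a region four times the size of the framebuffer the CPU actually wrote, so reusing its pitch and extent for CPU-written data cannot produce correct output.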
There is an additional failure path (see #1165) where a game can use PVIDEO to render without initializing a 0x97 surface at all. xemu currently detects that no surface exists, falls back to blitting the framebuffer VRAM, and skips rendering the PVIDEO overlay entirely.
xemu has a fallback path (`xb_surface_gl_create_texture`) that creates a GL texture appropriate for the current framebuffer configuration. This fallback could be replaced with a direct upload to `gl_display_buffer` in `pgraph_render_display`. Specifically, if xemu stops returning early here, the fallback handling in `sdl2_gl_refresh` would only be necessary in the case where VGA has not yet been initialized (`framebuffer == NULL`).
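The proposed decision logic might look roughly like the following sketch. This is purely hypothetical: the stub type, function name, and return labels are mine, not xemu's, and only illustrate the three cases described above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for xemu's SurfaceBinding; only models validity. */
typedef struct {
    int valid;
} SurfaceBindingStub;

/* Sketch of the proposed display-path selection:
 *  - no framebuffer yet   -> nothing to display (VGA not initialized)
 *  - valid surface binding -> composite from the existing GL surface
 *  - otherwise             -> upload guest VRAM to gl_display_buffer
 *    instead of skipping rendering (the current fallback behavior). */
static const char *choose_display_source(const SurfaceBindingStub *binding,
                                         const void *framebuffer) {
    if (framebuffer == NULL) {
        return "none";
    }
    if (binding != NULL && binding->valid) {
        return "surface";
    }
    return "vram-upload";
}
```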
This would also require some minor additional handling to decouple the GL sync from PFIFO activity (see `nv2a_get_framebuffer_surface`).
- Verify that #652 and #1165 are resolved
- Verify that the multiframe test in https://github.com/abaire/nxdk_pgraph_tests/blob/main/src/tests/antialiasing_tests.cpp works as expected and does not suffer from frame duplication/skipping
The nxdk_pgraph_tests already contain a test that performs the following:
- CPU blit to VRAM
- Configure a surface whose format is non-standard (e.g., using anti-aliasing or a pitch != `width * bpp`)
- Do a nop draw (this triggers xemu to create the surface binding and upload guest VRAM to the host GPU)
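The "pitch != `width * bpp`" case is easy to demonstrate arithmetically: row addressing depends on the pitch, so a consumer that assumes a packed pitch reads the wrong rows. The numbers below are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of pixel (x, y) in a linear surface with the given pitch. */
static uint32_t pixel_offset(uint32_t x, uint32_t y, uint32_t pitch,
                             uint32_t bpp) {
    return y * pitch + x * bpp;
}
```

For a 640-pixel-wide, 4 bpp surface with a padded pitch of 2816 bytes, row 1 starts at offset 2816, whereas a consumer assuming a packed pitch (640 * 4 = 2560) would read row 1 at offset 2560, 256 bytes too early, and the error grows with each subsequent row.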
See xemu-project/xemu#652 (comment) for the output of most of the aforementioned tests in xemu 0.7.55.