@RSDuck
Last active April 12, 2024 22:48
deko3d Notes

So you want to use deko3d for maximum speed and minimum bloat?

Great!

deko3d is fantastic, working on the same layer of abstraction or even lower than Vulkan, but (even without the C++ wrapper) you tend to only need half as much code to do the same thing.

I've used it for a while now and kind of done the same thing wrong one too many times, so here's a list of those things.

Some things listed here are also documented in the primer while others are not documented there.

If some of these questions seem contrived, they aren't; they are the result of a lot of trial and error.

How can I barrier stuff which uses the copy engine (dkCmdBufCopyBuffer, dkCmdBufCopyBufferToImage, dkCmdBufCopyImageToBuffer) or 2D engine (dkCmdBufBlitImage, dkCmdBufResolveImage)?

You don't need to. You don't even need a barrier before the command if all you want is for everything on the 3D engine to finish first and for later 3D work to start only after the 2D/copy engine is done, and you don't need to worry about cache coherency either*. GPU command buffers have a concept of subchannels, and each engine has one. Switching from one subchannel to another flushes the commands, and deko3d surrounds each copy engine and 2D engine command with nops on the 3D engine.

* It's not 100% confirmed, but I've tested it in a few different scenarios, updating a texture between draw calls, and there were never any cache coherency issues.

I don't fully understand this, but when doing a readback from the GPU (via dkCmdBufCopyImageToBuffer) and then immediately waiting on the result with a fence it is necessary to insert a primitive barrier and an L2 cache flush. While the latter makes sense (as deko3D will only insert a cache flush to be executed at the beginning of the next command list), the fence wait shouldn't be visible?

Can 2D engine operations have the same source and destination, like glCopyImageSubData or glBlitFramebuffer?

Yes, though overlapping source and destination regions are undefined behaviour, like in OpenGL.

For fun's sake: what seems to happen is that it actually manages to integrate the in-progress result, producing fractal-like images with a few glitches in the lowest levels.

Thanks to Pharynx for helping with the last two.

I push data with dkCmdBufPushData to descriptors/samplers/whatever, similar to dkCmdBufPushConstants (i.e. multiple times between draw calls with no barrier in between). Why doesn't it update the data?

Contrary to its name, dkCmdBufPushData doesn't seem to have push semantics, but it also doesn't use the copy engine (which would result in everything being ordered correctly). It uses the 3D engine and thus needs explicit fences and cache flushes for correct ordering.

Though it's probably more advisable to just use a larger descriptor buffer containing all the descriptors than to insert a bunch of slow barriers.
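
One way to follow that advice is to give every draw its own slot in a larger descriptor buffer, so a slot that an earlier draw still refers to is never overwritten. Below is a minimal sketch of just the bookkeeping in plain C. `ExDescriptorRing` and the `ex_ring_*` functions are hypothetical helpers, not part of deko3d, and the descriptor size is left as a parameter (I'd pass something like `sizeof(DkImageDescriptor)`, but check that against your header).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: hands out one descriptor slot per draw from a
   larger buffer instead of rewriting the same slot between draws. */
typedef struct {
    uint64_t base_addr;       /* GPU address of the descriptor buffer (assumed suitably aligned) */
    uint32_t descriptor_size; /* bytes per descriptor, supplied by the caller */
    uint32_t capacity;        /* number of slots in the buffer */
    uint32_t next;            /* next free slot in the current frame */
} ExDescriptorRing;

/* Call once per frame, after the GPU is done with the previous contents. */
static void ex_ring_reset(ExDescriptorRing *ring) {
    ring->next = 0;
}

/* Returns the slot id for the next draw, or UINT32_MAX if the buffer is full. */
static uint32_t ex_ring_alloc(ExDescriptorRing *ring) {
    if (ring->next >= ring->capacity)
        return UINT32_MAX;
    return ring->next++;
}

/* GPU address at which the descriptor for `slot` lives. */
static uint64_t ex_ring_slot_addr(const ExDescriptorRing *ring, uint32_t slot) {
    assert(slot < ring->capacity);
    return ring->base_addr + (uint64_t)slot * ring->descriptor_size;
}
```

With this, each draw only selects a different slot id, and the descriptors themselves can be filled in up front, so no slot is rewritten while earlier draws might still read it.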

A function which accepts a DkImageView segfaults?

DkImageView objects contain a pointer to the DkImage which was used to create it, so they may not outlive it.

I get weird errors with fences

The memory address of a fence is written into the command list and can't be changed afterwards, so make sure that memory stays where it is until the fence is signaled.

Secret deko3d info: internally there are external and internal fences. Almost all of your fences will be internal ones, to which this applies, but e.g. the fences from dkSwapchainAcquireImage are not; their memory is external (you can see this in action in the implementation of dkQueueAcquireImage, where the advice given here about the fence having to live long enough is simply ignored).

I get an "Attempt to submit to queue in error state" error, what should I do?

You probably misconfigured the drawing pipeline (the error should go away if you remove the draw calls). Double check all of it, especially the vertex buffer setup.

Happened to me twice: Passing DkStage_Vertex|DkStage_Fragment to dkCmdBufBindShaders when it should be DkStageFlag_Vertex|DkStageFlag_Fragment.
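
The mix-up compiles without complaint because both sets of constants are plain C enums. Here is a small sketch of what the wrong expression evaluates to. The `Ex*` enums are local stand-ins, and they assume (check this against deko3d.h) that DkStage values are stage indices and each DkStageFlag is `1 << index`.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins, not the real header. Assumption: stages are indices
   and each stage flag is 1 << index. */
enum ExStage {
    ExStage_Vertex   = 0,
    ExStage_TessCtrl = 1,
    ExStage_TessEval = 2,
    ExStage_Geometry = 3,
    ExStage_Fragment = 4
};

enum ExStageFlag {
    ExStageFlag_Vertex   = 1u << ExStage_Vertex,
    ExStageFlag_TessEval = 1u << ExStage_TessEval,
    ExStageFlag_Fragment = 1u << ExStage_Fragment
};

/* The typo: OR-ing stage indices. 0 | 4 is 4, i.e. only bit 2 is set. */
static uint32_t ex_wrong_mask(void) {
    return ExStage_Vertex | ExStage_Fragment;
}

/* The intended mask: bit 0 (vertex) and bit 4 (fragment). */
static uint32_t ex_right_mask(void) {
    return ExStageFlag_Vertex | ExStageFlag_Fragment;
}
```

Under these assumed values the wrong mask is the same number as the tessellation evaluation flag alone, so neither the vertex nor the fragment stage would be selected.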

I get freezes once I try executing inside the acquire/present loop?

The queue needs to be flushed, otherwise the already submitted command buffers will never start executing and a deadlock occurs. dkQueuePresentImage flushes the queue.

My command lists never execute, even though I flushed the queue with dkCmdBufSignalFence??

The flush parameter of dkCmdBufSignalFence doesn't flush the queue, it only flushes the cache. Only calling dkQueueFlush (or certain other operations, like presenting an image) actually flushes the queue.
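
To make the distinction concrete, here is a toy model in plain C, not deko3d code. `ExQueue`, `ExFence` and the `ex_*` functions are hypothetical stand-ins that only mimic the behaviour described above: submitted work sits in the queue, the fence's flush argument changes nothing about that, and only a queue flush gets things moving.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not deko3d: a fence is just a flag. */
typedef struct {
    bool signaled;
} ExFence;

/* Toy model of a queue holding submitted work that has not started yet. */
typedef struct {
    ExFence *pending[8]; /* fences signaled by the waiting command lists */
    int      count;
} ExQueue;

/* Models submitting a command list that ends in a fence signal.
   `cache_flush` models the flush parameter of the signal command:
   it concerns caches only, so the model deliberately ignores it. */
static void ex_submit_with_fence(ExQueue *q, ExFence *f, bool cache_flush) {
    (void)cache_flush;
    assert(q->count < 8);
    f->signaled = false;
    q->pending[q->count++] = f;
}

/* Models flushing the queue: only now does submitted work start
   (and, in this toy, finish immediately). */
static void ex_queue_flush(ExQueue *q) {
    for (int i = 0; i < q->count; i++)
        q->pending[i]->signaled = true;
    q->count = 0;
}

/* Models a non-blocking check of the fence. */
static bool ex_fence_done(const ExFence *f) {
    return f->signaled;
}
```

In the model, the fence stays unsignaled after `ex_submit_with_fence(&q, &f, true)` and only flips once `ex_queue_flush(&q)` runs. On the real API the equivalent order, as I understand it, is to submit the command list, call dkQueueFlush, and only then wait on the fence.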

Also happened to me twice.

Some small texture sizes (e.g. 32x8) behave weirdly (e.g. parts are cut off)

devkitPro/deko3d#10

I have a cmdbuf with a callback to allocate further memory. Why doesn't it call it after calling dkCmdBufClear?

dkCmdBufClear does not remove all the memory associated with the command buffer, it only moves back to the beginning of the last appended slice.

I want to resolve only part of a multisampled (MSAA) framebuffer into another part of an image like in Vulkan, how can I do this?

Good news: resolving framebuffers seems to be just an image blit with a linear filter, and dkCmdBufBlitImage with DkBlitFlag_FilterLinear seems to work just as well while also allowing cropping and even scaling!

I'm protecting per-frame resources (cmdbuf memory, upload buffers, etc.) with dkQueueAcquireImage and dkQueuePresentImage (no other fences). Why do I get flickering/other glitches and instabilities?

dkQueueAcquireImage inserts a fence wait into the queue to wait for the framebuffer to be writeable again, but the framebuffer (and the associated resources) aren't necessarily writeable by the time the call returns. You either need to call dkSwapchainAcquireImage (instead of dkQueueAcquireImage) and then wait on the fence yourself (which would work but is stupid) or use your own (possibly finer grained) fences.
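
A rough sketch of the "own fences" option: one fence per frame in flight, kept in fixed storage so its address never moves, and checked on the CPU before that frame's resources are touched again. This is a toy model in plain C. `ExFrameRing`, `ExFence` and the `ex_*` functions are hypothetical, and the GPU's progress is faked with a counter. In real code the check would be a blocking fence wait and `ex_end_frame` would be where the fence gets signaled on the queue (dkFenceWait and dkQueueSignalFence, as far as I know).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_FRAMES_IN_FLIGHT 2

/* Toy stand-in for a fence: remembers which submitted frame it guards. */
typedef struct {
    uint64_t frame_id;
    bool     armed; /* false until the slot has been submitted once */
} ExFence;

typedef struct {
    ExFence  fences[EX_FRAMES_IN_FLIGHT]; /* fixed storage: addresses never move */
    uint64_t gpu_completed;               /* toy: highest frame id the GPU has finished */
    uint64_t next_frame;                  /* id of the next frame to record, start at 1 */
} ExFrameRing;

/* True when the per-frame resources of the next frame's slot may be
   rewritten: the slot was never submitted, or the GPU has finished
   the frame that last used it. */
static bool ex_can_begin_frame(const ExFrameRing *r) {
    const ExFence *f = &r->fences[r->next_frame % EX_FRAMES_IN_FLIGHT];
    return !f->armed || f->frame_id <= r->gpu_completed;
}

/* Call after submitting the frame's commands: arms the slot's fence
   and advances to the next frame. */
static void ex_end_frame(ExFrameRing *r) {
    ExFence *f = &r->fences[r->next_frame % EX_FRAMES_IN_FLIGHT];
    f->frame_id = r->next_frame;
    f->armed = true;
    r->next_frame++;
}
```

With two frames in flight, the third frame cannot begin until the GPU has finished the first one, which is the guarantee that acquire/present alone doesn't give you on the CPU side.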

How can shaders be unbound?

dkCmdBufBindShaders always replaces the entire shader pipeline. Unspecified stages are disabled.

Regarding the deferred shading deko3d example: is it possible to render "subpasses" (via tile barrier) and not use separate framebuffers for the different "subpasses" like in the example?

Looks like it works! Unlike Vulkan (where the special subpassLoad function in the shader has to be used) the same image is bound as render target and texture and then accessed with texelFetch. Accesses outside the same tile are funny.
