Rendering Frames

Preamble

What this is

This document details the planned approach for a rendering system I intend to implement, the reasons driving that approach, and the general rationale for the new design.

Who should read this

If you're interested in rendering APIs and want to look at different approaches, this document might be for you. I'm not claiming to be an expert in any field, especially not one as diverse and dynamic as rendering approaches, so treat everything here like I'm building a wall to keep good technology out and making you pay for it. In other words, consider whether better ways might exist and whether either of us is biased toward something.

Why I'm writing this

Okay, so Doll could really use a revamp of its general rendering design. Parts of the design are inelegant, some are bugs waiting to happen, and some bugs are already present but just haven't been triggered yet:

  • Resources are not tracked! We should be aware of which resources are needed, and the order they're needed in.
  • Commands cannot be submitted asynchronously! We can't generate rendering commands on a separate thread then pass them along.
  • Rendering abstraction is not well suited to modern workloads! We can't use shaders, nor does the abstraction API get enough high-level information to properly optimize.
  • Inefficient for lower-level APIs! We are unable to make effective use of Vulkan, Direct3D 12, and Metal with our existing design. They could be used, but it would not be proper use.
  • Overused memory caches not being cleared on OS request! We do not respect low-memory scenarios even though we could. Some operating systems will let us know when we're running low on memory, yet we do not clear cached data (system-side or GPU-side) that could be cleared. (e.g., shrinking an automatically created texture atlas while its resources aren't being used.)
  • Clumsy design! The whole API grew ad hoc, accruing technical debt that we want to eliminate.

I'm planning to write a wrapper around Doll for Tenshi – a BASIC-like language I'm slowly developing for nostalgic/hobbyist purposes – and want to improve the general design and eliminate as many bugs as I can, to make working with both of these easier.

Design

Is this user-facing?

NO. This is an internal design; the average user of the API is very unlikely to have to deal with any detail of it. This is for implementing the internals of the engine, not for using the engine.

Existing work

Portions of this are heavily inspired by:

Layers

Rendering is divided into "layers," which can be thought of as similar to layers in a painting program like GIMP or Photoshop. Layers are arranged in a tree hierarchy that is rendered breadth-first, so super-nodes are drawn visually underneath their sub-nodes. Sub-nodes of the tree cannot visually extend past their super-nodes, however; they are clipped, as with a standard UI.
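As a rough sketch of that traversal – the Layer type, Rect, and intersectRects() below are hypothetical stand-ins, not the real engine types:

#include <deque>
#include <utility>
#include <vector>

struct Rect { float x, y, w, h; };

// Hypothetical clip-rect intersection; the engine would have its own
Rect intersectRects( const Rect &a, const Rect &b );

struct Layer {
    Rect                 bounds;   // clip rect inherited by sub-nodes
    std::vector<Layer *> children; // sub-nodes, drawn over this layer

    void draw( const Rect &clip ); // issues this layer's cached commands
};

// Breadth-first traversal: every super-node is drawn before its
// sub-nodes, so parents end up visually underneath their children
void renderLayerTree( Layer &root ) {
    std::deque<std::pair<Layer *, Rect>> queue;
    queue.push_back( { &root, root.bounds } );

    while( !queue.empty() ) {
        auto [ layer, clip ] = queue.front();
        queue.pop_front();

        layer->draw( clip );

        for( Layer *child : layer->children ) {
            // Sub-nodes are clipped against their super-node's bounds
            queue.push_back( { child, intersectRects( clip, child->bounds ) } );
        }
    }
}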

The rendering API receives commands like draw(image: someTexture, at: somePosition) through per-layer command queues. Each layer caches the commands it has received. When it is time to draw the layer, these commands are enumerated (but not drained), causing a so-called "primitive buffer" to be filled with the raw vertex data. When a state change is made necessary (e.g., the "real texture" – more on that later – changes) a draw call will be issued with that data.
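In sketch form, drawing a layer might work roughly like this; DrawCmd, PrimitiveBuffer, and the helper functions are assumptions for illustration, not the actual API:

#include <vector>

struct Vec2 { float x, y; };
struct RTexture;
struct PrimitiveBuffer; // accumulates raw vertex data

// One cached command, as received through the layer's command queue
struct DrawCmd {
    const RTexture *image;
    Vec2            at;
};

// Hypothetical helpers
const RTexture *resolveRealTexture( const RTexture *image );
void appendQuadVertices( PrimitiveBuffer &buf, const DrawCmd &cmd );
void issueDrawCall( PrimitiveBuffer &buf, const RTexture *realTex );

void drawLayer( const std::vector<DrawCmd> &commands, PrimitiveBuffer &buf ) {
    const RTexture *boundTex = nullptr;

    // Enumerate the cached commands without draining them, so the
    // layer can be redrawn later without being re-submitted to
    for( const DrawCmd &cmd : commands ) {
        const RTexture *realTex = resolveRealTexture( cmd.image );

        // A needed state change (here, the real texture) forces a
        // draw call with the vertex data accumulated so far
        if( boundTex != nullptr && realTex != boundTex ) {
            issueDrawCall( buf, boundTex );
        }
        boundTex = realTex;

        appendQuadVertices( buf, cmd );
    }

    if( boundTex != nullptr ) {
        issueDrawCall( buf, boundTex );
    }
}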

Layers also partially take care of "sprite groups."

Layers: Revisions to be made

Nothing major, currently. The general design is acceptable. Minor gardening improvements may make their way in though, and unused features (like the "layer effect system") might get axed if no specific need for them is rediscovered.

Command submission should be adjusted so that commands are broken up into individual "command packets," each diced up and ready for rendering submission with all associated state. In that scenario the original command queue would be retained only in "DEVELOPMENT" and "DEBUG" build variants, rather than in "PROFILE" and "RELEASE" variants.
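A command packet in that sense might carry something like the following; the fields and the build-variant macros are placeholders, not the real ones:

#include <cstdint>
#include <vector>

struct RTexture;

// A pre-diced, self-contained unit of work: everything needed to
// submit one draw, with no further translation at render time
struct CommandPacket {
    const RTexture *realTexture; // resolved "real texture" to bind
    uint32_t        stateBits;   // blend mode and other pipeline state
    uint32_t        firstVertex; // range within the primitive buffer
    uint32_t        vertexCount;
};

struct LayerCommandStorage {
    std::vector<CommandPacket> packets;

#if defined( DOLL_BUILD_DEVELOPMENT ) || defined( DOLL_BUILD_DEBUG )
    // The original high-level command queue is retained only in
    // DEVELOPMENT and DEBUG builds, for inspection and tooling
    std::vector<uint8_t> sourceCommandQueue;
#endif
};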

Likewise, there's a lot of room for improving responses to "low memory" situations.

Sprite groups

Sprite groups are collections of "sprites" with settings describing how they are to be rendered. Layers invoke sprite group rendering routines when the layer is rendered. The sprite group then applies a projection and other transformations to the sprites being rendered (at some virtual resolution).

When rendered, the sprites insert themselves into a primitive buffer separate from the one that the layer's independent rendering commands go into; this primitive buffer is per sprite group. A separate buffer is used, rather than reusing the layer's, because sprites and user-issued render commands do not necessarily get updated in lock-step. It would also theoretically allow a sprite group to be reused in another layer, as only the projection would need to change; the sprites would not need to be re-updated for those additional layers, not that doing so is a common case (or even supported at a high level).
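A minimal sketch of that relationship, with SpriteGroup, Sprite, Mat4, and the helper functions all invented for illustration:

#include <vector>

struct Sprite { float x, y, w, h; }; // stand-in sprite data
struct Mat4   { float m[16]; };      // stand-in matrix type
struct PrimitiveBuffer { std::vector<float> vertices; };

// Hypothetical helpers
void appendSpriteVertices( PrimitiveBuffer &buf, const Sprite &s );
void submitWithProjection( const PrimitiveBuffer &buf, const Mat4 &projection );

class SpriteGroup {
public:
    // Invoked by a layer when that layer is rendered
    void render( const Mat4 &projection ) {
        // The group's own primitive buffer is refilled only when the
        // sprites actually changed; it is separate from the layer's
        // buffer because sprites and user commands don't update in
        // lock-step
        if( m_dirty ) {
            m_primBuffer.vertices.clear();
            for( const Sprite &s : m_sprites ) {
                appendSpriteVertices( m_primBuffer, s );
            }
            m_dirty = false;
        }

        // Reusing the group from another layer only requires a
        // different projection; the vertex data is untouched
        submitWithProjection( m_primBuffer, projection );
    }

private:
    std::vector<Sprite> m_sprites;
    PrimitiveBuffer     m_primBuffer; // per sprite group, not per layer
    bool                m_dirty = true;
};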

Sprite groups: Revisions to be made

The overall sprite group design is fine; however, the "virtual resolution" they offer, along with similar per-group settings, should be moved into layers, as that would ensure a more consistent experience and improve the general high-level rendering API.

Sprites themselves should be made more general. They do not allow different images to be used in animations, for example, nor do they offer the ability to render anything other than quad-mapped images. They also do not allow for more advanced "stencil rendering" methods. These aren't quite the sort of low-level details I intend to go over here, though.

Resources

We need frequent access to resources like textures and buffers. In the lower-level APIs and in modern API usage we also require shaders, "render pipeline objects," and similar settings management for things such as uniform buffers. All of this needs to fit within the general immediate-style API as well.

Textures

Currently the only resource users can pass in directly to a rendering command is a texture, so let's focus on that.

When a texture is given to the user, what the user actually gets is a handle to a portion of a GPU-backed texture. For example, suppose you were to load, one by one, several 32x128 images: "plr-idle-0.png," "plr-idle-1.png," "plr-idle-2.png," "enemy1-idle-0.png," "enemy2-idle-0.png," and so on. Rather than allocating a separate GPU resource for each of these, the individual images are loaded into a "texture atlas": a single large texture that many smaller images are packed into. The atlas itself might be 2048x2048 texels in size, allowing many images to be loaded into it.

Thanks to this approach, state changes needed for rendering commands, and thus draw calls, are reduced significantly. (It's not unreasonable for an entire scene's resources to live inside one texture.)
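Conceptually, the handle the user holds might look something like this (purely illustrative; the engine's actual RTexture layout is internal):

#include <cstdint>

struct AtlasPage; // the actual GPU texture, e.g. 2048x2048 texels

// What the user sees as "a texture" is really a small region of an
// atlas page; a 32x128 "plr-idle-0.png" occupies only a sliver of it
struct RTextureHandle {
    AtlasPage *page;
    uint16_t   x, y, w, h; // texel rectangle within the page
};

// Two draws whose handles share a page need no texture rebind, which
// is why an entire scene's resources can often live in one texture
inline bool sameRealTexture( const RTextureHandle &a, const RTextureHandle &b ) {
    return a.page == b.page;
}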

However, we do not currently track which textures are needed. Some systems (such as the "OSText renderer") create temporary textures for user convenience, then manage those textures themselves. This is very error-prone. Take the following, for example:

void renderSomethingEachFrame() {
    static RTexture *oldTex = nullptr;

    // Delete the texture created by the previous call; nothing checks
    // whether queued commands are still referencing it
    gfx_deleteTexture( oldTex );
    AX_EXPECT_MEMORY( oldTex = gfx_newTexture( ... ) );

    // Queues a blit referencing the new texture; the command isn't
    // executed until the layer is actually drawn
    gfx_blitImage( oldTex, ... );
}

Suppose the above function is called once each frame. No problem! Now suppose the above function is called twice each frame... Well, we're not currently tracking which textures are still being used by the command queue. So deleting the previous texture, as we diligently do in each invocation, breaks a texture the queued blit commands still reference, so one of the blits will be very wrong (or will sometimes crash). Or it could work perfectly fine on your machine, but not on anyone else's.

Okay, let's say we took that into account, storing each texture we allocated and deleting it on the next frame. That works if all of the layers you're rendering into are updated every frame... but that isn't necessarily the case. Some layers may be generated once, then never updated again, allowing quick rendering of the layer when necessary.

In that case, we could have an image (or several) that was valid on the frame the layer was generated, but became invalid on subsequent frames (even though it might not visually look invalidated for quite some time). For instance, say a texture is created on frame 3 and referenced by layer A that same frame. Then comes frame 4, and that texture gets deleted without another one taking its place. The texture has become invalid, so the layer is now referencing corrupt data (even if it doesn't appear that way yet).

Textures and resources: Revisions to be made

  • TRACK RESOURCES. This can be done at the command queue level and managed with internal reference counting. Command queues should generate a list of which resources are inputs and which resources are outputs. (See the sketch after this list.)
  • Some resources should be marked as "caches" which can be regenerated. Texture atlases should be markable as shrinkable, allowing "low memory" notifications to adjust the texture.
  • Introduce an LRU (least recently used) cache system with abstract "texture source" interfaces. A texture source implementation could be a file reader or a procedural generator, for example. This facilitates hot-reload support (e.g., changing a texture file would optionally allow the asset to be reloaded automatically), as well as allowing eviction of resources that can be reloaded or remade, which would significantly help with low-memory notifications.
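As a sketch of the first point – names and structure assumed, not a final design – the command queue would pin every resource it references with a reference count, so a user-level delete merely drops the user's reference:

#include <vector>

// Hypothetical refcounted base for trackable resources
class RResource {
public:
    void grab() { ++m_refCount; }
    void drop() {
        if( --m_refCount == 0 ) {
            destroy();
        }
    }

private:
    void destroy() { /* release GPU/system-side storage here */ }
    int  m_refCount = 1;
};

// The command queue tracks which resources it reads and writes
class RCommandQueue {
public:
    // Called while commands are recorded
    void addInput( RResource *res )  { res->grab(); m_inputs.push_back( res ); }
    void addOutput( RResource *res ) { res->grab(); m_outputs.push_back( res ); }

    // Called once the GPU has finished executing this queue's commands
    void releaseResources() {
        for( RResource *r : m_inputs  ) { r->drop(); }
        for( RResource *r : m_outputs ) { r->drop(); }
        m_inputs.clear();
        m_outputs.clear();
    }

private:
    std::vector<RResource *> m_inputs;  // resources the commands read
    std::vector<RResource *> m_outputs; // resources the commands write
};

Under a scheme like that, the renderSomethingEachFrame() example above stops being a hazard: gfx_deleteTexture() would only drop the user's reference, and the texture would stay alive until every queue referencing it has been released.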

TO-DO: More later.
