Goals:
- support vsynced double and triple buffering
- zero buffer copies
User space execution flow:
- acquire buffer handle (can block if none is available yet)
- map buffer into virtual memory region
- draw into buffer
- unmap buffer
- queue buffer for display
Kernel execution flow:
- scanout current foreground buffer
- at end of frame, if no new buffer is queued, scan out the same buffer again (back to the first step)
- otherwise, the next queued buffer becomes the foreground buffer
- the previous foreground buffer becomes available to be acquired by user space
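
The two flows above can be modelled in a few lines. This is only a toy sketch to make the hand-off concrete, not an implementation: the class and method names (SwapChain, acquire, queue, vblank) are hypothetical, buffers are plain integers, the map/draw/unmap steps are left out, and acquire() returns None where a real call would block.

```python
from collections import deque


class SwapChain:
    """Toy model of the buffer hand-off; all names here are hypothetical."""

    def __init__(self, num_buffers):
        # Buffer 0 starts as the foreground buffer; the rest are free.
        self.foreground = 0
        self.free = deque(range(1, num_buffers))  # waiting to be acquired
        self.queued = deque()                     # waiting for scanout
        self.owned = None                         # held by user space

    def acquire(self):
        """User space: take a free buffer, or None where a real call would block."""
        if self.owned is not None:
            raise RuntimeError("user space already owns a buffer")
        if not self.free:
            return None
        self.owned = self.free.popleft()
        return self.owned

    def queue(self):
        """User space: hand the owned buffer back for display; never blocks."""
        if self.owned is None:
            raise RuntimeError("no buffer to queue")
        self.queued.append(self.owned)
        self.owned = None

    def vblank(self):
        """Kernel: at end of frame, flip to the next queued buffer if there is one."""
        if not self.queued:
            return self.foreground         # nothing new: scan out the same buffer again
        self.free.append(self.foreground)  # old foreground becomes acquirable
        self.foreground = self.queued.popleft()
        return self.foreground


chain = SwapChain(2)   # double buffering
chain.acquire()        # returns buffer 1
chain.queue()          # buffer 1 now waits for scanout
chain.acquire()        # returns None: buffer 0 is still being scanned out
chain.vblank()         # flips to buffer 1 and releases buffer 0
chain.acquire()        # returns buffer 0
```

With SwapChain(3), the second acquire() succeeds right away, which is the point of triple buffering: user space can start the next frame without waiting for the vblank.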
Buffer states:
- waiting for scanout: 0-2 buffers
- currently being scanned out: 1 buffer
- waiting to be acquired by user space: 0-2 buffers
- owned by user space: 0-1 buffers
The total number of buffers is 2 for double buffering and 3 for triple buffering. The mechanism would also work with more than 3 buffers, but the extra buffers would only add latency, with no clear benefit.
A buffer is put into "waiting for scanout" when user space hands it back to the kernel; this operation doesn't block. A buffer is put into "waiting to be acquired" on the vblank interrupt, provided a new buffer was waiting to replace it as the foreground buffer.
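
The per-state counts can be checked by enumerating every reachable combination. The sketch below is an assumption-laden model rather than part of the design: it tracks only how many buffers are in each state, treats acquire, queue, and vblank as the only transitions, and assumes all buffers except the foreground one start out free. The function names are made up for illustration.

```python
def reachable_states(num_buffers):
    """Return every (queued, free, owned) count reachable from the start state."""
    start = (0, num_buffers - 1, 0)  # one buffer is always being scanned out
    seen = {start}
    todo = [start]
    while todo:
        queued, free, owned = todo.pop()
        successors = []
        if free > 0 and owned == 0:  # user space acquires a buffer
            successors.append((queued, free - 1, 1))
        if owned == 1:               # user space queues its buffer
            successors.append((queued + 1, free, 0))
        if queued > 0:               # vblank flips to a queued buffer
            successors.append((queued - 1, free + 1, owned))
        for state in successors:
            if state not in seen:
                seen.add(state)
                todo.append(state)
    return seen


def count_ranges(num_buffers):
    """Min and max buffer count per state over all reachable states."""
    states = reachable_states(num_buffers)
    names = ("waiting for scanout", "waiting to be acquired", "owned by user space")
    return {name: (min(s[i] for s in states), max(s[i] for s in states))
            for i, name in enumerate(names)}


print(count_ranges(3))
# {'waiting for scanout': (0, 2), 'waiting to be acquired': (0, 2), 'owned by user space': (0, 1)}
```

Under this model, count_ranges(2) caps both waiting states at 1, and count_ranges(4) raises the "waiting for scanout" maximum to 3, so a newly queued frame can sit behind more frames. That is consistent with the added latency noted above.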
Notes on buffer mapping:
- the buffer can be mapped into user space virtual memory with any cache setting
- cache flush happens on unmap
- mapping could use a huge page, so it needs fewer TLB entries
- the buffer doesn't need to be mapped while it's owned by the kernel, since we only DMA from it
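
To put a rough number on the huge page point, here is a small worked calculation. Everything specific in it is an assumption for illustration: a 1920x1080 buffer at 4 bytes per pixel, 4 KiB base pages and 2 MiB huge pages (sizes common on x86-64), and a buffer that starts on a page boundary. The helper name is hypothetical.

```python
def pages_needed(width, height, bytes_per_pixel, page_size):
    """Pages (and so distinct TLB entries) needed to cover one page-aligned buffer."""
    buffer_bytes = width * height * bytes_per_pixel
    return (buffer_bytes + page_size - 1) // page_size  # round up to whole pages


print(pages_needed(1920, 1080, 4, 4 * 1024))         # 2025 pages of 4 KiB
print(pages_needed(1920, 1080, 4, 2 * 1024 * 1024))  # 4 pages of 2 MiB
```

Under these assumptions, touching the whole buffer needs 2025 distinct translations with base pages but only 4 with huge pages.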