nanokatze/04. Descriptors.md

## 04. Descriptors.md

      
A good way to think about a buffer or an image descriptor is to imagine it as a
very fat pointer. This is, in fact, not too far removed from reality, as we
shall see.
Taking a peek at radv, we find the uniform buffer and storage buffer descriptors
to be a 4-word tuple, where the first two words make up the address, followed by
length in bytes for bounds checking and an extra word, which holds format
information and bounds checking behavior ¹.
Similarly, the sampled image descriptor is a 16-word tuple containing an
address, a format, extent, number of samples, mip levels, layers, and other bits
the user provides when creating a VkImageView ².
The sampler descriptor is an odd one out in that in radv most sampler
descriptors are pure fat but some keep an index into a stash of samplers' extra
bits. That is, a sampler descriptor is a 4-word tuple, which holds all of the
sampler bits, unless custom border color is used, in which case the last word
also maintains an index into an array of custom border colors ³.
Combining these bits of knowledge, it is easy to guess that a combined
image-sampler descriptor is, in fact, a sampled image descriptor and a sampler
descriptor glued together.
Descriptor Sets and Descriptor Set Layouts

Descriptors are grouped into descriptor sets, not unlike variables are composed
into structures in C, with descriptor set layout being akin to a type
definition. Let's conceive an arbitrary descriptor set layout
// A list of VkDescriptorSetLayoutBindings making up an "everything"
// descriptor set. For simplicity, all stages can access all bindings.

	{0, VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT, 128, VK_SHADER_STAGE_ALL, NULL}, // camera
	{1, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, 1, VK_SHADER_STAGE_ALL, NULL},             // transforms
	{2, VK_DESCRIPTOR_TYPE_SAMPLER, 2, VK_SHADER_STAGE_ALL, NULL},                    // samplers
	{3, VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, 10000, VK_SHADER_STAGE_ALL, NULL},          // manyimages

In our C analogy, such a descriptor set layout would be written as follows
struct DescriptorSetOfEverything {
	char camera[128];
	StorageBuffer transforms;
	Sampler samplers[2];
	SampledImage manyimages[10000];
};

with a slight caveat that the offsets are unspecified and are hidden inside
VkDescriptorSetLayout ⁴. Nevertheless, let's put ourselves in radv's shoes and
calculate the descriptor offsets and the size of the descriptor set. First, we
shall familiarize ourselves with the size and alignment of each descriptor


| Descriptor | Size (bytes) | Alignment (bytes) |
|---|---|---|
| sampler | 16 | 16 |
| storage buffer | 16 | 16 |
| sampled image | 64 | 32 |
| inline uniform block | 1 | 16 |


Then let there be a sequence nᵢ, n₀ = 0, nᵢ₊₁ = roundup(nᵢ, aᵢ) + kᵢmᵢ, where
aᵢ, mᵢ are, respectively, the alignment and size of i-th binding's descriptor,
kᵢ is i-th binding's descriptor count and
roundup(x, y) = min {yn | yn ≧ x, n ∈ ℤ}. For each binding i, roundup(nᵢ, aᵢ) is
the offset of the binding's first descriptor and given the number of bindings p,
nₚ is the descriptor set's size. Writing out this sequence, we get 0, 128, 144,
176, 640192. The offsets of each binding's first descriptor are thus 0, 128,
144, 192 and the size of a descriptor set of this layout would be 640192 bytes.
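The recurrence can be checked mechanically. The following C sketch is
illustrative only: the Binding struct and the function names are made up for
this example and are not radv's, while the sizes and alignments are the ones
listed in the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One binding: descriptor size m, descriptor alignment a, descriptor count k.
typedef struct Binding {
	uint64_t size;
	uint64_t alignment;
	uint64_t count;
} Binding;

// Rounds x up to the nearest multiple of y (roundup in the text).
uint64_t
round_up(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return (x + y - 1) / y * y;
}

// Stores the offset of each binding's first descriptor in offsets[] and
// returns the size of the whole descriptor set.
uint64_t
layout_offsets(const Binding *bindings, size_t n, uint64_t *offsets)
{
	uint64_t end = 0; // n_i: end of the bindings placed so far
	for (size_t i = 0; i < n; i++) {
		offsets[i] = round_up(end, bindings[i].alignment);
		end = offsets[i] + bindings[i].count * bindings[i].size;
	}
	return end;
}

// The "everything" layout from the text.
const Binding everything[] = {
	{1, 16, 128},    // camera: inline uniform block of 128 bytes
	{16, 16, 1},     // transforms: storage buffer
	{16, 16, 2},     // samplers
	{64, 32, 10000}, // manyimages: sampled images
};
```

Calling layout_offsets(everything, 4, offsets) fills offsets with 0, 128, 144,
192 and returns 640192, matching the figures derived by hand.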
Memory

Note: VK_EXT_descriptor_buffer removes descriptor management APIs, leaving
descriptor management entirely up to the user and making this section
irrelevant. Fast forward to Push Descriptors.
Prior to descriptor buffers, there was no malloc for descriptor sets and, in
fact, no good analogy that a reader would be familiar with appears to exist.
Descriptor pools can be confusing. Read the following closely, lest you find
that your program only works on your computer.
A good starting point for reasoning about descriptor pools is to pretend that a
VkDescriptorPool is a VkDeviceMemory for descriptor sets. The list of
descriptor pool sizes taken by vkCreateDescriptorPool specifies the size of
the underlying VkDeviceMemory as a sum of each descriptor size times
descriptor count ⁵. For a concrete example, let's consider the following list
of pool sizes
	{VK_DESCRIPTOR_TYPE_SAMPLER, 1},
	{VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, 100},

In a radv descriptor pool, sampler and combined image-sampler descriptors take
up 32 and 96 bytes respectively, thus the VkDeviceMemory inside such a
descriptor pool will be 32⋅1 + 96⋅100 = 9632 bytes. This is plenty to allocate a
descriptor set of 200 UNIFORM_BUFFER descriptors, and such a
vkAllocateDescriptorSets call will indeed succeed on radv, where a buffer
descriptor takes 32 bytes (200⋅32 = 6400 bytes in total). This behavior is to be
exploited, but not to be relied upon.
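The arithmetic above can be written out as a small C sketch. Kind and PoolSize
are hypothetical stand-ins for VkDescriptorType and VkDescriptorPoolSize so that
the example compiles without the Vulkan headers, and the per-descriptor byte
counts are simply the radv figures quoted in the text; other drivers will use
different numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the VkDescriptorType values used here.
typedef enum Kind { SAMPLER, COMBINED_IMAGE_SAMPLER, UNIFORM_BUFFER } Kind;

// Hypothetical stand-in for VkDescriptorPoolSize.
typedef struct PoolSize {
	Kind kind;
	uint32_t count;
} PoolSize;

// Bytes taken up in the pool per descriptor (radv figures from the text).
uint64_t
pool_bytes_per_descriptor(Kind kind)
{
	switch (kind) {
	case SAMPLER:                return 32;
	case COMBINED_IMAGE_SAMPLER: return 96;
	case UNIFORM_BUFFER:         return 32;
	}
	return 0;
}

// Sums descriptor size times descriptor count over a list of pool sizes.
uint64_t
pool_bytes(const PoolSize *sizes, size_t n)
{
	assert(sizes != NULL || n == 0);
	uint64_t total = 0;
	for (size_t i = 0; i < n; i++)
		total += pool_bytes_per_descriptor(sizes[i].kind) * sizes[i].count;
	return total;
}
```

For the pool sizes above, pool_bytes yields 9632, while 200 UNIFORM_BUFFER
descriptors need 6400 bytes, which is why the allocation fits even though the
pool was never told about uniform buffers.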
A simple method to deal with the allocation is to use a very large capacity
descriptor pool and allocate descriptor sets until VK_ERROR_OUT_OF_POOL_MEMORY
is returned. In the out of pool memory case, the pool becomes a zombie and when
the descriptor sets it backs are not needed any more, the pool can be freed.
VkDescriptorPoolSize poolSizes[] = {
	{VK_DESCRIPTOR_TYPE_SAMPLER, 1000},
	{VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, 1000},
	{VK_DESCRIPTOR_TYPE_STORAGE_IMAGE, 1000},
	{VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 1000},
	{VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, 1000},
};
// ...

	if ((r = vkCreateDescriptorPool(device, &(VkDescriptorPoolCreateInfo) {
		.sType         = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
		.maxSets       = 10000,
		.poolSizeCount = nelem(poolSizes),
		.pPoolSizes    = poolSizes,
	}, NULL, &descriptorPool)) != VK_SUCCESS) {
		// Handle error.
	}

While this method is simple, it can waste significant amounts of memory for some
applications, unless some tuning is done.
This inefficiency may be remedied by creating a descriptor pool per descriptor
set layout, sized to accommodate some number of descriptor sets of that layout.
If many descriptor sets share the same lifetime, for example when the
application allocates all descriptor sets during initialization, it is possible
to compute the optimal pool size and use a single pool, side-stepping descriptor
pool cycling headaches entirely.
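The pool cycling scheme described earlier in this section (allocate until the
pool runs out, retire it, free it once its sets are dead) can be modeled without
any Vulkan calls. The sketch below is a toy model: Pool, pool_alloc and
pool_release are hypothetical names, and a plain byte counter stands in for
whatever accounting a real driver performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Toy stand-in for a descriptor pool: a byte budget plus a count of live sets.
typedef struct Pool {
	uint64_t capacity;  // total bytes in the pool
	uint64_t used;      // bytes handed out so far
	uint32_t live_sets; // sets allocated from this pool that are still in use
	bool zombie;        // retired: no further allocations are attempted
} Pool;

// Tries to carve size bytes out of p. On failure the pool is retired,
// mirroring the out of pool memory case; the caller then switches to a
// fresh pool.
bool
pool_alloc(Pool *p, uint64_t size)
{
	assert(p != NULL);
	if (p->zombie || size > p->capacity - p->used) {
		p->zombie = true;
		return false;
	}
	p->used += size;
	p->live_sets++;
	return true;
}

// Releases one set. Returns true when p is a zombie with no live sets left,
// which is the point at which the pool itself can be destroyed.
bool
pool_release(Pool *p)
{
	assert(p != NULL && p->live_sets > 0);
	p->live_sets--;
	return p->zombie && p->live_sets == 0;
}
```

In a real program, pool_alloc would correspond to vkAllocateDescriptorSets
and the true result of pool_release to the moment vkDestroyDescriptorPool
becomes safe to call.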
Push Descriptors

TODO
Future Directions

VK_EXT_descriptor_buffer removes the descriptor management APIs, letting the
user manage descriptors almost like any other data. Special treatment still
applies: memory type requirements are stricter, and the way descriptors are
accessed in the shader is different from accessing ordinary data. Structures
that wish to refer to the objects that descriptors represent need to establish a
correspondence between some non-opaque data (for example, an integer) and a
descriptor. Option 5 in the Solution Space section of the
VK_EXT_descriptor_buffer proposal ⁶ presents a more flexible way of accessing
descriptors, providing an idea of what shader descriptor access could look like
in the future.
Footnotes


1. https://gitlab.freedesktop.org/mesa/mesa/-/blob/04be7934df765eea0623360f748249870487baee/src/amd/vulkan/radv_descriptor_set.c#L982
2. https://gitlab.freedesktop.org/mesa/mesa/-/blob/04be7934df765eea0623360f748249870487baee/src/amd/vulkan/radv_image.c#L1675
3. https://gitlab.freedesktop.org/mesa/mesa/-/blob/04be7934df765eea0623360f748249870487baee/src/amd/vulkan/radv_device.c#L7486
4. https://gitlab.freedesktop.org/mesa/mesa/-/blob/04be7934df765eea0623360f748249870487baee/src/amd/vulkan/radv_descriptor_set.c#L196
5. https://gitlab.freedesktop.org/mesa/mesa/-/blob/04be7934df765eea0623360f748249870487baee/src/amd/vulkan/radv_descriptor_set.c#L721
6. https://github.com/KhronosGroup/Vulkan-Docs/blob/main/proposals/VK_EXT_descriptor_buffer.adoc#2-solution-space


## Handling Window Resize.md

      
Handling Window Resize

A rather common mistake in handling resize requests is initiating resize in
response to vkAcquireNextImageKHR or vkQueuePresentKHR returning
VK_ERROR_OUT_OF_DATE_KHR and querying swapchain extent with either
vkGetPhysicalDeviceSurfaceCapabilitiesKHR or through windowing system-specific
means.
On Windows, resizing windows of programs that interleave DispatchMessage with
redrawing will cause DispatchMessage to block. The window will appear to have
its contents "frozen" for the duration of the resize and "unfreeze" when the
user is done. Programs that run the message loop and redrawing concurrently will
race vkCreateSwapchainKHR against the resize and break
VUID-VkSwapchainCreateInfoKHR-imageExtent-01274.
On Wayland, VK_ERROR_OUT_OF_DATE_KHR is never returned and there's no window
size to query: VkSurfaceCapabilitiesKHR returned by
vkGetPhysicalDeviceSurfaceCapabilitiesKHR will have currentExtent set to
0xFFFFFFFF×0xFFFFFFFF, minImageExtent to 1×1 and maxImageExtent to some
large value. Instead, the window size is specified by the program using
imageExtent when creating the swapchain. When the user resizes the window, a
resize message will be delivered, specifying the desired size, but it is up to
the program to resize the swapchain (and thus the window).
Another common mistake is assuming that the width and height are always
positive. This assumption is easily broken by shrinking the window to zero.
The correct approach to handling resizes is thus to listen to the windowing
system's resize message and be defensive about the window size. After a
swapchain is created, at least a single frame should be redrawn so that the
program is not stuck handling resize events, without ever having something to
present to the user.
The example program is structured into redraw and resize functions. To avoid
the mistake covered in the first paragraph of this section, redraw will simply
return instead of initiating resize (creating a new swapchain).
VkDevice device;

VkSurfaceKHR surface;

VkSwapchainKHR swapchain;
bool swapchain_ok;

void
redraw(void)
{
	VkResult r;

	// Calling vkAcquireNextImageKHR or vkQueuePresentKHR on swapchain
	// for which a prior call returned VK_ERROR_OUT_OF_DATE_KHR is an
	// error, so it is our responsibility to make it sticky.
	if (swapchain_ok) {
		r = vkAcquireNextImageKHR(/* ... */);
		if (r == VK_ERROR_OUT_OF_DATE_KHR) {
			swapchain_ok = false;
		} else if (r != VK_SUCCESS && r != VK_SUBOPTIMAL_KHR) {
			// Handle the error.
		}
	}
	if (!swapchain_ok)
		return;

	// Record commands and submit.

	r = vkQueuePresentKHR(/* ... */);
	if (r == VK_ERROR_OUT_OF_DATE_KHR) {
		swapchain_ok = false;
	} else if (r != VK_SUCCESS && r != VK_SUBOPTIMAL_KHR) {
		// Handle the error.
	}
}

void
resize(uint32_t width, uint32_t height)
{
	VkResult r;

	assert(width > 0 && height > 0);

	vkDeviceWaitIdle(device);

	VkSwapchainKHR oldSwapchain = swapchain;

	if ((r = vkCreateSwapchainKHR(device, &(VkSwapchainCreateInfoKHR) {
		.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,
		.surface = surface,
		.imageExtent = (VkExtent2D) {width, height},
		/* ... */
	}, NULL, &swapchain)) != VK_SUCCESS) {
		// Handle the error.
	}

	// If it is possible for an image acquired from oldSwapchain to still
	// not be presented at this point, it should be made a zombie instead.
	vkDestroySwapchainKHR(device, oldSwapchain, NULL);

	// Create swapchain-dependent resources.

	swapchain_ok = true;

	redraw();
}

Now, to wire up with the windowing system
GLFW
static void resize_callback(GLFWwindow *w, int width, int height)
{
	if (width > 0 && height > 0)
		resize(width, height);
}

int main(int argc, char **argv)
{
	// ...

	glfwSetWindowSizeCallback(window, resize_callback);

	while (!glfwWindowShouldClose(window))
	{
		if (swapchain_ok)
		{
			glfwPollEvents();
		}
		else
		{
			glfwWaitEvents(); // don't burn cpu
			continue;
		}

		redraw();
	}

	// ...
}

SDL2
int main(int argc, char **argv) {
	// ...

	for (;;) {
		SDL_Event e;
		while ((swapchain_ok ? SDL_PollEvent(&e) : SDL_WaitEvent(&e)) != 0) {
			switch (e.type) {
			case SDL_WINDOWEVENT:
				if (e.window.event == SDL_WINDOWEVENT_SIZE_CHANGED && e.window.data1 > 0 && e.window.data2 > 0)
					resize(e.window.data1, e.window.data2);
				break;

			// Handle the remaining cases.
			}
		}

		redraw();
	}

	// ...
}

If the Windows API is used, it is important to be careful when using
CreateWindow. Misusing CreateWindow will lead to the first WM_SIZE being lost.
static LRESULT CALLBACK
WndProc(HWND hwnd, UINT msg, WPARAM wparam, LPARAM lparam)
{
	switch (msg) {
	case WM_CREATE:
		// Create VkSurfaceKHR here
		break;

	case WM_SIZE: {
		uint32_t width = lparam & 0xffff;
		uint32_t height = (lparam >> 16) & 0xffff;

		if (width > 0 && height > 0)
			resize(width, height);
		break;
	}

	// Handle the remaining cases.

	default:
		return DefWindowProc(hwnd, msg, wparam, lparam);
	}

	return 0;
}

On X11, XCB_CONFIGURE_NOTIFY is not sent in response to the window being
created. The program should create the swapchain with the size of the window it
created. Note that on X11, vkCreateSwapchainKHR always races against resize.
This will inevitably break VUID-VkSwapchainCreateInfoKHR-imageExtent-01274,
which is expected and should be muted in the validation layer.
int
main(int argc, char **argv)
{
	// ...

	resize(/* width and height the window was created with */);

	for (;;) {
		void *e;
		while ((e = xcb_poll_for_event(X)) != NULL) {
			xcb_generic_event_t *generic = e;

			switch (generic->response_type & ~0x80) {
			case XCB_CONFIGURE_NOTIFY: {
				xcb_configure_notify_event_t *configure_notify = e;

				// Note that this message is also sent when
				// the window is being moved and not resized.
				// It might be desirable to ignore configure
				// notifications that do not change width and
				// height.

				if (configure_notify->width > 0 && configure_notify->height > 0)
					resize(configure_notify->width, configure_notify->height);
				break;
			}

			// Handle the remaining cases.
			}
			free(e);
		}

		redraw();
	}

	// ...
}

On Wayland, the application never receives VK_ERROR_OUT_OF_DATE_KHR. It is
also much easier to handle the case when the window is shrunk to zero.
bool swapchain_ok = true; // can be made a constant on Wayland :-)

bool closed;

static void xdg_surface_handle_configure(void *data,
	struct xdg_surface *xdg_surface, uint32_t serial)
{
	xdg_surface_ack_configure(xdg_surface, serial);
}

static const struct xdg_surface_listener xdg_surface_listener = {
	.configure = xdg_surface_handle_configure,
};

static void
xdg_toplevel_configure(void *data,
	struct xdg_toplevel *xdg_toplevel, int32_t width, int32_t height,
	struct wl_array *states)
{
	if (width <= 1)
		width = 1;
	if (height <= 1)
		height = 1;
	resize(width, height);
}

static void
xdg_toplevel_close(void *data, struct xdg_toplevel *toplevel)
{
	closed = true;
}

static const struct xdg_toplevel_listener xdg_toplevel_listener = {
	.configure = xdg_toplevel_configure,
	.close = xdg_toplevel_close,
};

int
main(int argc, char **argv)
{
	// ...

	resize(/* any width and height desired */);

	while (!closed) {
		wl_display_dispatch_pending(display);

		redraw();
	}

	// ...
}

Inevitably, there will be bugs. Worse, drivers sometimes have hard to reproduce
bugs of their own, lurking in the swapchain. Should an issue arise, a
vkDeviceWaitIdle at the beginning of resize is often a sufficient
workaround.
Concurrency

Some programs may wish to process input at a rate independent from that of
redrawing. This requires that the windowing system message handling and
redrawing are made independent of each other, and are allowed to execute
concurrently.
typedef /* intentionally left blank */ Mutex;

void threadcreate(void (*f)(void*), void *a); // spawns a thread executing f(a)
void lock(Mutex *m);                          // locks m
void unlock(Mutex *m);                        // unlocks m

Mutex mu; // protects state accessed by resize and redraw

 void
 resize(uint32_t width, uint32_t height)
 {
 	VkResult r;

 	assert(width > 0 && height > 0);
+
+	lock(&mu);

 	vkDeviceWaitIdle(device);

 	VkSwapchainKHR oldSwapchain = swapchain;

 	if ((r = vkCreateSwapchainKHR(device, &(VkSwapchainCreateInfoKHR) {
 		.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,
 		.surface = surface,
 		.imageExtent = (VkExtent2D) {width, height},
 		/* ... */
 	}, NULL, &swapchain)) != VK_SUCCESS) {
 		// Handle the error.
 	}

 	// If some images acquired from oldSwapchain have not yet been
 	// presented, it should be made a zombie instead.
 	vkDestroySwapchainKHR(device, oldSwapchain, NULL);

 	// Create swapchain-dependent resources.

 	swapchain_ok = true;

 	redraw();
+
+	unlock(&mu);
 }

void
redrawLoop(void *a)
{
	while (/* redraw stopping condition */) {
		lock(&mu);
		redraw();
		unlock(&mu);
	}
}

int
main(int argc, char **argv)
{
	// Setup.

	threadcreate(redrawLoop, NULL);

	while (/* message loop stopping condition */) {
		// Exchange messages with the windowing system.
	}

	// Done. Some programs may want to join the thread executing
	// redrawLoop at this point.
}

Careful readers will note the busy-waiting that occurs when the window is shrunk
to zero. It is desirable that the waiting is done by communicating what the
program is waiting for to the environment, so that CPU is not being burned in
vain. redraw needs to be modified so as to communicate to the caller when to
begin waiting. The caller will then wait on a condition variable, which will be
signaled after a swapchain is created.
typedef /* intentionally left blank */ Cond;
void wait(Cond *c, Mutex *m); // unlocks m, waits on c, locks m
void signal(Cond *c);         // wakes up waiters on c

Cond cond;

-void
+bool
 redraw(void)
 {
 	VkResult r;

 	// Calling vkAcquireNextImageKHR or vkQueuePresentKHR on swapchain
 	// for which a prior call returned VK_ERROR_OUT_OF_DATE_KHR is an
 	// error, so it is our responsibility to make it sticky.
 	if (swapchain_ok) {
 		r = vkAcquireNextImageKHR(/* ... */);
 		if (r == VK_ERROR_OUT_OF_DATE_KHR) {
 			swapchain_ok = false;
 		} else if (r != VK_SUCCESS && r != VK_SUBOPTIMAL_KHR) {
 			// Handle the error.
 		}
 	}
 	if (!swapchain_ok)
-		return;
+		return false;

 	// Record commands and submit.

 	r = vkQueuePresentKHR(/* ... */);
 	if (r == VK_ERROR_OUT_OF_DATE_KHR) {
 		swapchain_ok = false;
 	} else if (r != VK_SUCCESS && r != VK_SUBOPTIMAL_KHR) {
 		// Handle the error.
 	}
+
+	return true;
 }

 void
 resize(uint32_t width, uint32_t height)
 {
 	VkResult r;

 	assert(width > 0 && height > 0);

 	lock(&mu);

 	vkDeviceWaitIdle(device);

 	VkSwapchainKHR oldSwapchain = swapchain;

 	if ((r = vkCreateSwapchainKHR(device, &(VkSwapchainCreateInfoKHR) {
 		.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,
 		.surface = surface,
 		.imageExtent = (VkExtent2D) {width, height},
 		/* ... */
 	}, NULL, &swapchain)) != VK_SUCCESS) {
 		// Handle the error.
 	}

 	// If some images acquired from oldSwapchain have not yet been
 	// presented, it should be made a zombie instead.
 	vkDestroySwapchainKHR(device, oldSwapchain, NULL);

 	// Create swapchain-dependent resources.

 	swapchain_ok = true;

+	// Don't care if redraw fails, if it does, the next resize event will
+	// be handled shortly.
 	redraw();

 	unlock(&mu);
+
+	// Wake up waiters on cond. There is only ever at most a single waiter.
+	// Doesn't matter if signal happens before dropping mu or after.
+	signal(&cond);
 }

 void
 redrawLoop(void *a)
 {
 	while (/* redraw stopping condition */) {
 		lock(&mu);
-		redraw();
+		if (!redraw()) {
+			// Unlock mu and begin waiting on cond. Spurious wake
+			// ups are okay, because redraw will just fail and end
+			// up waiting again (or resize happens in the
+			// meanwhile).
+			wait(&cond, &mu);
+		}
 		unlock(&mu);
 	}
 }
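The Mutex and Cond placeholders are deliberately left blank in the listings
above. On systems with POSIX threads, one possible way to fill them in is
sketched below. This is an assumption about the target platform, not part of the
original program, and the condition variable helpers are named waitcond and
signalcond here because wait and signal are already taken by POSIX functions.

```c
#include <assert.h>
#include <pthread.h>

typedef pthread_mutex_t Mutex;
typedef pthread_cond_t Cond;

Mutex mu = PTHREAD_MUTEX_INITIALIZER; // protects state accessed by resize and redraw
Cond cond = PTHREAD_COND_INITIALIZER; // signaled after a swapchain is created

// Locks m.
void
lock(Mutex *m)
{
	int r = pthread_mutex_lock(m);
	assert(r == 0);
	(void)r;
}

// Unlocks m.
void
unlock(Mutex *m)
{
	int r = pthread_mutex_unlock(m);
	assert(r == 0);
	(void)r;
}

// Unlocks m, waits on c, locks m again before returning.
void
waitcond(Cond *c, Mutex *m)
{
	int r = pthread_cond_wait(c, m);
	assert(r == 0);
	(void)r;
}

// Wakes up one waiter on c, if there is any.
void
signalcond(Cond *c)
{
	int r = pthread_cond_signal(c);
	assert(r == 0);
	(void)r;
}
```

pthread_cond_wait is allowed to wake up spuriously, which the redraw loop
already tolerates, as its comments explain.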

Always remember to sanitize your threads once in a while!
Redraw on Request