/gist:7918813

## gistfile1.txt
A vertex buffer conceptually encodes a chunk of voxel data
which has some finite x/y extent, but fully covers z (where
z is height).

Ignoring chunks which are at the same x or y as the chunk the
viewer is in, it's possible to skip two of the four n/s/e/w faces
in the chunk. (I.e. if the chunk is to the NW of viewer,
the N and W faces of the chunk cannot be visible.)
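
As a rough sketch of that test (not code from my renderer): the
function name and the convention that +x is east and +y is north
are assumptions made for illustration. Chunks sharing an x or y
with the viewer's chunk keep both faces on that axis.

```python
def visible_side_faces(chunk_xy, viewer_chunk_xy):
    """Return the set of side directions that can face the viewer.

    Assumes integer chunk coordinates with +x = east and +y = north.
    """
    cx, cy = chunk_xy
    vx, vy = viewer_chunk_xy
    visible = set()
    if cx <= vx:
        visible.add("E")  # chunk is not east of the viewer, so east faces may be seen
    if cx >= vx:
        visible.add("W")  # chunk is not west of the viewer, so west faces may be seen
    if cy <= vy:
        visible.add("N")  # chunk is not north of the viewer
    if cy >= vy:
        visible.add("S")  # chunk is not south of the viewer
    return visible
```

For a chunk to the NW of the viewer this returns only E and S,
matching the example above.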

If we sort the faces in the index buffer to separate each
of the directions, we effectively end up with four index
buffers (one for each direction), and we only have to draw
two of those index buffers, thus halving the number of faces
the GPU needs to process. (This is interesting because I'm
currently vertex- or primitive-limited, submitting 20M tris/frame.)

If the faces are stored E,N,W,S consecutively then we can render
E+N, or N+W, or W+S in a single draw call, and only in the S+E
case do we need two draw calls.
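
A minimal sketch of turning the E,N,W,S layout into draw ranges,
assuming per-direction index counts are known. The names here are
hypothetical; each (first, count) pair would correspond to one
indexed draw call over that span of the index buffer.

```python
SIDE_ORDER = ["E", "N", "W", "S"]  # storage order in the index buffer

def side_draw_ranges(counts, visible):
    """Return a list of (first_index, index_count) ranges to draw.

    counts maps each direction to its number of indices; visible is
    the set of directions to draw. Adjacent visible groups are merged
    into a single range.
    """
    ranges = []
    offset = 0
    for direction in SIDE_ORDER:
        n = counts[direction]
        if direction in visible and n > 0:
            if ranges and ranges[-1][0] + ranges[-1][1] == offset:
                # Contiguous with the previous range: extend it.
                ranges[-1] = (ranges[-1][0], ranges[-1][1] + n)
            else:
                ranges.append((offset, n))
        offset += n
    return ranges
```

With counts of 10, 20, 30, 40 for E, N, W, S, the pair N+W gives
the single range (10, 50), while S+E gives the two ranges (0, 10)
and (60, 40).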

We also need to draw the U/D faces. If we sort all the U/D
faces by height, from bottom to top, then based on the viewer's
height we can draw the appropriate subset of the indices which
are visible. Naively, this requires two more draw calls, putting
us at 3-4 draw calls.

Suppose there are m up faces and n down faces; only
faces below the viewer are visible, i.e. only the bottommost up
faces are visible, so we draw the up faces numbered [0,k1)
for some k1 with 0 <= k1 <= m; similarly only the down faces above
the viewer are visible, so we draw [k2,n) of the downward-facing
faces for some k2 with 0 <= k2 <= n.

So we're always drawing the beginning of the up faces and the
ending of the down faces. If we pack those together, so up faces
are stored immediately after down faces, we can do this in only
one draw call, putting us at 2-3 draw calls (average 2.25).
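
The packed D-then-U range can be sketched as below. This assumes
each group is sorted by ascending height, that faces exactly at
the viewer's height are skipped as edge-on, and that each face is
two triangles (6 indices); the function and parameter names are
made up for the example.

```python
from bisect import bisect_left, bisect_right

def updown_draw_range(down_heights, up_heights, viewer_z, indices_per_face=6):
    """Return (first_index, index_count) covering visible D and U faces.

    down_heights and up_heights are the face heights, each sorted
    ascending; the index buffer stores all D faces, then all U faces.
    """
    n = len(down_heights)
    # Down faces strictly above the viewer: faces [k2, n).
    k2 = bisect_right(down_heights, viewer_z)
    # Up faces strictly below the viewer: faces [0, k1).
    k1 = bisect_left(up_heights, viewer_z)
    # The tail of D and the head of U are adjacent, so one range covers both.
    first_face = k2
    face_count = (n - k2) + k1
    return first_face * indices_per_face, face_count * indices_per_face
```

For example, with down faces at heights 1, 3, 5, 7, up faces at
2, 4, 6 and the viewer at 4.5, this covers faces 2..5 of the
packed buffer in one range.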

Alternatively, we could pack the full index buffer:
    D,E,N,W,S,U
then we can merge the draw of the D faces with the E faces,
and the draw of the U faces with the S faces.

This leads to a worst case of drawing: D,N+W,U (3 draw calls),
and the other cases are 2 draw calls: D+E+N,U; D,W+S+U;
D+E,S+U. So also 2.25 average, same outcome as the "simple"
approach. I don't think there's any way to avoid needing 3
draw calls in one case, except if we duplicate some of the
indices. For example, we replicate the E indices:
    D,U,  E,N,W,S,E
This wouldn't be that bad; with 8-byte vertices and ~50%
vertex sharing, my index buffers are currently about 60%
of the size of my vertex buffers, so adding 1/6th more (E is
roughly one of six equal-sized groups) would just bring them
to 70%, which is a total size increase of 0.1/1.6 or about 6%.
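
The draw-call counts above can be checked with a small brute-force
sketch. It models each layout as a list of group names and assumes
that only a suffix of D and a prefix of U are drawn in general, so
D can only merge with what follows it and U only with what
precedes it. All names are hypothetical.

```python
from itertools import product

# The four viewer quadrants, as the pair of side groups that must be drawn.
SIDE_PAIRS = [("E", "N"), ("N", "W"), ("W", "S"), ("S", "E")]

def draw_calls(layout, sides):
    """Fewest draw calls for a suffix of D, a prefix of U, and both sides."""
    wanted = ["D", "U", *sides]
    # A group may be stored more than once (e.g. the duplicated E).
    slots = [[i for i, name in enumerate(layout) if name == w] for w in wanted]
    best = None
    for choice in product(*slots):
        chosen = set(choice)
        calls = 0
        for i in sorted(chosen):
            joins_previous = (
                (i - 1) in chosen
                and layout[i - 1] != "U"  # drawn U range stops partway through U
                and layout[i] != "D"      # drawn D range starts partway through D
            )
            if not joins_previous:
                calls += 1
        best = calls if best is None else min(best, calls)
    return best

def average_draw_calls(layout):
    """Average over the four quadrants, each taken as equally likely."""
    return sum(draw_calls(layout, pair) for pair in SIDE_PAIRS) / len(SIDE_PAIRS)
```

Under those assumptions, `average_draw_calls(list("DENWSU"))` and
`average_draw_calls(list("DUENWS"))` both come out to 2.25, and
the layout with E duplicated, `list("DUENWSE")`, comes out to 2.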

In practice, for me, all of this is probably irrelevant; I
currently have more than one draw call per chunk anyway, as
I split chunks up into multiple fixed-size vertex buffers for
memory management reasons, so I can balance the increase in
draw calls by using larger fixed-size buffers or maybe by
changing how the chunk is split into multiple vertex buffers
to split based on faces (although this has utilization issues).