Create a gist now

Instantly share code, notes, and snippets.

Parallelizable Gather-based Delaunay Transforms

Parallelizable Gather-based Delaunay Transforms

by Leonard Ritter, Duangle GbR

Today I figured out how to do GPU-friendly Delaunay transforms, and I'd like to describe how the algorithm works before I forget everything.

What the algorithm does is to flip edges in a half-edge triangle mesh where a flip would improve the quality of the adjacent triangles (an equilateral triangle has perfect quality). The triangle mesh I'm using is embedded in R2, that is, it's on a 2D plane, has no holes, is not a manifold, and is typically a heavily subdivided square; but the algorithm should very well work for 2-manifolds in 3D too.

The mesh is not guaranteed to be Delaunay conforming after a single transform, and there is no halting condition. The algorithm is designed to be run in realtime, continuously refining a mesh with moving vertices each frame. The algorithm supports constrained and unconstrained Delaunay transforms.


There are three passes to this algorithm, inspired by the approach of Navarro et al in [1]. Navarro et al's approach is scatter-based and requires atomics to work, but was a good starting point for figuring out how a lo-fi version might work. Each pass can be massively parallelized (as the iteration is order-independent), requires no GPGPU implementation or compute shaders and therefore is optimal for OpenGL 3.2 compatible hardware.

The first pass iterates all edges of the mesh and evaluates flip candidates based on whether a flip would improve the quality. The second pass iterates all triangles and does the flip, using a simple precedence heuristic to avoid collisions. The third pass fixes broken adjacency information.

Data Structure

The data structure I use for the triangle mesh is a reduced version of the structure described by Fabian Giesen in his Half-edges Redux article [2]. Each triangle has only three edges sorted in counter-clockwise order so that edgeId(triangleId, corner) = triangleId * 3 + corner. That makes it possible to store the vertex indices for each edge (which map to the vertex opposite each edge inside the same triangle, not each edge's starting vertex) as a linear element array, the kind of format that OpenGL prefers for indexed triangles. There are no wedges in this structure, and the algorithm needs to be modified if there are.

I also store an array of opposites for each vertex (which I will call twins for brevity), so that vertex(twin(edgeId)) gives the vertex opposite each edge in the adjacent triangle. That means each potential quad that an edge is the diagonal of is described by the vertices vertex(edgeId), vertex(next(edgeId)), vertex(twin(edgeId)) and vertex(prev(edgeId)) in counter-clockwise order. The algorithm needs the twin array to work, as well as a temporary copy of the twin array.

Here is a breakdown of the data structure:

V vertices
T triangles
E=3*T half-edges

// position[vertexId] = position of each vertex in R2
// Read in pass 1.
vec2 position[V];

// vertex[edgeId] = vertexId of inner vertex opposite a half-edge
// Read in passes 1 & 2, written in pass 2.
uint vertex[E];

// twin[edgeId] = edgeId of the twin half-edge connecting the same vertices
// Read in passes 1, 2 & 3, written in pass 3.
uint twin[E];

// temporary flag array, three bits per triangle (one bit per half-edge).
// Written in pass 1, read in pass 2.
uint8 candidates[T];

// temporary twin array to avoid a race condition.
// Written in pass 2, read in pass 3.
uint tmptwin[E];


The algorithm needs three passes to perform, which are as follows:

  1. For each triangle, for each of its half-edges, collect the four adjacent vertices that form a quad with this half-edge as its diagonal and estimate the quality of triangles with both possible diagonals. I use a quality estimator described by Müller et al. in [3]. If a flip would improve the quality, store a bit for this edge in candidates. By default, candidates[triangleId] must be initialized to zero.

    It is important that the quality computations use the exact vertex order from both sides of each edge to compute the exact same floating point values and arrive at the exact same predicate, or floating point imprecisions will cause asymmetrical results, and subsequent steps will destroy the mesh. I got good results with a fixed point implementation of the quality estimator that is insensitive to vertex order.

    If the transform is supposed to be constrained, then constrained edges should simply never be tagged.

  2. For each triangle, find the first half-edge that is tagged by candidates as a flip candidate. If no half-edge is tagged, then skip. If more than one half-edge is tagged, the tagged half-edge with the lowest lexicographic index wins.

    The lexicographic index is an uint64 built from the two half-edge ID's that make up a single edge. index(edgeId) = min(edgeId,twin[edgeId]) | (max(edgeId,twin[edgeId]) << 32).

    The half-edge candidate can still collide with tagged half-edges in the neighboring triangle. If its lexicographic index is not the lowest among its tagged neighbors, then skip. Otherwise, do this triangle's part of the flip:

    The flip is executed by mapping the half-edge's end vertex to the outer opposing vertex so that vertex[prev(edgeId)] = vertex[twin[edgeId]]. Because this moves two half-edges, the local twin information must also be corrected, so that tmptwin[next(edgeId)] = next(twin[edgeId]) and tmptwin[edgeId] = twin[next(twin[edgeId])].

    Because both triangles participating in this flip arrive at the same conclusions, do the same operation independently, and the flip operation itself is symmetric, the connectivity will be restored without collision or ambiguity.

    If no operation is performed, tmptwin must still be initialized so that tmptwin[edgeId] = twin[edgeId].

  3. There's one last problem to correct, which is the invalid adjacency information stored in half-edges bordering on triangles that have been flipped.

    For each half-edge, determine if its twin has changed by comparing tmptwin[edgeId] != twin[edgeId]. If the twin has changed, then we simply update twin[edgeId] = tmptwin[edgeId].

    We also check if this half-edge's information is still up to date by verifying the bi-directionality of the link using the predicate edgeId == tmptwin[tmptwin[edgeId]]. If the predicate fails, then we have to search for the new, correct adjacent half-edge, whose location is fortunately invariant.

    We now need to discern whether edgeId does belong to a half-edge bordering on a flipped triangle using the predicate tmptwin[tmptwin[tmptwin[edgeId]]] == tmptwin[edgeId].

    If the comparison holds true, we update the half-edge information to the invariant location of its new half-edge so that twin[edgeId] = prev(tmptwin[tmptwin[edgeId]]), and we're done.


[1]Navarro et al. A parallel GPU-based algorithm for Delaunay edge-flips
[2]Fabian Giesen Half-edges redux
[3]Müller et al. Air Meshes for Robust Collision Handling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment