@paniq
Last active February 19, 2024 09:39
The Knowable Heap

The following conditions need to be met to make the C heap serializable:

  1. get the start of an allocation from an offset (interior) pointer
  2. get the size of an allocation
  3. addresses appearing in an allocation must be tagged as pointers (1:64 metadata, i.e. one tag bit per 64-bit word)
  4. associate foreign pointers/allocations with plugins; the simplest is dlsym, but there are also file handles, graphics resources, JIT functions, stack vars etc.

(4) is what makes the whole affair very interesting.
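conditions (1) and (2) could be met in several ways. the following is a minimal sketch of one of them, assuming a side table of allocation records kept sorted by base address and searched with a binary search. the names `alloc_rec`, `kh_alloc` and `kh_find` are hypothetical and do not come from any existing allocator; a real implementation would more likely take this information from the allocator's own metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical registry: one record per live allocation, sorted by base. */
typedef struct { uintptr_t base; size_t size; } alloc_rec;

#define MAX_ALLOCS 1024
static alloc_rec g_allocs[MAX_ALLOCS];
static size_t g_count = 0;

/* Allocate a block and record it. Returns NULL on failure,
   on a zero size, or when the registry is full. */
static void *kh_alloc(size_t size) {
    if (g_count == MAX_ALLOCS || size == 0) return NULL;
    void *p = malloc(size);
    if (!p) return NULL;
    uintptr_t base = (uintptr_t)p;
    size_t i = g_count++;
    /* shift larger bases up by one to keep the table sorted */
    while (i > 0 && g_allocs[i - 1].base > base) {
        g_allocs[i] = g_allocs[i - 1];
        i--;
    }
    g_allocs[i] = (alloc_rec){ base, size };
    assert(g_count <= MAX_ALLOCS);
    return p;
}

/* Map any pointer into an allocation to its record (start and size).
   Returns NULL for pointers the registry does not know, i.e. the
   "foreign" pointers of condition (4). */
static const alloc_rec *kh_find(const void *ptr) {
    uintptr_t a = (uintptr_t)ptr;
    size_t lo = 0, hi = g_count;
    while (lo < hi) {                       /* count records with base <= a */
        size_t mid = lo + (hi - lo) / 2;
        if (g_allocs[mid].base <= a) lo = mid + 1; else hi = mid;
    }
    if (lo == 0) return NULL;
    const alloc_rec *r = &g_allocs[lo - 1];
    return (a - r->base < r->size) ? r : NULL;
}
```

with this, `kh_find(p + 17)` on a 64-byte block `p` returns a record whose `base` is `p` and whose `size` is 64, while a stack address returns NULL and would have to be handed to a plugin.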

the above conditions also make it possible to run a GC. it is also possible to perfectly deduplicate the heap in subquadratic time, to validate pointers, to guard against bad loads/stores and to merge different heaps.

the runtime bookkeeping would be very light; metadata only needs to be set when a resource/allocation is created; memset/memcpy can be performed tag-aware; union types need to update their pointer tags when they change. it's a little extra work that barely interferes with normal operations.
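a tag-aware memcpy/memset can be sketched as follows. this is only an illustration under simplifying assumptions: a small static array stands in for the heap, the tags are stored as one `bool` per 8-byte word instead of packed bits, and the names `heap_words`, `is_ptr`, `kh_copy_words` and `kh_zero_words` are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_WORDS 256
static uint64_t heap_words[HEAP_WORDS];  /* toy heap of 8-byte words */
static bool     is_ptr[HEAP_WORDS];      /* one tag per word, unpacked for clarity */

/* Index of a heap word; the pointer must point into heap_words. */
static size_t word_of(const uint64_t *p) {
    return (size_t)(p - heap_words);
}

/* Tag-aware copy: moves n words and their pointer tags together,
   so copied pointers stay recognizable as pointers. */
static void kh_copy_words(uint64_t *dst, const uint64_t *src, size_t n) {
    size_t d = word_of(dst), s = word_of(src);
    assert(d + n <= HEAP_WORDS && s + n <= HEAP_WORDS);
    memmove(dst, src, n * sizeof *dst);
    memmove(&is_ptr[d], &is_ptr[s], n * sizeof is_ptr[0]);
}

/* Tag-aware clear: zeroes n words and drops their pointer tags. */
static void kh_zero_words(uint64_t *dst, size_t n) {
    size_t d = word_of(dst);
    assert(d + n <= HEAP_WORDS);
    memset(dst, 0, n * sizeof *dst);
    memset(&is_ptr[d], 0, n * sizeof is_ptr[0]);
}
```

the extra cost per copy is one additional, much smaller move over the tag range, which is in line with the claim that the bookkeeping barely interferes with normal operations.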

as to (3):

tagbits can be fairly lightweight; when your heap operates on a 46 bit range, you reserve a 40 bit range for tagbits: 2^46 bytes are 2^43 8-byte words, which need 2^43 tag bits, i.e. 2^40 bytes. the two ranges map bijectively: address base_a+p maps to bit p/8 of the tag range starting at base_b, as tagged addresses are always 8-byte aligned.
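the address-to-tagbit mapping can be sketched like this. it is a toy version under stated assumptions: two small static arrays play the roles of the ranges at base_a and base_b (a real setup would reserve large virtual ranges instead), and the helper names `tag_index`, `tag_set`, `tag_get`, `store_ptr` and `store_word` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS 1024
static uint64_t heap[HEAP_WORDS];      /* stands in for the range at base_a */
static uint8_t  tags[HEAP_WORDS / 8];  /* stands in for the range at base_b */

/* Bit index of an 8-byte aligned heap address: (addr - base_a) / 8. */
static size_t tag_index(const void *addr) {
    uintptr_t off = (uintptr_t)addr - (uintptr_t)heap;
    assert(off % 8 == 0 && off < sizeof heap);
    return off / 8;
}

/* Set or clear the pointer tag of the word at addr. */
static void tag_set(const void *addr, bool is_pointer) {
    size_t i = tag_index(addr);
    if (is_pointer) tags[i / 8] |= (uint8_t)(1u << (i % 8));
    else            tags[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Read the pointer tag of the word at addr. */
static bool tag_get(const void *addr) {
    size_t i = tag_index(addr);
    return (tags[i / 8] >> (i % 8)) & 1u;
}

/* Store a pointer into a heap slot and mark the slot as a pointer. */
static void store_ptr(uint64_t *slot, const void *target) {
    *slot = (uint64_t)(uintptr_t)target;
    tag_set(slot, true);
}

/* Store plain data into a heap slot and clear its tag, as a union
   member would when it switches from pointer to non-pointer. */
static void store_word(uint64_t *slot, uint64_t value) {
    *slot = value;
    tag_set(slot, false);
}
```

for example, after `store_ptr(&heap[3], &heap[100])` the call `tag_get(&heap[3])` is true, and a later `store_word(&heap[3], 7)` clears the tag again.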

another highly interesting extension to this is thunking: the userfaultfd feature (Linux) can be used to load heap resources lazily, the first time the page where the allocation appears is accessed. any large allocation that exclusively spans one or more pages qualifies for this.
