Skip to content

Instantly share code, notes, and snippets.

@o11c
Last active December 2, 2023 03:21
Show Gist options
  • Star 12 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save o11c/dee52f11428b3d70914c4ed5652d43f7 to your computer and use it in GitHub Desktop.
Save o11c/dee52f11428b3d70914c4ed5652d43f7 to your computer and use it in GitHub Desktop.

List of intended memory-management policies (note: the actual policies are below, after both kinds of modifiers):

structure modifiers:

  • relative - own address is added to make effective pointer. Useful for realloc and memcpy, as well as shared memory.
  • middle_pointer - actually points to the middle of an object, with a known way to find the start
    • note particularly how this interacts with subclasses. Possibly that should be the only way to create such a pointer? It isn't too crazy to synthesize a subclass just for the CowString trick ...
  • (other arithmetic tricks possible: base+scale+offset, with each of these possibly hard-coded (which requires that there be multiple copies of some ownership policies! we can't just use an enum) or possibly embedded)
  • (but what about "object is stored in a file too large to mmap" and such? Or should those only satisfy "ChunkyRandomIterator" concept?)

policy modifiers:

  • nomove
    • is it useful to have an explicit "owned by foreign GC"? That would imply an additional "foreign GC root", but I was handling that as 'resources' currently ...
  • singlethread (avoid atomic operations - this does win in benchmarks, though it's not insane to omit it if you elide useless changes)
  • mostlyonethread (has an owner, others have to pay a toll to access)
  • variant<policy1, policy2> e.g. "maybe shared, maybe borrowed"
    • unless_singleton - allow dereferencing the special object at the head of a linked list, but otherwise always follows the main policy.

      (Note that the policy is usually "unique", although I suppose "shared" might make sense for graphs.)

policies:

  • complex_shared (allocated as 2 objects, unless make_shared. Can do weird tricks with unrelated pointers if you want.)
    • weak
  • simple_shared (uses a refcount, so forbids oddly-sliced objects)
    • weak (distinct from complex_shared's weak)
  • shared_exactly_twice - there are exactly two owners initially; others can only make weak pointers if anything. The two owners can only be moved, mostly like unique.
  • unique
    • deep/forward (would overflow the stack destructing in order - e.g. linked lists, but also unbalanced trees; beware the case where it is a tree without uniform type!)
    • weak (distinct from the shared kind of weak. It may be worth making multiple variants of unique based on how they support this. BEWARE SAFETY PROBLEMS, especially with threads - what can you do with the weak pointer?)
      • temporarily_shared - causes unique's dtor to kill the process if any are still alive - some call this "constraint references"
    • copy (avoiding slicing, but logically similar to value. This probably doesn't need a distinct type, only an optional trait - but CoW may change this)
  • circular (always applied to a single member of the same type as we are in - in multiply-linked lists, others are borrowed)
  • borrowed (note overlap with unique's weak)
    • parent/back (this allows easier checking, since we know someone will clean up after us)
      • the parent object might have various unique-like ownership policies of the current object; deep and circular are notable. But even shared isn't completely nonsensical; consider a DAG with only a single "shortest" backpath.
      • note that for non-list-like cases, the parent object often has indirect ownership, e.g. via a list of child widgets.
      • remember to support the case where the parent isn't currently alive; the list-head exception might be useful
    • borrowed-and-not-immediately-dying - solve the string().c_str() problem without a full Rust-style borrow checker
  • raw (definitely needed at least for implementing other pointers; see resources in other file)
  • static
    • zero-init, constant-init, reloc-init, code-init
    • thread_local (requires allocation except in the main thread. May or may not be async-signal-safe, depending on TLS model)
  • value (slicing)
  • pool (not sure how this fits in; comparable to complex_shared)
    • if there is a pool of fixed-size objects, you can use "generation counter" to detect dead weak references that are dead.

Stuff that needs to be handled in a destructor

  • memory, by the main program's allocator
    • stack memory, as an optimization thereof
    • raw memory - just an object of type byte[]
    • remember that other CPU-supported address spaces usually exist
  • foreign object memory
    • may or may not support refcount and weak; may support weak conditionally like Python does.
  • locks
    • file locks
  • "objects" in an non-CPU-supported address space (e.g. files larger than mmap can support, but used similar to memory objects, e.g. with refcounting. This is hard to support right since you can't form references, only pointers.)
  • syscall-allocated resources:
    • file descriptors/handles
    • messages sent over a socket (or perhaps an inter-thread queue)
    • threads/processes (join or detach? or sometimes kill)
    • hardware, despite that SIGKILL can prevent closing (this is why changing X11 resolution is bad!)
  • global/thread_local (or contained-but-not-exclusively-owned, e.g. within a "god object") variable manipulations
    • implement dynamic-scoped variables, by unwinding a stack (think GL matrix transforms or clipping)
      • this includes redirecting stdout/stderr, although that may be on a syscall level
    • return this object's ID to a pool
    • unhooking listeners/observers (in either direction)
  • commit or rollback a transaction
  • close an XML tag
  • start/stop a timer/profiler
@Verdagon
Copy link

Hey o11c! I saw this from your comment on HN. This part is particularly interesting to me:

shared_exactly_twice - there are exactly two owners initially; others can only make weak pointers if anything. The two owners can only be moved, mostly like unique.

Is this reference-counted at run-time? Or would it be compile-time, e.g. by requiring that both owning references be present and destroyed at the same time? I've been doing some thinking on the latter, and have some ideas there if you'd like. Cheers!

@o11c
Copy link
Author

o11c commented Sep 25, 2023

Hi Verdagon. Sorry that I don't check my github as regularly as my HN or Lemmy (programming.dev) ones.

shared_exactly_twice is interesting since the two owners are asymmetric, at least in the motivational case I came across.

When I did this, it was in the context of the timer half of an event loop, implemented with a priority queue:

  • the "Key" is the owning object actually pushed to the heap. It probably needs about 3 fields: the timestamp, the sequence number (as a tiebreaker, because reordering is nasty), and a pointer to Handle.
  • the "Handle" is the object returned from the "schedule an event" API, and can be used to cancel or detach the event (I found "cancel" was by far the more useful default if the handle got destructed). It is usually assigned to a meaningfully-named field of the high-level application object (e.g. a mob might have think_timer, walk_timer, and attack_timer). It probably just contains a pointer to Key.
  • the "Payload" contains what to do for the event (call a function (usually referencing Handle's owner), maybe schedule it again for N seconds later), owned (somehow) by both "Key" and "Handle". I'm pretty sure I embedded it into one of the others, but it itself does not actually participate in this ownership mess - it's the thing being owned.

Doing this in C++ is nice since you get explicit moves, so the two "owners" can update their peer when their address inevitably changes.

If either Handle's or Key's destructor got called, it would invade the other and replace it with a default-constructed instance that no longer owns the Payload (or, obviously, knows about the dead peer), and Payload itself would be destroyed. Handle's "cancel" method was just a nicer name for replacing with a default-constructed object (In C++, the necessity of a moved-from state means Optional isn't nearly as compelling).

Key's lifetime cannot end without executing the Payload, so Handle can never be the sole owner.

Handle's "detach" method suicides, leaving only one owner (so I guess Payload must've been embedded in Key?). Or I suppose it could be considered moving it to within Key (or within Payload I guess), to be destroyed when that does, but that would require more special logic, not less.

Whether "detach" should be called or not depends entirely on whether the Payload's function references an application-level object (the one that would own the Handle) or not. Actually I guess it could probably be an argument to the event loop's register call instead of being called after registry, but I guess I didn't want to forbid a late detach ... or maybe I just didn't want to refactor the API too much. The important thing is that Payload does not keep Handle's owner alive.

(note that a cancel'ed Key must remain in the PQ since none of these things actually know about what they in turn are owned by. And also you need to maintain the PQ invariant ... related, one irritating limitation of stdlib container implementations is their inability to transparently delete stale objects while navigating for unrelated entries, even though the underlying algorithms can easily support that. If there's a WeakSet etc. it is usually pretty silly and limited)

I didn't support weak ownership of the Payload since I had no use case, but since the usual implementation is that all weakrefs share a single strong-ish reference, it seems viable to add (possibly via shared_exactly_three_times? but that shouldn't be exposed). Those would require a real reference count of course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment