Skip to content

Instantly share code, notes, and snippets.

@jwhear
Last active December 21, 2022 21:30
Show Gist options
  • Save jwhear/612fd4c500e1caaa6991dd976048c522 to your computer and use it in GitHub Desktop.
Save jwhear/612fd4c500e1caaa6991dd976048c522 to your computer and use it in GitHub Desktop.

Here's a "fun" challenge for Zig's comptime design. I have implemented the Adaptive Radix Tree and am now looking to generalize it. To save you from having to read the whole paper, it's a radix tree but inner nodes can be one of several types in an attempt to reduce node size and corresponding memory cache issues. My Zig implementation of the paper works fine, but I want to generalize so that, for instance, when using smaller key types, the types of the inner nodes can be changed. For instance, if using []const u5, a fully dense node has only 32 children so the Node48 type doesn't make sense.

Here's a sketch of what the ART types look like with various details elided:

pub const Key = []const u8;
pub const KeySegment = u8;
pub const Payload = usize;
pub const KeySpace = std.math.maxInt(KeySegment) + 1;

pub const Leaf = struct {
    payload: Payload,
    key: Key,
};

pub const NodeType = enum {
    unused,
    leaf,
    node4,
    node16,
    node48,
    node256,
};

pub const unused = Node{ .unused=0 };

pub const Node = union(NodeType) {
    unused: usize, // use this instead of ?Node
    leaf: *Leaf,
    node4: *Node4,
    node16: *Node16,
    node48: *Node48,
    node256: *Node256,
};

pub const InnerHeader = struct {
    num_children: u8 = 0,
};

pub const Node4 = struct {
    header: InnerHeader,
    keys: []usize,
    children: []Node,
}
pub const Node16 = struct {
    header: InnerHeader,
    keys: []usize,
    children: []Node,
}
pub const Node48 = struct {
    header: InnerHeader,
    keys: [KeySpace]u8,
    children: [48]Node,
}
pub const Node256 = struct {
    header: InnerHeader,
    children: [256]Node,
}

I've taken a stab at implementing a fully generalized system which abstracts the inner node types into three type functions: SparseKeys(capacity), DenseKeys(capacity), and DenseChildren(capacity) where the original types above map like so:

const Node4 = SparseKeys(4);
const Node16 = SparseKeys(16);
const Node48 = DenseKeys(48);
const Node256 = DenseChildren(256);

The idea is that you would be able to define an Adaptive Radix Tree something like this:

const MyTree = AdaptiveRadixTree(
    u5, // key segment type
    usize, // payload type
    .{
        .{ .kind=.sparse_keys, .capacity=4 },
        .{ .kind=.dense_keys, .capacity=16 },
        .{ .kind=.dense_children, .capacity=32},
    }
);

I can write a function that generates the std.builtin.Type for the Node union and supporting tag enum, however I run into a recursive definition problem: defining the Node union type requires knowledge of all field types, but those depend on the Node type to define their children arrays. The non-general implementation works because all these types can be entered as top-level declarations, but the generalized version seems to require using functions to generate the types and this seems to be incompatible with mutual recursion.

I'm curious if anyone can suggest a solution that is reasonably Ziggish, e.g. no falling back to the equivalent of void*.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment