Here's a "fun" challenge for Zig's comptime design. I have implemented the Adaptive Radix Tree and am now looking to generalize it. To save you from having to read the whole paper, it's a radix tree but inner nodes can be one of several types in an attempt to reduce node size and corresponding memory cache issues. My Zig implementation of the paper works fine, but I want to generalize so that, for instance, when using smaller key types, the types of the inner nodes can be changed. For instance, if using []const u5
, a fully dense node has only 32 children so the Node48
type doesn't make sense.
Here's a sketch of what the ART types look like with various details elided:
pub const Key = []const u8;
pub const KeySegment = u8;
pub const Payload = usize;
pub const KeySpace = std.math.maxInt(KeySegment) + 1;
pub const Leaf = struct {
payload: Payload,
key: Key,
};
pub const NodeType = enum {
unused,
leaf,
node4,
node16,
node48,
node256,
};
pub const unused = Node{ .unused=0 };
pub const Node = union(NodeType) {
unused: usize, // use this instead of ?Node
leaf: *Leaf,
node4: *Node4,
node16: *Node16,
node48: *Node48,
node256: *Node256,
};
pub const InnerHeader = struct {
num_children: u8 = 0,
};
pub const Node4 = struct {
header: InnerHeader,
keys: []usize,
children: []Node,
}
pub const Node16 = struct {
header: InnerHeader,
keys: []usize,
children: []Node,
}
pub const Node48 = struct {
header: InnerHeader,
keys: [KeySpace]u8,
children: [48]Node,
}
pub const Node256 = struct {
header: InnerHeader,
children: [256]Node,
}
I've taken a stab at implementing a fully generalized system which abstracts the inner node types into three type functions: SparseKeys(capacity), DenseKeys(capacity), and DenseChildren(capacity) where the original types above map like so:
const Node4 = SparseKeys(4);
const Node16 = SparseKeys(16);
const Node48 = DenseKeys(48);
const Node256 = DenseChildren(256);
The idea is that you would be able to define an Adaptive Radix Tree something like this:
const MyTree = AdaptiveRadixTree(
u5, // key segment type
usize, // payload type
.{
.{ .kind=.sparse_keys, .capacity=4 },
.{ .kind=.dense_keys, .capacity=16 },
.{ .kind=.dense_children, .capacity=32},
}
);
I can write a function that generates the std.builtin.Type for the Node
union and supporting tag enum, however I run into a recursive definition problem: defining the Node
union type requires knowledge of all field types, but those depend on the Node
type to define their children arrays. The non-general implementation works because all these types can be entered as top-level declarations, but the generalized version seems to require using functions to generate the types and this seems to be incompatible with mutual recursion.
I'm curious if anyone can suggest a solution that is reasonably Ziggish, e.g. no falling back to the equivalent of void*
.