@dezren39
Forked from kellabyte/holo_fast.md
Created May 10, 2026 21:24
HoloFast: a kernel assisted eBPF fast path for accelerating Accord consensus


Core idea

Inspired by Electrode, HoloFast is an Accord acceleration layer for HoloStore’s consensus library. Part of it runs in the Linux kernel as small eBPF programs that operate on a compact fixed-frame wire protocol. The kernel takes one outgoing Accord message, fans it out to multiple replicas, watches the replies, and notifies Rust only when enough replicas have responded to form a quorum. The important Accord decisions stay in Rust; eBPF handles only repetitive packet work such as fan-out, fan-in, duplicate filtering, and steering.

Electrode reported up to 128.4% higher throughput and up to 41.7% lower 99th-percentile latency for Multi-Paxos by moving repetitive networking work such as fan-out and quorum waiting into eBPF; HoloFast applies that idea to HoloStore’s Accord path.

1. Background: what Electrode proved and why it matters

Electrode is the closest prior system to HoloFast. It is an NSDI 2023 project that accelerates distributed protocols by moving narrow, repetitive networking operations into eBPF while keeping the actual distributed-protocol logic in user space. Its key observation is that the normal Linux networking stack is attractive because it preserves compatibility, security/isolation, and load-aware CPU behavior, but consensus protocols pay heavily for repeated user/kernel crossings and kernel networking-stack traversal when they broadcast phase messages and collect quorum replies.1,2

The most important lesson for HoloFast is not "put consensus in the kernel." The lesson is:

Keep correctness-critical protocol logic in user space.
Move only the mechanical packet work into eBPF.
Use fallback whenever packet state becomes too complex.

Electrode applied that model to classic Multi-Paxos using three eBPF optimizations: message broadcasting, fast acknowledging, and wait-on-quorums.3 These are directly relevant to HoloFast because HoloStore's Accord hot path has the same broad mechanical shape: send a phase message to a replica group, receive several replies, count a quorum, and advance the state machine.

1.1 Electrode's reported gains

Electrode reported substantial gains for Multi-Paxos. Under 3, 5, and 7 replicas, it improved maximum throughput by 34.9%, 104.8%, and 128.4%, respectively. The improvement grew with replica count because more replicas mean more leader-side broadcast messages, ACK messages, user/kernel crossings, and kernel-stack traversal to remove.4

For latency, Electrode reported 12.5%, 20.0%, and 25.6% lower median latency under 3, 5, and 7 replicas, respectively, and 11.8%, 24.7%, and 41.7% lower 99th-percentile tail latency. The paper attributes much of the latency reduction to fast acknowledging, which avoids two user/kernel crossings, two kernel-stack traversals, and one user-space wakeup on the follower side for each Multi-Paxos request.4

Electrode's optimization breakdown is especially useful for HoloFast. With 5 replicas, the paper reports that eBPF-based message broadcasting improved maximum throughput by 31.7%, fast acknowledging reduced median latency by 4.3%–12.7% before saturation, and wait-on-quorums improved maximum throughput by 57.7%. This division of labor is exactly how HoloFast should be evaluated: broadcast, quorum aggregation, and any future acknowledgment optimization must be benchmarked independently instead of being treated as one giant feature.5

1.2 Why the findings transfer to HoloStore only partially

The findings transfer because HoloStore/Accord has the same expensive network pattern:

coordinator -> broadcast phase request to replicas
replicas    -> send phase responses
coordinator -> count enough responses for a quorum or fast-path condition
coordinator -> advance to the next phase or commit

The findings do not transfer perfectly because Accord is not classic Multi-Paxos. In HoloStore, a PreAcceptResponse carries dependency and sequence information, and Accord's fast path depends on whether dependencies converge. That means HoloFast should copy Electrode's message broadcasting and wait-on-quorums first, but it should not blindly copy Electrode's fast acknowledging for Accord PreAccept. Replica-side dependency computation belongs in Rust user space, not eBPF.

The safe mapping is:

| Electrode finding | HoloFast implementation consequence |
|---|---|
| Broadcasting is expensive and scales with replica count | Use TC egress cloning/rewrite so Rust sends one logical phase packet and eBPF fans it out. |
| Quorum fan-in creates avoidable wakeups | Use XDP ingress aggregation so duplicate/non-quorum replies do not always wake Rust. |
| Fast ACK improves follower-side latency in Multi-Paxos | Defer for Accord PreAccept; only consider bounded duplicate/stale ACK cases later. |
| Benefits grow with replica count | Benchmark 3, 5, and 7 replicas separately. |
| eBPF is modular | Ship HoloFast in stages: UDP fixed frame, TC broadcast, XDP quorum, conservative PreAccept candidate aggregation. |
| Complex cases need fallback | Pass large dependencies, recovery, oversized packets, map overflow, and parse failures to Rust/gRPC fallback. |

1.3 What Electrode implies for HoloFast's design

Electrode strongly supports the HoloFast design choice to replace gRPC only on the hot consensus path. eBPF cannot practically parse gRPC traffic at XDP/TC hook points: HTTP/2 frames can span TCP segments, HTTP/2 headers are HPACK-compressed, and protobuf fields are varint-encoded. HoloFast therefore needs a compact fixed-frame protocol whose header exposes exactly the fields eBPF needs: phase, kind, group ID, membership epoch, range generation, transaction ID, ballot, source replica, status, sequence, and dependency digest.

The core implementation principle should be:

Electrode offloaded packet mechanics.
HoloFast should offload packet mechanics.
Accord correctness stays in Rust.

So HoloFast should use eBPF for:

TC egress broadcast
XDP ingress response classification
quorum bitmaps
kernel-to-user quorum events
duplicate/stale reply filtering
optional packet steering

And HoloFast should keep these in Rust:

dependency calculation
PreAccept fast-path validation
Accept/Commit correctness
recovery
WAL durability
range split/merge correctness
state-machine execution

1.4 Why HoloFast should remain kernel-assisted, not kernel-bypass-first

Electrode also compared its kernel-assisted approach with a kernel-bypassing baseline and found that pure kernel bypass can be faster, but at the cost of a more specialized operating model. That matters for HoloFast because cloud portability and operational simplicity are major design goals. The right first implementation is therefore not DPDK or a custom kernel driver; it is a fixed-frame HoloFast UDP transport plus eBPF/XDP/TC acceleration, with AF_XDP or provider-specific bypass left as optional future backends.6

This gives HoloFast the most useful part of Electrode's result: large reductions in avoidable kernel/user overhead while preserving the normal Linux deployment model and keeping the dangerous correctness-sensitive parts of Accord out of the kernel.


2. Executive summary

HoloStore is already a strong candidate for an Electrode-style optimization because its hot path has the same mechanical shape that the Electrode paper targets: a coordinator sends phase messages to replicas, waits for enough replies, and then advances the consensus state. HoloStore's README describes it as a strongly consistent key/value store built on Accord, using pre-accept -> accept -> commit, dependency tracking, fast-path behavior when dependencies converge, per-partition Accord groups, batched consensus RPCs, a WAL, and range generation IDs.7 HoloStore's current transport is a gRPC transport layer for Accord quorum rounds and read paths, with per-peer batching workers, bounded in-flight concurrency, and queue/latency counters.8 Its proto file explicitly defines hot consensus RPCs such as PreAcceptBatchV2, AcceptBatchV2, CommitBatchV2, and RecoverBatchV2, using packed unary batch envelopes to reduce protobuf wrapper overhead.9

The proposed project, HoloFast, does not turn HoloStore into an eBPF database. Instead, it makes HoloStore an eBPF-friendly Accord database:

Keep in Rust user space:
  Accord safety and liveness logic
  dependency calculation
  fast-path vs slow-path decisions
  recovery
  WAL and durability
  range split/merge correctness
  state-machine execution
  large command transfer

Move or assist with eBPF:
  fixed-header packet parsing
  TC egress broadcast
  XDP ingress quorum aggregation
  duplicate/stale packet suppression
  kernel-to-user quorum events
  optional CPU/NIC queue steering
  metrics for fallback and packet path behavior

The design follows the Electrode paper's division of labor. Electrode found that distributed protocols running on the normal Linux networking stack can spend substantial time on user/kernel crossings and kernel stack traversal; by offloading message broadcasting, fast acknowledging, and wait-on-quorums to eBPF, it reported up to 128.4% higher throughput and up to 41.7% lower tail latency for Multi-Paxos.1 Figure 1 and Section 4 of the paper describe these three offloads.3

For HoloStore/Accord, the safe initial mapping is:

| Electrode optimization | HoloFast mapping | Initial status |
|---|---|---|
| Message broadcasting | TC egress cloning/rewrite for PreAccept, Accept, and Commit requests | Yes |
| Wait-on-quorums | XDP ingress response aggregation with bitmap counting and quorum events | Yes |
| Fast acknowledging | Mostly not for PreAccept; possible only for carefully bounded duplicate/late responses or later specialized cases | Defer |

The important Accord-specific difference is that a PreAcceptResponse is not just a simple ACK. HoloStore's current proto shows PreAcceptResponse carrying ok, seq, deps, and promised; AcceptResponse carries ok and promised; CommitResponse carries ok.10 That means eBPF should not pretend to make Accord decisions. It should accelerate mechanical network work and emit candidate quorum events. Rust user space remains the final authority.


3. Problem statement

3.1 What is expensive today

HoloStore currently makes gRPC calls for hot consensus phases. Even with packed v2 unary batches, the hot path still has layers that are not ideal for packet-level offload:

Accord phase event
  -> Rust async task
  -> batching queue
  -> gRPC method
  -> HTTP/2 framing
  -> protobuf payload
  -> socket send
  -> kernel networking stack
  -> peer kernel networking stack
  -> gRPC/protobuf decode
  -> Rust consensus handler
  -> response path repeats in reverse

This is maintainable and portable, but it hides consensus structure from eBPF. At XDP/TC level, eBPF sees Ethernet/IP/UDP/TCP packets, not PreAcceptBatchV2 or AcceptBatchV2 method calls. That makes eBPF parsing fragile or impossible if gRPC remains the hot path.

Electrode's motivation maps directly to this. The paper argues that standard Linux networking has benefits, but protocol performance suffers from repeated user/kernel crossings and kernel stack traversal; in a five-replica leader-based Multi-Paxos path, the leader has to handle many sends/receives per request.2

3.2 What HoloFast should optimize

HoloFast should optimize these mechanical costs:

  1. Fan-out: one logical Accord phase message must reach several replicas.
  2. Fan-in: the coordinator receives several replies but often needs only a quorum.
  3. Late/duplicate traffic: retransmissions, already-counted replies, wrong epoch, and stale range-generation packets should not always wake the Rust runtime.
  4. CPU locality: packets for the same Accord group should land on the same queue/core when possible.
  5. Tail latency: the Rust runtime should wake when the next useful consensus transition is available, not for every individual packet.

3.3 What HoloFast should not optimize

HoloFast should not attempt to move the following into eBPF:

dependency-set computation
Accord fast-path correctness decisions
ballot/recovery correctness
WAL append or fsync
range split/merge correctness
large command storage
linearizable read barriers
state-machine execution
membership changes

This is non-negotiable. eBPF's programming model is intentionally constrained. Electrode notes the difficulty caused by the verifier and the lack of dynamic memory allocation, and it keeps complex behavior in user space.11 The Linux verifier also enforces program safety, including restrictions around memory access and program behavior.12


4. Design goals

Goal 1: preserve Accord correctness

No eBPF result is trusted as final consensus. eBPF may tell Rust:

"A candidate quorum appears to exist for txn X, phase Y, ballot B, epoch E."

Rust must still validate:

cluster epoch
range generation
membership/electorate
ballot
transaction ID
phase
quorum size
replica uniqueness
response contents
pre-accept dependency data

Goal 2: reduce kernel/user crossings on the hot path

The primary win should come from doing fewer send/recv wakeups per Accord phase. Electrode's message-broadcasting offload uses bpf_clone_redirect() to clone and modify packets in kernel, so the application sends once and the kernel fans out.13 Electrode's wait-on-quorums offload maintains bitsets and forwards only quorum-relevant packets or events to user space.14

Goal 3: make fallback simple and safe

Every eBPF optimization must have a fallback:

parse failure            -> XDP_PASS / TC_ACT_OK
unknown version          -> pass to user-space socket
map overflow             -> pass and mark fallback counter
epoch mismatch           -> pass or drop only if provably stale
large payload            -> cold path
dependency overflow      -> cold path
unexpected response      -> cold path
verifier/load failure    -> use HoloFast without eBPF or existing gRPC

Goal 4: make performance measurable

The project should be built as an experiment with clear stages:

gRPC baseline
custom HoloFast UDP without eBPF
+ TC broadcast
+ XDP Accept/Commit quorum aggregation
+ bounded PreAccept candidate aggregation
+ optional CPU/AF_XDP steering

Each stage must be benchmarked separately. Electrode's paper breaks down the independent contribution of message broadcasting, fast acknowledging, and wait-on-quorums; HoloFast should copy that methodology.5


5. Proposed architecture

graph LR
    Client[Redis/client frontend] --> Accord[Accord engine]
    Accord --> Transport[Transport trait]
    Transport --> Fast[HoloFast transport]
    Transport --> Grpc[gRPC fallback/control]

    Fast --> UDP[Fixed-frame UDP socket]
    UDP --> TC[TC egress broadcast]
    TC --> Net[Network]
    Net --> XDP[XDP ingress parser]
    XDP --> Quorum[XDP quorum aggregator]
    Quorum --> Ring[BPF ringbuf quorum events]
    Ring --> Runtime[HoloFast event loop]
    Runtime --> Accord

    Grpc --> Control[Membership, recovery, snapshots, large fetches]

5.1 Two transport lanes

HoloStore should split the transport into two lanes.

Lane A: existing gRPC / control lane

Keep gRPC for:

membership management
range split/merge/rebalance
snapshots
FetchCommand / large command bytes
recovery when fast path lacks data
admin/control APIs
compatibility with old nodes
fallback when eBPF is disabled

This is aligned with HoloStore's current proto, which includes both consensus hot methods and many control/range methods in the same gRPC service.9

Lane B: HoloFast hot consensus lane

Add a custom UDP-based fixed-frame lane for:

PreAccept request/response
Accept request/response
Commit request/response
small command inline payloads
quorum events
heartbeat/health pings if useful

The hot lane should be designed for eBPF parsing. No HTTP/2, no protobuf varints on fields eBPF needs, and no unbounded parsing.


6. HoloFast packet format

6.1 Requirements

The header must be:

fixed-size
8-byte aligned
network-byte-order or explicitly little-endian, but consistent
bounded and verifier-friendly
sufficient for eBPF to route/count/drop safely
versioned
protected by a cheap checksum or header CRC

The header should contain only fields eBPF needs. Variable data should follow after the header and should be parsed only by Rust unless it fits in a small bounded dependency section.

6.2 Proposed fixed header

#[repr(C, packed)]
pub struct HoloFastHeader {
    pub magic: u32,              // 'HOLF'
    pub version: u8,             // protocol version
    pub header_len: u8,          // bytes
    pub phase: u8,               // PREACCEPT, ACCEPT, COMMIT, RECOVER, etc.
    pub kind: u8,                // REQUEST, RESPONSE, QUORUM_EVENT, NACK

    pub flags: u16,              // BROADCAST, INLINE_CMD, HAS_DEPS, FALLBACK, etc.
    pub payload_len: u16,        // bytes after header
    pub header_crc32: u32,       // optional, can be zero in early prototype

    pub cluster_id: u64,         // stable cluster fingerprint
    pub membership_epoch: u64,   // cluster/range membership version
    pub range_generation: u64,   // HoloStore range ownership epoch
    pub group_id: u64,           // Accord group/shard

    pub from_node: u64,
    pub to_node: u64,            // 0 or broadcast marker for logical broadcast
    pub coordinator_node: u64,

    pub txn_origin_node: u64,
    pub txn_counter: u64,

    pub ballot_counter: u64,
    pub ballot_node: u64,

    pub seq: u64,                // Accord sequence/timestamp component used by HoloStore
    pub proposed_seq: u64,       // optional phase-specific value; zero if unused

    pub command_digest_hi: u64,
    pub command_digest_lo: u64,
    pub deps_digest_hi: u64,
    pub deps_digest_lo: u64,

    pub deps_count: u16,
    pub status: u8,              // ok/promised/reject class
    pub quorum_class: u8,        // simple, fast, recovery, commit-ack, etc.
    pub reserved: u32,
}
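Every field in the proposed header already sits at an offset that is a multiple of its own size, so the fields add up to exactly 160 bytes with no implicit padding. The sketch below is a minimal illustration, assuming the field list above is final. It restates the header with plain `#[repr(C)]`, which yields the same bytes as `packed` when there is no padding while still allowing references to fields in Rust, and adds `const` assertions that break the build if a later edit introduces padding or changes the size that eBPF bounds checks rely on. `HOLOFAST_HEADER_LEN` is a hypothetical name.

```rust
use std::mem::{align_of, size_of};

/// Same fields as the proposed header. Every field is naturally aligned,
/// so repr(C) inserts no padding and matches the packed layout byte-for-byte.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct HoloFastHeader {
    pub magic: u32,
    pub version: u8,
    pub header_len: u8,
    pub phase: u8,
    pub kind: u8,
    pub flags: u16,
    pub payload_len: u16,
    pub header_crc32: u32,
    pub cluster_id: u64,
    pub membership_epoch: u64,
    pub range_generation: u64,
    pub group_id: u64,
    pub from_node: u64,
    pub to_node: u64,
    pub coordinator_node: u64,
    pub txn_origin_node: u64,
    pub txn_counter: u64,
    pub ballot_counter: u64,
    pub ballot_node: u64,
    pub seq: u64,
    pub proposed_seq: u64,
    pub command_digest_hi: u64,
    pub command_digest_lo: u64,
    pub deps_digest_hi: u64,
    pub deps_digest_lo: u64,
    pub deps_count: u16,
    pub status: u8,
    pub quorum_class: u8,
    pub reserved: u32,
}

/// Hypothetical constant: the fixed header size that eBPF bounds checks assume.
/// 16 bytes of small fields + 17 * 8 bytes of u64 fields + 8 trailing bytes = 160.
pub const HOLOFAST_HEADER_LEN: usize = 160;

// Size equal to the sum of field sizes means there is no padding anywhere.
const _: () = assert!(size_of::<HoloFastHeader>() == HOLOFAST_HEADER_LEN);
// The "8-byte aligned" requirement, for both the length and the struct.
const _: () = assert!(HOLOFAST_HEADER_LEN % 8 == 0);
const _: () = assert!(align_of::<HoloFastHeader>() == 8);
// header_len is a u8 on the wire, so the header must stay under 256 bytes.
const _: () = assert!(HOLOFAST_HEADER_LEN <= u8::MAX as usize);
```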

6.3 Why include range_generation

HoloStore's design notes say range generations disambiguate writes across split/merge cutovers and prevent late commands from an old range/group from overwriting newer child-range values.15 The HoloFast header should carry range_generation so eBPF and Rust can reject or fallback stale traffic early.

6.4 Why include deps_digest

Accord fast-path behavior depends on dependency convergence. HoloStore's README says Accord fast path is possible when dependencies converge without conflicts.7 eBPF should not compute dependencies, but it can compare fixed-size digests. If a quorum of PreAcceptResponses agrees on (ok, seq, deps_digest, promised), eBPF can emit a candidate event. Rust must still verify the actual dependency set.
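Digest comparison in eBPF only works if every replica hashes the same dependency set to the same bytes, regardless of the order in which it discovered the dependencies. A minimal sketch of that normalization follows, assuming a hypothetical `TxnIdWire` of `(origin_node, counter)` that mirrors the header's `txn_origin_node`/`txn_counter`. It sorts, deduplicates, then hashes the big-endian encoding. FNV-1a is a swapped-in, non-cryptographic hash chosen only because it is short and deterministic; the real protocol should pin a specified hash function. A collision would only produce a candidate that Rust rejects when it verifies the actual dependency set.

```rust
/// Hypothetical wire form of an Accord transaction id.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct TxnIdWire {
    pub origin_node: u64,
    pub counter: u64,
}

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;
// Arbitrary second seed so the two 64-bit lanes differ (illustrative only).
const LANE2_SEED: u64 = FNV_OFFSET ^ 0x9e37_79b9_7f4a_7c15;

/// FNV-1a over `bytes`, starting from state `h`.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Order-independent digest of a dependency set: sort + dedup, then hash the
/// big-endian encoding. Returns (deps_digest_hi, deps_digest_lo).
pub fn deps_digest(deps: &[TxnIdWire]) -> (u64, u64) {
    let mut normalized = deps.to_vec();
    normalized.sort_unstable();
    normalized.dedup();

    let mut bytes = Vec::with_capacity(normalized.len() * 16);
    for id in &normalized {
        bytes.extend_from_slice(&id.origin_node.to_be_bytes());
        bytes.extend_from_slice(&id.counter.to_be_bytes());
    }
    (fnv1a(FNV_OFFSET, &bytes), fnv1a(LANE2_SEED, &bytes))
}
```

Replicas that report the same set in different orders, or with repeats, produce identical digests, which is what lets eBPF group PreAcceptResponses by `(seq, deps_digest)` without looking at the list itself.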

6.5 Payload layout

The payload should be phase-specific and length-bounded.

PREACCEPT_REQUEST payload:
  optional inline command if <= HOLOFAST_INLINE_CMD_MAX
  otherwise command_digest only, command bytes fetched through cold path

PREACCEPT_RESPONSE payload:
  optional bounded normalized deps list
  if deps list too large, set FALLBACK_REQUIRED and pass to user space

ACCEPT_REQUEST payload:
  optional deps list or digest + cold fetch reference
  optional command bytes if inline

ACCEPT_RESPONSE payload:
  usually empty; header status/promised is enough

COMMIT_REQUEST payload:
  optional command bytes or digest
  committed seq/deps metadata

COMMIT_RESPONSE payload:
  usually empty

6.6 MTU rule

The hot eBPF path should target single-packet messages. Electrode explicitly targets UDP protocols with application-level retransmission and notes that this works well for Paxos-style messages that are small enough to fit in one packet in datacenter environments.16

Recommended constants:

HOLOFAST_MTU_BUDGET          = 1200 bytes initial safe target
HOLOFAST_INLINE_CMD_MAX      = 512 bytes initial
HOLOFAST_INLINE_DEPS_MAX     = 8 or 16 TxnIds initial
HOLOFAST_MAX_REPLICAS        = 9 or 15 initial, compile-time bounded
HOLOFAST_MAX_INFLIGHT        = configurable, e.g. 64k quorum states
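These constants interact. The largest hot-path frame carries the fixed header plus an inline command plus an inline dependency list, and the per-replica bitmaps in the quorum state are `u64`. A minimal sketch of compile-time checks for the upper end of the suggested ranges follows. It assumes the 160-byte header proposed above, a hypothetical 16-byte `TxnIdWire` (origin node plus counter), and that the budget means UDP payload bytes.

```rust
/// Fixed header size of the proposed HoloFastHeader (sum of its fields).
pub const HOLOFAST_HEADER_LEN: usize = 160;
/// Assumed wire size of one dependency: origin_node (u64) + counter (u64).
pub const TXN_ID_WIRE_LEN: usize = 16;

pub const HOLOFAST_MTU_BUDGET: usize = 1200;
pub const HOLOFAST_INLINE_CMD_MAX: usize = 512;
pub const HOLOFAST_INLINE_DEPS_MAX: usize = 16;
pub const HOLOFAST_MAX_REPLICAS: usize = 15;

/// Worst case for one hot-path frame, e.g. an ACCEPT_REQUEST that carries
/// both an inline command and an inline deps list.
pub const WORST_CASE_FRAME: usize =
    HOLOFAST_HEADER_LEN + HOLOFAST_INLINE_CMD_MAX + HOLOFAST_INLINE_DEPS_MAX * TXN_ID_WIRE_LEN;

// 160 + 512 + 16 * 16 = 928 bytes, leaving headroom under the 1200-byte budget.
const _: () = assert!(WORST_CASE_FRAME <= HOLOFAST_MTU_BUDGET);
// seen/ok/reject bitmaps are u64: one bit per replica caps a group at 64 members.
const _: () = assert!(HOLOFAST_MAX_REPLICAS <= 64);
// payload_len is a u16 in the header.
const _: () = assert!(HOLOFAST_MTU_BUDGET - HOLOFAST_HEADER_LEN <= u16::MAX as usize);
```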

If payload exceeds the hot-path budget, use:

HoloFast header + digest
FetchCommand or recovery over gRPC/control lane

HoloStore already has command_digest and has_command fields in its current AcceptRequest, CommitRequest, and RecoverResponse, so digest-based command fetch is already conceptually present.10


7. eBPF program design

7.1 Program 1: tc_egress_broadcast

Hook: TC egress on the HoloStore network interface.
Purpose: turn one logical broadcast packet into N per-peer packets.

Input:

A HoloFast packet with:
  flags contains BROADCAST
  to_node = BROADCAST_MARKER
  phase in {PREACCEPT, ACCEPT, COMMIT}
  kind = REQUEST

Map reads:

GroupMembershipMap[(cluster_id, membership_epoch, group_id)] -> GroupMembership
NodeAddrMap[node_id] -> ethernet/IP/UDP destination
FeatureSwitchMap -> enabled/disabled per phase

Behavior:

1. Parse Ethernet/IP/UDP/HoloFast header.
2. Verify magic, version, length, and cluster_id.
3. Look up group membership by group_id and membership_epoch.
4. For every peer in the bounded membership list:
   - clone packet
   - rewrite destination MAC/IP/UDP port
   - set to_node = peer_id
   - update checksums
   - bpf_clone_redirect()
5. Either let the original packet go to one peer or drop it after the clones are sent, depending on implementation.
6. Increment counters.
7. If any lookup fails, pass packet unchanged to user-space fallback path.

Electrode's message broadcasting design is the direct inspiration: it replaces repeated user-space sends with in-kernel packet clones and rewrites, reducing user/kernel crossings and stack traversal for one-to-many phase messages.13
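The fan-out rule can be modeled in ordinary Rust before writing the TC program, which makes its membership and fallback semantics testable in isolation. A minimal sketch follows. It assumes a simplified frame carrying only the header fields the rewrite touches, a hypothetical `BROADCAST` flag bit, and that the coordinator skips itself because it handles its own replica in-process. The real program would also rewrite MAC/IP/UDP fields, update checksums, and call `bpf_clone_redirect()` once per peer.

```rust
use std::collections::HashMap;

pub const FLAG_BROADCAST: u16 = 1 << 0; // hypothetical flag bit
pub const BROADCAST_MARKER: u64 = 0; // "0 or broadcast marker" per the header comment

#[derive(Clone, Debug, PartialEq)]
pub struct Frame {
    pub flags: u16,
    pub cluster_id: u64,
    pub membership_epoch: u64,
    pub group_id: u64,
    pub from_node: u64,
    pub to_node: u64,
}

/// Key of GroupMembershipMap: (cluster_id, membership_epoch, group_id).
pub type GroupKey = (u64, u64, u64);

#[derive(Debug, PartialEq)]
pub enum FanOut {
    /// One rewritten copy per peer.
    Clones(Vec<Frame>),
    /// Leave the packet unchanged for the user-space fallback path.
    Pass,
}

pub fn fan_out(frame: &Frame, membership: &HashMap<GroupKey, Vec<u64>>, max_replicas: usize) -> FanOut {
    if frame.flags & FLAG_BROADCAST == 0 || frame.to_node != BROADCAST_MARKER {
        return FanOut::Pass;
    }
    let key = (frame.cluster_id, frame.membership_epoch, frame.group_id);
    let Some(members) = membership.get(&key) else {
        return FanOut::Pass; // lookup failure (unknown or stale epoch) -> fallback
    };
    if members.len() > max_replicas {
        return FanOut::Pass; // exceeds the compile-time bound the verifier needs
    }
    let mut clones = Vec::with_capacity(members.len());
    for &peer in members {
        if peer == frame.from_node {
            continue; // assumption: the coordinator's own replica is handled locally
        }
        let mut copy = frame.clone();
        // Clear BROADCAST and set a concrete destination so a clone that passes
        // through the egress hook again is not broadcast a second time.
        copy.flags &= !FLAG_BROADCAST;
        copy.to_node = peer;
        clones.push(copy);
    }
    FanOut::Clones(clones)
}
```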

7.2 Program 2: xdp_ingress_dispatcher

Hook: XDP ingress.
Purpose: very quickly classify HoloFast packets.

Behavior:

1. Parse L2/L3/L4 headers with strict bounds checks.
2. Ignore non-HoloFast traffic with XDP_PASS.
3. Validate magic, version, header_len, payload_len.
4. Check feature switch and protocol version.
5. Dispatch by (kind, phase):
   - RESPONSE + ACCEPT  -> xdp_accept_quorum
   - RESPONSE + COMMIT  -> xdp_commit_quorum
   - RESPONSE + PREACCEPT -> xdp_preaccept_candidate_quorum
   - REQUEST paths -> pass to Rust handler unless future safe handler exists
6. Unknown or unsupported packets -> XDP_PASS.

XDP is appropriate because it runs early on ingress; in native (driver) mode it runs before the kernel allocates a socket buffer for the packet. XDP programs can pass, drop, redirect, or manipulate packets.17
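The dispatcher's bounds checks and routing table are simple enough to mirror in user-space Rust as a reference model for tests. A minimal sketch follows. It assumes network byte order, the 160-byte header above starting at offset 0 of the UDP payload, and hypothetical numeric codes for `phase` and `kind`, since the proposal names those values but does not number them. The feature-switch check (step 4) is omitted. Any failed check falls through to `Pass`, matching the `XDP_PASS` rule.

```rust
pub const MAGIC: u32 = u32::from_be_bytes(*b"HOLF");
pub const VERSION: u8 = 1;
pub const HEADER_LEN: usize = 160;

// Hypothetical codes: the proposal names these values but does not number them.
pub const PHASE_PREACCEPT: u8 = 1;
pub const PHASE_ACCEPT: u8 = 2;
pub const PHASE_COMMIT: u8 = 3;
pub const KIND_REQUEST: u8 = 1;
pub const KIND_RESPONSE: u8 = 2;

#[derive(Debug, PartialEq)]
pub enum Dispatch {
    Pass,               // XDP_PASS: not HoloFast, malformed, unsupported, or a request
    AcceptQuorum,       // tail call into xdp_accept_quorum
    CommitQuorum,       // tail call into xdp_commit_quorum
    PreAcceptCandidate, // tail call into xdp_preaccept_candidate_quorum
}

/// `udp_payload` is everything after the UDP header.
pub fn dispatch(udp_payload: &[u8]) -> Dispatch {
    // One up-front length check guards every fixed-offset read below,
    // which is the pattern the eBPF verifier needs to see.
    if udp_payload.len() < HEADER_LEN {
        return Dispatch::Pass;
    }
    let magic = u32::from_be_bytes([udp_payload[0], udp_payload[1], udp_payload[2], udp_payload[3]]);
    let (version, header_len) = (udp_payload[4], udp_payload[5]);
    let (phase, kind) = (udp_payload[6], udp_payload[7]);
    let payload_len = usize::from(u16::from_be_bytes([udp_payload[10], udp_payload[11]]));

    if magic != MAGIC || version != VERSION || usize::from(header_len) != HEADER_LEN {
        return Dispatch::Pass;
    }
    // The declared payload must fit inside the bytes actually received.
    if HEADER_LEN + payload_len > udp_payload.len() {
        return Dispatch::Pass;
    }
    match (kind, phase) {
        (KIND_RESPONSE, PHASE_ACCEPT) => Dispatch::AcceptQuorum,
        (KIND_RESPONSE, PHASE_COMMIT) => Dispatch::CommitQuorum,
        (KIND_RESPONSE, PHASE_PREACCEPT) => Dispatch::PreAcceptCandidate,
        // Requests and anything else go to the Rust socket handler.
        _ => Dispatch::Pass,
    }
}
```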

7.3 Program 3: xdp_accept_quorum

Hook: tail-called from dispatcher.
Purpose: count AcceptResponses and notify Rust when a quorum is reached.

Key:

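#[repr(C)]
pub struct QuorumKey {
    pub cluster_id: u64,
    pub membership_epoch: u64,
    pub range_generation: u64,
    pub group_id: u64,
    pub txn_origin_node: u64,
    pub txn_counter: u64,
    pub ballot_counter: u64,
    pub ballot_node: u64,
    pub phase: u8,
    // Explicit padding instead of 7 implicit bytes after `phase`: BPF hash maps
    // compare keys byte-for-byte, so every key byte must be initialized (zeroed).
    pub _pad: [u8; 7],
}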

State:

#[repr(C)]
pub struct QuorumState {
    seen_bitmap: u64,
    ok_bitmap: u64,
    reject_bitmap: u64,
    first_seen_ns: u64,
    quorum_threshold: u8,
    emitted: u8,
    status: u8,
    promised_counter: u64,
    promised_node: u64,
}

Behavior:

1. Verify from_node is a member of group_id at membership_epoch.
2. Compute bit = member_index(from_node).
3. Atomically set the seen bit (fetch-or) and inspect its previous value; if the bit was already set, drop or pass depending on debug mode.
4. If status=OK, atomically set the ok bit.
5. If status=REJECT/PROMISED_HIGHER, set the reject bit and either pass immediately or emit a rejection event.
6. If popcount(ok_bitmap) >= quorum_threshold and emitted=0:
   - set emitted=1 with a compare-and-swap, so that only one of several concurrent XDP invocations (on different CPUs) emits the event
   - write QuorumEvent to BPF ringbuf
   - optionally pass this packet to user space
7. If not quorum-reaching, drop packet in optimized mode or pass in debug mode.

Electrode's wait-on-quorums design uses bitsets rather than a simple counter to avoid double-counting duplicate ACKs, and forwards only quorum-reaching or overflow-relevant packets/events to user space.14
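XDP programs for one interface can run on several CPUs at once, so two responses for the same transaction can update the same map value concurrently. The following is a minimal model of the state update, using Rust's `std::sync::atomic` types as stand-ins for eBPF atomic instructions (for example fetch-or and compare-and-swap, whose availability depends on kernel version). `fetch_or` on the seen bitmap detects duplicates. A compare-and-swap on `emitted` guarantees that exactly one packet emits the quorum event even if two arrive together. Field names follow `QuorumState`; the `Verdict` enum is hypothetical.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Duplicate,     // bit already set: drop (or pass in debug mode)
    Rejected,      // REJECT / PROMISED_HIGHER: pass to Rust
    Counted,       // counted, but no new quorum: drop in optimized mode
    QuorumReached, // this packet completed the quorum: emit exactly one event
}

pub struct QuorumState {
    pub seen_bitmap: AtomicU64,
    pub ok_bitmap: AtomicU64,
    pub reject_bitmap: AtomicU64,
    pub quorum_threshold: u8,
    pub emitted: AtomicBool,
}

impl QuorumState {
    pub fn new(quorum_threshold: u8) -> Self {
        Self {
            seen_bitmap: AtomicU64::new(0),
            ok_bitmap: AtomicU64::new(0),
            reject_bitmap: AtomicU64::new(0),
            quorum_threshold,
            emitted: AtomicBool::new(false),
        }
    }

    /// `member_index` is the sender's position in the group membership list.
    pub fn on_response(&self, member_index: u32, ok: bool) -> Verdict {
        assert!(member_index < 64, "u64 bitmaps hold at most 64 replicas");
        let bit = 1u64 << member_index;
        // fetch_or returns the previous value: a set bit means a duplicate.
        if self.seen_bitmap.fetch_or(bit, Ordering::AcqRel) & bit != 0 {
            return Verdict::Duplicate;
        }
        if !ok {
            self.reject_bitmap.fetch_or(bit, Ordering::AcqRel);
            return Verdict::Rejected;
        }
        // The last fetch_or in the bitmap's modification order observes every
        // earlier bit, so at least one concurrent caller sees the full count.
        let oks = (self.ok_bitmap.fetch_or(bit, Ordering::AcqRel) | bit).count_ones();
        if oks >= u32::from(self.quorum_threshold)
            && self
                .emitted
                .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
        {
            return Verdict::QuorumReached;
        }
        Verdict::Counted
    }
}
```

Late OK responses after the quorum are still recorded in the bitmaps but return `Counted`, so they never wake Rust a second time.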

7.4 Program 4: xdp_commit_quorum

This is nearly identical to xdp_accept_quorum, but the payload is simpler because CommitResponse carries only ok in the current HoloStore proto.10

Possible policies:

strict: wait for commit quorum event before completing coordinator future
relaxed: commit broadcast is fire-and-forget after WAL/consensus decision, only metrics are collected
hybrid: wait for quorum under debug/testing, fire-and-forget in production if Accord/HoloStore permits

This policy must be decided by HoloStore's existing correctness requirements. If commit replies are only liveness/diagnostic acknowledgments, they are an ideal eBPF aggregation target.

7.5 Program 5: xdp_preaccept_candidate_quorum

This is the most valuable but riskiest offload. It must be conservative.

PreAccept responses include dependency information, and Accord fast-path correctness depends on dependency/timestamp agreement. HoloStore's proto says PreAcceptResponse contains ok, seq, deps, and promised.10 The Accord whitepaper's consensus algorithm has PreAcceptOK responses containing timestamp/dependency data and uses fast-path criteria before skipping Accept.18

Safe initial behavior

Only aggregate PreAccept responses when all of these are true:
  ok == true
  promised is empty/zero or equals expected ballot policy
  deps_count <= HOLOFAST_INLINE_DEPS_MAX
  deps payload is present in normalized fixed-bounded form
  seq/proposed timestamp matches the candidate group
  deps_digest matches candidate group
  phase/ballot/epoch/range_generation all match

If any condition fails:

mark fallback_required for this txn
pass all future PreAccept responses for this txn to user space
let Rust perform normal Accord logic
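The eligibility rule is a conjunction of checks plus a sticky fallback bit. A minimal sketch follows. It assumes a hypothetical `PreAcceptView` of the fields involved, a candidate group fixed by the first eligible response, and `promised` modeled as `(ballot_counter, ballot_node)` with `(0, 0)` meaning empty. Duplicate detection is left to the bitmap logic shown for Accept. Once any response fails a check, `fallback_required` stays set and every later response for the transaction goes to Rust, as the rule above requires.

```rust
/// Hypothetical view of the PreAccept response fields that eBPF can see.
#[derive(Clone, Copy, Debug)]
pub struct PreAcceptView {
    pub ok: bool,
    pub promised: (u64, u64), // (ballot_counter, ballot_node); (0, 0) = empty
    pub ballot: (u64, u64),
    pub membership_epoch: u64,
    pub range_generation: u64,
    pub seq: u64,
    pub deps_digest: (u64, u64),
    pub deps_count: u16,
    /// Deps payload present in normalized, bounded form (true for empty deps).
    pub deps_inline: bool,
}

pub const HOLOFAST_INLINE_DEPS_MAX: u16 = 8;

#[derive(Debug, PartialEq)]
pub enum PreAcceptAction {
    Count,
    PassToRust,
}

/// Per-transaction candidate state (hypothetical).
pub struct CandidateState {
    pub expected_ballot: (u64, u64),
    pub membership_epoch: u64,
    pub range_generation: u64,
    /// (seq, deps_digest) of the first eligible response.
    pub group: Option<(u64, (u64, u64))>,
    pub fallback_required: bool,
}

impl CandidateState {
    pub fn on_response(&mut self, r: &PreAcceptView) -> PreAcceptAction {
        if self.fallback_required {
            return PreAcceptAction::PassToRust;
        }
        let promised_ok = r.promised == (0, 0) || r.promised == self.expected_ballot;
        let bounded = r.deps_count <= HOLOFAST_INLINE_DEPS_MAX && r.deps_inline;
        let same_round = r.ballot == self.expected_ballot
            && r.membership_epoch == self.membership_epoch
            && r.range_generation == self.range_generation;
        let matches_group = self.group.map_or(true, |g| g == (r.seq, r.deps_digest));

        if r.ok && promised_ok && bounded && same_round && matches_group {
            if self.group.is_none() {
                self.group = Some((r.seq, r.deps_digest));
            }
            PreAcceptAction::Count
        } else {
            self.fallback_required = true; // sticky for this transaction
            PreAcceptAction::PassToRust
        }
    }
}
```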

Candidate event

#[repr(C)]
pub struct PreAcceptCandidateEvent {
    pub key: QuorumKey,
    pub seen_bitmap: u64,
    pub ok_bitmap: u64,
    pub seq: u64,
    pub deps_digest_hi: u64,
    pub deps_digest_lo: u64,
    pub deps_count: u16,
    pub deps_inline: [TxnIdWire; HOLOFAST_INLINE_DEPS_MAX],
    pub event_flags: u32, // candidate_only, fallback_required, overflow, etc.
}

Rust validation:

1. Look up the proposal future by txn_id.
2. Validate the event's epoch/generation/ballot/group/phase.
3. Validate that the bitmap satisfies the required Accord fast quorum/electorate.
4. Recompute deps_digest from inline deps.
5. Confirm that inline deps are complete for this fast-path case.
6. Only then advance the proposal as if those responses were received.

This gives eBPF a useful fast path for the common low-contention case where deps are empty or tiny. It avoids making eBPF responsible for computing or interpreting dependency sets.
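Steps 3 to 5 reduce to a few bitmap and digest checks. A minimal sketch follows. It assumes the caller supplies the fast electorate as a bitmap and the required fast-quorum size (the sketch does not derive Accord's quorum sizes), and it takes the digest function as a parameter so it matches whatever normalization the wire protocol pins. Steps 1, 2, and 6 depend on HoloStore's proposal table and are omitted. All names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct TxnIdWire {
    pub origin_node: u64,
    pub counter: u64,
}

/// Simplified candidate event; deps_inline is a fixed-size array on the wire.
pub struct CandidateEvent {
    pub seen_bitmap: u64,
    pub ok_bitmap: u64,
    pub deps_digest: (u64, u64),
    pub deps_count: u16,
    pub deps_inline: Vec<TxnIdWire>,
}

#[derive(Debug, PartialEq)]
pub enum Rejection {
    InconsistentBitmaps,
    OutsideElectorate,
    TooFewVotes,
    IncompleteDeps,
    DigestMismatch,
}

pub fn validate_candidate(
    e: &CandidateEvent,
    fast_electorate: u64,
    fast_quorum_size: u32,
    inline_max: usize,
    digest: impl Fn(&[TxnIdWire]) -> (u64, u64),
) -> Result<(), Rejection> {
    // An OK vote from a replica that was never "seen" means the event is corrupt.
    if e.ok_bitmap & !e.seen_bitmap != 0 {
        return Err(Rejection::InconsistentBitmaps);
    }
    // Step 3: every counted vote must come from the fast electorate...
    if e.ok_bitmap & !fast_electorate != 0 {
        return Err(Rejection::OutsideElectorate);
    }
    // ...and there must be enough of them.
    if e.ok_bitmap.count_ones() < fast_quorum_size {
        return Err(Rejection::TooFewVotes);
    }
    // Step 5: deps_count must describe exactly the inline entries.
    if usize::from(e.deps_count) != e.deps_inline.len() || e.deps_inline.len() > inline_max {
        return Err(Rejection::IncompleteDeps);
    }
    // Step 4: recompute the digest instead of trusting the kernel's value.
    if digest(&e.deps_inline) != e.deps_digest {
        return Err(Rejection::DigestMismatch);
    }
    Ok(())
}
```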

7.6 Program 6: optional packet steering

After the basic offloads work, add packet steering:

group_id -> CPU queue / worker
coordinator_node -> queue
phase -> queue class

XDP can redirect packets using map-backed mechanisms such as CPUMAP and XSKMAP; XSKMAP can redirect frames to AF_XDP sockets without traversing the full network stack.19 This is a later optimization, not the first milestone.


8. BPF map layout

8.1 Required maps

FeatureSwitchMap
  key: feature_id
  value: enabled/version/config bits

NodeAddrMap
  key: node_id
  value: mac, ip, udp_port, queue_hint

GroupMembershipMap
  key: cluster_id + membership_epoch + group_id
  value: member_count, node_ids[], quorum_thresholds, fast_electorate bitmap

QuorumStateMap
  key: QuorumKey
  value: QuorumState
  type: LRU hash or bounded hash

PreAcceptStateMap
  key: QuorumKey
  value: PreAcceptCandidateState
  type: bounded hash

Ringbuf
  producer: eBPF
  consumer: Rust HoloFast event loop

CountersMap
  key: counter_id
  value: u64 counter
  type: per-CPU array, so hot-path increments need no cross-CPU atomics (Rust sums the per-CPU values)

FallbackMap / DebugMap
  key: reason
  value: count

The BPF ring buffer is a good fit for kernel-to-user events; Linux's BPF ring buffer docs describe it as a mechanism for BPF programs to communicate event records to user space.20

8.2 State cleanup

HoloFast must not leak quorum states. Use three cleanup mechanisms:

1. User-space deletion after proposal completes.
2. TTL sweep in Rust: periodically delete old QuorumKey entries.
3. LRU map fallback: if state is evicted, eBPF passes packets to user space.
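The TTL sweep can be a periodic pass over the Rust-side view of `QuorumStateMap`. A minimal sketch follows. It models the BPF map as a `HashMap` keyed by a simplified tuple and uses each entry's `first_seen_ns`. A real sweep would walk the kernel map instead, for example with libbpf's `bpf_map_get_next_key()` followed by `bpf_map_delete_elem()`. If `first_seen_ns` comes from `bpf_ktime_get_ns()`, then `now_ns` must also be read from `CLOCK_MONOTONIC`. The `is_live` callback is a hypothetical hook that keeps entries whose proposal is still outstanding in Rust.

```rust
use std::collections::HashMap;

/// Simplified stand-in for QuorumKey: (group_id, txn_origin_node, txn_counter, phase).
pub type Key = (u64, u64, u64, u8);

pub struct Entry {
    pub first_seen_ns: u64,
}

/// Removes entries older than `ttl_ns` unless `is_live` reports the proposal
/// as still outstanding. Returns how many entries were removed.
pub fn sweep_expired(
    map: &mut HashMap<Key, Entry>,
    now_ns: u64,
    ttl_ns: u64,
    is_live: impl Fn(&Key) -> bool,
) -> usize {
    let before = map.len();
    map.retain(|key, entry| {
        // saturating_sub tolerates an entry stamped slightly after `now_ns` was read.
        let age = now_ns.saturating_sub(entry.first_seen_ns);
        age < ttl_ns || is_live(key)
    });
    before - map.len()
}
```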

8.3 Overflow policy

Overflow must be safe:

QuorumStateMap full       -> pass packet, increment overflow counter
Ringbuf full              -> pass packet, increment ringbuf_drop counter
deps too large            -> pass packet, mark preaccept_fallback
membership missing        -> pass packet
unknown replica           -> pass packet or drop if strict anti-spoofing is enabled

Electrode also forwards messages to user space when fixed-size in-kernel structures cannot handle the case.21


9. Rust integration in HoloStore

9.1 Crate/module layout

Proposed layout:

crates/
  holo_fast/
    Cargo.toml
    src/
      lib.rs
      wire.rs              # shared wire structs and constants
      encode.rs
      decode.rs
      transport.rs         # FastTransport implementing Accord Transport
      event_loop.rs        # ringbuf consumer and proposal wakeups
      membership.rs        # BPF map sync
      metrics.rs
      fallback.rs
    ebpf/
      Cargo.toml           # if using Aya
      src/
        main.rs            # XDP/TC programs
        maps.rs
        parse.rs
        quorum.rs

crates/holo_store/src/
  transport.rs             # keep GrpcTransport; add selection wrapper
  fast_transport_adapter.rs # optional thin adapter

Alternative if using libbpf/CO-RE:

crates/holo_fast/bpf/
  holo_fast.bpf.c
  holo_fast.h
  build.rs

9.2 Transport trait integration

The existing Accord engine already calls a transport abstraction for pre_accept, accept, commit, and recover based on the current GrpcTransport comments.8 Add a FastTransport implementation with the same semantic API.

Conceptual API:

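#[async_trait]
impl Transport for FastTransport {
    async fn pre_accept(
        &self,
        peer: NodeId,
        req: PreAcceptRequest,
    ) -> anyhow::Result<PreAcceptResponse> {
        // Encode as a PREACCEPT_REQUEST frame, send to `peer`, await the matching response.
        todo!()
    }

    async fn accept(
        &self,
        peer: NodeId,
        req: AcceptRequest,
    ) -> anyhow::Result<AcceptResponse> {
        // Same pattern with an ACCEPT_REQUEST frame.
        todo!()
    }

    async fn commit(
        &self,
        peer: NodeId,
        req: CommitRequest,
    ) -> anyhow::Result<CommitResponse> {
        // Same pattern with a COMMIT_REQUEST frame.
        todo!()
    }

    async fn recover(
        &self,
        peer: NodeId,
        req: RecoverRequest,
    ) -> anyhow::Result<RecoverResponse> {
        // Recovery is a cold path: delegate to the gRPC/control lane.
        todo!()
    }
}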

The optimization opportunity is larger if HoloStore adds a group-aware broadcast API instead of calling each peer individually:

#[async_trait]
pub trait BroadcastTransport: Transport {
    async fn pre_accept_group(
        &self,
        group: GroupId,
        req: PreAcceptRequest,
        quorum: QuorumSpec,
    ) -> anyhow::Result<PreAcceptQuorumResult>;

    async fn accept_group(
        &self,
        group: GroupId,
        req: AcceptRequest,
        quorum: QuorumSpec,
    ) -> anyhow::Result<AcceptQuorumResult>;

    async fn commit_group(
        &self,
        group: GroupId,
        req: CommitRequest,
        policy: CommitAckPolicy,
    ) -> anyhow::Result<CommitResult>;
}

This is the cleaner interface. It exposes the operation Accord actually performs (broadcast, then wait for a quorum) instead of hiding it behind per-peer RPCs.

9.3 Transport selection

Add a runtime selection enum:

pub enum ConsensusTransportKind {
    Grpc,
    HoloFastNoBpf,
    HoloFastBpfBroadcast,
    HoloFastBpfQuorum,
    HoloFastBpfFull,
}

Configuration:

HOLO_CONSENSUS_TRANSPORT=grpc|holofast|holofast-bpf
HOLO_FAST_ENABLE_TC_BROADCAST=true|false
HOLO_FAST_ENABLE_XDP_QUORUM=true|false
HOLO_FAST_ENABLE_PREACCEPT_CANDIDATE=true|false
HOLO_FAST_DEBUG_PASS_ALL=true|false
HOLO_FAST_INLINE_CMD_MAX=512
HOLO_FAST_INLINE_DEPS_MAX=8
HOLO_FAST_MAX_INFLIGHT=65536
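The environment variables and the enum need an explicit mapping. A minimal sketch follows. It assumes the stages in Goal 4 are cumulative: XDP quorum aggregation only runs with TC broadcast, and PreAccept candidates only run with both. A combination that skips a stage is therefore rejected rather than silently mapped to some mode. The selection function takes already-read values, so it can be tested without touching the process environment. `select_transport` and `env_flag` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ConsensusTransportKind {
    Grpc,
    HoloFastNoBpf,
    HoloFastBpfBroadcast,
    HoloFastBpfQuorum,
    HoloFastBpfFull,
}

/// Maps HOLO_CONSENSUS_TRANSPORT and the HOLO_FAST_ENABLE_* flags to a mode.
pub fn select_transport(
    transport: &str,
    tc_broadcast: bool,
    xdp_quorum: bool,
    preaccept_candidate: bool,
) -> Result<ConsensusTransportKind, String> {
    use ConsensusTransportKind::*;
    match transport {
        "grpc" => Ok(Grpc),
        "holofast" => Ok(HoloFastNoBpf),
        // Assumed cumulative stages: each flag requires the ones before it.
        "holofast-bpf" => match (tc_broadcast, xdp_quorum, preaccept_candidate) {
            (false, false, false) => Ok(HoloFastNoBpf),
            (true, false, false) => Ok(HoloFastBpfBroadcast),
            (true, true, false) => Ok(HoloFastBpfQuorum),
            (true, true, true) => Ok(HoloFastBpfFull),
            other => Err(format!("unsupported HOLO_FAST_ENABLE_* combination {other:?}")),
        },
        other => Err(format!("unknown HOLO_CONSENSUS_TRANSPORT value {other:?}")),
    }
}

/// Reads a boolean flag, treating an unset or non-"true" value as false.
pub fn env_flag(name: &str) -> bool {
    std::env::var(name).map(|v| v == "true").unwrap_or(false)
}
```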

9.4 Event loop

Rust must consume ringbuf events and wake proposal futures.

pub enum HoloFastEvent {
    AcceptQuorum(AcceptQuorumEvent),
    CommitQuorum(CommitQuorumEvent),
    PreAcceptCandidate(PreAcceptCandidateEvent),
    Fallback(FallbackEvent),
}

Event loop pseudocode:

loop {
    let event = ringbuf.poll(timeout)?;
    match event {
        HoloFastEvent::AcceptQuorum(e) => {
            if validate_accept_event(&e, membership, proposals) {
                proposals.complete_accept_quorum(e.key, e.ok_bitmap, e.promised);
            } else {
                metrics.invalid_bpf_event += 1;
                proposals.force_fallback(e.key);
            }
        }
        HoloFastEvent::CommitQuorum(e) => { ... }
        HoloFastEvent::PreAcceptCandidate(e) => {
            if validate_preaccept_candidate(&e, membership, proposals) {
                proposals.complete_preaccept_candidate(e);
            } else {
                proposals.force_fallback(e.key);
            }
        }
        HoloFastEvent::Fallback(e) => {
            proposals.force_fallback(e.key);
        }
    }
}
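The pseudocode leaves validate_accept_event undefined. A minimal sketch of the checks it might perform follows, assuming a simple-majority quorum, a 64-slot replica bitmap, and a key that carries the ballot and membership epoch; all type and field names here are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical key: every field is echoed in packets and in the BPF map key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ProposalKey {
    pub group: u32,
    pub txn: u64,
    pub ballot: u64,
    pub membership_epoch: u64,
}

pub struct AcceptQuorumEvent {
    pub key: ProposalKey,
    pub ok_bitmap: u64, // bit i set = replica slot i voted Ok
    pub promised: u64,  // highest promised ballot reported by the voters
}

/// Rust-owned membership for one group at its current epoch.
pub struct Membership {
    pub epoch: u64,
    pub member_bitmap: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ProposalPhase {
    AwaitingAccept,
    Done,
}

/// Trust a BPF quorum event only if it agrees with state Rust owns.
pub fn validate_accept_event(
    e: &AcceptQuorumEvent,
    membership: &HashMap<u32, Membership>,
    proposals: &HashMap<ProposalKey, ProposalPhase>,
) -> bool {
    let m = match membership.get(&e.key.group) {
        Some(m) => m,
        None => return false, // unknown group: never complete from BPF
    };
    let quorum = m.member_bitmap.count_ones() / 2 + 1; // simple majority
    m.epoch == e.key.membership_epoch                  // no stale-epoch votes
        && (e.ok_bitmap & !m.member_bitmap) == 0       // every voter is a member
        && e.ok_bitmap.count_ones() >= quorum          // enough distinct voters
        && e.promised <= e.key.ballot                  // nobody promised higher
        && proposals.get(&e.key) == Some(&ProposalPhase::AwaitingAccept)
}
```

Every check only reads state that Rust owns, so a buggy or stale BPF event can at worst force a fallback, never a completion.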

9.5 Membership sync

When HoloStore commits a membership/range change:

1. Rust updates normal HoloStore membership state.
2. Rust builds a new GroupMembershipMap entry with new membership_epoch.
3. Rust loads node address records into NodeAddrMap.
4. Rust flips FeatureSwitchMap for the new epoch.
5. Old epoch remains for a grace period.
6. After outstanding proposals expire, Rust removes old epoch map entries.

Never let eBPF invent membership. It only reads a Rust-owned map.
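A small user-space model of this sequence is sketched below, assuming the grace period ends when no in-flight proposal still references the old epoch (a time-based grace period would add a deadline check). MembershipMirror and its methods are hypothetical; a real deployment would apply the same ordering to the BPF maps.

```rust
use std::collections::BTreeMap;

/// Hypothetical user-space model of the epoch-versioned GroupMembershipMap.
#[derive(Default)]
pub struct MembershipMirror {
    entries: BTreeMap<(u32, u64), Vec<u64>>, // (group, epoch) -> member node ids
    active: BTreeMap<u32, u64>,              // group -> epoch new proposals use
    inflight: BTreeMap<(u32, u64), usize>,   // proposals still tied to an epoch
}

impl MembershipMirror {
    /// Steps 2-4: write the new epoch's entry first, then flip the active epoch,
    /// so a reader never sees an active epoch without its membership.
    pub fn install(&mut self, group: u32, epoch: u64, members: Vec<u64>) {
        self.entries.insert((group, epoch), members);
        self.active.insert(group, epoch);
    }

    /// A new proposal pins the currently active epoch.
    pub fn begin_proposal(&mut self, group: u32) -> Option<u64> {
        let epoch = *self.active.get(&group)?;
        *self.inflight.entry((group, epoch)).or_insert(0) += 1;
        Some(epoch)
    }

    pub fn finish_proposal(&mut self, group: u32, epoch: u64) {
        if let Some(n) = self.inflight.get_mut(&(group, epoch)) {
            *n = n.saturating_sub(1);
        }
    }

    /// Step 6: drop inactive epochs once no proposal references them.
    pub fn retire_drained(&mut self) -> Vec<(u32, u64)> {
        let removable: Vec<(u32, u64)> = self
            .entries
            .keys()
            .filter(|(g, e)| {
                self.active.get(g) != Some(e)
                    && self.inflight.get(&(*g, *e)).copied().unwrap_or(0) == 0
            })
            .copied()
            .collect();
        for key in &removable {
            self.entries.remove(key);
            self.inflight.remove(key);
        }
        removable
    }

    pub fn members(&self, group: u32, epoch: u64) -> Option<&[u64]> {
        self.entries.get(&(group, epoch)).map(|v| v.as_slice())
    }
}
```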

9.6 Fallback path

FastTransport should retain a normal socket receive path. If eBPF passes a packet, Rust decodes and handles it as a normal HoloFast packet. If HoloFast fails entirely, the transport falls back to gRPC.

Fallback triggers:

large command
large dependency list
missing BPF map entry
quorum map overflow
ringbuf overflow
packet parse error
unsupported version
preaccept mismatch
recovery path
membership transition
network loss/reordering beyond hot-path assumptions
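To make the holo_fast_bpf_fallback_total{reason} metric concrete, the triggers above can be captured as one enum. This is a sketch: the variant names and label strings are hypothetical, and routing large payloads and recovery to gRPC while keeping everything else on the HoloFast socket path is an assumption drawn from the recommendation to keep recovery and large data on gRPC.

```rust
/// One variant per fallback trigger (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FallbackReason {
    LargeCommand,
    LargeDeps,
    MissingMapEntry,
    QuorumMapOverflow,
    RingbufOverflow,
    ParseError,
    UnsupportedVersion,
    PreAcceptMismatch,
    Recovery,
    MembershipTransition,
    LossOrReorder,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lane {
    /// Rust decodes the passed-through HoloFast packet on the normal socket path.
    HoloFastUserSpace,
    /// The request is re-issued over gRPC.
    Grpc,
}

impl FallbackReason {
    pub const ALL: [FallbackReason; 11] = [
        Self::LargeCommand, Self::LargeDeps, Self::MissingMapEntry,
        Self::QuorumMapOverflow, Self::RingbufOverflow, Self::ParseError,
        Self::UnsupportedVersion, Self::PreAcceptMismatch, Self::Recovery,
        Self::MembershipTransition, Self::LossOrReorder,
    ];

    /// Value for the `reason` label of holo_fast_bpf_fallback_total.
    pub fn label(self) -> &'static str {
        match self {
            Self::LargeCommand => "large_command",
            Self::LargeDeps => "large_deps",
            Self::MissingMapEntry => "missing_map_entry",
            Self::QuorumMapOverflow => "quorum_map_overflow",
            Self::RingbufOverflow => "ringbuf_overflow",
            Self::ParseError => "parse_error",
            Self::UnsupportedVersion => "unsupported_version",
            Self::PreAcceptMismatch => "preaccept_mismatch",
            Self::Recovery => "recovery",
            Self::MembershipTransition => "membership_transition",
            Self::LossOrReorder => "loss_or_reorder",
        }
    }

    /// Assumption: recovery and large payloads go to gRPC; everything else
    /// stays on the HoloFast user-space path.
    pub fn lane(self) -> Lane {
        match self {
            Self::LargeCommand | Self::LargeDeps | Self::Recovery => Lane::Grpc,
            _ => Lane::HoloFastUserSpace,
        }
    }
}
```

Keeping the label set closed in one enum bounds the metric's cardinality and makes a "fallback ratio by reason" dashboard trivial to build.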

10. Phase-by-phase consensus mapping

10.1 PreAccept

Accord/HoloStore semantics:

coordinator sends PreAccept to participating replicas/electorate
replicas compute conflict/dependency/timestamp information
coordinator may fast-path if responses converge
otherwise coordinator moves to Accept

HoloFast behavior:

TC broadcast:
  yes, strong fit

XDP quorum aggregation:
  yes, but candidate-only and conservative

Fast ACK:
  no initial support; unsafe because response requires actual dependency computation

Fast-path case:

1. Rust serializes one PreAccept request with BROADCAST flag.
2. TC clones it to all fast-electorate members.
3. Replicas process in Rust and send PreAcceptResponse with fixed header.
4. Coordinator XDP sees responses.
5. If responses match compact fast-path conditions, XDP emits PreAcceptCandidateEvent.
6. Rust validates and advances to Commit or Accept as Accord requires.
7. If mismatch/overflow, Rust receives normal responses and executes existing logic.
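Step 5 can be modeled in user space as a per-proposal aggregator. In Accord's fast path, the coordinator needs a fast-path quorum of replicas that propose the transaction's original timestamp, and it merges their dependency sets. The sketch below is deliberately stricter: it also requires identical deps_digest values so XDP needs only one comparison per reply, and every other deviation becomes a fallback. The reply fields, CandidateAgg, and the reason strings are hypothetical.

```rust
/// Compact fields a PreAccept reply header might carry (hypothetical layout).
#[derive(Debug, Clone, Copy)]
pub struct PreAcceptReply {
    pub from_slot: u8,
    pub epoch: u64,
    pub execute_at: u64,
    pub deps_digest: u64,
    pub deps_overflow: bool,
    pub rejected: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateState {
    Pending,
    Candidate { voters: u64, deps_digest: u64 },
    Fallback(&'static str),
}

pub struct CandidateAgg {
    epoch: u64,
    txn_ts: u64,
    fast_quorum: u32,
    voters: u64,
    digest: Option<u64>,
    state: CandidateState,
}

impl CandidateAgg {
    pub fn new(epoch: u64, txn_ts: u64, fast_quorum: u32) -> Self {
        CandidateAgg { epoch, txn_ts, fast_quorum, voters: 0, digest: None, state: CandidateState::Pending }
    }

    /// Fold one reply into the aggregate. Once the state is terminal it no
    /// longer changes; later replies simply go to user space.
    pub fn on_reply(&mut self, r: &PreAcceptReply) -> CandidateState {
        if self.state != CandidateState::Pending {
            return self.state;
        }
        let reason = if r.epoch != self.epoch {
            Some("stale_epoch")
        } else if r.from_slot >= 64 {
            Some("bad_slot")
        } else if r.rejected {
            Some("reject")
        } else if r.deps_overflow {
            Some("deps_overflow")
        } else if r.execute_at != self.txn_ts {
            Some("timestamp")
        } else if matches!(self.digest, Some(d) if d != r.deps_digest) {
            Some("deps_mismatch")
        } else {
            None
        };
        if let Some(why) = reason {
            self.state = CandidateState::Fallback(why);
            return self.state;
        }
        self.digest = Some(r.deps_digest);
        self.voters |= 1u64 << r.from_slot; // a duplicate sets the same bit again
        if self.voters.count_ones() >= self.fast_quorum {
            self.state = CandidateState::Candidate { voters: self.voters, deps_digest: r.deps_digest };
        }
        self.state
    }
}
```

A Candidate is only a hint: step 6 still has Rust validate it before advancing, which is what makes the stricter-than-Accord rule safe to ship.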

10.2 Accept

Accord/HoloStore semantics:

coordinator asks replicas to accept the chosen seq/deps/ballot
replicas respond ok or promised-higher
coordinator waits for simple quorum

HoloFast behavior:

TC broadcast:
  yes

XDP quorum aggregation:
  yes, strong fit

Fast ACK:
  no initial support; user-space replica must still update protocol state

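This phase is easier than PreAccept because AcceptResponse is compact: each replica answers ok or promised-higher, so XDP can count votes from fixed header fields without inspecting dependency data.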
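The Accept-side aggregation follows Electrode's wait-on-quorum idea: a per-proposal bitset of voting replicas. A user-space model of the XDP decision is sketched below, assuming Rust arms each key before broadcasting, a 64-slot replica bitmap, and a majority quorum supplied by the caller. QuorumMap and Verdict are hypothetical; pass_all models HOLO_FAST_DEBUG_PASS_ALL.

```rust
use std::collections::HashMap;

/// Hypothetical map key; epoch is part of the key so stale votes cannot count.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct QuorumKey {
    pub group: u32,
    pub txn: u64,
    pub ballot: u64,
    pub epoch: u64,
}

/// What the XDP program does with one AcceptResponse.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,                                 // XDP_PASS: Rust sees the packet
    Drop,                                 // XDP_DROP: vote recorded in the map
    Event { ok_bitmap: u64, pass: bool }, // ringbuf event; `pass` in debug mode
}

struct QuorumState {
    ok_bitmap: u64,
    reported: bool,
}

pub struct QuorumMap {
    capacity: usize,
    quorum: u32,
    pass_all: bool,
    entries: HashMap<QuorumKey, QuorumState>,
}

impl QuorumMap {
    pub fn new(capacity: usize, quorum: u32, pass_all: bool) -> Self {
        QuorumMap { capacity, quorum, pass_all, entries: HashMap::new() }
    }

    /// Rust arms a key before broadcasting; a full map means "no offload".
    pub fn arm(&mut self, key: QuorumKey) -> bool {
        if self.entries.len() >= self.capacity {
            return false;
        }
        self.entries.insert(key, QuorumState { ok_bitmap: 0, reported: false });
        true
    }

    pub fn on_accept_reply(&mut self, key: QuorumKey, from_slot: u8, ok: bool) -> Verdict {
        let quiet = if self.pass_all { Verdict::Pass } else { Verdict::Drop };
        let (quorum, pass_all) = (self.quorum, self.pass_all);
        let st = match self.entries.get_mut(&key) {
            Some(st) => st,
            None => return Verdict::Pass, // unarmed key or stale epoch
        };
        if !ok || from_slot >= 64 {
            return Verdict::Pass; // promised-higher and odd packets always reach Rust
        }
        let bit = 1u64 << from_slot;
        if st.reported || (st.ok_bitmap & bit) != 0 {
            return quiet; // late or duplicate vote adds nothing
        }
        st.ok_bitmap |= bit;
        if st.ok_bitmap.count_ones() >= quorum {
            st.reported = true;
            return Verdict::Event { ok_bitmap: st.ok_bitmap, pass: pass_all };
        }
        quiet
    }
}
```

Because a vote only ever sets a bit for a distinct slot under a key that includes the epoch, duplicates and stale-epoch packets cannot inflate the count; anything unexpected degrades to Pass rather than to a false quorum.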

10.3 Commit

Accord/HoloStore semantics:

coordinator disseminates decided commit
replicas append/record/apply according to HoloStore's durability/execution design

HoloStore's design notes say the WAL is the authoritative durability source and commits are appended before apply; commit-log appends are batched to amortize syscalls and fsync [15].

HoloFast behavior:

TC broadcast:
  yes, strong fit

XDP quorum aggregation:
  yes, if commit ACKs are required

Fast ACK:
  no, if an ACK implies a durable commit; commit ACKs may be unnecessary altogether, depending on current semantics

10.4 Recover

Keep recovery on gRPC/control lane initially.

Reason:

variable state
large command transfer
ballot edge cases
must handle old/incomplete transactions
higher correctness risk

Electrode similarly keeps complex, uncommon cases in user space rather than forcing them into eBPF [11].


11. Implementation roadmap

Phase 0: profiling and evidence

Before changing transport, collect evidence.

Metrics:

transport queue wait
per-phase p50/p95/p99/p999 latency
send/recv syscall rate
context switches
softirq CPU
kernel vs user CPU
protobuf/gRPC encode/decode CPU
batch size distribution
in-flight requests per lane
WAL latency
dependency conflict rate
fast-path success rate

Tools:

existing HOLOSTATS / HOLOMETRICS
eBPF tracing for syscalls and sched wakeups
perf/flamegraph
packet counters
runtime task latency counters

Exit criteria:

clear evidence that consensus network transport overhead is material
or clear evidence that WAL/storage/dependency conflicts dominate and transport work should wait

Phase 1: HoloFast without eBPF

Build the fixed-frame UDP transport first. This answers the question: how much do gRPC, protobuf, and HTTP/2 cost by themselves?

Tasks:

1. Add holo_fast::wire structs.
2. Add encode/decode with exhaustive bounds checks.
3. Add UDP socket send/recv path.
4. Implement per-peer sends matching existing Transport trait.
5. Keep all batching in Rust.
6. Run linearizability tests.
7. Benchmark vs gRPC.
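Below is a sketch of task 2, encode/decode with exhaustive bounds checks. This section does not pin down a byte layout, so the 40-byte big-endian header (magic HOLF, version 1, phase, header_len, payload_len, feature bitmap, membership epoch, range generation, sender id) is a hypothetical layout for illustration only.

```rust
/// Hypothetical fixed v1 header layout (big-endian), 40 bytes:
/// magic[4] version[1] phase[1] header_len[2] payload_len[4] features[4]
/// membership_epoch[8] range_generation[8] from_node[4] reserved[4]
pub const MAGIC: [u8; 4] = *b"HOLF";
pub const VERSION: u8 = 1;
pub const HEADER_LEN: usize = 40;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    PreAccept = 1,
    PreAcceptResp = 2,
    Accept = 3,
    AcceptResp = 4,
    Commit = 5,
    CommitResp = 6,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub phase: Phase,
    pub payload_len: u32,
    pub features: u32,
    pub membership_epoch: u64,
    pub range_generation: u64,
    pub from_node: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError { Short, BadMagic, UnknownVersion(u8), UnknownPhase(u8), BadHeaderLen(u16), BadPayloadLen(u32) }

// Helpers index directly; `decode` checks the total length before any call.
fn be16(b: &[u8], at: usize) -> u16 { u16::from_be_bytes([b[at], b[at + 1]]) }
fn be32(b: &[u8], at: usize) -> u32 { u32::from_be_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]]) }
fn be64(b: &[u8], at: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[at..at + 8]);
    u64::from_be_bytes(a)
}

impl Header {
    pub fn encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&MAGIC);
        out.push(VERSION);
        out.push(self.phase as u8);
        out.extend_from_slice(&(HEADER_LEN as u16).to_be_bytes());
        out.extend_from_slice(&self.payload_len.to_be_bytes());
        out.extend_from_slice(&self.features.to_be_bytes());
        out.extend_from_slice(&self.membership_epoch.to_be_bytes());
        out.extend_from_slice(&self.range_generation.to_be_bytes());
        out.extend_from_slice(&self.from_node.to_be_bytes());
        out.extend_from_slice(&[0u8; 4]); // reserved
    }

    /// Validate every length before reading, in the order a BPF parser would
    /// have to, and return the header plus its payload slice.
    pub fn decode(buf: &[u8]) -> Result<(Header, &[u8]), DecodeError> {
        if buf.len() < HEADER_LEN {
            return Err(DecodeError::Short);
        }
        if buf[..4] != MAGIC[..] {
            return Err(DecodeError::BadMagic);
        }
        if buf[4] != VERSION {
            return Err(DecodeError::UnknownVersion(buf[4]));
        }
        let phase = match buf[5] {
            1 => Phase::PreAccept,
            2 => Phase::PreAcceptResp,
            3 => Phase::Accept,
            4 => Phase::AcceptResp,
            5 => Phase::Commit,
            6 => Phase::CommitResp,
            p => return Err(DecodeError::UnknownPhase(p)),
        };
        let header_len = be16(buf, 6);
        if header_len as usize != HEADER_LEN {
            return Err(DecodeError::BadHeaderLen(header_len));
        }
        let payload_len = be32(buf, 8);
        if buf.len() - HEADER_LEN != payload_len as usize {
            return Err(DecodeError::BadPayloadLen(payload_len));
        }
        let header = Header {
            phase,
            payload_len,
            features: be32(buf, 12),
            membership_epoch: be64(buf, 16),
            range_generation: be64(buf, 24),
            from_node: be32(buf, 32),
        };
        Ok((header, &buf[HEADER_LEN..]))
    }
}
```

Using a fixed header length for version 1 keeps the BPF side simple: one length check covers every field the fast path reads.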

Exit criteria:

HoloFastNoBpf is correct under local 3-node tests
HoloFastNoBpf can run existing benchmarks
fallback to gRPC works

Phase 2: TC egress broadcast

Tasks:

1. Add group-aware broadcast API.
2. Add BPF maps for node addresses and group membership.
3. Load TC egress program.
4. Send one BROADCAST packet per phase.
5. Verify every replica receives exactly one packet.
6. Test with 3/5/7 replicas.
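The TC program's per-packet decision can be prototyped and tested in user space before writing BPF. This sketch assumes a GroupMembershipMap entry holds an epoch and a member list, that NodeAddrMap maps nodes to IPv4 socket addresses, and that the coordinator handles its own copy locally; plan_broadcast and FanOut are hypothetical. In the kernel, each target would correspond to one destination rewrite plus one bpf_clone_redirect() call.

```rust
use std::collections::HashMap;
use std::net::SocketAddrV4;

pub type NodeId = u32;

/// Hypothetical value of GroupMembershipMap for one group.
pub struct GroupEntry {
    pub epoch: u64,
    pub members: Vec<NodeId>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FanOut {
    /// Clone the frame once per destination, rewriting the UDP/IP destination.
    Clone(Vec<SocketAddrV4>),
    /// Leave the packet alone; Rust notices and sends per peer instead.
    Pass(&'static str),
}

/// User-space model of the TC egress decision for one BROADCAST packet.
/// `max_fanout` stands in for the fixed loop bound the verifier requires.
pub fn plan_broadcast(
    self_id: NodeId,
    group_id: u32,
    pkt_epoch: u64,
    groups: &HashMap<u32, GroupEntry>,
    addrs: &HashMap<NodeId, SocketAddrV4>,
    max_fanout: usize,
) -> FanOut {
    let group = match groups.get(&group_id) {
        Some(g) => g,
        None => return FanOut::Pass("missing_group"),
    };
    if group.epoch != pkt_epoch {
        return FanOut::Pass("epoch_mismatch");
    }
    let mut targets = Vec::new();
    for &node in &group.members {
        if node == self_id {
            continue; // assumption: the coordinator processes its own copy locally
        }
        if targets.len() == max_fanout {
            return FanOut::Pass("fanout_limit");
        }
        match addrs.get(&node) {
            Some(addr) => targets.push(*addr),
            None => return FanOut::Pass("missing_addr"),
        }
    }
    FanOut::Clone(targets)
}
```

Deciding all-or-nothing (either every member gets a clone or none do) keeps partial fan-out from leaving Rust unsure which replicas received the request.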

Exit criteria:

send syscall count drops for broadcast phases
correctness unchanged
packet loss/retransmission handled by user space

Phase 3: XDP Accept/Commit quorum aggregation

Tasks:

1. Add QuorumKey and QuorumState maps.
2. Add ringbuf events.
3. Aggregate AcceptResponse and CommitResponse.
4. Drop duplicate/non-quorum ACKs in optimized mode.
5. Add debug mode that passes all packets while still producing events.
6. Compare results between debug and optimized modes.

Exit criteria:

receive wakeups drop for Accept/Commit replies
quorum events match user-space counted quorum in debug mode
linearizability passes with aggregation enabled

Phase 4: bounded PreAccept candidate aggregation

Tasks:

1. Add normalized bounded deps encoding.
2. Add deps_digest computation shared by Rust and BPF-visible header.
3. Aggregate only matching compact PreAccept responses.
4. Fall back on mismatch, overflow, reject, higher promised ballot, or missing deps.
5. Verify against user-space shadow counter in debug mode.
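Tasks 1 and 2 need an encoding that every replica computes identically. A minimal sketch follows, assuming dependencies are identified by a (timestamp, node) pair and that a non-cryptographic 64-bit FNV-1a hash is acceptable because the digest only gates a candidate event that Rust re-validates. TxnId, normalize_deps, and deps_digest are hypothetical names; INLINE_DEPS_MAX mirrors HOLO_FAST_INLINE_DEPS_MAX.

```rust
/// Mirrors HOLO_FAST_INLINE_DEPS_MAX.
pub const INLINE_DEPS_MAX: usize = 8;

/// Hypothetical compact transaction id: (timestamp, node).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct TxnId {
    pub ts: u64,
    pub node: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum InlineDeps {
    Inline { deps: Vec<TxnId>, digest: u64 },
    Overflow { count: usize },
}

/// Sort and dedup so every replica encodes the same set identically,
/// then either inline it with a digest or report overflow.
pub fn normalize_deps(mut deps: Vec<TxnId>) -> InlineDeps {
    deps.sort_unstable();
    deps.dedup();
    if deps.len() > INLINE_DEPS_MAX {
        return InlineDeps::Overflow { count: deps.len() };
    }
    let digest = deps_digest(&deps);
    InlineDeps::Inline { deps, digest }
}

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// 64-bit FNV-1a over each id's fixed-width big-endian bytes, in sorted order.
pub fn deps_digest(sorted: &[TxnId]) -> u64 {
    sorted.iter().fold(FNV_OFFSET, |h, id| {
        let h = fnv1a(h, &id.ts.to_be_bytes());
        fnv1a(h, &id.node.to_be_bytes())
    })
}
```

Normalizing before hashing is what lets two replicas with the same dependency set, discovered in different orders, produce the same digest for XDP to compare.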

Exit criteria:

zero divergence between BPF candidate and Rust shadow validation
fast-path low-contention workloads show fewer wakeups
high-contention workloads safely fall back

Phase 5: packet steering / AF_XDP optional path

Tasks:

1. Add group_id -> queue/core policy.
2. Evaluate CPUMAP or XSKMAP routing.
3. Optionally add AF_XDP sockets for hot HoloFast packets.
4. Compare against XDP+normal socket mode.

AF_XDP can provide partial or full kernel bypass for selected packets, but it increases complexity and changes the operational model [22].

Phase 6: optional follower-side special cases

Only after Phases 1 through 5 are stable.

Possible future offloads:

duplicate Accept/Commit ACK for already-seen retransmissions
stale epoch/range-generation drop
fast NACK for obviously old ballot if Rust has synced promised ballot state

Do not initially implement PreAccept fast acknowledging. Accord dependency computation belongs in Rust.


12. Correctness strategy

12.1 Invariant: eBPF is advisory

Rust should be able to ignore every eBPF event and still make progress through fallback. This protects correctness if:

BPF program unloads
map is full
ringbuf event is dropped
kernel lacks required helper
packet format changes
membership changes mid-flight

12.2 Shadow validation mode

Add HOLO_FAST_DEBUG_PASS_ALL=true:

XDP produces quorum events
XDP does not drop any HoloFast replies
Rust receives all replies normally
Rust compares its normal quorum result with the BPF event
mismatches panic in test / disable feature in production

Run this mode in CI/integration tests before optimized dropping is allowed.
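The comparison step can be sketched as below, assuming Rust compares once it has drained every reply for a proposal key and that votes are a 64-slot bitmap. A missing event is only counted, since fallback covers it; a BPF event claiming a vote Rust never saw, or claiming quorum too early, is a mismatch. Shadow and Mode are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Test,
    Production,
}

/// Hypothetical shadow checker for HOLO_FAST_DEBUG_PASS_ALL=true.
pub struct Shadow {
    pub mode: Mode,
    pub bpf_enabled: bool,
    pub mismatches: u64,
    pub missed_events: u64,
}

impl Shadow {
    pub fn new(mode: Mode) -> Self {
        Shadow { mode, bpf_enabled: true, mismatches: 0, missed_events: 0 }
    }

    /// Compare, for one proposal key, the Ok votes Rust counted from the
    /// replies it received (`rust_ok`) with the BPF event, if any.
    pub fn compare(&mut self, rust_ok: u64, bpf_event: Option<u64>, quorum: u32) {
        match bpf_event {
            // No event is tolerable: fallback/timeout completes the proposal.
            None => {
                if rust_ok.count_ones() >= quorum {
                    self.missed_events += 1;
                }
            }
            Some(bpf_ok) => {
                // BPF may only claim votes Rust also saw, and only at quorum.
                let sound = (bpf_ok & !rust_ok) == 0 && bpf_ok.count_ones() >= quorum;
                if !sound {
                    self.mismatches += 1;
                    match self.mode {
                        Mode::Test => panic!("false BPF quorum: bpf={bpf_ok:#b} rust={rust_ok:#b}"),
                        Mode::Production => self.bpf_enabled = false,
                    }
                }
            }
        }
    }
}
```

One caveat: in pass-all mode a reply can still be lost after XDP has counted it (for example on a full socket buffer), which this sketch would report as a mismatch; a production version would need to tell that case apart.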

12.3 Deterministic packet parser tests

Test corpus:

valid packets for every phase
short Ethernet/IP/UDP frames
bad header_len
bad payload_len
unknown version
wrong magic
unsupported phase
wrong epoch
wrong range_generation
spoofed from_node
duplicate replies
out-of-order replies
large deps overflow
map overflow
ringbuf overflow

12.4 Linearizability and model tests

Keep using HoloStore's existing Porcupine linearizability check. The README already includes make check-linearizability and a configurable script for workload testing [7].

Add specific tests:

3-node no loss
5-node no loss
7-node no loss
single node failure
leader/coordinator failure mid-phase
membership epoch change while proposals in flight
range split during outstanding packets
packet duplication
packet reordering
packet loss
large command fallback
large dependency fallback
mixed gRPC and HoloFast nodes if compatibility is required

12.5 Formal-ish safety review checklist

For every eBPF drop decision, answer:

Can dropping this packet hide information Rust needs for safety?
Can dropping this packet prevent liveness without timeout fallback?
Can a duplicate packet be mistaken for a unique replica vote?
Can a stale epoch packet be counted for a new epoch?
Can a stale range-generation packet affect a new range?
Can a malicious or buggy node spoof from_node?
Can map eviction create a false quorum?
Can ringbuf loss create a false completion?

False quorum must be impossible. A lost event is acceptable if fallback or timeout handles it.


13. Benchmark plan

13.1 Configurations

Benchmark at least:

A. gRPC current baseline
B. HoloFast UDP, no eBPF
C. HoloFast + TC broadcast
D. HoloFast + TC broadcast + XDP Accept/Commit quorum
E. HoloFast + bounded PreAccept candidate aggregation
F. Optional AF_XDP / packet steering

13.2 Replica counts

3 replicas
5 replicas
7 replicas

Electrode's improvements grew with replica count because broadcast and quorum fan-in costs grow with the number of replicas [4].

13.3 Workloads

low-contention SET
hot-key SET with high dependency conflicts
mixed GET/SET
small values
large values requiring command fetch fallback
high pipeline Redis benchmark
single client latency-sensitive workload
many-client throughput workload
failure/retransmission workload
range split/merge workload if implemented

13.4 Metrics

throughput ops/s
p50/p95/p99/p999 latency
kernel CPU
user CPU
softirq CPU
send syscalls/s
recv syscalls/s
context switches/s
packets passed/dropped/redirected by XDP
TC clone count
ringbuf events/s
ringbuf drops
quorum map occupancy
fallback ratio by reason
fast-path success ratio
WAL latency distribution

13.5 Expected outcomes

Do not assume Electrode's exact gains will transfer. HoloStore uses Accord, not Multi-Paxos, and its hot path includes dependency computation and WAL writes. The working hypothesis is:

TC broadcast should improve throughput as replica count increases.
XDP Accept/Commit aggregation should reduce coordinator wakeups.
PreAccept candidate aggregation should help low-contention, small-deps workloads.
Transport wins will be smaller when WAL fsync, storage, or dependency conflicts dominate.

14. Tradeoffs

14.1 Keep gRPC everywhere vs split hot/cold transports

| Option | Pros | Cons |
| --- | --- | --- |
| Keep gRPC everywhere | Simple, portable, mature, easier debugging | Poor fit for packet-level eBPF, HTTP/2/protobuf overhead remains, hard to clone/aggregate phase packets |
| Split hot/cold transport | eBPF can parse fixed headers, easier broadcast/quorum offload, keeps gRPC for complex paths | More code, compatibility matrix, new packet format, new operational/debug burden |

Recommendation: split transports. Keep gRPC for control/recovery and use HoloFast only for hot consensus phases.

14.2 UDP datagrams vs TCP

| Option | Pros | Cons |
| --- | --- | --- |
| UDP/fixed datagrams | eBPF-friendly, single-packet parse, easy XDP/TC logic, matches Electrode model | Must implement retransmission, duplicate handling, path MTU discipline, loss/reordering behavior |
| TCP/custom framing | Reliable stream, familiar operational behavior | Harder for XDP to parse message boundaries, packet fragmentation/coalescing complicates eBPF, less Electrode-like |

Recommendation: use UDP-like datagrams for the hot lane with application-level retransmission. Keep recovery and large data on gRPC.

14.3 eBPF quorum aggregation vs Rust-only quorum aggregation

| Option | Pros | Cons |
| --- | --- | --- |
| Rust-only | Simpler, easier correctness reasoning | Every reply wakes user space; cost grows with replicas |
| eBPF aggregation | Fewer wakeups, fewer kernel/user crossings, better fit for high replica counts | BPF map complexity, verifier constraints, fallback paths required |

Recommendation: start with Accept/Commit aggregation, then add conservative PreAccept candidate aggregation.

14.4 Drop non-quorum replies vs pass all replies

| Option | Pros | Cons |
| --- | --- | --- |
| Pass all replies | Easiest correctness debugging | Smaller performance gain |
| Drop duplicate/non-quorum replies | Maximum wakeup reduction | Must prove Rust does not need the dropped packets |

Recommendation: implement debug shadow mode first. In production, drop only for phases where header data is sufficient, such as duplicate ACKs and compact Accept/Commit responses. For PreAccept, drop only once the event contains all required bounded data; otherwise fall back.

14.5 Dependency digest vs full dependency data in eBPF

| Option | Pros | Cons |
| --- | --- | --- |
| Digest only | Very easy for eBPF to compare | Rust may not have actual dependencies if packets were dropped |
| Bounded inline deps | Rust can validate compact fast-path event | Needs fixed max, fallback when deps are large |
| Full variable deps | More complete | Bad fit for eBPF verifier and bounded parsing |

Recommendation: use digest + bounded inline normalized deps. Fall back when deps exceed the bound.

14.6 Aya vs libbpf-rs/CO-RE

| Option | Pros | Cons |
| --- | --- | --- |
| Aya | Rust-native, nice integration with HoloStore, shared Rust structs easier | Production CO-RE/kernel compatibility may require careful testing |
| libbpf-rs + C eBPF | Mature CO-RE workflow, closer to many production BPF examples | Mixed Rust/C build, more FFI/build complexity |

Recommendation: prototype with Aya if Rust-native elegance matters most; use libbpf-rs/CO-RE if deployment stability across kernels becomes the priority.

14.7 Normal XDP/TC vs AF_XDP vs DPDK

| Option | Pros | Cons |
| --- | --- | --- |
| XDP/TC + normal sockets | Kernel-native, no busy polling, simpler ops, closest to Electrode | Not as fast as kernel bypass |
| AF_XDP | More performance potential, partial kernel bypass | More complex socket/queue management |
| DPDK | Highest ceiling | Busy polling, dedicated cores, custom network stack burden |

Electrode compares its kernel-native approach with kernel bypass and notes the tradeoff: eBPF retains kernel-networking benefits but does not beat pure kernel bypass in raw performance [6].

Recommendation: implement XDP/TC first. Consider AF_XDP only after measurements show normal sockets remain the bottleneck.

14.8 eBPF fast acknowledging for Accord

| Option | Pros | Cons |
| --- | --- | --- |
| Copy Electrode fast ACK directly | Potential latency win | Unsafe for Accord PreAccept because dependencies must be computed by the replica |
| Do not fast ACK | Correctness is simpler | Leaves some latency win unrealized |
| Fast ACK only special cases | Some win for duplicates/stale messages | Requires careful state sync from Rust to BPF |

Recommendation: do not implement general fast ACK at first. Add only duplicate/stale special cases after the rest is validated.


15. Operational considerations

15.1 Permissions and deployment

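Loading and attaching eBPF programs requires elevated privileges: on Linux 5.8 and later, typically CAP_BPF plus CAP_NET_ADMIN for XDP and TC programs; on older kernels, CAP_SYS_ADMIN. HoloStore should therefore support: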

--transport grpc
--transport holofast
--transport holofast-bpf

If eBPF load fails:

log exact reason
continue with HoloFastNoBpf or gRPC depending on config
export metric holo_fast_bpf_load_failed=1

15.2 Versioning

Use strict wire versioning:

magic = HOLF
version = 1
feature bitmap
min/max supported version in node handshake

Unknown versions must pass to user space or fall back to gRPC.
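The handshake negotiation can be as small as an interval intersection. A sketch follows, assuming each node advertises an inclusive min/max version and a 32-bit feature bitmap; VersionRange, Negotiated, and negotiate are hypothetical names.

```rust
/// Inclusive min/max wire versions a node advertises in its handshake.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VersionRange {
    pub min: u8,
    pub max: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Negotiated {
    HoloFast { version: u8, features: u32 },
    Grpc,
}

/// Pick the highest version both sides support and the intersection of their
/// feature bitmaps; with no overlap (or a malformed range), use gRPC.
pub fn negotiate(local: VersionRange, local_features: u32, peer: VersionRange, peer_features: u32) -> Negotiated {
    let lo = local.min.max(peer.min);
    let hi = local.max.min(peer.max);
    if local.min > local.max || peer.min > peer.max || lo > hi {
        return Negotiated::Grpc;
    }
    Negotiated::HoloFast { version: hi, features: local_features & peer_features }
}
```

Intersecting feature bitmaps means a node only enables an offload (for example PreAccept candidates) toward peers that also advertise it.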

15.3 Observability

Add metrics:

holo_fast_packets_rx_total{phase,kind}
holo_fast_packets_tx_total{phase,kind}
holo_fast_bpf_broadcast_clones_total
holo_fast_bpf_quorum_events_total{phase}
holo_fast_bpf_dropped_replies_total{phase,reason}
holo_fast_bpf_fallback_total{reason}
holo_fast_bpf_map_overflow_total{map}
holo_fast_preaccept_candidate_total
holo_fast_preaccept_candidate_rejected_total{reason}
holo_fast_transport_mode

15.4 Rollback

Feature flags should allow rollback without rebuild:

Disable PreAccept candidate aggregation.
Disable XDP quorum aggregation.
Disable TC broadcast.
Disable all eBPF but keep HoloFast UDP.
Disable HoloFast and use gRPC.

16. Risks and mitigations

| Risk | Mitigation |
| --- | --- |
| False quorum event | Rust validation; shadow mode; bitset per replica; membership epoch in key |
| Dropping data Rust needs | Drop only when event contains sufficient data; PreAccept conservative fallback |
| Membership/range race | Include membership epoch and range generation in every packet and map key |
| Map overflow | Pass to user space; bounded inflight; metrics and alerts |
| Ringbuf overflow | Pass packet; event loss never completes proposal by itself |
| Packet spoofing | Validate node address/member mapping; optionally authenticate packets later |
| Kernel incompatibility | Version-gate helpers; fallback to no-BPF/gRPC |
| Debuggability | Debug pass-all mode; packet trace logs; per-reason counters |
| Performance regression | Stage-by-stage benchmarks; feature flags |
| Complexity creep | Keep recovery/WAL/deps computation out of eBPF |

17. Concrete first pull requests

PR 1: holo_fast::wire

Deliverables:

wire structs
encode/decode
fuzz tests
packet parser tests
feature flags
no runtime behavior change

PR 2: HoloFast UDP transport without eBPF

Deliverables:

FastTransport implementing existing Transport trait
UDP send/recv
timeouts/retransmission
gRPC fallback
benchmark mode
linearizability pass

PR 3: group-aware broadcast API

Deliverables:

BroadcastTransport trait
Accord coordinator uses group broadcast when available
fallback adapter sends per peer over old Transport
no eBPF yet

PR 4: TC egress broadcast

Deliverables:

BPF loader
membership maps
TC program
clone counters
debug packet capture test

PR 5: XDP Accept/Commit quorum aggregation

Deliverables:

XDP dispatcher
quorum maps
ringbuf event loop
shadow validation mode
optimized drop mode behind flag

PR 6: bounded PreAccept candidate aggregation

Deliverables:

deps digest
bounded normalized deps encoding
candidate quorum event
fallback reasons
shadow validation

18. Definition of success

The project is successful if:

Correctness:
  all existing linearizability tests pass
  failure/retransmission tests pass
  shadow mode shows zero BPF/Rust quorum divergence

Performance:
  send syscall count drops for broadcast phases
  recv wakeups drop for Accept/Commit quorum phases
  p99 latency improves on low-contention small-message workloads
  throughput improves as replica count grows

Operability:
  feature flags can disable each optimization
  fallback is automatic and visible in metrics
  gRPC control/recovery lane remains available

The project is not successful if the only measurable bottleneck is WAL/fsync/storage/dependency conflict rate. In that case, the custom transport may still be useful, but eBPF should not be prioritized until the actual bottleneck moves back to networking.


19. References

Footnotes

  1. Yang Zhou, Zezhou Wang, Sowmya Dharanipragada, and Minlan Yu, "Electrode: Accelerating Distributed Protocols with eBPF," NSDI 2023. The abstract states that Electrode executes optimizations in the kernel before the networking stack and reports up to 128.4% throughput improvement and 41.7% latency improvement for classic Multi-Paxos. Paper: https://www.usenix.org/system/files/nsdi23-zhou.pdf

  2. Electrode paper, Section 1 and Table 1, discussing overhead from user/kernel crossings and kernel networking stack traversal in Paxos deployments. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  3. Electrode paper, Figure 1 and Section 4, describing offloads for message broadcasting, fast acknowledging, and wait-on-quorums. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  4. Electrode paper, Section 7.1, reporting larger throughput improvements as replica count increases. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  5. Electrode paper, Section 7.2 and Figure 6, breaking down message broadcasting, fast acknowledging, and wait-on-quorums contributions. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  6. Electrode paper, Section 7.5, comparing Electrode with kernel bypass and discussing the performance/operational tradeoff. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  7. HoloStore README, describing HoloStore as a strongly consistent key/value store built on Accord with pre-accept, accept, commit, dependency tracking, fast path, per-partition Accord groups, WAL, range generation IDs, Redis-compatible benchmarking, and linearizability checks. https://github.com/kellabyte/holostore

  8. HoloStore transport.rs, whose top-level comments describe the current gRPC transport layer for Accord quorum rounds/read paths, per-peer batching workers, bounded in-flight RPC concurrency, and telemetry. https://raw.githubusercontent.com/kellabyte/holostore/main/crates/holo_store/src/transport.rs

  9. HoloStore holo.proto, which defines the gRPC service and packed unary v2 batch methods for hot consensus/read paths. https://github.com/kellabyte/holostore/blob/main/crates/holo_store/proto/holo.proto

  10. HoloStore holo.proto, definitions for PreAcceptRequest, PreAcceptResponse, AcceptRequest, AcceptResponse, CommitRequest, CommitResponse, and RecoverResponse. https://github.com/kellabyte/holostore/blob/main/crates/holo_store/proto/holo.proto

  11. Electrode paper, Sections 3, 4, and 8, discussing eBPF verifier constraints, bounded/static memory behavior, and the division between simple fast-path operations in kernel and complex protocol behavior in user space. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  12. Linux kernel documentation, eBPF verifier, describing safety checks for eBPF programs. https://docs.kernel.org/bpf/verifier.html

  13. Electrode paper, Section 4.1, message broadcasting in TC using bpf_clone_redirect() to clone and rewrite packets in kernel. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  14. Electrode paper, Section 4.3, wait-on-quorums using eBPF-maintained bitsets and forwarding quorum-reaching packets/events. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  15. HoloStore design notes, describing the separation of consensus, WAL/durability, execution, batching, and range generations. https://raw.githubusercontent.com/kellabyte/holostore/main/crates/holo_store/docs/DESIGN.md

  16. Electrode paper, Section 1, noting the prototype targets UDP protocols, uses application-level retransmission, and fits small Paxos messages/datacenter conditions. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  17. docs.ebpf.io, BPF_PROG_TYPE_XDP, describing XDP programs attached to network devices and packet actions such as pass, drop, redirect, and manipulation. https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_XDP/

  18. Accord CEP-15 draft whitepaper, Sections 3.1 and 3.2, describing PreAccept, Accept, Commit, Execute, Apply, fast path, timestamps, and dependency responses. https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf

  19. Linux kernel documentation, BPF_MAP_TYPE_XSKMAP, describing XDP redirection of raw frames to AF_XDP sockets. https://docs.kernel.org/bpf/map_xskmap.html

  20. Linux kernel documentation, BPF ring buffer, describing the ring buffer design/API for BPF-to-user-space event communication. https://www.kernel.org/doc/html/next/bpf/ringbuf.html

  21. Electrode paper, Sections 4.2 and 4.3, describing fixed-size in-kernel structures and fallback/forwarding to user space when the eBPF path cannot handle a case. https://www.usenix.org/system/files/nsdi23-zhou.pdf

  22. docs.ebpf.io, AF_XDP, describing AF_XDP sockets and partial/full kernel bypass with XDP. https://docs.ebpf.io/linux/concepts/af_xdp/
