Skip to content

Instantly share code, notes, and snippets.

@afeinberg
Last active January 19, 2024 18:08
Show Gist options
  • Save afeinberg/d346d20714014f997346d4381aa94b9d to your computer and use it in GitHub Desktop.
Save afeinberg/d346d20714014f997346d4381aa94b9d to your computer and use it in GitHub Desktop.

Anti-Entropy and Dissemination

Purpose

Quick and reliable propagation.

More applicable to cluster wide metadata: small, infrequent, must be propagated reliably.

Types:

  1. Broadcast: one to rest.
  2. Periodically, exchange messages peer-to-peer.
  3. Co-operative broadcast: node receives messages, spreads messages quickly.

Techniques

Read-repair, hinted handoff, Merkle trees: see Dynamo paper.

Hinted-handoff vs. sloppy quorums: Voldemort and Cassandra do not implement sloppy quorums (where hints can be read immediately), use HH to catch up nodes that went off-line. AF: Note that these are used for data as opposed to metadata.

Digest reads: request a digest, sync if digest mismatches.

Bitmap Version Vectors: using bitmaps to record encode casual relations, allow identifying missing points.

Notes

Amazon Aurora usage

From Amazon Aurora Ascendant:

The basic idea of a write quorum implies that some segments may not initially receive all of the writes, all of the time. How do those segments deal with gaps in the redo log stream? Aurora storage nodes continuously "gossip" among themselves to fill holes (and perform repairs). Log stream advancement is tightly orchestrated through Log Sequence Number (LSN) management. We use a set of LSN markers to maintain the state of each individual segment.

Bitmap version vectors/version vectors

Dynamo-type systems use version vectors (which they slightly misleadingly call vector clocks) by only having N nodes in a clock (where N is the quorum size) and garbage collecting old entries.

Implementations:

Voldemort Standalone OCaml Riak

Question: does using bitmaps address scale issues (e.g., number of nodes, number of versions)?

Gossip

Epidemic

Similar to spread of rumors or epidemics. Select f (fan-out) peers at random, exchange information. Distribute a message in $\log_{f}{N}$ message rounds.

Implementation:

Voldemort

Non-epidemic (tree-based)

Overlay networks: exact certainty, fixed topologies (spanning trees.)

Hybrid

Plumtree: combine epidemic and tree-based approach, only sends full message to a subset of peers, lazily forwards message id to others.

Used by Riak core

Partial Views

Deals with the problem of churn (number of nodes leaving and joining.) Active: participate in dissemination, passive: used to replace the active ones if they fail.

HyParView: Hybrid Partial Views.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment