Skip to content

Instantly share code, notes, and snippets.

@lenary
Last active April 13, 2018 14:59
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save lenary/6008876 to your computer and use it in GitHub Desktop.
Save lenary/6008876 to your computer and use it in GitHub Desktop.
Some kind of introduction to CRDTs

Sooo, CRDTs.

They broadly fall into two groups, Commutative and Convergent.

  • Commutative Replicated Data Types (aka CmRDTs): These are also referred to as op-based CRDTs. Essentially, updates cause commutative operations to be broadcast. However, this broadcast channel is required to be reliable and have an order (see §2.2.2 of [1] below).
  • Convergent Replicated Data Types (aka CvRDTs): These are also referred to as state-based CRDTs. Essentially they consist of a data structure with a commutative merge operation which will deterministically take two of these structures and merge them into one, preserving updates. These don't require a reliable broadcast system, instead can rely on the liveness properties of an Eventually Consistent (EC) system normally.

Why is the reliable broadcast thing an issue? To have it you require strong consistency. CRDTs are supposed to be designed for Highly Available, EC systems, which by definition (and CAP theorem) can't have strong consistency[+].

In Riak, we've been doing work on CRDTs for riak. Ours is a slightly bastardised hybrid format of Cm- and CvRDTs. Internally, we use CvRDTs everywhere, in storage and in any Riak nodes when coordinating requests. The reason we can't use CvRDTs is that riak clients are not extra replicas, so we have an op-based system for performing updates, where a coorinator recieves an update-op from a client, and then the coordinator does a read-op-merge-write procedure. This means that clients don't need to read a crdt, then perform an update and broadcast the structure back to do an update.

We have PN-counters working very successfully with this hybrid system, and it turns out to be relatively simple to implement.

I haven't yet had the chance to look at Treedoc, or http://bosker.wordpress.com/2012/05/10/on-editing-text/ , but there's been lots of work done on it (we're going back and looking at more generic data structures like Maps so that our CRDTs are usable by more people).

there have been two large EU projects on CRDTs, ConcoRDanT which worked on TreeDoc http://concordant.lip6.fr/ and Another that we're involved with is just starting.

Some Reading:


[+] Hahaha, distributed systems means I can't talk in absolutes. Essentially we can't always have strong consistency and accept writes because we have to cope with network partitions. This doesn't stop us achieving strong consistency for short periods of time, say if we need to do some kind of GC on our CRDTs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment