Create a gist now

Instantly share code, notes, and snippets.

@dublx / forked from dominictarr/
Last active Aug 29, 2015

What would you like to do?

(dominic: this list of papers was originally recommended to me by Brain Noguchi @bnoguchi, and was a great start to understanding distributed systems)

Here's a selection of papers that I think you would find helpful and interesting:

Time, Clocks, and the Ordering of Events in a Distributed System

The seminal paper about event ordering and concurrency. The important result is that events in a distributed system define a partially ordered set. The connection to what we're working on is fundamental, as this defines how to detect concurrent updates. Moreover, the chosen algorithm to turn the partially ordered set into a totally ordered set defines the conflict resolution algorithm.

Dynamo: Amazon’s Highly Available Key-value Store

Perhaps you've already read this one. It describe's Amazon's Dynamo data store, which has had a considerable influence on the designs of systems such as Riak, Cassandra, etc. It demonstrates how one can use Lamport Clocks in practice with a bound on the size of the clock. Moreover, it covers the scenario of deciding to do reconciliation (their term for conflict resolution) on reads, as opposed to on writes. Finally, it touches on the important consideration of replication/availability/eventual consistency and how their reconciliation approach ties in with that.

A comprehensive study of Convergent and Commutative Replicated Data Types

This is a pretty good (though lengthy) overview of CRDTs. If you design your app to work with CRDTs, then all updates are commutative; therefore, event ordering does not matter because distributed state is guaranteed to be eventually consistent. The challenge with making CRDTs is designing your data structures in such a way that give you good space/time characteristics.

A Survey on Operational Transformation Algorithms: Challenges, Issues and Achievements

This is a succinct summary that cuts across decades to nicely wrap up the historical evolution and important results in the OT space. Due to its succinctness, some things are not as well explained as they could be.

(dominic: to be honest, OT strikes me as overly academic, it's more complex and usually depends on a stateful server. crdts are much more flexible)

Achieving Data Consistency by Contextualization in Web-Based Collaborative Applications

This paper grounds the perspective of OT using a concept called contextualization. I think this perspective is the most mathematically rigorous as well as intuitive approaches that I have seen. Overall, this was the most read-able OT paper for me as I didn't have to go through it more than twice.

(dominic: I also found this idea very compelling)

Consistency Analysis in Bloom: a CALM and Collected Approach

On how any system based on logically monotonic operations is provably eventually consistent, otherwise, if a non-monotonic operation takes place after an asynchronous one, then coordination is necessary.

(dominic: and while we are at it, you should certainly read the bitcoin paper...)

Bitcoin: a peer to peer electronic cash system

Describes the design of a Decentralized Timestamp Server, that solves the Byzantine Generals problem, and thus the problem of double spending, without relying on a centralized server.

Efficient Reconciliation and Flow Control for Anti-Entropy Protocols

describes the tracking layer from amazon dynamo db. The main thing is a very simple but effective replication handshake, which I used in and later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment