Skip to content

Instantly share code, notes, and snippets.

@miguel-vila
Last active August 29, 2015 13:55
Show Gist options
  • Save miguel-vila/02ffa69001c631a4dbad to your computer and use it in GitHub Desktop.
Save miguel-vila/02ffa69001c631a4dbad to your computer and use it in GitHub Desktop.

"Eventually Consistent" by Werner Vogels

  • (...) Many of these techniques try to achieve distribution transparency —that is, to the user of the system it appears as if there is only one system instead of a number of collaborating systems. Many systems during this time took the approach that it was better to fail the complete system than to break this transparency.
  • An important observation is that in larger distributed scale systems, network partitions are a given; therefore, consistency and availability cannot be achieved at the same time. This means there are two choices on what to drop: relaxing consistency will allow the system to remain highly available under the partitionable conditions; making consistency a priority means that under certain conditions the system will not be available.
  • Types of consistency:
    • Strong consistency : After the update completes, any subsequent access will return the updated value.
    • Weak consistency : The system does not guarantee that subsequent accesses will return the updated value. A number of conditions need to be met before the value will be returned.
    • Eventual consistency : This is a specific form of weak consistency; the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.
  • The eventual consistency model has a number of variations that are important to consider:
    • Causal consistency. If process A has communicated to process B that it has updated a data item, a subsequent access by process B will return the updated value, and a write is guaranteed to supersede the earlier write. Access by process C that has no causal relationship to process A is subject to the normal eventual consistency rules.
    • Read-your-writes consistency. This is an important model where process A, after having updated a data item, always accesses the updated value and never sees an older value. This is a special case of the causal consistency model.
    • Session consistency. This is a practical version of the previous model, where a process accesses the storage system in the context of a session. As long as the session exists, the system guarantees read-your-writes consistency. If the session terminates because of a certain failure scenario, a new session must be created and the guarantees do not overlap the sessions.
    • Monotonic read consistency. If a process has seen a particular value for the object, any subsequent accesses will never return any previous values.
    • Monotonic write consistency. In this case, the system guarantees to serialize the writes by the same process. Systems that do not guarantee this level of consistency are notoriously difficult to program.
  • From a practical point of view these two properties (monotonic reads and read-your-writes) are most desirable in an eventual consistency system, but not always required. These two properties make it simpler for developers to build applications, while allowing the storage system to relax consistency and provide high availability.
  • N = The number of nodes that store replicas of the data.
  • W = The number of replicas that need to acknowledge the receipt of the update before the update completes.
  • R = The number of replicas that are contacted when a data object is accessed through a read operation.
  • If W+R > N, then the write set and the read set always overlap and one can guarantee strong consistency. In the primary-backup RDBMS scenario, which implements synchronous replication, N=2, W=2, and R=1.
  • In distributed storage systems that provide high performance and high availability the number of replicas is in general higher than two. Systems that focus solely on fault tolerance often use N=3 (with W=2 and R=2 configurations).
  • Systems that must serve very high read loads often replicate their data beyond what is required for fault tolerance; N can be tens or even hundreds of nodes, with R configured to 1 such that a single read will return a result.
  • In R=1 and N=W we optimize for the read case, and in W=1 and R=N we optimize for a very fast write.
  • Weak/eventual consistency arises when W+R <= N, meaning that there is a possibility that the read and write set will not overlap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment