In a BigCouch cluster, there are N replicas, each of which is either alive or dead. The case of all replicas being alive is defined as the absence of network partitions, i.e., no message is lost. A network partition is modeled as the total failure of one or more replicas: given a set of network partitions P, for each partition Pi all replicas located in a partition other than Pi are defined as dead.
When a read or write request arrives, it first hits one of the replicas, which is referred to as the leader. The leader starts a voting round, i.e., it forwards the request to all replicas. When the leader receives a response from a replica, that replica is said to have voted. A response may contain a document version, an error code indicating e.g. that the document was not found, or the revision number of a newly created document version.
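The voting round can be sketched roughly as follows. This is a simplified illustration, not BigCouch's actual implementation; the `Replica` class and `voting_round` helper are hypothetical names, and replicas here respond synchronously.

```python
# Minimal sketch of a voting round: the leader forwards the request to
# all replicas and tallies their votes. Names are illustrative only.
from collections import Counter

class Replica:
    """Toy replica that always answers with its stored revision."""
    def __init__(self, rev):
        self.rev = rev

    def respond(self, request):
        return self.rev  # could also be an error code in a fuller model

def voting_round(replicas, quorum_size, request):
    """Collect votes until quorum_size replicas agree, or all have voted."""
    votes = Counter()
    for replica in replicas:
        vote = replica.respond(request)  # a version, revision, or error
        votes[vote] += 1
        if votes[vote] >= quorum_size:
            return vote  # a quorum has formed on this vote
    return None  # all replicas voted, but no quorum formed

replicas = [Replica("2-b"), Replica("2-b"), Replica("1-a")]
print(voting_round(replicas, quorum_size=2, request="GET doc"))  # "2-b"
```

With `quorum_size=3` the same replicas never agree unanimously, so the round ends with all votes cast and no quorum.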
If at least R (for a read) or W (for a write) replicas agree, i.e., they gave the same vote, they are said to have formed a quorum.
In general, strict consistency is guaranteed if an absolute majority of replicas (i.e., N / 2 + 1) agree; otherwise only eventual consistency is guaranteed.
A write operation succeeds if there's a quorum. If the write creates a new document version and W >= N / 2 + 1, strict consistency can be guaranteed for a subsequent read operation on that version (see below). Otherwise, only eventual consistency can be guaranteed.
A read operation succeeds if there's a quorum. If R >= N / 2 + 1, strict consistency is guaranteed; otherwise only eventual consistency.
In the case that there's no uniform consensus, and either all replicas have voted or there's an absolute majority, a "winning" document version is determined. The winning version is the one an absolute majority of replicas have voted for or, if there's no absolute majority, the one with the highest revision number (by ASCII order). Note that the majority may also vote that no version of the document is available; in that case, the algorithm aborts. Otherwise, the winning version is written to those replicas that don't already have it.
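The winning-version rule can be sketched like this. The function and the `NOT_FOUND` sentinel are illustrative assumptions, not BigCouch code; revisions are modeled as plain strings so that "highest by ASCII order" is simply `max()`.

```python
# Sketch of the winning-version rule: absolute majority first, then
# highest revision number by ASCII order. Names are illustrative only.
from collections import Counter

NOT_FOUND = object()  # a "no version available" vote

def winning_version(votes, n):
    """votes: one entry per replica that voted, either a revision string
    or NOT_FOUND. Returns the winning revision, or None on abort."""
    tally = Counter(votes)
    rev, count = tally.most_common(1)[0]
    if count >= n // 2 + 1:
        # An absolute majority agrees; if it agrees the document is
        # unavailable, the algorithm aborts (modeled here as None).
        return None if rev is NOT_FOUND else rev
    # No absolute majority: the highest revision (ASCII order) wins.
    revisions = [v for v in votes if v is not NOT_FOUND]
    return max(revisions) if revisions else None

print(winning_version(["2-b", "2-b", "1-a"], n=3))  # "2-b" (majority)
print(winning_version(["2-a", "2-b", "1-c"], n=3))  # "2-b" (highest by ASCII)
```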
To summarize the algorithm:
- Start a voting round.
- Try to form a quorum.
- Wait until there's a quorum, until all replicas have voted, or until there's a timeout.
- If there's a timeout, return a timeout error; if there's a quorum, return the version the quorum agrees upon; if all replicas have voted but haven't formed a quorum, determine the winning version and return it to the client.
- If necessary (i.e., there's no uniform consensus), and if possible (i.e., either an absolute majority votes for a version that actually exists, or all replicas have voted), write the winning version to those replicas that don't have it.
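The steps above can be sketched end to end for the read path, under simplifying assumptions: synchronous replicas, no timeouts, and revisions as plain strings. All names are illustrative, not BigCouch's actual API.

```python
# End-to-end sketch of the summarized read algorithm, including read
# repair. Simplified: synchronous replicas, no timeout handling.
from collections import Counter

class Replica:
    """Toy replica holding a single revision string."""
    def __init__(self, rev):
        self.rev = rev

def handle_read(replicas, r):
    n = len(replicas)
    votes = []
    tally = Counter()
    # Voting round: collect votes until a quorum forms or all have voted.
    for replica in replicas:
        vote = replica.rev
        votes.append(vote)
        tally[vote] += 1
        if tally[vote] >= r:
            return vote  # quorum: return the agreed-upon version
    # All replicas voted without forming a quorum: determine the winner.
    rev, count = tally.most_common(1)[0]
    winner = rev if count >= n // 2 + 1 else max(votes)  # ASCII order
    # Read repair: write the winner to replicas that don't have it.
    for replica in replicas:
        if replica.rev != winner:
            replica.rev = winner
    return winner

replicas = [Replica("2-a"), Replica("2-b"), Replica("1-c")]
print(handle_read(replicas, r=2))  # "2-b": no quorum, highest revision wins
```

After the call, all three replicas hold `"2-b"`, because the read repair wrote the winning version to the two replicas that lacked it.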
Thanks for this precise writeup, Klaus. I don't think BigCouch should delete the conflicting version during the read repair. It should take advantage of CouchDB's ability to store multiple edit branches of a document and allow the user to resolve the conflict. Eventually CouchDB may gain the ability to automatically apply certain conflict resolution strategies, but the feature is not BigCouch-specific; it would be useful for any multi-master CouchDB deployment.
This writeup defines a quorum as a majority of replicas, but in fact BigCouch has some additional flexibility. BigCouch allows a quorum to be defined on a per-request level, and it can take any value from 1 to N. Applications may use this feature to opt out of consistency in exchange for lower latencies. It's just semantics, but I think it's important not to presume that quorum == majority.
By the way, a small nit. You wrote that "A write operation succeeds, if at least R replicas commit" -- you meant W instead of R, right?

Finally, about strict consistency. I don't think strict consistency can be guaranteed in the presence of variable read quorum settings and read repair. Consider the case of N=3, W=2, R=3. Concurrent updates to the same document occur; one update achieves the quorum while the other update only succeeds on 1 replica. A subsequent read request with R=3 will initiate read repair. The read repair will have a 50% chance of selecting the version which was saved on only 1 replica as the winning revision, thus breaking strict consistency. It's an important point because one feature which is not currently in BigCouch but which I do have planned is an automatic anti-entropy mechanism based on CouchDB replication. If the shard copies replicate amongst themselves, an update which was only saved on one copy will ultimately make it to all copies (though it may not be the winning revision).
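The 50% figure in this scenario can be illustrated with a small sketch, assuming CouchDB-style revision IDs of the form `<generation>-<hash>` and a deterministic tie-break: higher generation wins, and equal generations fall back to comparing the hash suffix, which is content-derived and effectively random between the two concurrent updates. The helper name is hypothetical.

```python
# Sketch of the N=3, W=2, R=3 scenario: both concurrent updates produce
# a generation-2 revision, so the (effectively random) hash suffix
# decides the winner. The comparison rule here is a simplified model.
def couch_winner(rev_a, rev_b):
    """Higher generation wins; equal generations fall back to ASCII order."""
    gen = lambda rev: int(rev.split("-", 1)[0])
    if gen(rev_a) != gen(rev_b):
        return rev_a if gen(rev_a) > gen(rev_b) else rev_b
    return max(rev_a, rev_b)

quorum_rev   = "2-aaa"  # the update that achieved W=2
minority_rev = "2-zzz"  # the update saved on only 1 replica
print(couch_winner(quorum_rev, minority_rev))  # "2-zzz": the minority wins here
```

Since the hash suffixes are independent of how many replicas hold each revision, the minority revision wins the tie-break about half the time.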