qrli/cap-does-not-mean-what-cap-theorem-means.md

## cap-does-not-mean-what-cap-theorem-means.md

      
    Raw
  

              cap-does-not-mean-what-cap-theorem-means.md
            
          
    In Reality, CAP Does Not Mean What CAP Theorem Means

Experts have been trying to explain that the CAP theorem does not mean what people thought and it is not the right tool to categorize databases. But in the end, the true meaning does not matter. It is all about what majority people think it is. So, I tried to summarize what people really mean when they say CAP in reality. It is wrong, however more useful.
CP

CP usually means consistency as in traditional relational databases. The strict form means reads/writes are serialized, which is still less than true CAP consistency. While in many use cases, the loose form of read-your-own-writes is good enough, and that’s what many NoSQL databases mean when they claim to be CP. An example is MongoDB, even though the default configuration does not guarantee that in some failure scenarios. You have to opt-in stronger guarantee at some performance cost.
AP

AP does not refer to normal high availability, though many people think that way. If it was about normal high availability, CP databases also provide that, and then most databases provide all of CAP. Instead, it is more about low latency scenarios where temporary inconsistency is a better choice than having to wait, which is an extreme form of high availability. When a database claims to be AP, it typically means the database will always serve reads/writes in time, as long as some replica can still be reached by a client. It covers all kinds of failures, other than network partitioning only. A examples is Cassandra.
P

P may mean two different things depending on context:

One is simply talking about the ability to scale out, which also overlaps with availability somehow.
The other is more about partitioning across wide-area networks, rather than cluster nodes in the same local network. A typical example is replicas across different data centers (or replicas across your server and client computer). In this case, there is high latency, so the sync of state is almost always asynchronous in practice, which means even CP databases do not guarantee strong consistency.

In reality, P cannot be sacrificed. Without P, high availability can hardly be provided as a single node may crash, so there is no CA database. Traditional single node relational databases, are only C in this sense. And when using a fail-over setup, it could be treated as the CP category, even though the fail-over is really solving the high availability concern. 🤷‍
Little Difference in the End

However, no matter what databases claimed to be, there is a trend that databases stop saying to they are CP or AP. Instead, they can be either CP or AP depending on how they are configured.