@qrli
Created January 29, 2019 13:57
In Reality, CAP Does Not Mean What CAP Theorem Means

Experts have long tried to explain that the CAP theorem does not mean what people think it means, and that it is not the right tool for categorizing databases. But in the end, the true meaning does not matter; what matters is what the majority of people think it means. So I have tried to summarize what people really mean when they say CAP in practice. That usage is wrong, but more useful.

CP

CP usually means consistency as in traditional relational databases. The strict form means reads and writes are serialized, which is still weaker than true CAP consistency. In many use cases, however, the looser form of read-your-own-writes is good enough, and that is what many NoSQL databases mean when they claim to be CP. An example is MongoDB, even though its default configuration does not guarantee that in some failure scenarios. You have to opt in to stronger guarantees at some performance cost.
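
To make read-your-own-writes concrete, here is a minimal sketch in Python. It is a toy model, not any database's actual API: `Node`, `Session`, and `replicate` are hypothetical names, and replication is triggered by hand to stand in for an asynchronous background process.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A replica holding a versioned copy of the data (hypothetical toy model)."""
    version: int = 0
    data: dict = field(default_factory=dict)


class Session:
    """Client session that remembers the version of its own latest write."""

    def __init__(self, primary: Node, secondary: Node):
        self.primary = primary
        self.secondary = secondary
        self.last_written = 0

    def write(self, key, value):
        # All writes go to the primary, which bumps its version.
        self.primary.version += 1
        self.primary.data[key] = value
        self.last_written = self.primary.version

    def read(self, key):
        # Use the secondary only if it has caught up with this session's
        # own writes; otherwise fall back to the primary.
        if self.secondary.version >= self.last_written:
            return self.secondary.data.get(key)
        return self.primary.data.get(key)


def replicate(primary: Node, secondary: Node):
    """Stand-in for asynchronous replication, triggered by hand here."""
    secondary.data = dict(primary.data)
    secondary.version = primary.version


primary, secondary = Node(), Node()
alice = Session(primary, secondary)
bob = Session(primary, secondary)

alice.write("x", 1)
print(alice.read("x"))  # alice is routed to the primary and sees her write
print(bob.read("x"))    # bob reads the lagging secondary and sees nothing yet
replicate(primary, secondary)
print(bob.read("x"))    # after replication bob sees the value too
```

The guarantee is per session: only the writer is protected from stale reads, which is why this is looser than serialized reads and writes.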

AP

AP does not refer to ordinary high availability, though many people think of it that way. If it were about ordinary high availability, CP databases provide that too, and then most databases would provide all of CAP. Instead, it is more about low-latency scenarios where temporary inconsistency is a better choice than having to wait, which is an extreme form of high availability. When a database claims to be AP, it typically means the database will always serve reads and writes in time, as long as a client can still reach some replica. This covers all kinds of failures, not only network partitions. An example is Cassandra.
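
The behavior described above can be sketched as a toy store that answers from whichever replica is reachable and reconciles later. This is an illustration under stated assumptions, not how Cassandra or any real database is implemented: `Replica` and `APClient` are hypothetical names, and conflicts are resolved by a simple last-write-wins rule on a logical timestamp.

```python
class Replica:
    """Toy replica storing key -> (timestamp, value); not a real database API."""

    def __init__(self, name):
        self.name = name
        self.reachable = True
        self.store = {}

    def merge(self, other):
        # Last-write-wins: keep the entry with the newer timestamp per key.
        for key, entry in other.store.items():
            if key not in self.store or entry[0] > self.store[key][0]:
                self.store[key] = entry


class APClient:
    """Serves every request from the first reachable replica, never waiting for the rest."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.clock = 0  # logical timestamp, assumed unique per write

    def _pick(self):
        for replica in self.replicas:
            if replica.reachable:
                return replica
        raise RuntimeError("no replica reachable")

    def write(self, key, value):
        self.clock += 1
        self._pick().store[key] = (self.clock, value)

    def read(self, key):
        entry = self._pick().store.get(key)
        return entry[1] if entry else None


a, b = Replica("a"), Replica("b")
client = APClient([a, b])

client.write("x", "old")   # lands on replica a
a.reachable = False        # a crashes or is partitioned away
client.write("x", "new")   # still accepted, lands on replica b
a.reachable, b.reachable = True, False
print(client.read("x"))    # stale read served by a
b.reachable = True
a.merge(b)                 # background repair once both are up
b.merge(a)
print(client.read("x"))    # replicas have converged on the newer value
```

No request ever blocks on an unreachable replica; the price is the stale read in the middle.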

P

P may mean two different things depending on context:

  • One is simply the ability to scale out, which also overlaps somewhat with availability.
  • The other is more about partitioning across wide-area networks, rather than across cluster nodes in the same local network. A typical example is replicas across different data centers (or replicas across your server and client computer). In this case, latency is high, so state synchronization is almost always asynchronous in practice, which means even CP databases do not guarantee strong consistency.

In reality, P cannot be sacrificed. Without P, high availability can hardly be provided, because a single node may crash, so there is no CA database. Traditional single-node relational databases are only C in this sense. And a fail-over setup could be treated as belonging to the CP category, even though the fail-over really addresses the high availability concern. 🤷

Little Difference in the End

However, no matter what databases have claimed to be, there is a trend for databases to stop saying that they are CP or AP. Instead, they can be either CP or AP depending on how they are configured.
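
One common form of such configuration, assuming a quorum-replicated database with N replicas where each read contacts R of them and each write waits for W acknowledgements, is the overlap rule R + W > N. The sketch below only checks that rule; `quorums_overlap` is a hypothetical helper, and real systems add further caveats (failure handling, clock skew, repair) on top of it.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


for r, w in [(1, 1), (2, 2), (1, 3)]:
    # Overlapping quorums lean toward the CP usage, non-overlapping toward AP.
    mode = "CP-like" if quorums_overlap(3, r, w) else "AP-like"
    print(f"N=3 R={r} W={w}: {mode}")
```

With N=3, setting R=1 and W=1 gives the fastest responses but lets a read miss the latest write, while R=2 and W=2 forces every read to touch at least one replica that saw it.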

@mojaedukacja
Contact

  • Please post your contact information (email, etc.). public@smoothdraw.com doesn't work! I've been trying to contact you for a year.

@toufa12
toufa12 commented Feb 15, 2022

please make smoothdraw software for mac.
