@qrli
Last active January 29, 2019 13:39
NoSQL Databases Are Different, But Hard To Explain

I keep seeing people ask why we cannot just use a different NoSQL DB instead. Roughly, I know they are designed for different scenarios. But to give concrete, convincing arguments, I had to do some research. I focus only on a few popular ones: ElasticSearch, MongoDB, and Cassandra.

The most frequent question (especially from managers) is why we do not use ElasticSearch as the only DB, instead of also storing the data in another DB such as MongoDB. Yes, it is primarily a search engine, great for OLAP (analytical workloads). But why is it not so suitable as a DB for OLTP (transactional workloads)?

There are many detailed differentiating factors. But the major one I find is that, for OLTP, we typically expect some level of consistency. For OLTP in industry, the most interesting level is called read-your-own-writes. That is, if you update a record, your subsequent reads from the same client should see your latest update.
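To make read-your-own-writes concrete, here is a minimal sketch of a replicated store, written as a toy in plain Python. It is not the API of any real database: `ReplicatedStore`, `w`, and `r` are hypothetical names, and the store deliberately picks the worst case (writes go to the first `w` replicas, reads ask the last `r`). It only illustrates why a read can miss your own write when few replicas are involved, and why requiring the read and write sets to overlap fixes it.

```python
class ReplicatedStore:
    """Toy key-value store with N replicas; purely illustrative."""

    def __init__(self, n_replicas=3):
        # Each replica maps key -> (version, value).
        self.replicas = [dict() for _ in range(n_replicas)]
        self.version = 0

    def write(self, key, value, w):
        """Apply the write to the first `w` replicas only; the rest lag behind."""
        self.version += 1
        for replica in self.replicas[:w]:
            replica[key] = (self.version, value)

    def read(self, key, r):
        """Ask the last `r` replicas and return the newest value any of them has."""
        answers = [rep[key] for rep in self.replicas[-r:] if key in rep]
        return max(answers)[1] if answers else None


store = ReplicatedStore(n_replicas=3)
store.write("user:1", "old", w=3)    # baseline, present on every replica

# Relaxed settings (W=1, R=1): the read hits a replica that missed the write.
store.write("user:1", "new", w=1)
stale = store.read("user:1", r=1)

# Stricter settings (W=2, R=2): R + W > N, so the read set overlaps the write set.
store.write("user:1", "newer", w=2)
fresh = store.read("user:1", r=2)

print(stale, fresh)  # old newer
```

The stricter setting touches more replicas on every operation, which is where the performance cost comes from.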

  • Relational DBs typically offer strong consistency, up to fully serializable transactions. People from a relational DB background often forget that NoSQL DBs are typically a lot more relaxed about consistency. By default, they do not even guarantee read-your-own-writes consistency. This is the case for both MongoDB and Cassandra. If you need that level of consistency, you can configure them to provide it, at some performance cost, of course. Some people are scared when they learn this and want to go back to a relational DB. In fact, once you need replication and partitioning, a relational DB is no better, only harder.

  • ElasticSearch, on the other hand, cannot provide read-your-own-writes consistency. To be accurate, it can for simple GET operations, but not for SEARCH. To enable its lightning-fast and powerful search, it relies on very advanced indexing. However, it cannot afford to update the index on every write, which would make it very slow. So it updates the index in batches, e.g. every second. Until the index is updated, SEARCH returns old data. This is perfect for search-engine-like use cases, which tolerate eventual consistency, but not so for intensive transactional processing.
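The GET versus SEARCH difference can be sketched with a toy index in plain Python. This is not how ElasticSearch is implemented and it uses no ElasticSearch API: `NearRealTimeIndex` and its methods are hypothetical names. It only assumes that writes land in a document store immediately, while the searchable index is rebuilt in a separate refresh step.

```python
class NearRealTimeIndex:
    """Toy store whose search index lags behind writes until refresh()."""

    def __init__(self):
        self.docs = {}        # live documents, updated on every write
        self.pending = set()  # ids written since the last refresh
        self.inverted = {}    # term -> set of doc ids, what search() sees

    def index(self, doc_id, text):
        """Store the document now; defer the index update."""
        self.docs[doc_id] = text
        self.pending.add(doc_id)

    def get(self, doc_id):
        """Lookup by id reads the live store, so it sees the latest write."""
        return self.docs.get(doc_id)

    def refresh(self):
        """Batch step: re-index every document written since the last refresh."""
        for doc_id in self.pending:
            for ids in self.inverted.values():
                ids.discard(doc_id)  # drop stale postings for this document
            for term in self.docs[doc_id].lower().split():
                self.inverted.setdefault(term, set()).add(doc_id)
        self.pending.clear()

    def search(self, term):
        """Search only sees what the last refresh indexed."""
        return sorted(self.inverted.get(term.lower(), set()))


idx = NearRealTimeIndex()
idx.index(1, "order shipped")
idx.refresh()

idx.index(1, "order cancelled")                  # update, not yet refreshed
got = idx.get(1)                                 # "order cancelled"
before = idx.search("cancelled")                 # [] : search still sees old data
idx.refresh()
after = idx.search("cancelled")                  # [1]
```

Between the update and the next refresh, a client that reads its own write through search gets the old answer, which is exactly the gap described above.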

As for the differences between MongoDB and Cassandra, we know MongoDB is a document DB while Cassandra is a wide-column DB. But what does that mean?

  • For use cases that read specific columns a lot, Cassandra clearly wins. That is because Cassandra stores the columns of a wide row contiguously within one partition, so it can read them in one pass rather than having to query lots of records. In MongoDB you could store such a series in a single document, but the document size has a hard 16 MB limit.

  • For document use cases, on the other hand, the Cassandra data model is table-like, which means it does not handle nested documents well. Also, indexing is typically limited to the primary key, while secondary indexes are tricky to use and can hurt performance greatly.
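The two data models can be contrasted with a small sketch in plain Python. Nothing here is a real driver API: `WideRowTable`, the sensor keys, and the order document are made-up illustrations, and the table assumes clustering keys are unique within a partition. The wide-column side keeps one partition's cells sorted so a range is a single contiguous slice, while the document side keeps everything about one entity nested in one record.

```python
from bisect import bisect_left, bisect_right, insort


class WideRowTable:
    """Toy wide-column table: partition key -> cells sorted by clustering key."""

    def __init__(self):
        self.partitions = {}

    def insert(self, partition_key, clustering_key, value):
        """Keep the partition's cells sorted by clustering key."""
        cells = self.partitions.setdefault(partition_key, [])
        insort(cells, (clustering_key, value))

    def slice(self, partition_key, start, end):
        """Return the contiguous run of cells with start <= key <= end."""
        cells = self.partitions.get(partition_key, [])
        keys = [k for k, _ in cells]  # rebuilt on each call, for clarity only
        return cells[bisect_left(keys, start):bisect_right(keys, end)]


# Wide-column style: one partition per sensor, one cell per timestamp.
table = WideRowTable()
for ts, temp in [(30, 21.7), (10, 20.5), (20, 21.0)]:
    table.insert("sensor-1", ts, temp)
table.insert("sensor-2", 10, 18.2)
window = table.slice("sensor-1", 10, 20)

# Document style: one nested record per entity, addressed by path.
order = {
    "_id": 42,
    "customer": {"name": "Ann", "address": {"city": "Oslo"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
city = order["customer"]["address"]["city"]
total_qty = sum(item["qty"] for item in order["items"])

print(window, city, total_qty)
```

Flattening the `order` document into the table shape would need separate partitions or extra columns for the customer, the address, and each item, which is the awkwardness the second bullet refers to.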
