@qrli
qrli / nosql-databases-are-different-but-how.md
Last active January 29, 2019 13:39
NoSQL Databases Are Different, But Hard To Explain

I keep seeing people ask why we cannot just use a different NoSQL DB instead. Roughly, I know they are designed for different scenarios. But to give concrete, convincing arguments, I had to do some research. I focus only on a few popular ones: ElasticSearch, MongoDB, and Cassandra.

The most frequent question (especially from managers) is why we do not use ElasticSearch as the only DB, instead of also storing the data in another DB like MongoDB. Yes, it is primarily a search engine, great for OLAP (analytic workflows). But why is it not so suitable as a DB for OLTP (transactional workflows)?

There are many detailed differentiating factors. But the major one I find is that, for OLTP, we typically expect some level of consistency. In industry OLTP, the most interesting level is called read-your-own-writes. That is, if you update a record, subsequent reads from the same client should see your latest u

@qrli
qrli / uft-8-vs-utf-16-no-one-true-encoding.md
Created January 29, 2019 13:46
Utf-8 vs Utf-16; No One True Encoding

It has long been the consensus that utf-8 is the right choice for files and network, while utf-16 can be slightly better for in-memory processing. However, the addition of emoji characters to Unicode weakens the arguments for utf-16.
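The emoji point can be checked by counting UTF-16 code units. Below is a minimal sketch; the helper name `utf16_code_units` and the two sample characters are chosen only for illustration.

```python
# Count how many 16-bit code units a string needs in UTF-16.
# The helper name and the sample characters are illustrative assumptions.

def utf16_code_units(text):
    """Number of 16-bit code units in the UTF-16 encoding (no BOM)."""
    return len(text.encode("utf-16-le")) // 2

bmp_char = "中"  # U+4E2D, inside the Basic Multilingual Plane
emoji = "😀"     # U+1F600, outside the Basic Multilingual Plane

units_bmp = utf16_code_units(bmp_char)    # one code unit
units_emoji = utf16_code_units(emoji)     # a surrogate pair: two code units
utf8_emoji = len(emoji.encode("utf-8"))   # bytes needed in UTF-8

print(units_bmp, units_emoji, utf8_emoji)
```

A character outside the Basic Multilingual Plane takes two code units in UTF-16, so code that assumes one code unit per character breaks on everyday emoji, not only on rare scripts.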

One of the typical arguments for utf-16 is that it is more efficient for Asian characters, because most of them take 2 bytes per character instead of 3 bytes in utf-8. People have already pointed out that normal files also contain a lot of metadata (e.g. HTML tags/css/js), which is ASCII, so the savings are typically cancelled out.
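Both halves of this argument can be seen by measuring encoded sizes directly. The sketch below uses a made-up sample string and a made-up HTML wrapper; both are illustrative assumptions, not measurements from real pages.

```python
# Compare encoded sizes of CJK text on its own and wrapped in ASCII markup.
# The sample text and the HTML wrapper are illustrative assumptions.

def encoded_sizes(text):
    """Return (utf-8 bytes, utf-16 bytes) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

cjk = "你好世界"  # 4 characters, all in the Basic Multilingual Plane
html = f'<p class="greeting">{cjk}</p>'

bare_sizes = encoded_sizes(cjk)   # utf-16 is smaller for the bare text
html_sizes = encoded_sizes(html)  # utf-8 is smaller once ASCII markup is added

print("bare text:  ", bare_sizes)
print("with markup:", html_sizes)
```

Even this small amount of markup is enough to flip the comparison, because every ASCII character costs 1 byte in utf-8 and 2 bytes in utf-16.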

The Old Advantage Is Gone

The real advantage, in my opinion, is still that utf-16 can largely be used as UCS-2, the 2-bytes-only predecessor of utf-16. Many people would disagree and think that’s wrong. However, the characters that require 4 bytes in utf-16 are mostly dead characters that most people will never use, so they deserve some extra handling ins

@qrli
qrli / battle-of-front-and-back-ends.md
Created January 29, 2019 13:50
Battle of Front and Back Ends

From a front-end centric view, back-end is just a database. From a back-end centric view, front-end is just the UI. When the two groups of developers work together, there is a battle of the middle ground — the business layer.

As a back-end person:

  • I’d think the front-end landscape changes too fast. The JavaScript we write today will become a technology dinosaur in 5 years. But our business logic typically lives for much longer.
  • Also, the front-end tool chains are often much looser than back-end ones. That is indeed good for quick iterations and fast delivery, but it becomes much harder once the project is large enough and needs frequent refactoring.
  • And for protecting IP, keeping business logic in the back-end feels more secure than shipping obfuscated JavaScript code to browsers.
@qrli
qrli / kubernetes-was-designed-to-manage-a-data-center-but-developers-use-it-to-run-a-website.md
Last active March 7, 2020 02:18
Kubernetes was designed to manage a data center, but developers use it to run a website

Recently Google Cloud announced that they will charge for the GKE control plane. It is a small amount, but users got angry. One reason is that many companies run many Kubernetes clusters, which surprised people at Google. Reading through the discussion cleared up a long-standing curiosity of mine.

Its Designed Usage

Remember that Kubernetes started in competition with DC/OS, which manages 1000s of machines. It is supposed to manage a whole data center, and all your applications are supposed to be deployed inside it, with namespaces for isolation. In the cloud, it is reasonable for a company to run a single Kubernetes cluster (or a few, but no more), shared by all teams and isolated from other companies. So, you need one Ops team to manage Kubernetes, and all other teams can focus on applications.

How We Got Here

But for the majority of developers, Kubernetes is compared with much simpler orchestrators like Swarm, Nomad, Service Fabric, etc., which are typically used to host a single application (or a few)

@qrli
qrli / cap-does-not-mean-what-cap-theorem-means.md
Created January 29, 2019 13:57
In Reality, CAP Does Not Mean What CAP Theorem Means

Experts have been trying to explain that the CAP theorem does not mean what people think it means, and that it is not the right tool for categorizing databases. But in the end, the true meaning does not matter. What matters is what the majority of people think it is. So, I tried to summarize what people really mean when they say CAP in practice. It is wrong, but more useful.

CP

CP usually means consistency as in traditional relational databases. The strict form means reads/writes are serialized, which is still less than true CAP consistency. In many use cases, however, the looser form of read-your-own-writes is good enough, and that’s what many NoSQL databases mean when they claim to be CP. An example is MongoDB, even though its default configuration does not guarantee that in some failure scenarios. You have to opt in to stronger guara