Skip to content

Instantly share code, notes, and snippets.

@samsch
Created September 13, 2020 19:33
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save samsch/1687dde7ed34b83e5a42ab58a4a190c7 to your computer and use it in GitHub Desktop.
Save samsch/1687dde7ed34b83e5a42ab58a4a190c7 to your computer and use it in GitHub Desktop.
When should I use relational vs NoSQL databases?

When should I use relational vs NoSQL databases?

What is the task you need to accomplish?

Frequently this question is asked as a "what should I use for a web app" question with no real details for what kind of data is being stored. As it turns out, that's ok! We actually have a type of database which directly fits "general purpose", by being fully robust and flexible. These are the relational databases, which are designed around the SQL standard.

Because "general purpose" covers most tasks really well when it comes to databases (something that's not nearly as true in other areas of technology), you can just grab PostgreSQL or MySQL and use that for all data storage purposes, and likely have no issues for the full lifetime of the project.

When general purpose is not enough

So when or where does "general purpose" stop working for your project?

The easiest answer is when you start hitting performance limits and aren't able to apply known fixes (scaling, simplifying queries, caching) to get what you need. This might come from needing to do depth-first searches in terabytes or petabytes of data. Or it could be that your write volume is outpacing what your servers can handle for indexing.

Why not start with NoSQL

But if using a NoSQL solution for those cases makes sense, why not just start with it?

There are good reasons you don't want to start with NoSQL databases.

One is that "NoSQL" isn't actually a "type" of database at all. It's a description that says the tool isn't general purpose. NoSQL databases vary dramatically in what their purposes are, and what their strengths and trade-offs are. So picking any individual one immediately limits your application to that tool's specifics.

More importantly though, NoSQL databases don't model relations (not that they couldn't, but they would probably not be called NoSQL if they did). All data is relational. One of the defining traits of general purpose databases is that they are designed to model relationships in your data. On the other hand, NoSQL databases expect that any relationships will be managed outside of the database, or handled by nesting the data (making it inaccessible for other purposes).

By modeling relationships, the general purpose database can maintain "data integrity" by enforcing those relationships are not corrupted. If a Post is always to be written by a User, then the database can ensure a Post can't exist in the data without the connected User. This makes your application as a whole more robust. NoSQL databases do not have this capability, and rely on your application to perform checks directly before writing or deleting. This often means manually including checks, which is error-prone, or using external tools which aren't portable like the database itself.

Modeling relationships also makes your application more flexible, because having a general definition of the data and relationships means you can query and correctly combine the data in new ways without needing to change how it's stored. With NoSQL databases, you usually have to "combine" data upfront to handled relationships, which makes it much harder to make different combinations of the data later.

But I though NoSQL was supposed to be more flexible?

Unfortunately, this is a widely held belief. What is true is that you can write random data to something like a document store (a type of NoSQL database) without pre-defining a data schema. The problem is that without a pre-defined schema, the database can't know what shape that data is supposed to be, and doesn't have a model of the relationships in it. So it can't check to make sure the data is valid, nor can it make sure it doesn't conflict. Relational database are actually quite capable of handling unstructured data in the same way if you want (json/jsonb fields in SQL), but usually you quickly find defining a fixed schema is more beneficial.

Use a relational database by default

All your data is relational, and while using a relational database might have a little more initial setup to define your data schema, but the payout for that is a robust store and the flexibility for your project to handle new requirements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment