@matthiasr
Created May 19, 2022 07:50
What makes operating databases more complex than stateless services?

To me, the fundamental difficulty in managing databases is the amount of state they have, the time it takes to move that state around, and the difficulty in keeping it in sync.

Whether it's Cassandra, MySQL, or PostgreSQL, bringing up a new instance takes time, orders of magnitude more than replacing a stateless service. Network-mountable volumes help, because most of the state then lives on whatever provides the volume, but you still need to account for "moving" it into the cache.

Additionally, you necessarily have shared state that you cannot ever reset. A lot of the usefulness of "cattle" servers is that you have a clear way to reset them to a known good state. In most cases you cannot do that with your data.

Some of the mechanics of a cattle management system like Kubernetes still work, but they operate on timelines that give cluster operators headaches.

Because certain "reset" actions are unavailable, you end up spending much more time thinking about what the way forward is, and managing step-by-step evolution instead of rebuilding from scratch every time.

For any system, I like to think through the components and work out what is:

  • stateless (can be removed and recreated quickly, possibly scales out easily, like the application instance)
  • semi-stateful (can be reconstructed but takes time and care, like caches)
  • stateful (cannot be fully recreated from something else, can only evolve forwards).
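As a rough sketch of this classification, the three categories could be modeled like this in Python. The component inventory and the `recovery_action` helper are hypothetical names made up for illustration, not part of any real tool.

```python
from dataclasses import dataclass
from enum import Enum


class Statefulness(Enum):
    STATELESS = "stateless"          # can be removed and recreated quickly
    SEMI_STATEFUL = "semi-stateful"  # can be reconstructed, but takes time and care
    STATEFUL = "stateful"            # cannot be recreated, can only evolve forwards


@dataclass(frozen=True)
class Component:
    name: str
    statefulness: Statefulness


def recovery_action(component: Component) -> str:
    """Return the broad recovery strategy for a component (illustrative only)."""
    if component.statefulness is Statefulness.STATELESS:
        return "delete and recreate"
    if component.statefulness is Statefulness.SEMI_STATEFUL:
        return "rebuild from its source, then wait for it to warm up"
    return "evolve forwards step by step; no reset available"


# Hypothetical inventory, assumed for the sake of the example.
components = [
    Component("application instance", Statefulness.STATELESS),
    Component("cache", Statefulness.SEMI_STATEFUL),
    Component("database volume", Statefulness.STATEFUL),
]

for c in components:
    print(f"{c.name}: {recovery_action(c)}")
```

The point of writing it down this way is that the category, not the technology, decides which operational playbook applies.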

This happens at multiple levels. If you have external network-mountable volumes, the DB instance becomes semi-stateful, because it "only" needs to warm up the cache. The volume underneath is fully stateful: you cannot delete and recreate it.

Backups with point-in-time restore capability make it slightly less stateful by adding another "most stateful" layer. Ideally, you end up with a gradient where the more complicated bits are less stateful, so they're easier to manage by reset, whereas the ultimate store of state is the simplest possible thing that doesn't need managing at all (e.g. an S3 bucket).
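One way to sanity-check such a gradient is to list the layers and look for any that are both more stateful and more complicated than another. The sketch below assumes a made-up numeric scale for both properties; the layer names, the scores, and the `gradient_violations` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    statefulness: int  # 0 = stateless, higher = harder to recreate (made-up scale)
    complexity: int    # higher = more moving parts to operate (made-up scale)


def gradient_violations(layers):
    """Return name pairs (a, b) where a is both more stateful and more complex than b."""
    return [
        (a.name, b.name)
        for a in layers
        for b in layers
        if a.statefulness > b.statefulness and a.complexity > b.complexity
    ]


# Hypothetical layering: complexity falls as statefulness rises.
layers = [
    Layer("database instance", statefulness=1, complexity=3),
    Layer("network volume", statefulness=2, complexity=2),
    Layer("backups in object storage", statefulness=3, complexity=1),
]

print(gradient_violations(layers))  # prints [] for this inventory
```

An empty result means the most stateful layers are also the simplest ones, which is the shape described above. Any pair that shows up points at a component that is both hard to reset and hard to operate.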
