@edwardloveall
Created September 17, 2019 17:09
Some thoughts on why to use Postgres over MySQL. This was close to a year ago so some of these may be out of date.

Performance

MySQL has poor failure characteristics when overloaded. Performance issues in MySQL tend to be catastrophic for the entire database, whereas performance issues in Postgres tend to be specific to the affected table or row.

As one example, I debugged an application that was opening transactions and forgetting to close them. The end result in MySQL was that the entire database locked up. It was very hard to debug, because you couldn’t ask MySQL any questions about what it was doing when it froze up. Even the administrative tools were locked, so killing the daemon and booting the database back up was the only option. This also resulted in data loss.

I’ve debugged similar issues in Postgres, and you tend to see predictable failures. The same mistake in Postgres would result in lock timeouts, with an error message telling you that a lock had timed out. (Wow!) Other queries would continue to work, and administrative and debugging tools would remain viable during the outage.
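As a sketch of what that looks like: Postgres lets you set a lock timeout per session, and a blocked query then fails with a clear error instead of hanging. The table and column names here are hypothetical:

```sql
-- Give up on any lock after two seconds instead of waiting forever.
SET lock_timeout = '2s';

-- If another session holds a conflicting lock on this (hypothetical) table,
-- this fails with: ERROR: canceling statement due to lock timeout
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
```

While that's happening, administrative views like pg_stat_activity and pg_locks remain queryable, so you can see which session is holding the lock.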

Postgres supports useful features like expression indexes. It has a good query planner that can combine and reuse simple, single-column indexes. MySQL databases require careful index planning and will frequently require custom indexes (or tables!) for complex queries.
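For example, an expression index indexes the result of a function, and the planner uses it for queries that match the expression. The schema here is hypothetical:

```sql
-- Index the lowercased email so case-insensitive lookups can use an index.
CREATE INDEX users_lower_email_idx ON users (lower(email));

-- This query can now be served by the index above instead of a full scan.
SELECT * FROM users WHERE lower(email) = 'alice@example.com';
```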

Postgres supports views, including materialized views. This lets you create caches for complex queries when necessary without creating temporary tables or managing atomic refreshes by hand.
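A minimal sketch of that pattern, assuming a hypothetical orders table:

```sql
-- Cache an expensive aggregation as a materialized view.
CREATE MATERIALIZED VIEW order_totals AS
  SELECT customer_id, sum(amount) AS total
  FROM orders
  GROUP BY customer_id;

-- CONCURRENTLY lets readers keep querying during a refresh;
-- it requires a unique index on the view.
CREATE UNIQUE INDEX ON order_totals (customer_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;
```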

Reliability

Postgres provides strong consistency guarantees. Its constraints and foreign keys are reliably enforced and can’t be bypassed. Even DDL statements run inside transactions and are thus atomic.
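Transactional DDL means a failed migration rolls back cleanly instead of leaving the schema half-changed. A sketch, with a hypothetical table:

```sql
BEGIN;
ALTER TABLE users ADD COLUMN nickname text;
-- If anything later in the migration fails, nothing is applied:
ROLLBACK;
-- After the rollback, the nickname column does not exist;
-- the schema is exactly as it was before BEGIN.
```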

MySQL has fuzzy rules around aggregate queries, null values, uniqueness constraints, character sets, time zones, and timestamps, which make it very easy to misunderstand what data you actually have.

These data inconsistencies can result in subtle, devastating bugs, such as delivering data for the wrong user, where Postgres would refuse to run the ambiguous query. Issues with constraints, character sets, and time zones can result in nonsense data that requires careful fixes by hand, and the correct values can sometimes be impossible to determine retroactively.
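A classic example of an ambiguous aggregate is a GROUP BY query that selects a column not in the grouping. Historically MySQL (before ONLY_FULL_GROUP_BY became the default in 5.7) would return an arbitrary row's value, while Postgres rejects the query outright. The table here is hypothetical:

```sql
-- Which email does this return for each customer? Postgres refuses to guess:
-- ERROR: column "orders.email" must appear in the GROUP BY clause
--        or be used in an aggregate function
SELECT customer_id, email, count(*)
FROM orders
GROUP BY customer_id;
```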

Postgres has good options for replication and better options for distributed setups. As an example, it has long had good support for UUID primary keys. Using non-sequential primary keys in MySQL leads to subtle bugs and performance sacrifices, so sharding is implausible without a custom partitioning scheme.
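In Postgres, UUID primary keys are a first-class type with a built-in generator (gen_random_uuid() is built in as of Postgres 13; older versions need the pgcrypto extension):

```sql
CREATE TABLE events (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  payload jsonb NOT NULL
);
```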

Summary

Postgres is a fast database for mid-sized data sets that provides strong guarantees about the data it’s responsible for. MySQL is faster for single-row, primary-key lookups, but is slower for many other operations and provides poor guarantees. If specialized performance is a concern and you can afford to give up guarantees, you should look at a specialized database like Redis, Cassandra, or Elasticsearch. For everything else, use Postgres.

@edwardloveall (Author)

Credit to @jferris
