Compilation of Uber Facts on the PostgreSQL to MySQL Migration
- ~50GB MySQL Application
- Main motivation: PostGIS
- Migration done with a custom tool (xml2pgcopy) and mysqldump in 45 min
- Final architecture layout: Python app -> PgBouncer -> HAProxy -> PostgreSQL
- Single [and large] database to store all data (excluding analytics)
- Hard to scale, no sharding at all
- With the change to MySQL, they implemented sharding on top of an append-only database
- Too many engineers (about 1,500), and things were chaotic
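The sharding-on-an-append-only-store idea above can be sketched as follows. This is a toy illustration, not Uber's actual Schemaless implementation; the class and method names are made up:

```python
import hashlib

class AppendOnlyStore:
    """Toy sketch of a sharded, append-only store: writes never update
    a cell in place, they append a new version; reads return the latest."""

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        # each shard holds immutable cells: (row_key, column, version, body)
        self.shards = [[] for _ in range(num_shards)]

    def _shard_for(self, row_key):
        # stable hash of the row key picks the shard
        digest = hashlib.md5(row_key.encode()).hexdigest()
        return int(digest, 16) % self.num_shards

    def put(self, row_key, column, body):
        shard = self.shards[self._shard_for(row_key)]
        version = sum(1 for c in shard if c[0] == row_key and c[1] == column) + 1
        shard.append((row_key, column, version, body))  # append only, never overwrite

    def get(self, row_key, column):
        shard = self.shards[self._shard_for(row_key)]
        cells = [c for c in shard if c[0] == row_key and c[1] == column]
        return cells[-1][3] if cells else None  # latest version wins
```

Usage: writing the same cell twice keeps both versions in the shard's log, and `get` returns the newest one.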
- Database timeline:
- 2009-2010 - PHP + MySQL
- broke into 2 pieces:
- dispatch: NodeJS + MongoDB
- API: python + SQLAlchemy + MySQL
- Feb 2012: dispatch swaps MongoDB for Redis
- Feb 2013: API switched to Postgres
- Mar 2014: New Python services use MySQL, "schemaless begins"
- Sep 2014: First Schemaless instance; migration off Postgres begins
- Chaos engineering
- On May 11, the API went down for 16 hours, and reads hit slow queries for 24 more hours:
- Disk filled up because WAL archiving failed (caused by wrong permissions on S3). A random engineer deleted old WAL files and restarted Postgres; Postgres did not come back up.
- All standbys failed to become the new master. A consultant wrote a C program to make it work.
- "databases can be killed and no users will notice"
- uDestroy and failure tests
- More details about the database timeline in a mostly identical presentation at All Your Base 2015
- Old and monolithic backend application written in Python
- "PostgreSQL limitations":
- Inefficient architecture for writes
- Inefficient data replication
- Issues with table corruption
- Poor replica MVCC support
- Difficulty upgrading to newer releases
Data Corruption issues
- PostgreSQL Replication: For crash recovery purposes, the database already maintains a write-ahead log (WAL) and uses it to implement two-phase commit.
- Write amplification
- Postgres replication data stream quickly becomes extremely verbose, potentially occupying a large amount of bandwidth
- Cascading replication limits the inter–data center bandwidth requirements to the amount of replication required between just the master and a single replica, even if there are many replicas in the second data center.
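The cascading setup described above can be sketched as configuration. Setting names below are real Postgres 9.2-era options (replica settings then lived in recovery.conf); hostnames and values are made up:

```
# On the one replica in the second data center that streams over the WAN.
# recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=master.dc1.example port=5432'

# postgresql.conf on that same replica, so it can feed further standbys:
hot_standby = on
max_wal_senders = 5

# Every other replica in that data center points at it instead of the
# master, so the verbose WAL stream crosses the WAN only once:
#   primary_conninfo = 'host=replica1.dc2.example port=5432'
```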
- Postgres does not have true replica MVCC support
- If a streaming replica has an open transaction, updates to the database are blocked if they affect rows held open by the transaction.
- pglogical is still not an option for people running on older Postgres releases
- During a routine master database promotion to increase database capacity, we ran into a Postgres 9.2 bug: Replicas followed timeline switches incorrectly, causing some of them to misapply some WAL records.
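Postgres does expose knobs for this replay-vs-query conflict. A sketch of the relevant replica-side settings (real option names from the Postgres docs; values are illustrative):

```
# postgresql.conf on the replica.
# Cancel conflicting replica queries after 30s so WAL replay can proceed:
max_standby_streaming_delay = '30s'

# Or report the replica's oldest snapshot back to the master so it delays
# vacuuming those rows (trades query cancellations for master-side bloat):
hot_standby_feedback = on
```

Neither setting removes the underlying tension; they only choose which side (replica queries or replication lag / master bloat) pays for it.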
Buffer Pool issues
- Because replication records work at the physical level, it’s not possible to replicate data between different general availability releases of Postgres.
- To increase performance, Postgres allows the kernel to automatically cache recently accessed disk data via the page cache
- accessing data via the page cache is actually somewhat expensive compared to accessing RSS memory
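A rough, self-contained illustration of that cost difference: the sketch below times a `pread()` from a warm kernel page cache against an in-process memory copy of the same size. It is not a Postgres benchmark, and exact numbers vary by machine; the point is only that cached file reads still pay a syscall per access:

```python
import os
import tempfile
import timeit

BLOCK = 8192                      # Postgres page size
data = os.urandom(BLOCK * 128)    # ~1 MB of test data

# Write the data to a temp file so reads come from the page cache.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

fd = os.open(path, os.O_RDONLY)
os.pread(fd, BLOCK, 0)            # warm the page cache

# One page via the page cache (syscall) vs. one page from process memory.
page_cache = timeit.timeit(lambda: os.pread(fd, BLOCK, 0), number=10_000)
rss = timeit.timeit(lambda: data[0:BLOCK], number=10_000)

print(f"page cache read: {page_cache:.4f}s, in-process copy: {rss:.4f}s")

os.close(fd)
os.unlink(path)
```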
Community responses in blogs/mailing lists
- MySQL connection threads vs. PostgreSQL connection processes
- Postgres seems to simply have poor support for handling large connection counts
- PostgreSQL needs an external connection pooler like PgBouncer
- Occasional application bugs in backend services caused them to open more active connections, and these bugs led to extended downtimes
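The pooling setup those responses point at can be sketched as a minimal pgbouncer.ini. Option names are real PgBouncer settings; the database name and values are illustrative:

```
; Minimal pgbouncer.ini sketch
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
; Release the server connection at transaction end, so thousands of
; cheap client sockets multiplex onto a few Postgres backend processes.
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 20
```

With transaction pooling, a burst of client connections (or a connection-leak bug like the ones mentioned above) hits PgBouncer's cheap sockets instead of spawning one heavyweight Postgres process per connection.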