@sebastianwebber
Compilation of the Uber Facts on PostgreSQL to MySQL Migration

Uber facts

Original posts/information

Key points

  • ~50GB MySQL application
  • Main motivation: PostGIS
  • Migration done with a custom tool (xml2pgcopy) plus mysqldump, in about 45 minutes (see the sketch after this list)
  • Final architecture layout: Python app -> PgBouncer -> HAProxy -> PostgreSQL
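
The talk does not publish xml2pgcopy itself, so the following is only a rough, hypothetical sketch of the approach the name suggests: converting `mysqldump --xml` output into tab-separated rows that PostgreSQL's `COPY ... FROM STDIN` can ingest. The script name, column order and escaping rules are assumptions, not Uber's actual tool.

```python
# xml2copy.py -- NOT Uber's xml2pgcopy, only a minimal illustration of the idea:
# read `mysqldump --xml` output and emit tab-separated rows for COPY ... FROM STDIN.
import sys
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"


def escape(value):
    # COPY text format: \N means NULL; backslash, tab, CR and LF must be escaped.
    if value is None:
        return r"\N"
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\r", "\\r")
                 .replace("\n", "\\n"))


def main(dump_path):
    # mysqldump --xml wraps each row in <row><field name="col">value</field></row>.
    for _, elem in ET.iterparse(dump_path):
        if elem.tag != "row":
            continue
        values = []
        for field in elem.findall("field"):
            if field.get(XSI + "nil") == "true":  # NULL column
                values.append(escape(None))
            else:
                values.append(escape(field.text or ""))
        sys.stdout.write("\t".join(values) + "\n")
        elem.clear()  # keep memory bounded on large dumps


if __name__ == "__main__":
    main(sys.argv[1])
```

The output could then be piped straight into psql, e.g. `python xml2copy.py trips.xml | psql uber -c "\copy trips from stdin"` (table and database names here are placeholders).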

Key points

  • Single [and large] database to store all data (excluding analytics)
  • Hard to scale, no sharding at all
  • With the change to MySQL, they implemented sharding on top of an append-only datastore (see the toy sketch below)
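
To make the "sharding with an append-only datastore" point concrete, here is a toy, in-memory sketch of the general idea. This is not Uber's Schemaless implementation; the shard count, hashing and cell layout are assumptions.

```python
# Toy sketch of a sharded, append-only datastore -- NOT Uber's Schemaless.
import time
import uuid
import zlib

NUM_SHARDS = 16
shards = [[] for _ in range(NUM_SHARDS)]  # each shard would be a MySQL instance


def shard_for(row_key: str) -> int:
    # Deterministically map a row key (e.g. a trip UUID) to a shard.
    return zlib.crc32(row_key.encode()) % NUM_SHARDS


def append_cell(row_key: str, column: str, body: dict) -> None:
    # Cells are only ever appended; a newer cell shadows older versions.
    shards[shard_for(row_key)].append({
        "added_id": uuid.uuid4().hex,
        "row_key": row_key,
        "column": column,
        "created_at": time.time(),
        "body": body,
    })


def latest_cell(row_key: str, column: str):
    cells = [c for c in shards[shard_for(row_key)]
             if c["row_key"] == row_key and c["column"] == column]
    return max(cells, key=lambda c: c["created_at"], default=None)


trip = str(uuid.uuid4())
append_cell(trip, "STATUS", {"state": "requested"})
append_cell(trip, "STATUS", {"state": "completed"})  # appended, never updated
print(latest_cell(trip, "STATUS")["body"])  # {'state': 'completed'}
```

In Uber's published description the shards are MySQL databases and cells are immutable rows, with readers taking the newest cell for a given row key and column; the snippet above only mimics that shape in memory.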

Key points

  • Too many engineers (about 1,500) and a chaotic environment
  • Database timeline:
    • 2009-2010 - PHP + MySQL
    • split into two pieces:
      • dispatch: NodeJS + MongoDB
      • API: Python + SQLAlchemy + MySQL
    • Feb 2012: dispatch swaps MongoDB for Redis
    • Feb 2013: API switched to Postgres
    • Mar 2014: new Python services use MySQL, "Schemaless begins"
    • Sep 2014: first Schemaless deployment - migration away from Postgres begins
  • Chaos engineering
  • On May 11, the API went down for 16 hours, and reads were hit by slow queries for another 24 hours:
    • The disk filled up because WAL archiving failed (caused by wrong permissions on S3). A random engineer deleted old WAL files and restarted Postgres; Postgres did not come back up.
    • All standbys failed to become the new master. A consultant wrote a C program to make it work.
  • "databases can be killed and no users will notice"

Key points

  • uDestroy and failure tests
  • More details about the database timeline; mostly the same presentation as given at All Your Base 2015

About PostgreSQL

Key points
  • Old and monolithic backend application written in Python
  • "PostgreSQL limitations":
    • Inefficient architecture for writes
    • Inefficient data replication
    • Issues with table corruption
    • Poor replica MVCC support
    • Difficulty upgrading to newer releases
Replication Issues
  • PostgreSQL Replication: For crash recovery purposes, the database already maintains a write-ahead log (WAL) and uses it to implement two-phase commit.
  • Write amplification
  • The Postgres replication data stream quickly becomes extremely verbose, potentially occupying a large amount of bandwidth (a rough way to measure this is sketched after this list)
  • Cascading replication limits the inter–data center bandwidth requirements to the amount of replication required between just the master and a single replica, even if there are many replicas in the second data center
  • Postgres does not have true replica MVCC support
  • If a streaming replica has an open transaction, updates to the database are blocked if they affect rows held open by the transaction. (hot_standby_feedback?)
  • pglogical is still not an option for people running on older Postgres releases
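
As a rough way to put numbers on the "verbose replication stream" and write-amplification claims, one can measure how much WAL a workload generates over an interval; the same session can also check hot_standby_feedback, the setting behind the replica-conflict note above. A sketch assuming psycopg2, PostgreSQL 10+ function names (pg_current_wal_lsn; 9.x used pg_current_xlog_location) and a placeholder connection string:

```python
# Sketch: measure WAL bytes generated in one minute and check hot_standby_feedback.
import time

import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder DSN
conn.autocommit = True
cur = conn.cursor()

cur.execute("SELECT pg_current_wal_lsn()")
start_lsn = cur.fetchone()[0]

time.sleep(60)  # let the normal workload run for a minute

cur.execute("SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), %s)", (start_lsn,))
wal_bytes = cur.fetchone()[0]
print(f"WAL generated in 60s: {wal_bytes / 1024 / 1024:.1f} MiB")

# On a standby, hot_standby_feedback trades replication conflicts (cancelled
# queries / paused WAL replay) for extra bloat on the primary.
cur.execute("SHOW hot_standby_feedback")
print("hot_standby_feedback =", cur.fetchone()[0])
```
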
Data Corruption Issues
  • During a routine master database promotion to increase database capacity, we ran into a Postgres 9.2 bug: Replicas followed timeline switches incorrectly, causing some of them to misapply some WAL records.
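
After a promotion like the one that triggered the 9.2 bug, one sanity check is to compare the timeline each replica is actually on and how far it has replayed. A sketch assuming psycopg2, PostgreSQL 10+ (pg_control_checkpoint, pg_last_wal_replay_lsn) and placeholder hostnames, so it does not reflect what was available on 9.2:

```python
# Sketch: report each replica's timeline and replay position after a failover.
import psycopg2

REPLICAS = ["replica1.example.com", "replica2.example.com"]  # placeholders

for host in REPLICAS:
    with psycopg2.connect(host=host, dbname="app") as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT timeline_id, pg_last_wal_replay_lsn()
                FROM pg_control_checkpoint()
            """)
            timeline, replay_lsn = cur.fetchone()
            print(f"{host}: timeline {timeline}, replayed up to {replay_lsn}")
```
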
Postgres Upgrades
  • Because replication records work at the physical level, it’s not possible to replicate data between different general availability releases of Postgres.
Buffer Pool Issues
  • To increase performance, Postgres allows the kernel to automatically cache recently accessed disk data via the page cache
  • Accessing data via the page cache is actually somewhat expensive compared to accessing RSS memory
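
To see how the two caches interact in practice, the shared_buffers hit ratio from pg_stat_database shows how often reads are served from Postgres' own buffer pool versus falling through to the OS page cache or disk. A sketch assuming psycopg2 and a placeholder database name:

```python
# Sketch: compare shared_buffers hits against reads that missed the buffer pool.
import psycopg2

with psycopg2.connect(dbname="app") as conn:
    with conn.cursor() as cur:
        cur.execute("SHOW shared_buffers")
        print("shared_buffers =", cur.fetchone()[0])

        cur.execute("""
            SELECT blks_hit,
                   blks_read,
                   round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2)
            FROM pg_stat_database
            WHERE datname = current_database()
        """)
        hit, read, ratio = cur.fetchone()
        # blks_read counts blocks missed in shared_buffers; they may still have
        # been served by the kernel page cache rather than physical disk.
        print(f"shared_buffers hit ratio: {ratio}% ({hit} hit / {read} read)")
```
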
Connection Issues
  • MySQL connection threads vs. PostgreSQL connection processes
  • Postgres seems to simply have poor support for handling large connection counts
  • PostgreSQL needs an external connection pooler like PgBouncer
  • Occasional application bugs in backend services caused them to open more active connections, and these bugs led to extended downtime
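
Because every Postgres connection is a dedicated backend process, it is worth watching how close the server is to max_connections; in the layout above the application would talk to PgBouncer rather than to the server directly. A sketch assuming psycopg2 and placeholder connection details:

```python
# Sketch: each row in pg_stat_activity is a dedicated backend *process* in
# Postgres (MySQL uses a thread per connection), so watch the headroom under
# max_connections.
import psycopg2

with psycopg2.connect(dbname="app") as conn:
    with conn.cursor() as cur:
        cur.execute("SHOW max_connections")
        max_conn = int(cur.fetchone()[0])
        cur.execute("SELECT count(*) FROM pg_stat_activity")
        used = cur.fetchone()[0]
        print(f"{used}/{max_conn} server connections in use")
        # With app -> PgBouncer -> HAProxy -> PostgreSQL, the application would
        # connect to PgBouncer (typically port 6432) and the server would only
        # ever see a small, fixed pool of backends.
```
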

Community responses in blogs/mailing lists
