@kplawver
Forked from itsderek23/mariadb-galera.markdown
Created September 19, 2013 13:34

The High Availability Switch: from MySQL+MMM to MariaDB+Galera Cluster

INSERT PICTURE

Kevin Lawver, President @ Rails Machine, is our guest author for this post.

Few things feel worse than rolling out a High Availability (HA) system, then regularly watching that system collapse. For our team at Rails Machine, this failing HA system was MySQL Multi-Master Replication (MMM).

We've been searching for a MMM replacement for a while, and a few months ago, we made the switch to MariaDB + Galera Cluster for High Availability MySQL. Here's the full story: the reasons why MMM doesn't work, why Galera Cluster does work, and our production takeaways.

MySQL Multi-Master Replication (MMM) is fundamentally broken

INSERT DIAGRAM

How MySQL MMM Works: a server with MySQL MMM installed polls the MySQL nodes every ten seconds, checking their health. Only one of the servers receives the writer role - the rest can have reader roles. MMM maintains a virtual IP that points to the node with the writer role.
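For context, an MMM deployment is driven by a config along these lines. This is a sketch of an mmm_common.conf from memory - hostnames and IPs are invented, and exact directives can vary between MMM versions:

```
# mmm_common.conf (sketch) - one exclusive writer role with a floating virtual IP
<host db1>
    ip      192.168.0.11
    mode    master
    peer    db2
</host>

<host db2>
    ip      192.168.0.12
    mode    master
    peer    db1
</host>

<role writer>
    hosts   db1, db2
    ips     192.168.0.100    # virtual IP that follows the writer role
    mode    exclusive        # only one node holds this role at a time
</role>

<role reader>
    hosts   db1, db2
    ips     192.168.0.101, 192.168.0.102
    mode    balanced
</role>
```

The monitor daemon moves the writer's virtual IP between hosts based on its periodic health checks - which is exactly where the trouble starts.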

The problem is the polling: if MySQL MMM polls every ten seconds (the default), what happens when the writer node fails between checks? You likely need a high-availability setup because you handle many transactions - you could have hundreds of thousands of failed transactions before MMM detects that the writer node isn't healthy. Worse, what if there's an internal issue where replication fails first, transactions keep failing, and only then does MMM switch the writer role to another node? That node won't be consistent with the original writer.

Decreasing the polling interval to one second won't fix it - large databases are likely doing many transactions per second.
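To put rough numbers on it (the write rate below is an illustrative assumption, not a measurement from our cluster):

```python
# Worst case: every write issued during the polling gap fails before the
# monitor notices the dead writer and moves the virtual IP.
def worst_case_failed_writes(writes_per_second, polling_interval_seconds):
    """Upper bound on writes lost while the monitor is blind."""
    return writes_per_second * polling_interval_seconds

# A busy cluster doing 20,000 writes/sec with MMM's default 10s poll:
print(worst_case_failed_writes(20_000, 10))  # 200000 failed writes

# Even at a 1-second interval, that's still 20,000 failed writes:
print(worst_case_failed_writes(20_000, 1))   # 20000
```

Shrinking the interval shrinks the window but never closes it - the failure mode is inherent to polling.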

So, polling is the fundamental issue, but beyond fundamental issues, MySQL MMM in the wild often results in hard-to-recover problems. Here's Baron Schwartz, MySQL God at Percona, on MMM's flaws:

In short, MMM causes more downtime than it prevents. It’s a Low-Availability tool, not a High-Availability tool. It only takes one really good serious system-wide mess to take you down for a couple of days, working 24×7 trying to scrape your data off the walls and put it back into the server. MMM brings new meaning to the term “cluster-f__k”.

Despite MMM's flaws, it was the least broken way to do HA for MySQL for a time. But, times have changed. Even the creator of MySQL MMM says it's time for a change. Here's Alexey's comment on Baron's blog post on MMM:

I’m the original author of MMM and I completely agree with you. Every time I try to add HA to my clusters I remember MMM and want to stab myself because I simply could not trust my data to the tool and there is nothing else available on the market to do the job reliably.

So, why is Galera the best MySQL HA solution?

While node health is still determined by polling, a failing node won't cause your database to enter an unrecoverable state. Why is a solution that still does health checks via polling better than the old way?

The answer lies in how replication works. With standard MySQL, writes to the master are recorded in a binary log. Slaves then replay the queries from the binary log. There is always a delay between when a query runs on the writer and when the other nodes run it. This replication is asynchronous.

MySQL asynchronous replication has the following issues:

  • It's common for a slave's dataset to lag behind that of the master.
  • MySQL replication is slow - it replays transactions from the binary log.
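The lag in the first bullet is directly observable on any slave. `SHOW SLAVE STATUS` reports, among other things, how far the slave's binary-log replay trails the master:

```sql
-- On a slave, check replication delay (standard MySQL async replication):
SHOW SLAVE STATUS\G
-- Look at the Seconds_Behind_Master field: NULL means replication is
-- broken; any positive number is the asynchronous lag described above.
```

Promoting a lagging slave to writer means accepting whatever writes it never received - the inconsistency problem MMM keeps running into.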

With Galera, transactions are synchronously committed on all nodes. When a transaction completes, all servers have the same dataset - there is no replication delay. Galera also uses row-based replication, which is significantly faster than MySQL's standard replication.
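Concretely, the synchronous, row-based behavior comes from the wsrep settings in my.cnf. A minimal sketch - paths, node addresses, and the cluster name are placeholders:

```ini
# my.cnf fragment - Galera wsrep settings (sketch)
[mysqld]
binlog_format            = ROW      # Galera requires row-based replication
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2        # required for parallel row application

wsrep_provider        = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name    = "example_cluster"
wsrep_cluster_address = "gcomm://10.0.0.1,10.0.0.2,10.0.0.3"
wsrep_sst_method      = rsync       # full state transfers copy data via rsync
```

Every node lists every other node in `wsrep_cluster_address`, so there's no single replication master to lose.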

STILL NOT CLEAR - HOW DOES GALERA AVOID CAUSING ISSUES WHEN THE WRITER NODE FAILS?

What happens if a node fails?

We're using HAProxy with Galera Cluster: we have a front-end for the writer node and a front-end for reads that balances queries across all of the nodes. HAProxy runs a health check on each node, checking its synced state. If the writer node isn't synced, HAProxy promotes one of the other nodes to the writer backend and takes the current writer offline. In this situation, MySQL MMM would frequently end up taking all of the nodes offline - HAProxy doesn't. We may lose a couple of queries while the writer backend is updated, but that won't result in an inconsistent data set across servers, which is far more crippling.
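Our HAProxy setup looks roughly like the sketch below. The IPs are invented, and the health check assumes a clustercheck-style HTTP service on each node that reports synced state on port 9200 - adjust to whatever check your nodes expose:

```
# haproxy.cfg fragment (sketch)

# Reads: balance across every synced node
listen galera-reader
    bind 10.0.0.10:3306
    mode tcp
    balance leastconn
    option httpchk                   # hits the clustercheck service
    server db1 10.0.0.1:3306 check port 9200
    server db2 10.0.0.2:3306 check port 9200
    server db3 10.0.0.3:3306 check port 9200

# Writes: one active node; the rest only take over if it fails its check
listen galera-writer
    bind 10.0.0.10:3307
    mode tcp
    option httpchk
    server db1 10.0.0.1:3306 check port 9200
    server db2 10.0.0.2:3306 check port 9200 backup
    server db3 10.0.0.3:3306 check port 9200 backup
```

The `backup` keyword is what makes writer failover automatic: HAProxy only routes writes to a backup server once the active one fails its health check.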

We don't automatically repair failed nodes, but that's OK. My primary concern is just making sure a healthy node is taking writes, and HAProxy does that.

Repairing failed nodes (and bringing new ones online)

When a node fails with standard MySQL replication, you put an intense load on one server while getting replication set up again (that single server takes not only reads and writes but also the load from innobackupex). With Galera, you take one of the healthy nodes offline (so you need a minimum of three nodes). That node becomes the donor - write operations to it are blocked while it transfers its data to the failed node (or new node) via rsync. Then both the donor node and the recovered node catch up by running queries from the slave queue. Once they are both back in the Synced state, HAProxy automatically marks them as Up and adds them back to the front-end.
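From the operator's side, recovery is mostly hands-off. A sketch of the sequence (commands assume a typical init-script install; the donor selection and rsync transfer happen automatically):

```shell
# On the failed (or brand-new) node: start MySQL and it requests a
# state transfer. Galera picks a donor, blocks writes to it, and
# copies the dataset across via rsync.
service mysql start

# Watch the node move through states: Joiner -> Joined -> Synced.
mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"

# Once both donor and joiner report Synced, HAProxy's health checks
# mark them Up again - no manual re-pointing required.
```

Compare that to rebuilding a conventional slave, where you babysit a backup, restore, and binary-log catch-up while one master absorbs all the load.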

Galera Cluster supports regular MySQL too, so why did we switch to MariaDB?

Our reasons for switching to MariaDB are a mix of technical and political:

  • Easy Migration: First, MariaDB is a drop-in replacement for MySQL. Migrating from MySQL 5.1 to MariaDB Galera Server 5.6 just worked.
  • Performance: Several queries that gave us performance problems before the migration - even ones already covered by indexes - "fixed" themselves afterwards, which was a nice surprise. MariaDB seems to be much better at the complex queries and joins that Ruby on Rails is famous for, and it does a better job overall of choosing indexes. We haven't seen much, if any, difference in memory usage either, which was surprising - I expected more usage from Galera's overhead, but if there is any, it's not noticeable.
  • Community: MariaDB has a lot of momentum and is adding more features than MySQL. There are a lot of concerns about the future of MySQL with Oracle and MariaDB looks like it will be around for a long while - even Google is switching to MariaDB.

Would we do it again?

Absolutely. We don't see any reason not to switch to both MariaDB and Galera Cluster.

Monitoring Galera Cluster

TODO

Elsewhere

TODO
