How to set up failover for MySQL

Environment for testing

Server vagrant-db1

  • Hostname: vagrant-db1
  • OS: CentOS 5.7
  • eth0: 10.0.2.15 (for internet service)
  • eth1: 192.168.179.6 (for interconnect)

Server vagrant-db2

  • Hostname: vagrant-db2
  • OS: CentOS 5.7
  • eth0: 10.0.2.16 (for internet service)
  • eth1: 192.168.179.7 (for interconnect)

Server vagrant-web1

  • Hostname: vagrant-web1
  • OS: CentOS 5.7
  • eth0: 10.0.2.16 (for internet service)
  • eth1: 192.168.179.8 (for interconnect)

VIP

Used by MySQL clients to connect to the current master.

  • 192.168.179.100
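
For example, a MySQL client on vagrant-web1 connects through the VIP rather than a node address. A minimal sketch, using the "sysbench" account that is created later in the stress-testing section (substitute any existing account):

vagrant-web1$ mysql -h 192.168.179.100 -u sysbench -p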

Specification

Normal operation

  • vagrant-db1 is the DB master, vagrant-db2 is the DB slave, and the VIP points to vagrant-db1.
  • Or vagrant-db2 is the DB master, vagrant-db1 is the DB slave, and the VIP points to vagrant-db2.
  • MySQL clients use the VIP to access the MySQL server.

When failure occurs

Case A: DB master has failure

The DB slave becomes the DB master, and the VIP moves to point to the new DB master.

To recover, set up a new CentOS instance from scratch and make it the DB slave.

Case B: DB slave has failure

The DB master detects that the DB slave has shut down.

To recover, set up a new CentOS instance from scratch and make it the DB slave.

Setup

Install software into vagrant-web1

Install the following software for stress testing.

vagrant-web1$ sudo yum install mysql-bench perl-DBD-MySQL

Install software into vagrant-db1

Install the following software for failover:

  • heartbeat: 3.0.5
  • pacemaker: 1.0.13
  • mysql-server: 5.0.95
vagrant-db1$ sudo yum install mysql-server which
vagrant-db1$ cd /tmp
vagrant-db1$ wget 'http://osdn.jp/frs/redir.php?m=iij&f=%2Flinux-ha%2F61792%2Fpacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz'
vagrant-db1$ tar xfz pacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz
vagrant-db1$ cd pacemaker-1.0.13-2.1.el5.x86_64.repo
vagrant-db1$ sudo yum -c pacemaker.repo install heartbeat.x86_64 pacemaker.x86_64
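
To confirm which versions actually got installed, you can query the RPM database (a quick sanity check, not part of the original procedure):

vagrant-db1$ rpm -q heartbeat pacemaker mysql-server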

Setup MySQL on vagrant-db1

Set up MySQL.

vagrant-db1$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=1
...
vagrant-db1$ sudo /sbin/service mysqld start
vagrant-db1$ mysql -u root -p
mysql> DELETE FROM mysql.user WHERE User = '';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT SUPER,REPLICATION SLAVE,REPLICATION CLIENT,PROCESS ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db1$ sudo /sbin/chkconfig mysqld on
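
To verify that binary logging and the server ID took effect, a quick check from the mysql client should show server_id = 1 and a mysql-bin.* file in the master status (a sanity check; exact output will vary):

vagrant-db1$ mysql -u root -p
mysql> SHOW VARIABLES LIKE 'server_id';
mysql> SHOW MASTER STATUS;
mysql> QUIT;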

Setup Heartbeat on vagrant-db1

Set up Heartbeat.

vagrant-db1$ sudo vi /etc/ha.d/ha.cf
pacemaker on
logfacility local1

debug 0
udpport 694

keepalive 2
warntime 20
deadtime 24
initdead 48

bcast eth1

node vagrant-db1
node vagrant-db2
vagrant-db1$ sudo vi /etc/ha.d/authkeys
auth 1
1 sha1 centabcdefg
vagrant-db1$ sudo chown root:root /etc/ha.d/authkeys
vagrant-db1$ sudo chmod 600 /etc/ha.d/authkeys
vagrant-db1$ sudo vi /etc/syslog.conf
...
*.info;mail.none;authpriv.none;cron.none;local1.none    /var/log/messages
...
# Save pacemaker log
local1.*                                                /var/log/ha-log
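
After editing /etc/syslog.conf, the syslog daemon has to re-read its configuration before the local1 facility is written to /var/log/ha-log. Assuming the stock CentOS 5 syslog service name:

vagrant-db1$ sudo /sbin/service syslog restart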

Boot Pacemaker.

vagrant-db1$ sudo /sbin/service heartbeat start

Install software into vagrant-db2

Install the following software for failover.

vagrant-db2$ sudo yum install mysql-server which
vagrant-db2$ cd /tmp
vagrant-db2$ wget 'http://osdn.jp/frs/redir.php?m=iij&f=%2Flinux-ha%2F61792%2Fpacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz'
vagrant-db2$ tar xfz pacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz
vagrant-db2$ cd pacemaker-1.0.13-2.1.el5.x86_64.repo
vagrant-db2$ sudo yum -c pacemaker.repo install heartbeat.x86_64 pacemaker.x86_64

Setup MySQL on vagrant-db2

Set up MySQL.

vagrant-db2$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=2
...
vagrant-db2$ sudo /sbin/service mysqld start
vagrant-db2$ mysql -u root -p
mysql> DELETE FROM mysql.user WHERE User = '';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT SUPER,REPLICATION SLAVE,REPLICATION CLIENT,PROCESS ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2$ sudo /sbin/chkconfig mysqld on

Setup Heartbeat on vagrant-db2

Set up Heartbeat.

vagrant-db2$ sudo vi /etc/ha.d/ha.cf
pacemaker on
logfacility local1

debug 0
udpport 694

keepalive 2
warntime 20
deadtime 24
initdead 48

bcast eth1

node vagrant-db1
node vagrant-db2
vagrant-db2$ sudo vi /etc/ha.d/authkeys
auth 1
1 sha1 centabcdefg
vagrant-db2$ sudo chown root:root /etc/ha.d/authkeys
vagrant-db2$ sudo chmod 600 /etc/ha.d/authkeys
vagrant-db2$ sudo vi /etc/syslog.conf
...
*.info;mail.none;authpriv.none;cron.none;local1.none    /var/log/messages
...
# Save pacemaker log
local1.*                                                /var/log/ha-log

Boot Pacemaker.

vagrant-db2$ sudo /sbin/service heartbeat start

Check status of Pacemaker

Please check the Pacemaker status using the crm_mon command. It takes about one minute for everything to come online.

vagrant-db2$ sudo /usr/sbin/crm_mon
============
Last updated: Fri Jul 10 18:40:44 2015
Stack: Heartbeat
Current DC: vagrant-db2 (ca03e33c-82bb-4da9-bf64-bba48df33141) - partition with quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ vagrant-db1 vagrant-db2 ]
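
crm_mon refreshes its display continuously. If you only need a one-shot snapshot (for example from a script), the -1 option prints the status once and exits:

vagrant-db2$ sudo /usr/sbin/crm_mon -1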

Setup Pacemaker on vagrant-db1

Set up Pacemaker using the crm command. If you have custom MySQL settings, additional parameters for ocf:heartbeat:mysql may be needed; for details, please read the MySQL (resource agent) page.

vagrant-db1$ sudo bash
vagrant-db1# export PATH=$PATH:/usr/sbin
vagrant-db1# crm node standby vagrant-db2
vagrant-db1# crm configure
crm(live)configure# primitive vip_192.168.179.100 ocf:heartbeat:IPaddr2 params ip="192.168.179.100" cidr_netmask="24" nic="eth1"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node vagrant-db1
crm(live)configure# node vagrant-db2
crm(live)configure# commit
crm(live)configure# quit
vagrant-db1# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
vagrant-db1# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
vagrant-db1# crm configure colocation vip_on_mysql inf: vip_192.168.179.100 mysql-clone:Master
vagrant-db1# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.179.100:start
vagrant-db1# crm node online vagrant-db2
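
To review the resulting resources and constraints before relying on them, dump the live CIB configuration:

vagrant-db1# crm configure show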

At this point, MySQL replication is ready.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Testing replication

Create a table on vagrant-db1

First, create a table on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> create database example;
mysql> create table example.dummy (`id` varchar(10));
mysql> show tables in example;
mysql> quit;
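
Optionally, insert a row as well so that data changes, not only schema changes, can be checked on the slave (the value is arbitrary):

vagrant-db1# mysql -u root -p
mysql> INSERT INTO example.dummy VALUES ('1');
mysql> SELECT * FROM example.dummy;
mysql> quit;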

Check the table and create another table on vagrant-db2

Check that the table has been replicated to vagrant-db2.

vagrant-db2# mysql -u root -p
mysql> show tables in example;
mysql> quit;

Switch the database master to vagrant-db2, then create a table on vagrant-db2.

vagrant-db2# crm node standby vagrant-db1
vagrant-db2# sleep 10
vagrant-db2# crm node online vagrant-db1
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
vagrant-db2# mysql -u root -p
mysql> create database example2;
mysql> create table example2.dummy (`id` varchar(100));
mysql> show tables in example2;
mysql> quit;

Check the table on vagrant-db1

Check that the table has been replicated to vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> show tables in example2;
mysql> quit;

Testing in the test environment

Before the test, vagrant-db1 should be the master and vagrant-db2 should be the slave.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Test A: Vagrant-db1 halts

To emulate the test case, simply shut down vagrant-db1.

vagrant-db1# /sbin/shutdown -h now

After the test, vagrant-db2 should be the master.

vagrant-db2# crm_mon
Online: [ vagrant-db2 ]
OFFLINE: [ vagrant-db1 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Stopped: [ mysql:0 ]

To recover, set up a new CentOS instance from scratch, and do the following:

  • Install software into vagrant-db1
  • Setup Heartbeat on vagrant-db1

Get a MySQL database dump on vagrant-db2.

vagrant-db2# mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000012 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db2# mysqldump -u root -p -x --all-databases --lock-all-tables > /vagrant_data/db.dump
vagrant-db2# mysqldump -u root -p -x --allow-keywords --lock-all-tables mysql > /vagrant_data/dbuser.dump
vagrant-db2# mysql -u root -p
mysql> UNLOCK TABLES;
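
As an alternative to copying the coordinates by hand, mysqldump's --master-data=2 option writes the matching CHANGE MASTER TO statement as a comment at the top of the dump. A sketch of that variant (the rest of this guide keeps the manual SHOW MASTER STATUS approach):

vagrant-db2# mysqldump -u root -p -x --master-data=2 --all-databases --lock-all-tables > /vagrant_data/db.dump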

Then restore the MySQL database on vagrant-db1.

vagrant-db1$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=1
...
vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/chkconfig mysqld on
vagrant-db1# mysql -u root -p < /vagrant_data/db.dump
vagrant-db1# mysql -u root -p mysql < /vagrant_data/dbuser.dump
vagrant-db1# mysql -u root -p
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db1# /sbin/service mysqld stop

Restart Pacemaker on vagrant-db1.

vagrant-db1# /sbin/service heartbeat restart
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]
OFFLINE: [ vagrant-db1 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

vagrant-db1# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Set the master's binlog position on vagrant-db2.

vagrant-db1# mysql -u root -p
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000004 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db1# crm node standby vagrant-db2
vagrant-db1# sleep 10
vagrant-db1# crm node online vagrant-db2
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

vagrant-db2# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...
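
If SHOW SLAVE STATUS on the rebuilt slave reports an empty or wrong Master_Host, the connection parameters likely need to be supplied as well. A sketch using the interconnect address of vagrant-db1 and the "repl" account from the setup section:

vagrant-db2# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.179.6', MASTER_USER = 'repl', MASTER_PASSWORD = 'slavepass', MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;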

Test B: Vagrant-db1 is temporarily disconnected

To emulate the test case, simply reboot vagrant-db1.

vagrant-db1# reboot

After the test, vagrant-db2 should be the master and vagrant-db1 should be the slave.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

Pacemaker fixes this automatically.

Test C: MySQL server on vagrant-db1 is killed

To emulate the test case, simply stop the MySQL server.

vagrant-db1# /sbin/service mysqld stop

After the test, vagrant-db2 should be the master, and the MySQL resource on vagrant-db1 will be stopped with some failed actions.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Stopped: [ mysql:0 ]

Failed actions:
    mysql:0_monitor_10000 (node=vagrant-db1, call=21, rc=7, status=complete): not running
    mysql:0_demote_0 (node=vagrant-db1, call=64, rc=7, status=complete): not running
    mysql:0_start_0 (node=vagrant-db1, call=68, rc=1, status=complete): unknown error

To fix it, start MySQL and restart Heartbeat on vagrant-db1.

vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/service heartbeat restart
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
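
If you prefer not to restart Heartbeat, clearing the failed actions with the crm shell once MySQL is running again may be enough (an alternative sketch, not part of the original procedure):

vagrant-db1# crm resource cleanup mysql-clone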

Test D: Vagrant-db2 halts

To emulate the test case, simply shut down vagrant-db2.

vagrant-db2# /sbin/shutdown -h now

After the test, vagrant-db1 should be the master.

vagrant-db1# crm_mon
Online: [ vagrant-db1 ]
OFFLINE: [ vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Stopped: [ mysql:1 ]

To recover, set up a new CentOS instance from scratch, and do the following:

  • Install software into vagrant-db2
  • Setup Heartbeat on vagrant-db2

Get a MySQL database dump on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000011 |      270 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db1# mysqldump -u root -p -x --all-databases --lock-all-tables > /vagrant_data/db.dump
vagrant-db1# mysqldump -u root -p -x --allow-keywords --lock-all-tables mysql > /vagrant_data/dbuser.dump
vagrant-db1# mysql -u root -p
mysql> UNLOCK TABLES;

Then restore the MySQL database on vagrant-db2.

vagrant-db2$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=2
...
vagrant-db2# /sbin/service mysqld start
vagrant-db2# /sbin/chkconfig mysqld on
vagrant-db2# mysql -u root -p < /vagrant_data/db.dump
vagrant-db2# mysql -u root -p mysql < /vagrant_data/dbuser.dump
vagrant-db2# mysql -u root -p
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2# /sbin/service mysqld stop

Restart Pacemaker on vagrant-db2.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]
OFFLINE: [ vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Set the master's binlog position on vagrant-db1.

vagrant-db2# mysql -u root -p
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000004 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db2# crm node standby vagrant-db1
vagrant-db2# sleep 10
vagrant-db2# crm node online vagrant-db1
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

vagrant-db1# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test E: Vagrant-db2 is temporarily disconnected

To emulate the test case, simply reboot vagrant-db2.

vagrant-db2# reboot

After the test, vagrant-db1 should be the master and vagrant-db2 should be the slave.

vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Pacemaker fixes this automatically.

Test F: MySQL server on vagrant-db2 is killed

To emulate the test case, simply stop the MySQL server.

vagrant-db2# /sbin/service mysqld stop

After the test, vagrant-db1 should be the master and vagrant-db2 should be the slave.

vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Pacemaker fixes this automatically.

Stress testing in the test environment

Before the test, vagrant-db1 should be the master and vagrant-db2 should be the slave.

How to run mysql-bench

Add a "sysbench" user and a "test" database to MySQL.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
vagrant-db1# mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON *.* TO 'sysbench'@"%" identified by 'sysbenchpass';
mysql> FLUSH PRIVILEGES;
mysql> CREATE DATABASE test;

Run mysql-bench on vagrant-web1.

vagrant-web1# cd /usr/share/sql-bench
vagrant-web1# while true; do sudo ./test-insert --server=mysql --host=192.168.179.100 --user=sysbench --password=sysbenchpass; done
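
While the loop is running, you can confirm on the master that the benchmark sessions are arriving through the VIP; SHOW PROCESSLIST lists the active sysbench connections:

vagrant-db1# mysql -u root -p
mysql> SHOW PROCESSLIST;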

How to check SLAVE STATUS

Switch the database master to vagrant-db2, then check that SLAVE STATUS shows Yes on vagrant-db1.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
vagrant-db1# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Switch the database master to vagrant-db1, then check that SLAVE STATUS shows Yes on vagrant-db2.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test A': Vagrant-db1 halts, with stress

Run "Test A: Vagrant-db1 halts" while vagrant-web1 runs mysql-bench. Then stop mysql-bench. Finally, check SLAVE STATUS.

Test B': Vagrant-db1 is temporarily disconnected, with stress

Run "Test B: Vagrant-db1 is temporarily disconnected" while vagrant-web1 runs mysql-bench. Then stop mysql-bench. Finally, check SLAVE STATUS.

Test C': MySQL server on vagrant-db1 is killed, with stress

Run "Test C: MySQL server on vagrant-db1 is killed" while vagrant-web1 runs mysql-bench. Then stop mysql-bench. Finally, check SLAVE STATUS.

Testing in the production environment

Before the test, vagrant-db1 should be the master and vagrant-db2 should be the slave. The following tests require a person who continuously updates user settings in a web browser.

Test A'': Vagrant-db1 halts while using product

Run "Test A: Vagrant-db1 halts" while the user settings are being updated continuously. Finally, check SLAVE STATUS.

Test B'': Vagrant-db1 is temporarily disconnected while using product

Run "Test B: Vagrant-db1 is temporarily disconnected" while the user settings are being updated continuously. Finally, check SLAVE STATUS.

Test C'': MySQL server on vagrant-db1 is killed while using product

Run "Test C: MySQL server on vagrant-db1 is killed" while the user settings are being updated continuously. Finally, check SLAVE STATUS.

Test G'': Web servers are temporarily disconnected while using product

Reboot the first and second web servers while the user settings are being updated continuously. Finally, check SLAVE STATUS.

Troubleshooting

Error "mysql:0_start_0 (node=vagrant-db1 ...): unknown error" occurs on crm_mon

To fix it, do the following on the server.

vagrant-db1# /sbin/service heartbeat stop
vagrant-db1# /sbin/chkconfig mysqld on
vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/service heartbeat start

Error "Slave I/O thread: error connecting to master" occurs on mysql.log

vagrant-db2# less /var/log/mysqld.log
150804  8:55:17 [ERROR] Slave I/O thread: error connecting to master 'repl@vagrant-db1:3306': Error: 'Access denied for user 'repl'@'vagrant-db2' (using password: YES)'  errno: 1045  retry-time: 60  retries: 86400

This is caused by a missing access grant in MySQL. Confirm the MySQL settings for the "repl" user, and try to connect as follows:

vagrant-db2# mysql -h vagrant-db1 -u repl -p
Enter password: slavepass
--snip--
mysql>
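
If that login fails, re-create the replication grant on the current master (the same grant used in the setup section) and flush privileges:

vagrant-db1# mysql -u root -p
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;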

Error "Could not find first log file name in binary log index file" occurs on mysql.log

vagrant-db2# less /var/log/mysqld.log
150805 21:42:42 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
150805 21:42:42 [ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log

This is caused by mismatched MySQL "CHANGE MASTER TO" settings. Get the master status as follows:

vagrant-db1# mysql -u root -p
mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000011 |       98 |              |                  |
+------------------+----------+--------------+------------------+

Then run "CHANGE MASTER TO" as follows:

vagrant-db2# mysql -u root -p
mysql> stop slave;
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=98;
mysql> start slave;
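
Afterwards, confirm that replication has resumed; both Slave_IO_Running and Slave_SQL_Running should report Yes:

vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G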

Appendix: logs from the gist comments

Dec 16 09:38:11 centillion heartbeat: [5450]: info: all clients are now paused
Dec 16 09:39:34 centillion heartbeat: [5450]: WARN: Message hist queue is filling up (376 messages in queue)


[root@centillion ~]# traceroute -p 694 centillion.db01
traceroute to centillion.db01 (127.0.0.1), 30 hops max, 40 byte packets
 1  centillion.db01 (127.0.0.1)  0.055 ms  0.018 ms  0.014 ms
[root@centillion ~]# traceroute -p 694 centillion.db02
traceroute to centillion.db02 (192.168.0.140), 30 hops max, 40 byte packets
 1  centillion.db02 (192.168.0.140)  0.272 ms  0.242 ms !X  0.235 ms !X
[root@centillion heartbeat]# traceroute -p 694 centillion.db02
traceroute to centillion.db02 (127.0.0.1), 30 hops max, 40 byte packets
 1  centillion.db02 (127.0.0.1)  0.086 ms  0.019 ms  0.015 ms
[root@centillion heartbeat]# traceroute -p 694 centillion.db01
traceroute to centillion.db01 (192.168.0.141), 30 hops max, 40 byte packets
 1  centillion.db01 (192.168.0.141)  0.191 ms  0.148 ms !X  0.121 ms !X


[root@centillion heartbeat]# traceroute -p 694 centillion.db02
traceroute to centillion.db02 (192.168.0.140), 30 hops max, 40 byte packets
 1  centillion.db02 (192.168.0.140)  0.491 ms  0.148 ms  0.477 ms


============
Last updated: Wed Dec 16 13:01:43 2015
Stack: Heartbeat
Current DC: centillion.db02 (7bc4a498-391d-4a72-a962-7506685f4df4) - partition w
ith quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Node centillion.db02 (7bc4a498-391d-4a72-a962-7506685f4df4): pending
Online: [ centillion.db01 ]


Dec 16 13:37:39 centillion heartbeat: [3456]: ERROR: should_drop_message: attempted replay attack [centillion.db02]? [gen = 1447073569, curgen = 1450240589]


centillion.db02:

[root@centillion ~]# nc -l -u 694
###
11:(0)t=status,12:(0)st=active,10:(0)dt=5dc0,13:(0)protocol=1,22:(0)src=centillion.db02,27:(1)srcuuid={弔9Jr・uh_M??12:(0)seq=3188f,14:(0)hg=56409721,14:(0)ts=56716ff1,32:(0)ld=0.00 0.00 0.00 1/123 14745,8:(0)ttl=3,50:(0)auth=1 8a9b74109eaf585dab2986d084a1ef6a39a2ece7,%%%
42:1 09dcf8a6133bef4f92e9b2285e5165c4e5957d35,

centillion.db01:

[root@centillion heartbeat]# echo "hoge" | nc -u centillion.db02 694
nc: Write error: Connection refused

[root@centillion heartbeat]# echo "hoge" | nc -u centillion.db02 694
nc: Write error: Connection refused
