@shannonmitchell
Created January 23, 2017 20:20
###########
# Notes
###########
The idea here is to simulate a data-loss incident caused by a network partition. The following steps were taken:
- We will take infra02-rabbit-mq out of the cluster by blocking its network access to the other nodes.
- We will restart the rabbit app on infra02-rabbit-mq so it comes back up as its own partition.
- We will modify the data on the good partition while connectivity is down.
- We will restore network access so the infra02-rabbit-mq server rejoins the cluster.
- We will then restart both nodes on the good partition.
- The good-partition nodes will come back up and sync from the bad, partitioned infra02-rabbit-mq node, causing the changes to be lost.
infra01-rabbit-mq-container-23334667(172.29.237.12)
infra02-rabbit-mq-container-533fe11a(172.29.238.200)
infra03-rabbit-mq-container-92cb2c0d(172.29.239.176)
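- Side note: how the cluster reacts to the partition depends on its cluster_partition_handling setting. To check the mode on a node before starting, something like this should work (rabbitmqctl eval runs an Erlang expression on the target node):
# Show the configured partition-handling mode
# (ignore, pause_minority or autoheal)
rabbitmqctl eval 'application:get_env(rabbit, cluster_partition_handling).'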
##########################################
# Isolate and partition infra02-rabbit-mq
##########################################
# Run on infra02-rabbit-mq-container-533fe11a to block its traffic to the other two nodes
iptables -A OUTPUT -m tcp -p tcp -d 172.29.237.12 -j DROP
iptables -A OUTPUT -m tcp -p tcp -d 172.29.239.176 -j DROP
iptables -n -L
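- To later undo just these two rules instead of flushing the whole chain with iptables -F (as done further down), delete them with -D, mirroring the -A commands:
iptables -D OUTPUT -m tcp -p tcp -d 172.29.237.12 -j DROP
iptables -D OUTPUT -m tcp -p tcp -d 172.29.239.176 -j DROP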
- At this point we need to wait for the cluster to turn off the partitioned node. The rabbitmqctl list_vhosts command
will fail once this happens:
root@infra02-rabbit-mq-container-533fe11a:~# rabbitmqctl list_vhosts
Error: rabbit application is not running on node rabbit@infra02-rabbit-mq-container-533fe11a.
* Suggestion: start it with "rabbitmqctl start_app" and try again
- Start the app manually (this will take a long time, as the node keeps trying to reconnect to the cluster):
root@infra02-rabbit-mq-container-533fe11a:~# rabbitmqctl start_app
Starting node 'rabbit@infra02-rabbit-mq-container-533fe11a' ...
root@infra02-rabbit-mq-container-533fe11a:~# rabbitmqctl list_vhosts
Listing vhosts ...
/neutron
/heat
/keystone
/
/cinder
/nova
/glance
/testme
###########################################################################
# Modify data on the good side of the partition while connectivity is down
###########################################################################
root@infra03-rabbit-mq-container-92cb2c0d:~# rabbitmqctl delete_vhost /testme
Deleting vhost "/testme" ...
root@infra03-rabbit-mq-container-92cb2c0d:~# rabbitmqctl add_vhost /testme2
Creating vhost "/testme2" ...
root@infra03-rabbit-mq-container-92cb2c0d:~# rabbitmqctl list_vhosts
Listing vhosts ...
/neutron
/heat
/keystone
/testme2
/
/cinder
/nova
/glance
##############################################################
# Add infra02-rabbit-mq back into the cluster and view the partition
##############################################################
- After bringing it back, you can see that infra02-rabbit-mq is partitioned from the others.
- You can also see that the data differs on infra02-rabbit-mq: it still has the '/testme' vhost, while infra03-rabbit-mq has
the '/testme2' vhost from the changes above (a quicker partition check is sketched below).
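- Side note: if you only want the partition info rather than the full cluster_status output, this eval call should return it directly, assuming rabbit_node_monitor:partitions/0 exists on your RabbitMQ version (it is the internal call that feeds the partitions section of cluster_status):
# Print just the partition list as seen by the local node
rabbitmqctl eval 'rabbit_node_monitor:partitions().'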
root@infra02-rabbit-mq-container-533fe11a:~# iptables -F
root@infra02-rabbit-mq-container-533fe11a:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@infra02-rabbit-mq-container-533fe11a' ...
[{nodes,
[{disc,
['rabbit@infra01-rabbit-mq-container-23334667',
'rabbit@infra02-rabbit-mq-container-533fe11a',
'rabbit@infra03-rabbit-mq-container-92cb2c0d']}]},
{running_nodes,['rabbit@infra02-rabbit-mq-container-533fe11a']},
{cluster_name,<<"openstack">>},
{partitions,
[{'rabbit@infra02-rabbit-mq-container-533fe11a',
['rabbit@infra01-rabbit-mq-container-23334667',
'rabbit@infra03-rabbit-mq-container-92cb2c0d']}]},
{alarms,[{'rabbit@infra02-rabbit-mq-container-533fe11a',[]}]}]
root@infra02-rabbit-mq-container-533fe11a:~# rabbitmqctl list_vhosts | grep test
/testme
root@infra03-rabbit-mq-container-92cb2c0d:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@infra03-rabbit-mq-container-92cb2c0d' ...
[{nodes,
[{disc,
['rabbit@infra01-rabbit-mq-container-23334667',
'rabbit@infra02-rabbit-mq-container-533fe11a',
'rabbit@infra03-rabbit-mq-container-92cb2c0d']}]},
{running_nodes,
['rabbit@infra01-rabbit-mq-container-23334667',
'rabbit@infra03-rabbit-mq-container-92cb2c0d']},
{cluster_name,<<"openstack">>},
{partitions,
[{'rabbit@infra01-rabbit-mq-container-23334667',
['rabbit@infra02-rabbit-mq-container-533fe11a']},
{'rabbit@infra03-rabbit-mq-container-92cb2c0d',
['rabbit@infra02-rabbit-mq-container-533fe11a']}]},
{alarms,
[{'rabbit@infra01-rabbit-mq-container-23334667',[]},
{'rabbit@infra03-rabbit-mq-container-92cb2c0d',[]}]}]
root@infra03-rabbit-mq-container-92cb2c0d:~# rabbitmqctl list_vhosts | grep test
/testme2
#################################################################
# Restart the good partition nodes and watch /testme2 disappear
#################################################################
- Restarting the good-partition nodes (at the same time) will cause them to sync from the bad (infra02-rabbit-mq) node. This
will cause the /testme2 vhost to disappear.
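- For reference, the non-destructive recovery is the opposite of what we do here: trust the infra01/infra03 partition and restart only the untrusted infra02-rabbit-mq node so that it rejoins and syncs from the good side (this follows the standard RabbitMQ manual partition-recovery procedure; the hostname is this environment's):
# On the untrusted node only (infra02-rabbit-mq-container-533fe11a):
service rabbitmq-server restart
# Then restart the trusted nodes one at a time, if needed, to clear
# any remaining partition warnings.
- Instead, we restart the good nodes first to demonstrate the data loss: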
root@infra03-rabbit-mq-container-92cb2c0d:~# service rabbitmq-server restart
* Restarting message broker rabbitmq-server
root@infra01-rabbit-mq-container-23334667:~# service rabbitmq-server restart
* Restarting message broker rabbitmq-server
root@infra03-rabbit-mq-container-92cb2c0d:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@infra03-rabbit-mq-container-92cb2c0d' ...
[{nodes,[{disc,['rabbit@infra01-rabbit-mq-container-23334667',
'rabbit@infra02-rabbit-mq-container-533fe11a',
'rabbit@infra03-rabbit-mq-container-92cb2c0d']}]},
{running_nodes,['rabbit@infra01-rabbit-mq-container-23334667',
'rabbit@infra02-rabbit-mq-container-533fe11a',
'rabbit@infra03-rabbit-mq-container-92cb2c0d']},
{cluster_name,<<"openstack">>},
{partitions,[]},
{alarms,[{'rabbit@infra01-rabbit-mq-container-23334667',[]},
{'rabbit@infra02-rabbit-mq-container-533fe11a',[]},
{'rabbit@infra03-rabbit-mq-container-92cb2c0d',[]}]}]
root@infra03-rabbit-mq-container-92cb2c0d:~# rabbitmqctl list_vhosts | grep test
/testme
root@infra01-rabbit-mq-container-23334667:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@infra01-rabbit-mq-container-23334667' ...
[{nodes,[{disc,['rabbit@infra01-rabbit-mq-container-23334667',
'rabbit@infra02-rabbit-mq-container-533fe11a',
'rabbit@infra03-rabbit-mq-container-92cb2c0d']}]},
{running_nodes,['rabbit@infra02-rabbit-mq-container-533fe11a',
'rabbit@infra03-rabbit-mq-container-92cb2c0d',
'rabbit@infra01-rabbit-mq-container-23334667']},
{cluster_name,<<"openstack">>},
{partitions,[]},
{alarms,[{'rabbit@infra02-rabbit-mq-container-533fe11a',[]},
{'rabbit@infra03-rabbit-mq-container-92cb2c0d',[]},
{'rabbit@infra01-rabbit-mq-container-23334667',[]}]}]
root@infra01-rabbit-mq-container-23334667:~# rabbitmqctl list_vhosts | grep test
/testme
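- End result: /testme2 is gone from every node and only /testme survives, i.e. the cluster converged on the stale data from infra02-rabbit-mq.
- Side note: RabbitMQ's partition-handling modes can automate some of this. The mode this cluster ran isn't shown above (ignore is the upstream default), but it is set via cluster_partition_handling in rabbitmq.config; a minimal sketch:
# /etc/rabbitmq/rabbitmq.config (classic Erlang-term format)
# pause_minority pauses nodes that lose quorum during a partition;
# autoheal automatically restarts the losing partition when
# connectivity returns.
[
  {rabbit, [
    {cluster_partition_handling, pause_minority}
  ]}
].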