---
title: "Manually Unfederating a Nomad Cluster using a Network Partition"
date: 2017-12-07 12:49:20 -0500
draft: false
---
This is an advanced process that can be used to unfederate a Nomad cluster with minimal impact on running client jobs. For clusters where the current job state is easily recreated, it is simpler to stop the jobs, wipe the servers' state, and resubmit the jobs.
## Scenario

### Cluster
A 9-node cluster federated in Consul. Nomad configured to automatically discover nodes based on Consul data.
- Cluster A
  - mr-a-1 - 10.0.0.214
  - mr-a-2 - 10.0.0.218
  - mr-a-3 - 10.0.0.28
- Cluster B
  - mr-b-1 - 10.0.0.70
  - mr-b-2 - 10.0.0.179
  - mr-b-3 - 10.0.0.55
- Cluster C
  - mr-c-1 - 10.0.0.87
  - mr-c-2 - 10.0.0.18
  - mr-c-3 - 10.0.0.187
- CentOS 7
- Nomad 0.7.0 Enterprise
- firewalld running locally on the boxes
The cluster topology starts in a federated state based on the Consul information:
```
[root@mr-a-1 ~]# nomad server-members
Name                       Address     Port  Status  Leader  Protocol  Build      Datacenter  Region
mr-a-1.node.consul.global  10.0.0.214  4648  alive   false   2         0.7.0+ent  dc1         global
mr-a-2.node.consul.global  10.0.0.218  4648  alive   false   2         0.7.0+ent  dc1         global
mr-a-3.node.consul.global  10.0.0.28   4648  alive   false   2         0.7.0+ent  dc1         global
mr-b-1.node.consul.global  10.0.0.70   4648  alive   false   2         0.7.0+ent  dc2         global
mr-b-2.node.consul.global  10.0.0.179  4648  alive   false   2         0.7.0+ent  dc2         global
mr-b-3.node.consul.global  10.0.0.55   4648  alive   true    2         0.7.0+ent  dc2         global
mr-c-1.node.consul.global  10.0.0.87   4648  alive   false   2         0.7.0+ent  dc3         global
mr-c-2.node.consul.global  10.0.0.18   4648  alive   false   2         0.7.0+ent  dc3         global
mr-c-3.node.consul.global  10.0.0.187  4648  alive   false   2         0.7.0+ent  dc3         global
```
The goal is three discrete clusters of three nodes each, with every cluster unaware of the other nodes:
```
[root@mr-a-1 ~]# nomad server-members
Name                       Address     Port  Status  Leader  Protocol  Build      Datacenter  Region
mr-a-1.node.consul.global  10.0.0.214  4648  alive   false   2         0.7.0+ent  dc1         global
mr-a-2.node.consul.global  10.0.0.218  4648  alive   false   2         0.7.0+ent  dc1         global
mr-a-3.node.consul.global  10.0.0.28   4648  alive   true    2         0.7.0+ent  dc1         global

[root@mr-b-1 ~]# nomad server-members
Name                       Address     Port  Status  Leader  Protocol  Build      Datacenter  Region
mr-b-1.node.consul.global  10.0.0.70   4648  alive   false   2         0.7.0+ent  dc2         global
mr-b-2.node.consul.global  10.0.0.179  4648  alive   true    2         0.7.0+ent  dc2         global
mr-b-3.node.consul.global  10.0.0.55   4648  alive   false   2         0.7.0+ent  dc2         global

[root@mr-c-1 ~]# nomad server-members
Name                       Address     Port  Status  Leader  Protocol  Build      Datacenter  Region
mr-c-1.node.consul.global  10.0.0.87   4648  alive   false   2         0.7.0+ent  dc3         global
mr-c-2.node.consul.global  10.0.0.18   4648  alive   true    2         0.7.0+ent  dc3         global
mr-c-3.node.consul.global  10.0.0.187  4648  alive   false   2         0.7.0+ent  dc3         global
```
On the nodes, add a `consul` stanza to the top level of the configuration. This stanza must include `server_auto_join = false` and `client_auto_join = false`. For example, in HCL:
```hcl
...
consul {
  server_auto_join = false
  client_auto_join = false
}
...
```
Add the cluster's server addresses to the `server` stanza. Using `retry_join` is the preferred method:
```hcl
...
server {
  retry_join = ["10.0.0.87", "10.0.0.18", "10.0.0.187"]  # for a cluster C node, as an example
  # ... other server options ...
}
...
```
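Putting both changes together, a server's configuration might look roughly like the sketch below. This is only an illustration for a cluster C node; the file path, `datacenter`, `data_dir`, and the other server options shown are assumptions, not values taken from the scenario cluster.

```hcl
# /etc/nomad.d/server.hcl -- hypothetical path and values, sketch for a cluster C node
datacenter = "dc3"        # assumed to match the dc3 members shown above
data_dir   = "/opt/nomad" # assumed data_dir

# Stop Consul-driven discovery so the clusters no longer find each other
consul {
  server_auto_join = false
  client_auto_join = false
}

server {
  enabled          = true
  bootstrap_expect = 3    # assumed: three servers remain in this cluster
  # Only join the servers that should remain in this cluster
  retry_join = ["10.0.0.87", "10.0.0.18", "10.0.0.187"]
}
```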
Do not restart the Nomad agents at this time; that happens in a later step.
You can also pre-build the `peers.json` file as described below. **NOTE:** This file must NOT be placed in the raft folder while the node is running.
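If you do pre-build it, one option is to stage the file somewhere outside the raft folder and only move it into place after Nomad has been stopped. A minimal sketch for a cluster C node, assuming `/root/peers.json` as the staging path:

```shell
# Stage the file outside the raft folder for now (hypothetical staging path)
cat > /root/peers.json <<'EOF'
["10.0.0.87:4647","10.0.0.18:4647","10.0.0.187:4647"]
EOF

# Later, only after Nomad has been stopped:
# mv /root/peers.json «data_dir»/server/raft/peers.json
```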
Create firewall rules that prevent communication between the clusters you want to separate. For my sample cluster, I will partition it into three separate clusters: on cluster A's nodes I want to reject all traffic from clusters B and C, on cluster B's nodes reject A and C, and on cluster C's nodes reject A and B. Run the matching block of commands on each cluster's nodes:
```shell
# On cluster A nodes: reject traffic from clusters B and C
# Cluster B
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.70" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.179" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.55" reject'
# Cluster C
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.87" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.18" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.187" reject'
```

```shell
# On cluster B nodes: reject traffic from clusters A and C
# Cluster A
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.214" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.218" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.28" reject'
# Cluster C
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.87" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.18" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.187" reject'
```

```shell
# On cluster C nodes: reject traffic from clusters A and B
# Cluster A
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.214" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.218" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.28" reject'
# Cluster B
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.70" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.179" reject'
firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.0.55" reject'
```
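To double-check that the rules took effect on a node, you can list the runtime rich rules (they were added without `--permanent`, so they exist only in the runtime configuration):

```shell
# List the active (runtime-only) rich rules in the public zone
firewall-cmd --zone=public --list-rich-rules
```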
Stop the Nomad process on each server. After the Nomad process is stopped, it won't be possible to submit new jobs to the cluster, but existing jobs will continue running without issue. Perform the remaining operations on all clusters before starting Nomad again.
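On CentOS 7 the agent is usually managed by systemd; a minimal sketch, assuming the unit is named `nomad`:

```shell
# Stop the Nomad agent on each server node (assumes a systemd unit named "nomad")
systemctl stop nomad
```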
Deleting the Serf snapshot is required to prevent the nodes in the clusters from reconnecting. The Serf snapshot file is found in the `«data_dir»/server/serf` folder:

```shell
rm -f «data_dir»/server/serf/snapshot
```
If the cluster membership changes leave the new configuration unable to reach quorum (which is typical in this scenario), update the membership information using a `peers.json` file.
In the `«data_dir»/server/raft` folder there is a `peers.info` file with additional information about the recovery process. Create `«data_dir»/server/raft/peers.json` with a list of the cluster's members. For example, in my cluster C, the `peers.json` file would contain:
["10.0.0.87:4647","10.0.0.18:4647","10.0.0.187:4647"]
Alternatively, you can generate the file from the local Consul catalog:

```shell
curl http://127.0.0.1:8500/v1/catalog/nodes | jq --compact-output '[.[] | .Address+":4647"]' > peers.json
```
Verify that `peers.json` contains the correct nodes to be re-clustered.
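As a quick sanity check once the file is in place (assuming `jq` is available, as it was used above):

```shell
# Pretty-print peers.json and confirm it lists only this cluster's servers on port 4647
jq . «data_dir»/server/raft/peers.json
```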
Once Nomad has been started again on all of the servers, use `nomad server-members` to verify that the clusters are now separate.
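A minimal sketch, again assuming a systemd unit named `nomad`:

```shell
# Start Nomad back up, then confirm only this cluster's three servers are listed
systemctl start nomad
nomad server-members
```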
The firewall rules created earlier are no longer necessary, so you can remove them. Because my example cluster is using firewalld and the rules were added only to the runtime configuration, reloading firewalld removes the temporary rules.
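The cleanup on each node, with an optional verification step (`--list-rich-rules` is a standard `firewall-cmd` option):

```shell
# Reload firewalld; rich rules added without --permanent are discarded
firewall-cmd --reload
# Optional: confirm the reject rules are gone
firewall-cmd --zone=public --list-rich-rules
```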