

@bernadinm
Last active March 28, 2021 00:17
Procedure to Replace Single Master with Master Discovery Set to Static

Master Replacement with Static List Procedure

Objective: With this procedure, we replace master node 172.31.1.23 with node 172.31.7.30. For the procedure to work, exhibitor_zk_path must also be changed to a unique value.

Begin

Configure the bootstrap node with a modified config.yaml that sets a new exhibitor_zk_path and, in master_list, replaces the IP of the master being retired with the IP of the new master.

Important note: the key settings are exhibitor_storage_backend: zookeeper and exhibitor_zk_path: "/masterip-change-20170127". You must set exhibitor_zk_path to a unique value; otherwise your cluster won't converge.
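Since the path only has to be unique, one option is to stamp it with the date in the same `/masterip-change-YYYYMMDD` style as the example. A minimal sketch; `unique_zk_path` is a hypothetical helper, not part of DC/OS, and if you repeat the procedure on the same day you would need a different suffix.

```shell
# Hypothetical helper (not part of DC/OS): print a date-stamped ZooKeeper
# path such as /masterip-change-20170127.
# Usage: unique_zk_path [YYYYMMDD]   (defaults to today's date)
unique_zk_path() {
  local stamp="${1:-$(date +%Y%m%d)}"
  printf '/masterip-change-%s\n' "$stamp"
}
```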

Previous Config

---
bootstrap_url: http://172.31.11.148:80
cluster_name: IP_ADDRESS_MASTER_REPLACE
exhibitor_storage_backend: zookeeper
exhibitor_zk_hosts: 172.31.4.24:2181
exhibitor_zk_path: "/"
log_directory: /genconf/logs
master_discovery: static
master_list:
   - 172.31.14.42
   - 172.31.9.219
   - 172.31.7.61
   - 172.31.6.139
   - 172.31.1.23
mesos_dns_ip_sources:
- mesos
- docker
- host
process_timeout: 120
resolvers:
- 8.8.8.8
roles: slave_public
ssh_key_path: /genconf/ssh-key
weights: slave_public=1

After Change

---
bootstrap_url: http://172.31.11.148:80
cluster_name: IP_ADDRESS_MASTER_REPLACE
exhibitor_storage_backend: zookeeper
exhibitor_zk_hosts: 172.31.4.24:2181
exhibitor_zk_path: "/masterip-change-20170127"
log_directory: /genconf/logs
master_discovery: static
master_list:
   - 172.31.14.42
   - 172.31.9.219
   - 172.31.7.61
   - 172.31.6.139
   - 172.31.7.30
mesos_dns_ip_sources:
- mesos
- docker
- host
process_timeout: 120
resolvers:
- 8.8.8.8
roles: slave_public
ssh_key_path: /genconf/ssh-key
weights: slave_public=1
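The two files differ in exactly two lines: the new exhibitor_zk_path and the replaced master_list entry. A sketch of making that edit with GNU sed, assuming the layout shown above (one master IP per `- ` list line); `update_genconf` is a hypothetical helper, and it keeps the untouched original as `FILE.orig`.

```shell
# Hypothetical helper (not part of DC/OS): swap one master IP in master_list
# and point Exhibitor at a new ZooKeeper path. Assumes GNU sed (-i) and the
# one-IP-per-"- " list layout shown above. Keeps the original as FILE.orig.
# Usage: update_genconf CONFIG_FILE OLD_IP NEW_IP NEW_ZK_PATH
update_genconf() {
  local cfg="$1" old_ip="$2" new_ip="$3" zk_path="$4" old_re
  # Escape the dots so sed matches the old IP literally, not as a regex.
  old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
  cp "$cfg" "$cfg.orig"
  sed -i \
    -e "s|^\( *- *\)${old_re} *\$|\1${new_ip}|" \
    -e "s|^exhibitor_zk_path:.*\$|exhibitor_zk_path: \"${zk_path}\"|" \
    "$cfg"
}
```

For this example, assuming the file lives at genconf/config.yaml: `update_genconf genconf/config.yaml 172.31.1.23 172.31.7.30 /masterip-change-20170127`, then review with `diff genconf/config.yaml.orig genconf/config.yaml` before running `--genconf`.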

On the bootstrap node, run the following commands:

# Generate new config files with the updated master list
sudo bash dcos_generate_config.ee.sh --genconf
# Remove the systemd-journald and Docker restarts from the generated installer
sudo sed -i -e "s/systemctl restart systemd-journald//g" -e "s/systemctl restart docker//g" genconf/serve/dcos_install.sh
# Set up nginx to serve genconf/serve on port 80
sudo docker run -d -p 80:80 -v "$PWD/genconf/serve:/usr/share/nginx/html:ro" nginx
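Before reinstalling any node, it may be worth confirming that the sed really removed both restart commands from the regenerated installer. A sketch; `check_installer` is a hypothetical helper, and its default path is the one used in the commands above.

```shell
# Hypothetical helper: fail if the generated installer still restarts
# Docker or systemd-journald.
# Usage: check_installer [PATH]   (defaults to genconf/serve/dcos_install.sh)
check_installer() {
  local script="${1:-genconf/serve/dcos_install.sh}"
  [ -f "$script" ] || { echo "missing: $script" >&2; return 1; }
  if grep -Eq 'systemctl restart (docker|systemd-journald)' "$script"; then
    echo "restart commands still present in $script" >&2
    return 1
  fi
  echo "ok: $script"
}
# Once nginx is up, the installer should also be reachable from the masters
# at the bootstrap_url from config.yaml, e.g.:
#   curl -fsSI http://172.31.11.148/dcos_install.sh
```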

Shut Down Permanently or Uninstall the Old Master Being Replaced

# Uninstall DC/OS packages
sudo -i /opt/mesosphere/bin/pkgpanda uninstall
# Remove DC/OS, Mesos, and ZooKeeper state
sudo rm -rf /opt/mesosphere /var/lib/mesosphere /etc/mesosphere /var/lib/zookeeper /var/lib/dcos/exhibitor/zookeeper /var/lib/mesos /var/lib/dcos /run/dcos
# Remove DC/OS profile, journald, and systemd unit files
sudo rm -rf /etc/profile.d/dcos.sh /etc/systemd/journald.conf.d/dcos.conf /etc/systemd/system/dcos-download.service /etc/systemd/system/dcos-link-env.service /etc/systemd/system/dcos-setup.service /etc/systemd/system/multi-user.target.wants/dcos-setup.service /etc/systemd/system/multi-user.target.wants/dcos.target
sudo rm -fr /run/mesos /var/log/mesos /tmp/dcos
sudo systemctl daemon-reload
sudo reboot
# Completed
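If you uninstall the old master rather than decommission it, a quick check before the final `sudo reboot` can show whether any removed path survived. This sketch checks a subset of the paths from the commands above; `dcos_leftovers` is a hypothetical helper, and its optional ROOT argument exists only so it can be tried against a scratch directory.

```shell
# Hypothetical helper: print any of these DC/OS paths that still exist.
# Prints nothing when the cleanup is complete.
# Usage: dcos_leftovers [ROOT]   (ROOT defaults to empty, i.e. the real /)
dcos_leftovers() {
  local root="${1:-}" p
  for p in /opt/mesosphere /etc/mesosphere /var/lib/mesosphere /var/lib/zookeeper \
           /var/lib/dcos /var/lib/mesos /run/dcos /run/mesos /var/log/mesos /tmp/dcos \
           /etc/profile.d/dcos.sh /etc/systemd/system/dcos-setup.service; do
    if [ -e "$root$p" ]; then
      echo "$root$p"
    fi
  done
}
```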

On all masters, perform the reinstall (in any order or simultaneously; the quorum needs to be re-created).

Existing Masters

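# Master Commands
# Download the regenerated installer from the bootstrap node
curl -O http://172.31.11.148/dcos_install.sh
# Remove the existing DC/OS installation (/var/lib/zookeeper is kept)
sudo -i /opt/mesosphere/bin/pkgpanda uninstall
sudo rm -rf /opt/mesosphere /etc/mesosphere
# Create the dcos_exhibitor system user and give it ownership of the ZooKeeper data
sudo useradd --system --home-dir /opt/mesosphere --shell /sbin/nologin -c 'DCOS System User' dcos_exhibitor
sudo chown -R dcos_exhibitor /var/lib/zookeeper
# Reinstall the node as a master
sudo bash dcos_install.sh -d master
# Complete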
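After the reinstall, each master's static master list should contain the new IP and not the old one. A sketch, assuming the JSON-array `master_list` format and the `dcos-config--setup_*` package path that appear in the agent step of this procedure; `check_master_list` is a hypothetical helper. Matching the quoted IP keeps an address such as 172.31.1.230 from being mistaken for 172.31.1.23.

```shell
# Hypothetical helper: succeed only if FILE lists NEW_IP and not OLD_IP.
# The surrounding JSON quotes are part of each match.
# Usage: check_master_list FILE OLD_IP NEW_IP
check_master_list() {
  local file="$1" old_ip="$2" new_ip="$3"
  grep -Fq "\"$new_ip\"" "$file" || { echo "new IP $new_ip missing from $file" >&2; return 1; }
  if grep -Fq "\"$old_ip\"" "$file"; then
    echo "old IP $old_ip still in $file" >&2
    return 1
  fi
  echo "ok: $file"
}
# Example on a reinstalled master (path pattern taken from the agent step):
#   for f in /opt/mesosphere/packages/dcos-config--setup_*/etc/master_list; do
#     check_master_list "$f" 172.31.1.23 172.31.7.30
#   done
```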

New Master

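# Master Commands
# Download the regenerated installer from the bootstrap node
curl -O http://172.31.11.148/dcos_install.sh
# Create the dcos_exhibitor system user and give it ownership of the ZooKeeper data
sudo useradd --system --home-dir /opt/mesosphere --shell /sbin/nologin -c 'DCOS System User' dcos_exhibitor
sudo chown -R dcos_exhibitor /var/lib/zookeeper
# Install the node as a master
sudo bash dcos_install.sh -d master
# Complete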
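Once all masters are back, Exhibitor should report every ZooKeeper member as serving. A sketch, assuming Exhibitor's cluster status endpoint at `/exhibitor/v1/cluster/status` on port 8181 of a master (as commonly used on DC/OS) and a JSON array of server objects with a `description` field; `count_serving` is a hypothetical helper that counts matches with grep, so it is not a full JSON parser.

```shell
# Hypothetical helper: count members reported as "serving" in the JSON
# read from stdin. grep-based count only, not a JSON parser.
count_serving() {
  grep -o '"description" *: *"serving"' | wc -l | tr -d ' '
}
# On any master, after all masters are reinstalled (endpoint and port
# are assumptions):
#   curl -fsS http://localhost:8181/exhibitor/v1/cluster/status | count_serving
```

With the five-master list above, the count to look for would be 5.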

For the agents, launch a Marathon app that replaces the old master IP in each agent's local static master list:

The sed command in the app, 's/172.31.1.23/172.31.7.30/g', changes the IP from 172.31.1.23 to 172.31.7.30, and the file is printed before and after the change. Change the number of instances from 1000000 to the number of agents currently registered in the cluster; the hostname UNIQUE constraint places at most one instance on each agent.
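In that sed pattern each `.` is a regex wildcard, and an address such as 172.31.1.230 would also be rewritten (to 172.31.7.300). A stricter variant, sketched here as a hypothetical `replace_master_ip` helper for GNU sed, escapes the dots and includes the surrounding JSON quotes; like `-i.bak`, it leaves the original next to the edited file. If you embed this in the Marathon `cmd`, the backslashes would need escaping again for JSON.

```shell
# Hypothetical helper: replace the exact quoted entry OLD_IP with NEW_IP in
# a JSON-array master_list file, keeping FILE.bak (GNU sed -i.bak).
# Usage: replace_master_ip FILE OLD_IP NEW_IP
replace_master_ip() {
  local file="$1" new_ip="$3" old_re
  # Escape dots so "172.31.1.23" is matched literally.
  old_re=$(printf '%s' "$2" | sed 's/\./\\./g')
  sed -i.bak "s/\"${old_re}\"/\"${new_ip}\"/g" "$file"
}
```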

{
  "id": "/update-master-ip",
  "cmd": "sudo cat /opt/mesosphere/packages/dcos-config--setup_*/etc/master_list ;sudo sed -i.bak  's/172.31.1.23/172.31.7.30/g' /opt/mesosphere/packages/dcos-config--setup_*/etc/master_list; sudo cat /opt/mesosphere/packages/dcos-config--setup_*/etc/master_list  ; sleep 10000000",
  "cpus": 0.01,
  "mem": 32,
  "disk": 0,
  "instances": 1000000,
  "acceptedResourceRoles": [
    "*",
    "slave_public"
  ],
  "constraints": [
    [
      "hostname",
      "UNIQUE"
    ]
  ]
}
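To avoid hand-editing the instance count, the app definition can be saved to a file and patched before submission. A sketch; `set_instances`, the file name `update-master-ip.json`, and the `AGENT_COUNT` variable are hypothetical, and the submission step assumes the DC/OS CLI's `dcos marathon app add` command.

```shell
# Hypothetical helper: set "instances" in a saved Marathon app definition.
# Usage: set_instances FILE COUNT
set_instances() {
  local file="$1" count="$2"
  case "$count" in
    ''|*[!0-9]*) echo "count must be a whole number" >&2; return 1 ;;
  esac
  sed -i "s/\"instances\": *[0-9]*/\"instances\": ${count}/" "$file"
}
# Example (AGENT_COUNT is a placeholder for the number of registered agents):
#   set_instances update-master-ip.json "$AGENT_COUNT"
#   dcos marathon app add update-master-ip.json
```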

Output

Registered executor on 172.31.10.255
Starting task update-master-ip.b95d58a5-e749-11e6-b02b-663fb0863958
Forked command at 18349
sh -c 'sudo cat /opt/mesosphere/packages/dcos-config--setup_*/etc/master_list ;sudo sed -i.bak  's/172.31.1.23/172.31.7.30/g' /opt/mesosphere/packages/dcos-config--setup_*/etc/master_list; sudo cat /opt/mesosphere/packages/dcos-config--setup_*/etc/master_list  ; sleep 10000000'
["172.31.14.42", "172.31.9.219", "172.31.7.61", "172.31.6.139", "172.31.1.23"]
["172.31.14.42", "172.31.9.219", "172.31.7.61", "172.31.6.139", "172.31.7.30"]

Finish

@kopax

kopax commented Mar 26, 2021

Hi, I have a master that burned in the OVH fire on the 9th of March. I now have the replacement server, and the IPs are the same. I use https://github.com/eBayClassifiedsGroup/PanteraS to create my Mesos PaaS. Do you know how I can bring it into the pool of masters? I expected it to start directly when using the same IP as before.

@bernadinm
Author

Hi @kopax, this procedure is for that exact situation. You'll need to run through it and make sure it is executed on all the masters (including the new master you're bringing up). This tells the existing masters that the IP address they have needs to change, while allowing the new node to join the quorum. It also ensures that the new master node is installed with the proper configuration.

If you have any questions, feel free to leave a comment and/or contact D2iQ support.
