@bergerx
Created November 23, 2017 18:59
mesos-graceful-shutdown for running DCOS nodes in ASG

Shut mesos down gracefully only when the node is shutting down but NOT rebooting.

A clean shutdown will cause the tasks that were scheduled on this node to be rescheduled to another node. Since a rebooting node will return to the cluster shortly, it's better to leave the rebooting node in an unhealthy state so that its tasks continue running on it when it rejoins the cluster.

This matters most when you manage your nodes in AWS Auto Scaling groups (ASGs): without a graceful shutdown, scaling an ASG down leaves stale agents registered in the cluster.

#!/bin/bash
# Map dcos role names to the systemd unit that runs the mesos agent
declare -A systemd_units
systemd_units["mesos-private-agent"]="dcos-mesos-slave.service"
systemd_units["mesos-public-agent"]="dcos-mesos-slave-public.service"
systemd_units["mesos-master"]=""
# Check that this feature is enabled via tags
mesos_attr_dir=/etc/mesos-slave/attributes
grace_enabled=$(cat "${mesos_attr_dir}/graceful_mesos_shutdown" 2> /dev/null)
role=$(cat "${mesos_attr_dir}/role" 2> /dev/null)
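# Look the unit up only when a role is set; some bash versions reject an empty
# associative-array subscript with a "bad array subscript" error.
unit=""
[ -n "${role}" ] && unit=${systemd_units["${role}"]}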
# Shut mesos down gracefully only when the node is shutting down but NOT rebooting.
# A clean shutdown will cause the tasks that were scheduled on this node to be rescheduled
# to another node. Since a rebooting node will return to the cluster shortly, it's better
# to leave the rebooting node in an unhealthy state so that its tasks continue running on it
# when it rejoins the cluster.
reboot=$(systemctl list-jobs | awk '($2 == "reboot.target") && ($3 == "start") {print "true"}')
shutdown=$(systemctl list-jobs | awk '($2 == "shutdown.target") && ($3 == "start") {print "true"}')
if [ "${grace_enabled}" != "disabled" ] && [ -n "${unit}" ] &&
   [ "${shutdown}" = "true" ] && [ "${reboot}" != "true" ]; then
    echo "Sending SIGUSR1 to ${unit}..."
    systemctl kill -s SIGUSR1 "${unit}"
    echo "Stopping ${unit}..."
    systemctl stop "${unit}"
else
    echo "mesos will not be shut down gracefully. enabled=\"${grace_enabled}\", role=\"${role}\", reboot=\"${reboot}\", shutdown=\"${shutdown}\""
fi
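
The reboot-versus-shutdown check in the script above can be tried out without systemd by feeding it canned `systemctl list-jobs` output. The following is only a sketch: the helper names `job_queued` and `shutdown_kind` and the sample job listings are made up for illustration and are not part of the original script. It assumes the JOB UNIT TYPE STATE column layout that the awk filters in the script rely on.

```shell
#!/bin/bash
# Hypothetical helper: reads `systemctl list-jobs` style output on stdin and
# prints "true" if a start job for the given unit is queued.
job_queued() {
    awk -v unit="$1" '($2 == unit) && ($3 == "start") {found = 1} END {if (found) print "true"}'
}

# Hypothetical helper: prints "graceful" only for a shutdown that is not a
# reboot, mirroring the condition used in the script; prints "skip" otherwise.
shutdown_kind() {
    local jobs="$1"
    local reboot shutdown
    reboot=$(printf '%s\n' "${jobs}" | job_queued reboot.target)
    shutdown=$(printf '%s\n' "${jobs}" | job_queued shutdown.target)
    if [ "${shutdown}" = "true" ] && [ "${reboot}" != "true" ]; then
        echo "graceful"
    else
        echo "skip"
    fi
}

# Made-up sample listings (job IDs and states are illustrative only).
poweroff_jobs='JOB UNIT              TYPE  STATE
101 poweroff.target   start waiting
102 shutdown.target   start waiting'
reboot_jobs='JOB UNIT              TYPE  STATE
201 reboot.target     start waiting
202 shutdown.target   start waiting'

shutdown_kind "${poweroff_jobs}"   # prints "graceful"
shutdown_kind "${reboot_jobs}"     # prints "skip"
```

Keeping the decision in a function that takes the job listing as input makes it easy to check new cases (for example an empty listing, which yields "skip") before relying on the behavior during a real shutdown.
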
[Unit]
Description=Put mesos agent in maintenance mode before system shutdown.
Wants=dcos.target
# Start after dcos.target. systemd stops units in the reverse of their start order,
# so ExecStop will run while dcos is still up.
After=dcos.target
# don't run this on master nodes.
ConditionPathExists=!/opt/mesosphere/etc/roles/master
[Service]
Type=oneshot
KillMode=none
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/usr/local/bin/mesos-graceful-shutdown
[Install]
WantedBy=multi-user.target
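
One possible way to install the two files is sketched below. The source file locations and the unit file name `mesos-graceful-shutdown.service` are assumptions (the gist does not state the unit's file name); only the script path `/usr/local/bin/mesos-graceful-shutdown` is taken from the `ExecStop=` line. The sketch prints the commands by default and runs them only when `DRY_RUN=0` is set.

```shell
#!/bin/bash
# Assumed source locations; adjust to wherever the two files were saved.
SCRIPT_SRC="./mesos-graceful-shutdown"
UNIT_SRC="./mesos-graceful-shutdown.service"   # hypothetical file name

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_graceful_shutdown() {
    # The destination must match the ExecStop= path in the unit file.
    run install -m 0755 "${SCRIPT_SRC}" /usr/local/bin/mesos-graceful-shutdown
    run install -m 0644 "${UNIT_SRC}" /etc/systemd/system/mesos-graceful-shutdown.service
    run systemctl daemon-reload
    # The unit has to be active for ExecStop to run at shutdown, so start it
    # now as well as enabling it for future boots.
    run systemctl enable --now mesos-graceful-shutdown.service
}

install_graceful_shutdown
```

Because the unit uses `Type=oneshot` with `RemainAfterExit=true` and a no-op `ExecStart`, it stays active after starting, which is what lets systemd invoke `ExecStop` when the node goes down.
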