@Cryptophobia
Last active November 9, 2018 22:34
preStop Kubernetes Lifecycle Resque Hook
#!/bin/bash
# This script gracefully stops resque workers by issuing the USR2
# signal to the resque-pool manager and then waiting for the workers
# to be paused before exiting. The resque-pool master process relays
# the USR2 signal to its children.
#
# For reference:
# (1) https://github.com/nevans/resque-pool#signals
# (2) https://github.com/resque/resque/blob/master/lib/resque/worker.rb#L376-L378
#
# This script can serve as a preStop lifecycle hook in Kubernetes
#
# Written by: Anton Ouzounov <aouzounov@zuora.com>
#
# Version 1.0 - June 14th, 2017
# Version 1.1 - October 20, 2017
# Ask the resque-pool master (PID 1 in the container) to pause its workers.
kill -USR2 1
# If resque workers are still processing, wait 5 seconds and check again.
# The '[p]rocessing' bracket trick keeps grep from matching its own
# command line, so the pipeline does not count itself.
numWorkers=$(ps aux | grep -i '[p]rocessing' | wc -l)
while true; do
    if [ "$numWorkers" -eq "0" ]; then
        # Exit when all resque workers have finished processing.
        echo "Time to terminate this pod"
        break
    else
        echo "Not ready to exit, workers are processing, more sleepy sleepy."
        echo "$numWorkers -- number of workers processing."
        sleep 5
        kill -USR2 1
        numWorkers=$(ps aux | grep -i '[p]rocessing' | wc -l)
    fi
done
echo "Exiting from the bash script"
@Cryptophobia (Author)

I have removed the let numWorkers-=1; line: it was there because ps aux can count itself, but that self-count sometimes fails, and subtracting one in that case would hide a real worker. Let's assume ps aux will count itself at most once.
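For anyone wondering why the pipeline does not reliably count itself: the bracket trick makes grep's own command line fail grep's own pattern.

# "ps aux" lists every process's command line, including this grep's.
# The grep command line contains the literal text "[p]rocessing", which
# the pattern [p]rocessing (i.e. "processing") does not match.
ps aux | grep -i 'processing'      # may also match this grep process
ps aux | grep -i '[p]rocessing'    # matches only the real workers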

@sunild commented Oct 24, 2018

Thanks!! I was looking for a simple way to do this, and the much longer scripts I found turned me off the whole approach. The "new" way of signal handling (using TERM_CHILD=1 in Resque > 1.22) was not acceptable to me ... it interrupts the job immediately and expects you to handle the exception and re-queue it. They offer a grace period, but that only specifies how long you're willing to wait for cleanup.
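For contrast, a sketch of how that "new" handling is typically switched on (assuming a stock Resque >= 1.22 worker started via rake; the timeout value is illustrative):

# TERM_CHILD=1 makes the worker send TERM to its child on shutdown,
# raising Resque::TermException inside the running job;
# RESQUE_TERM_TIMEOUT is the cleanup window before the child is killed.
# The job has to rescue the exception and re-enqueue itself.
TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 QUEUES=* bundle exec rake resque:work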

@Cryptophobia (Author)

@sunild, for a resque-pool master with workers in a k8s pod, you can set terminationGracePeriodSeconds (pod.Spec.TerminationGracePeriodSeconds) in the spec section of the Kubernetes pod's manifest and tune that value if you need a grace period.

Here is the documentation: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#soft-eviction-thresholds
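Note that the grace period clock also covers the preStop hook: if the hook above is still looping when terminationGracePeriodSeconds expires, the pod is force-killed. A sketch of where the field lives (the value is illustrative, not a recommendation):

spec:
  terminationGracePeriodSeconds: 300   # default is 30s; set it longer than your slowest job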
