@mmerickel
Last active February 11, 2022 19:25
istio-launcher for implementing graceful shutdown and one-off job termination
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      # the pod template labels must match spec.selector.matchLabels
      labels:
        app: myapp
      annotations:
        proxy.istio.io/config: |
          # since we are always waiting for the app to shut down first
          # we have no need to drain - because the app is done by the
          # time that this matters and it simply slows down the shutdown
          # of the pod - the original default is 5s
          terminationDrainDuration: 0s
          # not strictly necessary but if you want to guarantee that
          # the proxy is started before your app establishes connections
          # then here ya go
          holdApplicationUntilProxyStarts: true
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: myapp/myapp
        - name: istio-proxy
          image: auto
          lifecycle:
            # see docs in istio-launcher.py explaining this protocol
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    echo "[prestop] listening for shutdown" >> /proc/1/fd/2;
                    curl -X POST localhost:15000/drain_listeners?inboundonly > /dev/null 2>&1;
                    nc localhost -l 15123 > /dev/null 2>&1;
                    echo "[prestop] success" >> /proc/1/fd/2;
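The preStop hook above blocks on a one-shot TCP listener until something connects to it. The following is a minimal Python sketch of that handshake, not the hook itself: `open_listener`, `wait_for_release`, `release`, and `handshake_demo` are hypothetical names, and it binds an OS-assigned port on 127.0.0.1 instead of port 15123 so it can run anywhere.

```python
import socket
import threading


def open_listener(host='127.0.0.1', port=0):
    """Bind and listen; port=0 lets the OS pick a free port for the demo."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def wait_for_release(server):
    """Stand-in for `nc -l`: block until one client connects, then return."""
    conn, _addr = server.accept()
    conn.close()
    server.close()


def release(host, port):
    """What the app side does when it is done: connect, then close."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))


def handshake_demo():
    """Run both sides; return True if the waiting side was released."""
    server = open_listener()
    host, port = server.getsockname()
    hook = threading.Thread(target=wait_for_release, args=(server,))
    hook.start()
    release(host, port)
    hook.join(timeout=5)
    return not hook.is_alive()
```

When no listener is open, as in the one-off job case where no preStop hook is running, `release` raises an `OSError` instead of returning, and the caller can use that to choose a fallback.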
FROM ...
COPY istio-launcher.py /app/istio-launcher.py
ENTRYPOINT ["tini", "--", "/app/istio-launcher.py"]
#!/usr/bin/env python3
"""
Istio/K8S has issues with sidecars: they do not exit on their own when the
primary workload inside a pod is complete, so the entire pod stays alive
after the job should be finished.

see https://github.com/istio/istio/issues/6324

There are two scenarios we want to account for in this script:

1. When running as a one-off job, istio/k8s does not kill the istio-proxy
   sidecar when the app is complete. From this script we can ping the
   /quitquitquit endpoint to tell the sidecar to exit.

2. When K8S issues a graceful shutdown it sends a SIGTERM to both the
   istio-proxy sidecar and the app. In our case the app handles graceful
   shutdown correctly, so we want it to have the entire
   terminationGracePeriodSeconds to try to shut down.

   The istio-proxy sidecar should do 2 things here:

   - Stop incoming connections.
   - Wait until the app is complete and exit when it is.

   To do this, we use the preStop hook on the istio-proxy container to
   hold off the SIGTERM. The hook opens a listener, which we can hit when
   we are done.

You might ask why we don't just use the /quitquitquit endpoint all the
time and instead rely on this listener where we can. The problem is that
when we hit /quitquitquit while the preStop hook is still blocking, the
container dies before the hook finishes and K8S emits a warning level
event on graceful shutdown.
"""
import os
import signal
import socket
import subprocess
import sys

ISTIO_CLEANUP = '''
    curl --silent --show-error -o /dev/null -X POST http://localhost:15020/quitquitquit
'''.split()

FWD_SIGS = {
    signal.SIGTERM,
    signal.SIGINT,
    signal.SIGHUP,
    signal.SIGUSR1,
    signal.SIGUSR2,
}

def log(msg):
    print(f'[istio-launcher] {msg}', file=sys.stderr)

dbg = log if os.getenv('ISTIO_LAUNCHER_VERBOSE') == '1' else lambda msg: None

# this queuedsigs is a slightly kludgy hack to capture any signals that arrive
# between the time we set up the signal handlers and the time the child process starts.
# A better approach would be to use signal.pthread_sigmask to temporarily block the
# signals while we get set up; the problem there is the child process inherits the
# blocked signal mask, and subprocess.Popen has no option to reset the signal mask
# in the child. So we'd have to roll our own implementation of Popen that resets the
# signal mask in the child process.
queuedsigs = set()
childpid = None
exitval = -120

def sighandler(sig, frame):
    if childpid is None:
        queuedsigs.add(sig)
    else:
        dbg(f'forwarding signal={sig}')
        os.kill(childpid, sig)

try:
    for s in FWD_SIGS:
        signal.signal(s, sighandler)

    dbg(f'launching subprocess, argv={sys.argv[1:]}')
    proc = subprocess.Popen(sys.argv[1:])
    childpid = proc.pid
    # log(f'child pid={proc.pid}')

    for s in queuedsigs:
        dbg(f'forwarding queued signal={s}')
        os.kill(childpid, s)

    # wait for the program to finish
    proc.wait()
    exitval = proc.returncode
    dbg(f'child finished, rc={exitval}')

    for s in FWD_SIGS:
        signal.signal(s, signal.SIG_DFL)

finally:
    try:
        # in graceful shutdown the sidecar has a listener on 15123 waiting for
        # a connection - which will allow the sidecar to continue shutting down
        # after we hit it notifying them that we are done with our work
        dbg('sending graceful shutdown to istio-proxy')
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.connect(('localhost', 15123))
        dbg('successfully sent graceful shutdown to istio-proxy')

    except BaseException as ex:
        dbg(f'graceful shutdown failed, err={ex}')

        # explicitly do not worry if this was successful, we tried
        p = subprocess.run(ISTIO_CLEANUP, check=False)
        if p.returncode == 0:
            dbg('successfully sent quitquitquit to istio-proxy')
        else:
            log(f'failed to send quitquitquit to istio-proxy, rc={p.returncode}')

sys.exit(exitval)
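
The comment above `queuedsigs` mentions blocking signals with `signal.pthread_sigmask` during startup as a cleaner alternative. Below is a rough sketch of one way that might look, and it is not what the launcher does: it assumes `preexec_fn` is acceptable for restoring the mask in the child. The Python docs warn that `preexec_fn` is not safe in the presence of threads, so treat this as an experiment. `run_forwarding_signals` is a hypothetical helper name.

```python
import os
import signal
import subprocess
import sys

FWD_SIGS = {signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
            signal.SIGUSR1, signal.SIGUSR2}


def run_forwarding_signals(argv):
    """Run argv as a child, forward FWD_SIGS to it, return its exit code."""
    child = {'pid': None}

    def forward(sig, frame):
        # The signals are blocked until the pid is known; the guard only
        # matters if Popen itself failed and a signal was pending.
        if child['pid'] is not None:
            os.kill(child['pid'], sig)

    previous = {s: signal.signal(s, forward) for s in FWD_SIGS}
    try:
        # Block the signals so anything arriving during startup stays
        # pending instead of reaching the handler before the pid is set.
        old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, FWD_SIGS)
        try:
            proc = subprocess.Popen(
                argv,
                # Assumption: preexec_fn runs in the child between fork and
                # exec, so the child starts with the original signal mask.
                preexec_fn=lambda: signal.pthread_sigmask(
                    signal.SIG_SETMASK, old_mask),
            )
            child['pid'] = proc.pid
        finally:
            # Unblock in the parent; pending signals are delivered now and
            # forwarded by the handler.
            signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        return proc.wait()
    finally:
        for s, handler in previous.items():
            # signal.signal() returns None for handlers not set from Python
            signal.signal(s, handler if handler is not None else signal.SIG_DFL)
```

A call such as `run_forwarding_signals(sys.argv[1:])` would stand in for the `Popen`/`wait` section of the script. The listener notification and the /quitquitquit fallback would still have to run afterwards.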