@danielhanold
Created May 15, 2024 19:20
No-downtime NGINX Kubernetes deployments
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-no-downtime
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-no-downtime
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: nginx-no-downtime
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: nginx-no-downtime
          image: nginx:latest
          # By default, Kubernetes sends a SIGTERM signal to a container when it's safe to terminate it,
          # e.g. when another instance of this deployment becomes available during a rollout.
          # NGINX interprets SIGTERM as a "fast shutdown", i.e. it does not shut down gracefully
          # and does not wait until existing connections have finished.
          # This leads to dropped connections and "5xx" response codes.
          # @see https://ubuntu.com/blog/avoiding-dropped-connections-in-nginx-containers-with-stopsignal-sigquit
          #
          # It could be assumed that a new instance of the NGINX deployment requires startupProbes or
          # readinessProbes to determine whether it can receive traffic, but that assumption is incorrect.
          # Instead, the issue is that the existing instance of the NGINX deployment terminates prematurely
          # without waiting for all connections to finish.
          #
          # We want NGINX to terminate gracefully and wait until any existing connections finish.
          # To achieve this, we plug in to the "preStop" Kubernetes lifecycle event, which allows us to
          # execute a command before Kubernetes sends the SIGTERM signal.
          # In this hook, we shut down NGINX gracefully ourselves by sending SIGQUIT to the master process.
          # The hook and the shutdown that follows share the terminationGracePeriodSeconds budget
          # (default = 30 seconds); once it expires, Kubernetes forcibly terminates the container.
          # In our testing, this has led to no dropped connections when triggering a deployment event
          # under load for a deployment with 1 replica.
          # @see https://nginx.org/en/docs/control.html
          # @see https://medium.com/inside-personio/graceful-shutdown-of-fpm-and-nginx-in-kubernetes-f362369dff22
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - kill -s QUIT $(cat /var/run/nginx.pid)
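
The preStop command above returns as soon as the signal is sent, and Kubernetes sends SIGTERM once the hook has finished. A possible variant, shown here only as a sketch and not as part of the original manifest, keeps the hook running until the NGINX master process has exited, so that SIGTERM cannot arrive while connections are still draining. It assumes the image provides /bin/sh and sleep, and that the PID file lives at /var/run/nginx.pid as in the manifest above. The fragment would replace the `lifecycle` block of the container.

```yaml
# Sketch only: alternative preStop hook that waits for NGINX to finish draining.
# Assumes /bin/sh and sleep exist in the image and the PID file path is unchanged.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          pid="$(cat /var/run/nginx.pid)"
          # Ask the NGINX master process for a graceful shutdown.
          kill -s QUIT "$pid"
          # Keep the hook alive while the master process still exists.
          while kill -0 "$pid" 2>/dev/null; do sleep 1; done
```

The wait is still bounded by terminationGracePeriodSeconds: if connections take longer than that to finish, Kubernetes terminates the container anyway, so the grace period should be sized to the longest request expected.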