502s the Why and How to address them

Introduction

Great Scott! You made it to Kubernetes! You've successfully deployed your application on a Kubernetes cluster and you're starting to understand what all the fuss is about. After statically scaling your deployment to a number of replicas/pods, you feel adventurous enough to use the almighty Horizontal Pod Autoscaler. Don't get me wrong, there's no harm in statically scaling to a number of your choosing; on the other hand, one of the beauties of using a container-orchestration system such as Kubernetes is the music-like result that stems from Kubernetes directing the lead violinist (one pod) as well as tens or hundreds of violins (x pods) in absolute harmony!

So, there you have it: you bought into that musical parallel and instructed the conductor to use as many violins as she sees fit, based on a set of criteria. You sit back and enjoy the concert; you see one pod running, and when the crescendo kicks in you see all the pods start working together in mesmerizing unity. All that beauty makes you want to reward yourself with a nice beverage, so you jump to the next room to pour yourself, let's say, a cup of coffee. While you pour that delicious coffee and start thinking about how to break the news to your fellow music aficionados, you suddenly hear a violin off key. You calmly reassure yourself that it was a one-off event; the violinist sneezed, I mean, it does happen. By the time you sit back in your chair, you see absolute chaos: several violins with cut strings and the audience screaming in despair. The once Fantasia-like dream has turned into a Freddy Krueger nightmare, and you need to wake up NOW!

Why & How

Hopefully this was all a dream, even if a nightmare, as in: you are reading this while stress testing a staging application. If not, you can skip straight to the gists, although I would suggest setting a static number of replicas through the deployment and avoiding new deployments while you follow along.

Let me start by analyzing the issue itself in slightly more detail. The 502s are usually recorded right after:

upstream prematurely closed connection while reading response header from upstream

and can happen for either of the following two reasons:

  1. The worker is not gracefully killed
  2. The worker has been killed but the endpoint is still considered active

The worker is not gracefully killed

By default, Kubernetes sends a SIGTERM when a pod is scheduled for termination. A worker will fail to shut down gracefully for, again, one of two reasons:

a. Because it doesn't understand SIGTERM as a graceful shutdown signal.

Passenger, for example, treats SIGTERM as a signal that causes the Rails application to exit immediately.

b. Because it never received SIGTERM in the first place.

Depending on the implementation, even though it could understand SIGTERM, it might never receive it at all, because another process holds PID 1 and does not pass the signal down to its children.
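
A common way to end up in that situation is the shell form of CMD in a Dockerfile, where /bin/sh -c becomes PID 1 and never forwards SIGTERM to the process it spawned. A minimal illustration of the anti-pattern (not taken from this app's Dockerfile):

# Shell form: Docker actually runs /bin/sh -c "bundle exec passenger start -p 5000",
# so sh holds PID 1 and SIGTERM stops there instead of reaching Passenger.
CMD bundle exec passenger start -p 5000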

Both of the above can be addressed with dumb-init:

  • Update your Dockerfile to include dumb-init
RUN apt-get update && apt-get install -y imagemagick wget dumb-init
  • Edit your Procfile to prepend passenger with dumb-init
web: /usr/bin/dumb-init --rewrite 15:10 -- bundle exec passenger start -p 5000 --max-pool-size 2 --min-instances 2

The above solution will:

  • Map SIGTERM to SIGUSR1, which Passenger understands (SIGQUIT is another option if we take nginx into consideration)

  • Have dumb-init run as PID 1, which gracefully passes the signals on to its children
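
Once the image is rebuilt and deployed, it's worth confirming that dumb-init really did end up as PID 1. A quick check, assuming you have kubectl access to the cluster (the pod name below is a placeholder):

# Should print "dumb-init"; if it prints "sh" or the app server itself,
# signals are not being forwarded the way we expect.
kubectl exec my-app-web-1234 -- cat /proc/1/comm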

Although an elegant and to-the-point solution, this doesn't address our second point, which we will analyze below:

The worker has been killed but the endpoint is still considered active

By design, there is a race condition in the way Kubernetes deprovisions pods. In a nutshell, when you terminate a Pod, removing the endpoint and sending the signal to the kubelet happen at the same time.

(Figure: the race between removing the endpoint and signalling the kubelet)

One of the ways to address this in a raw Kubernetes implementation is to use a preStop hook. Given the abstraction layer introduced by the platform, this is not possible in EYK, yet the same level of control can be achieved by using traps. In summary, a trap can jump in and run a series of actions (commands) that give us complete control over the lifecycle of the process.
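
For reference, on a raw Kubernetes deployment the preStop route would look roughly like the sketch below; the container name and image are placeholders, and none of this applies verbatim to EYK:

# Pod spec excerpt (raw Kubernetes only, not EYK)
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: web                 # placeholder container name
      image: myapp:latest       # placeholder image
      lifecycle:
        preStop:
          exec:
            # Keep the container alive while the endpoint is removed,
            # so no new traffic is routed to a dying pod.
            command: ["sleep", "30"]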

The fact that I mentioned dumb-init first wasn't due to alphabetical order, but because in our solution we will be implementing both simultaneously:

  1. Use dumb-init to make sure that it always takes PID 1 and passes the signals to its children
  2. Create an entrypoint.sh that uses trap to take a series of actions before it runs appcontrol.sh
  3. Create an appcontrol.sh that uses dumb-init to rewrite 15:0, i.e. drop SIGTERM entirely (rewriting a signal to 0 tells dumb-init not to forward it), and includes the application run command
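
To make the moving pieces concrete, the process tree we are aiming for inside the container looks roughly like this (illustrative; you can inspect your own pod with something like kubectl exec <pod> -- ps -ef --forest, assuming ps is available in the image):

# PID 1  dumb-init --                         forwards TERM to its children
#        └─ bash ./script/entrypoint.sh       trap "...; passenger stop; exit 0" TERM
#           └─ dumb-init --rewrite 15:0 --    drops TERM so Passenger is not killed early
#              └─ passenger start ...         keeps serving while the trap drains traffic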

The above solution requires the following:

  1. dumb-init package
  2. trap (already available in most distros)
  3. entrypoint.sh bash script
  4. appcontrol.sh bash script

Let's get down to specifics:

Dockerfile

Include dumb-init package

RUN apt-get update && apt-get install -y imagemagick wget dumb-init

Although in our implementation we rely on a Procfile to pass the process instructions, depending on your use case you may add it to the Dockerfile instead:

ENTRYPOINT ["/usr/bin/dumb-init","--"]

From the above you can see that we are not changing the mapping here just yet.

Procfile

Here we will include both the use of dumb-init as well as the entrypoint.sh script:

web: /usr/bin/dumb-init -- ./script/entrypoint.sh

Again, if no Procfile is used, you will want to add the CMD instruction to your Dockerfile:

CMD ./script/entrypoint.sh
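
Whichever route you take, the two scripts have to be executable inside the image. One way to ensure that (assuming they live under script/ as shown here) is an extra line in the Dockerfile:

RUN chmod +x ./script/entrypoint.sh ./script/appcontrol.sh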

script/entrypoint.sh

Here we are adding the trap followed by the appcontrol.sh script:

#!/bin/bash

trap "passenger-status; echo SIGTERM received - sleeping 30 seconds; sleep 30; echo Slept 30 Seconds - stopping Passenger; passenger stop --port 5000; exit 0" TERM
# Run appcontrol.sh in the background and wait on it: bash postpones traps
# while a foreground command is running, so this lets the TERM trap fire
# as soon as the signal arrives instead of after Passenger has already exited.
./script/appcontrol.sh &
wait $!

As per the trap above, once the pod receives SIGTERM (the default signal for the Docker/Kubernetes scale-down and rolling-update process), it will run the following commands in order:

  • passenger-status

which prints the familiar Passenger status information to stdout, where it is captured in our logs

  • echo SIGTERM received - sleeping 30 seconds

An informational message that we have received SIGTERM

  • sleep 30

This holds the next command back for 30 seconds, giving the endpoint time to be removed while Passenger keeps serving in-flight requests

  • echo Slept 30 Seconds - stopping Passenger

An informational message that we are about to stop passenger

  • passenger stop --port 5000

This is actually the command that stops passenger gracefully

  • exit 0

That's where we exit the entrypoint script once the trap has done its work

script/appcontrol.sh

Here we are using dumb-init to ignore SIGTERM (15:0) and start passenger the usual way:

/usr/bin/dumb-init --rewrite 15:0 -- bundle exec passenger start --port 5000

As you will have noticed already, the solution is specific to Passenger. Yet nothing stops you from editing it to match whatever application server you might be running. By paying attention to the signals and introducing the commands that best suit your use case, you can have complete control over the pod's lifecycle.
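
If you want to rehearse the whole signal dance before touching the cluster, docker stop follows the same SIGTERM-then-SIGKILL pattern, so you can watch the trap fire locally. The image and container names below are placeholders:

# Build and run the image locally.
docker build -t myapp:signal-test .
docker run -d --name myapp-signal-test myapp:signal-test

# docker stop sends SIGTERM, waits up to -t seconds, then SIGKILLs.
# Use a timeout larger than the 30-second sleep in the trap so the
# graceful path has room to complete, then inspect the logs.
docker stop -t 60 myapp-signal-test
docker logs myapp-signal-test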

Important

In the pod's lifecycle, Kubernetes waits for a specified time called the termination grace period. By default, this is 30 seconds. After that, it proceeds with a SIGKILL. In a raw Kubernetes implementation you can set terminationGracePeriodSeconds in the pod's YAML to, let's say, 60. In EYK you can do that with an environment variable:

eyk config:set KUBERNETES_POD_TERMINATION_GRACE_PERIOD_SECONDS=60 -a appname

In any case, you want terminationGracePeriodSeconds to be greater than the sleep duration introduced earlier (plus the time Passenger needs to stop); failing to do so will render all the changes discussed so far moot.
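
If you have kubectl access to the underlying cluster, you can confirm that the new grace period actually reached the pods (the pod name is again a placeholder):

# Should print 60 (or whatever you configured); it must stay comfortably
# above the sleep used in the trap.
kubectl get pod my-app-web-1234 -o jsonpath='{.spec.terminationGracePeriodSeconds}'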

Afterword

So you (hopefully) read through the solution once and are already attempting to apply it to your implementation. If it doesn't work straight out of the box, do not despair: you have just handed a number of new scores to your violinists and your conductor, and you might need to tweak things to reach the harmony you are aiming for. Once there, sit back, relax, and enjoy the concert!
