How to do application level scale to zero with Gunicorn

What's this?

This is an experiment with making gunicorn gracefully scale down to zero after X seconds, as a way to get application-level scale-to-zero behaviour in applications that use a web server like Gunicorn. The idea here is that you do not need to mess too much with the internal logic of an existing application, nor put it in a container, if you use this.

Instead, you use the web server's own support for handling SIGTERM signals to allow graceful scaling down of processes when they are not in use.
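Gunicorn's parent "arbiter" process already treats SIGTERM as a request for a graceful shutdown: workers finish the requests they are currently serving before exiting. You can try this by hand with something like kill -TERM <arbiter pid> (the pid here is a placeholder); the config further down simply automates sending that signal after a period with no traffic.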

If you're using Linux to run a server, the chances of Systemd being used to manage your processes are fairly high, as it's the default option for a number of Linux distributions now.

It also means you might not need a complicated "serverless" system to orchestrate scaling up and down: if you have a website or web service that isn't continually serving traffic, you can reclaim memory on the server for use in other tasks.

This is nice, but how would you spin up processes as requests come in?

You would typically combine this with something like Systemd's existing wake-on-socket-request functionality, to spin up processes as soon as traffic is detected on the port that Systemd is listening on.

You'd rely on Systemd to wake gunicorn back up when there is new inbound traffic.
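As a rough sketch (the unit names, port, and paths here are illustrative, and this assumes a reasonably recent gunicorn that can detect a systemd-activated socket), you would pair a .socket unit with a .service unit. Systemd keeps listening on the socket even while gunicorn is not running, and starts the service again when the next connection arrives:

# gunicorn-app.socket
[Unit]
Description=Socket for the example gunicorn app

[Socket]
ListenStream=8000

[Install]
WantedBy=sockets.target

# gunicorn-app.service
[Unit]
Description=Example gunicorn app
Requires=gunicorn-app.socket
After=network.target

[Service]
# gunicorn picks up the socket handed over by systemd rather than binding its own
ExecStart=/path/to/venv/bin/gunicorn --config gunicorn.conf.py myapp:app

With the config further down in place, gunicorn exits after the idle period, the .socket unit stays active, and the next inbound request wakes the service back up.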

This is nice, but how would you spin down processes after they're no longer needed?

You would use a config like the one below to tell gunicorn to gracefully exit after a given period of time with zero traffic, to handle scaling down.

This is nice, but how do you scale processes up and down if you have just a little bit of traffic, or larger surges of traffic?

The act of telling gunicorn to scale to different numbers of workers / threads beyond the default number is left as an exercise for the reader.
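One possible starting point, not covered by this gist: gunicorn's parent arbiter also responds to TTIN and TTOU signals, adding or removing one worker at a time, so something watching your traffic could send those signals to scale within that range:

kill -TTIN <arbiter pid>   # add one worker
kill -TTOU <arbiter pid>   # remove one worker

The config below only handles the idle countdown and the graceful exit down to zero.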

import logging
import os
import signal
import time

logger = logging.getLogger(__name__)

# increasing workers uses more RAM, but provides a simple model for scaling up resources
# cast to int, because environment variables arrive as strings
workers = int(os.getenv("GUNICORN_WORKERS", default=4))

# increasing threads saves RAM at the cost of using more CPU
threads = 1

SECONDS_TO_RUN_SERVER_TIL_GRACEFUL_EXIT = 10


def signal_handler(signum, frame):
    """Receive the SIGALRM signal, warn, then trigger a graceful exit."""
    logger.info(f"Signal SIGALRM received at {time.ctime()}")

    # dunno how this affects performance. sleep() is blocking, so
    # you probably want this to be some form of async version
    for num in range(SECONDS_TO_RUN_SERVER_TIL_GRACEFUL_EXIT):
        countdown = SECONDS_TO_RUN_SERVER_TIL_GRACEFUL_EXIT - num
        logger.warning(f"Telling workers to gracefully exit in {countdown}")
        time.sleep(1)

    # send our SIGTERM signal to the Arbiter, which triggers a graceful
    # exit for every worker servicing a request
    os.kill(os.getpid(), signal.SIGTERM)


# register the handler above to run when we receive a SIGALRM
signal.signal(signal.SIGALRM, signal_handler)


def when_ready(server):
    """Begin the initial countdown once the parent 'arbiter' process is ready."""
    logger.info(
        f"Parent 'arbiter' with process id {server.pid} started. Will gracefully exit"
        f" in {SECONDS_TO_RUN_SERVER_TIL_GRACEFUL_EXIT} seconds."
    )
    signal.alarm(SECONDS_TO_RUN_SERVER_TIL_GRACEFUL_EXIT)


def post_request(worker, req, environ, resp):
    worker_process_id = worker.pid
    parent_process_id = worker.ppid
    logger.info(
        f"Worker id: {worker_process_id} received a new inbound request. "
        "Resetting the countdown, by sending a SIGALRM signal."
    )
    # send an alarm signal to reset the countdown started
    # by the parent gunicorn process in `when_ready`
    os.kill(parent_process_id, signal.SIGALRM)
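
To use this, save it as something like gunicorn.conf.py (the filename is just an example) and point gunicorn at it when starting your app, where myapp:app stands in for your own WSGI module and application callable:

gunicorn --config gunicorn.conf.py myapp:app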