Hot reloads with Gunicorn and Supervisor

2022-12-22

Background

A common way to hot reload an application served by gunicorn is to send it the SIGHUP signal. At Carousell this is what we do for our Django application in order to pick up configuration changes. However, this method introduces a short period during which requests are not processed, so it’s not an entirely graceful hot reload.

This post describes how gunicorn handles the SIGHUP signal and why that is not ideal for gracefully reloading an application. I then describe a better way to perform hot reloads using the SIGUSR2 signal. For applications managed by supervisor, I provide a script that handles graceful hot reloads using SIGUSR2 and is a drop-in replacement for gunicorn.

How gunicorn handles SIGHUP

When we send SIGHUP to gunicorn, new requests will be blocked until the new worker processes are ready to serve requests. How long requests are blocked is determined by how long it takes your application to start up. For applications with low traffic or a fast initialization time, the small downtime during a hot reload may not be immediately noticeable.

In gunicorn’s signal handling documentation this is how SIGHUP is described:

HUP: Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers. If the application is not preloaded (using the --preload option), Gunicorn will also load the new version of it.

When “configuration” is mentioned, gunicorn is referring to its own configuration, not your application’s. Also, when gunicorn mentions graceful shutdown, it simply means that in-flight requests will complete. Graceful does not mean that there will be no application downtime.

To illustrate, this is what happens when gunicorn receives SIGHUP:

Step 1: Before receiving the signal, gunicorn has one master process. Requests go to the master process, which routes them to its workers.

gunicorn-sighup-handling-1.svg

Step 2: Sending SIGHUP to the master process will cause it to spawn new workers. Requests start getting routed to the new workers. The new workers are not yet initialized and so can’t process requests.

gunicorn-sighup-handling-2.svg

Step 3: Old workers are gracefully shut down and finish processing in-flight requests. No new requests are going to old workers. New requests are going to the new workers, which are still uninitialized. During this time no requests are being processed.

gunicorn-sighup-handling-3.svg

Step 4: After some amount of time, the new workers are initialized. Requests are now getting processed again.

gunicorn-sighup-handling-4.svg

Our application takes about 20 seconds to initialize and to start serving requests. During this period of 20 seconds, requests are blocked. From the caller’s perspective, APIs are observed to have high latency and the application appears to be unresponsive.

Graceful reloading with SIGUSR2

To reload an application gracefully without blocking requests, we can send the SIGUSR2 signal to gunicorn. SIGUSR2 is designed for upgrading the gunicorn binary, so we are misappropriating it a little here.

SIGUSR2 will cause gunicorn to spawn a new master process while leaving the old master process running. After a warm-up period, the new master process is up and serving requests, and we can terminate the old master process by sending SIGTERM to it.

To illustrate:

Step 1: Before receiving the signal, gunicorn has one master process.

gunicorn-sigusr2-1.svg

Step 2: Sending SIGUSR2 to the master process will cause it to spawn a new master process. The new master process starts to initialize its workers. Requests are still going to the old master process, so no requests are being blocked.

gunicorn-sigusr2-2.svg

Step 3: After the workers in the new master process are initialized, requests will start getting served by both masters.

gunicorn-sigusr2-3.svg

Step 4: Sending SIGTERM to the old master process will cause it to gracefully shut down its workers. The workers from the old master process will finish processing their in-flight requests. The old master process will exit once all of its workers have exited. At this point, only the new master remains.

gunicorn-sigusr2-4.svg

Notice that at no point in this flow did requests go to an uninitialized worker. No requests were blocked.
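
Done by hand, the flow above amounts to two signals separated by a warm-up delay. The following is a minimal sketch rather than a definitive implementation: the function name graceful_reload, the pidfile path and the 30-second default warm-up are assumptions for illustration, and it assumes gunicorn was started with --pid so that the master PID can be read from a file.

```shell
#!/bin/bash
# Sketch: manual SIGUSR2 + SIGTERM reload of a gunicorn master.
# graceful_reload, the pidfile path and the warm-up value are hypothetical.
graceful_reload() {
    pidfile=$1
    warmup=${2:-30}

    # Without a pidfile we cannot find the master process
    if [ ! -f "$pidfile" ]; then
        echo "pidfile $pidfile not found" >&2
        return 1
    fi
    old_pid=$(cat "$pidfile")

    # kill -0 sends no signal; it only checks that the PID exists
    if ! kill -0 "$old_pid" 2> /dev/null; then
        echo "no process with pid $old_pid" >&2
        return 1
    fi

    kill -s USR2 "$old_pid"   # spawn a new master next to the old one
    sleep "$warmup"           # let the new workers initialize
    kill -s TERM "$old_pid"   # gracefully stop the old master
}

# Usage (needs a running gunicorn, so it is left commented out):
# graceful_reload /run/gunicorn.pid 30
```

The fixed sleep is the simplest possible warm-up; it has to be at least as long as the application takes to start.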

With supervisor

Unfortunately, the SIGUSR2 approach to gracefully reloading applications does not work well with supervisor. The new gunicorn master process isn’t owned by supervisor. Therefore, when the old master process is terminated, supervisor will attempt to restart gunicorn. This results in two master processes running at the same time.

What we can do instead is wrap the gunicorn master processes with a script. We then have supervisor manage this wrapper script instead of managing gunicorn directly.

  1. The script will handle sending gunicorn the SIGUSR2 and SIGTERM signals to hot reload your application.
  2. The script will present itself to supervisor with a consistent PID, insulating supervisor from the changing PID of the currently active gunicorn master process.

The full script can be found in the appendices. But first, let’s walk through how the script works.

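In the main loop of the script, we start gunicorn if it hasn’t been started yet. If the gunicorn process is killed externally or otherwise no longer exists, the script exits, and supervisor then restarts it. Note that the script exits with status 0: with supervisor’s default autorestart=unexpected, an exit code listed in exitcodes (which includes 0 by default) does not trigger a restart, so set autorestart=true for the program.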

gunicorn_args=("$@")
gunicorn_pidfile="/run/gunicorn.pid"

function start_gunicorn() {
    log "Starting gunicorn"
    gunicorn "${gunicorn_args[@]}" &
}

function gunicorn_exists() {
    [ -f "$gunicorn_pidfile" ] && ps -p "$(cat "$gunicorn_pidfile")" &> /dev/null
}

# Start gunicorn if not yet started
if ! gunicorn_exists; then
    start_gunicorn
fi

# Loop to keep the script alive
while true; do
    sleep 5
    if ! gunicorn_exists; then
        # If somehow gunicorn has stopped, exit this script.
        exit 0
    fi
done
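
The liveness check can be tried in isolation. The sketch below is an illustration only: it uses a background sleep as a stand-in for the gunicorn master and a temporary pidfile, and it swaps ps -p for kill -0, which tests whether a PID exists without sending a signal. The name process_exists is hypothetical.

```shell
#!/bin/bash
# Sketch: pidfile-based liveness check, with `sleep` standing in for gunicorn.
pidfile=$(mktemp)

# Hypothetical helper: true if the pidfile names a running process
process_exists() {
    [ -s "$1" ] && kill -0 "$(cat "$1")" 2> /dev/null
}

sleep 60 &                          # stand-in for the gunicorn master
standin_pid=$!
echo "$standin_pid" > "$pidfile"

if process_exists "$pidfile"; then
    before="running"
else
    before="stopped"
fi

kill -s TERM "$standin_pid"
wait "$standin_pid" 2> /dev/null || true   # reap it so the PID is really gone

if process_exists "$pidfile"; then
    after="running"
else
    after="stopped"
fi

echo "before=$before after=$after"
rm -f "$pidfile"
```

Run as written, the sketch should report before=running after=stopped. A stale pidfile left behind by a dead process is treated the same way as a missing one.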

When the script receives SIGTERM, we propagate the signal to the gunicorn process and wait for it to exit, before the script itself exits.

trap shutdown SIGTERM

function log() {
    echo "[$(date --rfc-3339=seconds)] [gunicorn-wrapper] $1"
}

function shutdown() {
    if [ -f "$gunicorn_pidfile" ]; then
        pid=$(cat "$gunicorn_pidfile")
        log "Shutting down. Sending SIGTERM to $pid"
        kill -s SIGTERM "$pid"
        wait_pid "$pid"
    fi
    exit
}

function wait_pid() {
    pid=$1
    tail --pid="$pid" -f /dev/null
}
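
One caveat: bash runs a trap handler only after the foreground command it is waiting on has finished, so with a foreground sleep in the main loop the shutdown handler can be delayed by the length of that sleep. Supervisor sends SIGKILL if a program has not stopped within stopwaitsecs (10 seconds by default), so a long delay can matter. A common workaround, sketched below as an assumption rather than as part of the original script, is to run sleep in the background and block on the wait builtin, which returns as soon as a trapped signal arrives. The child script and all variable names are illustrative.

```shell
#!/bin/bash
# Sketch: "sleep & wait" lets a trap fire promptly instead of after the sleep.
out=$(mktemp)

# Child script: loops like the wrapper, but sleeps in the background.
child_script='
trap "kill \$sleep_pid 2>/dev/null; echo got-term; exit 0" TERM
while true; do
    sleep 30 &
    sleep_pid=$!
    wait "$sleep_pid"   # returns immediately when a trapped signal arrives
done
'

start=$(date +%s)
bash -c "$child_script" > "$out" &
child_pid=$!

sleep 1                        # give the child time to install its trap
kill -s TERM "$child_pid"
wait "$child_pid" || true      # returns once the handler has exited
elapsed=$(( $(date +%s) - start ))

result=$(cat "$out")
echo "child said: $result after ${elapsed}s"
rm -f "$out"
```

The child reacts to SIGTERM after about one second here, even though its sleep is 30 seconds long.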

When the script receives SIGHUP, we gracefully reload gunicorn using the SIGUSR2+SIGTERM approach. The new gunicorn master process is given 30 seconds to warm up; adjust this warm-up period to match how long your application takes to start.

trap queue_for_reload SIGHUP

should_reload=0

function queue_for_reload() {
    should_reload=1
}

function reload_gunicorn() {
    if [ ! -f "$gunicorn_pidfile" ]; then
        return
    fi

    old_gunicorn_pid=$(cat "$gunicorn_pidfile")

    # If the recorded process is no longer running, do nothing
    if ! ps -p "$old_gunicorn_pid" &> /dev/null; then
        return
    fi

    # Signal gunicorn to fork the master process
    log "Sending SIGUSR2 to $old_gunicorn_pid"
    kill -s SIGUSR2 "$old_gunicorn_pid"

    # Give the new master process 30s to start up
    sleep 30

    # Gracefully terminate the old master process
    log "Sending SIGTERM to $old_gunicorn_pid"
    kill -s SIGTERM "$old_gunicorn_pid"
    wait_pid "$old_gunicorn_pid"
    log "Gunicorn pid $old_gunicorn_pid shutdown complete"
    sleep 2
    log "New gunicorn pid is $(cat "$gunicorn_pidfile")"
}

while true; do
    sleep 5
    if [ "$should_reload" -ne "0" ]; then
        reload_gunicorn
        should_reload=0
    fi
done
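
Because the handler only sets a flag and the loop does the actual work, a reload never starts in the middle of another operation, and several SIGHUPs that arrive close together collapse into a single reload. The sketch below illustrates that behaviour on its own, assuming a counter in place of reload_gunicorn; the script signals itself, and the names fake_reload and reload_count are hypothetical.

```shell
#!/bin/bash
# Sketch: a trap that only sets a flag; the loop performs the (fake) reload.
should_reload=0
reload_count=0

queue_for_reload() {
    should_reload=1
}
trap queue_for_reload HUP

# Stand-in for reload_gunicorn: just counts invocations
fake_reload() {
    reload_count=$((reload_count + 1))
}

# Two signals arrive before the loop next checks the flag
kill -s HUP $$
kill -s HUP $$

for _ in 1 2 3; do
    if [ "$should_reload" -ne 0 ]; then
        fake_reload
        should_reload=0
    fi
done

echo "reloads performed: $reload_count"
```

Both signals set the same flag, so the loop performs one reload rather than two.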

The gunicorn-wrapper script (full source in the appendices) is a drop-in replacement for gunicorn in your supervisor config.

[program:app]
- command=gunicorn
+ command=gunicorn-wrapper
    --pid /run/gunicorn.pid
    --chdir=/opt/code
    wsgi:application 
stopsignal=TERM

You can then trigger a hot reload of your application by sending SIGHUP to the program managed by supervisor.

kill -s SIGHUP $(supervisorctl pid app)

At Carousell we trigger these hot reloads using consul-template whenever there is a configuration update. Our application then reloads and picks up the configuration changes.

Appendices

gunicorn-wrapper script

#!/bin/bash

trap queue_for_reload SIGHUP
trap shutdown SIGTERM

gunicorn_pidfile="/run/gunicorn.pid"
gunicorn_args=("$@")

should_reload=0

function log() {
    echo "[$(date --rfc-3339=seconds)] [gunicorn-wrapper] $1"
}

function shutdown() {
    if [ -f "$gunicorn_pidfile" ]; then
        pid=$(cat "$gunicorn_pidfile")
        log "Shutting down. Sending SIGTERM to $pid"
        kill -s SIGTERM "$pid"
        wait_pid "$pid"
    fi
    exit
}

function queue_for_reload() {
    should_reload=1
}

function reload_gunicorn() {
    if [ ! -f "$gunicorn_pidfile" ]; then
        return
    fi

    old_gunicorn_pid=$(cat "$gunicorn_pidfile")

    # If the recorded process is no longer running, do nothing
    if ! ps -p "$old_gunicorn_pid" &> /dev/null; then
        return
    fi

    # Signal gunicorn to fork the master process
    log "Sending SIGUSR2 to $old_gunicorn_pid"
    kill -s SIGUSR2 "$old_gunicorn_pid"

    # Give the new master process 30s to start up
    sleep 30

    # Gracefully terminate the old master process
    log "Sending SIGTERM to $old_gunicorn_pid"
    kill -s SIGTERM "$old_gunicorn_pid"
    wait_pid "$old_gunicorn_pid"
    log "Gunicorn pid $old_gunicorn_pid shutdown complete"
    sleep 2
    log "New gunicorn pid is $(cat "$gunicorn_pidfile")"
}

function wait_pid() {
    pid=$1
    tail --pid="$pid" -f /dev/null
}

function start_gunicorn() {
    log "Starting gunicorn"
    gunicorn "${gunicorn_args[@]}" &
}

function gunicorn_exists() {
    [ -f "$gunicorn_pidfile" ] && ps -p "$(cat "$gunicorn_pidfile")" &> /dev/null
}


# Start gunicorn if not yet started
if ! gunicorn_exists; then
    start_gunicorn
fi

# Loop to keep the script alive
while true; do
    sleep 5
    if [ "$should_reload" -ne "0" ]; then
        reload_gunicorn
        should_reload=0
    elif ! gunicorn_exists; then
        # If somehow gunicorn has stopped, exit this script.
        exit 0
    fi
done