@nathanielvarona
Last active January 13, 2024 06:32
Airflow Scheduler Health Checks and Auto Restart Scheduler if Unhealthy
# /etc/systemd/system/airflow-scheduler-health.service
# Description: A simple systemd service that runs the health check script.
# To monitor: `sudo journalctl -u airflow-scheduler-health -f`
# To investigate unhealthy periods: `tail -f /var/log/airflow/scheduler/unhealthy-periods.log`
[Unit]
Description=Airflow Scheduler Health Checks
[Service]
User=user
Group=group
ExecStart=/bin/bash /usr/local/bin/airflow-scheduler-health.sh
# /etc/systemd/system/airflow-scheduler-health.timer
# Description: A simple systemd timer that runs `airflow-scheduler-health.service` periodically.
# To enable: `sudo systemctl enable --now airflow-scheduler-health.timer`
# To disable: `sudo systemctl disable airflow-scheduler-health.timer`
[Unit]
Description=Airflow Scheduler Health Checks every 10 seconds
[Timer]
OnBootSec=10
OnUnitActiveSec=10
AccuracySec=1ms
[Install]
WantedBy=timers.target
#!/bin/bash
# /usr/local/bin/airflow-scheduler-health.sh
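# Description: A simple shell script that monitors the scheduler through the Airflow health endpoint.
function airflow_scheduler_health {
    # curl appends the 3-digit HTTP status code to the response body ("000" if the request failed).
    local res body status
    res=$(curl -fsSL -w "%{http_code}" https://airflow.tld/health)
    body=${res%???}    # everything except the trailing status code
    status=${res: -3}  # the trailing status code
    if [ "$status" -ne 200 ]; then
        echo "error"
        return
    fi
    # Quote the body so the JSON reaches jq unmodified.
    printf '%s\n' "$body" | jq -r .scheduler.status
}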
scheduler_status=$(airflow_scheduler_health)
# Quote the variable: an unquoted empty value makes `[ -n ]` always true.
if [ -n "$scheduler_status" ]; then
    if [ "$scheduler_status" = 'unhealthy' ]; then
        date >> /var/log/airflow/scheduler/unhealthy-periods.log
        echo "Airflow scheduler is unhealthy, restarting..."
        supervisorctl restart airflow-scheduler
        # Alternative if the scheduler runs under systemd:
        # systemctl restart airflow-scheduler
        sleep 1
    elif [ "$scheduler_status" = 'healthy' ]; then
        echo "Airflow scheduler is healthy..."
    elif [ "$scheduler_status" = 'error' ]; then
        echo "Airflow health endpoint error..."
    fi
fi
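
The script reads the body and the HTTP status from a single `curl` call: `-w "%{http_code}"` appends the three-digit status code to the end of the response body. The sketch below isolates that splitting step so it can be tried without a running Airflow instance. The helper names `http_status_of` and `http_body_of` and the sample payload are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helpers (not in the original script) that split the combined
# output of `curl -w "%{http_code}"` into the response body and status code.

# Print the last three characters: the status code appended by curl.
http_status_of() {
    local res=$1
    printf '%s' "${res: -3}"
}

# Print everything except the last three characters: the response body.
http_body_of() {
    local res=$1
    printf '%s' "${res%???}"
}

# Made-up sample of what the combined curl output might look like.
sample='{"scheduler": {"status": "healthy"}}200'
echo "status: $(http_status_of "$sample")"
echo "body:   $(http_body_of "$sample")"
```

Using `${res%???}` instead of `${res::-3}` is a design choice in this sketch: the pattern form leaves a string shorter than three characters unchanged rather than raising a substring error, and it also works on Bash versions older than 4.2.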