@rdev5
Last active July 4, 2018 11:10
Performs UDP health check against a list of UDP services derived from NGINX stream configuration file (sample-streams.conf), reloading NGINX only when necessary.

NGINX UDP Health Check (PHP/BASH)

Status: PRODUCTION (5/22/18)

About

Performs a UDP health check against a list of UDP services derived from an NGINX stream configuration file, reloading NGINX only when necessary. The script is designed to be idempotent: it does not re-enable or re-disable servers whose state is already reflected in the specified stream configuration file.

Installation

Tip: Never download and run scripts blindly or register them as cron jobs, especially as root. Just because you can doesn't mean you should!

Download (/usr/local/bin/udp-check)

# mkdir -p /usr/local/bin/udp-check && \
  curl -so /usr/local/bin/udp-check/udp-manage-all.sh 'https://gist.githubusercontent.com/rdev5/2f74db90b47cc43d028bb39e6f5855ab/raw/10_udp-manage-all.sh' && \
  curl -so /usr/local/bin/udp-check/udp-check.php 'https://gist.githubusercontent.com/rdev5/2f74db90b47cc43d028bb39e6f5855ab/raw/20_udp-check.php' && \
  chmod 0744 /usr/local/bin/udp-check/udp-manage-all.sh && \
  chmod 0744 /usr/local/bin/udp-check/udp-check.php

Setup Cron

Note: The UDP management script is designed to invoke itself every 10s to facilitate shorter health check intervals. See also CRON_INTERVAL_SECONDS.

# (crontab -l 2>/dev/null; echo '* * * * * /usr/local/bin/udp-check/udp-manage-all.sh') | crontab -
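
The note above relies on a common trick for getting sub-minute intervals out of cron's one-minute granularity: a single cron entry fans out into several runs spaced by a sleep. As a rough sketch of the arithmetic only (run_offsets is a hypothetical helper, not part of udp-manage-all.sh), the nominal start offsets within one minute can be listed like this:

```sh
#!/bin/sh
# Sketch: list the nominal second offsets at which runs would start within
# one cron minute for a given interval. run_offsets is a hypothetical helper.
CRON_INTERVAL_SECONDS=10

run_offsets() {
  interval="$1"
  # Reject zero or negative intervals, which would never reach 60
  [ "$interval" -gt 0 ] || return 1
  offset=0
  while [ "$offset" -lt 60 ]; do
    printf '%s\n' "$offset"
    offset=$((offset + interval))
  done
}

run_offsets "$CRON_INTERVAL_SECONDS" | tr '\n' ' '
echo
```

With a 10-second interval this gives six runs per minute, which matches the one immediate run plus five delayed runs in udp-manage-all.sh. Because that script runs its checks sequentially and sleeps between them, actual start times will drift later than these nominal offsets by however long each check takes.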

Verify Installation

To verify proper installation, watch for bursts of ICMP traffic to/from your UDP upstream servers with tcpdump:

# /usr/sbin/tcpdump -n ip proto \\icmp

Development

This section describes script usage more in depth and may be used to bootstrap your own custom UDP monitoring solution.

Testing

You can reproduce the following test by hand using a copy of sample-streams.conf.

$ ./udp-check.php ./sample-streams.conf && echo '!! NGINX reload required' || echo 'No reload required at this time'
PHP Warning:  Retrying for lock (attempt #1)... in ./udp-check.php on line 128
PHP Warning:  Retrying for lock (attempt #1)... in ./udp-check.php on line 128
PHP Notice:  Conducting 3 UDP checks took 1 seconds.
 in ./udp-check.php on line 297
PHP Notice:  127.0.0.2:33033 went offline in ./udp-check.php on line 184
PHP Notice:  127.0.0.3:33033 came online in ./udp-check.php on line 184
!! NGINX reload required

$ cat ./sample-streams.conf
server {
  listen 127.0.0.3:33033 udp;
  proxy_timeout 1s;
  proxy_pass localhost-33033;
}

upstream localhost-33033 {
  server 127.0.0.1:33033;
  # server 127.0.0.2:33033;
  server 127.0.0.3:33033;
}
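
The change visible in the output above comes down to commenting out or uncommenting individual server lines in the upstream block. The following is a rough sed-based sketch of that idea, not the PHP implementation used by udp-check.php; the function names disable_server and enable_server are hypothetical, and the sketch prints the result to standard output rather than editing the file in place.

```sh
#!/bin/sh
# Sketch: comment out / uncomment a "server HOST:PORT;" line with sed.
# disable_server and enable_server are hypothetical names for illustration.

# Escape dots so the address is matched literally in the regex
escape_addr() {
  printf '%s' "$1" | sed 's/\./\\./g'
}

# Usage: disable_server HOST:PORT FILE
disable_server() {
  esc=$(escape_addr "$1")
  sed -E "s/^([[:space:]]*)server([[:space:]]+${esc};)/\1# server\2/" "$2"
}

# Usage: enable_server HOST:PORT FILE
enable_server() {
  esc=$(escape_addr "$1")
  sed -E "s/^([[:space:]]*)#+[[:space:]]*server([[:space:]]+${esc};)/\1server\2/" "$2"
}

# Demo against a throwaway copy of an upstream block
conf=$(mktemp)
printf 'upstream localhost-33033 {\n  server 127.0.0.2:33033;\n  # server 127.0.0.3:33033;\n}\n' > "$conf"
disable_server 127.0.0.2:33033 "$conf"
rm -f "$conf"
```

Both substitutions only match lines in the opposite state, so applying one to a line that is already disabled (or already enabled) leaves it unchanged.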

Once you've verified that the script behaves as expected, you can try running it against a copy of your UDP upstream configuration files taken from NGINX.

The following example will:

  1. Iterate over all *.conf files in ./udp_upstreams
  2. Pass each file to udp-check.php as an argument, running the checks concurrently in the background and redirecting notices and warnings to /dev/null
  3. Recommend appropriate NGINX reload action per configuration file based on any status changes detected
$ cp -r /etc/nginx/udp_upstreams .
$ find udp_upstreams/ -type f -name "*.conf" | xargs -I {} sh -c '((./udp-check.php {} >/dev/null 2>&1 && echo "NGINX reload required for {}" || echo "No reload required for {}") &)'
No reload required for udp_upstreams/appClusterA-33033.conf
No reload required for udp_upstreams/appClusterB-33034.conf
NGINX reload required for udp_upstreams/appClusterC-33035.conf
No reload required for udp_upstreams/appClusterD-33036.conf

Tip: Watch the ICMP traffic via /usr/sbin/tcpdump ip proto \\icmp!

Exit Codes

  • 0 = Change in server health detected; NGINX should reload
  • 1 = Invalid usage, fork failure
  • 2 = No changes require reloading at this time
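
Note that the `cmd && ... || ...` idiom used in the Testing examples treats exit code 1 (a usage or fork error) the same as exit code 2. A minimal sketch of telling the three cases apart with a case statement follows; describe_exit is a hypothetical helper name, and the labels it prints are illustrative only.

```sh
#!/bin/sh
# Sketch: map a udp-check.php exit status to an action, following the list above.
# describe_exit is a hypothetical helper name.
describe_exit() {
  case "$1" in
    0) echo "reload" ;;  # change in server health detected
    2) echo "skip" ;;    # nothing changed
    *) echo "error" ;;   # invalid usage, fork failure, or anything unexpected
  esac
}

# Typical use (shown as a comment because it needs the real script and a config):
#   ./udp-check.php ./sample-streams.conf; describe_exit "$?"
describe_exit 0
describe_exit 2
describe_exit 1
```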

Production

To deploy in production, point the find target at your live configuration directory (e.g. /etc/nginx/udp_upstreams), wait for all checks to finish, and then reload NGINX once if any of them reported a change.

An example script (udp-manage-all.sh) has been created to demonstrate this but requires the UPSTREAM_GLOB variable to be set appropriately.

Example:

# ((tail -f /var/log/messages | grep -i nginx) &)
# ./udp-manage-all.sh
Reload required for ./udp_upstreams/appClusterC-33035.conf
May 21 15:36:03: udp-manage-all.sh: UDP check detected change in health state for one or more servers. Reloading NGINX...
Reloading NGINX
May 21 15:36:03: Reloading The nginx HTTP and reverse proxy server.
May 21 15:36:03: Reloaded The nginx HTTP and reverse proxy server.
redirecting to systemctl reload nginx.service

Note: This script is being run as root to obtain enough privileges for reloading nginx.service.
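
The "check everything concurrently, then reload once" pattern can also be expressed without a signal file by waiting on each background job and inspecting its exit status. The sketch below is an alternative illustration under stated assumptions, not how udp-manage-all.sh works: check_conf is a hypothetical stand-in for udp-check.php that pretends any file name containing "C" needs a reload.

```sh
#!/bin/sh
# Sketch: fan out one background check per file, wait on each PID, and decide
# on a single reload. check_conf is a hypothetical stand-in for udp-check.php.
check_conf() {
  case "$1" in
    *C*) return 0 ;;  # pretend a health change was detected
    *)   return 2 ;;  # pretend nothing changed
  esac
}

# Returns 0 if at least one check exited 0, otherwise 1
needs_reload() {
  pids=""
  for conf in "$@"; do
    check_conf "$conf" &
    pids="$pids $!"
  done
  reload=1
  for pid in $pids; do
    # wait PID returns that job's exit status
    if wait "$pid"; then
      reload=0
    fi
  done
  return "$reload"
}

if needs_reload appClusterA.conf appClusterC.conf; then
  echo "reload once"
else
  echo "no reload"
fi
```

The signal-file approach in udp-manage-all.sh has the advantage that the subshells do not need to be tracked individually; the version above avoids leaving a stale file behind if the script is interrupted.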

License: MIT

Copyright (c) 2018 Matt Borja

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

#!/bin/sh
# Simply point UPSTREAM_GLOB to your UDP streams directory and fire away!
UPSTREAM_GLOB=/etc/nginx/udp_upstreams/*.conf
CRON_INTERVAL_SECONDS=10
UDP_CHECK_BIN="/usr/local/bin/udp-check/udp-check.php"
RELOAD_COMMAND="sh /etc/init.d/nginx reload"
RELOAD_SIGNAL=/dev/shm/nginx.reload
# Cron Manager (support for < 1-minute granularity with cron using sleep)
SLEEP_VALUE="$1"
if [ -z "$SLEEP_VALUE" ]; then
  # No argument: run once immediately, then five more times, each preceded
  # by a sleep of CRON_INTERVAL_SECONDS
  "$0" 0
  for i in $(seq 1 5); do
    "$0" "$CRON_INTERVAL_SECONDS"
  done
  exit 0
fi

# Validate sleep value
case "$SLEEP_VALUE" in
  ''|*[!0-9]*)
    echo "Invalid sleep value (must be numeric)"
    exit 1
    ;;
esac

sleep "$SLEEP_VALUE"

# Check every configuration file concurrently; exit status 0 means "reload".
# POSIX redirection is used because "&>" is a bashism and this script runs under /bin/sh.
for CONF in $UPSTREAM_GLOB; do
  ("$UDP_CHECK_BIN" "$CONF" >/dev/null 2>&1 && touch "$RELOAD_SIGNAL" && echo "Reload required for $CONF") &
done
wait

if [ ! -f "$RELOAD_SIGNAL" ]; then
  echo "No reload required at this time. Exiting normally..."
else
  logger "$0: UDP check detected change in health state for one or more servers. Reloading NGINX..."
  echo "Reloading NGINX"
  eval "$RELOAD_COMMAND"
fi
rm -f "$RELOAD_SIGNAL"
#!/usr/bin/php
<?php
/*
# Status: STAGING (5/17/18)
#
# About:
# Performs UDP health check against a list of UDP services derived from NGINX stream
# configuration file (sample-streams.conf), reloading NGINX only when necessary. This
# script is designed to be idempotent and preclude enabling/disabling servers that have
# already been effected in the stream configuration file specified.
#
# Exit codes:
# - 0 = Change in server health detected; NGINX should reload
# - 1 = Invalid usage, fork failure
# - 2 = No changes require reloading at this time
#
# License: MIT
# Copyright (c) 2018 Matt Borja
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
const FLOCK_RETRY_WAIT = 1;
const FLOCK_RETRY_MAX = 6;
const OUT_CSV_READ_LENGTH = 1024;
const OUT_CSV_DELIMITER = ',';
const OUT_CSV_FIELD_COUNT = 2;
const OUT_CSV_FIELD_SERVICE_INDEX = 0;
const OUT_CSV_FIELD_STATUS_INDEX = 1;
const SERVICE_ENUMERATE_REGEX = '/(#*)\s*server\s+([^:]+):(\d+)/';
const SERVICE_ENUMERATE_CONFIG_INDEX = 1;
const SERVICE_ENUMERATE_HOST_INDEX = 2;
const SERVICE_ENUMERATE_PORT_INDEX = 3;
// Patterns to reverse
const SERVICE_ENABLE_REGEX = '/(#*)\s*server(\s+)/';
const SERVICE_ENABLE_WITH = 'server$2';
const SERVICE_DISABLE_REGEX = '/\bserver(\s+)/';
const SERVICE_DISABLE_WITH = '# server$1';
const EXIT_SKIP_RELOAD = 2;
const EXIT_SIGNAL_RELOAD = 0;
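// Pings $host (any ":port" suffix is ignored) and returns ping's exit status (0 = reachable)
function icmp($host, $count = 2) {
    if (strpos($host, ':') !== false)
        $host = explode(':', $host)[0];
    // POSIX redirection: "&>" is a bashism and system() runs the command via /bin/sh
    $cmd = sprintf('ping -c %d %s >/dev/null 2>&1', (int)$count, escapeshellarg($host));
    system($cmd, $exit);
    return $exit;
}

// The "UDP check" is currently an ICMP reachability test against the service's host
function udp_check($service) {
    return icmp($service);
}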
// Returns associative array of services ready for populating with results of health checks
function readServicesFromFile($filename) {
    lock($fh, $filename, 'r');
    $services = array();
    while (($line = fgets($fh)) !== false) {
        if (!preg_match(SERVICE_ENUMERATE_REGEX, $line, $matches))
            continue;
        // Keyed by "host:port", e.g. $services['127.0.0.1:33033']
        $service = sprintf('%s:%d', $matches[SERVICE_ENUMERATE_HOST_INDEX], (int)$matches[SERVICE_ENUMERATE_PORT_INDEX]);
        $services[$service] = array(
            'status' => 'offline',
            // Any run of leading "#" characters (not just a single one) marks the entry as disabled
            'config' => ($matches[SERVICE_ENUMERATE_CONFIG_INDEX] !== '') ? 'disabled' : 'enabled',
            'action' => 'unchanged'
        );
    }
    unlock($fh);
    return $services;
}
// Returns file handle exclusively locked
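function lock(&$fh, $filename, $mode = 'a') {
    // realpath() returns false for a missing file, so check it before calling fopen()
    $path = realpath($filename);
    $fh = ($path === false) ? false : fopen($path, $mode);
    if (!$fh) {
        // Exit with 1 rather than die('...'): die with a string exits 0, which callers read as "reload"
        fwrite(STDERR, 'Failed to open file for locking: ' . $filename . "\n");
        exit(1);
    }
    for ($i = 1; $i <= FLOCK_RETRY_MAX; $i++) {
        if (flock($fh, LOCK_EX | LOCK_NB))
            return $fh;
        if ($i >= FLOCK_RETRY_MAX) {
            fwrite(STDERR, sprintf("Failed to acquire exclusive lock on %s after %d attempts.\n", $filename, $i));
            exit(1);
        }
        // "{$i}" rather than "${i}": the latter form is deprecated as of PHP 8.2
        trigger_error("Retrying for lock (attempt #{$i})...", E_USER_WARNING);
        sleep(FLOCK_RETRY_WAIT);
    }
}

function unlock($fh) {
    flock($fh, LOCK_UN);
    fclose($fh);
}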
// Writes result of udp_check to file before releasing exclusive lock
// Returns result of udp_check
function accumulateResults($service, $outFile) {
    $result = udp_check($service);
    lock($fh, $outFile);
    fwrite($fh, $service . ',' . $result . "\n");
    fflush($fh);
    unlock($fh);
    return $result;
}
// Evaluates health check status against current configuration, returning array of services requiring update
// Note: $services parameter will be updated with appropriate config and actions consistent with return value
function filterServicesForUpdate(&$services, $outFile) {
    lock($fh, $outFile, 'r');
    $filtered = array();
    while (($data = fgetcsv($fh, OUT_CSV_READ_LENGTH, OUT_CSV_DELIMITER)) !== FALSE) {
        // Skips blank lines and anything that is not a "service,status" pair
        if (count($data) !== OUT_CSV_FIELD_COUNT)
            continue;
        $service = $data[OUT_CSV_FIELD_SERVICE_INDEX];
        $status = $data[OUT_CSV_FIELD_STATUS_INDEX];
        if (!array_key_exists($service, $services)) {
            trigger_error(sprintf("[WARN] Invalid service entry found in %s: %s", $outFile, $service), E_USER_WARNING);
            continue;
        }
        // A ping exit status of 0 means the host answered
        $services[$service]['status'] = ($status === '0') ? 'online' : 'offline';
        $dirty = false;
        switch ($services[$service]['status']) {
            case 'online':
                if ($services[$service]['config'] === 'disabled') {
                    $services[$service]['config'] = 'enabled';
                    $services[$service]['action'] = 'update';
                    $dirty = true;
                    trigger_error(sprintf("%s came online", $service), E_USER_NOTICE);
                }
                break;
            case 'offline':
                if ($services[$service]['config'] === 'enabled') {
                    $services[$service]['config'] = 'disabled';
                    $services[$service]['action'] = 'update';
                    $dirty = true;
                    trigger_error(sprintf("%s went offline", $service), E_USER_NOTICE);
                }
                break;
            default:
                // Exit with 1 rather than die('...'), which would exit 0 and signal a reload
                fwrite(STDERR, 'Unexpected service status: ' . $services[$service]['status'] . "\n");
                exit(1);
        }
        if ($dirty)
            $filtered[$service] = $services[$service];
    }
    unlock($fh);
    return $filtered;
}
// Returns exit code signaling NGINX to reload if required for changes to take effect
function finalize(&$services, $conf) {
    $filtered = filterServicesForUpdate($services, getOutFile($conf));
    if (count($filtered) === 0) {
        // trigger_error("No changes in server status detected at this time. Quitting...", E_USER_NOTICE);
        return EXIT_SKIP_RELOAD;
    }
    // File may have been changed externally. We do not want to assume it is safe to reload NGINX at this point unless $reload is fully validated
    $reload = false;
    $output = array();
    lock($fh, $conf, 'r+');
    while (($line = fgets($fh)) !== false) {
        array_push($output, $line);
        if (!preg_match(SERVICE_ENUMERATE_REGEX, $line, $matches))
            continue;
        // Ignore service entries not processed in this run
        $service = sprintf('%s:%d', $matches[SERVICE_ENUMERATE_HOST_INDEX], (int)$matches[SERVICE_ENUMERATE_PORT_INDEX]);
        if (!array_key_exists($service, $filtered))
            continue;
        $lastLine = count($output) - 1;
        switch ($filtered[$service]['status']) {
            case 'online':
                $output[$lastLine] = preg_replace(SERVICE_ENABLE_REGEX, SERVICE_ENABLE_WITH, $output[$lastLine]);
                $reload = true;
                break;
            case 'offline':
                $output[$lastLine] = preg_replace(SERVICE_DISABLE_REGEX, SERVICE_DISABLE_WITH, $output[$lastLine]);
                $reload = true;
                break;
            default:
                // Exit with 1 rather than die('...'), which would exit 0 and signal a reload
                fwrite(STDERR, 'Unexpected service status: ' . $filtered[$service]['status'] . "\n");
                exit(1);
        }
    }
    // Rewrite the file in place while still holding the lock
    $render = join('', $output);
    rewind($fh);
    fwrite($fh, $render);
    ftruncate($fh, strlen($render));
    fflush($fh);
    unlock($fh);
    return $reload ? EXIT_SIGNAL_RELOAD : EXIT_SKIP_RELOAD;
}

// Per-configuration scratch file (in the current working directory) that collects check results
function getOutFile($conf) {
    return getcwd() . '/.' . basename($conf) . '.tmp';
}
function main() {
    global $argc, $argv;
    if ($argc !== 2) {
        // Print usage and exit 1 explicitly: trigger_error() with E_USER_ERROR halts the
        // script by itself, so the exit(1) that followed it was never reached
        fwrite(STDERR, "Usage: {$argv[0]} <streams.conf>\n");
        exit(1);
    }
    $conf = $argv[1];
    $outFile = getOutFile($conf);
    system('echo > ' . escapeshellarg($outFile));
    $services = readServicesFromFile($conf);
    $t = microtime(true);
    // Async: fork one child per service so the checks run concurrently
    foreach ($services as $service => $result) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            fwrite(STDERR, "Failed to fork. Exiting...\n");
            exit(1);
        }
        // Child
        if ($pid === 0) {
            accumulateResults($service, $outFile);
            exit();
        }
    }
    // Await: loop until no children remain
    while (pcntl_waitpid(0, $status) !== -1);
    trigger_error(sprintf("Conducting %d UDP checks took %d seconds.\n", count($services), microtime(true)-$t), E_USER_NOTICE);
    // Finalize
    $exit = finalize($services, $conf); // Report
    unlink($outFile);
    exit($exit);
}

main();
server {
  listen 127.0.0.3:33033 udp;
  proxy_timeout 1s;
  proxy_pass localhost-33033;
}

upstream localhost-33033 {
  server 127.0.0.1:33033;
  server 127.0.0.2:33033;
  # server 127.0.0.3:33033;
}