NGINX Plus Health Check - mark node as unhealthy if bandwidth utilisation exceeds threshold

Overview

The requirement is for NGINX Plus to back off and stop sending new connections to an upstream node if the network utilisation for that node exceeds a given threshold.

Strategy

Create a simple HTTP-accessible script that runs on each upstream node. The script returns an HTTP 200 OK status if the node is not overloaded, and 503 Too Busy if it is.

Use the script as the target for an NGINX Plus health check.
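
By default an NGINX Plus health check passes when the response status is 2xx or 3xx and fails otherwise, so the 503 alone is enough to take the node out of rotation. If you prefer to be explicit, a match block can require exactly 200; a minimal sketch (the name bandwidth_ok is arbitrary):

# in the http{} context
match bandwidth_ok {
    status 200;
}

# referenced from the health check, e.g.:
# health_check port=8099 match=bandwidth_ok;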

Implementation

Running scripts from NGINX alone is not possible, as NGINX does not provide CGI or a similar application platform. We don't want the complexity of installing PHP, Python or any other app platform on our upstream servers, so we'll use a simple HTTP responder (loadtest.sh) written in bash and run from systemd. You can of course adapt or port loadtest.sh to PHP, Python etc., according to what can be run on the upstream server.

The script listens on port 8099 (for example) and returns a status accordingly:

curl -D - http://dev0:8099/

HTTP/1.0 200 OK
Content-Type: text/plain
Connection: close

HTTP Status: 200 OK

Transfer counter 13088764457 to 13088764457 bytes
Timer 1610462202357 to 1610462204365 ms

Bytes transferred = 0 bytes over time 2008 milliseconds

Bandwidth = 0 Mbits

"Failure" output, when current bandwidth exceeds limit:

curl -D - http://dev0:8099/

HTTP/1.0 503 Too Busy
Content-Type: text/plain
Connection: close

HTTP Status: 503 Too Busy

Transfer counter 13558896787 to 13814984187 bytes
Timer 1610462222625 to 1610462224639 ms

Bytes transferred = 256087400 bytes over time 2014 milliseconds

Bandwidth = 970 Mbits

Steps

Put loadtest.sh somewhere appropriate, such as /usr/local/bin, and make it executable.
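
For example, assuming loadtest.sh is in the current directory:

sudo install -m 755 loadtest.sh /usr/local/bin/loadtest.sh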

Edit loadtest.sh to define the correct network interface to monitor and the bandwidth threshold.
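
Both settings are variables near the top of the script; the values below are the ones used in this example:

IF=enp0s3   # network interface to monitor in /proc/net/dev
BW=500      # threshold bandwidth, in Mbits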

Test loadtest.sh by writing an HTTP request to STDIN:

printf "GET /\r\n\r\n" | /usr/local/bin/loadtest.sh

HTTP/1.0 200 OK
Content-Type: text/plain
Connection: close

HTTP Status: 200 OK

Transfer counter 14170650531 to 14170650531 bytes
Timer 1610462694117 to 1610462696125 ms

Bytes transferred = 0 bytes over time 2008 milliseconds

Bandwidth = 0 Mbits

Configure systemd to run this script in response to a connection to port 8099.

File /etc/systemd/system/loadtest.socket:

[Unit]
Description=HTTP service for load testing health check

[Socket]
ListenStream=8099
Accept=yes

[Install]
WantedBy=sockets.target

File /etc/systemd/system/loadtest@.service:

[Unit]
Description=Load Test HTTP health check script

[Service]
ExecStart=-/usr/local/bin/loadtest.sh
StandardInput=socket
User=nginx
Group=nginx
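
After creating both unit files, tell systemd to pick them up and (optionally) enable the socket so it is started at boot:

sudo systemctl daemon-reload
sudo systemctl enable loadtest.socket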

Start the new socket unit and check its status:

systemctl start loadtest.socket
systemctl status loadtest.socket

● loadtest.socket - HTTP service for load testing health check
   Loaded: loaded (/etc/systemd/system/loadtest.socket; enabled; vendor preset: enabled)
   Active: active (listening) since Tue 2021-01-12 14:50:42 UTC; 4s ago
   Listen: [::]:8099 (Stream)
 Accepted: 1; Connected: 0;
    Tasks: 0 (limit: 4620)
   Memory: 52.0K
   CGroup: /system.slice/loadtest.socket

Jan 12 14:50:42 dev0 systemd[1]: Listening on HTTP service for load testing health check.

Test it with a web client:

curl -D - localhost:8099

Check /var/log/syslog for errors; for example, you may need to ensure that the nginx:nginx user and group can read and execute the script.
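
A quick way to check this is to run the script directly as the nginx user (assuming sudo is available on the node):

printf "GET /\r\n\r\n" | sudo -u nginx /usr/local/bin/loadtest.sh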

Testing

If you need to simulate high traffic, one approach is to scp a large file to /dev/null:

dd if=/dev/zero of=/tmp/1G bs=1M count=1024
scp /tmp/1G user@localhost:/dev/null

In this case, ensure that you monitor the loopback interface by setting IF=lo in loadtest.sh.
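
While the transfer is running, you can poll the health-check endpoint from a second shell and watch the status code flip from 200 to 503:

watch -n 2 'curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8099/'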

Use the NGINX Plus dashboard to view the real-time status of the health checks you've configured.
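
The same health state can also be read from the NGINX Plus API that the configuration below exposes at /api; for example (the API version prefix, 8 here, depends on your NGINX Plus release, and "upstreams" is the name of the upstream group):

curl -s http://localhost/api/8/http/upstreams/upstreams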

loadtest.sh:

#!/bin/bash
# Read and discard the HTTP request line and headers from STDIN
read -r request
while read -r header ; do
    [ "$header" == $'\r' ] && break
done

IF=enp0s3   # interface to monitor in /proc/net/dev
BW=500      # threshold bandwidth, in Mbits; if transmit bandwidth on $IF exceeds this value, return 503
# IF=lo     # uncomment to monitor the loopback interface for testing

# Take a look at /proc/net/dev to see how this works...
# Get bytes transmitted (field 10) and the current time (milliseconds), wait, then sample again
B1=$( grep "$IF:" /proc/net/dev | awk '{print $10}' )
T1=$(( $(date +%s%N) / 1000000 ))
sleep 2     # wait 2 seconds; sleep is not always accurate, so we time the interval as well
B2=$( grep "$IF:" /proc/net/dev | awk '{print $10}' )
T2=$(( $(date +%s%N) / 1000000 ))

BYTES_T=$(( B2 - B1 ))
TIME_MS=$(( T2 - T1 ))
BW_MBITS=$(( ( BYTES_T * 1000 * 8 ) / ( TIME_MS * 1024 * 1024 ) ))   # note: integer arithmetic

STATUS="200 OK"
[[ $BW_MBITS -gt $BW ]] && STATUS="503 Too Busy"

printf "HTTP/1.0 $STATUS\r\n"
printf "Content-Type: text/plain\r\n"
printf "Connection: close\r\n"
printf "\r\n"
cat << EOM
HTTP Status: $STATUS
Transfer counter $B1 to $B2 bytes
Timer $T1 to $T2 ms
Bytes transferred = $BYTES_T bytes over time $TIME_MS milliseconds
Bandwidth = $BW_MBITS Mbits
EOM
NGINX Plus configuration:

# primary virtual server, listening on port 80 and load-balancing to the upstreams group
server {
    listen 80;

    location / {
        proxy_pass http://upstreams;
        status_zone status_page;
        # We'll probe the health-check script on :8099
        health_check port=8099;
    }

    # expose the NGINX Plus API and dashboard (be aware of the security implications)
    location /api {
        api;
    }

    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
}

upstream upstreams {
    zone backend 64k;
    server dev0:8080;   # test server; we just forward to :8080
}

# test server on :8080
server {
    listen 8080;

    location / {
        root /usr/share/nginx/html;
    }
}