@AndrewWestberg
Last active March 18, 2024 20:10
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h dom mon dow command
* * * * * /path/to/node-failover.sh >> /path/to/node-failover.log 2>&1
#!/bin/bash
credentials_kes_file_active=/path/to/kes.skey
credentials_vrf_file_active=/path/to/vrf.skey
credentials_opcert_file_active=/path/to/node.opcert
credentials_kes_file_standby=/path/to/kes.skey.standby
credentials_vrf_file_standby=/path/to/vrf.skey.standby
credentials_opcert_file_standby=/path/to/node.opcert.standby
service_pid=$(systemctl show --property MainPID --value node.service)
# While forging credentials are loaded, the node checks slot leadership every slot,
# so the EKG "node-not-leader" counter increases over a 3 second time period.
# If the counter does not move, the node is treated as being in standby.
leader_checks_1=$(curl -H "Accept: application/json" http://127.0.0.1:14000 2>/dev/null | jq '.cardano.node.metrics.Forge."node-not-leader".int.val')
sleep 3
leader_checks_2=$(curl -H "Accept: application/json" http://127.0.0.1:14000 2>/dev/null | jq '.cardano.node.metrics.Forge."node-not-leader".int.val')
if [[ $leader_checks_2 -gt $leader_checks_1 ]]
then
    is_leading=1
    #echo "ACTIVE mode. checking..."
else
    is_leading=0
    #echo "STANDBY mode. checking..."
fi
# Ping both relays in turn; only when BOTH report an error do we fail over.
error=$(/home/ubuntu/.cargo/bin/cncli ping --host relay0.mycardanopool.io --port 3001 | jq .status | grep -c error)
if [[ $error -eq 1 ]]
then
    #echo "relay0 error. check relay1..."
    sleep 10
    error=$(/home/ubuntu/.cargo/bin/cncli ping --host relay1.mycardanopool.io --port 3001 | jq .status | grep -c error)
    if [[ $error -eq 1 ]]
    then
        if [[ $is_leading -eq 0 ]]
        then
            echo "$(date): Enter ACTIVE mode..."
            cp -f "$credentials_kes_file_standby" "$credentials_kes_file_active"
            cp -f "$credentials_vrf_file_standby" "$credentials_vrf_file_active"
            cp -f "$credentials_opcert_file_standby" "$credentials_opcert_file_active"
            # SIGHUP makes the node reload its credentials
            kill -s HUP "$service_pid"
            /usr/bin/mail -s "ACTIVE" notify@me.com <<< "FAILOVER to backup enabled"
        fi
        # Both relays are unreachable: stay (or become) ACTIVE and stop here
        exit 0
    fi
fi
# At least one relay answered, so the primary side is reachable.
if [[ $is_leading -eq 1 ]]
then
    # We're currently forging, but we SHOULDN'T be.
    echo "$(date): Return to STANDBY mode..."
    mv -f "$credentials_kes_file_active" "$credentials_kes_file_standby"
    mv -f "$credentials_vrf_file_active" "$credentials_vrf_file_standby"
    mv -f "$credentials_opcert_file_active" "$credentials_opcert_file_standby"
    kill -s HUP "$service_pid"
    /usr/bin/mail -s "STANDBY" notify@me.com <<< "failover disabled"
fi
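
The forging check in the script compares two samples of the EKG counter with `-gt`. If curl or jq fails, those samples can be empty or `null`, and the comparison then quietly evaluates to false. A minimal sketch of a stricter comparison follows. The function name `counter_increased` is hypothetical and not part of the original script. It assumes the samples should be plain non-negative integers and treats anything else as "not forging".

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): decide whether the node is
# forging from two samples of the EKG "node-not-leader" counter.
# Returns 0 (true) only when both samples are non-negative integers and the
# second is strictly larger; empty or "null" samples return 1 (false).
counter_increased() {
    local first=$1 second=$2
    # Reject anything that is not a plain decimal integer
    [[ $first =~ ^[0-9]+$ && $second =~ ^[0-9]+$ ]] || return 1
    # Force base 10 so a value with leading zeros is not read as octal
    (( 10#$second > 10#$first ))
}
```

In the script this could stand in for the `-gt` test, for example `if counter_increased "$leader_checks_1" "$leader_checks_2"; then is_leading=1; else is_leading=0; fi`.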