@humantraffic
Forked from k0kk0k/README.md
Created December 7, 2021 07:43
A simple utility for monitoring any Cosmos-SDK based node

Description

This bash script monitors the state of a node. It automatically unjails the validator and/or restarts the node after an unexpected halt or a loss of sync.

Installation

  1. For the keyring password, create the file pass.key and write the password to it.
read -s -p "Enter password: " pass
printf '%s\n' "$pass" > pass.key
unset pass
  1. Set the file mode to 400 (read-only for the owner)
chmod 400 pass.key
  1. Set exec permission for this shell script
chmod +x watchdog.sh
  1. Schedule the script to run every 5 minutes with cron
crontab -e
*/5 * * * * $HOME/watchdog.sh >> $HOME/watchdog.log
  1. IMPORTANT!!! Your current user must have sudo privileges, and the sudo password prompt must be disabled for /bin/systemctl.
echo "$(whoami) ALL=(ALL) NOPASSWD: /bin/systemctl" | sudo tee -a /etc/sudoers > /dev/null
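
The file permissions set in the steps above can be verified before the first cron run. The following is a minimal sketch, not part of watchdog.sh: the function name `check_key_perms` is hypothetical, and it assumes GNU `stat` (the `-c` option), as found on most Linux systems.

```shell
#!/bin/bash
# Hypothetical helper (not part of watchdog.sh): succeeds only if the
# given file exists and its mode is exactly 400.
# Assumes GNU stat, which supports the -c format option.
check_key_perms() {
    local file=$1 mode
    if [[ ! -f $file ]]; then
        echo "missing: $file" >&2
        return 1
    fi
    mode=$(stat -c '%a' "$file")
    if [[ $mode != "400" ]]; then
        echo "unexpected mode $mode on $file (expected 400)" >&2
        return 1
    fi
}
```

For example, `check_key_perms "$HOME/pass.key" && echo "pass.key looks fine"` prints the message only when the file exists with mode 400.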

Configuration

Change the variables in the watchdog.sh according to your node settings.

# full path to client binary (NOT node binary)
binary_path=""

# name of the account that signs the transactions
account_name=""

# name of the service file
service_name=""

# maximum block production delay, in seconds,
# that is still considered normal
allowed_delay=""

# RPC endpoint of the node (default tcp://127.0.0.1:26657)
node=""
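
Since an empty variable would make the script fail in confusing ways, a sanity check at the top of the script may help. This is a sketch under assumptions: `check_config` is a hypothetical function that is not in the original script, and it only checks that the five variables above are non-empty and that `allowed_delay` is a whole number.

```shell
#!/bin/bash
# Hypothetical helper (not part of watchdog.sh): returns 0 when all five
# configuration variables are set, and 1 otherwise.
check_config() {
    local name missing=0
    for name in binary_path account_name service_name allowed_delay node; do
        # ${!name} expands the variable whose name is stored in $name
        if [[ -z ${!name} ]]; then
            echo "config error: $name is empty" >&2
            missing=1
        fi
    done
    # allowed_delay is compared arithmetically, so it must be a whole number
    if [[ -n $allowed_delay && ! $allowed_delay =~ ^[0-9]+$ ]]; then
        echo "config error: allowed_delay must be a number of seconds" >&2
        missing=1
    fi
    return "$missing"
}
```

Placed right after the variable block, a line such as `check_config || exit 1` would stop the script before it queries the node with incomplete settings.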

Logging

All restart and unjail events are written to the log file $HOME/watchdog.log.
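
Because each event is logged as a line of the form `<date> : restart` or `<date> : unjail`, the log can be summarized with `grep`. A minimal sketch follows; `count_events` is a hypothetical name, and the default log path is the one used in the cron entry above.

```shell
#!/bin/bash
# Hypothetical helper (not part of watchdog.sh): prints how many lines in
# the log end with " : <event>", where <event> is "restart" or "unjail".
count_events() {
    local event=$1 log=${2:-$HOME/watchdog.log}
    # grep -c prints the number of matching lines (0 when there are none)
    grep -c -- " : ${event}\$" "$log"
}
```

For example, `count_events restart` prints the number of restarts recorded so far, and `count_events unjail /path/to/other.log` reads a different file.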

#!/bin/bash
##### !!! set the variables according to your configuration !!! ######
binary_path="$HOME/go/bin/ag-cosmos-helper"
account_name="user"
service_name="ag-chain-cosmos.service"
allowed_delay=60
node="tcp://127.0.0.1:26657"
#################### end set vars ####################################
# read the keyring password; it is piped to commands that prompt for it
keyring=$(cat "$HOME/pass.key")
# query the node status once and reuse the JSON below
status=$("$binary_path" status --node "$node" 2>&1)
chain_id=$(jq -r '.NodeInfo.network' <<< "$status")
validator=$(echo -e "$keyring" | "$binary_path" keys show "$account_name" --bech val | grep address | awk '{print $2}')
catching=$(jq -r '.SyncInfo.catching_up' <<< "$status")
# act only when the node reports that it is not catching up
if [[ $catching == "false" ]]
then
    t_block=$(date -d "$(jq -r '.SyncInfo.latest_block_time' <<< "$status")" "+%s")
    t_now=$(date "+%s")
    jailed=$("$binary_path" q staking validator "$validator" --node "$node" -oj | jq -r '.jailed')
    # a jailed validator is unjailed first
    if [[ $jailed == "true" ]]
    then
        echo "$(date) : unjail"
        echo -e "$keyring" | "$binary_path" tx slashing unjail --chain-id="$chain_id" --node "$node" --gas=auto --gas-adjustment=1.4 --from "$account_name" -y
        sleep 10
    fi
    # restart the service when the latest block is older than allowed_delay
    if (( t_now - t_block > allowed_delay ))
    then
        echo "$(date) : restart"
        sudo systemctl restart "$service_name"
    fi
fi
unset keyring
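
The staleness check in the script compares the age of the latest block with `allowed_delay`. It can be isolated into small functions so that it can be tried without a running node. This is only a sketch: `block_age` and `is_stalled` are hypothetical names that do not appear in the script, and GNU `date` (the `-d` option) is assumed, as in the script itself.

```shell
#!/bin/bash
# Hypothetical helpers (not part of watchdog.sh).

# block_age BLOCK_TIME [NOW]
# Prints the age of a block in whole seconds. BLOCK_TIME is a timestamp
# such as the latest_block_time field of the status output; NOW is an
# optional epoch time that defaults to the current time.
block_age() {
    local block_time=$1 now=${2:-$(date "+%s")} t_block
    t_block=$(date -d "$block_time" "+%s") || return 1
    echo $(( now - t_block ))
}

# is_stalled BLOCK_TIME ALLOWED_DELAY [NOW]
# Returns 0 when the block is older than ALLOWED_DELAY seconds,
# 1 when it is not, and 2 when BLOCK_TIME cannot be parsed.
is_stalled() {
    local age
    age=$(block_age "$1" "${3:-}") || return 2
    (( age > $2 ))
}
```

With these helpers, the restart condition could be written as `if is_stalled "$latest_block_time" "$allowed_delay"; then ... fi`, where `latest_block_time` is assumed to hold the value read from the status output.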