Get an email and a text message when any of these monitors fails.
Also, download the New Relic app on your phone so you can see everything on the go.
- A Simple Browser synthetic looking for `eth_connectivity 1` on the metrics page, to prove the node is online and connected to an Ethereum endpoint
- A Simple Browser synthetic looking for my operator ETH address on the diagnostics page
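Both synthetics boil down to "fetch a page, look for a string". If you want to test that logic locally before wiring up New Relic, here is a rough sketch. It assumes the host ports from the docker command in this post (8081 for metrics, 8083 for diagnostics) and that the client serves them under `/metrics` and `/diagnostics`; `0xYOUR_OPERATOR_ADDRESS` is a placeholder.

```shell
#!/usr/bin/env bash
# Local sketch of the two synthetic checks. Both read the page body on stdin.
# Assumed URLs: http://localhost:8081/metrics and http://localhost:8083/diagnostics
# (host ports from the docker run mapping).

# Succeeds if a metrics line reads "eth_connectivity 1" (optionally followed by a timestamp).
check_metrics() {
  grep -Eq '^eth_connectivity 1( |$)'
}

# Succeeds if the diagnostics body contains the given operator address (case-insensitive,
# since addresses may be shown in checksummed mixed case).
check_diagnostics() {
  local address=$1
  grep -qiF -- "$address"
}

# Example usage (placeholder address):
# curl -s http://localhost:8081/metrics | check_metrics && echo "metrics OK"
# curl -s http://localhost:8083/diagnostics | check_diagnostics 0xYOUR_OPERATOR_ADDRESS && echo "diagnostics OK"
```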
I have installed the New Relic infrastructure agent on both of my Linux nodes. The plan is to eventually ship logs back to New Relic via the agent when I get some time. Alert conditions on the infrastructure data:
- Node CPU above 90% for 5 minutes
- Node Memory above 90% for 5 minutes
- Node Disk Used above 80% for 20 minutes
- Node Not Responding
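New Relic evaluates these conditions on its side. The way I read the "for 5 minutes" part is that every data point in the window has to breach the threshold, so a single spike does not fire an alert. The sketch below only illustrates that condition logic (it is not New Relic's implementation), assuming integer percentage samples such as one per minute, plus a helper that reads the current disk usage percentage from POSIX `df` output.

```shell
#!/usr/bin/env bash
# Illustration of "above THRESHOLD for N minutes": succeeds (breach) only if every
# sample in the window is strictly above the threshold. Samples are integer percentages.
sustained_breach() {
  local threshold=$1
  shift
  (( $# > 0 )) || return 1          # an empty window never breaches
  local sample
  for sample in "$@"; do
    (( sample > threshold )) || return 1
  done
  return 0
}

# Current disk usage of a mount point as an integer percentage (column 5 of df -P).
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: five one-minute CPU samples against the 90% condition
# sustained_breach 90 91 95 97 92 99 && echo "CPU above 90% for 5 minutes"
```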
I also have @mutedtommy's Grafana dashboard and monitoring setup in place. Check out his Medium post https://medium.com/@hr12rtk/keep-random-beacon-node-monitoring-grafana-prometheus-and-loki-4a4b669b31ea for how to set this up. He also recently published a script that sets it all up for you automatically, to ease the pain points. Make sure your firewall rules are correct!
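On EC2, "firewall rules" largely means security group ingress rules. As a sketch, the helper below only prints the AWS CLI commands that would open one TCP port to a list of CIDR ranges, so you can review them before running anything. The security group ID and addresses are hypothetical placeholders (203.0.113.0/24 is a documentation range); which ranges your Prometheus server or New Relic monitors actually need is up to you.

```shell
#!/usr/bin/env bash
# Prints (does not run) one "aws ec2 authorize-security-group-ingress" command per CIDR,
# allowing TCP traffic on a single port. Review the output, then pipe it to bash to apply.
sg_ingress_cmds() {
  local group_id=$1 port=$2
  shift 2
  local cidr
  for cidr in "$@"; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
      "$group_id" "$port" "$cidr"
  done
}

# Example (hypothetical IDs): metrics and diagnostics reachable from one monitoring host only
# sg_ingress_cmds sg-0123456789abcdef0 8081 203.0.113.10/32
# sg_ingress_cmds sg-0123456789abcdef0 8083 203.0.113.10/32
```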
Below you can see the port mapping I use to expose the metrics and diagnostics endpoints to New Relic. I also use some security group trickery so that only certain endpoints can talk to my node.
```shell
# Port mappings (host:container): 3919 = node port, 8081 -> 8080 = metrics, 8083 -> 8082 = diagnostics
sudo docker run -dit \
  --restart always \
  --log-driver loki \
  --log-opt loki-url="http://IP:3100/loki/api/v1/push" \
  --volume "$HOME/keep-client:/mnt" \
  --env KEEP_ETHEREUM_PASSWORD="$KEEP_CLIENT_ETHEREUM_PASSWORD" \
  --env LOG_LEVEL=info \
  --name kc \
  -p 3919:3919 \
  -p 8081:8080 \
  -p 8083:8082 \
  keepnetwork/keep-client:v1.3.0 --config /mnt/config/config.toml start
```
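In `config.toml`, the metrics and diagnostics ports are the container-side ports of those mappings:

```toml
[Metrics]
Port = 8080
NetworkMetricsTick = 60
EthereumMetricsTick = 600

[Diagnostics]
Port = 8082
```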
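Since the `Port` values in the config have to match the right-hand side of the `-p` mappings (8080 and 8082 here), a quick sanity check can save some head-scratching. This is a minimal sketch that reads `Port` from a given section with awk, assuming the simple one-`Key = value`-per-line layout of the Keep client config; it is not a general TOML parser.

```shell
#!/usr/bin/env bash
# Prints the Port value from [SECTION] of a simple TOML file (first match only).
# Assumes one "Key = value" per line, as in the Keep client config; not a full TOML parser.
toml_port() {
  local section=$1 file=$2
  awk -v want="[$section]" '
    /^[ \t]*\[/ { gsub(/[ \t]/, ""); in_section = ($0 == want); next }
    in_section && /^[ \t]*Port[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")   # drop "Port = "
      sub(/[^0-9].*$/, "")       # drop anything after the number, e.g. a comment
      print
      exit
    }
  ' "$file"
}

# Example: the config is mounted from $HOME/keep-client, so on the host it lives at
# $HOME/keep-client/config/config.toml
# [ "$(toml_port Metrics "$HOME/keep-client/config/config.toml")" = 8080 ] && echo "matches -p 8081:8080"
```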
I am using https://buidlhub.com/ to monitor my operator address and make sure it holds enough ETH to cover operating costs. It emails me when the balance drops below 1 ETH.
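buidlhub does this for me, but the underlying check is simple: ask an Ethereum node for the balance with the standard JSON-RPC method `eth_getBalance`, which returns the balance in wei as a hex string, and compare it with 1 ETH = 10^18 wei = `0xde0b6b3a7640000`. Bash integers are signed 64-bit, so this sketch compares the number of significant hex digits first and only does arithmetic on values that fit. The RPC URL and address in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Exit 0 if the hex wei balance is below 1 ETH (10^18 wei = 0xde0b6b3a7640000),
# exit 1 if it is 1 ETH or more, exit 2 if the input is not a hex number.
below_one_eth() {
  local hex=${1#0x}
  hex=${hex#0X}
  [[ $hex =~ ^[0-9a-fA-F]+$ ]] || return 2
  while [[ $hex == 0?* ]]; do hex=${hex#0}; done   # strip leading zeros, keep one digit
  # 16+ significant hex digits means >= 16^15 wei (about 1.15 ETH), so not below 1 ETH;
  # this also keeps the arithmetic below within bash's 64-bit range.
  (( ${#hex} <= 15 )) || return 1
  (( 16#$hex < 16#de0b6b3a7640000 ))
}

# Example (placeholder RPC URL and address):
# bal=$(curl -s -X POST -H 'Content-Type: application/json' \
#   --data '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0xYOUR_OPERATOR_ADDRESS","latest"],"id":1}' \
#   https://YOUR_ETH_RPC_URL | sed -E 's/.*"result":"([^"]+)".*/\1/')
# below_one_eth "$bal" && echo "Operator balance is below 1 ETH"
```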
I also have my operator wallet set up in Etherscan to email me on transactions. If your node is involved in any work, you will get an email saying 0 wei has been sent from your wallet as it calls the smart contract functions.
I am currently taking a daily snapshot of my EC2 instance and keeping only the latest one to save money, so I can easily spin it back up if something catastrophic happens. Also, and make sure you're at least doing this, back up your `~/keep-ecdsa/persistence` directory. With a backup of this directory, you can recreate your node and be good to go. I use a cron job on my machine that syncs my persistence directory to S3 storage every 2 hours.
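Keeping only the latest snapshot comes down to "create a new one, wait for it to complete, then delete every older one for the same volume". Here is a sketch of the selection step, assuming input lines of `StartTime SnapshotId` as produced by `aws ec2 describe-snapshots ... --query 'Snapshots[].[StartTime,SnapshotId]' --output text`. AWS timestamps share one ISO 8601 format, so a plain text sort is a time sort. The volume ID in the comments is a hypothetical placeholder.

```shell
#!/usr/bin/env bash
# Reads "StartTime SnapshotId" lines on stdin and prints the IDs of every snapshot
# except the newest one, i.e. the snapshots to delete.
snapshots_to_delete() {
  LC_ALL=C sort -k1,1 | sed '$d' | awk '{ print $2 }'
}

# Possible daily flow (placeholder volume ID):
# new_id=$(aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
#   --description "daily keep node snapshot" --query SnapshotId --output text)
# aws ec2 wait snapshot-completed --snapshot-ids "$new_id"
# aws ec2 describe-snapshots --owner-ids self \
#   --filters Name=volume-id,Values=vol-0123456789abcdef0 \
#   --query 'Snapshots[].[StartTime,SnapshotId]' --output text \
#   | snapshots_to_delete \
#   | while read -r id; do aws ec2 delete-snapshot --snapshot-id "$id"; done
```

Waiting for `snapshot-completed` before deleting means you never end up with zero usable snapshots if the new one fails.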
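This is my cron job, added by running `crontab -e`:

```
0 */2 * * * /home/ubuntu/s3backup.sh
```

And this is `/home/ubuntu/s3backup.sh` (make it executable with `chmod +x /home/ubuntu/s3backup.sh` so cron can run it):

```shell
#!/bin/bash
export PATH="/usr/local/bin:/usr/bin:/bin"  # cron's default PATH may not include the aws CLI
BUCKETNAME="your-bucket-name"  # placeholder: cron does not load your shell profile, so set it here
if aws s3 sync "$HOME/keep-ecdsa/persistence/" "s3://$BUCKETNAME" --delete; then
  echo "$(date) s3 backup job ran successfully" >> /home/ubuntu/persistence_s3_copy.log
else echo "$(date) s3 backup job FAILED" >> /home/ubuntu/persistence_s3_copy.log; fi
```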
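Since everything else here alerts me when something breaks, the backup can get the same treatment. A minimal sketch, assuming the log's last line contains "successfully" only after a good run: the check passes only if `/home/ubuntu/persistence_s3_copy.log` was modified within the last 150 minutes (one 2-hour cron cycle plus 30 minutes of slack, an arbitrary choice) and its last line reports success.

```shell
#!/usr/bin/env bash
# Succeeds if the backup log was written within the last MAX_MINUTES (default 150)
# and its last line reports success.
backup_is_fresh() {
  local log=$1 max_minutes=${2:-150}
  [[ -f $log ]] || return 1
  # find prints the path only if the file was modified less than max_minutes ago
  [[ -n $(find "$log" -mmin "-$max_minutes") ]] || return 1
  tail -n 1 "$log" | grep -q 'successfully'
}

# Example: backup_is_fresh /home/ubuntu/persistence_s3_copy.log || echo "persistence backup is stale"
```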