@afmsavage
Last active September 29, 2020 06:15
Mainnet Keep Monitoring for ECDSA and Random Beacon

Monitoring for ECDSA and Random Beacon

New Relic

Set up email and text-message notifications so you are alerted whenever any of these monitors fails.

Also, install the New Relic app on your phone so you can check everything on the go.

Synthetics

  • Simple Browser synthetic that looks for eth_connectivity 1 on the metrics page, which proves the node is online and connected to an Ethereum endpoint
  • Simple Browser synthetic that looks for my operator ETH address on the diagnostics page
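
The two synthetic checks above can be approximated from any machine that is allowed to reach the node. This is a minimal sketch, not the New Relic configuration itself. The host ports 8081 and 8083 come from the port mapping used in this gist, but the /metrics and /diagnostics paths, the function names, and the placeholder operator address are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical URLs: adjust host, ports, and paths to match your node.
METRICS_URL="${METRICS_URL:-http://localhost:8081/metrics}"
DIAGNOSTICS_URL="${DIAGNOSTICS_URL:-http://localhost:8083/diagnostics}"

# Succeeds if the metrics text on stdin has a line starting with
# "eth_connectivity" whose value is exactly 1 (any trailing fields,
# such as a timestamp, are ignored).
has_eth_connectivity() {
  grep -Eq '^eth_connectivity[[:space:]]+1([[:space:]]|$)'
}

# Succeeds if the diagnostics text on stdin contains the given
# operator address (fixed-string, case-insensitive match).
has_operator_address() {
  grep -Fqi -- "$1"
}

# Usage (not run here):
#   curl -fsS "$METRICS_URL" | has_eth_connectivity \
#     || echo "node offline or not connected to Ethereum"
#   curl -fsS "$DIAGNOSTICS_URL" | has_operator_address "0xYourOperatorAddress" \
#     || echo "operator address not found on diagnostics page"
```

Both functions read from stdin, so they can be tried against a saved copy of either page before pointing them at a live node.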

Infrastructure Agent

I have installed the New Relic infrastructure agent on both of my Linux nodes and alert on the conditions below. When I get some time, I plan to ship logs back to New Relic via the agent as well.

  • Node CPU above 90% for 5 minutes
  • Node Memory above 90% for 5 minutes
  • Node Disk Used above 80% for 20 minutes
  • Node Not Responding

Grafana

I have @mutedtommy's Grafana dashboard and monitoring setup in place. See his Medium post https://medium.com/@hr12rtk/keep-random-beacon-node-monitoring-grafana-prometheus-and-loki-4a4b669b31ea for how to set this up. He also recently published a script that automates the setup to ease the pain points. Make sure your firewall rules are correct!

Run commands

Random Beacon Run CMD

The port mappings below expose the metrics and diagnostics pages to the New Relic endpoints. I also use security groups so that only certain endpoints can talk to my node.

# Port mappings (host:container):
#   3919:3919  node port
#   8081:8080  metrics
#   8083:8082  diagnostics
# A comment after a trailing backslash breaks the line continuation, so the notes live up here.
sudo docker run -dit \
--restart always \
--log-driver loki \
--log-opt loki-url="http://IP:3100/loki/api/v1/push" \
--volume "$HOME/keep-client:/mnt" \
--env KEEP_ETHEREUM_PASSWORD="$KEEP_CLIENT_ETHEREUM_PASSWORD" \
--env LOG_LEVEL=info \
--name kc \
-p 3919:3919 \
-p 8081:8080 \
-p 8083:8082 \
keepnetwork/keep-client:v1.3.0 --config /mnt/config/config.toml start

config.toml example

[Metrics]
    Port = 8080
    NetworkMetricsTick = 60
    EthereumMetricsTick = 600

[Diagnostics]
    Port = 8082

Wallet Monitoring

I am using https://buidlhub.com/ to monitor my operator address and make sure it holds enough ETH to cover operating costs. It alerts me by email when the balance drops below 1 ETH.
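
The same below-1-ETH threshold can be checked by hand as a fallback. This is a minimal sketch, assuming bash. The function name below_one_eth and the ETH_RPC_URL variable are hypothetical. eth_getBalance is the standard Ethereum JSON-RPC method, which returns the balance as a hex string in wei. The comparison is done on the hex digits directly because balances above roughly 9.2 ETH overflow bash's 64-bit integer arithmetic.

```shell
#!/usr/bin/env bash
# 1 ETH = 10^18 wei = 0xde0b6b3a7640000 (15 hex digits).
ONE_ETH_HEX="de0b6b3a7640000"

# Succeeds if the given hex wei balance (e.g. "0x1bc16d674ec80000")
# is strictly below 1 ETH.
below_one_eth() {
  local LC_ALL=C
  local hex="${1#0x}"
  # Normalize to lowercase so string comparison matches ONE_ETH_HEX.
  hex="$(printf '%s' "$hex" | tr 'A-F' 'a-f')"
  # Strip leading zeros, keeping at least one digit.
  while [[ "$hex" == 0?* ]]; do hex="${hex#0}"; done
  # Fewer hex digits means a smaller number; more digits means larger.
  if (( ${#hex} != ${#ONE_ETH_HEX} )); then
    (( ${#hex} < ${#ONE_ETH_HEX} ))
    return
  fi
  # Same length: lexicographic order equals numeric order for lowercase hex.
  [[ "$hex" < "$ONE_ETH_HEX" ]]
}

# Usage (not run here; ETH_RPC_URL and the address are placeholders,
# and extracting "result" from the JSON response is left to a tool such as jq):
#   curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0xYourOperatorAddress","latest"],"id":1}' \
#     "$ETH_RPC_URL"
#   below_one_eth "$balance_hex" && echo "operator balance is below 1 ETH"
```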

I also have my operator wallet set up in Etherscan to email me on transactions. If your node is involved in any work, you will get an email saying that 0 wei has been sent from your wallet, because the node is calling smart contract functions.

Backups

I currently take a daily snapshot of my EC2 instance and keep only the latest one to save money, so I can easily spin the node back up if something catastrophic happens. Whatever else you do, at least back up your ~/keep-ecdsa/persistence directory: with a backup of this directory, you can recreate your node and be good to go. I use a cron job on my machine to sync my persistence directory to S3 storage every 2 hours.

This is the cron job, which I set up by running crontab -e:
0 */2 * * * /home/ubuntu/s3backup.sh

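#!/bin/bash
# /home/ubuntu/s3backup.sh: sync the persistence directory to S3 and log the outcome.
# BUCKETNAME must be set in this script or in the crontab; cron does not load your shell profile.
status="failed"
aws s3 sync ~/keep-ecdsa/persistence/ "s3://$BUCKETNAME" --delete && status="ran successfully"
echo "$(date) s3 backup job $status" >> /home/ubuntu/persistence_s3_copy.log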
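
Restoring is the same sync in the opposite direction. This is a minimal sketch of how the persistence directory could be pulled back onto a fresh machine before starting the client. The function name restore_persistence and its arguments are hypothetical, and it deliberately omits --delete so that nothing already on the new machine is removed.

```shell
#!/usr/bin/env bash
# Restore sketch: copy the backed-up persistence directory from S3.
#   $1 = bucket name
#   $2 = local target directory (defaults to ~/keep-ecdsa/persistence)
restore_persistence() {
  local bucket="$1"
  local target="${2:-$HOME/keep-ecdsa/persistence}"
  # Create the target first, then sync from the bucket into it.
  mkdir -p "$target" && aws s3 sync "s3://$bucket" "$target"
}

# Usage (not run here):
#   restore_persistence "$BUCKETNAME"
```

It is worth running a restore into a scratch directory once in a while to confirm the backup is actually usable.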