@trick77 /rigcheck.sh
Last active Feb 23, 2018

Checks for crashed ethOS 1.2.9 mining processes
#!/bin/bash
#
# Auto-reboot script for ethOS 1.2.9
# https://trick77.com/auto-restart-crashed-mining-processes-ethos
#
# This script will automatically reboot the mining rig depending on the reported
# mining status in ethOS.
# - The script should be triggered every 15 minutes from a cron job.
# - May or may not work with ethOS versions other than the one indicated above.
#
# This script should only be used on reasonably stable rigs. Do not use it on rigs that aren't properly
# fine-tuned.
#
DRY_RUN=true # set this to false to enable auto-restart/reboot
LOG_FILE=/home/ethos/rigcheck.log

# Rebooting requires root privileges.
if [ "$EUID" != 0 ]; then
    echo "Please run as root or, if calling it from a console, use sudo $0"
    exit 1
fi

if [ "${DRY_RUN}" = true ]; then
    echo "$(date) $0 running in DRY_RUN mode, auto-reboot not enabled!" | tee -a "${LOG_FILE}"
fi

# Quoted so that an empty or missing allow.file does not break the test.
ALLOW=$(cat /opt/ethos/etc/allow.file)
if [ "${ALLOW}" != 1 ]; then
    echo "$(date) Miner not enabled, exiting $0..." | tee -a "${LOG_FILE}"
    exit 0
fi

if grep -q "gpu clock problem" /var/run/ethos/status.file; then
    CRASHED=$(cat /var/run/ethos/crashed_gpus.file)
    echo "$(date) GPU clock problem detected on GPU(s) ${CRASHED}, rebooting..." | tee -a "${LOG_FILE}"
    if [ "${DRY_RUN}" = false ]; then
        rm -f /var/run/ethos/crashed_gpus.file
        /opt/ethos/bin/r
    fi
elif grep -q "gpu crashed" /var/run/ethos/status.file; then
    echo "$(date) GPU crash detected, rebooting..." | tee -a "${LOG_FILE}"
    if [ "${DRY_RUN}" = false ]; then
        rm -f /var/run/ethos/crashed_gpus.file
        /opt/ethos/bin/r
    fi
else
    echo "Everything's fine, exiting..."
fi

rami84 commented Jan 24, 2018

Thanks for the great work
I just tried it now and it's working for me

yan-ts commented Jan 31, 2018

I found some bugs here. I'll commit my changes whenever I have some time, hopefully this week.

Known Bugs:

  • '/opt/ethos/bin/r' doesn't always work as expected; some validation should be added to check whether the reboot was successful [the reboot is not being executed, probably an ethOS bug].
  • Sometimes '/var/run/ethos/status.file' is not updated [due to an ethOS bug] and reports that everything is OK while there might be a problem with the mem clock, mem state, or miner hashes [no need to reboot in this case].
  • Sometimes there is a 'gpu clock problem' in status.file, but it is shown because of a temporary connection, pool, or driver problem [no need to reboot in this case].
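
One way to avoid rebooting on a temporary 'gpu clock problem' would be to reboot only when the problem shows up on two consecutive runs. This is just a rough sketch, not part of the gist: STATE_FILE and should_reboot are made-up names, and the marker path under /tmp is an assumption. The two status strings are the ones the script already greps for.

```bash
#!/bin/bash
# Sketch: require the same problem on two consecutive runs before rebooting.
# STATE_FILE and should_reboot are hypothetical names, not part of ethOS.
STATE_FILE=${STATE_FILE:-/tmp/rigcheck.pending}

# should_reboot STATUS_FILE
# Returns 0 only if a problem string is present now AND was already
# seen on the previous run; returns 1 in every other case.
should_reboot() {
    local status_file=$1
    if grep -q -e "gpu clock problem" -e "gpu crashed" "$status_file" 2>/dev/null; then
        if [ -e "$STATE_FILE" ]; then
            rm -f "$STATE_FILE"   # second sighting: clear the marker, signal a reboot
            return 0
        fi
        : > "$STATE_FILE"         # first sighting: remember it, wait for the next run
        return 1
    fi
    rm -f "$STATE_FILE"           # status is clean again: forget any pending marker
    return 1
}
```

In the script, the two grep branches could then be guarded with something like `if should_reboot /var/run/ethos/status.file; then ... fi`. With a 15-minute cron interval, a problem would have to persist for roughly 15 minutes before the rig reboots.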

Features to add:

  • Send mail on reboot
  • Send mail on a temporary problem
  • Send mail when the reported hashrate is much higher than the actual pool hashrate
  • For those who have dynamic IPs, send mail when the IP changes
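
The "send mail on reboot" item might look roughly like this. It is only a sketch and assumes a `mail` command (mailx or similar) and a working mail setup on the rig, which ethOS may not ship by default. NOTIFY_TO, MAIL_CMD, and notify are hypothetical names, and the address is a placeholder.

```bash
#!/bin/bash
# Sketch: send a short mail before rebooting.
# NOTIFY_TO, MAIL_CMD and notify are hypothetical names; the address is a placeholder.
NOTIFY_TO=${NOTIFY_TO:-admin@example.com}
MAIL_CMD=${MAIL_CMD:-mail}

# notify SUBJECT BODY
# Pipes BODY to the mail command; returns 1 if the mail command is not available.
notify() {
    local subject=$1 body=$2
    if ! command -v "$MAIL_CMD" >/dev/null 2>&1; then
        echo "$(date) $MAIL_CMD not found, skipping notification" >&2
        return 1
    fi
    printf '%s\n' "$body" | "$MAIL_CMD" -s "$subject" "$NOTIFY_TO"
}
```

It could be called right before `/opt/ethos/bin/r`, for example `notify "rig rebooting" "GPU crash detected"`. A failed notification should probably not block the reboot itself.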

yan-ts commented Jan 31, 2018

@LazyScream:

  1. copy script to /home/ethos/rigcheck.sh
  2. in console: sudo chmod +x /home/ethos/rigcheck.sh
  3. in console: sudo crontab -e
  4. choose 2 to edit crontab in nano
  5. add a new line: '*/1 * * * * /home/ethos/rigcheck.sh' [without apostrophes]; this will run the script every minute
  6. press Ctrl + X and save the new cron job
  7. reboot the system
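
Steps 3 to 6 can probably also be done without opening an editor. The sketch below filters the existing crontab and appends one entry for the script. add_rigcheck_entry, SCRIPT, and SCHEDULE are hypothetical names. The schedule here is every 15 minutes, as the script header suggests; use '*/1 * * * *' for every minute as in step 5.

```bash
#!/bin/bash
# Sketch: add the rigcheck cron entry non-interactively.
# SCRIPT, SCHEDULE and add_rigcheck_entry are hypothetical names.
SCRIPT=/home/ethos/rigcheck.sh
SCHEDULE='*/15 * * * *'

# Reads an existing crontab on stdin and prints it with exactly one
# rigcheck entry: old lines mentioning the script are dropped, a fresh one is appended.
add_rigcheck_entry() {
    grep -vF "$SCRIPT" || true   # grep exits 1 when nothing is left; that is fine here
    printf '%s %s\n' "$SCHEDULE" "$SCRIPT"
}

# Usage (as root, so the entry lands in root's crontab):
#   crontab -l 2>/dev/null | add_rigcheck_entry | crontab -
```

Running the usage line twice should not create duplicate entries, since any earlier line for the script is filtered out first.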

How do I set DRY_RUN=false?
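
The switch is the `DRY_RUN=true` line near the top of the script; changing `true` to `false` in any editor (for example `sudo nano /home/ethos/rigcheck.sh`) enables the reboot. A minimal sketch that does the same from the console, assuming the script was copied to /home/ethos/rigcheck.sh; enable_auto_reboot is a hypothetical helper name:

```bash
#!/bin/bash
# Sketch: flip DRY_RUN=true to DRY_RUN=false in the script.
# enable_auto_reboot is a hypothetical helper; the default path is an assumption.
enable_auto_reboot() {
    local script=${1:-/home/ethos/rigcheck.sh}
    local tmp
    tmp=$(mktemp) || return 1
    # Rewrite via a temp file and copy the content back, so the
    # script keeps its permissions (including the executable bit).
    if sed 's/^DRY_RUN=true/DRY_RUN=false/' "$script" > "$tmp"; then
        cat "$tmp" > "$script"
    fi
    rm -f "$tmp"
}

# Usage (as root): enable_auto_reboot /home/ethos/rigcheck.sh
```

It is probably worth watching /home/ethos/rigcheck.log in DRY_RUN mode for a while before switching, to see how often the script would have rebooted the rig.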
