#!/bin/bash
#
# Minimalistic auto-reboot script for ethOS 1.2.9 - 1.3.1
# https://trick77.com/auto-restart-crashed-mining-processes-ethos
#
# This script will automatically reboot the mining rig depending on the reported
# mining status in ethOS.
# - The script should be triggered every 15 minutes from a cron job. Do not run it at shorter intervals.
# - May or may not work with ethOS versions other than those indicated above.
#
# This script should only be used on more or less stable rigs. Do not use it on rigs that aren't properly
# fine-tuned.
#
DRY_RUN=true # set this to false to enable auto-restart/reboot
LOG_FILE=/home/ethos/rigcheck.log

# Abort unless running as root.
if [ "$EUID" -ne 0 ]; then
  echo "Please run as root or, if calling it from a console, use sudo $0"
  exit 1
fi

if [ "${DRY_RUN}" = true ]; then
  echo "$(date) $0 running in DRY_RUN mode, auto-reboot not enabled!" | tee -a "${LOG_FILE}"
fi

# Do nothing if the miner is not enabled (allow.file does not contain 1).
ALLOW=$(cat /opt/ethos/etc/allow.file)
if [ "${ALLOW}" != 1 ]; then
  echo "$(date) Miner not enabled, exiting $0..." | tee -a "${LOG_FILE}"
  exit 0
fi

# Reboot when status.file reports a GPU clock problem or a crashed GPU.
if grep -q "gpu clock problem" /var/run/ethos/status.file; then
  CRASHED=$(cat /var/run/ethos/crashed_gpus.file)
  echo "$(date) GPU clock problem detected on GPU(s) ${CRASHED}, rebooting..." | tee -a "${LOG_FILE}"
  if [ "${DRY_RUN}" = false ]; then
    rm -f /var/run/ethos/crashed_gpus.file
    /opt/ethos/bin/r
  fi
elif grep -q "gpu crashed" /var/run/ethos/status.file; then
  echo "$(date) GPU crash detected, rebooting..." | tee -a "${LOG_FILE}"
  if [ "${DRY_RUN}" = false ]; then
    rm -f /var/run/ethos/crashed_gpus.file
    /opt/ethos/bin/r
  fi
else
  echo "Everything's fine, exiting..."
fi
I found some bugs here. I'll commit my changes when I have some time, hopefully this week.
Known Bugs:
- '/opt/ethos/bin/r' doesn't always work as expected; some validation should be added to check whether the reboot was successful [the reboot is not being executed, probably an ethOS bug].
- Sometimes '/var/run/ethos/status.file' is not updated [due to an ethOS bug] and reports that everything is OK, while there might be a problem with the mem clock / mem state or miner hashes [no need to reboot in this case].
- Sometimes there is a 'gpu clock problem' in status.file, but it is only shown because of a temporary connection / pool or driver problem [no need to reboot in this case].
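
The last two bugs above (a status.file that is not updated, and a 'gpu clock problem' that is only temporary) could be softened by not trusting a single reading. What follows is a minimal sketch, not a tested fix: `is_fresh`, `bump_strikes`, `reset_strikes` and the strike file are hypothetical names, and it assumes GNU `stat` is available and that ethOS rewrites status.file regularly while the rig is healthy, which I have not verified.

```shell
#!/bin/bash
# Sketch only: these helpers and the strike file are hypothetical, not part of ethOS.

# Succeeds if FILE was modified within the last MAX_AGE seconds.
# Uses GNU stat (-c %Y prints the modification time in seconds since the epoch).
is_fresh() {
  local file=$1 max_age=$2 mtime now
  mtime=$(stat -c %Y "$file" 2>/dev/null) || return 1
  now=$(date +%s)
  [ $(( now - mtime )) -le "$max_age" ]
}

# Records one more consecutive problem in STRIKE_FILE and prints the new count.
bump_strikes() {
  local strike_file=$1 count=0
  [ -f "$strike_file" ] && count=$(cat "$strike_file")
  count=$(( count + 1 ))
  echo "$count" > "$strike_file"
  echo "$count"
}

# Clears the counter once the status looks healthy again.
reset_strikes() {
  rm -f "$1"
}
```

One way to use it: in the problem branches, reboot only when `bump_strikes /home/ethos/rigcheck.strikes` prints 2 or more, so the problem has to show up in two consecutive cron runs; call `reset_strikes` in the "Everything's fine" branch; and log a warning instead of trusting the result when `is_fresh /var/run/ethos/status.file 900` fails.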
Features to add:
- Send mail on reboot
- Send mail on a temporary problem
- Send mail when the reported hashrate is much higher than the actual pool hashrate
- For those who have dynamic IPs, send mail when the IP changes
Installation:
- copy the script to /home/ethos/rigcheck.sh
- in console: sudo chmod +x /home/ethos/rigcheck.sh
- in console: sudo crontab -e
- choose 2 to edit the crontab in nano
- add a new line: '*/1 * * * * /home/ethos/rigcheck.sh' [without the apostrophes]; this will run the script every minute
- press Ctrl + X and save the new cron job
- reboot the system
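
The crontab step can also be done without opening an editor. This is a rough sketch, assuming a cron whose `crontab -` reads the new table from stdin; `add_cron_line` is a hypothetical helper, not an ethOS tool. The example entry uses `*/15` rather than `*/1`, because the header of the script says it should be triggered every 15 minutes and not at shorter intervals.

```shell
#!/bin/bash
# Sketch only: add_cron_line is a hypothetical helper, not an ethOS tool.

# Reads an existing crontab on stdin and prints it with LINE appended,
# unless an identical line is already present (so it is safe to run twice).
add_cron_line() {
  local line=$1 current
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}
```

Run as root, it would be used like this: `crontab -l 2>/dev/null | add_cron_line '*/15 * * * * /home/ethos/rigcheck.sh' | crontab -`. Afterwards `crontab -l` should show the new line exactly once.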
How do I set DRY_RUN=false?
@lawr3nc3 You go into the file and change it. If you're using the terminal to edit it:
1.) Open the file using a terminal text editor (I just use vi): vi /home/ethos/rigcheck.sh
2.) Go into edit mode: press "i" and you'll see "-- INSERT --" at the bottom left of the terminal. This means you're in edit mode. Using the arrow keys, navigate to the DRY_RUN=true line and change it to DRY_RUN=false.
3.) To save and exit, press the "Esc" key, type ":wq" (write and quit) and press Enter; you'll see what you type at the bottom left. To exit without saving the changes, press "Esc" and type ":q!" (force quit).
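
If you'd rather not use an editor, the same change can be made with a one-line substitution. A small sketch, assuming GNU `sed` (for the in-place `-i` option) and that the line still starts with `DRY_RUN=true` as in the script above; `enable_reboots` is just a hypothetical wrapper name.

```shell
#!/bin/bash
# Sketch only: enable_reboots is a hypothetical helper name.

# Flips DRY_RUN=true to DRY_RUN=false in the given script, in place.
# Only a line that starts with DRY_RUN=true is changed; everything else,
# including the trailing comment on that line, is kept as it is.
enable_reboots() {
  sed -i 's/^DRY_RUN=true/DRY_RUN=false/' "$1"
}
```

Call it as root with `enable_reboots /home/ethos/rigcheck.sh`, then confirm the result with `grep '^DRY_RUN=' /home/ethos/rigcheck.sh`.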
Hi there,
this does not work for me; ethOS does not show crashed GPUs in status.file. For now I'm using a Python script which reads the ethOS status via the API: https://github.com/jb41997/ethos_auto_reboot
Thanks for the great work
I just tried it now and it's working for me