btcd watchdog
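#!/bin/bash
# Watchdog for btcd: evicts the current sync peer, or restarts the daemon,
# when the block height stops advancing.

POST_INIT_SYNC_DELAY=60   # seconds to wait after starting btcd
POLL_DELAY=60             # seconds between two block-height samples
STALL_THRESHOLD=5         # restart once more than this many peers were evicted

# Start btcd if it is not already running. The quotes keep the test valid
# even when pidof prints several PIDs.
if [ -z "$(pidof btcd)" ]; then
    echo "Starting btcd"
    nohup btcd &
    sleep "$POST_INIT_SYNC_DELAY"
fi

stalls=0
while true; do
    # Sample the block height twice, POLL_DELAY seconds apart.
    start=$(btcctl --notls getinfo | jq -r .blocks)
    sleep "$POLL_DELAY"
    end=$(btcctl --notls getinfo | jq -r .blocks)
    echo "Processed $((end - start)) blocks in the last $POLL_DELAY seconds"

    if [[ "$start" == "$end" ]]; then
        if (( stalls > STALL_THRESHOLD )); then
            echo "Too many stalls detected. Restarting btcd..."
            # Left unquoted on purpose: pidof may print more than one PID.
            kill $(pidof btcd)
            sleep 10
            nohup btcd &
            stalls=0
        else
            # Address (without the port) of the peer btcd is syncing from.
            syncnode=$(btcctl --notls getpeerinfo | jq -r '.[] | select(.syncnode == true) | .addr' | cut -f1 -d:)
            if [ -z "$syncnode" ]; then
                echo "Stall detected, but no syncnode found. Restarting btcd..."
                kill $(pidof btcd)
                sleep 10
                nohup btcd &
                stalls=0
            else
                echo "Stall detected! Evicting potentially bad node $syncnode"
                btcctl --notls node disconnect "$syncnode"
                stalls=$(( stalls + 1 ))
            fi
        fi
    fi
done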
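
The progress message in the script computes `end - start` directly, so it assumes both `btcctl` calls returned a number. If btcd is unreachable, `jq` may print nothing or `null`, and the arithmetic then fails or misreports. A minimal sketch of a guard, assuming plain decimal heights; the helper names `is_height` and `blocks_gained` are hypothetical and not part of the original script:

```bash
#!/bin/bash

# Hypothetical helper: succeed only if $1 is a non-empty string of digits.
is_height() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: print how many blocks were gained between two samples,
# or fail without output if either sample is not a usable height.
blocks_gained() {
    is_height "$1" && is_height "$2" || return 1
    echo $(( $2 - $1 ))
}
```

Inside the loop, something like `if gained=$(blocks_gained "$start" "$end"); then ...; else echo "btcctl returned no height"; fi` would separate "no progress" from "no answer", which the original treats the same way.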

Sjors Dec 14, 2017

For OSX you'll need a replacement for pidof, e.g. brew install pidof.

I also had to remove the --notls bit, otherwise I'd get net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02\x16"
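
Both points could be handled inside the script rather than by editing it per machine. A minimal sketch, assuming `pgrep` is available where `pidof` is not (it ships with macOS and most Linux distributions); the helper names `btcd_pids` and `rpc` and the `BTCCTL_FLAGS` variable are hypothetical, not part of the original gist:

```bash
#!/bin/bash

# Hypothetical helper: list btcd PIDs with pidof when present, else pgrep.
# pgrep -x matches the process name exactly.
btcd_pids() {
    if command -v pidof >/dev/null 2>&1; then
        pidof btcd
    else
        pgrep -x btcd
    fi
}

# Hypothetical knob: export BTCCTL_FLAGS="--notls" only when btcd's RPC
# server is itself running without TLS; leave it empty otherwise.
BTCCTL_FLAGS=${BTCCTL_FLAGS:-}

# Wrapper so every RPC call uses the same flags.
rpc() {
    # Intentionally unquoted so an empty value expands to no arguments.
    btcctl $BTCCTL_FLAGS "$@"
}
```

With this in place, `rpc getinfo | jq -r .blocks` would replace the hard-coded `btcctl --notls getinfo` calls. The bytes in the error message above look like a TLS record, which suggests the server was answering in TLS while the client spoke plain HTTP.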

Sjors Dec 15, 2017

This also won't work if you have multiple instances of btcd running, e.g. one for testnet and one for mainnet, because pidof btcd will just pick the first one.
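
One way around this, sketched under the assumption that the watchdog itself starts each instance, is to record the PID in a per-instance pidfile instead of looking processes up by name. `pidof btcd` matches on the name only (and by default prints every matching PID), so it cannot tell a mainnet process from a testnet one. The function names and pidfile paths here are hypothetical:

```bash
#!/bin/bash

# Hypothetical helper: start a command in the background and remember its PID.
# Usage: start_instance PIDFILE COMMAND [ARGS...]
start_instance() {
    pidfile=$1
    shift
    nohup "$@" >/dev/null 2>&1 &
    echo $! > "$pidfile"
}

# Hypothetical helper: succeed if the process recorded in PIDFILE is alive.
instance_running() {
    [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}

# Hypothetical helper: stop only the process recorded in PIDFILE.
stop_instance() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    kill "$pid" 2>/dev/null || true
    # Reap the child if this shell started it; its exit status is ignored.
    wait "$pid" 2>/dev/null || true
    rm -f "$1"
}
```

For example, `start_instance /tmp/btcd-testnet.pid btcd --testnet` followed later by `stop_instance /tmp/btcd-testnet.pid` would touch only that process. The `btcctl` calls would also need to be pointed at the matching instance.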

guggero Jan 14, 2018

thanks, very useful! I've been trying to sync my btcd for three days now. Hopefully with the watchdog it will now work without interruptions.

bajohns Jan 21, 2018

This is working very well for me; thanks for posting

xelawafs Feb 24, 2018

@Sjors, I have both btcd mainnet and testnet running. By "first one", do you mean whichever of the two services was started first? It seems to be working fine for me with mainnet so far. I had testnet already synced at 100%, shut btcd down, restarted it on mainnet, then resumed testnet.

adiack Apr 20, 2018

Works like a charm, thank you. In my case I only had to remove --notls.
Output of ./watchdog_btcd.sh with shell tracing enabled:

+ POST_INIT_SYNC_DELAY=60
+ POLL_DELAY=60
+ STALL_THRESHOLD=5
++ pidof btcd
+ '[' -z 5465 ']'
+ stalls=0
+ true
++ jq -r .blocks
++ btcctl getinfo
+ start=384672
+ sleep 60
++ btcctl getinfo
++ jq -r .blocks
+ end=384672
+ echo 'Processed 0 blocks in the last 60 seconds'
Processed 0 blocks in the last 60 seconds
+ [[ 384672 == \3\8\4\6\7\2 ]]
+ ((  stalls > STALL_THRESHOLD  ))
++ btcctl getpeerinfo
++ jq -r '.[] | select(.syncnode == true) | .addr'
++ cut -f1 -d:
+ syncnode=217.23.8.80
+ '[' -z 217.23.8.80 ']'
+ echo 'Stall detected! Evicting potentially bad node 217.23.8.80'
Stall detected! Evicting potentially bad node 217.23.8.80
+ btcctl node disconnect 217.23.8.80
2018-04-20 09:28:00.697 [INF] SYNC: Lost peer 217.23.8.80:8333 (outbound)
2018-04-20 09:28:00.697 [INF] SYNC: Syncing to block height 519094 from peer 83.248.113.248:8333
+ stalls=1
+ true
++ jq -r .blocks
++ btcctl getinfo
+ start=384672
+ sleep 60
2018-04-20 09:28:00.977 [INF] SYNC: New valid peer 5.15.98.67:8333 (outbound) (/Satoshi:0.16.0/)
2018-04-20 09:28:01.391 [INF] SYNC: Processed 1 block in the last 7m29.19s (2 transactions, height 384673, 2015-11-21 19:38:21 +0000 UTC)
2018-04-20 09:28:11.851 [INF] SYNC: Processed 3 blocks in the last 10.46s (1207 transactions, height 384676, 2015-11-21 19:47:05 +0000 UTC)
2018-04-20 09:28:24.364 [INF] SYNC: Processed 6 blocks in the last 12.51s (3072 transactions, height 384682, 2015-11-21 20:19:26 +0000 UTC)
2018-04-20 09:28:36.536 [INF] SYNC: Processed 2 blocks in the last 12.17s (3743 transactions, height 384684, 2015-11-21 20:55:52 +0000 UTC)
2018-04-20 09:28:52.387 [INF] SYNC: Processed 4 blocks in the last 15.85s (2171 transactions, height 384688, 2015-11-21 21:24:00 +0000 UTC)
