@tfheen
Created April 13, 2016 14:10
* Debian healthcheck of multi-master upstream
Healthcheck:
Once per minute (or some other interval), on each static.d.o node:
- try to fetch .serial from the peers; if that fails:
  - fetch .serial from Fastly ten times
  - keep the maximum value seen
  - if the maximum is greater than the local value:
    - mark the local node as unhealthy
    - purge the service
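A minimal sketch of the fallback path above, for the case where the peers could not be reached. It assumes .serial parses as an integer, and the names collect_serials and fallback_verdict and the three verdict strings are hypothetical, chosen only for illustration. The actual fetch from Fastly is passed in as a callable, so no HTTP client or URL is assumed here.

```python
from typing import Callable, List


def collect_serials(fetch: Callable[[], int], attempts: int = 10) -> List[int]:
    """Call fetch() `attempts` times and keep the serials that came back."""
    serials: List[int] = []
    for _ in range(attempts):
        try:
            serials.append(fetch())
        except (OSError, ValueError):
            # A failed or unparsable response is skipped, not treated as a verdict.
            continue
    return serials


def fallback_verdict(local_serial: int, fastly_serials: List[int]) -> str:
    """Compare the newest serial seen through Fastly with the local one."""
    if not fastly_serials:
        # No usable sample at all: this sketch does not pick a policy here.
        return "unknown"
    if max(fastly_serials) > local_serial:
        # Some other backend is ahead of us, so the local copy is stale.
        return "unhealthy"
    return "healthy"
```

For example, fallback_verdict(5, [5, 6, 5]) returns "unhealthy", since one of the samples is newer than the local serial. Marking the node unhealthy and purging the service are left to the caller.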
Why not mini-nag?
- not per service
- does not check .serial
- mini-nag's http-check could hook into this system
On push:
- purge changed and deleted files, as listed by rsync's --itemize-changes
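As a rough sketch, the list of paths to purge could be pulled out of rsync's --itemize-changes output like this. The function name paths_to_purge is hypothetical, and the sketch assumes rsync was also run with --delete, so that removals show up as "*deleting" lines. Only regular files that were actually transferred are kept. The purge call itself is not shown.

```python
from typing import List


def paths_to_purge(itemized_output: str) -> List[str]:
    """Return paths of transferred or deleted files from --itemize-changes output."""
    paths: List[str] = []
    for line in itemized_output.splitlines():
        # Each line is "<change code> <path>"; split once so that spaces
        # inside the path are preserved.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        code, path = parts
        if code == "*deleting":
            paths.append(path)
        elif len(code) >= 2 and code[0] in "<>" and code[1] == "f":
            # "<" or ">" means the item was transferred, "f" means regular file.
            paths.append(path)
    return paths
```

Lines whose code starts with "." (nothing transferred, at most an attribute change) and directory entries are ignored in this sketch.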
Outstanding problems:
- two bad hosts could both end up with "good" health checks; the risk is
  approximately 1.7E-5 (once a host has failed to reach its peers)
- bootstrapping problem: what happens when all static hosts are down
  and .serial has timed out? Treat a 404/500 for .serial from both the
  peers and Fastly as "we are current"?
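The note does not say how the 1.7E-5 estimate was derived. One reading that reproduces the figure, offered purely as an assumption, is that each of the ten Fastly fetches independently has a 1/3 chance of returning a stale .serial, so that all ten being stale has probability (1/3)^10. The function name all_samples_stale is hypothetical.

```python
def all_samples_stale(p_stale: float, samples: int = 10) -> float:
    """Probability that every one of `samples` independent fetches is stale."""
    return p_stale ** samples


# Assumed per-fetch probability of 1/3; this value is not stated in the note.
risk = all_samples_stale(1 / 3)
print(f"{risk:.1e}")  # prints 1.7e-05
```

Under the same assumption, raising the number of samples lowers the risk geometrically, at the cost of more requests per check.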