@EvanBalster
Last active April 16, 2023 19:09
WIP: A Google Cloud startup-script to automatically revive preemptible compute instances.
#!/bin/bash
#
# GCloud startup script to auto-restart any instances with 'revive' tag.
# The calling machine must have Read/Write access to compute API!!
# I use this to reboot preemptible instances.
# Output is logged to /tmp/revive.log
indent() { sed 's/^/ /'; }
revive_instances() {
    # Go through the provided string one line at a time.
    # (A for loop over the quoted "$1" would run only once, on the whole string.)
    while IFS= read -r line; do
        echo "$line"
        # Instance name is the first word in the line; zone is the second.
        instance_name=$(echo "$line" | awk '{print $1}')
        instance_zone=$(echo "$line" | awk '{print $2}')
        # Attempt to reboot the instance (stdin detached so gcloud cannot consume the loop input)
        echo "Rebooting '$instance_name' in zone '$instance_zone'..."
        gcloud compute instances start "--zone=$instance_zone" "$instance_name" < /dev/null
    done <<< "$1"
}
auto_reviver() {
    REVIVE_TAG="$1"
    CHECK_INTERVAL="$2"
    LOG_FILE="$3"
    IFS=$'\n'
    date +"%F %T: monitoring instances with revive tag '$REVIVE_TAG', interval $CHECK_INTERVAL" >> "$LOG_FILE"
    while :; do
        # Look for instances with "revive" in their name/tags and TERMINATED status
        offline=$(gcloud compute instances list --format='table(name,zone,status,tags.list())' | grep "$REVIVE_TAG" | grep "TERMINATED")
        if [[ -n "$offline" ]]; then
            # If we found some, reboot them
            date +"%F %T: some instances are down." >> "$LOG_FILE"
            revive_instances "$offline" | indent >> "$LOG_FILE"
        fi
        # Sleep for the check interval
        sleep "$CHECK_INTERVAL"
    done
}
# Make sure revive.log is readable by general users
printf '' >> "/tmp/revive.log"
chmod 644 "/tmp/revive.log"
# Run auto-reviver with tag "revive", check interval 2 minutes, logging
auto_reviver "revive" 120 "/tmp/revive.log"
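The script above finds candidates by grepping the table output, which matches the tag string anywhere on the line, including in instance names. A possible variation, sketched here and not part of the original gist, is to let gcloud do the filtering with `--filter` and print only the name and short zone with `--format`. The filter expression is written to the best of my understanding of gcloud's filter syntax, so verify it against your project first. The function names and the `GCLOUD` override variable are hypothetical; the override exists only so the loop can be tried with a stub. Note that this version matches tags only, not names.

```shell
#!/bin/bash
# Sketch only: server-side filtering instead of grep.
# GCLOUD is a hypothetical override so the functions can be exercised with a stub.
GCLOUD="${GCLOUD:-gcloud}"

# Print "name<TAB>zone" for every TERMINATED instance carrying the given network tag.
list_offline() {
    "$GCLOUD" compute instances list \
        --filter="tags.items=$1 AND status=TERMINATED" \
        --format="value(name,zone.basename())"
}

# Start every instance reported by list_offline.
revive_offline() {
    list_offline "$1" | while read -r name zone; do
        # Skip blank lines, if any.
        [ -n "$name" ] || continue
        echo "Rebooting '$name' in zone '$zone'..."
        # Detach stdin so gcloud cannot swallow the remaining lines.
        "$GCLOUD" compute instances start "--zone=$zone" "$name" < /dev/null
    done
}
```

Inside the monitoring loop, `revive_offline "$REVIVE_TAG" | indent >> "$LOG_FILE"` would take the place of the grep pipeline and the `revive_instances` call.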
@sunk818

sunk818 commented May 24, 2020

So, you need another instance, like an f1-micro, that runs this job and restarts any terminated instances?

How does the f1-micro instance get "Read/Write access to compute API"?

@EvanBalster
Author

So, you need another instance, like an f1-micro, that runs this job and restarts any terminated instances?

How does the f1-micro instance get "Read/Write access to compute API"?

Yes, you run this from another, non-preemptible instance, usually a micro. I seem to recall compute API permissions are added from the micro instance's settings.
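As a rough illustration of the answer above, the access scope can also be set when the monitoring instance is created. The sketch below assumes the `compute-rw` scope alias and the `startup-script` metadata key; the instance name, zone, and script path in the usage line are made-up examples, and `run` and `create_reviver` are hypothetical helpers, not part of the gist. The instance's service account presumably also needs an IAM role that allows starting instances, so check that separately.

```shell
#!/bin/bash
# run: print the command, then execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# Hypothetical helper: create a small, non-preemptible monitor instance
# with read/write Compute API scope and the revive script as its startup script.
create_reviver() {
    name="$1"; zone="$2"; script="$3"
    run gcloud compute instances create "$name" \
        "--zone=$zone" \
        --machine-type=f1-micro \
        --scopes=compute-rw \
        "--metadata-from-file=startup-script=$script"
}
```

For example, `DRY_RUN=1 create_reviver reviver-micro us-central1-a revive.sh` prints the command without running it.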

@adebo4all

Thank you for the script. How can I install it on an f1-micro GCE instance? As a startup script? Is any cron job setup required? Please advise.

@EvanBalster
Author

# GCloud startup script to auto-restart any instances with 'revive' tag.

It's a startup script as indicated right in the source.
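For an instance that already exists, one way to attach the script is through the `startup-script` metadata key. This is a minimal sketch, assuming the script is saved locally; `install_startup_script` and the `GCLOUD` override are hypothetical names, and the instance name and zone in the usage line are made up. Startup scripts run on every boot, and the script loops on its own, so no cron job should be needed.

```shell
#!/bin/bash
# Sketch only. GCLOUD is a hypothetical override so the helper can be tried with a stub.
GCLOUD="${GCLOUD:-gcloud}"

# Hypothetical helper: attach a local file as the startup script of an existing instance.
install_startup_script() {
    instance="$1"; zone="$2"; script="$3"
    # Fail early if the script file is missing or unreadable.
    if [ ! -r "$script" ]; then
        echo "Cannot read '$script'" >&2
        return 1
    fi
    "$GCLOUD" compute instances add-metadata "$instance" \
        "--zone=$zone" \
        "--metadata-from-file=startup-script=$script"
}
```

Usage might look like `install_startup_script reviver-micro us-central1-a revive.sh`, followed by a reboot of that instance so the script starts running.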

@adebo4all

Thanks for the response.

What is your opinion of the new VM instance schedule feature in GCE? I think it could serve the same purpose: since a preemptible instance runs for at most 24 hours, an instance schedule could stop and start it at the end of each 24-hour cycle, if the timing is calculated well and the cost involved is taken into account.

https://cloud.google.com/compute/docs/instances/schedule-instance-start-stop

Please advise

@EvanBalster
Author

I suggest taking that question to a Q&A forum like Stack Overflow.

I haven't done anything with cloud computing in over a year so I have no opinion.
