@snarlysodboxer
Last active July 6, 2018 19:39
dead man switch, and other options
package main

import (
	"flag"
	"fmt"
	"os"
	"syscall"
	"time"
)

var (
	filePath = flag.String("file_path", "/root/file.txt", "The path to the file whose mtime to check")
	ttl      = flag.Int("ttl", 60, "Seconds ago within which the file must have been modified. Older than this will report failure.")
)

func main() {
	flag.Parse()

	stats := &syscall.Stat_t{}
	if err := syscall.Stat(*filePath, stats); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	// Fail when the file's mtime is older than the TTL.
	// Note: the Mtim field of syscall.Stat_t is Linux-specific.
	currentTime := time.Now().Unix() // already an int64
	if currentTime-stats.Mtim.Sec > int64(*ttl) {
		fmt.Println("FAILURE")
		os.Exit(1)
	} else {
		fmt.Println("OK")
	}
}

monitoring cronjobs

Thoughts and approaches when monitoring cronjobs

In my experience there's often a better way to do things than with a cronjob; for some use cases, however, it's the right tool for the job.

  • If it's possible to modify the job itself to push a metric to a metrics system, this often reduces setup, coupling, and moving parts.
    • The metrics system can then alert on a lack of recent data points.
    • I've successfully used this method to monitor database backups, both with Prometheus's Pushgateway and with InfluxDB plus Grafana.
    • This also enables sending along other data points, such as duration information for the job so it can be graphed.
    • Care must be taken not to allow failed metrics code to cause the job to fail.
  • Where it's not reasonable to modify the job, here are a couple of approaches to consider:
    • Replacing the cron entry with a wrapper script that records duration and sends the metric.
      • Signal handling should be implemented and passed through to the child process (the job).
    • Creating a custom metrics exporter that checks on the results of the actions taken by the job.
      • E.g. checking S3 for recent files in the backups directory.
      • Different metrics systems would require different paradigms.
      • With Prometheus, it could be a daemon exposing a /metrics endpoint which, when hit, reaches out to S3 and formats the returned data for Prometheus to consume. Prometheus could then alert on both missing backups and an unreachable metrics exporter.