@MatthaeusHarris
Created May 29, 2014 19:59
HTTP service uptime Nagios plugin
#!/usr/bin/env ruby
# Tweakable knobs
# Which statuses constitute a success?
acceptable_statuses = ["200", "404", "403"]
# How much past history do we consider for each run, in seconds?
history = 60
# Which file are we monitoring?
logfile = './log.txt'
# What are our alert thresholds?
thresholds = {
  :warn => 0.99999,
  :critical => 0.9999
}
begin
  require 'time'
  require 'pp'
  now = Time.now
  log = File.read(logfile).split("\n")
  statuses = {}
  results = {:good => 0, :bad => 0}
  # Fixed clock used for testing. Remove this line for production,
  # otherwise the window never advances past this timestamp.
  now = Time.parse "2014-05-27 10:00:30 UTC"
  # Keep only relevant lines: those at most `history` seconds old.
  log.reject! do |line|
    # This only works if the date is the first field.
    line_time = Time.parse line
    # Reject lines that fall outside the window: too old, or timestamped in the future.
    (now - line_time) > history || line_time > now
  end
  # Parse the log lines to retrieve the HTTP status.
  # This is easy because the status is the last field.
  log.each do |line|
    status = line.split(' ')[-1]
    statuses[status] ||= 0
    statuses[status] += 1
  end
  statuses.each do |status, count|
    if acceptable_statuses.include? status
      results[:good] += count
    else
      results[:bad] += count
    end
  end
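  total = results[:good] + results[:bad]
  if total.zero?
    # No requests in the window: the ratio would be NaN, so report UNKNOWN.
    puts "No log entries in the last #{history} seconds"
    exit(3)
  end
  result = results[:good] / total.to_f
  puts "Service has #{result * 100} % uptime"
  # Generate the exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL.
  exit(0) if result >= thresholds[:warn]
  exit(1) if result >= thresholds[:critical]
  exit(2)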
rescue StandardError => e
  puts "Error in script: #{e.message}"
  puts e.backtrace
  exit(3)
end
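
The windowing and threshold logic above is easier to check when it is pulled out into small functions. The following is a minimal sketch, not part of the original plugin: the helper names `uptime_ratio` and `nagios_exit_code` are hypothetical, the sample log lines are made up for illustration, and it assumes the timestamp occupies the first three whitespace-separated fields (date, time, zone) with the HTTP status as the last field.

```ruby
require 'time'

# Hypothetical helper: fraction of in-window requests with an acceptable status.
# Returns nil when no log line falls inside the window.
def uptime_ratio(lines, now, history, acceptable_statuses)
  recent = lines.select do |line|
    # Assumption: the first three fields are date, time and zone.
    line_time = Time.parse(line.split(' ').first(3).join(' '))
    age = now - line_time
    # Keep lines that are neither in the future nor older than the window.
    age >= 0 && age <= history
  end
  return nil if recent.empty?

  good = recent.count { |line| acceptable_statuses.include?(line.split(' ').last) }
  good.to_f / recent.size
end

# Hypothetical helper: map a ratio to a Nagios exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
def nagios_exit_code(ratio, thresholds)
  return 3 if ratio.nil?
  return 0 if ratio >= thresholds[:warn]
  return 1 if ratio >= thresholds[:critical]
  2
end

# Made-up sample data: one line too old, one in the future, four in the window.
sample = [
  '2014-05-27 09:59:00 UTC GET /old 500',
  '2014-05-27 10:00:00 UTC GET /a 200',
  '2014-05-27 10:00:10 UTC GET /b 404',
  '2014-05-27 10:00:20 UTC GET /c 500',
  '2014-05-27 10:00:25 UTC GET /d 200',
  '2014-05-27 10:01:00 UTC GET /future 500'
]
now = Time.parse('2014-05-27 10:00:30 UTC')
thresholds = { :warn => 0.99999, :critical => 0.9999 }

ratio = uptime_ratio(sample, now, 60, %w[200 404 403])
puts "ratio=#{ratio} exit_code=#{nagios_exit_code(ratio, thresholds)}"
```

Returning `nil` for an empty window keeps the "no traffic" case distinct from "all requests failed", which would otherwise both be awkward to express as a ratio.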