@gane5h
Created October 22, 2014 04:06
Nginx log parsing with datadog
"""
Custom parser for nginx log suitable for use by Datadog 'dogstreams'.
To use, add to datadog.conf as follows:
dogstreams: [path to ngnix log (e.g: "/var/log/nginx/access.log"]:[path to this python script (e.g "/usr/share/datadog/agent/dogstream/nginx.py")]:[name of parsing method of this file ("parse")]
so, an example line would be:
dogstreams: /var/log/nginx/access.log:/usr/share/datadog/agent/dogstream/nginx.py:parse
Log of nginx should be defined like that:
log_format time_log '$time_local "$request" S=$status $bytes_sent T=$request_time R=$http_x_forwarded_for';
when starting dd-agent, you can find the collector.log and check if the dogstream initialized successfully
"""
from datetime import datetime
import time
import re

# mapping between Datadog metric names and the values extracted from the log
METRIC_TYPES = {
    'AVERAGE_RESPONSE': 'nginx.net.avg_response',
    'FIVE_HUNDRED_STATUS': 'nginx.net.5xx_status'
}

TIME_REGEX = r"\sT=[-+]?[0-9]*\.?[0-9]+\s*"
TIME_REGEX_SPLIT = re.compile("T=")
STATUS_REGEX = r"\sS=5[0-9]{2}\s"


def parse(log, line):
    if len(line) == 0:
        log.info("Skipping empty line")
        return None
    timestamp = getTimestamp(line)
    avgTime = parseAvgTime(line)
    objToReturn = []
    if isHttpResponse5XX(line):
        # one counter point per 5xx line
        objToReturn.append((METRIC_TYPES['FIVE_HUNDRED_STATUS'], timestamp, 1,
                            {'metric_type': 'counter'}))
    if avgTime is not None:
        objToReturn.append((METRIC_TYPES['AVERAGE_RESPONSE'], timestamp, avgTime,
                            {'metric_type': 'gauge'}))
    return objToReturn


def getTimestamp(line):
    # $time_local is the first whitespace-separated field,
    # e.g. "22/Oct/2014:04:06:00 +0000" splits to "22/Oct/2014:04:06:00"
    line_parts = line.split()
    dt = line_parts[0]
    date = datetime.strptime(dt, "%d/%b/%Y:%H:%M:%S")
    return time.mktime(date.timetuple())


def parseAvgTime(line):
    # extract the request time from the "T=<seconds>" field
    match = re.search(TIME_REGEX, line)
    if match is not None:
        parts = TIME_REGEX_SPLIT.split(match.group(0))
        if len(parts) == 2:
            return float(parts[1])
    return None


def isHttpResponse5XX(line):
    # "S=5xx" marks a server-error response
    return re.search(STATUS_REGEX, line) is not None


if __name__ == "__main__":
    import sys
    import pprint
    import logging
    logging.basicConfig()
    log = logging.getLogger()
    with open(sys.argv[1]) as f:
        lines = f.readlines()
    pprint.pprint([parse(log, line) for line in lines])
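As a quick sanity check, a line in the time_log format above can be probed with the same regexes the parser uses. This is a minimal standalone sketch; the sample request, status, and timing values are invented for illustration:

```python
import re

# Assumed sample line matching the time_log format from the docstring:
# $time_local "$request" S=$status $bytes_sent T=$request_time R=$http_x_forwarded_for
sample = '22/Oct/2014:04:06:00 "GET /api HTTP/1.1" S=502 1234 T=0.123 R=10.0.0.1'

# "T=<seconds>" carries the request time; "S=5xx" flags a server error.
req_time = float(re.search(r"\sT=([-+]?[0-9]*\.?[0-9]+)", sample).group(1))
is_5xx = re.search(r"\sS=5[0-9]{2}\s", sample) is not None

print(req_time)  # 0.123
print(is_5xx)    # True
```

For this line, parse() would emit both a 5xx counter point and an avg-response gauge point at the parsed timestamp.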
@mattbillenstein

counters from dogstreams don't get aggregated across flush intervals -- this leads to misleading results. Don't use this gist.

@gravyboat

@mattbillenstein Did you make a version of this that handles flush intervals?

@jfacevedo

@mattbillenstein is there a better option to handle this?

@nicknovitski

Dogstream isn't part of the new v6 datadog agent (yet?), but you can read the aggregating code in v5 for yourself. The comment there seems misleading, but I read the actual logic as: in a given invocation of the check, for a given combination of metric name, timestamp (rounded to the nearest 15 seconds), hostname, and "device name" (?), only the last value is sent, except for counters, which are explicitly summed.

It seems to me that only counters are usable in this approach, but I don't know what it would mean for them to not get aggregated across flush intervals, so there might be a problem I'm not seeing. I also don't know why the dogstream check does its own aggregation when I thought that was exactly what statsd/dogstatsd is for. This seems to make building request latency histograms from nginx logs unworkable. And the new datadog log aggregating product doesn't have metrics processing either. 😕
