@jimr
Created December 6, 2013 11:38
Tally up records from httpd access logs by hour of day. Usage: python count_the_hours.py /path/to/access.log.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
import sys
from datetime import datetime as dt

# Capture the timestamp inside [...], leaving out the UTC offset.
# The offset may be positive or negative (e.g. +0000 or -0500).
pattern = re.compile(r'.* \[([^]]*) [+-][0-9]+\] .*')
date_fmt = '%d/%b/%Y:%H:%M:%S'

# hour of day counter, pre-filled so hours with no requests still print as 0
hours = dict.fromkeys(range(24), 0)

with open(sys.argv[1]) as f:
    for line in f:
        m = pattern.match(line)
        if not m:
            print('no match: %s' % line.rstrip('\n'))
            continue
        date = dt.strptime(m.group(1), date_fmt)
        hours[date.hour] += 1

# Print hour and count, tab-separated, in ascending order of count.
for hour, count in sorted(hours.items(), key=lambda item: item[1]):
    print('%s\t%s' % (hour, count))
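To see what the tally looks like without a real log file, here is a minimal sketch that runs the same idea over a few in-memory lines. The sample lines are made up for illustration, `count_hours` is a hypothetical helper name, and the pattern is modeled on the script's but is assumed to accept negative UTC offsets as well. It uses `collections.Counter` instead of a pre-filled dict, so hours with no requests are simply absent rather than printed as 0.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern modeled on the script's; accepting "-" offsets is an assumption.
PATTERN = re.compile(r'.* \[([^]]*) [+-][0-9]+\] .*')
DATE_FMT = '%d/%b/%Y:%H:%M:%S'

# Hypothetical sample lines in Common Log Format, invented for this example.
SAMPLE_LINES = [
    '127.0.0.1 - - [06/Dec/2013:11:38:02 +0000] "GET / HTTP/1.1" 200 512',
    '127.0.0.1 - - [06/Dec/2013:11:59:40 +0000] "GET /a HTTP/1.1" 404 128',
    '127.0.0.1 - - [06/Dec/2013:23:01:15 -0500] "GET /b HTTP/1.1" 200 64',
    'not a log line',
]


def count_hours(lines):
    """Return a Counter mapping hour of day (0-23) to number of matching lines."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if not m:
            continue  # skip lines without a recognisable timestamp
        counts[datetime.strptime(m.group(1), DATE_FMT).hour] += 1
    return counts


counts = count_hours(SAMPLE_LINES)
# Ascending by count, matching the script's output order.
for hour, count in sorted(counts.items(), key=lambda item: item[1]):
    print('%s\t%s' % (hour, count))
```

Note that the hour is taken from the timestamp as written in the log; the UTC offset is matched but not applied, so logs from servers in different timezones are not normalised.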