@JCotton1123
Created July 15, 2014 19:04
Parse an Apache log into a pipe-delimited file
from __future__ import print_function
import sys
import re
# One regex fragment per field of the Apache "combined" log format.
parts = [
    r'(?P<host>\S+)',        # host %h
    r'\S+',                  # ident %l (unused)
    r'(?P<user>\S+)',        # user %u
    r'\[(?P<time>.+)\]',     # time %t
    r'"(?P<request>.+)"',    # request "%r"
    r'(?P<status>[0-9]+)',   # status %>s
    r'(?P<size>\S+)',        # size %b (careful, can be '-')
    r'"(?P<referer>.*)"',    # referer "%{Referer}i"
    r'"(?P<agent>.*)"',      # user agent "%{User-agent}i"
]
# Fields are separated by whitespace; allow trailing whitespace before end of line.
pattern = re.compile(r'\s+'.join(parts) + r'\s*\Z')
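# The with block closes the file on exit, so no explicit f.close() is needed.
with open(sys.argv[1]) as f:
    for line in f:
        m = pattern.match(line)
        if m is None:
            # Report unparseable lines on stderr and keep going.
            print("Unable to parse line %s" % line.rstrip("\n"), file=sys.stderr)
            continue
        res = m.groupdict()
        print("|".join([res['status'], res['request'], res['agent']]))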
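
To see what the script emits for a single line, here is a minimal sketch that wraps the same pattern in a function. The names `to_pipe_delimited` and `size_in_bytes` and the sample log line are made up for illustration and are not part of the original gist. `size_in_bytes` shows one possible way to handle the `'-'` case flagged in the `%b` comment.

```python
import re

# Same field layout as the script above (Apache "combined" log format).
PARTS = [
    r'(?P<host>\S+)',        # host %h
    r'\S+',                  # ident %l (unused)
    r'(?P<user>\S+)',        # user %u
    r'\[(?P<time>.+)\]',     # time %t
    r'"(?P<request>.+)"',    # request "%r"
    r'(?P<status>[0-9]+)',   # status %>s
    r'(?P<size>\S+)',        # size %b (can be '-')
    r'"(?P<referer>.*)"',    # referer
    r'"(?P<agent>.*)"',      # user agent
]
PATTERN = re.compile(r'\s+'.join(PARTS) + r'\s*\Z')


def to_pipe_delimited(line):
    """Return 'status|request|agent' for one log line, or None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    res = m.groupdict()
    return "|".join([res['status'], res['request'], res['agent']])


def size_in_bytes(raw):
    """Hypothetical helper: convert the %b field to an int, treating '-' as 0."""
    return 0 if raw == '-' else int(raw)


# Made-up sample line, for illustration only.
SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/5.0 (X11; Linux x86_64)"\n')

print(to_pipe_delimited(SAMPLE))
# prints: 200|GET /index.html HTTP/1.0|Mozilla/5.0 (X11; Linux x86_64)
```

Note that a request or user agent containing a literal `|` would produce an ambiguous row, so the output is only safely pipe-delimited when those fields do not contain that character.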
@JCotton1123
Author

This may have been borrowed from someone else
