@nijave
Forked from sumeetpareek/python_apache_logs.py
Last active May 4, 2021 02:08
A very simple Apache access log parser in Python
#!/usr/bin/env python3
import json
import re
import sys
parts = [
    r'(?P<host>\S+)',        # host %h
    r'\S+',                  # ident %l (unused)
    r'(?P<user>\S+)',        # user %u
    r'\[(?P<time>.+)\]',     # time %t
    r'"(?P<request>.*)"',    # request "%r"
    r'(?P<status>[0-9]+)',   # status %>s
    r'(?P<size>\S+)',        # size %b (careful, can be '-')
    r'"(?P<referrer>.*)"',   # referrer "%{Referer}i"
    r'"(?P<agent>.*)"',      # user agent "%{User-agent}i"
]
pattern = re.compile(r'\s+'.join(parts)+r'\s*\Z')
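with open(sys.argv[1], "r") as log_file:
    for line in log_file:
        match = pattern.match(line)
        if match is None:
            # pattern.match() returns None for lines that are not in the expected format
            print(f"skipping unparsable line: {line.rstrip()}", file=sys.stderr)
            continue
        print(json.dumps(match.groupdict()))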
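
The parser above emits every field as a string, and the comment on the size field warns that %b can be '-'. A minimal sketch of one way to turn the fields into typed values follows. The names PATTERN and parse_line are hypothetical and not part of the original script, and the sample line is made up for illustration. It assumes the default %t layout (for example 10/Oct/2000:13:55:36 -0700) and English month abbreviations.

```python
import re
from datetime import datetime

# Same field layout as the parser above, written out as one pattern.
PATTERN = re.compile(
    r'(?P<host>\S+)\s+\S+\s+(?P<user>\S+)\s+\[(?P<time>.+)\]\s+'
    r'"(?P<request>.*)"\s+(?P<status>[0-9]+)\s+(?P<size>\S+)\s+'
    r'"(?P<referrer>.*)"\s+"(?P<agent>.*)"\s*\Z'
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # %b logs '-' instead of 0 when no bytes were sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    # Assumes the default %t layout; stored as an ISO 8601 string so the
    # result can still be passed to json.dumps().
    parsed = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["time"] = parsed.isoformat()
    return fields


# Made-up sample line in the same format the parser expects.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/5.0"'
)
record = parse_line(sample)
```

Returning None for unparsable lines leaves it to the caller to skip, count, or log them. If the server uses a custom time format, the strptime format string would need to change to match.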