@joswr1ght
Created December 16, 2019 11:45
Convert Apache/Nginx Unified Log Format to CSV
# accesslog2csv: Convert default, unified access log from Apache, Nginx
# servers to CSV format.
#
# Original source by Maja Kraljic, July 18, 2017
# Modified by Joshua Wright to parse all elements in the HTTP request as
# different columns, December 16, 2019
import csv
import re
import sys
# Both the input log and the output CSV path are required.
if len(sys.argv) != 3:
    sys.stderr.write("Usage: %s <access.log> <accesslog.csv>\n" % sys.argv[0])
    sys.exit(1)
log_file_name = sys.argv[1]
csv_file_name = sys.argv[2]
# The UTC offset may be positive or negative (e.g. +0000 or -0500).
pattern = re.compile(r'(?P<host>\S+).(?P<rfc1413ident>\S+).(?P<user>\S+).\[(?P<datetime>\S+ [+-][0-9]{4})]."(?P<httpverb>\S+) (?P<url>\S+) (?P<httpver>\S+)" (?P<status>[0-9]+) (?P<size>\S+) "(?P<referer>.*)" "(?P<useragent>.*)"\s*\Z')
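# newline='' lets the csv module control line endings (avoids blank rows on Windows).
with open(log_file_name) as log, open(csv_file_name, 'w', newline='') as out:
    csv_out = csv.writer(out)
    csv_out.writerow(['host', 'ident', 'user', 'time', 'verb', 'url', 'httpver', 'status', 'size', 'referer', 'useragent'])
    for line in log:
        m = pattern.match(line)
        if m is None:
            # match() returns None for lines that do not fit the pattern; report and skip them.
            sys.stderr.write("Skipping unparsable line: %s\n" % line.rstrip())
            continue
        csv_out.writerow(m.groups())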
@mims92

mims92 commented Oct 21, 2020

Works great, thanks!

@cys2best

It works like a charm, thanks!

@varvir

varvir commented Jan 14, 2024

If pattern.match() returns None, you have to handle it.
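
One way to handle that case, as a minimal sketch rather than the author's method: collect the lines that fail to match instead of calling `.groups()` on `None`, which would raise `AttributeError`. The pattern below follows the script's expression and accepts either sign on the UTC offset. The helper name `parse_lines` and the two sample log lines are hypothetical, made up for illustration.

```python
import re

# Pattern modeled on the script's expression, restated so the sketch is self-contained.
pattern = re.compile(r'(?P<host>\S+).(?P<rfc1413ident>\S+).(?P<user>\S+).\[(?P<datetime>\S+ [+-][0-9]{4})]."(?P<httpverb>\S+) (?P<url>\S+) (?P<httpver>\S+)" (?P<status>[0-9]+) (?P<size>\S+) "(?P<referer>.*)" "(?P<useragent>.*)"\s*\Z')


def parse_lines(lines):
    """Hypothetical helper: return (rows, rejected) instead of crashing on a bad line."""
    rows, rejected = [], []
    for lineno, line in enumerate(lines, start=1):
        m = pattern.match(line)
        if m is None:
            # Keep the line number so unparsable entries can be inspected later.
            rejected.append((lineno, line.rstrip("\n")))
            continue
        rows.append(m.groups())
    return rows, rejected


# Made-up sample input: the second line has no verb/URL/version in the request field.
sample = [
    '203.0.113.7 - - [16/Dec/2019:11:45:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.68.0"\n',
    '203.0.113.7 - - [16/Dec/2019:11:45:01 +0000] "-" 408 0 "-" "-"\n',
]
rows, rejected = parse_lines(sample)
print(len(rows), "parsed;", len(rejected), "rejected")
```

Whether rejected lines should be skipped, logged, or written to a separate file depends on what the CSV is used for; the sketch only keeps them around so that choice can be made by the caller.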
