@thomasniebler
Created July 12, 2017 08:43
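Reads a file of Indiana request records, keeps only navigation within en.wikipedia.org (referrer and target both on that host), converts each record to a readable CSV line, and writes the result to a new file named "wiki_" plus the input file name.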
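#!/usr/bin/env python3
import struct
import sys
from datetime import datetime

WIKI = b"en.wikipedia.org"

# Read in binary mode: every record starts with a packed 4-byte timestamp.
with open(sys.argv[1], "rb") as infile:
    # Skip the first line; strip only trailing line breaks so that
    # whitespace-valued timestamp bytes at the start of a line survive.
    lines = [line.rstrip(b"\r\n") for line in infile.readlines()[1:]]

# A record spans three lines: header + referrer host, target host, target path.
records = [lines[3 * i:3 * i + 3] for i in range(len(lines) // 3)]

with open("wiki_" + sys.argv[1], "w", encoding="utf-8") as outfile:
    for record in records:
        # Keep only records whose referrer and target are both on en.wikipedia.org.
        if WIKI in record[0] and WIKI in record[1]:
            # Bytes 0-3: little-endian Unix timestamp, shown in local time.
            date = str(datetime.fromtimestamp(struct.unpack("<L", record[0][:4])[0]))
            # Bytes 4-5: two header characters, passed through unchanged.
            flags = record[0][4:6].decode("latin-1")
            referrer = record[0][6:].decode("latin-1")
            target = (record[1] + record[2]).decode("latin-1")
            outfile.write('"%s","%s","%s","%s"\n' % (date, flags, referrer, target))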
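
To try the record handling without a real data file, here is a minimal sketch that builds one synthetic record and decodes it. The byte layout is inferred from the slicing in the script, not from a format specification, and `parse_record`, the sample timestamp, and the two characters "BI" are hypothetical names and values chosen only for illustration.

```python
import struct
from datetime import datetime, timezone


def parse_record(record):
    """Split a three-line record into (date, flags, referrer, target).

    Assumed layout (inferred from the script above):
    line 0 = 4-byte little-endian timestamp + 2 header bytes + referrer host,
    line 1 = target host, line 2 = target path.
    """
    header, host, path = record
    (timestamp,) = struct.unpack("<L", header[:4])
    # UTC is used here so the result does not depend on the local time zone;
    # the script itself converts to local time.
    date = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    flags = header[4:6].decode("latin-1")
    referrer = header[6:].decode("latin-1")
    target = (host + path).decode("latin-1")
    return date, flags, referrer, target


# Hypothetical sample record; the timestamp and the "BI" bytes are made up.
sample = [
    struct.pack("<L", 1499848980) + b"BI" + b"en.wikipedia.org",
    b"en.wikipedia.org",
    b"/wiki/Indiana",
]
date, flags, referrer, target = parse_record(sample)
print(date.isoformat(), flags, referrer, target)
```

Keeping the decoding in one small function makes it easy to check the slicing offsets against a handful of known records before running the script over a full file.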