@edsu
Created September 10, 2018 16:40
#!/usr/bin/env python3
"""
Ugly little program to turn a diffengine database into a CSV file. Maybe something
prettier should be part of diffengine itself?
"""
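import re
import csv
import sqlite3

# Wayback Machine archive URLs contain a /web/<timestamp>/ segment
TIMESTAMP = re.compile(r'web/(\d+)')
DIFF_URL = 'http://vbanos-dev.us.archive.org:8092/web/diff/%s/%s/%s'

db = sqlite3.connect("diffengine.db")

# The csv module expects files to be opened with newline=''
with open('diffengine.csv', 'w', newline='') as fh:
    out = csv.writer(fh)
    out.writerow(['url', 'old', 'new', 'diff'])
    for old_id, new_id in db.execute('SELECT old_id, new_id FROM diff'):
        # The entry URL comes from the old version; each version has its own snapshot
        url, old_url = db.execute('SELECT url, archive_url FROM entryversion WHERE id = ?', [old_id]).fetchone()
        new_url = db.execute('SELECT archive_url FROM entryversion WHERE id = ?', [new_id]).fetchone()[0]
        # Skip diffs where either version was never archived
        if not (new_url and old_url):
            continue
        m1 = TIMESTAMP.search(str(old_url))
        m2 = TIMESTAMP.search(str(new_url))
        # Skip archive URLs without a web/<timestamp> segment instead of crashing
        if not (m1 and m2):
            continue
        diff = DIFF_URL % (m1.group(1), m2.group(1), url)
        out.writerow([url, old_url, new_url, diff])

db.close()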
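The loop above issues two extra queries for every row of `diff`. A single query that joins `diff` to `entryversion` twice (once for the old version, once for the new) can fetch the same three values in one pass. This is a minimal sketch, assuming only the columns the script already reads (`diff.old_id`, `diff.new_id`, `entryversion.id`, `entryversion.url`, `entryversion.archive_url`); the cut-down schema, the `diff_rows` and `sample_db` helpers, and the sample rows are hypothetical and only there to make the example runnable.

```python
import sqlite3

# Hypothetical minimal schema: only the tables and columns the script reads.
# The real diffengine database has more columns than this.
SCHEMA = """
CREATE TABLE entryversion (id INTEGER PRIMARY KEY, url TEXT, archive_url TEXT);
CREATE TABLE diff (old_id INTEGER, new_id INTEGER);
"""

# Join entryversion once per side of the diff. As in the script, the entry
# URL is taken from the old version, and diffs where either version lacks
# an archive URL (NULL or empty string) are filtered out.
DIFF_ROWS = """
SELECT ov.url, ov.archive_url, nv.archive_url
FROM diff
JOIN entryversion AS ov ON ov.id = diff.old_id
JOIN entryversion AS nv ON nv.id = diff.new_id
WHERE COALESCE(ov.archive_url, '') != ''
  AND COALESCE(nv.archive_url, '') != ''
"""


def diff_rows(db: sqlite3.Connection) -> list:
    """Return (url, old_archive_url, new_archive_url) for each archived diff."""
    return db.execute(DIFF_ROWS).fetchall()


def sample_db() -> sqlite3.Connection:
    """Build an in-memory database filled with made-up example rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO entryversion VALUES (?, ?, ?)",
        [
            (1, "http://example.com/a", "https://web.archive.org/web/20180901000000/http://example.com/a"),
            (2, "http://example.com/a", "https://web.archive.org/web/20180902000000/http://example.com/a"),
            (3, "http://example.com/a", None),  # never archived
        ],
    )
    # The second diff points at version 3, so the query should skip it.
    db.executemany("INSERT INTO diff VALUES (?, ?)", [(1, 2), (2, 3)])
    return db


if __name__ == "__main__":
    for row in diff_rows(sample_db()):
        print(row)
```

One behavioral difference: the inner joins silently drop a diff whose `old_id` or `new_id` has no matching `entryversion` row, whereas the per-row lookups raise a `TypeError` when `fetchone()` returns `None`.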
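The diff link the script writes is built from the capture timestamps of the two archive snapshots (the digits after `web/` in a Wayback Machine URL such as `https://web.archive.org/web/20180910164000/http://example.com/`) plus the entry URL. The sketch below factors that step into two small functions, assuming the same `web/(\d+)` pattern and the `vbanos-dev.us.archive.org:8092` diff endpoint taken from the script, which is a development host and may no longer be reachable. The function names `wayback_timestamp` and `diff_url` are hypothetical.

```python
import re
from typing import Optional

# Same pattern as the script: the digits following "web/" in an archive URL.
TIMESTAMP = re.compile(r"web/(\d+)")

# Diff endpoint copied from the script; it is a development host and may
# no longer be reachable.
DIFF_BASE = "http://vbanos-dev.us.archive.org:8092/web/diff"


def wayback_timestamp(archive_url: str) -> Optional[str]:
    """Return the digits after web/, or None if the URL has no such segment."""
    match = TIMESTAMP.search(archive_url)
    return match.group(1) if match else None


def diff_url(url: str, old_archive_url: str, new_archive_url: str) -> Optional[str]:
    """Build the diff link for two snapshots, or None if a timestamp is missing."""
    t1 = wayback_timestamp(old_archive_url)
    t2 = wayback_timestamp(new_archive_url)
    if t1 is None or t2 is None:
        return None
    return f"{DIFF_BASE}/{t1}/{t2}/{url}"


if __name__ == "__main__":
    print(diff_url(
        "http://example.com/a",
        "https://web.archive.org/web/20180901000000/http://example.com/a",
        "https://web.archive.org/web/20180902000000/http://example.com/a",
    ))
    # prints http://vbanos-dev.us.archive.org:8092/web/diff/20180901000000/20180902000000/http://example.com/a
```

Calling `.group(1)` directly on the result of `re.search` raises `AttributeError` when there is no match; returning `None` lets the caller skip that row instead.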