Simon Willison simonw

simonw / example-Locations.xml
Last active Jun 14, 2019
Convert Locations.kml (pulled from an iPhone backup) to SQLite
<?xml version="1.0" encoding="utf-8"?>
<kml xmlns="">
<name>2015-12-18 19:12:32 Source: WhatsApp</name>
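The conversion script itself isn't shown in this preview, but the Placemark-to-row idea can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the gist's actual code: the `locations` table, its column names, and the OGC KML namespace are all assumptions (the `xmlns` attribute was stripped from the excerpt above).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed namespace; the excerpt's xmlns was stripped, and some exports
# use Google's older namespace instead.
KML_NS = "{http://www.opengis.net/kml/2.2}"

def kml_to_sqlite(kml_path, db_path):
    # Hypothetical schema: one row per Placemark with its name and point
    tree = ET.parse(kml_path)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locations (name TEXT, longitude REAL, latitude REAL)"
    )
    for placemark in tree.iter(KML_NS + "Placemark"):
        name = placemark.findtext(KML_NS + "name", default="")
        coords = placemark.findtext(
            KML_NS + "Point/" + KML_NS + "coordinates", default=""
        )
        if coords.strip():
            # KML coordinate order is longitude,latitude[,altitude]
            lon, lat = coords.strip().split(",")[:2]
            conn.execute(
                "INSERT INTO locations VALUES (?, ?, ?)",
                (name, float(lon), float(lat)),
            )
    conn.commit()
    conn.close()
```

The resulting SQLite file can then be explored with Datasette, which is presumably the point of the conversion.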
simonw / CSV conf CSV schedule.ipynb
Created May 9, 2019
Code for scraping the CSVConf schedule. This is pretty messy - I wrote most of it on a plane with no internet connection, so I had to get it working against the offline data I had accidentally cached.
# For a sample Starlette app
import sqlite3
import sys

from starlette.applications import Starlette
from starlette.responses import JSONResponse

application = Starlette()
simonw / pypi-top-1500.ipynb
simonw /
Created Apr 3, 2019
Fetch metadata from Google Drive API for a list of doc_ids (because their batch API is extremely difficult to figure out)
def fetch_metadata_for_doc_ids(doc_ids, oauth_token):
    # Build a multipart/mixed batch request body for the Drive v3 API
    boundary = 'batch_boundary'
    headers = {
        'Authorization': 'Bearer {}'.format(oauth_token),
        'Content-Type': 'multipart/mixed; boundary=%s' % boundary,
    }
    body = ''
    for doc_id in doc_ids:
        # Each part is one HTTP GET against the standard Drive v3 files endpoint
        req = 'GET /drive/v3/files/{}?fields=*'.format(doc_id)
        body += '--%s\n' % boundary
        body += 'Content-Type: application/http\n\n%s\n\n' % req
    body += '--%s--' % boundary
import csv
from dictdiffer import diff

def load_trees(filepath):
    # Load a CSV export and key each row by its TreeID column
    fp = csv.reader(open(filepath))
    headings = next(fp)
    rows = [dict(zip(headings, line)) for line in fp]
    return {r["TreeID"]: r for r in rows}
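The `dictdiffer` import above suggests the gist goes on to diff two snapshots of the same tree CSV. A stdlib-only sketch of that TreeID-keyed comparison, using plain dict equality rather than `dictdiffer.diff`, with made-up sample records standing in for two dates of the export:

```python
# Two hypothetical snapshots of the tree list, keyed by TreeID as in
# load_trees() above (sample data, not the real CSV).
old = {"1": {"TreeID": "1", "Species": "Oak"}, "2": {"TreeID": "2", "Species": "Elm"}}
new = {"1": {"TreeID": "1", "Species": "Oak"}, "3": {"TreeID": "3", "Species": "Ash"}}

# Trees that appear, disappear, or change between the two snapshots
added = [tree_id for tree_id in new if tree_id not in old]
removed = [tree_id for tree_id in old if tree_id not in new]
changed = [
    tree_id for tree_id in new
    if tree_id in old and new[tree_id] != old[tree_id]
]
```

`dictdiffer.diff` reports the same information at field granularity; the list comprehensions here only show the record-level idea.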
simonw /
Last active Mar 10, 2019
How I created


Try it out at - see this Twitter thread for background.

I started by grabbing the URLs to every downloadable Excel spreadsheet.

I navigated to the "Downloads (Public)" link starting from - then I ran this JavaScript in my browser's console to extract all of the URLs as a JSON blob.

csvs-to-sqlite \
    --table=candidates \
    -c election \
    -f name \
    -f party_name \
    -f post_label

datasette publish heroku democracyclub.db \
    --name="democracyclub-datasette"
simonw / sessions.json
Created Jan 21, 2019
SRCCON sessions from 2018 (just in case they get overwritten for 2019) - from
"day": "Thursday",
"description": "Get your badges and get some food (plus plenty of coffee), as you gear up for the first day of SRCCON!",
"everyone": "y",
"facilitators": "",
"facilitators_twitter": "",
"id": "thursday-breakfast",
"length": "",
"notepad": "",
simonw /
Last active Jan 21, 2019
one-liner

Bash one-liner I used to create

git clone \
    && csvs-to-sqlite toss-up/data/*.csv toss-up.db \
    && datasette publish now toss-up.db \
        --source_url= \
        --install=datasette-vega \
        --install=datasette-cluster-map