Simon Willison simonw

simonw /
Created Apr 3, 2019
Fetch metadata from Google Drive API for a list of doc_ids (because their batch API is extremely difficult to figure out)
def fetch_metadata_for_doc_ids(doc_ids, oauth_token):
    boundary = 'batch_boundary'
    headers = {
        'Authorization': 'Bearer {}'.format(oauth_token),
        'Content-Type': 'multipart/mixed; boundary=%s' % boundary,
    }
    body = ''
    for doc_id in doc_ids:
        # Each part of the multipart batch body is a bare HTTP request
        # line against the Drive v3 files endpoint
        req = 'GET /drive/v3/files/{}?fields=*'.format(doc_id)
        body += '--%s\n' % boundary
import csv
from dictdiffer import diff

def load_trees(filepath):
    fp = csv.reader(open(filepath))
    headings = next(fp)
    rows = [dict(zip(headings, line)) for line in fp]
    return {r["TreeID"]: r for r in rows}
simonw /
Last active Mar 10, 2019
How I created

Try it out at - see this Twitter thread for background.

I started by grabbing the URLs to every downloadable Excel spreadsheet.

I navigated to the "Downloads (Public)" link starting from - then I ran this JavaScript in my browser's console to extract all of the URLs as a JSON blob.

csvs-to-sqlite \
    --table=candidates \
    -c election \
    -f name \
    -f party_name \
    -f post_label \

datasette publish heroku democracyclub.db \
    --name="democracyclub-datasette" \
simonw / sessions.json
Created Jan 21, 2019
SRCCON sessions from 2018 (just in case they get over-written for 2019) - from
    "day": "Thursday",
    "description": "Get your badges and get some food (plus plenty of coffee), as you gear up for the first day of SRCCON!",
    "everyone": "y",
    "facilitators": "",
    "facilitators_twitter": "",
    "id": "thursday-breakfast",
    "length": "",
    "notepad": "",
simonw /
Last active Jan 21, 2019

Bash one-liner I used to create

git clone \
    && csvs-to-sqlite toss-up/data/*.csv toss-up.db \
    && datasette publish now toss-up.db \
        --source_url= \
        --install=datasette-vega \
        --install=datasette-cluster-map \
simonw /
Last active Jan 6, 2019
Demonstrating a bug in Peewee's bm25 function - see
import math
import struct
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- docs is a full-text table (the preview omits its creation; fts4 assumed)
CREATE VIRTUAL TABLE docs USING fts4(c0, c1);
INSERT INTO docs (c0, c1) VALUES ('this is about a dog', 'more about that dog dog');
INSERT INTO docs (c0, c1) VALUES ('this is about a cat', 'stuff on that cat cat');
""")
simonw /
Created Nov 21, 2018
How gargoyle selective exclude rules work
simonw / Dockerfile
Last active Mar 30, 2019
The Dockerfile used by the new Datasette Publish to generate images that are smaller than 100MB
FROM python:3.6-slim-stretch as csvbuilder
# This one uses csvs-to-sqlite to compile the DB, and then uses datasette
# inspect to generate inspect-data.json. Compiling pandas takes way too long
# under alpine so we use slim-stretch for this one instead.
RUN apt-get update && apt-get install -y python3-dev gcc
COPY *.csv csvs/
RUN pip install csvs-to-sqlite datasette
RUN csvs-to-sqlite csvs/names.csv data.db -f "name" -c "legislature" -c "country"
simonw /
Created Oct 31, 2018
How to import a GitHub repository as a subdirectory of a new repository while maintaining commits and datestamps

There is probably a better way to do this, but this worked for me.

I had a repository called docsearch that I had been building a prototype in.

I wanted to move the contents of that repository into an existing repository called search_experiments - but I wanted the contents to live in a docsearch/ subdirectory rather than living in the root of the repo.

I solved this using the combination of git format-patch and git apply.
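The recipe above can be sketched end-to-end with throwaway repositories. The repo names match the post; the exact flags are my assumption about the details, and `git am` (the commit-preserving front end to `git apply`) is used so that each patch becomes a real commit:

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the original prototype repo (docsearch in the post)
git init -q docsearch
cd docsearch
echo "prototype" > notes.txt
git add notes.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "prototype work"

# Step 1: export every commit as a mailbox-format patch file
git format-patch --root -o "$workdir/patches" >/dev/null

# Stand-in for the existing destination repo (search_experiments)
cd "$workdir"
git init -q search_experiments
cd search_experiments
echo "existing" > README.md
git add README.md
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "existing history"

# Step 2: replay the patches, rewriting every path into a docsearch/
# subdirectory; git am drives git apply under the hood and keeps each
# commit's author and datestamp
git -c user.name=demo -c user.email=demo@example.com \
    am --directory=docsearch "$workdir"/patches/*.patch
```

After this, `git log docsearch/` in the destination repo shows the imported commits with their original dates, and the files live under `docsearch/` rather than the repo root.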
