Created February 4, 2018 17:16
How I made a Datasette out of a feather file

Fantastic data journalism by Christine Zhang at the LA Times:

First I grabbed the data - a zipped feather file:


This produced an arrests.feather file.

Next I made a Python virtual environment and installed the dependencies needed to access that file:

virtualenv --python=python3 venv
source venv/bin/activate
pip install feather-format pandas

Then in Python I used pandas to turn the .feather file into a CSV:

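import feather

# Load the feather file into a pandas DataFrame
df = feather.read_dataframe('arrests.feather')
# Passing a path (rather than an open file handle) lets pandas close the file itself
df.to_csv('arrests.csv')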

Quick inspection...

$ head arrests.csv 

I used vi to add an id column name at the start of the header line:
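
That header edit could also be scripted instead of done by hand. Here is a minimal sketch using only the Python standard library. It assumes the CSV came from pandas with an unnamed index, so the header line starts with a comma; the function name name_index_column is made up for this example.

```python
import os
import shutil
import tempfile


def name_index_column(csv_path, column_name="id"):
    """Prefix the header line with column_name if its first field is empty."""
    directory = os.path.dirname(os.path.abspath(csv_path))
    with open(csv_path, "r", newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        header = src.readline()
        # An unnamed pandas index is written as an empty first field,
        # which leaves the header line starting with a comma.
        if header.startswith(","):
            header = column_name + header
        dst.write(header)
        # Copy the remaining rows through unchanged.
        shutil.copyfileobj(src, dst)
    # Swap the rewritten file into place once it is complete.
    os.replace(dst.name, csv_path)
```

Calling name_index_column('arrests.csv') would rewrite the file in place, and running it a second time leaves the header alone because it no longer starts with a comma.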


Then I used csvs-to-sqlite to build a database, extracting some of the columns into foreign key tables:

csvs-to-sqlite arrests.csv -c gender -c race -c occupation -c charge_code -c charge_desc arrests.db
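
To give a feel for what "extracting a column into a foreign key table" means, here is a rough sketch using Python's built-in sqlite3 module. It is not how csvs-to-sqlite is implemented, and its output schema may well differ; the function name extract_column and the lookup table layout (id, value) are assumptions made purely for illustration.

```python
import sqlite3


def extract_column(conn, table, column):
    """Move the distinct values of table.column into a lookup table
    named after the column, then replace each value with its integer id."""
    # Illustrative only: identifiers are interpolated into the SQL,
    # so this should only ever be called with trusted names.
    conn.execute(
        f'CREATE TABLE "{column}" (id INTEGER PRIMARY KEY, value TEXT UNIQUE)'
    )
    # One lookup row per distinct non-null value.
    conn.execute(
        f'INSERT INTO "{column}" (value) '
        f'SELECT DISTINCT "{column}" FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL'
    )
    # Swap each original value for the id of its lookup row.
    # The column keeps its declared type; a fuller version would rebuild
    # the table with an INTEGER column and a REFERENCES constraint.
    conn.execute(
        f'UPDATE "{table}" SET "{column}" = ('
        f'SELECT lookup.id FROM "{column}" AS lookup '
        f'WHERE lookup.value = "{table}"."{column}")'
    )
    conn.commit()
```

With a table called arrests, extract_column(conn, 'arrests', 'gender') would leave a small gender table holding each distinct value once, and the original rows can be recovered by joining arrests.gender to gender.id.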

I previewed the database using datasette arrests.db, then published it using datasette publish now:

datasette publish now arrests.db \
  --title="LA Times homeless arrests data" \
  --source="LA Times" \

> Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpdq7q__9e/datasette under simonw
> Ready! (copied to clipboard) [36s]
> Synced 3 files (76.57MB) [0ms] 
> Initializing…
> Building

Finally I set up a nicer URL using now alias:

now alias