Gist by @simonw, created February 4, 2018 17:16

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save simonw/db8c55ac87f6b0dea33257a89e731fe4 to your computer and use it in GitHub Desktop.
Save simonw/db8c55ac87f6b0dea33257a89e731fe4 to your computer and use it in GitHub Desktop.
How I made a Datasette out of a feather file

Fantastic data journalism by Christine Zhang at the LA Times: https://github.com/datadesk/homeless-arrests-analysis

First I grabbed the data, which comes as a zipped feather file:

wget "https://github.com/datadesk/homeless-arrests-analysis/blob/master/arrests.zip?raw=true"
mv "arrests.zip?raw=true" arrests.zip
unzip arrests.zip

This produced an arrests.feather file.
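The download and unzip steps above can also be done from Python with only the standard library. This is a minimal sketch: the function names `download_zip` and `extract_feather_files` are hypothetical, and it assumes the `?raw=true` URL still redirects to the raw file (`urllib` follows that redirect).

```python
import urllib.request
import zipfile
from pathlib import Path


def download_zip(url, zip_path):
    # Hypothetical helper: fetch url and save the response body to zip_path
    urllib.request.urlretrieve(url, zip_path)
    return zip_path


def extract_feather_files(zip_path, dest_dir="."):
    """Extract every .feather member of zip_path into dest_dir.

    Returns the extracted file paths as a sorted list of strings.
    """
    dest = Path(dest_dir)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".feather"):
                # ZipFile.extract returns the path it wrote the member to
                extracted.append(zf.extract(name, path=dest))
    return sorted(extracted)


# Usage (needs network access):
# download_zip("https://github.com/datadesk/homeless-arrests-analysis/"
#              "blob/master/arrests.zip?raw=true", "arrests.zip")
# extract_feather_files("arrests.zip")
```

Filtering on the `.feather` suffix skips any other files the archive might contain.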

Next I made a Python virtual environment and installed the dependencies needed to access that file:

virtualenv --python=python3 venv
source venv/bin/activate
pip install feather-format pandas

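Then in Python I used the feather library to load the file as a pandas DataFrame and wrote it out as a CSV:

import feather

# read_dataframe returns a pandas DataFrame
df = feather.read_dataframe('arrests.feather')
df.head()  # first five rows, displayed when run interactively
# Passing a path lets pandas open and close the file itself
df.to_csv('arrests.csv')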
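By default `to_csv` writes the DataFrame's index as an unnamed first column, so the CSV header starts with a bare comma. A minimal sketch of naming that column at export time with `to_csv`'s `index_label` parameter follows. It swaps in `pd.read_feather` (which needs `pyarrow` installed) for the `feather-format` package used above, so it is not what was run here. `feather_to_csv` and `write_csv_with_id` are hypothetical names.

```python
import pandas as pd


def write_csv_with_id(df, csv_path):
    # index_label names the otherwise blank index header, so the
    # CSV starts with "id," instead of ","
    df.to_csv(csv_path, index_label="id")


def feather_to_csv(feather_path, csv_path):
    """Convert a feather file to CSV with the index column named "id".

    Assumes pyarrow is installed, since pd.read_feather relies on it.
    """
    df = pd.read_feather(feather_path)
    write_csv_with_id(df, csv_path)


# Usage: feather_to_csv("arrests.feather", "arrests.csv")
```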

A quick inspection shows that the first column (the DataFrame's index) has no header:

$ head arrests.csv 
,booking_num,homeless,arrest_year,arrest_ymd,booking_ymd,gender,race,age,occupation,charge_code,charge_desc
...

I used vi to add an id column name at the start of that line:

id,booking_num,homeless,arrest_year,arrest_ymd,booking_ymd,gender,race,age,occupation,charge_code,charge_desc
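The manual vi edit can also be scripted. This is a minimal sketch, assuming the first line is the header and starts with an empty field (a leading comma). `name_index_column` is a hypothetical helper. It streams the rest of the file into a temporary copy, so a large CSV never has to fit in memory.

```python
import shutil
import tempfile
from pathlib import Path


def name_index_column(csv_path, name="id"):
    """Give a blank leading CSV header field a name, in place.

    Returns True if the header was rewritten, False if it already
    had a non-empty first field.
    """
    path = Path(csv_path)
    with path.open(newline="") as src:
        header = src.readline()
        if not header.startswith(","):
            return False
        # Write the fixed header, then copy the remaining lines unchanged
        with tempfile.NamedTemporaryFile(
            "w", delete=False, newline="", dir=path.parent
        ) as dst:
            dst.write(name + header)
            shutil.copyfileobj(src, dst)
    # Swap the rewritten copy into place once both files are closed
    Path(dst.name).replace(path)
    return True


# Usage: name_index_column("arrests.csv")
```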

Then I used csvs-to-sqlite to build a database, extracting some of the columns into foreign key tables:

csvs-to-sqlite arrests.csv -c gender -c race -c occupation -c charge_code -c charge_desc arrests.db
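To illustrate what "extracting a column into a foreign key table" means, here is a simplified sketch using Python's `sqlite3`. It is not csvs-to-sqlite's actual implementation, and it does not declare a real foreign key constraint. It assumes the lookup table is named after the column and holds `(id, value)` rows, and that the source column has no declared type, so it can store the integer ids. `extract_column` is a hypothetical name, and table and column names are interpolated directly, so they must be trusted.

```python
import sqlite3


def extract_column(conn, table, column):
    """Move distinct values of table.column into a lookup table.

    Afterwards table.column holds the integer id of each value
    in the lookup table named after the column.
    """
    conn.execute(
        f'CREATE TABLE "{column}" (id INTEGER PRIMARY KEY, value TEXT UNIQUE)'
    )
    # One lookup row per distinct non-null value, inserted in sorted order
    conn.execute(
        f'INSERT INTO "{column}" (value) SELECT DISTINCT "{column}" '
        f'FROM "{table}" WHERE "{column}" IS NOT NULL ORDER BY "{column}"'
    )
    # Replace each text value with the id of its lookup row
    conn.execute(
        f'UPDATE "{table}" SET "{column}" = (SELECT "{column}".id '
        f'FROM "{column}" WHERE "{column}".value = "{table}"."{column}")'
    )
    conn.commit()
```

Joining `arrests.gender = gender.id` then recovers the original text. This is the relationship Datasette uses to show labels instead of raw ids when a foreign key is declared.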

I previewed the database using datasette arrests.db, then I published it to now.sh using datasette publish now:

datasette publish now arrests.db \
  --title="LA Times homeless arrests data" \
  --source="LA Times" \
  --source_url="https://github.com/datadesk/homeless-arrests-analysis"
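Once published, Datasette serves each database as JSON as well as HTML. It accepts a SQL query in the `sql` parameter, and `_shape=array` asks for a plain list of row objects. The sketch below only builds such a URL. It assumes the database is served under its file name (`arrests`) and that the deployment is still reachable. `datasette_sql_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def datasette_sql_url(base_url, database, sql, shape="array"):
    """Build a URL that runs read-only SQL against a Datasette instance."""
    # urlencode percent-escapes the SQL (spaces become "+")
    query = urlencode({"sql": sql, "_shape": shape})
    return f"{base_url.rstrip('/')}/{database}.json?{query}"


# Usage (assumes the instance is live):
# import json, urllib.request
# url = datasette_sql_url("https://la-times-homeless-arrests-analysis.now.sh",
#                         "arrests", "select count(*) from arrests")
# rows = json.load(urllib.request.urlopen(url))
```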

> Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpdq7q__9e/datasette under simonw
> Ready! https://datasette-qjopjrpscl.now.sh (copied to clipboard) [36s]
> Synced 3 files (76.57MB) [0ms] 
> Initializing…
> Building
...

Finally I set up a nicer URL using now alias:

now alias https://datasette-qjopjrpscl.now.sh la-times-homeless-arrests-analysis.now.sh