simonw/la-times-feather.md

## la-times-feather.md

      
How I made a Datasette out of a feather file

Fantastic data journalism by Christine Zhang at the LA Times: https://github.com/datadesk/homeless-arrests-analysis
First I grabbed the data - a zipped feather file:
wget -O arrests.zip "https://github.com/datadesk/homeless-arrests-analysis/blob/master/arrests.zip?raw=true"
unzip arrests.zip

This produced an arrests.feather file.
Next I made a Python virtual environment and installed the dependencies needed to access that file:
virtualenv --python=python3 venv
source venv/bin/activate
pip install feather-format pandas

Then in Python I used pandas to turn the .feather file into a CSV:
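import feather

# Load the feather file into a pandas DataFrame
df = feather.read_dataframe('arrests.feather')
df.head()  # preview the first few rows
# Write the CSV; the DataFrame index becomes an unnamed first column
df.to_csv('arrests.csv')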

Quick inspection...
$ head arrests.csv 
,booking_num,homeless,arrest_year,arrest_ymd,booking_ymd,gender,race,age,occupation,charge_code,charge_desc
...

I used vi to add an id column name at the start of that line:
id,booking_num,homeless,arrest_year,arrest_ymd,booking_ymd,gender,race,age,occupation,charge_code,charge_desc
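
The same header fix could be scripted instead of done by hand in vi. This is a minimal sketch using only the standard library. The function name `name_index_column` and the sample file names are hypothetical, and the rows are made up rather than taken from the real data:

```python
import csv
import os
import tempfile


def name_index_column(src_path, dest_path, label="id"):
    """Copy a CSV, replacing an empty first header cell with `label`."""
    with open(src_path, newline="") as src, open(dest_path, "w", newline="") as dest:
        reader = csv.reader(src)
        writer = csv.writer(dest, lineterminator="\n")
        header = next(reader)
        if header and header[0] == "":
            header[0] = label  # pandas leaves the index column unnamed
        writer.writerow(header)
        writer.writerows(reader)  # data rows are copied unchanged


# Tiny stand-in for arrests.csv (made-up rows, not the real data)
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample.csv")
dest = os.path.join(workdir, "sample_with_id.csv")
with open(src, "w", newline="") as f:
    f.write(",booking_num,homeless\n0,1001,1\n1,1002,0\n")

name_index_column(src, dest)
with open(dest, newline="") as f:
    fixed_header = f.readline().strip()
    remaining_rows = f.read().splitlines()
print(fixed_header)
```

If you are still in the pandas session, passing `index_label='id'` to `df.to_csv` should give the index column that name directly, with no edit needed afterwards.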

Then I used csvs-to-sqlite to build a database, extracting some of the columns into foreign key tables:
csvs-to-sqlite arrests.csv -c gender -c race -c occupation -c charge_code -c charge_desc arrests.db
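
To show what extracting a column into a foreign key table means, here is a rough sketch using Python's built-in `sqlite3` module. It illustrates the idea only and does not reproduce how csvs-to-sqlite builds its schema. The table layout, the `flat` table name, and the sample rows are all assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Flat table, as a plain CSV import might produce (made-up rows)
conn.execute(
    "CREATE TABLE flat (id INTEGER PRIMARY KEY, booking_num TEXT, occupation TEXT)"
)
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?)",
    [(0, "1001", "cook"), (1, "1002", "driver"), (2, "1003", "cook")],
)

# Lookup table holding each distinct value exactly once
conn.execute("CREATE TABLE occupation (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")
conn.execute(
    "INSERT INTO occupation (value) "
    "SELECT DISTINCT occupation FROM flat ORDER BY occupation"
)

# Main table stores an integer foreign key instead of repeating the text
conn.execute(
    "CREATE TABLE arrests ("
    "id INTEGER PRIMARY KEY, booking_num TEXT, "
    "occupation INTEGER REFERENCES occupation(id))"
)
conn.execute(
    "INSERT INTO arrests "
    "SELECT flat.id, flat.booking_num, occupation.id "
    "FROM flat JOIN occupation ON occupation.value = flat.occupation"
)

# A join recovers the original text values
rows = conn.execute(
    "SELECT arrests.id, arrests.booking_num, occupation.value "
    "FROM arrests JOIN occupation ON arrests.occupation = occupation.id "
    "ORDER BY arrests.id"
).fetchall()
lookup_count = conn.execute("SELECT COUNT(*) FROM occupation").fetchone()[0]
print(rows, lookup_count)
```

Repeated strings are stored once in the lookup table, which tends to shrink the database when a column has few distinct values.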

I previewed the database using datasette arrests.db, then I published it to now.sh using datasette publish now:
datasette publish now arrests.db \
  --title="LA Times homeless arrests data" \
  --source="LA Times" \
  --source_url="https://github.com/datadesk/homeless-arrests-analysis"

> Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpdq7q__9e/datasette under simonw
> Ready! https://datasette-qjopjrpscl.now.sh (copied to clipboard) [36s]
> Synced 3 files (76.57MB) [0ms] 
> Initializing…
> Building
...
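
As an alternative to passing the title and source on the command line, Datasette can also read them from a JSON metadata file. This sketch writes one with the same values. The file name `metadata.json` is my own choice, and the supported keys may vary with your Datasette version:

```python
import json

# Same values as the --title / --source / --source_url options above
metadata = {
    "title": "LA Times homeless arrests data",
    "source": "LA Times",
    "source_url": "https://github.com/datadesk/homeless-arrests-analysis",
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)

# Read the file back to confirm it is valid JSON
with open("metadata.json") as f:
    loaded = json.load(f)
print(loaded["title"])
```

For a local preview, the file can then be passed as `datasette arrests.db --metadata metadata.json`.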

Finally I set up a nicer URL using now alias:
now alias https://datasette-qjopjrpscl.now.sh la-times-homeless-arrests-analysis.now.sh