This repository has a dataset of 184.879 crimes committed in Buenos Aires: https://github.com/ramadis/delitos-caba
Download the raw data like this:
wget 'https://github.com/ramadis/delitos-caba/releases/download/3.0/delitos.json'
Now use Pandas to load that into a dataframe:
>>> import pandas as pd
>>> df = pd.read_json(open('delitos.json'))
>>> df
barrio cantidad_vehiculos cantidad_victimas comuna \
0 FLORESTA 2 0 Comuna 10
1 PALERMO 1 0 Comuna 14
2 VILLA CRESPO 3 0 Comuna 15
3 PARQUE AVELLANEDA 1 0 Comuna 9
4 VILLA GRAL MITRE 1 0 Comuna 11
5 FLORESTA 2 0 Comuna 10
6 PARQUE CHACABUCO 1 0 Comuna 7
7 RECOLETA 4 0 Comuna 2
8 FLORES 2 0 Comuna 7
9 NUEVA POMPEYA 2 0 Comuna 4
Pandas can save that file to a SQLite database like so:
>>> import sqlite3
>>> conn = sqlite3.connect('delitos.db')
>>> df.to_sql('delitos', conn)
Now you can deploy that SQLite file as a web-based API using datasette publish. You'll need to install Zeit Now and Datasette first.
Include source information with command-line options:
datasette publish now delitos.db --source_url="https://github.com/ramadis/delitos-caba" --source="ramadis/delitos-caba"
> Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpcagpkk5f/datasette under simonw
> Ready! https://datasette-tkgtsfyjyj.now.sh (copied to clipboard) [6s]
> Synced 2 files (399B) [0ms]
> ...
A few seconds later: https://datasette-tkgtsfyjyj.now.sh
Let's give it a nice URL:
now alias https://datasette-tkgtsfyjyj.now.sh delitos-caba.now.sh
Now you can visit it at https://delitos-caba.now.sh/
Here's an example SQL query showing the most common barrios: https://delitos-caba.now.sh/delitos-6395324?sql=select+%22barrio%22%2C+count%28%2A%29+as+%22count%22+from+delitos++group+by+%22barrio%22+order+by+%22count%22+desc+limit+100
You can also
pd.read_json("https://raw.githubusercontent.com/ramadis/delitos-caba/master/delitos.json")
so should be easy to do a simple python cli tool to take url, load json into pandas dataframe, then create the sqlite db as output.Alternatively, use py to download the file rather than
wget
.