Last active November 11, 2023 17:42
How to turn a list of JSON objects into a Datasette

The ramadis/delitos-caba repository has a dataset of 184,879 crimes committed in Buenos Aires.

Download the raw data like this:

wget ''

Now use Pandas to load that into a dataframe:

>>> import pandas as pd
>>> df = pd.read_json(open('delitos.json'))
>>> df
                   barrio  cantidad_vehiculos  cantidad_victimas     comuna  \
0                FLORESTA                   2                  0  Comuna 10   
1                 PALERMO                   1                  0  Comuna 14   
2            VILLA CRESPO                   3                  0  Comuna 15   
3       PARQUE AVELLANEDA                   1                  0   Comuna 9   
4        VILLA GRAL MITRE                   1                  0  Comuna 11   
5                FLORESTA                   2                  0  Comuna 10   
6        PARQUE CHACABUCO                   1                  0   Comuna 7   
7                RECOLETA                   4                  0   Comuna 2   
8                  FLORES                   2                  0   Comuna 7   
9           NUEVA POMPEYA                   2                  0   Comuna 4   

Pandas can save that file to a SQLite database like so:

>>> import sqlite3
>>> conn = sqlite3.connect('delitos.db')
>>> df.to_sql('delitos', conn)
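To sanity-check the write, you can query the row count straight back out of SQLite. A minimal sketch, using an in-memory database and two made-up rows in place of the real dataframe:

```python
import sqlite3

import pandas as pd

# Two made-up rows standing in for the real delitos dataframe
df = pd.DataFrame([
    {"barrio": "FLORESTA", "cantidad_victimas": 0, "comuna": "Comuna 10"},
    {"barrio": "PALERMO", "cantidad_victimas": 0, "comuna": "Comuna 14"},
])

# ":memory:" keeps the sketch self-contained; use 'delitos.db' for real
conn = sqlite3.connect(":memory:")
df.to_sql("delitos", conn, index=False)

count = conn.execute("SELECT count(*) FROM delitos").fetchone()[0]
print(count)  # 2
```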

Now you can deploy that SQLite file as a web-based API using datasette publish. You'll need to install Zeit Now and Datasette first.

Include source information with command-line options:

datasette publish now delitos.db --source_url="" --source="ramadis/delitos-caba"
> Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpcagpkk5f/datasette under simonw
> Ready! (copied to clipboard) [6s]
> Synced 2 files (399B) [0ms] 
> ...

A few seconds later, the site is live.

Let's give it a nice URL:

now alias

Now you can visit it at the aliased URL.

Here's an example SQL query showing the most common barrios:
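The query is a straightforward GROUP BY; sketched here against a tiny in-memory stand-in for delitos.db, since the live URL isn't preserved above:

```python
import sqlite3

# Tiny in-memory stand-in for delitos.db, just to demonstrate the query
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delitos (barrio TEXT)")
conn.executemany(
    "INSERT INTO delitos (barrio) VALUES (?)",
    [("FLORESTA",), ("FLORESTA",), ("PALERMO",)],
)

# Most common barrios, highest count first
rows = conn.execute("""
    SELECT barrio, count(*) AS n
    FROM delitos
    GROUP BY barrio
    ORDER BY n DESC
""").fetchall()
print(rows)  # [('FLORESTA', 2), ('PALERMO', 1)]
```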

psychemedia commented Jan 22, 2018

You can also pass a URL straight to pd.read_json, so it should be easy to write a simple Python CLI tool that takes a URL, loads the JSON into a pandas dataframe, then creates the SQLite db as output.
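Sketching that tool out (the function name and structure are mine; pd.read_json does accept URLs as well as local paths):

```python
import sqlite3
import sys

import pandas as pd


def json_to_sqlite(source, db_path, table="delitos"):
    """Load a JSON list of objects (URL or file path) into a SQLite table."""
    df = pd.read_json(source)  # read_json accepts URLs directly
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)
    return len(df)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python json_to_sqlite.py <url-or-path> <output.db>
    print(json_to_sqlite(sys.argv[1], sys.argv[2]))
```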

Alternatively, use Python to download the file rather than wget.
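For the stdlib route, something like this (the function name is mine, and the URL in the usage comment is a placeholder since the gist's original link isn't preserved here):

```python
import shutil
import urllib.request


def download(url, dest):
    """Fetch a URL to a local file with the stdlib, no wget needed."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)

# Usage (placeholder URL): download("https://example.com/delitos.json", "delitos.json")
```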

FWIW, it may also be worth limiting the amount of data ingested at any one time, since pandas holds dataframes in memory.

If the JSON file is just lines of JSON, pandas has a chunksize param that lets you load it in chunks of lines:

The pandas library can also load large CSV files chunkwise:
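The newline-delimited JSON case looks roughly like this, sketched with an in-memory stand-in for a large file (for JSON, chunksize requires lines=True; pd.read_csv takes the same chunksize param):

```python
import io
import sqlite3

import pandas as pd

# In-memory stand-in for a large newline-delimited JSON file
jsonl = io.StringIO(
    '{"barrio": "FLORESTA"}\n{"barrio": "PALERMO"}\n{"barrio": "FLORESTA"}\n'
)

conn = sqlite3.connect(":memory:")
# chunksize makes read_json yield dataframes of at most 2 rows at a time,
# so only one chunk is held in memory; append each to the table as it arrives
for chunk in pd.read_json(jsonl, lines=True, chunksize=2):
    chunk.to_sql("delitos", conn, if_exists="append", index=False)

total = conn.execute("SELECT count(*) FROM delitos").fetchone()[0]
print(total)  # 3
```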

Slowly commented Feb 17, 2020

Yes, you can use pandas read_json, if the data is easy and simple. But if you've got a very deeply nested JSON datasource that isn't a file and has lists mixed in with dictionaries, it won't work. For example, a GraphQL query result from a Tableau Metadata API call.
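For deeply nested sources like that, pandas.json_normalize can flatten lists-of-dicts into rows first. The payload below is a made-up Tableau-style GraphQL result, and the column prefixes are my own choices:

```python
import pandas as pd

# Made-up nested API payload (shaped like a GraphQL result)
payload = {
    "data": {
        "workbooks": [
            {"name": "Sales", "sheets": [{"name": "Q1"}, {"name": "Q2"}]},
            {"name": "Ops", "sheets": [{"name": "Intake"}]},
        ]
    }
}

# json_normalize walks record_path ("sheets") to emit one row per sheet,
# carrying the parent workbook "name" along as metadata
df = pd.json_normalize(
    payload["data"]["workbooks"],
    record_path="sheets",
    meta=["name"],
    record_prefix="sheet_",
    meta_prefix="workbook_",
)
print(len(df))  # 3 rows: Q1, Q2, Intake
```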

simonw commented Sep 19, 2022

Since writing this my sqlite-utils CLI tool has grown the ability to do this as a one-liner:

curl -L '' | \
 sqlite-utils insert delitos.db delitos - --pk id

You can see the schema like this:

sqlite-utils schema delitos.db
CREATE TABLE [delitos] (
   [comuna] TEXT,
   [barrio] TEXT,
   [latitud] FLOAT,
   [longitud] FLOAT,
   [fecha] TEXT,
   [hora] TEXT,
   [uso_arma] TEXT,
   [uso_moto] TEXT,
   [lugar] TEXT,
   [origen_dato] TEXT,
   [tipo_delito] TEXT,
   [cantidad_vehiculos] INTEGER,
   [cantidad_victimas] INTEGER
);