How to turn a list of JSON objects into a Datasette

This repository has a dataset of 184,879 crimes committed in Buenos Aires:

Download the raw data like this:

wget ''

Now use Pandas to load that into a dataframe:

>>> import pandas as pd
>>> df = pd.read_json('delitos.json')
>>> df
                   barrio  cantidad_vehiculos  cantidad_victimas     comuna  \
0                FLORESTA                   2                  0  Comuna 10   
1                 PALERMO                   1                  0  Comuna 14   
2            VILLA CRESPO                   3                  0  Comuna 15   
3       PARQUE AVELLANEDA                   1                  0   Comuna 9   
4        VILLA GRAL MITRE                   1                  0  Comuna 11   
5                FLORESTA                   2                  0  Comuna 10   
6        PARQUE CHACABUCO                   1                  0   Comuna 7   
7                RECOLETA                   4                  0   Comuna 2   
8                  FLORES                   2                  0   Comuna 7   
9           NUEVA POMPEYA                   2                  0   Comuna 4   

Pandas can save that dataframe to a SQLite database like so:

>>> import sqlite3
>>> conn = sqlite3.connect('delitos.db')
>>> df.to_sql('delitos', conn)
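As a quick sanity check that `to_sql` did its job, here's a sketch run against an in-memory stand-in with a couple of made-up rows (rather than the real delitos.db):

```python
import sqlite3

import pandas as pd

# Tiny stand-in for the real dataframe (made-up rows)
df = pd.DataFrame({"barrio": ["FLORESTA", "PALERMO"]})
conn = sqlite3.connect(":memory:")
df.to_sql("delitos", conn, index=False)

# Confirm the table was created
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['delitos']
```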

Now you can deploy that SQLite file as a web-based API using datasette publish. You'll need to install Zeit Now and Datasette first.

Include source information with command-line options:

datasette publish now delitos.db --source_url="" --source="ramadis/delitos-caba"
> Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpcagpkk5f/datasette under simonw
> Ready! (copied to clipboard) [6s]
> Synced 2 files (399B) [0ms] 
> ...

A few seconds later:

Let's give it a nice URL:

now alias

Now you can visit it at

Here's an example SQL query showing the most common barrios:
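A query along those lines is a `GROUP BY` over the `barrio` column; here's a sketch run against a tiny in-memory stand-in with made-up rows:

```python
import sqlite3

# In-memory stand-in for delitos.db (made-up sample rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delitos (barrio TEXT)")
conn.executemany(
    "INSERT INTO delitos (barrio) VALUES (?)",
    [("FLORESTA",), ("FLORESTA",), ("PALERMO",)],
)

# Count crimes per barrio, most common first -- the same SQL
# you would paste into the Datasette query box
rows = conn.execute(
    "SELECT barrio, count(*) AS n FROM delitos "
    "GROUP BY barrio ORDER BY n DESC"
).fetchall()
print(rows)  # [('FLORESTA', 2), ('PALERMO', 1)]
```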

psychemedia commented Jan 22, 2018

You can also `pd.read_json("")` directly, so it should be easy to write a simple Python CLI tool that takes a URL, loads the JSON into a pandas dataframe, then creates the SQLite db as output.

Alternatively, use Python to download the file rather than wget.
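That one-step tool might look like this sketch (the function name and interface are made up):

```python
import sqlite3

import pandas as pd


def json_to_sqlite(src, db_path, table):
    """Load JSON from a URL or local path into a SQLite table.

    src can be anything pandas.read_json accepts: a URL, a file path,
    or an open file object. Returns the number of rows written.
    """
    df = pd.read_json(src)
    conn = sqlite3.connect(db_path)
    df.to_sql(table, conn, if_exists="replace", index=False)
    conn.close()
    return len(df)
```

Usage would be something like `json_to_sqlite("https://.../delitos.json", "delitos.db", "delitos")`.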

psychemedia commented Jan 22, 2018

FWIW, it may also be worth limiting the amount of data ingested at any one time (pandas holds dataframes in memory), e.g.:

If the JSON file is just lines of JSON, pandas has a chunksize param to let you load it in chunks of lines:

The pandas library can also load large CSV files chunkwise:
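A chunked-ingestion sketch along those lines, assuming newline-delimited JSON (the file name and chunk size are made-up examples; the block writes its own tiny stand-in file):

```python
import json
import sqlite3

import pandas as pd

# Write a tiny stand-in .jsonl file (in practice the large file
# would already exist on disk)
with open("delitos.jsonl", "w") as f:
    for barrio in ["FLORESTA", "PALERMO", "FLORESTA"]:
        f.write(json.dumps({"barrio": barrio}) + "\n")

conn = sqlite3.connect(":memory:")
# chunksize requires lines=True; each chunk is an ordinary DataFrame,
# so only chunksize rows are held in memory at a time
for chunk in pd.read_json("delitos.jsonl", lines=True, chunksize=2):
    chunk.to_sql("delitos", conn, if_exists="append", index=False)

total = conn.execute("SELECT count(*) FROM delitos").fetchone()[0]
print(total)  # 3
```

The same pattern works for big CSVs via `pd.read_csv(..., chunksize=...)`.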

Slowly commented Feb 17, 2020

Yes, you can use pandas read_json - IF it is heeeasssyy and sssiiiiiimple. But if you've got a very deeply nested json datasource that isn't a file and has lists mixed in with dictionaries - it won't work. For example, a graphql query result from a Tableau metadata API call.
