NYC Data Science Academy -- Feb 7, 2017

CARTO

Feb 7, 2017

Andy Eschbacher, andy@carto.com, @MrEPhysics

Today

  1. What is CARTO?
  2. CARTO's Data Science toolkit
  3. Carto.js JavaScript library demos
  4. Data Science workflow

What is CARTO?

  • It depends on how you use it ;)
    • Builder -- Cloud-based Analysis and mapping platform
    • Engine -- APIs for building custom applications
  • Geo data matters

Maps I like

Data Science projects


Demo analysis workflow

Importing the data

First we need to sync a new dataset. Search for "usgs earthquake spreadsheet" to find the page (here's the link), or use the following dataset of all recorded seismic events from the past 30 days:

http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv

This is updated every 15 minutes, so it's a great candidate for making a map which updates based on measurements in the world. Let's import that into our accounts. Even cooler: the map we publish will be updated automatically, and the underlying analysis changes dynamically based on the new points added to the dataset.
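
For a quick look at the feed outside of CARTO, here is a minimal sketch that uses only Python's standard library. The column names (time, latitude, longitude, depth, mag, place) come from the feed's CSV header; the sample rows are made up for illustration, and `parse_quakes` and `fetch_feed` are hypothetical helper names, not part of any CARTO or USGS library.

```python
import csv
import io
from urllib.request import urlopen

FEED_URL = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv"

# Made-up rows in the feed's column layout, for offline experimentation.
SAMPLE_CSV = """time,latitude,longitude,depth,mag,place
2017-02-06T10:15:00.000Z,36.10,-120.50,8.2,4.6,"10km N of Exampleville, CA"
2017-02-06T11:30:00.000Z,61.20,-150.10,40.0,,"Example Bay, Alaska"
2017-02-06T12:45:00.000Z,34.00,-117.00,5.1,1.2,"5km S of Sampletown, CA"
"""


def parse_quakes(text):
    """Turn feed CSV text into a list of dicts with numeric fields."""
    quakes = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["mag"]:
            continue  # skip rows that have no magnitude value
        quakes.append({
            "time": row["time"],
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "mag": float(row["mag"]),
            "place": row["place"],
        })
    return quakes


def fetch_feed(url=FEED_URL):
    """Download the live feed as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


quakes = parse_quakes(SAMPLE_CSV)
print(len(quakes), "events with a magnitude")
```

To try it on live data, swap `SAMPLE_CSV` for `fetch_feed()`.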

Building the analysis workflow

To build our analysis workflow, let's focus on the output we would like.

Visualize significant, recent earthquake events by communicating the total population who are living within 50 miles of the epicenter.

We can build this analysis workflow in the Builder by chaining together some of the tools.

  1. We need to restrict our analysis to California (within 50 miles of the borders)
  2. Filter to only show significant events
  3. Find the number of people living within 50 miles of the earthquake events
  4. Visualize the points by the number of people (proportional symbol map)
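
To make the logic of steps 2 and 3 concrete, here is a rough sketch in plain Python. It is not how the Builder works internally: it skips the California restriction, treats population as a set of made-up points rather than Census areas, and the magnitude cutoff of 4.0 is an arbitrary assumption. All function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def population_near(quake, places, radius_miles=50):
    """Sum the population of every place within the radius of the quake."""
    return sum(
        p["population"] for p in places
        if haversine_miles(quake["lat"], quake["lon"], p["lat"], p["lon"]) <= radius_miles
    )


def significant_quakes_with_population(quakes, places, min_mag=4.0, radius_miles=50):
    """Keep quakes at or above min_mag and attach the nearby population."""
    results = []
    for q in quakes:
        if q["mag"] < min_mag:
            continue
        results.append(dict(q, population=population_near(q, places, radius_miles)))
    return results


# Made-up inputs for illustration only.
quakes = [
    {"lat": 36.0, "lon": -120.0, "mag": 4.6},
    {"lat": 34.0, "lon": -117.0, "mag": 1.2},
]
places = [
    {"lat": 36.2, "lon": -120.1, "population": 1000},  # roughly 15 miles away
    {"lat": 37.5, "lon": -120.0, "population": 5000},  # roughly 104 miles away
    {"lat": 35.5, "lon": -120.0, "population": 250},   # roughly 35 miles away
]

results = significant_quakes_with_population(quakes, places)
for r in results:
    print(r["mag"], r["population"])
```

The `population` value on each result is what step 4 would map to symbol size.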

Getting California

There are many ways to get the boundary of California. One way is to grab all the states from Natural Earth Data. Another way is to grab it from my CARTO account:

http://eschbacher.cartodb.com/api/v2/sql?q=SELECT+*+FROM+ne_50m_admin_1_states&format=shp&filename=us_states

Copy that link, and import it just like you did for the earthquake dataset above. Finally, we can run an analysis in the Builder to get the boundary into the shape we're interested in: California extended 50 miles beyond its current borders.
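
The link above is a CARTO SQL API call: a query in `q`, plus `format` and `filename` parameters. Here is a small sketch of building the same URL in Python. `sql_api_url` is a hypothetical helper, and the California-only query assumes the imported Natural Earth table has a `name` column, which is worth checking after import.

```python
from urllib.parse import urlencode


def sql_api_url(username, query, fmt="shp", filename=None):
    """Build a SQL API export URL for the given account and query."""
    params = {"q": query, "format": fmt}
    if filename:
        params["filename"] = filename
    # safe="*" keeps the asterisk readable instead of encoding it as %2A
    return "http://{}.cartodb.com/api/v2/sql?{}".format(
        username, urlencode(params, safe="*")
    )


all_states = sql_api_url(
    "eschbacher", "SELECT * FROM ne_50m_admin_1_states", filename="us_states"
)

# Assumes a `name` column holding the state name.
california_only = sql_api_url(
    "eschbacher",
    "SELECT * FROM ne_50m_admin_1_states WHERE name = 'California'",
    filename="california",
)

print(all_states)
print(california_only)
```

Filtering in the query is one alternative to importing every state and filtering afterwards in the Builder.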

Finding Earthquakes that happened in or around California

We can again use the Builder to find all earthquakes which intersect California or are within 50 miles of its border by using the "Aggregate intersection" method.

Filtering out non-significant events

Using the "Filter by column value" analysis method, we can surface only the significant earthquakes (by magnitude).

Finding 50 mile buffers

Using the "Create areas of influence" tool, we can get the regions that are within a 50 mile distance of the epicenter of an earthquake.
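
For reference, a similar buffer can be written as PostGIS SQL, which CARTO runs under the hood. The sketch below only builds the query string; it is not necessarily what the Builder tool executes. The table name `all_month` is an assumption based on the imported file name, `mag` is the feed's magnitude column, and `buffer_query` is a hypothetical helper. Casting to `geography` lets `ST_Buffer` take its distance in meters.

```python
METERS_PER_MILE = 1609.344


def buffer_query(table, miles):
    """Return a PostGIS query that buffers each point by the given miles."""
    meters = miles * METERS_PER_MILE
    return (
        "SELECT cartodb_id, mag, "
        "ST_Buffer(the_geom::geography, {meters:.1f})::geometry AS the_geom "
        "FROM {table}"
    ).format(meters=meters, table=table)


query = buffer_query("all_month", 50)
print(query)
```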

Enrich with the Data Observatory

This analysis allows us to attach valuable information from sources such as the US Census. For our demo we will use the Total Population living within 50 miles of the epicenter of each earthquake.

Visualize by the number of people

Analysis Chain


CARTO's Data Science toolkit

Signing up

Our APIs

JavaScript Libraries

  • Carto.js -- workhorse library for building custom applications off of CARTO's platform
  • Torque.js -- create spatio-temporal maps from your time-series data

Programmatic access / R and Python workflows


Demos with JavaScript


Python workflow with pandas + CARTO APIs

I run a Meetup at CARTO's office called CartoCamp (please drop in sometime :)). We have some Open Office Hours on Friday from 2:30 - 4:30. We're going to do a small part of our last CartoCamp workshop on working with CARTO from a Jupyter notebook.

Check out our repo, which has some past workshops and experiments (which may be broken :P).

Getting the Python environment set up

I like working with virtual environments so we all have the same development environment.

If you already have all the dependencies, don't worry about setting this up (e.g., if you run Anaconda or do sudo pip install ... for all your Python packages).

Clone CartoCamp workshop repo

Clone the CartoCamp workshop repo and move into the directory for today's workshops.

$ git clone https://github.com/CartoCamp/workshops.git
$ cd workshops/2017-01-27-data-sci-maps

Set up environment

If you don't have virtualenv, get it with pip install virtualenv.

$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt
(venv) $ jupyter notebook

This last step should fire up the notebook in your browser. Open the Data Science + Maps.ipynb notebook.

requirements.txt:

cartodb==0.8.1
jupyter==1.0.0
numpy==1.12.0
scipy==0.18.1
pandas==0.19.2
requests==2.13.0
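
With the environment in place, a notebook cell can pull query results straight from the SQL API. The sketch below is not taken from the workshop notebook; it uses only the standard library, the helper names are hypothetical, and the sample payload is made up to mirror the `rows` / `total_rows` layout of the API's JSON responses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def rows_from_payload(text):
    """Extract the list of row dicts from a SQL API JSON response."""
    return json.loads(text)["rows"]


def fetch_rows(username, query):
    """Run a query against an account's SQL API (needs network access)."""
    url = "http://{}.cartodb.com/api/v2/sql?{}".format(
        username, urlencode({"q": query})
    )
    with urlopen(url) as response:
        return rows_from_payload(response.read().decode("utf-8"))


# Made-up response body for offline experimentation.
SAMPLE_PAYLOAD = """
{"rows": [{"cartodb_id": 1, "mag": 4.6}, {"cartodb_id": 2, "mag": 1.2}],
 "time": 0.002,
 "total_rows": 2}
"""

rows = rows_from_payload(SAMPLE_PAYLOAD)
print(rows)
```

In the notebook, `pandas.DataFrame(rows)` turns that list of dicts into a DataFrame.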

Going further

Check out the other notebooks that my colleague Danny and I wrote up for that workshop. Bonus points if you find bugs and PR to fix 'em ;)

Where to go next?
