
## NYC Data Sci -- Feb 7, 2017

      
CARTO

Feb 7, 2017

Andy Eschbacher, andy@carto.com, @MrEPhysics
Link to this: http://bit.ly/nyc-data-sci-carto

Today


What is CARTO?


Maps I like
Data science projects
Demo analysis workflow


CARTO's Data Science toolkit


Signing up
Our APIs
Our JS libraries
Programmatic access


Carto.js JavaScript library demos
Data Science workflow


Getting Python env setup
Clone CartoCamp workshop repo
Setup environment
Contents of requirements.txt
Going further!

What is CARTO?


It depends on how you use it ;)

Builder -- cloud-based analysis and mapping platform
Engine -- APIs for building custom applications


Geo data matters

Maps I like


Urban Reviewer
Tweets mentioning sunrise
Carto Calculator
Evictions in San Francisco
L Train project
Other projects
NYC Henge

Data Science projects


Animated maps to complement analysis:  http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/
Our L Train analysis: https://carto.com/blog/looking-at-the-l/
Ben Wellington:

Illegal Parking Ticketing


Demo analysis workflow

Importing the data

First, we need to sync a new dataset. Search for "usgs earthquake spreadsheet" to find the USGS feeds page, or use the following dataset of all recorded seismic events from the past 30 days:
http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv
The feed is updated every 15 minutes, so it's a great candidate for a map that updates as new measurements come in. Let's import it into our accounts. Even better, the map we publish will update automatically, and the underlying analysis will update dynamically as new points are added to the dataset.
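A minimal sketch of what the import step reads, using only the standard library. It assumes the feed's CSV header includes `time`, `latitude`, `longitude`, `mag`, and `place` columns (check the real header before relying on it), and it leaves the download as a comment because that needs network access.

```python
from __future__ import annotations

import csv
import io

# USGS "past 30 days" feed from the text above.
FEED_URL = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv"


def parse_quakes(text: str) -> list[dict]:
    """Parse the feed's CSV text into dicts with numeric lat/lon/mag.

    Assumption: the column names used below match the USGS CSV header.
    Rows without a magnitude are skipped.
    """
    quakes = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row.get("mag"):
            continue  # some events are reported without a magnitude
        quakes.append({
            "time": row["time"],
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "mag": float(row["mag"]),
            "place": row.get("place", ""),
        })
    return quakes


# Usage (requires network access):
# from urllib.request import urlopen
# with urlopen(FEED_URL) as resp:
#     quakes = parse_quakes(resp.read().decode("utf-8"))
```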
Building the analysis workflow

To build our analysis workflow, let's focus on the output we would like.

Visualize significant, recent earthquake events by communicating the total population who are living within 50 miles of the epicenter.

We can build this analysis workflow in the Builder by chaining together some of the tools.

Restrict our analysis to California (within 50 miles of its borders)
Filter to only show significant events
Find the number of people living within 50 miles of each earthquake
Visualize the points by the number of people (proportional symbol map)

Getting California

There are many ways to get the boundary of California. One way is to grab all the states from Natural Earth Data. Another is to grab them from my CARTO account:
http://eschbacher.cartodb.com/api/v2/sql?q=SELECT+*+FROM+ne_50m_admin_1_states&format=shp&filename=us_states

Copy that link and import it just like you did for the earthquake dataset above. Then we can use Builder analyses to narrow it down to California and extend the boundary 50 miles beyond its current borders.
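If you'd rather script this step than click through Builder, a rough equivalent is a PostGIS query sent through the SQL API. This is a sketch under assumptions, not what Builder runs internally: it assumes the Natural Earth table stores state names in a `name` column, and it casts `the_geom` to `geography` so that `ST_Buffer` takes its distance in meters.

```python
from urllib.parse import urlencode

METERS_PER_MILE = 1609.344


def miles_to_meters(miles: float) -> float:
    return miles * METERS_PER_MILE


def buffered_state_query(state: str, miles: float,
                         table: str = "ne_50m_admin_1_states") -> str:
    """Build a PostGIS query that buffers one state's boundary by `miles`."""
    # Assumption: state names live in a `name` column of the Natural Earth table.
    safe_state = state.replace("'", "''")  # escape quotes for the SQL literal
    meters = miles_to_meters(miles)
    # Casting to geography makes ST_Buffer treat the distance as meters;
    # casting back to geometry returns an ordinary geometry column.
    return (
        f"SELECT name, ST_Buffer(the_geom::geography, {meters:.1f})::geometry AS the_geom "
        f"FROM {table} WHERE name = '{safe_state}'"
    )


def sql_api_export_url(username: str, query: str, fmt: str = "shp",
                       filename: str = "export") -> str:
    """SQL API URL that downloads the query result in the given format."""
    params = urlencode({"q": query, "format": fmt, "filename": filename})
    return f"https://{username}.carto.com/api/v2/sql?{params}"


url = sql_api_export_url("eschbacher", buffered_state_query("California", 50),
                         filename="california_50mi")
```

The resulting URL can be imported the same way as the states link above.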
Finding Earthquakes that happened in or around California

We can again use the Builder to find all earthquakes which intersect California or are within 50 miles of its border by using the "Aggregate intersection" method.
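Conceptually, keeping the earthquakes that fall inside the buffered California boundary is a point-in-polygon test. Builder does this in the database; the plain-Python ray-casting sketch below only illustrates the idea. It treats longitude/latitude as planar x/y, assumes the polygon is a single simple ring of `(lon, lat)` vertices, and the rectangle in the test is a hypothetical stand-in, not California's real boundary.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar x/y


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray from pt crosses an edge."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def quakes_in_region(quakes: List[dict], ring: Sequence[Point]) -> List[dict]:
    """Keep quake dicts (with 'lon'/'lat' keys) whose epicenter is inside ring."""
    return [q for q in quakes if point_in_polygon((q["lon"], q["lat"]), ring)]
```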
Filtering out non-significant events

Using the "Filter by column value" analysis method, we can surface only the significant earthquakes (by magnitude).
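The "Filter by column value" step amounts to a threshold on the magnitude column. In this sketch, the 4.5 cutoff is an assumption for illustration, not a USGS definition of "significant"; pick whatever threshold suits your map.

```python
from typing import Iterable, List


def filter_by_magnitude(quakes: Iterable[dict], min_mag: float = 4.5) -> List[dict]:
    """Keep events whose 'mag' is present and >= min_mag (4.5 is an assumed cutoff)."""
    return [q for q in quakes if q.get("mag") is not None and q["mag"] >= min_mag]
```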
Finding 50 mile buffers

Using the "Create areas of influence" tool, we can get the regions within 50 miles of each earthquake's epicenter.
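A 50-mile "area of influence" around a point is essentially a circular buffer. The sketch below approximates one as a ring of vertices using the spherical destination-point formula. It assumes a spherical Earth with a mean radius of 3958.8 miles and uses 64 vertices; both are choices for illustration, not what Builder necessarily uses.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; assumes a spherical Earth


def buffer_ring(lat: float, lon: float, radius_miles: float = 50.0,
                n: int = 64) -> List[Tuple[float, float]]:
    """Approximate a circular buffer as n (lon, lat) vertices on a sphere."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    delta = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    ring = []
    for i in range(n):
        theta = 2 * math.pi * i / n  # bearing, clockwise from north
        # Spherical "destination point" formula
        phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                         + math.cos(phi1) * math.sin(delta) * math.cos(theta))
        lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                                 math.cos(delta) - math.sin(phi1) * math.sin(phi2))
        lon2 = (math.degrees(lam2) + 540) % 360 - 180  # normalize to [-180, 180)
        ring.append((lon2, math.degrees(phi2)))
    return ring
```

The first vertex lies due north of the epicenter and the middle one due south, each 50 miles away along a great circle.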
Enrich with the Data Observatory

This analysis lets us attach information from sources such as the US Census. For our demo, we will use the total population living within 50 miles of each earthquake's epicenter.
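The Data Observatory does this enrichment server-side against real Census geometries. As a rough mental model only, the sketch below sums the population of hypothetical `(lat, lon, population)` tract centroids whose haversine distance from the epicenter is at most 50 miles. Counting each tract wholly in or out by its centroid is a simplification and not necessarily how the Data Observatory computes its totals.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; assumes a spherical Earth


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def population_within(epicenter: Tuple[float, float],
                      tracts: Iterable[Tuple[float, float, int]],
                      radius_miles: float = 50.0) -> int:
    """Sum population of hypothetical (lat, lon, pop) centroids within radius_miles."""
    lat0, lon0 = epicenter
    return sum(pop for lat, lon, pop in tracts
               if haversine_miles(lat0, lon0, lat, lon) <= radius_miles)
```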
Visualize by the number of people
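For a proportional symbol map, a common convention is to scale each circle's area, not its radius, with the value, so the radius grows with the square root of the population. Builder has its own size-by-value styling; this sketch only illustrates the square-root scaling, and the 30-pixel maximum radius is an arbitrary assumption.

```python
import math
from typing import List


def symbol_radii(values: List[float], max_radius: float = 30.0) -> List[float]:
    """Radius for each value so that circle *area* is proportional to the value.

    The largest value gets max_radius (an assumed pixel size); negatives clamp to 0.
    """
    vmax = max(values, default=0)
    if vmax <= 0:
        return [0.0 for _ in values]
    return [max_radius * math.sqrt(max(v, 0) / vmax) for v in values]
```

With radius-proportional scaling instead, a place with four times the population would get a circle with sixteen times the area, which visually exaggerates the difference.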


CARTO's Data Science toolkit

Signing up


Go to https://nycdatascience.carto.com/signup

Our APIs


SQL API (write SQL, get data via HTTP requests)
Data Services API

Geocoding, routing, etc.
Nice demos here: https://carto.com/location-data-services/


Data Observatory API

Functions for getting measures (World censuses, Zillow, etc.)
Functions for getting boundaries
Catalog of available data: https://cartodb.github.io/bigmetadata/index.html
Find things like median income at our location: SELECT OBS_GetMeasure(CDB_LatLng(40.7529671,-73.9927947), 'us.census.acs.B19013001'::text)
Explain Subway stops: https://jkrauss.carto.com/builder/18644096-65c3-11e6-adbe-0ee66e2c9693/embed


Maps API

For fetching map tiles for use with Google Maps API, Leaflet.js, OpenLayers, etc.
Generating image previews on the fly (aka static maps)
For example: http://mamataakella.carto.com/api/v1/map/static/named/tpl_25aaad68_7134_11e6_afae_0e3ebc282e83/1600/1600.png?zoom=5&lat=40.7529777&lon=-73.994722


Import API

Send data to your CARTO account
Set up sync tables (they respond to upstream changes in the data, so you can create dynamic maps)


Spatial Analysis

geo-statistics using PySAL, scikit-learn, and others
tons more coming soon
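Because the Data Observatory functions are plain SQL, you can call them through the SQL API and read the JSON it returns. The sketch reuses the median-income query from the list above. It makes several assumptions: the JSON response lists result rows under a `rows` key; the `median_income` alias is our own choice, added so the column name is predictable; Data Observatory calls may need an authenticated request with your API key; and `your_username` / `YOUR_API_KEY` are placeholders. The network call is left as a comment.

```python
import json
from urllib.parse import urlencode

# Data Observatory query from the list above, aliased so the result column
# has a predictable name (the alias is our choice, not required by the API).
DO_QUERY = (
    "SELECT OBS_GetMeasure(CDB_LatLng(40.7529671, -73.9927947), "
    "'us.census.acs.B19013001'::text) AS median_income"
)


def sql_api_url(username: str, query: str, api_key: str = "") -> str:
    """SQL API URL for a query; api_key is only added if given."""
    params = {"q": query}
    if api_key:
        # Assumption: Data Observatory calls may need an authenticated request.
        params["api_key"] = api_key
    return f"https://{username}.carto.com/api/v2/sql?{urlencode(params)}"


def first_value(response_text: str, column: str):
    """Return `column` from the first row of a SQL API JSON response, or None.

    Assumes the response JSON lists result rows under a "rows" key.
    """
    rows = json.loads(response_text).get("rows", [])
    return rows[0].get(column) if rows else None


# Usage (needs network access and a CARTO account; names are placeholders):
# from urllib.request import urlopen
# with urlopen(sql_api_url("your_username", DO_QUERY, api_key="YOUR_API_KEY")) as resp:
#     print(first_value(resp.read().decode("utf-8"), "median_income"))
```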


JavaScript Libraries


Carto.js -- workhorse library for building custom applications off of CARTO's platform
Torque.js -- create spatio-temporal maps from your time-series data

Programmatic access / R and Python workflows


r2cartodb: https://rpubs.com/walkerke/r2cartodb
CARTO's Python Client: https://github.com/CartoDB/carto-python
I'm trying to figure out how to build on top of Python's pandas

See some of my experiments here: https://github.com/ohasselblad/cartopandas


R's read.csv and pandas' read_csv work with our SQL API when the format parameter is set to csv

Format: http://eschbacher.carto.com/api/v2/sql?q=SELECT+*+FROM+awesome_table&format=csv
Use with Plotly: https://plot.ly/ipython-notebooks/cartodb/
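Spaces and other special characters in the SQL need to be URL-encoded before the URL is handed to `read_csv`. Below is a small helper sketch that reuses the `awesome_table` example name (substitute your own table and username). The pandas call is left as a comment because it needs pandas installed and network access.

```python
from urllib.parse import urlencode


def sql_api_csv_url(username: str, query: str) -> str:
    """Build a SQL API URL that returns the query result as CSV."""
    params = urlencode({"q": query, "format": "csv"})  # encodes spaces, '*', etc.
    return f"https://{username}.carto.com/api/v2/sql?{params}"


url = sql_api_csv_url("eschbacher", "SELECT * FROM awesome_table")
# With pandas installed and network access:
# import pandas as pd
# df = pd.read_csv(url)
```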


Demos with JavaScript


DC Elevation App -- see index.html on this gist
Charts!

We made this last week at a little meetup I run


Python workflow with pandas + CARTO APIs

I run a Meetup at CARTO's office called CartoCamp (please drop in sometime :)). We have some Open Office Hours on Friday from 2:30 - 4:30. We're going to do a small part of our last CartoCamp workshop on working with CARTO from a Jupyter notebook.
Check out our repo, which has some past workshops and experiments (which may be broken :P).
Getting Python environment setup

I like working with virtual environments so we all have the same development environment.
If you already have all the dependencies, don't worry about setting this up (e.g., if you run Anaconda or do sudo pip install ... for all your Python packages).
Clone CartoCamp workshop repo

Clone the CartoCamp workshop repo and move into the directory for today's workshops.
```bash
$ git clone https://github.com/CartoCamp/workshops.git
$ cd workshops/2017-01-27-data-sci-maps
```
Setup environment

If you don't have virtualenv, get it with `pip install virtualenv`.
```bash
$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt
(venv) $ jupyter notebook
```
This last step should open Jupyter in your browser. Open `Data Science + Maps.ipynb`.
requirements.txt:

```
cartodb==0.8.1
jupyter==1.0.0
numpy==1.12.0
scipy==0.18.1
pandas==0.19.2
requests==2.13.0
```

Going further

Check out the other notebooks that my colleague Danny and I wrote for that workshop. Bonus points if you find bugs and submit a PR to fix 'em ;)
Where to go next?


Check out some of the other projects in that workshops repo (e.g., charts + maps is pretty cool)
Builder materials: https://carto.com/learn/guides
CartoCamps!
CARTO's API docs