Feb 7, 2017
Andy Eschbacher, firstname.lastname@example.org, @MrEPhysics
- Getting Python env setup
- Clone CartoCamp workshop repo
- Setup environment
- Contents of requirements.txt
- Going further!
What is CARTO?
- It depends on how you use it ;)
- Geo data matters
Maps I like
- Urban Reviewer
- Tweets mentioning sunrise
- Carto Calculator
- Evictions in San Francisco
- L Train project
- Other projects
- NYC Henge
Data Science projects
- Animated maps to complement analysis: http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/
- Our L Train analysis: https://carto.com/blog/looking-at-the-l/
- Ben Wellington:
Demo analysis workflow
Importing the data
First we need to sync a new dataset. Search for "usgs earthquake spreadsheet" to find the page (here's the link), or use the following dataset of all recorded seismic events from the past 30 days:
This is updated every 15 minutes, so it's a great candidate for a map that updates as new seismic events are recorded. Let's import it into our accounts. Even cooler: the map we publish will update automatically, and the underlying analysis changes dynamically as new points are added to the dataset.
Building the analysis workflow
To build our analysis workflow, let's focus on the output we would like.
Visualize significant, recent earthquake events by communicating the total population who are living within 50 miles of the epicenter.
We can build this analysis workflow in the Builder by chaining together some of the tools.
- We need to restrict our analysis to California (within 50 miles of the borders)
- Filter to only show significant events
- Find the number of people living within 50 miles of the earthquake events
- Visualize the points by the number of people (proportional symbol map)
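Under the hood, buffer-style analyses in PostGIS take their radius in meters (when working with the geography type), so the 50-mile settings in the steps above correspond to a metric distance. A minimal sketch of that conversion, assuming the standard international mile:

```python
# PostGIS geography buffers take a radius in meters, so a 50-mile
# analysis radius needs converting. METERS_PER_MILE is the standard
# international mile.
METERS_PER_MILE = 1609.344

def miles_to_meters(miles):
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE

radius_m = miles_to_meters(50)
print(round(radius_m, 1))  # 80467.2
```

This is the number you'd pass as the radius if you rebuilt the Builder's areas-of-influence step directly in SQL.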
There are many ways to get the boundary of California. One way is to just grab all the states from Natural Earth Data. Another way is to grab it from my CARTO account:
Copy that link, and import it just like you did for the earthquake dataset above. Finally, we can do some Analysis with the Builder to get it into the state that we're interested in (50 miles outside of current borders).
Finding Earthquakes that happened in or around California
We can again use the Builder to find all earthquakes which intersect California or are within 50 miles of its border by using the "Aggregate intersection" method.
Filtering out non-significant events
Using the "Filter by column value" analysis method, we can surface only the significant earthquakes (by magnitude).
Finding 50-mile buffers
Using the "Create areas of influence" tool, we can get the regions within a 50-mile distance of each earthquake's epicenter.
Enrich with the Data Observatory
This analysis allows us to attach valuable information from sources such as the US Census. For our demo, we will use the total population living within 50 miles of each earthquake's epicenter.
Visualize by the number of people
CARTO's Data Science toolkit
- SQL API (write sql, get data via HTTP requests)
- Data Services API
- Geocoding, routing, etc.
- Nice demos here: https://carto.com/location-data-services/
- Data Observatory API
- Functions for getting measures (World censuses, Zillow, etc.)
- Functions for getting boundaries
- Catalog of available data: https://cartodb.github.io/bigmetadata/index.html
- Find things like median income at our location:
SELECT OBS_GetMeasure(CDB_LatLng(40.7529671,-73.9927947), 'us.census.acs.B19013001'::text)
- Explain Subway stops: https://jkrauss.carto.com/builder/18644096-65c3-11e6-adbe-0ee66e2c9693/embed
- Maps API
- For fetching map tiles for use with Google Maps API, Leaflet.js, OpenLayers, etc.
- Generating dynamically made image previews (aka, static maps)
- For example: http://mamataakella.carto.com/api/v1/map/static/named/tpl_25aaad68_7134_11e6_afae_0e3ebc282e83/1600/1600.png?zoom=5&lat=40.7529777&lon=-73.994722
- Import API
- Send data to your CARTO account
- Set up sync tables (responsive to upstream changes to data so you can create dynamic maps)
- Spatial Analysis
- geo-statistics using PySAL, scikit-learn, and others
- tons more coming soon
- Carto.js -- workhorse library for building custom applications off of CARTO's platform
- Torque.js -- create spatio-temporal maps from your time-series data
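The SQL API above is just an HTTP endpoint, so it's straightforward to hit from Python. A minimal sketch of building a request URL (the account name mirrors the examples above, and the table name is a made-up placeholder; the live request is commented out so this runs offline):

```python
from urllib.parse import urlencode

# Hypothetical account name for illustration (matches the example URLs above).
USER = "eschbacher"
SQL_API = "https://{user}.carto.com/api/v2/sql".format(user=USER)

def sql_api_url(query, fmt="json"):
    """Build a CARTO SQL API request URL for a given query and output format."""
    return SQL_API + "?" + urlencode({"q": query, "format": fmt})

# "all_month" is a placeholder table name
url = sql_api_url("SELECT count(*) FROM all_month", fmt="csv")
# import requests
# resp = requests.get(url)  # would return the query result as CSV text
print(url)
```

Using urlencode keeps spaces and special characters in the SQL safely escaped.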
Programmatic access / R and Python workflows
- r2cartodb: https://rpubs.com/walkerke/r2cartodb
- CARTO's Python Client: https://github.com/CartoDB/carto-python
- I'm trying to figure out how to build off of Python's pandas
- See some of my experiments here: https://github.com/ohasselblad/cartopandas
read_csv works with our SQL API when the format flag is set to csv:
http://eschbacher.carto.com/api/v2/sql?q=SELECT * FROM awesome_table&format=csv
- Use with Plotly: https://plot.ly/ipython-notebooks/cartodb/
- DC Elevation App -- see index.html on this gist
- We made this last week at a little meetup I run
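Since the SQL API can return CSV (as in the format=csv URL above), pandas can read query results straight into a DataFrame. A sketch, assuming hypothetical column names; the demo parses a local CSV string so it runs offline, with the equivalent live call shown in a comment:

```python
import io
import pandas as pd

# A CSV payload like the SQL API returns with format=csv
# (the columns here are illustrative, not a real response).
csv_text = "place,mag\nCalifornia,4.2\nAlaska,5.1\n"
df = pd.read_csv(io.StringIO(csv_text))

# The same call works against a live endpoint, e.g.:
# df = pd.read_csv("http://eschbacher.carto.com/api/v2/sql"
#                  "?q=SELECT * FROM awesome_table&format=csv")

print(df.shape)  # (2, 2)
```

From there you have the full pandas toolkit for filtering, joining, and plotting query results.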
Python workflow with pandas + CARTO APIs
I run a Meetup at CARTO's office called CartoCamp (please drop in sometime :)). We have some Open Office Hours on Friday from 2:30 - 4:30. We're going to do a small part of our last CartoCamp workshop on working with CARTO from a Jupyter notebook.
Check out our repo, which has some past workshops and experiments (which may be broken :P).
Getting Python environment setup
I like working with virtual environments so we all have the same development environment.
If you already have all the dependencies, don't worry about setting this up (e.g., if you run Anaconda or do sudo pip install ... for all your Python packages).
Clone CartoCamp workshop repo
Clone the CartoCamp workshop repo and move into the directory for today's workshops.
$ git clone https://github.com/CartoCamp/workshops.git
$ cd workshops/2017-01-27-data-sci-maps
If you don't have virtualenv, get it with pip install virtualenv.
$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt
(venv) $ jupyter notebook
This last step should fire up the notebook in your browser. Open the Data Science + Maps.ipynb notebook.
cartodb==0.8.1
jupyter==1.0.0
numpy==1.12.0
scipy==0.18.1
pandas==0.19.2
requests==2.13.0
Check out the other notebooks that my colleague Danny and I wrote up for that workshop. Bonus points if you find bugs and open a PR to fix 'em ;)