Skip to content

Instantly share code, notes, and snippets.

@kgjenkins
Last active July 18, 2016 15:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kgjenkins/7ba3af33bf6a6e5af85306f12c6d7574 to your computer and use it in GitHub Desktop.
Save kgjenkins/7ba3af33bf6a6e5af85306f12c6d7574 to your computer and use it in GitHub Desktop.
Intro to CartoDB workshop at Mann Library, 2016-04-14

CartoDB

What is CartoDB? It's a popular online tool for creating dynamic maps and analyzing spatial data. CartoDB includes:

  • a simple interface for creating a variety of beautiful maps based upon your data. Share links directly to your maps, or embed them in a webpage, or export static map images.
  • a powerful PostGIS spatial database for processing and analyzing data.
  • an infrastructure for creating more complex maps using custom styles (using CartoCSS), custom popups (using HTML), custom map projections (using PostGIS SQL), and custom interactions (using cartodb.js)

Cornell-affiliated CartoDB accounts

Anyone can sign up for a free CartoDB account, but the free plan is limited to 250MB of data, and any data tables or maps you make are accessible to the public.

Current Cornell students, staff, and faculty with a valid cornell.edu can sign up for a Cornell-affiliated account, which offers private tables and maps (which can be shared with specific Cornell users), more data storage, and data syncing (details below).

Data sources

CartoDB handles many different geospatial data formats, including shapefiles, KML/KMZ, GeoJSON, GPX, CSV, and Excel.

Let's start with a common scenario -- we have an Excel file that contains data for various US states, and want to create a map from that data.

Open the file in Excel and take a look. Notice that the only location information is the name of each state.

  • From your CartoDB dashboard, go to "Datasets" and click the "New Dataset" button.

  • Click "Browse" to select the excel file, or just drag it onto the CartoDB webpage.

  • Click the "Connect Dataset" button at the bottom. Give it a moment, and then you should see the data table.

    Notice the following:

    • The table name at the top-left is derived from the filename (with any non-alphanumeric characters replaced by underscores).
    • New new columns have been added: cartodb_id and the_geom (which is currently null).
    • The original columns have been alphabetized.
    • Numeric values are green.
  • In the top-left, click "state_data" and change the table name to "states"

Georeferencing data (a.k.a. Geocoding)

  • Click the state column header > Georeference...

    CartoDB can automatic calculate point or polygon locations based on latitude/longitude columns, city names, region names, postal codes, IP addresses, and street addresses. The state names that we have are considered regions.

  • Click "Admin. Regions"

  • Set the Region Name: state

  • Set the Country: type "USA" in the free text input

  • Click the "Georeference your data with administrative regions" button (some other types of geocoding offer point or polygon output)

    When it completes, the column called the_geom should now say "Polygon", and there will also be a new column called cartodb_georef_status. (This new column can be helpful if you need to fix and records and re-do the georeferencing.)

Now that we have polygons, let's see them on a map!

  • Click the "Map View" button at the top. You should see the states in their default orange color.

Map Wizards

CartoDB's map "wizards" let you quickly configure your map for some common map-based data visualizations. To access the wizards, click the icon on the right that looks like a paintbrush. Explore the various wizards available for polygon data:

  • Simple - single color and border for all points, with optional labels

    Some terminology is in order here...

    • "Polygon Fill" is the color and opacity of the interior of each polygon.
    • "Polygon Stroke" is the width, color, and opacity of each polygon border.
    • "Composite operation" is useful when multiple features overlap (which doesn't happen with our states).
    • "Label Text" let's you create labels based on the values from a column. Once selected, you'll see many other label options.
  • Choropleth - set polygon color based upon a numeric column

  • Category - change polygon color based upon a text column

  • Bubble - replace each polygon with a circle, with size based upon a numeric column

We will now explore these other features:

  • Basemaps -- the reference map that appears behind your data
  • Infowindows -- pop-up windows that display data about the feature that was clicked or hovered over
  • Filters -- view a subset of data
  • Legends -- customize the key to the map styles
  • CSS -- customize the map styles that were generated by the wizards
  • SQL -- perform more advanced filtering, sorting, analysis, and manipulation
  • Privacy/sharing -- control access to your data or map

Multi-layer maps

So far, we've only been looking at a single map layer. Actually, we've just been looking at the map view of single data table.

To add more layers, we first need to create what CartoDB calls a "map", which can include multiple data tables. We will add points representing recent earthquakes, using data from a live feed of earthquake data from the USGS. Instead of downloading the data, we can link to it directly from CartoDB.

  • Go to http://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php

  • In the bottom-right, look at the links under "Past 30 Days".

  • Right-click the "All Earthquakes" link > Copy link address

  • Go back to your map view of the states, and click the blue "+" at the top-right.

    You'll get a notice that a map is required to add layers.

  • Click "OK, Create Map"

    You'll see a list of all the datasets you've used so far. But we want to add the USGS Earthquake data feed.

  • Click "Connect dataset" (at the top)

  • Paste the URL you just copied into the input on the right, and click "Submit"

    The Cornell-affiliated CartoDB accounts have the ability to sync remote datasets every hour, day, week, or month. Sync'd datasets are not editable, as the old data will be overwritten with each update.

  • For now, leave the sync setting at "Never", and click "Add Layer" at the bottom-right.

    Once the data loads, notice that there are a lot of columns, perhaps the most interesting being depth, mag, place, time, and type. Notice also that the the_geom column has automatically been set to the point coordinates contained in the latitude and longitude columns.

  • Switch to the "Map View".

    At this point, you should see the earthquake point on top of the states map. Since earthquakes are completely independent from state electoral votes, let's change the style of the states data so that we just have simple outlines of the states. If you look along the right side of the map, there are now two different layers -- look for the "1" and the "2".

  • Click the "1" to select the first layer (which should be the states).

  • Select the "Simple" wizard and set the following:

    • Polygon Fill opacity = 0
    • Polygon Stroke = width 0.5, black, opacity 1
  • Click the "2" to select the second layer (which should be the earthquake points).

  • Explore the additional wizards that are available for points:

    • Cluster - replaces clusters of nearby points with a larger symbol and the number of separate points in the cluster
    • Torque - animated map based on time/date column
    • Heatmap - point density visualized by a spectrum of colors, useful for many overlapping points
    • Torque Cat - animated map with categories
    • Intensity - point density visualized by color intensity, useful for many overlapping points
    • Density - point counts within square or hexagonal bins, useful for many overlapping points

Cross-layer analysis

CartoDB is built upon PostGIS, a powerful spatial extension of the PostgreSQL database. CartoDB gives us full access to PostGIS via the "SQL" tab for each layer.

As an example, we'll use an SQL query to calculate the number of earthquakes per state.

  • Click "1" to switch back to the "states" layer.
  • Switch to the "Data View" (at top of page)
  • Click the "SQL" tab, and enter the following query:
SELECT
  s.state,
  count(*) AS eqs
FROM
  states AS s,
  all_month AS e
WHERE ST_Contains(s.the_geom, e.the_geom)
GROUP BY s.cartodb_id
ORDER BY eqs DESC

To display the results on a map, we also need to include the the_geom_webmercator from the states layer, and also the cartodb_id in order to enable map interactivity:

SELECT
  s.state,
  count(*) AS eqs,
  s.the_geom_webmercator,
  s.cartodb_id
FROM
  states AS s,
  all_month AS e
WHERE ST_Contains(s.the_geom, e.the_geom)
GROUP BY s.cartodb_id
ORDER BY eqs DESC

This is just a single example, but there is almost no limit to what can be done with spatial SQL.

For a complete list of PostGIS spatial functions, see http://www.postgis.org/docs/reference.html For other text and numeric SQL functions, see http://www.postgresql.org/docs/current/static/functions.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment