Skip to content

Instantly share code, notes, and snippets.

@andy-esch
Last active October 9, 2015 22:28
Embed
What would you like to do?
Stanford CartoDB Workshop 3/9/2015

Introduction to CartoDB

March 9th, 2015

Andy Eschbacher, Map Scientist, CartoDB

@MrEPhysics | eschbacher@cartodb.com

Find this write up here: bit.ly/cdb-stanford

The write up pulls from a gist hosted in my GitHub account.

Goals for the workshop

  • Quickly and easily make meaningful maps from data in minutes
  • Show the breadth of data analysis available to you
  • Analyze geospatial data to explore a dataset on the Afghan presidential elections of 2014

Outline:

  1. Introduce CartoDB (20)
  2. Setting up accounts (5)
  3. Data import (10)
  4. Walk through of the interface, different types of visualizations (20)
  5. Custom SQL to get new information from data set to show possible election fraud (10)
  6. Advanced visualizations using CartoCSS (cut & paste): category + bubble to visualize two variables (10)
  7. Sharing visualizations (10)
  8. Resources (5)

Getting you all set up

Account sign up here: https://cartodb.com/signup?plan=academy

The Data

The data we will be using today comes from a group that looked at voting irregularities in the Afghan presidential elections of 2014 (general and then runoff).

I read about this data from a blog titled Electoral Integrity Project.

Import data into your account

Copy the following URL and paste it into the table import box:

http://eschbacher.cartodb.com/api/v2/sql?q=SELECT%20*%20FROM%20afghanistan_runoff&format=geojson&filename=afghanistan_runoff

import dialog

Make sure that the name of your data table is: afghanistan_runoff.

You can rename it to this by double-clicking the table name in the upper left hand corner and then entering a new one.

change table name

Map View

Click over to MAP VIEW to see your data visualized.

Afghan Data Visualized

SQL Schema

Switch back over to the DATA VIEW to see how your data is represented in the database.

Table Header

Data tables are structured similar to what you would see in an Excel spreadsheet: columns with headers, rows with unique identifiers. Cells contain data: booleans (true/false), numbers, strings of characters, and geometries (geographical information). The lack of a value is called null and is a valid entry in any cell.

You can easily change the data type by clicking on the column header's data type.

Columns to notice:

  1. cartodb_id is a unique identifier for each row, important for click events (interactivity)
  2. the_geom contains the geographic information as lat/long pairs, but other formats can be linestrings or polygons
  3. besides the_geom, other columns are either numbers, strings, or dates

Exploring the Data with Different Visualizations

  • Choropleth on a number column

Imgur

  • Bubble by number of voters (marker size is proportional to row variable chosen)

Imgur

  • Heatmap on location

Imgur

Explore the Data Using Filters

We can easily explore our data by applying filters to each column.

Notice that filters work on limiting the number of rows. We'll discuss later how to reduce or add columns.

Let's apply some of the filters to look at some of the extremes of our number data:

  1. How many total votes per district? Filters will produce a histogram for us.
  2. Show polling locations where abdullah got between 0 and 10 votes
  3. Show polling locations where ghani got between 590 and 600 votes

Imgur

How filters work

Click over to the SQL tab to see that filters are applying SQL to the data table. Custom SQL is much more customizable than using filters, and there is geospatial functionality to it called PostGIS that we'll dabble in a little later.

Imgur

Infowindows and hovers

With our reduced data set, it's easier to investigate individual points on the map. We can enable infowindows and hover boxes by clicking on the interactivity tab below the visualization wizard tab.

You can easily select the data that you want displayed.

Imgur

Creating Data on the Fly

CartoDB is built with a database that uses SQL -- a powerful language that lets you ask questions of your data to get specific answers.

Using SQL, you can slice and dice our table however you want it by creating new rows and columns along the way to extend our analysis.

Let's make a bubble map on the percentage of win.

Copy and paste the following SQL into your editor and hit Apply Query:

SELECT
  ghani / total AS ghani_perc,
  abdullah / total AS abdullah_perc,
  the_geom_webmercator,
  cartodb_id
FROM
  afghanistan_runoff

A few things to notice: we are creating two new columns (ghani_perc and abdullah_perc), and selecting two existing columns, the_geom_webmercator and cartodb_id. the_geom_webmercator is a special column that results in the actual plotting of data -- it is a projection of the_geom into the Mercator projection. Whenever you do SQL and want your data to appear for the temporary table, make sure to select the_geom_webmercator

If you look at your your data table you'll see that you only have columns that you created.

Checking out your map, go to "Bubble" in the Visualization Wizard and select ghani in the choropleth column. Your map now shows the data visualized according to the percent of Ghani's vote (in the range 0% to 100%). But maybe we can do better visualizing this as we don't know without switching columns whether Ghani won a polling station or Abdullah did...

Choropleth on Relative Difference in Votes

Finding the relative difference can help us solve this problem. This way our calculation gives us a number in the range -100 to 100 where positive is Abdullah and negative is Ghani.

SELECT
  100 * (abdullah - ghani) / total AS vote_perc_diff,
  the_geom_webmercator,
  cartodb_id
FROM
  afghanistan_runoff

Now we can make a choropleth on vote_diff to show the areas that swing towards one candidate or the other. We can use a choropleth color ramp that has a divergent color scheme, like red-tan-blue, to visualize the markers to show a geographical distribution of voter preference.

One problem with this map is the information overload. We can handle this problem by filtering for only the data that we're interested in. On the blog mentioned before, they look for polling places where candidates get 95% of the vote or more. We can alter our SQL to give filter for this:

SELECT
  100 * (abdullah - ghani) / total AS vote_perc_diff,
  the_geom_webmercator,
  cartodb_id
FROM
  afghanistan_runoff
WHERE
  abdullah / total >= 0.95 OR ghani / total >= 0.95

This produces a very limited set of values so we need to alter our CartoCSS to accommodate it:

In the CSS tab, delete all the text there and replace it with this:

#afghanistan_runoff{
  marker-fill-opacity: 0.8;
  marker-line-color: #FFF;
  marker-line-width: 0.5;
  marker-line-opacity: 1;
  marker-width: 10;
  marker-fill: #0080ff;
  marker-allow-overlap: true;
}
#afghanistan_runoff [ vote_perc_diff <= 100] {
   marker-fill: #ff4d4d;
}
#afghanistan_runoff [ vote_perc_diff <= -95] {
   marker-fill: #0080ff;
}

Advanced SQL and CartoCSS

We can do better, though! The previous map only showed us who won each district -- what if we want to visualize how significant each win is. We could still color code according to the candidate, but then size our marker according to the size of the win in votes.

First we need to find out the winner in each district, then we need to filter by percent win.

SELECT
  ghani,
  abdullah,
  CASE WHEN ghani > abdullah THEN 'ghani' ELSE 'abdullah' END AS winner,
  abs(ghani - abdullah) AS vote_diff,
  the_geom_webmercator,
  cartodb_id
FROM
  afghanistan_runoff
WHERE
  abdullah / total >= 0.95 OR ghani / total >= 0.95

Let's use a bubble visualization:

#afghanistan_runoff{
  marker-fill-opacity: 0.9;
  marker-line-color: #FFF;
  marker-line-width: 1.5;
  marker-line-opacity: 1;
  marker-placement: point;
  marker-multi-policy: largest;
  marker-type: ellipse;
  marker-fill: #FF5C00;
  marker-allow-overlap: true;
  marker-clip: false;
}
#afghanistan_runoff [ vote_diff <= 600] {
   marker-width: 23.3;
}
#afghanistan_runoff [ vote_diff <= 599.5] {
   marker-width: 21.7;
}
#afghanistan_runoff [ vote_diff <= 598.5] {
   marker-width: 20.0;
}
#afghanistan_runoff [ vote_diff <= 596] {
   marker-width: 18.3;
}
#afghanistan_runoff [ vote_diff <= 591.5] {
   marker-width: 16.7;
}
#afghanistan_runoff [ vote_diff <= 582.71875] {
   marker-width: 15.0;
}
#afghanistan_runoff [ vote_diff <= 557] {
   marker-width: 13.3;
}
#afghanistan_runoff [ vote_diff <= 487.6875] {
   marker-width: 11.7;
}
#afghanistan_runoff [ vote_diff <= 272.6418604651163] {
   marker-width: 10.0;
}

And then categorize by the winner to give different colors. Paste this below the CartoCSS you just put in. We're applying multiple visualizations at once.

#afghanistan_runoff[winner="ghani"] {
   marker-fill: #F84F40;
}
#afghanistan_runoff[winner="abdullah"] {
   marker-fill: #1F78B4;
}

Great! This produces a map that has a rich variety of detail about the election. Let's toggle on a Legend by clicking on the Legend tab, and customizing it:

Custom HTML for the legend

Produces interesting results: Election results

Creating a visualization and sharing it

So far, we've been working with a datatable in a sandbox. We can create visualizations that are saved as maps that you can come back to edit and share.

Creating a visualization

You have three main sharing options:

Sharing your visualization

Resources

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment