Skip to content

Instantly share code, notes, and snippets.

@wboykinm
Last active December 17, 2015 07:59
Show Gist options
  • Save wboykinm/5576812 to your computer and use it in GitHub Desktop.
Save wboykinm/5576812 to your computer and use it in GitHub Desktop.
Mapbox, CartoDB and assessing the arbitrary.

I think I can narrow this down to a question:

What would it take to make arbitrary SQL performant?

You mentioned that Mapbox's tile-serving tools are built with speed, consistency and predictability in mind, so that things are reliably scalable. By contrast a tile-generating SQL API might contain everything from SELECT * FROM {{table-name}} to some monstrosity that returns a full geoprocess. So presumably the latter is placing more stress on

  1. The connection at API call
  2. The server as it converts the query into data and then into tiles
  3. The client side as it receives potentially-heavy tiles

So obviously something that a.) occupies more encoded bits at step one, b.) chews up more CPU at step 2, and c.) takes forever to return a bunch of images at step 3 is going to be a bad thing. But in reality there's both a continuum to the "badness" and a hypothetical upper limit to how bad it can get. And this is the case for Mapbox as well, even in the vector tile scheme. Here's a worst-case scenario and a best-case scenario:

21KB 3KB

So attribute 1 doesn't suffer more under the first tile call, but for attributes 2 and 3 (server and client delivery), tile #1 certainly must be at the upper level of a cost continuum that you've budgeted for.

So why is arbitrary SQL different? Attributes 1 and 3 are undeniably going to have a cost continuum between types of tile calls:

12KB 20KB

There's a big selective geoprocess in the former . . . and the latter is just some points: SELECT * FROM ag_survey_2011

Let's up the ante: here's the tile with a bunch of CartoCSS pushed through as well as SQL:

19KB

So the status of our three attributes on this last one is:

  1. The connection at API call is stressed; that's a huge string for a URL. But still not that hard for HTTP.
  2. The server as it converts the query into data and then into tiles is presumably coughing and wheezing under the burden of SQL processing and style compiling. But it still returns it super-fast.
  3. The client side as it receives potentially-heavy tiles is actually doing better, because in most cases applying SQL is actually going to simplify the delivered image. Maybe less so with additional CartoCSS, but again it's still downloading fast in "complex" form.

My point is that at least the attributes I can assess from the client side - 1 and 3 - have a pretty comparable cost continuum between Mapbox and CartoDB. And why not? it's a lot of the same architecture. So the real differences must be in attribute 2 - on the server side - where CartoDB is burning additional CPU to keep up with Mapbox speeds. And this is also where you mentioned that arbitrary SQL is at its most dangerous, potentially inconsistent and unpredictable at scale.

But it seems you've solved that problem for Mapbox Streets and Tilemill 2. Or at least you've narrowed the cost continuum to the point where it's acceptable to apportion the same resources to this tile: 11KB as to this tile: 2KB

PostGIS is not that complex in the scheme of things, and it seems like it should be possible to either measure its maximum cost or limit it in some way. At least to get it to within an acceptable continuum as you have with your product.

So am I being sophomoric, nuts or reasonable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment