I think I can narrow this down to a question:
What would it take to make arbitrary SQL performant?
You mentioned that Mapbox's tile-serving tools are built with speed, consistency and predictability in mind, so that things are reliably scalable. By contrast a tile-generating SQL API might contain everything from SELECT * FROM {{table-name}}
to some monstrosity that returns a full geoprocess. So presumably the latter is placing more stress on
- The connection at API call
- The server as it converts the query into data and then into tiles
- The client side as it receives potentially-heavy tiles
So obviously something that a.) occupies more encoded bits at step one, b.) chews up more CPU at step 2, and c.) takes forever to return a bunch of images at step 3 is going to be a bad thing. But in reality there's both a continuum to the "badness" and a hypothetical upper limit to how bad it can get. And this is the case for Mapbox as well, even in the vector tile scheme. Here's a worst-case scenario and a best-case scenario:
So attribute 1 doesn't suffer more under the first tile call, but for attributes 2 and 3 (server and client delivery), tile #1 certainly must be at the upper level of a cost continuum that you've budgeted for.
So why is arbitrary SQL different? Attributes 1 and 3 are undeniably going to have a cost continuum between types of tile calls:
There's a big selective geoprocess in the former . . . and the latter is just some points: SELECT * FROM ag_survey_2011
Let's up the ante: here's the tile with a bunch of CartoCSS pushed through as well as SQL:
So the status of our three attributes on this last one is:
- The connection at API call is stressed; that's a huge string for a URL. But still not that hard for HTTP.
- The server as it converts the query into data and then into tiles is presumably coughing and wheezing under the burden of SQL processing and style compiling. But it still returns it super-fast.
- The client side as it receives potentially-heavy tiles is actually doing better, because in most cases applying SQL is actually going to simplify the delivered image. Maybe less so with additional CartoCSS, but again it's still downloading fast in "complex" form.
My point is that at least the attributes I can assess from the client side - 1 and 3 - have a pretty comparable cost continuum between Mapbox and CartoDB. And why not? it's a lot of the same architecture. So the real differences must be in attribute 2 - on the server side - where CartoDB is burning additional CPU to keep up with Mapbox speeds. And this is also where you mentioned that arbitrary SQL is at its most dangerous, potentially inconsistent and unpredictable at scale.
But it seems you've solved that problem for Mapbox Streets and Tilemill 2. Or at least you've narrowed the cost continuum to the point where it's acceptable to apportion the same resources to this tile: as to this tile:
PostGIS is not that complex in the scheme of things, and it seems like it should be possible to either measure its maximum cost or limit it in some way. At least to get it to within an acceptable continuum as you have with your product.
So am I being sophomoric, nuts or reasonable?