Average time results (ms):
Method | Actual Avg | Planning Avg | Execution Avg |
---|---|---|---|
A | 11800.092 | 0.0694 | 11800.295 |
B | 12450.2705 | 0.0852 | 12450.4355 |
C | 11962.5443 | 0.074 | 11962.6857 |
There are several methods for building GeoJSON directly in the database. Each
method has it's own pros and cons, so I thought it would be useful to see an
example of each method and try to provide some sort of initial benchmarking. The
table was built as a subset (just MA counties statefp='25'
) from a larger table
that contained all US counties.
$ psql -c "SELECT version();"
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
(1 row)
$ psql -c "SELECT PostGIS_full_version();"
postgis_full_version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.9.4" LIBJSON="0.12.1" RASTER
(1 row)
This creates a function that will take the number of loops and the sql query as
arguments.
You will want to change the search_path
if you use a different schema.
SET search_path=public;
CREATE FUNCTION f_test(ct int, sql text) RETURNS void AS
$func$
DECLARE
i int;
BEGIN
FOR i IN 1 .. $1 LOOP
EXECUTE sql; -- not safe against SQLi!
END LOOP;
END
$func$ LANGUAGE plpgsql
Method A Pros:
- Easiest to implement complex queries
- Easiest to set column names in output
- Fastest query of the three methods (by a hair)
Method A Cons:
- Longest/verbose query
EXPLAIN ANALYZE
SELECT f_test(100,
$x$WITH counties AS
(SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon, the_geom FROM public.ma_counties ),
gis_data AS (
SELECT
json_build_object(
'type' , 'FeatureCollection',
'features', json_agg(
json_build_object(
'type' , 'Feature',
'geometry' , ST_AsGeoJSON(the_geom)::json,
'properties', json_build_object(
'gid', gid,
'statefp', statefp,
'countyfp', countyfp,
'countyns', countyns,
'cntyidfp', cntyidfp,
'name', name,
'namelsad', namelsad,
'lsad', lsad,
'classfp', classfp,
'mtfcc', mtfcc,
'csafp', csafp,
'cbsafp', cbsafp,
'metdivfp', metdivfp,
'funcstat', funcstat,
'aland', aland,
'awater', awater,
'intptlat', intptlat,
'intptlon', intptlon
)
)
)
) AS json_data
FROM counties
)
SELECT
json_data::text
FROM gis_data$x$)
Method B Pros:
- Shorter/less verbose query than Method A
Method B Cons:
- Harder than Method A to implement complex query
- Slowest query of the three methods due to self-join
EXPLAIN ANALYZE
SELECT f_test(100,
$x$SELECT row_to_json(fc)
FROM ( SELECT 'FeatureCollection' As type, array_to_json(array_agg(f)) As features
FROM (SELECT 'Feature' As type
, ST_AsGeoJSON(lg.the_geom)::json As geometry
, row_to_json(lp) As properties
FROM public.ma_counties As lg
INNER JOIN (SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon FROM public.ma_counties ) As lp
ON lg.countyfp = lp.countyfp ) As f ) As fc ;$x$)
Method C Pros:
- Second fasest query (basically as fast as Method A)
- Shortest/least verbose query
- Avoids self-join by using a nested subselect
Method C Cons:
- Harder than Method A to implement complex query
EXPLAIN ANALYZE
SELECT f_test(100,
$x$SELECT row_to_json(fc)
FROM ( SELECT 'FeatureCollection' As type, array_to_json(array_agg(f)) As features
FROM (SELECT 'Feature' As type
, ST_AsGeoJSON(lg.the_geom)::json As geometry
, row_to_json((SELECT l FROM (SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon) As l
)) As properties
FROM public.ma_counties As lg ) As f ) As fc;$x$)
Here are the results (ms). Each query was set for 100 loops and was run 10 times to provide an average that makes sure caching is taken into account.
Method A | Actual | Planning | Execution |
---|---|---|---|
1 | 12451.658 | 0.067 | 12452.036 |
2 | 11786.96 | 0.067 | 11787.132 |
3 | 11636.196 | 0.073 | 11636.367 |
4 | 12423.064 | 0.045 | 12423.236 |
5 | 11680.952 | 0.081 | 11681.107 |
6 | 11728.312 | 0.073 | 11728.481 |
7 | 11497.578 | 0.068 | 11497.782 |
8 | 11550.097 | 0.073 | 11550.287 |
9 | 11630.386 | 0.074 | 11630.602 |
10 | 11615.717 | 0.073 | 11615.92 |
Avg | 11800.092 | 0.0694 | 11800.295 |
Method B | Actual | Planning | Execution |
---|---|---|---|
1 | 12087.377 | 0.11 | 12087.605 |
2 | 12490.38 | 0.083 | 12490.536 |
3 | 11733.517 | 0.079 | 11733.673 |
4 | 12488.525 | 0.076 | 12488.714 |
5 | 12329.816 | 0.069 | 12329.963 |
6 | 12759.869 | 0.078 | 12760.013 |
7 | 13041.588 | 0.079 | 13041.756 |
8 | 13044.637 | 0.083 | 13044.794 |
9 | 12318.512 | 0.069 | 12318.673 |
10 | 12208.484 | 0.126 | 12208.628 |
Avg | 12450.2705 | 0.0852 | 12450.4355 |
Method C | Actual | Planning | Execution |
---|---|---|---|
1 | 12029.017 | 0.076 | 12029.306 |
2 | 12176.148 | 0.066 | 12176.276 |
3 | 11607.342 | 0.075 | 11607.463 |
4 | 12070.27 | 0.074 | 12070.393 |
5 | 11944.737 | 0.077 | 11944.861 |
6 | 11736.442 | 0.074 | 11736.595 |
7 | 12217.075 | 0.074 | 12217.179 |
8 | 12106.019 | 0.074 | 12106.151 |
9 | 11877.754 | 0.041 | 11877.869 |
10 | 11860.639 | 0.109 | 11860.764 |
Avg | 11962.5443 | 0.074 | 11962.6857 |