Skip to content

Instantly share code, notes, and snippets.

@dmofot
Last active March 5, 2022 05:25
Show Gist options
  • Save dmofot/a382ea40b798cd18ea4b45eb62ecf97f to your computer and use it in GitHub Desktop.
Save dmofot/a382ea40b798cd18ea4b45eb62ecf97f to your computer and use it in GitHub Desktop.
Generating GeoJSON in Postgres/PostGIS

Table of Contents

TL;DR

Average time results (ms):

Method Actual Avg Planning Avg Execution Avg
A 11800.092 0.0694 11800.295
B 12450.2705 0.0852 12450.4355
C 11962.5443 0.074 11962.6857

Teal Deer

General

There are several methods for building GeoJSON directly in the database. Each method has it's own pros and cons, so I thought it would be useful to see an example of each method and try to provide some sort of initial benchmarking. The table was built as a subset (just MA counties statefp='25') from a larger table that contained all US counties.

Method Information:

Benchmarking Information:

Postgres/PostGIS Information:

$ psql -c "SELECT version();"
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
(1 row)

$ psql -c "SELECT PostGIS_full_version();"
                                                                         postgis_full_version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.9.4" LIBJSON="0.12.1" RASTER
(1 row)

Create Benchmark Function

This creates a function that will take the number of loops and the sql query as arguments. You will want to change the search_path if you use a different schema.

SET search_path=public;
CREATE FUNCTION f_test(ct int, sql text) RETURNS void AS
$func$
DECLARE
   i int;
BEGIN

FOR i IN 1 .. $1 LOOP
    EXECUTE sql;  -- not safe against SQLi!
END LOOP;

END
$func$ LANGUAGE plpgsql

Tests

Method A Test

Method A Pros:

  • Easiest to implement complex queries
  • Easiest to set column names in output
  • Fastest query of the three methods (by a hair)

Method A Cons:

  • Longest/verbose query
EXPLAIN ANALYZE
SELECT f_test(100, 
$x$WITH counties AS 
    (SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon, the_geom FROM public.ma_counties ), 
gis_data AS ( 
    SELECT 
    json_build_object( 
        'type'    , 'FeatureCollection', 
        'features', json_agg( 
            json_build_object( 
                'type'      , 'Feature', 
                'geometry'  , ST_AsGeoJSON(the_geom)::json, 
                'properties', json_build_object( 
                    'gid', gid,
                    'statefp', statefp,
                    'countyfp', countyfp,
                    'countyns', countyns,
                    'cntyidfp', cntyidfp,
                    'name', name,
                    'namelsad', namelsad,
                    'lsad', lsad,
                    'classfp', classfp,
                    'mtfcc', mtfcc,
                    'csafp', csafp,
                    'cbsafp', cbsafp,
                    'metdivfp', metdivfp,
                    'funcstat', funcstat,
                    'aland', aland,
                    'awater', awater,
                    'intptlat', intptlat,
                    'intptlon', intptlon
                ) 
            ) 
        ) 
    ) AS json_data 
    FROM counties 
) 
 
SELECT 
json_data::text 
FROM gis_data$x$)

Method B Test

Method B Pros:

  • Shorter/less verbose query than Method A

Method B Cons:

  • Harder than Method A to implement complex query
  • Slowest query of the three methods due to self-join
EXPLAIN ANALYZE
SELECT f_test(100, 
$x$SELECT row_to_json(fc)
 FROM ( SELECT 'FeatureCollection' As type, array_to_json(array_agg(f)) As features 
 FROM (SELECT 'Feature' As type
    , ST_AsGeoJSON(lg.the_geom)::json As geometry
    , row_to_json(lp) As properties
   FROM public.ma_counties As lg
         INNER JOIN (SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon FROM public.ma_counties ) As lp 
       ON lg.countyfp = lp.countyfp ) As f ) As fc ;$x$)

Method C Test

Method C Pros:

  • Second fasest query (basically as fast as Method A)
  • Shortest/least verbose query
  • Avoids self-join by using a nested subselect

Method C Cons:

  • Harder than Method A to implement complex query
EXPLAIN ANALYZE
SELECT f_test(100, 
$x$SELECT row_to_json(fc)
 FROM ( SELECT 'FeatureCollection' As type, array_to_json(array_agg(f)) As features
 FROM (SELECT 'Feature' As type
    , ST_AsGeoJSON(lg.the_geom)::json As geometry
    , row_to_json((SELECT l FROM (SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon) As l
      )) As properties
   FROM public.ma_counties As lg ) As f ) As fc;$x$)

Results

Here are the results (ms). Each query was set for 100 loops and was run 10 times to provide an average that makes sure caching is taken into account.

Method A Results

Method A Actual Planning Execution
1 12451.658 0.067 12452.036
2 11786.96 0.067 11787.132
3 11636.196 0.073 11636.367
4 12423.064 0.045 12423.236
5 11680.952 0.081 11681.107
6 11728.312 0.073 11728.481
7 11497.578 0.068 11497.782
8 11550.097 0.073 11550.287
9 11630.386 0.074 11630.602
10 11615.717 0.073 11615.92
Avg 11800.092 0.0694 11800.295

Method B Results

Method B Actual Planning Execution
1 12087.377 0.11 12087.605
2 12490.38 0.083 12490.536
3 11733.517 0.079 11733.673
4 12488.525 0.076 12488.714
5 12329.816 0.069 12329.963
6 12759.869 0.078 12760.013
7 13041.588 0.079 13041.756
8 13044.637 0.083 13044.794
9 12318.512 0.069 12318.673
10 12208.484 0.126 12208.628
Avg 12450.2705 0.0852 12450.4355

Method C Results

Method C Actual Planning Execution
1 12029.017 0.076 12029.306
2 12176.148 0.066 12176.276
3 11607.342 0.075 11607.463
4 12070.27 0.074 12070.393
5 11944.737 0.077 11944.861
6 11736.442 0.074 11736.595
7 12217.075 0.074 12217.179
8 12106.019 0.074 12106.151
9 11877.754 0.041 11877.869
10 11860.639 0.109 11860.764
Avg 11962.5443 0.074 11962.6857
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment