dmofot/GeoJSON-Generation.md

## GeoJSON-Generation.md

      
    Raw
  

              GeoJSON-Generation.md
            
          
    Table of Contents


TL;DR
General

Method Information:
Benchmarking Information:
Postgres/PostGIS Information:


Create Benchmark Function
Tests

Method A Test
Method B Test
Method C Test


Results

Method A Results
Method B Results
Method C Results


TL;DR

Average time results (ms):


Method
Actual Avg
Planning Avg
Execution Avg


A
11800.092
0.0694
11800.295


B
12450.2705
0.0852
12450.4355


C
11962.5443
0.074
11962.6857


General

There are several methods for building GeoJSON directly in the database.  Each
method has it's own pros and cons, so I thought it would be useful to see an
example of each method and try to provide some sort of initial benchmarking.  The
table was built as a subset (just MA counties statefp='25') from a larger table
that contained all US counties.
Method Information:


Method A
Method B & C

Benchmarking Information:


Benchmark Function

Postgres/PostGIS Information:

$ psql -c "SELECT version();"
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
(1 row)

$ psql -c "SELECT PostGIS_full_version();"
                                                                         postgis_full_version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.9.4" LIBJSON="0.12.1" RASTER
(1 row)
Create Benchmark Function

This creates a function that will take the number of loops and the sql query as
arguments.
You will want to change the search_path if you use a different schema.
SET search_path=public;
CREATE FUNCTION f_test(ct int, sql text) RETURNS void AS
$func$
DECLARE
   i int;
BEGIN

FOR i IN 1 .. $1 LOOP
    EXECUTE sql;  -- not safe against SQLi!
END LOOP;

END
$func$ LANGUAGE plpgsql
Tests

Method A Test

Method A Pros:

Easiest to implement complex queries
Easiest to set column names in output
Fastest query of the three methods (by a hair)

Method A Cons:

Longest/verbose query

EXPLAIN ANALYZE
SELECT f_test(100, 
$x$WITH counties AS 
    (SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon, the_geom FROM public.ma_counties ), 
gis_data AS ( 
    SELECT 
    json_build_object( 
        'type'    , 'FeatureCollection', 
        'features', json_agg( 
            json_build_object( 
                'type'      , 'Feature', 
                'geometry'  , ST_AsGeoJSON(the_geom)::json, 
                'properties', json_build_object( 
                    'gid', gid,
                    'statefp', statefp,
                    'countyfp', countyfp,
                    'countyns', countyns,
                    'cntyidfp', cntyidfp,
                    'name', name,
                    'namelsad', namelsad,
                    'lsad', lsad,
                    'classfp', classfp,
                    'mtfcc', mtfcc,
                    'csafp', csafp,
                    'cbsafp', cbsafp,
                    'metdivfp', metdivfp,
                    'funcstat', funcstat,
                    'aland', aland,
                    'awater', awater,
                    'intptlat', intptlat,
                    'intptlon', intptlon
                ) 
            ) 
        ) 
    ) AS json_data 
    FROM counties 
) 
 
SELECT 
json_data::text 
FROM gis_data$x$)
Method B Test

Method B Pros:

Shorter/less verbose query than Method A

Method B Cons:

Harder than Method A to implement complex query
Slowest query of the three methods due to self-join

EXPLAIN ANALYZE
SELECT f_test(100, 
$x$SELECT row_to_json(fc)
 FROM ( SELECT 'FeatureCollection' As type, array_to_json(array_agg(f)) As features 
 FROM (SELECT 'Feature' As type
    , ST_AsGeoJSON(lg.the_geom)::json As geometry
    , row_to_json(lp) As properties
   FROM public.ma_counties As lg
         INNER JOIN (SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon FROM public.ma_counties ) As lp 
       ON lg.countyfp = lp.countyfp ) As f ) As fc ;$x$)
Method C Test

Method C Pros:

Second fasest query (basically as fast as Method A)
Shortest/least verbose query
Avoids self-join by using a nested subselect

Method C Cons:

Harder than Method A to implement complex query

EXPLAIN ANALYZE
SELECT f_test(100, 
$x$SELECT row_to_json(fc)
 FROM ( SELECT 'FeatureCollection' As type, array_to_json(array_agg(f)) As features
 FROM (SELECT 'Feature' As type
    , ST_AsGeoJSON(lg.the_geom)::json As geometry
    , row_to_json((SELECT l FROM (SELECT gid, statefp, countyfp, countyns, cntyidfp, name, namelsad, lsad, classfp, mtfcc, csafp, cbsafp, metdivfp, funcstat, aland, awater, intptlat, intptlon) As l
      )) As properties
   FROM public.ma_counties As lg ) As f ) As fc;$x$)
Results

Here are the results (ms).  Each query was set for 100 loops and was run
10 times to provide an average that makes sure caching is taken into account.
Method A Results


Method A
Actual
Planning
Execution


1
12451.658
0.067
12452.036


2
11786.96
0.067
11787.132


3
11636.196
0.073
11636.367


4
12423.064
0.045
12423.236


5
11680.952
0.081
11681.107


6
11728.312
0.073
11728.481


7
11497.578
0.068
11497.782


8
11550.097
0.073
11550.287


9
11630.386
0.074
11630.602


10
11615.717
0.073
11615.92


Avg
11800.092
0.0694
11800.295


Method B Results


Method B
Actual
Planning
Execution


1
12087.377
0.11
12087.605


2
12490.38
0.083
12490.536


3
11733.517
0.079
11733.673


4
12488.525
0.076
12488.714


5
12329.816
0.069
12329.963


6
12759.869
0.078
12760.013


7
13041.588
0.079
13041.756


8
13044.637
0.083
13044.794


9
12318.512
0.069
12318.673


10
12208.484
0.126
12208.628


Avg
12450.2705
0.0852
12450.4355


Method C Results


Method C
Actual
Planning
Execution


1
12029.017
0.076
12029.306


2
12176.148
0.066
12176.276


3
11607.342
0.075
11607.463


4
12070.27
0.074
12070.393


5
11944.737
0.077
11944.861


6
11736.442
0.074
11736.595


7
12217.075
0.074
12217.179


8
12106.019
0.074
12106.151


9
11877.754
0.041
11877.869


10
11860.639
0.109
11860.764


Avg
11962.5443
0.074
11962.6857
Method	Actual Avg	Planning Avg	Execution Avg
A	11800.092	0.0694	11800.295
B	12450.2705	0.0852	12450.4355
C	11962.5443	0.074	11962.6857
Method A	Actual	Planning	Execution
1	12451.658	0.067	12452.036
2	11786.96	0.067	11787.132
3	11636.196	0.073	11636.367
4	12423.064	0.045	12423.236
5	11680.952	0.081	11681.107
6	11728.312	0.073	11728.481
7	11497.578	0.068	11497.782
8	11550.097	0.073	11550.287
9	11630.386	0.074	11630.602
10	11615.717	0.073	11615.92
Avg	11800.092	0.0694	11800.295
Method B	Actual	Planning	Execution
1	12087.377	0.11	12087.605
2	12490.38	0.083	12490.536
3	11733.517	0.079	11733.673
4	12488.525	0.076	12488.714
5	12329.816	0.069	12329.963
6	12759.869	0.078	12760.013
7	13041.588	0.079	13041.756
8	13044.637	0.083	13044.794
9	12318.512	0.069	12318.673
10	12208.484	0.126	12208.628
Avg	12450.2705	0.0852	12450.4355
Method C	Actual	Planning	Execution
1	12029.017	0.076	12029.306
2	12176.148	0.066	12176.276
3	11607.342	0.075	11607.463
4	12070.27	0.074	12070.393
5	11944.737	0.077	11944.861
6	11736.442	0.074	11736.595
7	12217.075	0.074	12217.179
8	12106.019	0.074	12106.151
9	11877.754	0.041	11877.869
10	11860.639	0.109	11860.764
Avg	11962.5443	0.074	11962.6857