Skip to content

Instantly share code, notes, and snippets.

@mxm
Last active January 9, 2018 15:59
Show Gist options
  • Save mxm/5572865f4b4e95ae05190238f87c3530 to your computer and use it in GitHub Desktop.
Save mxm/5572865f4b4e95ae05190238f87c3530 to your computer and use it in GitHub Desktop.
Postgres sorted merge join vs hash join

Join Strategy

Sorted Merge with t1.size ~= t2.size

max=# explain select * from t1 join t2 on t1.col1 = t2.col1;
                            QUERY PLAN
------------------------------------------------------------------
 Merge Join  (cost=359.57..860.00 rows=32512 width=8)
   Merge Cond: (t1.col1 = t2.col1)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
         Sort Key: t1.col1
         ->  Seq Scan on t1  (cost=0.00..35.50 rows=2550 width=4)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
         Sort Key: t2.col1
         ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4)
(8 rows)

Hash Join with t1.size >> t2.size

max=# explain select * from t1 join t2 on t1.col1 = t2.col1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=648541.02..688224.02 rows=3094 width=8)
   Hash Cond: (t2.col1 = t1.col1)
   ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4)
   ->  Hash  (cost=490844.01..490844.01 rows=9612001 width=4)
         ->  Seq Scan on t1  (cost=0.00..490844.01 rows=9612001 width=4)
(5 rows)

Benchmark Hash Join vs Sorted Merge Join

Hash Join with t1.size ~= t3.size

max=# explain analyze select * from t1 join t3 on t1.col1 = t3.id ;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=351477.66..10240258.86 rows=12288024 width=8) (actual time=2967.984..12401.812 rows=16200069 loops=1)
   Hash Cond: (t1.col1 = t3.id)
   ->  Seq Scan on t1  (cost=0.00..490844.01 rows=9612001 width=4) (actual time=83.087..2464.934 rows=11400005 loops=1)
   ->  Hash  (cost=164444.18..164444.18 rows=11400118 width=4) (actual time=2877.470..2877.470 rows=11400005 loops=1)
         Buckets: 131072  Batches: 256  Memory Usage: 2599kB
         ->  Seq Scan on t3  (cost=0.00..164444.18 rows=11400118 width=4) (actual time=0.019..1019.312 rows=11400005 loops=1)
 Planning time: 0.122 ms
 Execution time: 12974.097 ms
(8 rows)

Sorted Merge Join with t1.size ~= t3.size

max=# explain analyze select * from t1 join t3 on t1.col1 = t3.id ;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3918192.23..4232516.46 rows=15607121 width=8) (actual time=14056.200..24009.864 rows=16200069 loops=1)
   Merge Cond: (t1.col1 = t3.id)
   ->  Sort  (cost=1868492.37..1892522.37 rows=9612001 width=4) (actual time=7835.377..10211.875 rows=11400005 loops=1)
         Sort Key: t1.col1
         Sort Method: external merge  Disk: 156056kB
         ->  Seq Scan on t1  (cost=0.00..490844.01 rows=9612001 width=4) (actual time=81.890..2362.803 rows=11400005 loops=1)
   ->  Materialize  (cost=2049699.86..2114014.68 rows=12862965 width=4) (actual time=6220.819..9888.607 rows=16200069 loops=1)
         ->  Sort  (cost=2049699.86..2081857.27 rows=12862965 width=4) (actual time=6220.815..8571.248 rows=11400005 loops=1)
               Sort Key: t3.id
               Sort Method: external merge  Disk: 156048kB
               ->  Seq Scan on t3  (cost=0.00..179072.65 rows=12862965 width=4) (actual time=0.299..1350.992 rows=11400005 loops=1)
 Planning time: 0.393 ms
 Execution time: 24598.336 ms
(13 rows)

Hash Join with t1.size >> t2.size

max=# explain analyze select * from t1 join t2 on t1.col1 = t2.col1 ;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=648541.02..688224.02 rows=3094 width=8) (actual time=4438.188..5969.536 rows=65 loops=1)
   Hash Cond: (t2.col1 = t1.col1)
   ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.012..0.014 rows=9 loops=1)
   ->  Hash  (cost=490844.01..490844.01 rows=9612001 width=4) (actual time=3965.638..3965.638 rows=11400005 loops=1)
         Buckets: 131072 (originally 131072)  Batches: 256 (originally 128)  Memory Usage: 3073kB
         ->  Seq Scan on t1  (cost=0.00..490844.01 rows=9612001 width=4) (actual time=0.118..2328.936 rows=11400005 loops=1)
 Planning time: 0.092 ms
 Execution time: 5969.845 ms
(8 rows)

Sorted Merge Join with t1.size >> t2.size

max=# set enable_hashjoin = false;
SET
max=# explain analyze select * from t1 join t2 on t1.col1 = t2.col1 ;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1868672.15..1916778.57 rows=3094 width=8) (actual time=7196.360..7196.377 rows=65 loops=1)
   Merge Cond: (t1.col1 = t2.col1)
   ->  Sort  (cost=1868492.37..1892522.37 rows=9612001 width=4) (actual time=7196.332..7196.337 rows=30 loops=1)
         Sort Key: t1.col1
         Sort Method: external merge  Disk: 156048kB
         ->  Seq Scan on t1  (cost=0.00..490844.01 rows=9612001 width=4) (actual time=1.735..2281.426 rows=11400005 loops=1)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4) (actual time=0.024..0.026 rows=60 loops=1)
         Sort Key: t2.col1
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.011..0.012 rows=9 loops=1)
 Planning time: 0.083 ms
 Execution time: 7366.838 ms
(12 rows)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment