Skip to content

Instantly share code, notes, and snippets.

@fengb
Created March 9, 2016 21:54
Show Gist options
  • Save fengb/058e2378e4931b69b869 to your computer and use it in GitHub Desktop.
Save fengb/058e2378e4931b69b869 to your computer and use it in GitHub Desktop.
Postgres: LEFT JOIN vs NOT IN vs NOT EXISTS

LEFT JOIN

$ EXPLAIN SELECT base.* FROM base LEFT JOIN relation ON relation.base_id = base.id WHERE relation.id IS NULL ORDER BY id LIMIT 1000;
                                            QUERY PLAN                           
---------------------------------------------------------------------------------------------------
 Limit  (cost=755508.94..755508.95 rows=1 width=54)                              
   ->  Sort  (cost=755508.94..755508.95 rows=1 width=54)                         
         Sort Key: base.id                                                       
         ->  Hash Left Join  (cost=44.65..755508.93 rows=1 width=54)             
               Hash Cond: (base.id = relation.base_id)                           
               Filter: (relation.id IS NULL)                                     
               ->  Seq Scan on base  (cost=0.00..641946.64 rows=30267264 width=54)
               ->  Hash  (cost=25.40..25.40 rows=1540 width=8)                   
                     ->  Seq Scan on relation  (cost=0.00..25.40 rows=1540 width=8) 

LEFT JOIN optimized

$ EXPLAIN SELECT base.* FROM base LEFT JOIN relation ON base_id=base.id WHERE base_id IS NULL ORDER BY id LIMIT 1000;
                                                                     QUERY PLAN  
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=80.66..121.26 rows=1000 width=54)                                  
   ->  Merge Anti Join  (cost=80.66..1215880.10 rows=29941264 width=54)          
         Merge Cond: (base.id = relation.base_id)                                
         ->  Index Scan using index_base_on_id on base  (cost=0.56..1125262.52 rows=30267264 width=54)
         ->  Index Only Scan using index_relation_on_base_id on relation  (cost=0.42..10874.42 rows=326000 width=4)

NOT IN

$ EXPLAIN SELECT base.* FROM base WHERE id NOT IN (SELECT base_id FROM relation) ORDER BY id LIMIT 1000; 
                                                              QUERY PLAN         
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..9376079.92 rows=1000 width=54)                               
   ->  Index Scan using index_base_on_id on base  (cost=0.56..141894134562.68 rows=15133632 width=54)
         Filter: (NOT (SubPlan 1))                                               
         SubPlan 1                                                               
           ->  Materialize  (cost=0.00..8561.00 rows=326000 width=4)             
                 ->  Seq Scan on relation  (cost=0.00..5657.00 rows=326000 width=4) 

NOT IN optimized

$ EXPLAIN SELECT base.* FROM base WHERE id NOT IN (SELECT base_id FROM relation WHERE base_id=base.id) ORDER BY id LIMIT 1000;
                                                                  QUERY PLAN     
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..8944.92 rows=1000 width=54)                                  
   ->  Index Scan using index_base_on_id on base  (cost=0.56..135360578.36 rows=15133632 width=54)
         Filter: (NOT (SubPlan 1))                                               
         SubPlan 1                                                               
           ->  Index Only Scan using index_relation_on_base_id on relation  (cost=0.42..8.44 rows=1 width=4)
                 Index Cond: (base_id = base.id)                                 

NOT EXISTS

$ EXPLAIN SELECT base.* FROM base WHERE NOT EXISTS (SELECT * FROM relation WHERE base_id=base.id) ORDER BY id LIMIT 1000; 
                                                                     QUERY PLAN  
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=80.66..121.26 rows=1000 width=54)                                  
   ->  Merge Anti Join  (cost=80.66..1215880.10 rows=29941264 width=54)          
         Merge Cond: (base.id = relation.base_id)                                
         ->  Index Scan using index_base_on_id on base  (cost=0.56..1125262.52 rows=30267264 width=54)
         ->  Index Only Scan using index_relation_on_base_id on relation  (cost=0.42..10874.42 rows=326000 width=4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment