One example: PostgreSQL doesn't use an index when the search condition is on an unbalanced (skewed) column

How to solve this problem

  • change random_page_cost (see the sketch below)
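A minimal sketch of how random_page_cost could be lowered, first for the current session and then persistently. The value 1.1 is an assumption (the default is 4.0); the right setting depends on your storage.

postgres=# SET random_page_cost = 1.1;               -- affects the current session only
postgres=# ALTER SYSTEM SET random_page_cost = 1.1;  -- persists in postgresql.auto.conf (superuser)
postgres=# SELECT pg_reload_conf();                  -- apply the ALTER SYSTEM change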

How to run

Setup

  1. run postgres
docker run --rm -p 5432:5432 --name postgres -e POSTGRES_PASSWORD=pass -d postgres:latest
  2. generate table records with gen_ps_table_record.py (the script is shown at the bottom of this page)
python gen_ps_table_record.py > sample.csv
  3. create the table and import the CSV data
psql -h localhost -p 5432 -W -U postgres

postgres=# CREATE TABLE test1 (id integer primary key, group_id integer, name text);
postgres=# CREATE INDEX idx_group_id ON test1 USING btree (group_id);
postgres=# \copy test1 (id, group_id, name) from 'sample.csv' with csv;
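Optionally, sanity-check the import and refresh planner statistics before running EXPLAIN. Autovacuum will eventually run ANALYZE on its own, but the plans below depend on up-to-date statistics; the expected counts follow from the generator script.

postgres=# SELECT count(*) FROM test1;                       -- expect 1000100
postgres=# SELECT count(*) FROM test1 WHERE group_id = 5;    -- expect 1000001
postgres=# ANALYZE test1;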

EXPLAIN query

  1. search with a 'general' value (group_id = 10 matches only a single row, like most group_id values)
postgres=# EXPLAIN ANALYZE select * from test1 where group_id = 10;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_group_id on test1  (cost=0.42..9.31 rows=33 width=14) (actual time=0.039..0.039 rows=1 loops=1)
   Index Cond: (group_id = 10)
 Planning time: 0.222 ms
 Execution time: 0.053 ms
  2. search with the 'special' value (group_id = 5 matches roughly 1,000,000 of the 1,000,100 rows)
postgres=# EXPLAIN ANALYZE select * from test1 where group_id = 5;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on test1  (cost=0.00..17907.25 rows=999900 width=14) (actual time=0.010..109.217 rows=1000001 loops=1)
   Filter: (group_id = 5)
   Rows Removed by Filter: 99
 Planning time: 0.063 ms
 Execution time: 141.340 ms
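To see the imbalance the planner bases this choice on, the column statistics can be inspected through the pg_stats view (the exact values vary with the statistics sample):

postgres=# SELECT most_common_vals, most_common_freqs FROM pg_stats WHERE tablename = 'test1' AND attname = 'group_id';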
The generator script (gen_ps_table_record.py):

# Emit CSV rows (id,group_id,name): 100 rows with distinct group_ids 0..99,
# then 1,000,000 rows that all share the special group_id 5.
special_group_id = 5
other_records = 100
for x in range(other_records):
    print(f"{x},{x},{x}")
for x in range(1000000):
    id = other_records + x
    print(f"{id},{special_group_id},{id}")