- OS: Debian Bullseye 5.10.0-23-amd64
- Virtualized: Yes (Proxmox)
- CPU: E5-2650 v2 @ 2.60GHz
- Allocated Core Count: 16
- Allocated RAM: 64 GiB PC3-12800R
- Disk: Samsung PM983 1.92 TiB via Ceph
- Filesystem: XFS
- Mount Options: defaults,noatime
- Postgres Version: 15.3
- Postgres Options (non-default):
- shared_buffers = 16GB
- max_wal_size = 8GB
The table is leftover from a class I taught for my last job on normalization (in this case, unnormalized).
CREATE TABLE public.todo_unf (
id bigint NOT NULL PRIMARY KEY,
user_id uuid NOT NULL,
user_email character varying(254) NOT NULL,
created_at timestamp with time zone DEFAULT now() NOT NULL,
tags jsonb DEFAULT '[]'::jsonb NOT NULL,
shared_with jsonb DEFAULT '[]'::jsonb NOT NULL,
due_date timestamp with time zone,
is_completed boolean DEFAULT false NOT NULL,
is_overdue boolean DEFAULT false NOT NULL,
description text NOT NULL,
title character varying(255) NOT NULL,
user_name character varying(255) NOT NULL
);
The table has 25,000,000 rows, precisely half of which have is_completed = TRUE
:
SELECT COUNT(*) FROM todo_unf;
count
----------
25000000
(1 row)
SELECT COUNT(*) FROM todo_unf WHERE is_completed;
count
----------
12500000
(1 row)
There are other indices as well, but the below two were created for the purposes of this demonstration.
CREATE INDEX todo_unf_is_comp ON public.todo_unf USING btree (is_completed);
CREATE INDEX todo_unf_is_comp_false ON public.todo_unf USING btree (is_completed) WHERE (NOT is_completed);
EXPLAIN (ANALYZE, BUFFERS, COSTS) SELECT * FROM todo_unf WHERE NOT is_completed;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Seq Scan on todo_unf (cost=0.00..946414.00 rows=12446518 width=186) (actual time=73.242..4992.580 rows=12500000 loops=1)
Filter: (NOT is_completed)
Rows Removed by Filter: 12500000
Buffers: shared hit=696417
Planning Time: 0.459 ms
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.107 ms, Inlining 3.020 ms, Optimization 44.039 ms, Emission 25.878 ms, Total 74.045 ms
Execution Time: 5679.940 ms
(10 rows)
As you can see, the DB has been warmed up, and the entire table is in buffers - no disk reads. Even still, Postgres chose to perform a sequential scan, and it took 5.7 seconds.
I'll now disable sequential scans, and re-run:
SET enable_seqscan=OFF;
SET
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on todo_unf (cost=107648.65..1048156.09 rows=12446518 width=186) (actual time=1123.364..6877.809 rows=12500000 loops=1)
Recheck Cond: (NOT is_completed)
Rows Removed by Index Recheck: 12392772
Heap Blocks: exact=38348 lossy=658069
Buffers: shared hit=706941
-> Bitmap Index Scan on todo_unf_is_comp_false (cost=0.00..104537.02 rows=12446518 width=0) (actual time=1084.098..1084.099 rows=12500000 loops=1)
Buffers: shared hit=10524
Planning Time: 0.385 ms
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 2.142 ms, Inlining 1.748 ms, Optimization 17.962 ms, Emission 10.057 ms, Total 31.908 ms
Execution Time: 7563.283 ms
(13 rows)
This time, it used the partial index todo_unf_is_comp_false
, resulting in a bitmap index scan + bitmap heap scan
(as expected, since not all columns in the selection are covered by an index), and a total time of 7.6 seconds.
Now I'll disable the partial index, forcing it to only have the full index. I use a function for this, but to just
do a one-off you can run UPDATE pg_index SET indisvalid = FALSE WHERE indexrelid = $YOUR_INDEX_NAME::regclass;
.
Or for this case, you could test the opposite (WHERE is_completed
) as that can't use the partial index.
SELECT toggle_idx('todo_unf_is_comp_false');
toggle_idx
------------------------------------------
index todo_unf_is_comp_false is DISABLED
(1 row)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on todo_unf (cost=138580.94..1079088.38 rows=12446518 width=186) (actual time=923.364..6981.017 rows=12500000 loops=1)
Filter: (NOT is_completed)
Rows Removed by Filter: 12392772
Heap Blocks: exact=38348 lossy=658069
Buffers: shared hit=706941
-> Bitmap Index Scan on todo_unf_is_comp (cost=0.00..135469.32 rows=12446517 width=0) (actual time=884.125..884.126 rows=12500000 loops=1)
Index Cond: (is_completed = false)
Buffers: shared hit=10524
Planning:
Buffers: shared hit=8
Planning Time: 0.686 ms
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.548 ms, Inlining 1.763 ms, Optimization 18.547 ms, Emission 10.064 ms, Total 31.922 ms
Execution Time: 7735.204 ms
(16 rows)
This was fairly close to the first at 7.7 seconds; a larger table still would likely show further amplification.