Skip to content

Instantly share code, notes, and snippets.

@collimarco
Last active January 15, 2020 13:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save collimarco/039412b4fe0dcf39955888f96eff29db to your computer and use it in GitHub Desktop.
Save collimarco/039412b4fe0dcf39955888f96eff29db to your computer and use it in GitHub Desktop.
PostgreSQL bad plan when you add many OR conditions
"index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)
"index_subscriptions_on_project_id_and_tags" gin (project_id, tags) WHERE trashed_at IS NULL
"index_subscriptions_on_project_id_and_tags_using_btree" btree (project_id, tags) WHERE trashed_at IS NULL
...
EXPLAIN ANALYZE SELECT "subscriptions".* FROM "subscriptions"
WHERE "subscriptions"."project_id" = 12345 AND "subscriptions"."trashed_at" IS NULL AND
((tags @> ARRAY['crt:2018_04']::varchar[]) OR (tags @> ARRAY['crt:2018_05']::varchar[]) OR (tags @> ARRAY['crt:2018_06']::varchar[])
OR (tags @> ARRAY['crt:2018_07']::varchar[]) OR (tags @> ARRAY['crt:2018_08']::varchar[]) OR (tags @> ARRAY['crt:2018_09']::varchar[])
OR (tags @> ARRAY['crt:2018_10']::varchar[]) OR (tags @> ARRAY['crt:2018_11']::varchar[]) OR (tags @> ARRAY['crt:2018_12']::varchar[])
OR (tags @> ARRAY['crt:2019_01']::varchar[]) OR (tags @> ARRAY['crt:2019_02']::varchar[]) OR (tags @> ARRAY['crt:2019_03']::varchar[])
OR (tags @> ARRAY['crt:2019_04']::varchar[]) OR (tags @> ARRAY['crt:2019_05']::varchar[]) OR (tags @> ARRAY['crt:2019_06']::varchar[]))
ORDER BY "subscriptions"."created_at" DESC LIMIT 30 OFFSET 0;
QUERY PLAN
Limit (cost=15276.53..15276.61 rows=30 width=382) (actual time=244.343..244.343 rows=0 loops=1)
-> Sort (cost=15276.53..15293.52 rows=6796 width=382) (actual time=244.342..244.342 rows=0 loops=1)
Sort Key: created_at DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on subscriptions (cost=1529.34..15075.82 rows=6796 width=382) (actual time=244.335..244.335 rows=0 loops=1)
Recheck Cond: (((project_id = 12345) AND (tags @> '{crt:2018_04}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2018_05}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2018_06}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2018_07}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2018_08}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2018_09}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2018_10}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2018_11}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2018_12}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2019_01}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2019_02}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2019_03}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2019_04}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2019_05}'::character varying[]) AND (trashed_at IS NULL)) OR ((project_id = 12345) AND (tags @> '{crt:2019_06}'::character varying[]) AND (trashed_at IS NULL)))
-> BitmapOr (cost=1529.34..1529.34 rows=6799 width=0) (actual time=244.333..244.333 rows=0 loops=1)
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=18.808..18.808 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_04}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=8.046..8.046 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_05}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=12.184..12.184 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_06}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=5.299..5.300 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_07}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=6.942..6.942 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_08}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..103.51 rows=800 width=0) (actual time=4.437..4.437 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_09}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=6.162..6.162 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_10}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=5.278..5.278 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_11}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=3.842..3.842 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2018_12}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=6.458..6.458 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2019_01}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=5.705..5.705 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2019_02}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..103.51 rows=800 width=0) (actual time=49.725..49.725 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2019_03}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=46.062..46.062 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2019_04}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=58.572..58.572 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2019_05}'::character varying[]))
-> Bitmap Index Scan on index_subscriptions_on_project_id_and_tags (cost=0.00..99.76 rows=400 width=0) (actual time=6.804..6.804 rows=0 loops=1)
Index Cond: ((project_id = 12345) AND (tags @> '{crt:2019_06}'::character varying[]))
Planning Time: 0.400 ms
Execution Time: 244.458 ms
(39 rows)
EXPLAIN ANALYZE SELECT "subscriptions".* FROM "subscriptions"
WHERE "subscriptions"."project_id" = 12345 AND "subscriptions"."trashed_at" IS NULL AND
((tags @> ARRAY['crt:2018_04']::varchar[]) OR (tags @> ARRAY['crt:2018_05']::varchar[]) OR (tags @> ARRAY['crt:2018_06']::varchar[])
OR (tags @> ARRAY['crt:2018_07']::varchar[]) OR (tags @> ARRAY['crt:2018_08']::varchar[]) OR (tags @> ARRAY['crt:2018_09']::varchar[])
OR (tags @> ARRAY['crt:2018_10']::varchar[]) OR (tags @> ARRAY['crt:2018_11']::varchar[]) OR (tags @> ARRAY['crt:2018_12']::varchar[])
OR (tags @> ARRAY['crt:2019_01']::varchar[]) OR (tags @> ARRAY['crt:2019_02']::varchar[]) OR (tags @> ARRAY['crt:2019_03']::varchar[])
OR (tags @> ARRAY['crt:2019_04']::varchar[]) OR (tags @> ARRAY['crt:2019_05']::varchar[]) OR (tags @> ARRAY['crt:2019_06']::varchar[])
OR (tags @> ARRAY['crt:2019_07']::varchar[]) OR (tags @> ARRAY['crt:2019_08']::varchar[]) OR (tags @> ARRAY['crt:2019_09']::varchar[])
OR (tags @> ARRAY['crt:2019_10']::varchar[]) OR (tags @> ARRAY['crt:2019_11']::varchar[]) OR (tags @> ARRAY['crt:2019_12']::varchar[])
OR (tags @> ARRAY['crt:2020_01']::varchar[]) OR (tags @> ARRAY['crt:2020_02']::varchar[]) OR (tags @> ARRAY['crt:2020_03']::varchar[])
OR (tags @> ARRAY['crt:2020_04']::varchar[]) OR (tags @> ARRAY['crt:2020_05']::varchar[]) OR (tags @> ARRAY['crt:2020_06']::varchar[])
OR (tags @> ARRAY['crt:2020_07']::varchar[]) OR (tags @> ARRAY['crt:2020_08']::varchar[]) OR (tags @> ARRAY['crt:2020_09']::varchar[])
OR (tags @> ARRAY['crt:2020_10']::varchar[]) OR (tags @> ARRAY['crt:2020_11']::varchar[]) OR (tags @> ARRAY['crt:2020_12']::varchar[]))
ORDER BY "subscriptions"."created_at" DESC LIMIT 30 OFFSET 0;
QUERY PLAN
Limit (cost=1000.69..10772.90 rows=30 width=382) (actual time=37805.798..37805.798 rows=0 loops=1)
-> Gather Merge (cost=1000.69..4817075.07 rows=14785 width=382) (actual time=37805.797..38062.156 rows=0 loops=1)
Workers Planned: 7
Workers Launched: 7
-> Parallel Index Scan using index_subscriptions_on_project_id_and_created_at on subscriptions (cost=0.56..4814263.79 rows=2112 width=382) (actual time=37792.830..37792.830 rows=0 loops=8)
Index Cond: (project_id = 12345)
Filter: ((trashed_at IS NULL) AND ((tags @> '{crt:2018_04}'::character varying[]) OR (tags @> '{crt:2018_05}'::character varying[]) OR (tags @> '{crt:2018_06}'::character varying[]) OR (tags @> '{crt:2018_07}'::character varying[]) OR (tags @> '{crt:2018_08}'::character varying[]) OR (tags @> '{crt:2018_09}'::character varying[]) OR (tags @> '{crt:2018_10}'::character varying[]) OR (tags @> '{crt:2018_11}'::character varying[]) OR (tags @> '{crt:2018_12}'::character varying[]) OR (tags @> '{crt:2019_01}'::character varying[]) OR (tags @> '{crt:2019_02}'::character varying[]) OR (tags @> '{crt:2019_03}'::character varying[]) OR (tags @> '{crt:2019_04}'::character varying[]) OR (tags @> '{crt:2019_05}'::character varying[]) OR (tags @> '{crt:2019_06}'::character varying[]) OR (tags @> '{crt:2019_07}'::character varying[]) OR (tags @> '{crt:2019_08}'::character varying[]) OR (tags @> '{crt:2019_09}'::character varying[]) OR (tags @> '{crt:2019_10}'::character varying[]) OR (tags @> '{crt:2019_11}'::character varying[]) OR (tags @> '{crt:2019_12}'::character varying[]) OR (tags @> '{crt:2020_01}'::character varying[]) OR (tags @> '{crt:2020_02}'::character varying[]) OR (tags @> '{crt:2020_03}'::character varying[]) OR (tags @> '{crt:2020_04}'::character varying[]) OR (tags @> '{crt:2020_05}'::character varying[]) OR (tags @> '{crt:2020_06}'::character varying[]) OR (tags @> '{crt:2020_07}'::character varying[]) OR (tags @> '{crt:2020_08}'::character varying[]) OR (tags @> '{crt:2020_09}'::character varying[]) OR (tags @> '{crt:2020_10}'::character varying[]) OR (tags @> '{crt:2020_11}'::character varying[]) OR (tags @> '{crt:2020_12}'::character varying[])))
Rows Removed by Filter: 1054858
Planning Time: 0.525 ms
Execution Time: 38062.199 ms
(10 rows)
@collimarco
Copy link
Author

Also note that removing ORDER BY clause from the slow query, makes Postgresql choose the correct plan again

@vsolovyov
Copy link

I once stumbled upon the same problem, I was using @> on jsonb column (indexed with jsonb_path_ops GIN) and after some stacking of OR @> OR ..., query plan ended up degrading into sequential scan.

The culprit is that it's really hard to correctly gather statistics for @> operation to give correct estimates. When I looked into it then, it just had a 0.001 hardcoded selectivity, meaning that Postgres thinks each @> operation will match 1/1000 of all table rows.

https://github.com/postgres/postgres/blob/7559d8ebfa11d98728e816f6b655582ce41150f3/src/backend/utils/adt/geo_selfuncs.c#L86 looks like it hasn't changed and has 0.001 selectivity.

I don't understand why two of the clauses have estimate for 800 rows and all the others have 400. My understanding is that all of them should be identical. Anyone, please correct my understanding here if you understand why they're different.

So Postgres thinks it will match 6796 rows in the first case. (It's less than 6800, because Postgres assumes uniform distribution here and different @> can theoretically match same rows). When you stack more ORs its estimate rises to 14785 rows, and the combination of predicted number of results, random page access costs and sequential page access costs put you on the threshold of separate Bitmap Index Scans and Parallel Index Scan. That's why ORDER BY matters. When you'll add even more @>, it will stop switching between them and will use only index scan regardless of ORDER BY, I think.

When I was solving this problem myself, I found a threshold number of ORs where plan would still go to stitching many Bitmap Index Scans together instead of sequential scan, and "sharded" the query from the application side, meaning I would execute many selects, each with at most 150 @> operations. A couple of years after that, we refactored this part to use an additional table with a regular btree index and it sped up these queries very much and simplified the code too.

In your case it means to add additional "subscriptions_tags" table with columns like subscription_id, project_id, tag and then query it.

Something like this will be much faster than even your current fast query and will be much more predictable in performance:

SELECT subscriptions.* FROM subscriptions WHERE id IN (SELECT subscription_id FROM subscriptions_tags WHERE project_id = 12345 AND tag = ANY(['crt:2018_05', 'crt:2018_06', 'crt:2018_07', ..., 'crt:2022_13'])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment