Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
EXPLAIN SELECT * FROM messages_binary WHERE group_id = '983d5259-97ff-49e3-8829-101ab8dead92' AND timestamp > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e';
// gives:
Gather (cost=1000.00..408555.61 rows=4630 width=183)
Workers Planned: 2
-> Parallel Seq Scan on messages_binary (cost=0.00..407092.61 rows=1929 width=183)
Filter: (("timestamp" > '1970-01-01T00:00:00.000Z-0000-ae26b84edae7349e'::text) AND (group_id = '983d5259-97ff-49e3-8829-101ab8dead92'::text))
(4 rows)
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------------+----------------------+------------+--------------------------------------------------------------------------------------------------------
public | messages_binary | messages_binary_pkey | | CREATE UNIQUE INDEX messages_binary_pkey ON public.messages_binary USING btree ("timestamp", group_id)
(1 row)
@vhosurtest
Copy link

vhosurtest commented Nov 7, 2021

Could you reverse the column order in the index? Could you create an index on (group_id, timestamp) instead of (timestamp, group_id)? There are two reasons for this:

  1. The index will be used to seek the group_id which will be blazing fast.
  2. The second column in the index will be used to filter the timestamp range. The timestamp column will already be sorted so the scan is fast too.

Having said that, I have no clue about the selectivity of your timestamp and group_id columns but my solution is worth a shot. Let me know if that works out!

@ryankshaw
Copy link

ryankshaw commented Nov 8, 2021

It sounds from your twitter thread that you have already figured something out but I think you might get the best perf by keeping, but changing the column order in your composite key. Could you change it to this?

CREATE UNIQUE INDEX index_messages_binary_on_group_id_and_timestamp ON public.messages_binary USING btree (group_id, "timestamp")

I think (but I'm a dummy so not sure) this should be even faster than just having an index on group_id for this query.

query plans in plain english:

Using just group_id:

  1. use index of group_id to go straight to all the rows for this group_id (bypassing a complete sequential scan of the whole table).
  2. now that you have all of those rows for this group_id, manually filter each of those looking for ones with timestamp > '1970...

using group_id, "timestamp" index:

  1. use combined group_id, "timestamp" index to go straight to the rows for this group id after timestamp > '1970... (bypassing complete sequential scan of table but also bypassing the second step above where you have to filter by timestamp)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment