Skip to content

Instantly share code, notes, and snippets.

@bnm3k
Last active January 11, 2022 14:24
Show Gist options
  • Save bnm3k/921676a34dc9c31e1a2717d293adb7a6 to your computer and use it in GitHub Desktop.
Save bnm3k/921676a34dc9c31e1a2717d293adb7a6 to your computer and use it in GitHub Desktop.
Postgres indexing for overlaps operator doesn't work

Postgres indexing for overlaps operator doesn't work

Credits: Jonathan Katz: Lets Build a Complex Real Time Data Management Application -- PGCon 2018

Start with the following table:

create table bookings (
  id int generated by default as identity primary key,
  start_time timestamptz not null,
  end_time timestamptz not null,
  check (end_time >= start_time)
);

Populate it with some generated data:

insert into bookings(start_time, end_time)
select 
    y.start_time + (x || ' days')::interval,
    y.end_time + (x || ' days')::interval
from generate_series(1, 400000) x,
(
    select *
    from ( values
      ('2003-04-01 9:00'::timestamptz, '2003-04-01 12:30'::timestamptz),
      ('2003-04-01 14:00'::timestamptz, '2003-04-01 16:30'::timestamptz),
      ('2003-04-01 19:00'::timestamptz, '2003-04-01 21:00'::timestamptz)
    ) z(start_time, end_time)
) y;

Suppose you have the following query:

select * from bookings 
where 
  (start_time, end_time) overlaps
  ('2018-05-30 09:00'::timestamptz, '2018-05-30 12:30'::timestamptz);

Currently, the query does a full table scan.

To avoid the sequential scan, I've tried variations of the following indexes but none of them works:

create index b1 on bookings(start_time);
create index b2 on bookings(end_time);
create index b3 on bookings(start_time,end_time);

The query plan always ends up as so:

 Seq Scan on bookings  (cost=0.00..22644.00 rows=400000 width=20)
   Filter: ((start_time, end_time) OVERLAPS (A, B)

It seems the only way to use the indexes is to rewrite the query. Unless I'm missing something, this should be equivalent:

select * from bookings 
where 
  (start_time >= '2018-05-30 09:00'::timestamptz and start_time < '2018-05-30 12:30'::timestamptz)
  or
  (end_time > '2018-05-30 09:00'::timestamptz and end_time <= '2018-05-30 12:30'::timestamptz)

If we drop index b3 and keep b2 and b1, we get the following plan for the above query (I've truncated the arguments):

 Bitmap Heap Scan on bookings  (cost=8.88..12.90 rows=1 width=20) (actual time=0.031..0.034 rows=1 loops=1)
   Recheck Cond: (((start_time >= A) AND (start_time < B)) OR ((end_time > A AND (end_time <= B )
   Heap Blocks: exact=1
   ->  BitmapOr  (cost=8.88..8.88 rows=1 width=0) (actual time=0.025..0.026 rows=0 loops=1)
         ->  Bitmap Index Scan on b1  (cost=0.00..4.44 rows=1 width=0) (actual time=0.015..0.015 rows=1 loops=1)
               Index Cond: ((start_time >= A) AND (start_time < B))
         ->  Bitmap Index Scan on b2  (cost=0.00..4.44 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)
               Index Cond: ((end_time > A) AND (end_time <= B))
 Planning Time: 0.537 ms
 Execution Time: 0.077 ms
(10 rows)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment