Index-only scan on the table isn't working, so SELECT COUNT(*) FROM t is slow -- see https://stackoverflow.com/questions/16916633/if-postgresql-count-is-always-slow-how-to-paginate-complex-queries

Using PostgreSQL 10 on a spinning disk.

Table layout

postgres=# \d+ t;
                                                 Table "public.t"
  Column  |  Type   | Collation | Nullable |            Default            | Storage  | Stats target | Description 
----------+---------+-----------+----------+-------------------------------+----------+--------------+-------------
 id       | integer |           | not null | nextval('t_id_seq'::regclass) | plain    |              | 
 the_data | jsonb   |           |          |                               | extended |              | 
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)
    "t_to_tsvector_idx" gin (to_tsvector('english'::regconfig, the_data))

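For reference, a schema along these lines should reproduce the layout above (a sketch, not the original DDL):

-- Hypothetical DDL matching the \d+ output; "serial" creates t_id_seq
-- and the nextval() default shown above.
CREATE TABLE t (
    id       serial PRIMARY KEY,
    the_data jsonb
);

-- PostgreSQL 10 added to_tsvector(regconfig, jsonb), which this
-- expression index relies on.
CREATE INDEX t_to_tsvector_idx
    ON t USING gin (to_tsvector('english', the_data));
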
The jsonb values are ~500 bytes in size each.
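
(Not part of the original session:) the average stored size can be double-checked with pg_column_size:

-- Average on-disk size of the_data values, in bytes.
SELECT avg(pg_column_size(the_data)) AS avg_bytes FROM t;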

The first count is very slow (19 seconds)

postgres=# EXPLAIN analyze SELECT COUNT(*) FROM t;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=158885.61..158885.62 rows=1 width=8) (actual time=18941.636..18941.636 rows=1 loops=1)
   ->  Gather  (cost=158885.40..158885.61 rows=2 width=8) (actual time=18941.624..18941.631 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=157885.40..157885.41 rows=1 width=8) (actual time=18930.993..18930.993 rows=1 loops=3)
               ->  Parallel Seq Scan on t  (cost=0.00..155389.12 rows=998512 width=0) (actual time=1781.971..18877.031 rows=799319 loops=3)
 Planning time: 65.893 ms
 Execution time: 18943.667 ms
(8 rows)

Unfortunately I didn't use EXPLAIN (BUFFERS) when I ran this the first time, so we only have the plain ANALYZE output.
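
For context (also not captured at the time), the heap size that the sequential scan has to read can be checked with:

-- Size of the table's main heap, excluding indexes.
SELECT pg_size_pretty(pg_relation_size('t')) AS heap_size;

The shared read=143017 buffer count in the cached run below corresponds to roughly 1.1 GB at the default 8 kB page size, which plausibly takes ~19 seconds to read cold from a spinning disk.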

Subsequent counts are faster, but still slow (200 ms), presumably because the pages now come from the OS page cache rather than disk

postgres=# EXPLAIN (buffers, analyze) SELECT COUNT(*) FROM t;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=158885.61..158885.62 rows=1 width=8) (actual time=216.548..216.548 rows=1 loops=1)
   Buffers: shared hit=1428 read=48220
   ->  Gather  (cost=158885.40..158885.61 rows=2 width=8) (actual time=216.539..216.544 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=1428 read=48220
         ->  Partial Aggregate  (cost=157885.40..157885.41 rows=1 width=8) (actual time=214.568..214.568 rows=1 loops=3)
               Buffers: shared hit=2387 read=143017
               ->  Parallel Seq Scan on t  (cost=0.00..155389.12 rows=998512 width=0) (actual time=8.865..167.778 rows=799319 loops=3)
                     Buffers: shared hit=2387 read=143017
 Planning time: 0.047 ms
 Execution time: 218.602 ms
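
(An aside, not from the original session:) how much of t sits in PostgreSQL's own shared_buffers, as opposed to the OS page cache, can be inspected with the pg_buffercache extension:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Number of 8 kB shared_buffers pages currently holding blocks of t.
SELECT count(*) AS cached_pages
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('t');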

Showing the row count of the table

postgres=# select count(*) from t;
  count  
---------
 2397957
(1 row)

Time: 180.140 ms
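
If an exact count isn't needed, the planner's estimate in pg_class avoids scanning entirely (this is the standard workaround from the Stack Overflow link above); it is maintained by VACUUM and ANALYZE, so it can lag slightly:

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 't';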

EDIT: VACUUM ANALYZE seems to help!

postgres=# VACUUM (ANALYZE) t;
VACUUM
Time: 25379.988 ms (00:25.380)
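
This fits with how index-only scans work: the executor may skip fetching a heap page only if that page is marked all-visible in the visibility map, and VACUUM is what sets those bits. How much of the table is covered can be checked via pg_class:

-- relallvisible / relpages ~= fraction of pages an index-only scan
-- can serve without heap fetches (both maintained by VACUUM/ANALYZE).
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 't';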

After that:

postgres=# EXPLAIN (buffers, analyze) SELECT COUNT(*) FROM t;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=54822.69..54822.70 rows=1 width=8) (actual time=132.144..132.144 rows=1 loops=1)
   Buffers: shared hit=52 read=2164 written=404
   ->  Gather  (cost=54822.48..54822.69 rows=2 width=8) (actual time=132.133..132.140 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=52 read=2164 written=404
         ->  Partial Aggregate  (cost=53822.48..53822.49 rows=1 width=8) (actual time=129.863..129.863 rows=1 loops=3)
               Buffers: shared hit=155 read=6424 written=1238
               ->  Parallel Index Only Scan using t_pkey on t  (cost=0.43..51330.43 rows=996818 width=0) (actual time=0.023..84.308 rows=799319 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=155 read=6424 written=1238
 Planning time: 0.114 ms
 Execution time: 134.729 ms

Now a Parallel Index Only Scan is used with Heap Fetches: 0, and plain non-EXPLAIN timings (with \timing) have improved from 180 ms to 80 ms.
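
To keep it that way without manual VACUUMs, autovacuum can be made more aggressive for this table so the visibility map stays current (example values only, not tuned for this workload):

ALTER TABLE t SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);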
