@stephanGarland
Created February 9, 2024 23:37
Demonstrating Postgres' clustering operation

This table has 1,000,000 rows, consisting of a UUIDv4 PK, a random integer in the range [1, 1000000], and ~1 KiB of Lorem Ipsum text.
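The gist does not include the original DDL or load script; a minimal sketch that produces a table of roughly this shape (using `repeat()` as a stand-in for the actual Lorem Ipsum text, and `gen_random_uuid()`, which is built in since Postgres 13) might look like:

```sql
-- Hypothetical reconstruction; the exact DDL and text generation are not shown in the gist.
CREATE TABLE uuid_pk (
    id      uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id integer NOT NULL,
    lorem   text NOT NULL
);

-- 27 chars * 38 repetitions ~= 1 KiB of filler per row.
INSERT INTO uuid_pk (user_id, lorem)
SELECT
    1 + floor(random() * 1000000)::int,
    repeat('lorem ipsum dolor sit amet ', 38)
FROM generate_series(1, 1000000);
```

Note that highly repetitive text compresses well under the `extended` storage shown below, so the on-disk size of this sketch may differ from the original table's.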

postgres=# \d+ uuid_pk
                                          Table "public.uuid_pk"
 Column  |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
---------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id      | uuid    |           | not null |         | plain    |             |              |
 user_id | integer |           | not null |         | plain    |             |              |
 lorem   | text    |           | not null |         | extended |             |              |
Indexes:
    "uuid_pk_pkey" PRIMARY KEY, btree (id)
Access method: heap
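The `pgstatindex()` and `pgstattuple()` functions used below come from the `pgstattuple` contrib module, which ships with Postgres but is not enabled by default:

```sql
-- One-time setup per database before the stats queries below will work.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
```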

postgres=# SELECT
  pg_size_pretty(index_size) AS index_size,
  internal_pages, leaf_pages,
  avg_leaf_density,
  leaf_fragmentation
  FROM pgstatindex('uuid_pk_pkey');
 index_size | internal_pages | leaf_pages | avg_leaf_density | leaf_fragmentation
------------+----------------+------------+------------------+--------------------
 38 MB      |             24 |       4854 |            71.15 |              50.02
(1 row)

postgres=# SELECT
  pg_size_pretty(table_len) AS table_size,
   tuple_count,
   tuple_percent,
   pg_size_pretty(free_space) AS free_space,
   free_percent
   FROM pgstattuple('public.uuid_pk');
 table_size | tuple_count | tuple_percent | free_space | free_percent
------------+-------------+---------------+------------+--------------
 1116 MB    |     1000000 |         91.86 | 78 MB      |         7.03
(1 row)

The table was then clustered using the UUID PK:

postgres=# CLUSTER VERBOSE uuid_pk USING uuid_pk_pkey;
INFO:  clustering "public.uuid_pk" using sequential scan and sort
INFO:  "public.uuid_pk": found 0 removable, 1000000 nonremovable row versions in 142858 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 5.48 s, system: 10.02 s, elapsed: 49.93 s.
CLUSTER
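One way to confirm the effect, not shown in the original session, is the planner's correlation statistic: after an `ANALYZE`, the column the table was clustered on should report a correlation near 1.0 in `pg_stats`:

```sql
-- Hypothetical check: correlation between physical (heap) and logical (sort) order.
-- After CLUSTER ... USING uuid_pk_pkey, the id column should be near 1.0.
ANALYZE uuid_pk;
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'uuid_pk';
```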
postgres=# SELECT
  pg_size_pretty(index_size) AS index_size,
  internal_pages, leaf_pages,
  avg_leaf_density,
  leaf_fragmentation
  FROM pgstatindex('uuid_pk_pkey');
 index_size | internal_pages | leaf_pages | avg_leaf_density | leaf_fragmentation
------------+----------------+------------+------------------+--------------------
 30 MB      |             20 |       3832 |            90.03 |                  0
(1 row)

The table was analyzed to pick up the new changes, then an EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) was run:

postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM uuid_pk;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=26621.03..26621.04 rows=1 width=8) (actual time=315.899..329.927 rows=1 loops=1)
   Buffers: shared hit=47 read=3802
   ->  Gather  (cost=26620.82..26621.03 rows=2 width=8) (actual time=315.495..329.878 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=47 read=3802
         ->  Partial Aggregate  (cost=25620.82..25620.83 rows=1 width=8) (actual time=257.535..257.537 rows=1 loops=3)
               Buffers: shared hit=47 read=3802
               ->  Parallel Index Only Scan using uuid_pk_pkey on uuid_pk  (cost=0.42..24579.15 rows=416669 width=0) (actual time=0.221..200.480 rows=333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=47 read=3802
 Planning:
   Buffers: shared hit=14
 Planning Time: 1.130 ms
 Execution Time: 330.379 ms

Next, the table was re-clustered on user_id, which is a random integer; since user_id is uncorrelated with the PK, this effectively randomizes the physical order of the UUIDs. To do this, a temporary index was first created on the user_id column.

postgres=# CREATE INDEX uuid_pk_int ON uuid_pk (user_id);
CREATE INDEX
Time: 1783.558 ms (00:01.784)
postgres=# CLUSTER VERBOSE uuid_pk USING uuid_pk_int;
INFO:  clustering "public.uuid_pk" using sequential scan and sort
INFO:  "public.uuid_pk": found 0 removable, 1000000 nonremovable row versions in 142858 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 5.06 s, system: 8.81 s, elapsed: 49.16 s.
CLUSTER

Following another ANALYZE, the planner chose the index on the integer column for the same SELECT COUNT(*). Notably, the query takes only about 73% as long as with the UUID PK, due to fewer pages being read (2728 vs. 3802).

postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM uuid_pk;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=22161.03..22161.04 rows=1 width=8) (actual time=233.789..239.188 rows=1 loops=1)
   Buffers: shared hit=15 read=2728
   ->  Gather  (cost=22160.82..22161.03 rows=2 width=8) (actual time=233.630..239.174 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=15 read=2728
         ->  Partial Aggregate  (cost=21160.82..21160.83 rows=1 width=8) (actual time=186.978..186.980 rows=1 loops=3)
               Buffers: shared hit=15 read=2728
               ->  Parallel Index Only Scan using uuid_pk_int on uuid_pk  (cost=0.42..20119.15 rows=416669 width=0) (actual time=0.314..133.483 rows=333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=15 read=2728
 Planning:
   Buffers: shared hit=27
 Planning Time: 0.569 ms
 Execution Time: 239.440 ms
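The quoted ratio can be double-checked directly in psql, with the numbers copied from the two plans above:

```sql
-- 239.440 ms (integer index) vs. 330.379 ms (UUID PK): ~72-73% of the time.
-- 2728 vs. 3802 pages read: a similar ~72% ratio, which explains the speedup.
SELECT round(239.440 / 330.379, 2) AS time_ratio,
       round(2728.0 / 3802.0, 2)   AS pages_ratio;
```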

That index was then dropped, and the query repeated. Despite nearly all pages already being in shared_buffers, the massive fragmentation causes visibility map lookup misses, driving the buffer count to 783770 hits (plus 3834 reads). This run took about 77% longer than with the integer index, and about 28% longer than when the table was freshly clustered on its UUID PK.
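The drop itself is not shown in the session; presumably it was simply:

```sql
-- Hypothetical: remove the temporary integer index so the planner
-- falls back to the UUID primary key index.
DROP INDEX uuid_pk_int;
```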

postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM uuid_pk;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=26621.03..26621.04 rows=1 width=8) (actual time=410.897..422.809 rows=1 loops=1)
   Buffers: shared hit=783770 read=3834
   ->  Gather  (cost=26620.82..26621.03 rows=2 width=8) (actual time=410.493..422.775 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=783770 read=3834
         ->  Partial Aggregate  (cost=25620.82..25620.83 rows=1 width=8) (actual time=360.653..360.655 rows=1 loops=3)
               Buffers: shared hit=783770 read=3834
               ->  Parallel Index Only Scan using uuid_pk_pkey on uuid_pk  (cost=0.42..24579.15 rows=416669 width=0) (actual time=0.305..303.873 rows=333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=783770 read=3834
 Planning:
   Buffers: shared hit=6 dirtied=1
 Planning Time: 0.435 ms
 Execution Time: 422.962 ms
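Checking the arithmetic again, with the execution times copied from the three plans:

```sql
-- 422.962 ms vs. 239.440 ms (integer index):      ~1.77x, i.e. ~77% longer.
-- 422.962 ms vs. 330.379 ms (clustered UUID PK):  ~1.28x, i.e. ~28% longer.
SELECT round(422.962 / 239.440, 2) AS vs_int_index,
       round(422.962 / 330.379, 2) AS vs_clustered;
```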

On a cold run (stop Postgres, drop OS caches, restart Postgres), it's understandably even worse:

postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM uuid_pk;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=26620.97..26620.98 rows=1 width=8) (actual time=760.739..771.814 rows=1 loops=1)
   Buffers: shared hit=783756 read=3839
   ->  Gather  (cost=26620.76..26620.97 rows=2 width=8) (actual time=760.466..771.794 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=783756 read=3839
         ->  Partial Aggregate  (cost=25620.76..25620.77 rows=1 width=8) (actual time=744.637..744.639 rows=1 loops=3)
               Buffers: shared hit=783756 read=3839
               ->  Parallel Index Only Scan using uuid_pk_pkey on uuid_pk  (cost=0.42..24579.09 rows=416667 width=0) (actual time=2.078..683.930 rows=333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=783756 read=3839
 Planning:
   Buffers: shared hit=40 read=21
 Planning Time: 33.432 ms
 Execution Time: 773.579 ms

In comparison, a cold run with the index on the integer column:

postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM uuid_pk;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=22161.03..22161.04 rows=1 width=8) (actual time=549.814..559.344 rows=1 loops=1)
   Buffers: shared hit=10 read=2733
   ->  Gather  (cost=22160.82..22161.03 rows=2 width=8) (actual time=549.634..559.323 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=10 read=2733
         ->  Partial Aggregate  (cost=21160.82..21160.83 rows=1 width=8) (actual time=531.750..531.752 rows=1 loops=3)
               Buffers: shared hit=10 read=2733
               ->  Parallel Index Only Scan using uuid_pk_int on uuid_pk  (cost=0.42..20119.15 rows=416669 width=0) (actual time=1.832..462.815 rows=333333 loops=3)
                     Heap Fetches: 0
                     Buffers: shared hit=10 read=2733
 Planning:
   Buffers: shared hit=53 read=27 dirtied=2
 Planning Time: 31.980 ms
 Execution Time: 561.492 ms