danbst/postgresql-vacuum-freeze.md

## postgresql-vacuum-freeze.md

      
    Raw
  

              postgresql-vacuum-freeze.md
            
          
    VACUUM FREEZE

Our database has fast rate of transactions. I did think, that such rate is impossible, until I've seen it with my eyes. I've been instinctively insisting on fixing code to reduce number of transactions (sometimes successful), but never checked what are consequences for database.
One of consequences is fast rate of autovacuum VACUUM FREEZE.
VACUUM is essential for Postgresql to mark non-visible rows (those which were normal rows, but became invisible because of update or delete) as visible to PG. Note, that they become visible only to PG, previously PG didn't know they are free for use, and thus had allocated extra buffers in the end of a table for any new rows (inserts or updates). PG not knowing which rows are free for use is another problem, which causes table bloat, which deserves another post.
VACUUM FREEZE is special kind of VACUUM. From resource usage perspective, it differs from VACUUM by that it has to scan full table (normal VACUUM processes only buffers known to contain at least 1 dead row, this information can be retreived from visibility maps). But not only scan, also set "row is quite old" bit for every row that is quite old and save that buffer to disk.
"Quite old row" is essentially every row. For example, you have large table of 100GB. You do select * from that_table limit 1. You've just made a transaction to that table, which changed table's parameter relfrozenxid. You can watch how tables advance their max row age:
SELECT c.oid::regclass as table_name,                                                                                                                                                          
       greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age                                                                                                                                                    
FROM pg_class c                                                                           
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm') order by 2 desc;

If you run that query multiple times, you'll see it advances ALL of the tables max transaction IDs. Even if table is just created, it's max transaction will be advanced on every query:
=> create table test ();
CREATE TABLE
=> select age(c.relfrozenxid) from pg_class c where c.oid::regclass::text = 'test';
  max  
-------
 39024
(1 row)

=> select age(c.relfrozenxid) from pg_class c where c.oid::regclass::text = 'test';
  max  
-------
 40956
(1 row)

Oh wow! You just do SELECT, but it's max transaction age already advances! This is because you do inserts/updates to other tables, VACUUM FREEZE will decide to set "quite old row" for all rows which were created before last INSERT/UPDATE in database, essentially rewriting table!
Yes, VACUUM FREEZE will take your table and rewrite it fully just to make sure it's age(c.relfrozenxid) is 0. So, do you want to burn your disk subsytem a bit? Run a VACUUM FREEZE, no doubts!
Autovacuum VACUUM FREEZE

Our good friend autovacuum does periodic VACUUM FREEZE to every table, to prevent transaction ID wraparound (situation when relfrozenxid equals to 2^31-1). Wait, does it means, that autovacuum periodically rewrites all tables? Say, I have only 5 tables which are 50Gb+, will it cause rewrites for all of them?
Em, yes and no. It will do VACUUM FREEZE but with slightly reduced effect. For append only and select only tables it will freeze only non-freezed yet rows, but for often updated table it will do lots of updates. In both cases it will read full table, to detect min transaction age.
When does autovacuum VACUUM FREEZE happens

Usually if it happens seldomly, we don't have to bother. But I've mentioned we have large number of transactions. Here's simple view on order of tables for autovacuum VACUUM FREEZE:
SELECT
oid::regclass::text AS table,
age(relfrozenxid) AS xid_age,
mxid_age(relminmxid) AS mxid_age,
least(
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_freeze_max_age') - age(relfrozenxid),
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_multixact_freeze_max_age') - mxid_age(relminmxid)
) AS tx_before_wraparound_vacuum,
pg_size_pretty(pg_total_relation_size(oid)) AS size,
pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
FROM pg_class
WHERE not (relfrozenxid = 0)
AND oid > 16384
ORDER BY tx_before_wraparound_vacuum;

Well, this gives us time in "transactions", not time. So, obviously, we want to know rate of transactions, from which we can determine how much time left to next autovacuum VACUUM FREEZE.
To do this, we create an anchor table, that will track current time and transaction ID:
=> create table temp_age_stat as SELECT now() currenttime, max(age(datfrozenxid)) maxage FROM pg_database where datname = '...';
CREATE TABLE

Then we will periodically run query, that tracks ratio of difference in transactions passed to difference in time passed:
with a1 as (
   select now() - currenttime as since
   , (select age(datfrozenxid) from pg_database where datname = '...') - maxage as transactions_since
   , 1.0*((select age(datfrozenxid) from pg_database where datname = '...') - maxage)/extract(epoch from (now() - currenttime)) as tr_rate
   from temp_age_stat
)
select round(extract(hour from 200000000/tr_rate * interval '1 second')*1.0/24) || ' days you can live without autovacuum freeze'
from a1;
?column?                  
-------------------------------------------
 5 days you can live without autovacuum freeze
(1 row)

What the output of query means - if you do VACUUM FREEZE now (full rewrite of all tables to nullify max transaction age), you'll still start doing autovacuum freezes after 5 days or so. Which is... crap, don't run VACUUM FREEZE, because it's effect will last only 5 days, but cost of VACUUM FREEZE is HUGE (full rewrite of all tables).
If we combine both queries and apply some filters (only consider large table vacuums problematic), we can see the "schedule" of our autovacuum FREEZE:
=> with a1 as (
   select now() - currenttime as since
   , (select age(datfrozenxid) from pg_database where datname = current_database()) - maxage as transactions_since
   , 1.0*((select age(datfrozenxid) from pg_database where datname = current_database()) - maxage)/extract(epoch from (now() - currenttime)) as tr_rate
   from temp_age_stat
),
a2 as (
SELECT
oid::regclass::text AS table,
age(relfrozenxid) AS xid_age,
mxid_age(relminmxid) AS mxid_age,
least(
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_freeze_max_age') - age(relfrozenxid),
(SELECT setting::int
FROM pg_settings
WHERE name = 'autovacuum_multixact_freeze_max_age') - mxid_age(relminmxid)
) AS tx_before_wraparound_vacuum,
pg_size_pretty(pg_total_relation_size(oid)) AS size,
pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
FROM pg_class
WHERE not (relfrozenxid = 0)
AND oid > 16384
and pg_total_relation_size(oid) > 30000000000
ORDER BY tx_before_wraparound_vacuum
)
select "table", "size", tx_before_wraparound_vacuum, tx_before_wraparound_vacuum::float/(select tr_rate from a1) * interval '1 second' from a2;
           table            |  size  | tx_before_wraparound_vacuum |     ?column?     
----------------------------+--------+-----------------------------+------------------
 table1                     | 101 GB |                  -101847006 | -61:12:49.762825
 table2                     | 81 GB  |                    16993591 | 10:12:49.599474
 table3                     | 29 GB  |                    19971850 | 12:00:13.757778
 table4                     | 34 GB  |                    27997788 | 16:49:39.747442
 table5                     | 33 GB  |                    28881959 | 17:21:32.857716
 table6                     | 37 GB  |                    59350998 | 35:40:19.733347
 table7                     | 63 GB  |                    62436365 | 37:31:35.644802
 table8                     | 181 GB |                    82082633 | 49:20:04.92995
 table9                     | 47 GB  |                   125240669 | 75:16:27.411487
(9 rows)

So, autovacuum for table1 is already running, and 6 more scheduled for next 2 days. Nice. We really blow up our IO subsytem, but not with transactions, but with autovacuum FREEZEs.
TODO: query result is incorrect! Real vacuum duration for table1 is 13 hours. So this must even overestimation of time, and in reality autovacuum freeze will occur for often!
Any solutions here?

The first idea is to bump default autovacuum_freeze_max_age from 200M to 1B. This will 5x increase time between autovacuums. Will it increase chance on reaching transaction ID wraparound? Definitely. You can see that table1 has summarized value -101847006, which means, there passed 101M transactions while it was being vacuumed. The large table like table8 can take even more time, even more transactions will pass. If your vacuum runs for 3 days, with current rate of transactions you can reach +1B max transaction age while it's still running, and who knows, maybe this can be close enough TO DOOM YOUR DATABASE?
So, if you bump autovacuum_freeze_max_age to 1B, you have to ensure your vacuums are faster than 3 days, and you don't cancel those, and you check those are scheduled in time.
So, best way to reduce autovacuum FREEZes is to reduce rate of transactions. Speak to developers and ask them to group their insert/update/delete transaction whenever possible.