Given a table bar
that we need to compact while maximizing availability of the data in it and minimize load/IO during compaction.
First, if there are any tables with foriegn key columns referencing columns in bar those foreign keys need to be disabled, e.g:
ALTER TABLE foo DISABLE TRIGGER fkey_bar_id
Now we can work on compacting the data by copying it to a new table.
SQL to create a new, empty table and have the current table inherit from it:
BEGIN;
ALTER TABLE bar RENAME TO bar_old;
CREATE TABLE bar (LIKE bar_old INCLUDING ALL);
ALTER TABLE bar_old INHERIT bar;
ALTER SEQUENCE bar_id_seq OWNED BY bar.id;
COMMIT;
Now we have a new, empty table but queries against it will include all the existing data in bar_old thanks to the inheritance link.
Next, loop over batches (e.g. id ranges) from bar_old, deleting them and inserting them into bar (pseudocode):
min_id := 0
batch_size := 10000
while true
-- This query will:
-- DELETE up to batch_size rows from bar_old ordered on id
-- INSERT the rows deleted from bar_old into bar
-- return a result set consisting of the ids of the rows inserted into bar
result = execute_sql('WITH del AS (
DELETE FROM bar_old
WHERE id IN (SELECT id
FROM bar_old
WHERE id > #{min_id}
ORDER BY id
LIMIT #{batch_size})
RETURNING *)
INSERT INTO bar
SELECT * FROM del
RETURNING id;')
break if result.count < batch_size
min_id := result.ids.max
Confirm no more rows in bar_old, this should return nothing
SELECT max(id) FROM bar_old;
Drop the old table
DROP TABLE bar_old;
-
Fix up index and foreign key names (if needed) on the new bar table
-
Re-enable any foreign keys previously disabled. This is going to be doing a lot of reading while holdinng a lock on
foo
to check every value offoo.bar_id
against thebar.id
values but as long as both sides are properly indexed it shouldn't be too bad.
ALTER TABLE foo ENABLE TRIGGER fkey_bar_id