Skip to content

Instantly share code, notes, and snippets.

@mage2k
Last active August 24, 2018 18:03
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save mage2k/747674bb8a1a970007952adc283a857c to your computer and use it in GitHub Desktop.
Save mage2k/747674bb8a1a970007952adc283a857c to your computer and use it in GitHub Desktop.
Given a table bar that we need to compact while maximizing availability of the data in it and minimize load/IO during compaction.
First, if there are any tables with foriegn key columns referencing columns in bar those foreign keys need to be disabled, e.g:
`ALTER TABLE foo DISABLE TRIGGER fkey_bar_id`
Now we can work on compacting the data by copying it to a new table.
SQL to create a new, empty table and have the current table inherit from it:
```
BEGIN;
ALTER TABLE bar RENAME TO bar_old;
CREATE TABLE bar (LIKE bar_old INCLUDING ALL);
ALTER TABLE bar_old INHERIT bar;
ALTER SEQUENCE bar_id_seq OWNED BY bar.id;
COMMIT;
```
Now we have a new, empty table but queries against it will include all the existing data in bar_old thanks to the inheritance link.
Next, loop over batches (e.g. id ranges) from bar_old, deleting them and inserting them into bar (pseudocode):
```
min_id := 0
batch_size := 10000
while true
-- This query will:
-- DELETE up to batch_size rows from bar_old ordered on id
-- INSERT the rows deleted from bar_old into bar
-- return a result set consisting of the ids of the rows inserted into bar
result = execute_sql('WITH del AS (
DELETE FROM bar_old
WHERE id IN (SELECT id
FROM bar_old
WHERE id > #{min_id}
ORDER BY id
LIMIT #{batch_size})
RETURNING *)
INSERT INTO bar
SELECT * FROM del
RETURNING id;')
break if result.count < batch_size
min_id := result.ids.max
```
Confirm no more rows in bar_old, this should return nothing
`SELECT max(id) FROM bar_old;`
Drop the old table
`DROP TABLE bar_old;`
* Fix up index and foreign key names (if needed) on the new bar table
* Re-enable any foreign keys previously disabled. This is going to be doing a lot of reading while holdinng a lock on `foo` to check every value of `foo.bar_id` against the `bar.id` values but as long as both sides are properly indexed it shouldn't be too bad.
`ALTER TABLE foo ENABLE TRIGGER fkey_bar_id`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment