Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jhargis/9e96e98986092db3fc7c62893d34f4d5 to your computer and use it in GitHub Desktop.
Save jhargis/9e96e98986092db3fc7c62893d34f4d5 to your computer and use it in GitHub Desktop.
Trick for removing table bloat without requiring a full table lock and production halt

Given a table bar that we need to compact while maximizing availability of the data in it and minimize load/IO during compaction.

First, if there are any tables with foriegn key columns referencing columns in bar those foreign keys need to be disabled, e.g:

ALTER TABLE foo DISABLE TRIGGER fkey_bar_id

Now we can work on compacting the data by copying it to a new table.

SQL to create a new, empty table and have the current table inherit from it:

BEGIN;

ALTER TABLE bar RENAME TO bar_old;

CREATE TABLE bar (LIKE bar_old INCLUDING ALL);

ALTER TABLE bar_old INHERIT bar;

ALTER SEQUENCE bar_id_seq OWNED BY bar.id;

COMMIT;

Now we have a new, empty table but queries against it will include all the existing data in bar_old thanks to the inheritance link.

Next, loop over batches (e.g. id ranges) from bar_old, deleting them and inserting them into bar (pseudocode):

min_id := 0
batch_size := 10000

while true
  -- This query will:
  -- DELETE up to batch_size rows from bar_old ordered on id
  -- INSERT the rows deleted from bar_old into bar
  -- return a result set consisting of the ids of the rows inserted into bar
  result = execute_sql('WITH del AS (
                          DELETE FROM bar_old
                          WHERE id IN (SELECT id
                                       FROM bar_old
                                       WHERE id > #{min_id}
                                       ORDER BY id
                                       LIMIT #{batch_size})
                          RETURNING *)
                        INSERT INTO bar
                          SELECT * FROM del
                        RETURNING id;')
  break if result.count < batch_size
  min_id := result.ids.max

Confirm no more rows in bar_old, this should return nothing

SELECT max(id) FROM bar_old;

Drop the old table

DROP TABLE bar_old;

  • Fix up index and foreign key names (if needed) on the new bar table

  • Re-enable any foreign keys previously disabled. This is going to be doing a lot of reading while holdinng a lock on foo to check every value of foo.bar_id against the bar.id values but as long as both sides are properly indexed it shouldn't be too bad.

ALTER TABLE foo ENABLE TRIGGER fkey_bar_id

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment