
@kitallis
Last active Sep 5, 2017
some notes on using pg_repack

so if the table is 350 GB, free space should be at least 700 GB (pg_repack builds a full new copy of the table alongside the old one)

given your current DB = 900 GB and your disk = 1.3 TB, you should bump it up to at least 1.6 TB

that will ensure ~20% free space plus room for a full additional copy of the table
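The sizing rule above can be sanity-checked with some quick shell arithmetic (the 350 GB table / 900 GB DB figures are the hypothetical numbers from these notes; substitute your own):

```shell
#!/bin/sh
# Rough disk sizing before a pg_repack run.
TABLE_GB=350   # size of the table being repacked (hypothetical)
DB_GB=900      # current total database size (hypothetical)

# pg_repack writes a full new copy of the table (plus indexes),
# so budget roughly 2x the table size in free space.
NEED_FREE_GB=$((TABLE_GB * 2))

# the disk must hold the existing DB plus that working copy
MIN_DISK_GB=$((DB_GB + NEED_FREE_GB))

echo "free space needed: ${NEED_FREE_GB} GB"
echo "minimum disk size: ${MIN_DISK_GB} GB"
```

That gives 700 GB of required free space and a ~1.6 TB disk, matching the figures above. The actual on-disk size of a table, including indexes and TOAST, can be read with `SELECT pg_size_pretty(pg_total_relation_size('your_table'));`.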

things to keep in mind before using repack:

1. response time during repack *will* increase – make sure you do it at night
2. all queries will error out during the final swap – expect downtime of a minute or two
3. your disk will fill up faster than usual post-repack: under normal circumstances Postgres reuses holes in existing pages, but after a repack (or VACUUM FULL) it has to allocate fresh pages; this normalizes once the table reaches its usual growth pattern
4. if your table/DB grows too fast, you will have to repeat this process soon – e.g. if you reclaim 50 GB and your table grows by 50 GB every 5 days, you're back to square one after 5 days
5. you can parallelize index rebuilds in repack (the --jobs flag)
6. it runs in transactions and creates a temporary log table, so exiting in the middle is clean
7. cpu *will* spike during repack
8. before running, do a test on a snapshot to estimate the swap time
9. do a dry run on the production table with debug logs on (the --dry-run flag in repack)
10. target table must have a PRIMARY KEY, or at least a UNIQUE total index on a NOT NULL column
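Putting points 5, 9, and 10 together, a cautious sequence might look like this sketch. The database and table names are hypothetical; `--dry-run`, `--jobs`, `--elevel`, and `--table` are real pg_repack options, but verify against your installed version:

```shell
#!/bin/sh
# Hypothetical names: database "app", table "public.events".
DB=app
TABLE=public.events

# 10. confirm the table has a PRIMARY KEY (or a unique, non-partial
#     index on NOT NULL columns) before anything else
psql -d "$DB" -c "\d $TABLE"

# 9. dry run with debug-level logging – no changes are made
pg_repack --dry-run --elevel=DEBUG --table="$TABLE" --dbname="$DB"

# 5. the real run, parallelizing index rebuilds across 4 workers
pg_repack --jobs=4 --table="$TABLE" --dbname="$DB"
```

Run the dry-run step against production well before the maintenance window, and the real run at night (per point 1).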

read this before doing it https://github.com/reorg/pg_repack/blob/master/doc/pg_repack.rst#details

this is how it replays logs on the main table: https://github.com/reorg/pg_repack/blob/master/lib/repack.c#L756

in my experience, repack hits diminishing returns very quickly on large, fast-updating tables – eventually you will have to shard
