PostgreSQL / Citus Migrate Production Data

⚠️ Opening a support request is the recommended migration path ⚠️

To migrate without a support request:

1. Set variables

export PGPASSWORD="<password>"

host="<cluster name>.postgres.cosmos.azure.com"
user="citus"
db="citus"
schema="public"
Optional: create test table

create table public.test (name varchar);
insert into public.test (name) values ('test');
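One way to run the optional statements (a sketch, assuming the variables above are set in the current shell and psql is installed):

psql postgres://$user@$host:5432/$db \
-c "create table public.test (name varchar);" \
-c "insert into public.test (name) values ('test');"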

2. Dump schema

pg_dump \
--format=plain \
--no-owner \
--schema-only \
--file=/mnt/storage/$schema.sql \
--schema=$schema \
postgres://$user@$host:5432/$db
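Optional sanity check before moving on (a quick sketch; counts the CREATE TABLE statements in the plain-format dump):

grep -c 'CREATE TABLE' /mnt/storage/$schema.sql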

3. Dump data

pg_dump \
--format=custom \
--no-owner \
--data-only \
--file=/mnt/storage/${schema}data.dump \
--schema=$schema \
postgres://$user@$host:5432/$db
Dump slow?

Table bloat and vacuuming

A high dead_pct (dead tuple percentage) indicates that a table may need to be vacuumed:

select
  schemaname,
  relname as table_name,
  n_dead_tup,
  n_live_tup,
  round(n_dead_tup::float / n_live_tup::float * 100) as dead_pct,
  autovacuum_count,
  last_vacuum,
  last_autovacuum,
  last_autoanalyze,
  last_analyze
from pg_stat_all_tables
where n_live_tup > 0
order by dead_pct desc;

Vacuum:

vacuum (analyze, verbose) <table_name>;
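For example, to vacuum the optional test table from step 1 over the same connection (a sketch; substitute whichever table the query above flags):

psql postgres://$user@$host:5432/$db -c "vacuum (analyze, verbose) public.test;"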

Reduce network bottlenecks

For Azure, create a Container Instance with a storage account file share mounted. Install a PostgreSQL client version that matches the target server, allow the container instance's IP through the PostgreSQL server's firewall, then run pg_dump on the container instance with --file targeting the storage account mount.
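One way to provision such a container (a sketch using the Azure CLI; the resource group, container, storage account, and share names here are hypothetical, and the image tag should match the target server's major version):

az container create \
--resource-group my-rg \
--name pg-migrate \
--image postgres:16 \
--os-type Linux \
--azure-file-volume-account-name mystorageacct \
--azure-file-volume-account-key "$STORAGE_KEY" \
--azure-file-volume-share-name migrate \
--azure-file-volume-mount-path /mnt/storage \
--command-line "sleep infinity"

Then open a shell in the container to run pg_dump:

az container exec --resource-group my-rg --name pg-migrate --exec-command /bin/bash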

Optional: drop test table
drop table public.test;

4. Restore schema

psql \
-h $host \
-d $db \
-U $user \
-f /mnt/storage/$schema.sql
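psql continues past errors by default; to abort on the first error instead, add psql's standard ON_ERROR_STOP variable:

psql \
-h $host \
-d $db \
-U $user \
-v ON_ERROR_STOP=1 \
-f /mnt/storage/$schema.sql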

5. (Citus) Run configuration functions

  • Run your create_distributed_table and create_reference_table statements (see the sketch after this list). If you get an error about foreign keys, it is generally due to the order of operations: drop foreign keys before distributing tables, then re-add them.
  • Put the application into maintenance mode, and disable any other writes to the old database.
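A minimal sketch of the Citus calls referenced above (the table and distribution column names are hypothetical; use your own tables and keys):

select create_distributed_table('public.orders', 'customer_id');
select create_reference_table('public.countries');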

6. Restore data

pg_restore \
--host=$host \
--dbname=$db \
--username=$user \
/mnt/storage/${schema}data.dump
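If the restore is slow, the custom-format dump supports parallel restore via pg_restore's --jobs option (a sketch; choose a job count that matches the available cores):

pg_restore \
--host=$host \
--dbname=$db \
--username=$user \
--jobs=4 \
/mnt/storage/${schema}data.dump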
Optional: select test table
select * from public.test limit 1;