Skip to content

Instantly share code, notes, and snippets.

@cabecada
Created July 10, 2021 16:52
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cabecada/89ccbd76887569089586926e7d4f575d to your computer and use it in GitHub Desktop.
Save cabecada/89ccbd76887569089586926e7d4f575d to your computer and use it in GitHub Desktop.
reattach replica to primary
##create a db
initdb -D primary
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory primary ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Kolkata
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
pg_ctl -D primary -l logfile start
# start the primary
postgres@db:~/playground/demo$ pg_ctl -D primary -l logfile start
waiting for server to start.... done
server started
## create a replica to the above primary, the command creates replication slot in primary as well as update recovery conf
postgres@db:~/playground/demo$ pg_basebackup -D replica -R -X stream -c fast -C -P -S replica -v -d "port=5432" -p 5432 -h 127.0.0.1 -U postgres
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created replication slot "replica"
25448/25448 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
## start the replica on port 5433
postgres@db:~/playground/demo$ echo "port=5433" >> replica/postgresql.conf
postgres@db:~/playground/demo$ pg_ctl -D replica -l logfile2 start
waiting for server to start.... done
server started
## verify replication slot created on primary and is active
postgres@db:~/playground/demo$ psql -Ppager -p 5432 -x -c 'select * from pg_replication_slots;'
-[ RECORD 1 ]-------+----------
slot_name | replica
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 1652
xmin |
catalog_xmin |
restart_lsn | 0/3000060
confirmed_flush_lsn |
wal_status | reserved
safe_wal_size |
two_phase | f
# stop the replica for some reason
postgres@db:~/playground/demo$ pg_ctl -D replica -l logfile2 stop
waiting for server to shut down.... done
server stopped
##drop the replication slot to simulate issue with replica
postgres@db:~/playground/demo$ psql -p 5432 -c "select pg_drop_replication_slot('replica');"
pg_drop_replication_slot
--------------------------
(1 row)
## verify it is dropped
postgres@db:~/playground/demo$ psql -Ppager -p 5432 -x -c 'select * from pg_replication_slots;'
(0 rows)
## make changes to primary yo move lsn forward
postgres@db:~/playground/demo$ psql -p 5432 -c 'create table t(id int primary key); insert into t select x from generate_series(1, 1000) x;'
INSERT 0 1000
## start replica, it will keep throwing errors, as it will try to initate connection with the primary but no slot exists
postgres@db:~/playground/demo$ pg_ctl -D replica -l logfile2 start
waiting for server to start.... done
server started
postgres@db:~/playground/demo$ tail logfile2
2021-07-10 22:12:06.846 IST [1762] LOG: database system was shut down in recovery at 2021-07-10 22:10:42 IST
2021-07-10 22:12:06.846 IST [1762] LOG: entering standby mode
2021-07-10 22:12:06.848 IST [1762] LOG: redo starts at 0/2000028
2021-07-10 22:12:06.848 IST [1762] LOG: consistent recovery state reached at 0/3000060
2021-07-10 22:12:06.848 IST [1762] LOG: invalid record length at 0/3000060: wanted 24, got 0
2021-07-10 22:12:06.848 IST [1761] LOG: database system is ready to accept read only connections
2021-07-10 22:12:06.851 IST [1766] FATAL: could not start WAL streaming: ERROR: replication slot "replica" does not exist
2021-07-10 22:12:06.853 IST [1768] FATAL: could not start WAL streaming: ERROR: replication slot "replica" does not exist
2021-07-10 22:12:11.861 IST [1770] FATAL: could not start WAL streaming: ERROR: replication slot "replica" does not exist
2021-07-10 22:12:16.864 IST [1773] FATAL: could not start WAL streaming: ERROR: replication slot "replica" does not exist
## create the replication slot on primary
postgres@db:~/playground/demo$ psql -Ppager -p 5432 -x -c "select pg_create_physical_replication_slot('replica');"
-[ RECORD 1 ]-----------------------+-----------
pg_create_physical_replication_slot | (replica,)
## this is important, if from the time since the replica was stopped, till it started, the wals on the primary are available
## and there is no divergence due to purging etc, creating a replication slot would start the replica fine
## but if the replica and primary have diverged so much, the replica will start throwing messages not able to find the right offset etc
## that would probably mean, rebuild the replica with pg_basebackup
postgres@db:~/playground/demo$ tail logfile2
2021-07-10 22:12:46.898 IST [1789] FATAL: could not start WAL streaming: ERROR: replication slot "replica" does not exist
2021-07-10 22:12:51.902 IST [1791] FATAL: could not start WAL streaming: ERROR: replication slot "replica" does not exist
2021-07-10 22:12:56.900 IST [1793] FATAL: could not start WAL streaming: ERROR: replication slot "replica" does not exist
2021-07-10 22:13:01.911 IST [1796] FATAL: could not start WAL streaming: ERROR: replication slot "replica" does not exist
2021-07-10 22:13:06.910 IST [1803] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
## verify all data is available on replica
postgres@db:~/playground/demo$ psql -p 5433
psql (14beta1)
Type "help" for help.
postgres=# select count(1) from t;
count
-------
1000
(1 row)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment