@nrvale0
Last active March 10, 2017 01:36
BDR madness

This is the happy path. The node is ejected from the cluster in an orderly manner, and then kubectl is used to spawn a replacement Pod. Although not shown in the output, the new Pod has the same IP address as the old Pod.

In this scenario, everything works as expected.

$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c 'select * from bdr.bdr_nodes'
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395635580046348310 |             1 |      16387 | r           | mydb-0-11ac0600-1489099948 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395635628273594388 |             1 |      16387 | r           | mydb-1-11ac0700-1489099959 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395635633776132116 |             1 |      16387 | r           | mydb-2-11ac0800-1489099960 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(3 rows)
$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c "SELECT bdr.bdr_part_by_node_names(ARRAY['mydb-2-11ac0800-1489099960']);"
 bdr_part_by_node_names 
------------------------
 
(1 row)
$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c 'select * from bdr.bdr_nodes'
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395635580046348310 |             1 |      16387 | r           | mydb-0-11ac0600-1489099948 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395635628273594388 |             1 |      16387 | r           | mydb-1-11ac0700-1489099959 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395635633776132116 |             1 |      16387 | k           | mydb-2-11ac0800-1489099960 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(3 rows)
$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c "delete from bdr.bdr_nodes where node_name='mydb-2-11ac0800-1489099960';"
DELETE 1
$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c 'select * from bdr.bdr_nodes'
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395635580046348310 |             1 |      16387 | r           | mydb-0-11ac0600-1489099948 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395635628273594388 |             1 |      16387 | r           | mydb-1-11ac0700-1489099959 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(2 rows)
$ k8s delete pod mydb-2
pod "mydb-2" deleted
$ k8s get pod mydb-2
NAME      READY     STATUS    RESTARTS   AGE
mydb-2    1/1       Running   0          8s
$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c 'select * from bdr.bdr_nodes'
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395635580046348310 |             1 |      16387 | r           | mydb-0-11ac0600-1489099948 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395635628273594388 |             1 |      16387 | r           | mydb-1-11ac0700-1489099959 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395638050813399062 |             1 |      16387 | r           | mydb-2-11ac0800-1489100523 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(3 rows)
$ psql -W -h mydb-2.mydb -U arepuser -d mydb -c 'select * from bdr.bdr_nodes'
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395635580046348310 |             1 |      16387 | r           | mydb-0-11ac0600-1489099948 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395635628273594388 |             1 |      16387 | r           | mydb-1-11ac0700-1489099959 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395638050813399062 |             1 |      16387 | r           | mydb-2-11ac0800-1489100523 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(3 rows)
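
The orderly replacement above can be collected into one small script. This is only a sketch of the steps shown in the transcript: the helper name `replace_node`, the `DRY_RUN` and `SEED_HOST` variables, and calling `kubectl` directly (the transcript's `k8s` looks like a local alias) are assumptions, not part of the original setup. Credentials are assumed to come from `~/.pgpass` or `PGPASSWORD` rather than the `-W` prompt. The transcript also checked that the node had reached status `k` before deleting its row; the sketch does not wait for that.

```shell
#!/bin/sh
# Sketch: part a BDR node, drop its bdr.bdr_nodes row, then delete the Pod.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
SEED_HOST="${SEED_HOST:-mydb-0.mydb}"   # seed node, as in the transcript

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run one SQL statement against the seed node.
sql() {
    run psql -h "$SEED_HOST" -U arepuser -d mydb -c "$1"
}

# replace_node NODE_NAME POD_NAME   (hypothetical helper)
replace_node() {
    node="$1"; pod="$2"
    sql "SELECT bdr.bdr_part_by_node_names(ARRAY['$node']);"
    sql "DELETE FROM bdr.bdr_nodes WHERE node_name='$node';"
    run kubectl delete pod "$pod"
}

replace_node mydb-2-11ac0800-1489099960 mydb-2
```

With `DRY_RUN=1` this only prints the three commands in order, which makes it easy to compare against the transcript before running anything for real.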
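
This is the unhappy path, on a freshly deployed cluster (note the new sysids and node names). This time the Pod is deleted without first parting the node.

$ psql -W -h mydb-2.mydb -U arepuser -d mydb -c 'select * from bdr.bdr_nodes'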
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395656649361047574 |             1 |      16387 | r           | mydb-0-11ac0600-1489104853 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395656701025030166 |             1 |      16387 | r           | mydb-2-11ac0800-1489104865 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395656701008429078 |             1 |      16387 | r           | mydb-1-11ac0700-1489104865 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(3 rows)
$ k8s delete pod mydb-2                                                                                                                                              
pod "mydb-2" deleted
$ k8s get pod mydb-2   
NAME      READY     STATUS    RESTARTS   AGE
mydb-2    1/1       Running   0          13s
$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c 'select * from bdr.bdr_nodes'
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395656649361047574 |             1 |      16387 | r           | mydb-0-11ac0600-1489104853 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395656701025030166 |             1 |      16387 | r           | mydb-2-11ac0800-1489104865 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395656701008429078 |             1 |      16387 | r           | mydb-1-11ac0700-1489104865 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395657268910747670 |             1 |      16387 | i           | mydb-2-11ac0800-1489104998 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(4 rows)

Note that in the above output we have two mydb-2* nodes: one in the ready (r) state and one in the initializing (i) state.
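
A duplicate like this can be spotted mechanically. The sketch below assumes node names always look like `<pod>-<hex>-<epoch>`, as they do in the output above, and that the input comes from something like `psql -At -c 'select node_name, node_status from bdr.bdr_nodes'`. The function name `duplicate_pods` is hypothetical.

```shell
#!/bin/sh
# Sketch: report Pods that own more than one bdr.bdr_nodes row.
# Input on stdin: "node_name|node_status" lines (psql unaligned output).
# Assumption: node names look like <pod>-<hex>-<epoch>.
duplicate_pods() {
    # Keep the node_name column, strip the trailing -<hex>-<epoch>,
    # then print every Pod name that occurs more than once.
    cut -d'|' -f1 |
        sed 's/-[^-]*-[^-]*$//' |
        sort |
        uniq -d
}

# Sample rows taken from the output above.
duplicate_pods <<'EOF'
mydb-0-11ac0600-1489104853|r
mydb-2-11ac0800-1489104865|r
mydb-1-11ac0700-1489104865|r
mydb-2-11ac0800-1489104998|i
EOF
```

For the sample rows it prints `mydb-2`, the only Pod with two entries.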

$ k8s log mydb-2
...

ERROR:  System identification mismatch between connection and slot
DETAIL:  Connection for bdr (6395656701025030166,1,16387,) resulted in slot on node bdr (6395657268910747670,1,16387,) instead of expected node
LOG:  worker process: bdr db: mydb (PID 301) exited with exit code 1
ERROR:  bdr output plugin: slot creation rejected, bdr.bdr_nodes entry for local node (sysid=6395657268910747670, timelineid=1, dboid=16387): status='c', bdr still starting up: catching up from remote node
HINT:  Monitor pg_stat_replication on the remote node, watch the logs and wait until the node has caught up
CONTEXT:  slot "bdr_16387_6395656701008429078_1_16387__", output plugin "bdr", in the startup callback
ERROR:  bdr output plugin: slot creation rejected, bdr.bdr_nodes entry for local node (sysid=6395657268910747670, timelineid=1, dboid=16387): status='c', bdr still starting up: catching up from remote node
HINT:  Monitor pg_stat_replication on the remote node, watch the logs and wait until the node has caught up
CONTEXT:  slot "bdr_16387_6395656649361047574_1_16387__", output plugin "bdr", in the startup callback

Above we can see that when the replacement Pod tries to set up replication with the "seed" node, the seed node reports that it is unhappy: when it dials the new Pod, it gets a system ID which it does not expect at that address.
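
The two sysids in the DETAIL line carry the diagnosis: the first is the node the connection was meant for, the second is the node that actually answered. A minimal sketch for pulling them out of a log follows. The function name `mismatch_sysids` is hypothetical, and the pattern assumes the message format shown in the log above.

```shell
#!/bin/sh
# Sketch: extract both sysids from a BDR "System identification mismatch"
# DETAIL line. Prints "<expected_sysid> <actual_sysid>" for each match.
mismatch_sysids() {
    sed -n 's/.*Connection for bdr (\([0-9]*\),.*slot on node bdr (\([0-9]*\),.*/\1 \2/p'
}

# Sample line taken from the log above.
mismatch_sysids <<'EOF'
DETAIL:  Connection for bdr (6395656701025030166,1,16387,) resulted in slot on node bdr (6395657268910747670,1,16387,) instead of expected node
EOF
```

Looking both values up in bdr.bdr_nodes shows that the expected sysid (ending in 166) belongs to the old mydb-2 entry and the one that answered (ending in 670) belongs to the replacement Pod.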

$ helm list                                                                                                                                                          
NAME         	REVISION	UPDATED                 	STATUS  	CHART             
wayfaring-dog	1       	Thu Mar  9 16:14:12 2017	DEPLOYED	postgres-bdr-0.1.0
$ helm upgrade --set replica_count=4 -f postgres-bdr/test/fixtures/common.yaml wayfaring-dog postgres-bdr
Release "wayfaring-dog" has been upgraded. Happy Helming!
LAST DEPLOYED: Thu Mar  9 16:25:25 2017
NAMESPACE: default
STATUS: DEPLOYED
...

Above I perform a Helm upgrade to increase the replica count to 4.

$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c 'select * from bdr.bdr_nodes'
Password for user arepuser: 
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395656649361047574 |             1 |      16387 | r           | mydb-0-11ac0600-1489104853 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395656701025030166 |             1 |      16387 | r           | mydb-2-11ac0800-1489104865 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395656701008429078 |             1 |      16387 | r           | mydb-1-11ac0700-1489104865 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395657268910747670 |             1 |      16387 | i           | mydb-2-11ac0800-1489104998 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395659544276140052 |             1 |      16387 | i           | mydb-3-11ac0900-1489105527 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(5 rows)

$ k8s log mydb-3
LOG:  starting background worker process "bdr db: mydb"
ERROR:  System identification mismatch between connection and slot
DETAIL:  Connection for bdr (6395656701025030166,1,16387,) resulted in slot on node bdr (6395657268910747670,1,16387,) instead of expected node
LOG:  worker process: bdr db: mydb (PID 164) exited with exit code 1

$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c 'select node_name, node_sysid, node_status from bdr.bdr_nodes'
Password for user arepuser: 
         node_name          |     node_sysid      | node_status 
----------------------------+---------------------+-------------
 mydb-0-11ac0600-1489104853 | 6395656649361047574 | r
 mydb-2-11ac0800-1489104865 | 6395656701025030166 | r
 mydb-1-11ac0700-1489104865 | 6395656701008429078 | r
 mydb-2-11ac0800-1489104998 | 6395657268910747670 | i
 mydb-3-11ac0900-1489105527 | 6395659544276140052 | i
(5 rows)

Above we see that Pod mydb-3 is stuck in state Initializing(i) due to the confusion about the state of the Connection/Slot to the set of mydb-2 entries. Like mydb-0, it cannot establish a replication channel between itself and mydb-2.

$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c "SELECT bdr.bdr_part_by_node_names(ARRAY['mydb-2-11ac0800-1489104865']);"
 bdr_part_by_node_names 
------------------------
 
(1 row)

$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c 'select node_name, node_sysid, node_status from bdr.bdr_nodes'
         node_name          |     node_sysid      | node_status 
----------------------------+---------------------+-------------
 mydb-0-11ac0600-1489104853 | 6395656649361047574 | r
 mydb-1-11ac0700-1489104865 | 6395656701008429078 | r
 mydb-2-11ac0800-1489104998 | 6395657268910747670 | i
 mydb-3-11ac0900-1489105527 | 6395659544276140052 | i
 mydb-2-11ac0800-1489104865 | 6395656701025030166 | k 
(5 rows)

$ k8s log mydb-3
ERROR:  System identification mismatch between connection and slot
DETAIL:  Connection for bdr (6395656701025030166,1,16387,) resulted in slot on node bdr (6395657268910747670,1,16387,) instead of expected node
LOG:  worker process: bdr db: mydb (PID 544) exited with exit code 1

Above, bdr.bdr_part_by_node_names is executed to remove the membership of the dead Pod, but the dead node remains in the killed (k) state because cleaning up the various BDR table metadata is the responsibility of the parting node (which is dead/unavailable). mydb-3 is still unhappy.

So let's manually remove...

$ psql -W -h mydb-0.mydb -U arepuser -d mydb -c "delete from bdr.bdr_nodes where node_name='mydb-2-11ac0800-1489104865'"
DELETE 1

$ k8s log mydb-3
LOG:  starting background worker process "bdr db: mydb"
ERROR:  System identification mismatch between connection and slot
DETAIL:  Connection for bdr (6395656701025030166,1,16387,) resulted in slot on node bdr (6395657268910747670,1,16387,) instead of expected node
LOG:  worker process: bdr db: mydb (PID 577) exited with exit code 1

mydb=# select * from bdr.bdr_connections ;
     conn_sysid      | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |                                 conn_dsn                                  | conn_apply_delay | conn_replication_sets 
---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+---------------------------------------------------------------------------+------------------+-----------------------
 6395656649361047574 |             1 |      16387 | 0                 |                    0 |                 0 | f                      | port=5432 host=172.17.0.6 dbname=mydb user=arepuser password=areppassword |                  | {default}
 6395656701025030166 |             1 |      16387 | 0                 |                    0 |                 0 | f                      | host=172.17.0.8 port=5432 dbname=mydb user=arepuser password=areppassword |                  | {default}
 6395656701008429078 |             1 |      16387 | 0                 |                    0 |                 0 | f                      | host=172.17.0.7 port=5432 dbname=mydb user=arepuser password=areppassword |                  | {default}
(3 rows)

mydb=# select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn 
-----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
 bdr_16387_6395656701008429078_1_16387__ | bdr    | logical   |  16387 | mydb     | t      |      |          719 | 0/18E2730
 bdr_16387_6395657268910747670_1_16387__ | bdr    | logical   |  16387 | mydb     | f      |      |          712 | 0/18DFA68
 bdr_16387_6395659544276140052_1_16387__ | bdr    | logical   |  16387 | mydb     | f      |      |          712 | 0/18E0260
(3 rows)
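
The two listings above can be compared mechanically. The sketch below lists sysids that appear in bdr.bdr_connections but have no replication slot on this node. It assumes slot names follow the `bdr_<dboid>_<sysid>_<timeline>_<dboid>__` pattern seen above, that the session was on mydb-0 (whose own sysid has a connection row but no local slot, so it is passed in and skipped), and that each list has been saved one value per line. The function name `connections_without_slots` is hypothetical.

```shell
#!/bin/sh
# Sketch: find conn_sysid values with no matching replication slot.
# usage: connections_without_slots CONN_FILE SLOT_FILE LOCAL_SYSID
#   CONN_FILE: one conn_sysid per line
#   SLOT_FILE: one slot_name per line (bdr_<dboid>_<sysid>_<timeline>_<dboid>__)
connections_without_slots() {
    # First pass (slot file): remember the sysid, field 3 when split on "_".
    # Second pass (connection file): print sysids that are neither the local
    # node nor covered by a slot. The "" concatenation forces string
    # comparison, because 19-digit ids do not fit exactly in awk's numbers.
    awk -F_ -v self="$3" '
        NR == FNR { have[$3 ""] = 1; next }
        ($0 "") != (self "") && !(($0 "") in have)
    ' "$2" "$1"
}

# Sample data taken from the two listings above.
tmp="$(mktemp -d)"
cat > "$tmp/conns" <<'EOF'
6395656649361047574
6395656701025030166
6395656701008429078
EOF
cat > "$tmp/slots" <<'EOF'
bdr_16387_6395656701008429078_1_16387__
bdr_16387_6395657268910747670_1_16387__
bdr_16387_6395659544276140052_1_16387__
EOF
connections_without_slots "$tmp/conns" "$tmp/slots" 6395656649361047574
rm -rf "$tmp"
```

For the sample data it prints `6395656701025030166`, the sysid of the old mydb-2.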

The sysid ending in 166 (6395656701025030166) for the now-invalid Pod still shows up in bdr.bdr_connections, though not in the PostgreSQL system view pg_replication_slots. Perhaps I can clean the invalid Pod's sysid out of bdr.bdr_node_slots and things will get better?

mydb=# delete from bdr.bdr_node_slots where node_name='mydb-2-11ac0800-1489104865';
DELETE 1

$ k8s log mydb-3
LOG:  starting background worker process "bdr db: mydb"
ERROR:  System identification mismatch between connection and slot
DETAIL:  Connection for bdr (6395656701025030166,1,16387,) resulted in slot on node bdr (6395657268910747670,1,16387,) instead of expected node
LOG:  worker process: bdr db: mydb (PID 720) exited with exit code 1

mydb=# select * from bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status |         node_name          |                              node_local_dsn                              |                             node_init_from_dsn                             | node_read_only 
---------------------+---------------+------------+-------------+----------------------------+--------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------
 6395656649361047574 |             1 |      16387 | r           | mydb-0-11ac0600-1489104853 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword |                                                                            | f
 6395656701008429078 |             1 |      16387 | r           | mydb-1-11ac0700-1489104865 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395657268910747670 |             1 |      16387 | i           | mydb-2-11ac0800-1489104998 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
 6395659544276140052 |             1 |      16387 | i           | mydb-3-11ac0900-1489105527 | host=/var/run/postgresql dbname=mydb user=arepuser password=areppassword | port=5432 dbname=mydb host=mydb-0.mydb user=arepuser password=areppassword | f
(4 rows)

Unfortunately, no. Despite the fact that a pg_replication_slots entry exists for the new Pod mydb-2 and that the bdr.bdr_node_slots entry for the old mydb-2 has been cleared, neither the new mydb-2 nor mydb-3 is happy about the status of replication.
