Let's say we have a cluster of nodes A < B < C (listed from highest to lowest priority), where
A is the leader and B and C are candidates. We want to add a new candidate, D, to the cluster,
and we select node B as the seed node.
When node D is started, the following happens:
(1) D sends a 'join' message to B and starts monitoring B
(2) B replies with a 'hasLeader, A' message
(3) D starts monitoring A and sends an 'isLeader' message to A
(4) A detects that D is a new node and adds it to the candidates list, assigning it the lowest priority
(5) A sends an 'update_candidates,Candidates' message to all live candidates except D
(6) A sends a 'ldr' message to D
(7) D accepts A as the leader, replaces its candidates list with the one obtained from the 'ldr'
message, and starts monitoring all the candidates with higher priority.
When the process finishes, the candidates list will be: A < B < C < D
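The seven steps can be replayed as a small, purely local simulation. This is a hypothetical Python model, not the actual implementation: `Node`, `handle_join` and `accept_leader` are illustrative names, messages are plain tuples returned by a function rather than real messages, and it assumes the leader updates its own list directly instead of sending 'update_candidates' to itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model of the join steps; all names here are illustrative.
Message = Tuple[str, str, tuple]  # (destination, tag, payload)


@dataclass
class Node:
    name: str
    alive: bool = True
    candidates: List[str] = field(default_factory=list)  # highest priority first
    monitors: Set[str] = field(default_factory=set)
    leader: Optional[str] = None


def handle_join(leader: Node, nodes: Dict[str, Node], newcomer: str) -> List[Message]:
    """Steps (4)-(6), run on the leader; returns the messages it would send."""
    outbox: List[Message] = []
    if newcomer not in leader.candidates:
        # (4) a new node gets the lowest priority: append it to the end
        leader.candidates.append(newcomer)
        # (5) notify every live candidate except the newcomer
        # (assumption: the leader already updated its own copy, so it skips itself)
        for name in leader.candidates:
            if name not in (leader.name, newcomer) and nodes[name].alive:
                outbox.append((name, "update_candidates", tuple(leader.candidates)))
    # (6) the newcomer learns who the leader is, plus the full candidate list
    outbox.append((newcomer, "ldr", (leader.name, tuple(leader.candidates))))
    return outbox


def accept_leader(node: Node, leader: str, candidates: tuple) -> None:
    """Step (7), run on the newcomer."""
    node.leader = leader
    node.candidates = list(candidates)
    rank = node.candidates.index(node.name)
    node.monitors |= set(node.candidates[:rank])  # every higher-priority candidate


# Initial cluster A < B < C with A as leader; D joins through seed B.
nodes = {n: Node(n, candidates=["A", "B", "C"]) for n in "ABC"}
nodes["D"] = Node("D", monitors={"B", "A"})  # already monitoring B (1) and A (3)

for dest, tag, payload in handle_join(nodes["A"], nodes, "D"):
    if tag == "update_candidates":
        nodes[dest].candidates = list(payload)
    elif tag == "ldr":
        accept_leader(nodes[dest], *payload)

print(nodes["D"].candidates)  # ['A', 'B', 'C', 'D']
```

Appending to the end of the list is what "lowest priority" means here, which is why no existing node changes rank when D joins.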
Some Failure Scenarios
I) B crashes in (1) or (2)
In this case, node D would receive a DOWN message and crash.
II) A crashes in (3)
In this case, node D would check if node B is still alive. If it is, the joining procedure is
restarted. Node B will handle the 'join' message from D when the election procedure is completed.
If node B is down, then node D will crash.
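Scenarios I and II reduce to a small decision rule for D when a monitored node goes down before D has joined. The sketch below is hypothetical Python: `on_down`, the `Action` enum and the `alive` set (standing in for however D actually checks whether B is still up) are illustrative assumptions, and any case the text does not describe returns `None`.

```python
from enum import Enum
from typing import Optional, Set


class Action(Enum):
    CRASH = "crash"
    RESTART_JOIN = "restart_join"


def on_down(dead: str, seed: str, leader: Optional[str], alive: Set[str]) -> Optional[Action]:
    """Hypothetical reaction of a joining node to a DOWN message."""
    # Scenario I: the seed dies in step (1) or (2), before D has learned of a leader.
    if leader is None and dead == seed:
        return Action.CRASH
    # Scenario II: the leader dies in step (3); retry via the seed only if it is still up.
    if dead == leader:
        return Action.RESTART_JOIN if seed in alive else Action.CRASH
    # Anything else is outside the scenarios described here.
    return None


print(on_down("B", "B", None, {"A", "C"}))  # Action.CRASH
print(on_down("A", "B", "A", {"B", "C"}))   # Action.RESTART_JOIN
print(on_down("A", "B", "A", {"C"}))        # Action.CRASH
```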
III) A crashes in (5) right after sending the 'update_candidates' message to B.
For node D this case is exactly like II) above.
B and C will have different candidate lists when the election procedure begins: B will have
"A < B < C < D", and C will have "A < B < C".
That won't be a problem because D has the lowest priority, so there's no way that it could be
elected as the new leader.
When the election procedure starts, D will receive a 'halt' message from B. D will take the
normal action: it sets its status to wait and replies to B with an 'ack' message. Later, it will
receive a 'ldr' message from B, accept B as the leader, update its candidates list, and set its
status to norm. C will also receive a 'ldr' message and update its candidates list.
If B dies in the middle of the election procedure, D will die too.
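Why the mismatched lists in scenario III are harmless can be checked with a simplified model: if the election is reduced to "the highest-priority live candidate wins", B's view and C's view both pick B, because D only appears at the tail of B's list. The sketch also models D's side of the exchange ('halt', then 'ack', then 'ldr'). It is hypothetical Python, not the real election algorithm; `winner`, `Joiner` and the `"joining"` status label are illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple


def winner(candidates: List[str], alive: Set[str]) -> Optional[str]:
    # Simplification: the real election exchanges halt/ack messages;
    # here the first live candidate in priority order simply wins.
    return next((c for c in candidates if c in alive), None)


class Joiner:
    """D's reaction to the election messages sent by B."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = "joining"  # hypothetical label for "not yet accepted"
        self.leader: Optional[str] = None
        self.candidates: List[str] = []

    def handle(self, sender: str, tag: str,
               payload: Tuple[str, ...] = ()) -> Optional[Tuple[str, str]]:
        if tag == "halt":
            self.status = "wait"
            return (sender, "ack")  # reply to the node running the election
        if tag == "ldr":
            self.leader = sender
            self.candidates = list(payload)
            self.status = "norm"
        return None


alive = {"B", "C", "D"}        # A has crashed
b_view = ["A", "B", "C", "D"]  # B received 'update_candidates'
c_view = ["A", "B", "C"]       # C did not
print(winner(b_view, alive), winner(c_view, alive))  # B B

d = Joiner("D")
print(d.handle("B", "halt"))         # ('B', 'ack')
d.handle("B", "ldr", tuple(b_view))  # B announces itself with its list
c_view = list(b_view)                # C's 'ldr' also brings it up to date
print(d.status, d.leader)            # norm B
```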
IV) A dies right after finishing (5) and before starting (6)
Exactly the same thing as III) above.
V) D crashes somewhere between (5) and (6)
The rest of the candidates will have D on their lists, so they will treat it like a normal
candidate. When D starts again, the joining procedure will be performed normally, except that
step (5) will be skipped because all the candidates are already aware of D.
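Scenario V relies on the leader's join handling being idempotent: a node that is already on the list gets no new priority and triggers no 'update_candidates' broadcast. A minimal sketch, again in hypothetical Python, with an illustrative `handle_join` that returns (destination, tag) pairs instead of sending real messages:

```python
from typing import List, Set, Tuple


def handle_join(candidates: List[str], leader: str, alive: Set[str],
                newcomer: str) -> List[Tuple[str, str]]:
    """Hypothetical leader-side join handling; mutates `candidates` in place."""
    msgs: List[Tuple[str, str]] = []
    if newcomer not in candidates:
        candidates.append(newcomer)  # step (4): lowest priority
        msgs += [(c, "update_candidates") for c in candidates  # step (5)
                 if c not in (leader, newcomer) and c in alive]
    msgs.append((newcomer, "ldr"))  # step (6) always happens
    return msgs


cands = ["A", "B", "C"]
alive = {"A", "B", "C", "D"}
first = handle_join(cands, "A", alive, "D")
# D crashes between (5) and (6), restarts, and joins again:
second = handle_join(cands, "A", alive, "D")
print(first)   # [('B', 'update_candidates'), ('C', 'update_candidates'), ('D', 'ldr')]
print(second)  # [('D', 'ldr')]
print(cands)   # ['A', 'B', 'C', 'D']
```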