Dynamic addition of candidate nodes in gen_leader

Algorithm
----------
 
Let's say we have a cluster of nodes A < B < C, listed from highest to lowest priority, where A
is the leader and B and C are candidates. We want to add a new candidate D to the cluster, and
we select node B as the seed node.
When node D is started, the following happens:
 
(1) D sends a 'join' message to B, and D starts monitoring B
(2) B replies with a 'hasLeader, A' message
(3) D starts monitoring A, and D sends an 'isLeader' message to A
(4) A detects that D is a new node and adds it to the candidates list, assigning it the lowest priority
(5) A sends an 'update_candidates, Candidates' message to all live candidates except D
(6) A sends a 'ldr' message to D
(7) D accepts A as the leader, replaces its candidates list with the one carried in the 'ldr'
message, and starts monitoring all candidates with higher priority.
 
When the process finishes, the candidates list will be: A < B < C < D
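
The failure-free path through steps (1) to (7) can be traced with a small simulation. This is a Python model under stated assumptions, not gen_leader's Erlang code: the dictionary layout and the names `make_node` and `join` are hypothetical, monitors are modelled as a plain set, and messages are only recorded in a trace rather than delivered asynchronously.

```python
def make_node(candidates=(), leader=None):
    """Hypothetical in-memory stand-in for one gen_leader process."""
    return {"alive": True, "leader": leader,
            "candidates": list(candidates), "monitors": set()}


def join(nodes, new, seed):
    """Run steps (1)-(7) with no failures and return the message trace."""
    trace = []
    joiner = nodes[new]

    # (1) the new node asks the seed to join and monitors it
    trace.append((new, seed, "join"))
    joiner["monitors"].add(seed)

    # (2) the seed replies with the leader it currently knows
    leader = nodes[seed]["leader"]
    trace.append((seed, new, ("hasLeader", leader)))

    # (3) the new node monitors the leader and contacts it
    joiner["monitors"].add(leader)
    trace.append((new, leader, "isLeader"))

    # (4) the leader appends the new node, which gives it the lowest priority
    candidates = nodes[leader]["candidates"]
    candidates.append(new)

    # (5) every live candidate except the leader itself and the new node
    #     receives the updated list
    for peer in candidates:
        if peer not in (leader, new) and nodes[peer]["alive"]:
            nodes[peer]["candidates"] = list(candidates)
            trace.append((leader, peer, ("update_candidates", tuple(candidates))))

    # (6) the leader announces itself to the new node
    trace.append((leader, new, ("ldr", tuple(candidates))))

    # (7) the new node adopts the leader and the list, then monitors
    #     every candidate that precedes it in the list
    joiner["leader"] = leader
    joiner["candidates"] = list(candidates)
    joiner["monitors"].update(candidates[:candidates.index(new)])
    return trace


nodes = {name: make_node("ABC", leader="A") for name in "ABC"}
nodes["D"] = make_node()
for sender, receiver, msg in join(nodes, "D", "B"):
    print(sender, "->", receiver, msg)
print(" < ".join(nodes["D"]["candidates"]))  # A < B < C < D
```

The trace holds six messages: one each for steps (1), (2), (3) and (6), and two for step (5), since both B and C are alive.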
 
 
Some Failure Scenarios
-----------------------
 
I) B crashes during (1) or (2)
In this case, node D receives a DOWN message for B, which it has been monitoring since step (1),
and crashes.
 
II) A crashes in (3)
In this case, node D receives a DOWN message for A and checks whether node B is still alive.
If it is, D restarts the joining procedure; node B will handle the 'join' message from D once
the election procedure triggered by A's crash has completed.
If node B is down, node D crashes.
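
D's reaction in scenarios I) and II) amounts to a small decision table. The sketch below is one way to write it down in Python; `Action` and `joiner_on_down` are hypothetical names, and a DOWN message from any node other than the seed or the leader is rejected because the scenarios here do not say what should happen.

```python
from enum import Enum


class Action(Enum):
    CRASH = "crash"
    RESTART_JOIN = "restart_join"


def joiner_on_down(down_node, seed, leader, seed_alive):
    """Decide what a joining node does when a monitored node goes down.

    `leader` is None until the 'hasLeader' reply has arrived.
    """
    if down_node == seed:
        # Scenario I: the seed died before the join could proceed.
        return Action.CRASH
    if leader is not None and down_node == leader:
        # Scenario II: retry through the seed if it is still alive.
        return Action.RESTART_JOIN if seed_alive else Action.CRASH
    raise ValueError(f"DOWN from {down_node!r} is not covered by scenarios I and II")


print(joiner_on_down("B", seed="B", leader=None, seed_alive=False))
print(joiner_on_down("A", seed="B", leader="A", seed_alive=True))
print(joiner_on_down("A", seed="B", leader="A", seed_alive=False))
```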
 
III) A crashes in (5) right after sending the 'update_candidates' message to B.
For node D this case is exactly like II) above.
B and C will have different candidates lists when the election procedure begins: B will have
"A < B < C < D", and C will have "A < B < C".
That is not a problem: D has the lowest priority, so it cannot win the election against B or C.
When the election procedure starts, D will receive a 'halt' message from B. D will take the
normal action: set its status to wait and reply to B with an 'ack' message. Later, it will
receive a 'ldr' message from B, accept B as leader, update its candidates list and set its
status to norm. C will also receive a 'ldr' message and update its candidates list.
If B dies in the middle of the election procedure, D will die too.
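
The claim that the two different lists do no harm can be checked with a toy model. The sketch below assumes the election outcome reduces to "the highest-priority candidate that is still alive"; the real procedure exchanges 'halt', 'ack' and 'ldr' messages, which this Python fragment does not model, and `elect` is a hypothetical helper.

```python
def elect(candidates, alive):
    """Return the first (highest-priority) candidate that is still alive."""
    for name in candidates:
        if name in alive:
            return name
    return None


alive = {"B", "C", "D"}            # A has just crashed
list_at_b = ["A", "B", "C", "D"]   # B received 'update_candidates'
list_at_c = ["A", "B", "C"]        # C did not

print(elect(list_at_b, alive), elect(list_at_c, alive))  # B B
```

Both lists agree on everything ahead of D, so B and C reach the same result whichever list is consulted.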
 
IV) A dies right after finishing (5) and before starting (6)
From D's point of view this is the same as III) above; the only difference is that A has
already sent the updated list to both B and C.
 
V) D crashes somewhere between (5) and (6)
The rest of the candidates already have D on their lists, so they will treat it like any other
candidate. When D starts again, the joining procedure runs normally, except that step (5) is
skipped because all the candidates are already aware of D.
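
Scenario V) implies that the leader's handling of 'isLeader' can be repeated safely for a node it already knows. A minimal Python sketch follows; it assumes the leader recognises a returning node by a plain membership test on its candidates list, which this note does not specify, and `handle_is_leader` is a hypothetical name.

```python
def handle_is_leader(leader, candidates, alive, joiner):
    """Return (candidates, outgoing messages) after 'isLeader' from `joiner`."""
    out = []
    if joiner not in candidates:
        # First join: step (4) appends, step (5) broadcasts.
        candidates = candidates + [joiner]
        for peer in candidates:
            if peer not in (leader, joiner) and peer in alive:
                out.append((peer, ("update_candidates", tuple(candidates))))
    # Step (6) happens in both cases.
    out.append((joiner, ("ldr", tuple(candidates))))
    return candidates, out


alive = {"A", "B", "C", "D"}
first, msgs_first = handle_is_leader("A", ["A", "B", "C"], alive, "D")
again, msgs_again = handle_is_leader("A", first, alive, "D")
print(len(msgs_first), len(msgs_again))  # 3 1
```

The first call produces two 'update_candidates' messages and one 'ldr'; the second call, for the restarted D, produces only the 'ldr' and leaves the list unchanged.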

Hello,

did you implement this in the gen_leader module?

Thanks
