The way I'm reading the ZooKeeper Double Barriers recipe, the exit condition (for Leave) is that all children of the barrier node must be deleted before any given client process can complete the Leave operation.
So it seems to me that if a client process Enters the barrier after the minimum number of processes have already joined, it can hold up the Leave of previously-Entered client processes. I'd guess a typical barrier setup has the minimum number of processes equal to the total number of processes, so maybe this is an edge case? And maybe this is all totally OK and normal; I'm just looking to confirm/disconfirm my interpretation.
- Am I thinking about this right? If so, is there actually a problem here? If not, why not?
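To make the scenario concrete, here is a toy in-memory model of it. This is only a sketch under my reading of the exit condition: it is not a ZooKeeper client, the barrier's children are just a Python set, and the names `p1`, `p2`, `p3`, `threshold`, and `can_finish_leave` are all made up for illustration.

```python
# Toy model, not a ZooKeeper client: the barrier's children are a set of names.
def can_finish_leave(children):
    """Exit condition as I read it: Leave completes only when no children remain."""
    return len(children) == 0


children = set()
threshold = 2  # hypothetical minimum number of processes

# p1 and p2 enter; the threshold is met, so the barrier opens.
children.update({"p1", "p2"})
barrier_open = len(children) >= threshold

# p3 enters late, after the barrier has already opened.
children.add("p3")

# p1 and p2 finish their work and delete their own nodes.
children.discard("p1")
children.discard("p2")

# p3's node still exists, so p1 and p2 cannot complete Leave yet.
blocked_on_late_entrant = not can_finish_leave(children)

# Only once p3 deletes its node can everyone complete Leave.
children.discard("p3")
all_done = can_finish_leave(children)
```

In this model `blocked_on_late_entrant` ends up true, which is exactly the hold-up I'm asking about.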
A related point of confusion for me: the language in the ZooKeeper paper for the Double Barrier alternately says "all of the processes have removed their children" (which sort of jibes with the pseudocode in the recipe, except for a possible disagreement about what "all of the processes" means), and "processes watch for a particular child to disappear" (which seems at odds with this pseudocode).
- Are those two wordings in the ZK paper somehow consistent with one another? Assuming so, how come?
OK, based on some actual code in Curator, I think my interpretation in the first question makes sense (Curator's implementation is similar to this pseudocode). Not sure if it's actually a problem in practice - probably not.
And on the second question, I didn't read carefully enough 😧: the second quote went on to say "To leave, processes watch for a particular child to disappear and only check the exit condition once that znode has been removed." This seems like it's just the performance optimization around lowest/highest process nodes in the recipe.
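For what it's worth, here is how I'd sketch one pass of the recipe's Leave loop as a pure function, to show where the "particular child" comes from. This is my reading of the pseudocode, not Curator's or ZooKeeper's actual code: `leave_step` and its action strings are hypothetical, and I'm assuming `children` holds only process nodes (no `ready` node).

```python
def leave_step(children, me):
    """One pass of Leave as I read the recipe; returns (action, node_to_watch)."""
    nodes = sorted(children)
    if not nodes:
        # No children left: the exit condition holds.
        return ("exit", None)
    if nodes == [me]:
        # We are the only process node left: delete our node and exit.
        return ("delete_self_and_exit", None)
    if me == nodes[0]:
        # The lowest node keeps its own node and waits on the highest one.
        return ("watch", nodes[-1])
    # Everyone else deletes its own node (if it still exists)
    # and waits on the lowest one.
    return ("delete_self_then_watch", nodes[0])
```

Each process watches a single node rather than the whole child list, and only re-reads the children (the "goto 1" step) when that one node disappears, which lines up with the sentence quoted from the paper.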
So I think I get this now, unless somebody wants to come along and shed some additional light on it.