Protocol implementation and verification from Demers et al. "Epidemic Algorithms for Replicated Database Maintenance"

Introduction

...

Protocol Implementation: Direct Mail

We start by creating a gen_server for the direct mail protocol implementation. This implementation supports two calls: broadcast, for sending a message, and update, for updating the membership when the Partisan system notifies us of view changes. For per-node state, we track the currently known membership so that we don't have to look it up every time we want to broadcast.

-module(demers_direct_mail).
-behaviour(gen_server).

%% API
-export([start_link/0,
         broadcast/2,
         update/1]).

%% gen_server callbacks
-export([init/1,
         handle_call/3,
         handle_cast/2,
         handle_info/2,
         terminate/2,
         code_change/3]).

-record(state, {membership}).

%%%===================================================================
%%% API
%%%===================================================================

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% @doc Broadcast.
broadcast(ServerRef, Message) ->
    gen_server:cast(?MODULE, {broadcast, ServerRef, Message}).

%% @doc Membership update.
update(LocalState0) ->
    LocalState = partisan_peer_service:decode(LocalState0),
    gen_server:cast(?MODULE, {update, LocalState}).
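
The excerpt omits init/1. Here is a minimal sketch of what it might look like, assuming the delivered-message table used below is an ETS table named after the module and that partisan_peer_service:members/0 returns the initial membership; both are assumptions made for illustration.

init([]) ->
    %% Assumption: create the table used below to deduplicate and
    %% record delivered messages, named after the module.
    ?MODULE = ets:new(?MODULE, [named_table, public]),

    %% Assumption: seed the membership cache from the peer service.
    {ok, Members} = partisan_peer_service:members(),

    {ok, #state{membership=membership(Members)}}.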

We now need to implement the behavior for each of these callbacks. First, we define what happens when the membership is updated: we update our local cache of the membership.

handle_cast({update, Membership0}, State) ->
    Membership = membership(Membership0),
    {noreply, State#state{membership=Membership}};

We'll use a helper called membership to ensure that all of the nodes in the system sort the membership the same way -- this keeps node selection deterministic when we begin testing.

%% @private -- sort to remove nondeterminism in node selection.
membership(Membership) ->
    lists:usort(Membership).

The direct mail protocol states that for each message we want to broadcast, we deliver the message to ourselves and to all of the other members in the cluster that we know about. We do this by implementing a callback for broadcast that uses the cluster's membership and forwards the message to every known node.

Our broadcast function is implemented as follows:

  • We take a destination named process identifier that the message will be forwarded to.

  • We take a message and derive a deterministic, unique identifier for that message.

  • For each known member in the cluster, we forward the message to that node.

  • We use Erlang Term Storage (ETS) to record the messages that have been received at this node, preventing duplicate processing if we happen to retransmit or receive a duplicate of a message. This record also tracks delivered messages, so that later assertions can check whether all messages were transmitted and received when testing the protocol.

handle_cast({broadcast, ServerRef, Message}, #state{membership=Membership}=State) ->
    Manager = manager(),

    %% Generate message id.
    MyNode = partisan_peer_service_manager:mynode(),
    Id = {MyNode, erlang:unique_integer([monotonic, positive])},

    %% Forward to process.
    partisan_util:process_forward(ServerRef, Message),

    %% Store outgoing message.
    true = ets:insert(?MODULE, {Id, Message}),

    %% Forward messages.
    lists:foreach(fun(N) ->
        lager:info("~p: sending broadcast message to node ~p: ~p", [node(), N, Message]),
        Manager:forward_message(N, ?GOSSIP_CHANNEL, ?MODULE, {broadcast, Id, ServerRef, Message})
    end, membership(Membership) -- [MyNode]),

    {noreply, State};

Now, we need to define how messages are handled upon receipt by the other nodes in the cluster. We implement a handler for incoming messages that pattern matches on the body of the message and acts accordingly. In this example, when a message is received by a node, we forward it to the destination process if we haven't seen it yet; otherwise, we drop the message without further processing.

Here's our implementation of the message receipt handler.

%% Incoming messages.
handle_info({broadcast, Id, ServerRef, Message}, State) ->
    lager:info("~p received value from broadcast: ~p", [node(), Message]),

    case ets:lookup(?MODULE, Id) of
        [] ->
            %% Forward to process.
            partisan_util:process_forward(ServerRef, Message),

            %% Store.
            true = ets:insert(?MODULE, {Id, Message}),

            ok;
        _ ->
            ok
    end,

    {noreply, State};

With that, the direct mail protocol implementation is finished.
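
As a quick usage sketch, assuming a registered destination process named receiver (the name the test harness uses in the traces below), broadcasting from any node looks like this:

%% Hypothetical usage: deliver {hello, node()} locally and forward it
%% to every known member in the cluster.
{ok, _Pid} = demers_direct_mail:start_link(),
ok = demers_direct_mail:broadcast(receiver, {hello, node()}).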

Testing the Implementation: Network Partitions

By now, it should be clear that the direct mail protocol has two big drawbacks. While efficient on the wire, the protocol fails to deliver messages under both network partitions and membership changes.

Let's see if we can identify these problems with our testing infrastructure.

Methodology

To find these bugs, we use Partisan's built-in testing infrastructure. We start with Partisan's reliable broadcast model, which states that every transmitted message should be received by all nodes in the cluster after a quiescence period.

The reliable broadcast model operates as follows, and expects that client implementations provide the following behavior:

  • A broadcast function that takes a message and destination process identifier. This process is expected to be running at all nodes in the cluster, and is automatically started by the test harness.

With the model specified, Partisan's testing infrastructure automatically generates random schedules of commands and, at each command, asserts that the command's postconditions hold. Commands are selected from the following types:

  • Membership Commands: Maintaining a minimum number of nodes in the cluster, Partisan will perform random join and leave operations for a number of additional nodes. This ensures that behavior remains correct under cluster transitions.

  • Model Commands: These are the model-specific commands that drive application behavior. One of these, broadcast, was previously discussed; the reliable broadcast model supplies an additional function called check_mailbox. This function messages the destination process and asks which messages have been delivered, in order to assert whether the protocol is operating correctly (a sketch of this assertion appears after this list).

  • Fault Commands: Given a fault model, introduce a number of faults, including, but not limited to, stop failures, crash failures, send-omission failures, receive-omission failures, general omission failures, and arbitrary failures. Failures will only be introduced given a failure tolerance level, specified by the application developer.
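
To make the check_mailbox assertion concrete, here is a minimal sketch of its core postcondition, not Partisan's actual harness code; the function name and arguments are illustrative.

%% Illustrative postcondition: after quiescence, every sent message
%% must appear in the node's received mailbox.
check_mailbox_postcondition(Node, Sent, Received) ->
    case Sent -- Received of
        [] ->
            true;
        Missing ->
            lager:info("mailbox at ~p missing: ~p", [Node, Missing]),
            false
    end.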

Network Partitions

Let's start by testing the model under network partitions. We start by running Partisan's fault-injector to find a counterexample with a failure tolerance of 1 simultaneous failure. We can do this simply by running the following command:

$ FAULT_TOLERANCE=1 bin/counterexample-find.sh

We find our first counterexample. Partisan produces the full execution trace in the output, but we retain only the most important parts for our explanation here.

After several schedule tests and commands, our counterexample looks as follows. We see that node_5 is missing two messages in the mailbox assertion.

14:07:04.762 [info] prop_partisan: command conclusion fired for command at node node_5: [check_mailbox,node_5]
14:07:04.763 [info] prop_partisan_reliable_broadcast: verifying mailbox at node node_5: sent: [{298,node_3,-8},{320,node_4,-27},{334,node_4,8},{343,node_2,25},{346,node_3,-6},{350,node_1,-38},{354,node_3,-19}], received: [{298,node_3,-8},{320,node_4,-27},{334,node_4,8},{343,node_2,25},{350,node_1,-38}]
14:07:04.763 [info] prop_partisan_reliable_broadcast: verification of mailbox at node node_5 failed, missing: [{346,node_3,-6},{354,node_3,-19}], received: [{298,node_3,-8},{320,node_4,-27},{334,node_4,8},{343,node_2,25},{350,node_1,-38}]
14:07:04.763 [info] prop_partisan: postcondition result: false; command: prop_partisan_reliable_broadcast:check_mailbox([node_5])

If we look at an excerpt of the message trace of the execution, we see the following:

14:07:04.777 [info] partisan_trace_orchestrator: node_3@GS18227 <- node_1@GS18227: {forward_message,demers_direct_mail,{broadcast,{node_1@GS18227,2},receiver,{350,node_1,-38}}}
14:07:04.777 [info] partisan_trace_orchestrator: node_3@GS18227 => node_5@GS18227: DROPPED {broadcast,{node_3@GS18227,3},receiver,{346,node_3,-6}}
14:07:04.777 [info] partisan_trace_orchestrator: node_3@GS18227 => node_1@GS18227: {forward_message,demers_direct_mail,{broadcast,{node_3@GS18227,4},receiver,{354,node_3,-19}}}
14:07:04.777 [info] partisan_trace_orchestrator: node_1@GS18227 <- node_3@GS18227: {forward_message,demers_direct_mail,{broadcast,{node_3@GS18227,4},receiver,{354,node_3,-19}}}
14:07:04.777 [info] partisan_trace_orchestrator: node_3@GS18227 => node_2@GS18227: {forward_message,demers_direct_mail,{broadcast,{node_3@GS18227,4},receiver,{354,node_3,-19}}}
14:07:04.777 [info] partisan_trace_orchestrator: node_2@GS18227 <- node_3@GS18227: {forward_message,demers_direct_mail,{broadcast,{node_3@GS18227,4},receiver,{354,node_3,-19}}}
14:07:04.777 [info] partisan_trace_orchestrator: node_3@GS18227 => node_4@GS18227: {forward_message,demers_direct_mail,{broadcast,{node_3@GS18227,4},receiver,{354,node_3,-19}}}
14:07:04.777 [info] partisan_trace_orchestrator: node_4@GS18227 <- node_3@GS18227: {forward_message,demers_direct_mail,{broadcast,{node_3@GS18227,4},receiver,{354,node_3,-19}}}
14:07:04.778 [info] partisan_trace_orchestrator: node_3@GS18227 => node_5@GS18227: DROPPED {broadcast,{node_3@GS18227,4},receiver,{354,node_3,-19}}

We see from the trace that Partisan has introduced several failures randomly throughout the execution: both send omission and receive omission failures.

Identifying and Replaying the Fault

We can see from the assertion that node_5 is missing two messages from node_3. Examining the message trace, it is clear that the send omission failure that prohibited node_3 from sending to node_5 caused the two message omissions resulting in the failure; therefore, reliable broadcast cannot be satisfied under this failure model.

We can replay our fault using Partisan's deterministic replay behavior, which uses the previous trace and command schedule to run the same set of commands while enforcing message delivery order with barriers, ensuring deterministic replay of messages on the network.

We can run the following command for replay:

$ FAULT_TOLERANCE=1 bin/counterexample-replay.sh
Staging counterexample...
Replaying counterexample...

Replaying the fault produces the same output as the original run; under replay, we see the same failure:

14:17:57.245 [info] prop_partisan_reliable_broadcast: verifying mailbox at node node_5: sent: [{298,node_3,-8},{320,node_4,-27},{334,node_4,8},{343,node_2,25},{346,node_3,-6},{350,node_1,-38},{354,node_3,-19}], received: [{298,node_3,-8},{320,node_4,-27},{334,node_4,8},{343,node_2,25},{350,node_1,-38}]
14:17:57.245 [info] prop_partisan_reliable_broadcast: verification of mailbox at node node_5 failed, missing: [{346,node_3,-6},{354,node_3,-19}], received: [{298,node_3,-8},{320,node_4,-27},{334,node_4,8},{343,node_2,25},{350,node_1,-38}]
14:17:57.245 [info] prop_partisan: postcondition result: false; command: prop_partisan_reliable_broadcast:check_mailbox([node_5])

...

Counterexample held and replayed...

Failure Minimization

Since schedules are randomly generated, we may generate schedules that are not minimal -- that is, containing more than just the failures needed to induce the fault. Therefore, Partisan provides a mechanism for minimizing failures. Failure minimization starts from the end of a schedule and shrinks the execution by removing failure commands, attempting to identify a minimal set of failures that keeps the counterexample valid.

We can run the following command for shrinking:

$ FAULT_TOLERANCE=1 bin/counterexample-minimize-failures.sh
Staging shrunk counterexample...
Loading commands...
{set,{var,1},
     {call,prop_partisan_crash_fault_model,begin_receive_omission,
           [node_1,node_3]}}.
{set,{var,2},
     {call,prop_partisan_reliable_broadcast,broadcast,[node_3,{298,-8}]}}.
{set,{var,3},
     {call,prop_partisan_crash_fault_model,end_receive_omission,
           [node_1,node_3]}}.
{set,{var,4},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_5]}}.
{set,{var,5},
     {call,prop_partisan_crash_fault_model,begin_send_omission,
           [node_3,node_5]}}.
{set,{var,6},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_3]}}.
{set,{var,7},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_4]}}.
{set,{var,8},
     {call,prop_partisan_reliable_broadcast,broadcast,[node_4,{320,-27}]}}.
{set,{var,9},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_1]}}.
{set,{var,10},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_2]}}.
{set,{var,11},
     {call,prop_partisan_reliable_broadcast,broadcast,[node_4,{334,8}]}}.
{set,{var,12},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_3]}}.
{set,{var,13},
     {call,prop_partisan_reliable_broadcast,broadcast,[node_2,{343,25}]}}.
{set,{var,14},
     {call,prop_partisan_reliable_broadcast,broadcast,[node_3,{346,-6}]}}.
{set,{var,15},
     {call,prop_partisan_reliable_broadcast,broadcast,[node_1,{350,-38}]}}.
{set,{var,16},
     {call,prop_partisan_reliable_broadcast,broadcast,[node_3,{354,-19}]}}.
{set,{var,17},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_1]}}.
{set,{var,18},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_3]}}.
{set,{var,19},{call,prop_partisan_reliable_broadcast,check_mailbox,[node_5]}}.
Removing the following command from the trace:
 {set,{var,5},
      {call,prop_partisan_crash_fault_model,begin_send_omission,
            [node_3,node_5]}}
To be removed from trace:
 {exit_command,{node_3,[begin_send_omission,node_5]}}.
 {enter_command,{node_3,[begin_send_omission,node_5]}}.

Here we see that both the command schedule and trace schedule are modified to remove the first failure identified: the node_3 send omission failure to node_5. Partisan will re-run the counterexample using this modified schedule -- but preserving the ordering of messages and determinizing any new messages in the schedule that result from shrinking. These new messages can occur when a partition or crash is removed from the schedule: more messages may be introduced into the network and must be deterministically ordered in the schedule with respect to the existing messages and commands.

OK: The input passed the test.
===>
1/1 counterexamples passed
Minimal counterexample found...

Here, we see that after removing this failure from the schedule (discovered in reverse order), the counterexample no longer holds, so further minimization stops. It should be noted that this example is not minimal: the initial partition created is not removed, because shrinking currently proceeds in reverse order -- from the back of the schedule forward -- removing commands that introduce failures and resolving them in sequence.

Resolution

Message omissions are problematic when using protocols that only transmit messages once. In this case, a network partition causes messages to be omitted between two participants, resulting in two messages that never get delivered to one of the nodes in the cluster.

One solution to the problem is to use message acknowledgements. We can use a modified Partisan call when transmitting messages to continue redelivering each message until the remote node acknowledges it. This ensures that when a partition is resolved, these messages are eventually delivered and reliable broadcast can be fulfilled.

Here, we modify our message forwarding call to request acknowledgements. Passing the [{ack, true}] tuple as an optional argument specifies that this message must be acknowledged.

    %% Forward messages.
    lists:foreach(fun(N) ->
        lager:info("~p: sending broadcast message to node ~p: ~p", [node(), N, Message]),
        Manager:forward_message(N, ?GOSSIP_CHANNEL, ?MODULE, {broadcast, Id, ServerRef, Message}, [{ack, true}])
    end, membership(Membership) -- [MyNode]),

By adding message acknowledgements to the direct mail protocol, which performs only single message transmissions, we can recover from message omission failures if and when the network partition heals. This is provided by the mechanism in Partisan that retransmits and deduplicates messages based on message identifier, enabled above with the [{ack, true}] option.

However, we quickly discover through random testing that this may not be enough if the network partition doesn't heal. Consider the following execution where the network partition does not recover -- the retransmission results in more omitted messages.

Here's another generated test that fails, due to message omissions.

11:42:10.587 [info] prop_partisan_crash_fault_model: setting fault tolerance level to: 1
11:42:10.587 [info] prop_partisan_reliable_broadcast: executing broadcast command: node_1 => {19,-3}
11:42:10.587 [info] prop_partisan: command preamble fired for command at node node_1: [broadcast,node_1,19,-3]
11:42:10.587 [info] prop_partisan_reliable_broadcast: broadcast from node node_1 message: {19,node_1,-3}
11:42:10.588 [info] prop_partisan: command conclusion fired for command at node node_1: [broadcast,node_1,19,-3]
11:42:10.588 [info] prop_partisan_reliable_broadcast: executing broadcast command: node_4 => {22,6}
11:42:10.588 [info] prop_partisan: command preamble fired for command at node node_4: [broadcast,node_4,22,6]
11:42:10.588 [info] prop_partisan_reliable_broadcast: broadcast from node node_4 message: {22,node_4,6}
11:42:10.589 [info] prop_partisan: command conclusion fired for command at node node_4: [broadcast,node_4,22,6]
11:42:10.590 [info] prop_partisan_reliable_broadcast: executing broadcast command: node_1 => {27,-3}
11:42:10.590 [info] prop_partisan: command preamble fired for command at node node_1: [broadcast,node_1,27,-3]
11:42:10.590 [info] prop_partisan_reliable_broadcast: broadcast from node node_1 message: {27,node_1,-3}
11:42:10.592 [info] prop_partisan: command conclusion fired for command at node node_1: [broadcast,node_1,27,-3]
11:42:10.593 [info] prop_partisan: command preamble fired for command at node node_4: [begin_send_omission,node_2]
11:42:10.594 [info] prop_partisan_crash_fault_model: begin_send_omission: source_node node_4 destination_node node_2
11:42:10.596 [info] prop_partisan: command conclusion fired for command at node node_4: [begin_send_omission,node_2]
11:42:10.598 [info] prop_partisan_reliable_broadcast: executing check mailbox command: node_2
11:42:10.598 [info] prop_partisan: command preamble fired for command at node node_2: [check_mailbox,node_2]
11:42:10.599 [info] prop_partisan_reliable_broadcast: waiting for message quiescence at node node_2
11:42:30.600 [info] prop_partisan: command conclusion fired for command at node node_2: [check_mailbox,node_2]
11:42:30.600 [info] prop_partisan_reliable_broadcast: verifying mailbox at node node_2:
11:42:30.600 [info] prop_partisan_reliable_broadcast:  => sent: [{19,node_1,-3},{22,node_4,6},{27,node_1,-3}]
11:42:30.600 [info] prop_partisan_reliable_broadcast:  => received: [{19,node_1,-3},{27,node_1,-3}]
11:42:30.600 [info] prop_partisan_reliable_broadcast: verification of mailbox at node node_2 failed, missing: [{22,node_4,6}], received: [{19,node_1,-3},{27,node_1,-3}]
11:42:30.600 [info] prop_partisan: postcondition result: false; command: prop_partisan_reliable_broadcast:check_mailbox([node_2])
11:42:30.601 [info] prop_partisan_reliable_broadcast: ending case

Here's the resulting trace, showing that retransmission of dropped messages is only sufficient if the partition heals.

11:42:30.601 [info] partisan_trace_orchestrator: node_1 entering command: [broadcast,node_1,19,-3]
11:42:30.601 [info] partisan_trace_orchestrator: node_1 leaving command: [broadcast,node_1,19,-3]
11:42:30.601 [info] partisan_trace_orchestrator: node_4 entering command: [broadcast,node_4,22,6]
11:42:30.601 [info] partisan_trace_orchestrator: node_4 leaving command: [broadcast,node_4,22,6]
11:42:30.601 [info] partisan_trace_orchestrator: node_1 entering command: [broadcast,node_1,27,-3]
11:42:30.601 [info] partisan_trace_orchestrator: node_1 leaving command: [broadcast,node_1,27,-3]
11:42:30.601 [info] partisan_trace_orchestrator: node_4 entering command: [begin_send_omission,node_2]
11:42:30.601 [info] partisan_trace_orchestrator: node_1@parrhesia => node_2@parrhesia: {forward_message,node_1@parrhesia,{undefined,[{node_1@parrhesia,1}]},demers_direct_mail,{broadcast,{node_1@parrhesia,2},receiver,{19,node_1,-3}}}
11:42:30.601 [info] partisan_trace_orchestrator: node_4 leaving command: [begin_send_omission,node_2]
11:42:30.601 [info] partisan_trace_orchestrator: node_2 entering command: [check_mailbox,node_2]
11:42:30.601 [info] partisan_trace_orchestrator: node_1@parrhesia => node_3@parrhesia: {forward_message,node_1@parrhesia,{undefined,[{node_1@parrhesia,2}]},demers_direct_mail,{broadcast,{node_1@parrhesia,2},receiver,{19,node_1,-3}}}
11:42:30.601 [info] partisan_trace_orchestrator: node_4@parrhesia => node_1@parrhesia: {forward_message,node_4@parrhesia,{undefined,[{node_4@parrhesia,1}]},demers_direct_mail,{broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}}
11:42:30.601 [info] partisan_trace_orchestrator: node_1@parrhesia => node_4@parrhesia: {forward_message,node_1@parrhesia,{undefined,[{node_1@parrhesia,3}]},demers_direct_mail,{broadcast,{node_1@parrhesia,2},receiver,{19,node_1,-3}}}
11:42:30.601 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.601 [info] partisan_trace_orchestrator: node_1@parrhesia => node_2@parrhesia: {forward_message,node_1@parrhesia,{undefined,[{node_1@parrhesia,4}]},demers_direct_mail,{broadcast,{node_1@parrhesia,3},receiver,{27,node_1,-3}}}
11:42:30.601 [info] partisan_trace_orchestrator: node_4@parrhesia => node_3@parrhesia: {forward_message,node_4@parrhesia,{undefined,[{node_4@parrhesia,3}]},demers_direct_mail,{broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}}
11:42:30.601 [info] partisan_trace_orchestrator: node_1@parrhesia => node_3@parrhesia: {forward_message,node_1@parrhesia,{undefined,[{node_1@parrhesia,5}]},demers_direct_mail,{broadcast,{node_1@parrhesia,3},receiver,{27,node_1,-3}}}
11:42:30.601 [info] partisan_trace_orchestrator: node_1@parrhesia => node_4@parrhesia: {forward_message,node_1@parrhesia,{undefined,[{node_1@parrhesia,6}]},demers_direct_mail,{broadcast,{node_1@parrhesia,3},receiver,{27,node_1,-3}}}
11:42:30.601 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.601 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_4@parrhesia => node_2@parrhesia: DROPPED {broadcast,{node_4@parrhesia,2},receiver,{22,node_4,6}}
11:42:30.602 [info] partisan_trace_orchestrator: node_2 leaving command: [check_mailbox,node_2]
11:42:30.631 [info] partisan_trace_orchestrator: writing trace.

We'll come back to addressing this failure shortly.

Testing the Implementation: Crash Failures

Even assuming that the system eventually recovers from network partitions, we soon discover that the acknowledgement mechanism is not enough once we consider crash failures. If a node has buffered a message and is retransmitting it, a crash of that node prevents the message from ever being delivered to the destination.

In short, acknowledgements are sufficient for robustness against general omission failures with the direct mail protocol; however, they are not sufficient for robustness against crash failures.

TODO: Insert example of the trace from a crash failure rendering the system unavailable.

Backup Protocol: Anti-Entropy

As Demers et al. discuss, it may be useful to pair a protocol like direct mail with a more resilient, more expensive, repair protocol such as anti-entropy. Anti-entropy is designed to be resilient under failures, at the cost of a more expensive execution. With anti-entropy, processes store all received messages and periodically select a peer with which to exchange message lists (or, in the paper, the values of objects in the database). This process is expensive: it requires maintaining message identifiers (if updates are stored in a log) or checksums (if objects are stored in mutable state). When exchanges happen, nodes first exchange metadata describing which updates they have received, based on either checksums, timestamps, or message identifiers, and subsequently send their updates. The exchange can be pull-based, push-based, or push-pull-based, where full exchanges occur between peers.
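
To make the exchange concrete, here is an illustrative, push-based sketch, not the implementation elided below. On a timer, each node picks a random peer and pushes its full set of stored messages; the peer delivers anything it has not yet seen. The ?EXCHANGE_INTERVAL macro, the reuse of the ?MODULE table, storing the destination ServerRef alongside each message, and calling schedule_exchange() from init/1 are all assumptions made for this example.

%% Illustrative sketch; assumes the ETS table stores {Id, {ServerRef, Message}}
%% so that the destination can be recovered during an exchange, and that
%% schedule_exchange() is first called from init/1.
schedule_exchange() ->
    erlang:send_after(?EXCHANGE_INTERVAL, self(), antientropy).

handle_info(antientropy, #state{membership=Membership}=State) ->
    Manager = manager(),
    MyNode = partisan_peer_service_manager:mynode(),

    %% Select a random peer, if one exists.
    case membership(Membership) -- [MyNode] of
        [] ->
            ok;
        Peers ->
            Peer = lists:nth(rand:uniform(length(Peers)), Peers),

            %% Push our full message set; the peer deduplicates on receipt.
            Manager:forward_message(Peer, ?GOSSIP_CHANNEL, ?MODULE,
                                    {exchange, ets:tab2list(?MODULE)})
    end,

    %% Reschedule the next periodic exchange.
    schedule_exchange(),
    {noreply, State};

handle_info({exchange, Messages}, State) ->
    %% Deliver and store anything we have not seen yet.
    lists:foreach(fun({Id, {ServerRef, Message}}) ->
        case ets:lookup(?MODULE, Id) of
            [] ->
                partisan_util:process_forward(ServerRef, Message),
                true = ets:insert(?MODULE, {Id, {ServerRef, Message}});
            _ ->
                ok
        end
    end, Messages),
    {noreply, State};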

We can implement anti-entropy as a simple extension of our direct mail protocol, where it functions as a complementary repair protocol, or we can implement anti-entropy as a standalone protocol which, while extremely expensive, will satisfy the property of reliable broadcast.

Here's a lightweight implementation of anti-entropy using Partisan.

TODO: Add the implementation of anti-entropy.

Using the counterexamples from our direct mail example, we can demonstrate that either using anti-entropy as a standalone protocol, or using it to repair missing messages from the direct mail protocol, satisfies the desired reliable broadcast property.

TODO: Example of the test suite passing.

More Efficient Pairing: Rumor-Mongering + Anti-Entropy

...
