@ConnorDoyle
Created September 5, 2012 14:44
Akka Cluster Experiment

Introduction

Problem Description

This write-up documents unexpected behavior I am seeing with the 2.1-SNAPSHOT version of Akka's cluster module.

Specifically, when the cluster leader is stopped with SIGINT and becomes unreachable, it is unable to rejoin the cluster after it is restarted. Strangely, rejoining always works when the killed node is not the leader.

Preliminaries

The output listings here consist of hand-picked "important" events logged during an Akka cluster session. There are two nodes in the system: node1.mydomain.com and node2.mydomain.com. Both nodes run identical software, freshly pulled from source control.

Because Akka selects the leader deterministically (there is no election; the leader is the first eligible member in sorted address order), node1 is always the cluster leader when both nodes are members.
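To make the leader rule concrete, here is a minimal sketch. It assumes the leader is simply the first reachable member when members are sorted by host name and then port. The `Member` tuple and `leader_of` function are hypothetical simplifications, not Akka's API, and the real ordering and eligibility rules in 2.1-SNAPSHOT may differ in detail.

```python
from typing import List, NamedTuple, Optional, Set


class Member(NamedTuple):
    # Hypothetical stand-in for an Akka cluster member: host name and port only.
    host: str
    port: int


def leader_of(members: List[Member], unreachable: Set[Member]) -> Optional[Member]:
    """Return the first reachable member in (host, port) order, or None.

    Assumption: sorting by host name, then port, approximates the address
    ordering Akka uses to pick a leader without running an election.
    """
    candidates = sorted(m for m in members if m not in unreachable)
    return candidates[0] if candidates else None


node1 = Member("node1.mydomain.com", 32001)
node2 = Member("node2.mydomain.com", 32001)

# Both nodes reachable: node1 sorts first, so it leads.
print(leader_of([node2, node1], set()))
# node1 unreachable: node2 is the only remaining candidate.
print(leader_of([node1, node2], {node1}))
```

Under this model node2 becomes leader only once node1 is unreachable, which matches the LeaderChanged events node2 logs in Experiment Two.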

Akka Setup

The Akka configuration is identical on node1 and node2.

node {
  akka {
    log-config-on-start = "on"
    actor.provider = "akka.remote.RemoteActorRefProvider"
    cluster {
      nodename = "node"
      auto-join = "on"
      auto-down = "on"
      seed-nodes = [
        "akka://node@node1.mydomain.com:32001",
        "akka://node@node2.mydomain.com:32001"
      ]
    }
    loglevel = INFO
    remote {
      transport = "akka.remote.netty.NettyRemoteTransport"
      netty {
        # uncomment to override hostname -- the default value is
        # java.net.InetAddress.getLocalHost().getHostName()
        # hostname = "mynode.mydomain.com"
        port = 32001
      }
    }
  }
}
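With `auto-down = "on"`, the leader is expected to mark unreachable members DOWN without manual intervention, and it is also the node that moves JOINING members to UP. The following is a rough model of one leader pass, assuming only three statuses and ignoring the gossip-convergence precondition Akka applies before acting. `leader_tick` and the dict-based membership table are hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class Status(Enum):
    JOINING = "Joining"
    UP = "Up"
    DOWN = "Down"


def leader_tick(statuses: Dict[str, Status], unreachable: Set[str],
                auto_down: bool) -> Dict[str, Status]:
    """One simplified leader pass over the membership table (hypothetical model).

    Reachable JOINING members are promoted to UP; when auto_down is on,
    unreachable members are marked DOWN. Convergence checks are left out.
    """
    updated = dict(statuses)
    for address, status in statuses.items():
        if address in unreachable:
            if auto_down:
                updated[address] = Status.DOWN
        elif status is Status.JOINING:
            updated[address] = Status.UP
    return updated


n1 = "akka://node@node1.mydomain.com:32001"
n2 = "akka://node@node2.mydomain.com:32001"

# node2 joins: the leader moves it from JOINING to UP.
state = leader_tick({n1: Status.UP, n2: Status.JOINING}, set(), auto_down=True)
# node2 stops heartbeating: with auto-down on, the leader marks it DOWN.
state = leader_tick(state, {n2}, auto_down=True)
print({address: status.value for address, status in state.items()})
```

With `auto_down=False` the unreachable member would simply stay UP in this model until someone downs it manually.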

Experiment One

Methods

Both nodes are launched at approximately the same time. Once the log output indicates that the two nodes have become peers and the cluster has reached convergence, I manually send SIGINT to the instance running on node2.

The results contain the log output generated by node1 during the experiment.

Results

Event 1: successful peer join from node2.mydomain.com

[09/04/2012 22:46:59.236] [node-akka.actor.default-dispatcher-7] [akka://node/system/cluster/core]
Cluster Node [akka://node@node1.mydomain.com:32001] - Leader is moving node [akka://node@node2.mydomain.com:32001] from JOINING to UP

[09/04/2012 22:47:01.425] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
Cluster Peers: []

[09/04/2012 22:47:01.427] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
  TreeSet(
    Member(address = akka://node@node1.mydomain.com:32001, status = Up),
    Member(address = akka://node@node2.mydomain.com:32001, status = Up)
  ),
  Set(),
  true,
  Set(akka://node@node1.mydomain.com:32001, akka://node@node2.mydomain.com:32001),
  Some(akka://node@node1.mydomain.com:32001)
)]

[09/04/2012 22:47:03.521] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
Peer Info Update Received from [Actor[akka://node@node2.mydomain.com:32001/user/kernel/myapp]]

[09/04/2012 22:47:11.525] [node-akka.actor.default-dispatcher-8] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://node@node2.mydomain.com:32001/user/kernel/myapp]]

Event 2: peer death (hard JVM stop on node2.mydomain.com)

[09/04/2012 22:47:13.716] [node-3] [NettyRemoteTransport(akka://node@node1.mydomain.com:32001)]
RemoteClientShutdown@akka://node@node2.mydomain.com:32001

[09/04/2012 22:47:18.247] [node-akka.actor.default-dispatcher-4] [FailureDetector(akka://node)]
Phi value [Infinity] for connection [akka://node@node2.mydomain.com:32001], after [4997 ms], based on  [N(979.0588235294117, 114.57361907023495)]

[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-4] [akka://node/system/cluster/core]
Cluster Node [akka://node@node1.mydomain.com:32001] - Marking node(s) as UNREACHABLE [Member(address = akka://node@node2.mydomain.com:32001, status = Up)]

[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(Member(address = akka://node@node1.mydomain.com:32001, status = Up))]

[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://node@node2.mydomain.com:32001, status = Up))]

[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [false]

[09/04/2012 22:47:18.251] [node-akka.actor.default-dispatcher-5] [akka://node/system/cluster/core]
Cluster Node [akka://node@node1.mydomain.com:32001] - Marking unreachable node [akka://node@node2.mydomain.com:32001] as DOWN

[09/04/2012 22:47:18.256] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://node@node2.mydomain.com:32001, status = Down))]

[09/04/2012 22:47:18.257] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [true]

[09/04/2012 22:47:21.622] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]

[09/04/2012 22:47:21.822] [node-akka.actor.default-dispatcher-6] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
  TreeSet(
    Member(address = akka://node@node1.mydomain.com:32001, status = Up)
  ),
  Set(Member(address = akka://node@node2.mydomain.com:32001, status = Down)),
  true,
  Set(akka://node@node1.mydomain.com:32001),
  Some(akka://node@node1.mydomain.com:32001)
)]
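The `Phi value [Infinity]` line comes from Akka's phi accrual failure detector. Here is a sketch of the calculation. It assumes heartbeat inter-arrival times are modeled as the normal distribution N(mean, std) printed in the log and that phi = -log10(1 - CDF(elapsed)). The `phi` function and its `min_std_ms` floor are assumptions: the standard deviation of exactly 100.0 in Experiment Two's log hints at such a floor, but I have not confirmed it in the source.

```python
import math


def phi(elapsed_ms: float, mean_ms: float, std_ms: float,
        min_std_ms: float = 100.0) -> float:
    """Phi accrual suspicion level for a heartbeat gap (a sketch, not Akka's code).

    Assumes inter-arrival times follow N(mean, std), with std floored at
    min_std_ms, and phi = -log10(1 - CDF(elapsed_ms)).
    """
    std = max(std_ms, min_std_ms)
    z = (elapsed_ms - mean_ms) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    tail = 1.0 - cdf  # rounds to exactly 0.0 once z is large enough
    if tail <= 0.0:
        # On the JVM, -Math.log10(0.0) is Infinity, which is what the log prints;
        # Python's math.log10(0.0) raises ValueError, so return inf explicitly.
        return math.inf
    return -math.log10(tail)


# Values from node1's FailureDetector line at 22:47:18.247.
print(phi(4997, 979.0588235294117, 114.57361907023495))  # inf
# A gap equal to the mean gives phi = log10(2), roughly 0.3.
print(phi(979.0588235294117, 979.0588235294117, 114.57361907023495))
```

With the logged values, a 4997 ms gap lies about 35 standard deviations above the mean, so the tail probability rounds to zero in double precision and phi is infinite. Any finite threshold then marks node2 unreachable, which is what the next log line reports.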

Event 3: successful peer restart and rejoin

[09/04/2012 22:47:29.663] [node-5] [NettyRemoteTransport(akka://node@node1.mydomain.com:32001)]
RemoteClientStarted@akka://node@node2.mydomain.com:32001

[09/04/2012 22:47:29.687] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(
  Member(address = akka://node@node1.mydomain.com:32001, status = Up),
  Member(address = akka://node@node2.mydomain.com:32001, status = Joining)
)]

[09/04/2012 22:47:29.688] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set()]

[09/04/2012 22:47:29.688] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [false]

[09/04/2012 22:47:29.732] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [true]

[09/04/2012 22:47:30.255] [node-akka.actor.default-dispatcher-7] [akka://node/system/cluster/core]
Cluster Node [akka://node@node1.mydomain.com:32001] - Leader is moving node [akka://node@node2.mydomain.com:32001] from JOINING to UP

[09/04/2012 22:47:30.256] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(
  Member(address = akka://node@node1.mydomain.com:32001, status = Up),
  Member(address = akka://node@node2.mydomain.com:32001, status = Up)
)]

[09/04/2012 22:47:30.256] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [false]

[09/04/2012 22:47:30.468] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [true]

[09/04/2012 22:47:31.722] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://node@node2.mydomain.com:32001/user/kernel/myapp]]

[09/04/2012 22:47:31.923] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
  TreeSet(
    Member(address = akka://node@node1.mydomain.com:32001, status = Up),
    Member(address = akka://node@node2.mydomain.com:32001, status = Up)
  ),
  Set(),
  true,
  Set(akka://node@node1.mydomain.com:32001, akka://node@node2.mydomain.com:32001),
  Some(akka://node@node1.mydomain.com:32001)
)]

[09/04/2012 22:47:34.744] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Peer Info Update Received from [Actor[akka://node@node2.mydomain.com:32001/user/kernel/myapp]]
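The CurrentClusterState dumps print their fields positionally. My reading, which is an interpretation of the output and not checked against the 2.1-SNAPSHOT source, is: members, unreachable members, a convergence flag, the addresses that have seen the latest gossip, and the leader. The boolean in each LeaderChanged event appears to track the same convergence flag: in Event 2 it is false while node2 is unreachable but still Up, and true once node2 is marked DOWN. The sketch below encodes an assumed convergence rule that reproduces those flips; `is_converged` is hypothetical.

```python
from typing import Iterable, Mapping


def is_converged(members: Mapping[str, str],
                 unreachable: Mapping[str, str],
                 seen_by: Iterable[str]) -> bool:
    """Assumed gossip-convergence rule (a sketch, not Akka's implementation).

    members / unreachable map an address to its status, mirroring the first two
    CurrentClusterState fields; seen_by mirrors the fourth field.
    """
    seen = set(seen_by)
    # Every reachable member must have seen the current gossip version...
    all_seen = all(address in seen for address in members)
    # ...and unreachable members block convergence until they are marked Down.
    unreachable_down = all(status == "Down" for status in unreachable.values())
    return all_seen and unreachable_down


N1 = "akka://node@node1.mydomain.com:32001"
N2 = "akka://node@node2.mydomain.com:32001"

# Inputs copied from the 22:47:01 and 22:47:21 dumps, both printed with `true`.
print(is_converged({N1: "Up", N2: "Up"}, {}, {N1, N2}))
print(is_converged({N1: "Up"}, {N2: "Down"}, {N1}))
# node2 unreachable but still Up (22:47:18.249); seen-by here is assumed.
print(is_converged({N1: "Up"}, {N2: "Up"}, {N1}))
```

The first two calls return True and the third returns False, matching the `false` then `true` LeaderChanged flags around node2 being marked DOWN.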

Experiment Two

Methods

Both nodes are launched at approximately the same time. Once the log output indicates that the two nodes have become peers and the cluster has reached convergence, I manually send SIGINT to the instance running on node1.

The results contain the log output generated by node2 during the experiment.

Results

Event 1: successful peer join from node1.mydomain.com

[09/05/2012 09:01:25.801] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
  TreeSet(
    Member(address = akka://node@node1.mydomain.com:32001, status = Up),
    Member(address = akka://node@node2.mydomain.com:32001, status = Up)
  ),
  Set(),
  true,
  Set(akka://node@node1.mydomain.com:32001, akka://node@node2.mydomain.com:32001),
  Some(akka://node@node1.mydomain.com:32001)
)]

[09/05/2012 09:01:25.847] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Routing table update received from [Actor[akka://node@node1.mydomain.com:32001/user/kernel/myapp]]

[09/05/2012 09:01:25.897] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://node@node1.mydomain.com:32001/user/kernel/myapp]]

Event 2: peer death (hard JVM stop on node1.mydomain.com)

[09/05/2012 09:01:37.880] [node-akka.actor.default-dispatcher-3] [NettyRemoteTransport(akka://node@node2.mydomain.com:32001)]
RemoteClientShutdown@akka://node@node1.mydomain.com:32001

[09/05/2012 09:01:42.620] [node-akka.actor.default-dispatcher-6] [FailureDetector(akka://node)]
Phi value [Infinity] for connection [akka://node@node1.mydomain.com:32001], after [4977 ms], based on  [N(998.7222222222222, 100.0)]

[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-6] [akka://node/system/cluster/core]
Cluster Node [akka://node@node2.mydomain.com:32001] - Marking node(s) as UNREACHABLE [Member(address = akka://node@node1.mydomain.com:32001, status = Up)]

[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(Member(address = akka://node@node2.mydomain.com:32001, status = Up))]

[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://node@node1.mydomain.com:32001, status = Up))]

[09/05/2012 09:01:42.627] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node2.mydomain.com:32001)] [false]

[09/05/2012 09:01:42.644] [node-akka.actor.default-dispatcher-6] [akka://node/system/cluster/core]
Cluster Node [akka://node@node2.mydomain.com:32001] - Leader is marking unreachable node [akka://node@node1.mydomain.com:32001] as DOWN

[09/05/2012 09:01:42.645] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://node@node1.mydomain.com:32001, status = Down))]

[09/05/2012 09:01:42.645] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node2.mydomain.com:32001)] [true]

[09/05/2012 09:01:46.100] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]

[09/05/2012 09:01:46.101] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
  TreeSet(
    Member(address = akka://node@node2.mydomain.com:32001, status = Up)
  ),
  Set(Member(address = akka://node@node1.mydomain.com:32001, status = Down)),
  true,
  Set(akka://node@node2.mydomain.com:32001),
  Some(akka://node@node2.mydomain.com:32001)
)]

[09/05/2012 09:01:56.199] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
  TreeSet(
    Member(address = akka://node@node2.mydomain.com:32001, status = Up)
  ),
  Set(Member(address = akka://node@node1.mydomain.com:32001, status = Down)),
  true,
  Set(akka://node@node2.mydomain.com:32001),
  Some(akka://node@node2.mydomain.com:32001)
)]

[09/05/2012 09:01:56.199] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]

Event 3: unsuccessful peer rejoin after restart (node1 restarts at 09/05/2012 09:01:57.666)

[INFO] [09/05/2012 09:02:16.597] [node-akka.actor.default-dispatcher-8] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
  TreeSet(
    Member(address = akka://node@node2.mydomain.com:32001, status = Up)
  ),
  Set(Member(address = akka://node@node1.mydomain.com:32001, status = Down)),
  true,
  Set(akka://node@node2.mydomain.com:32001),
  Some(akka://node@node2.mydomain.com:32001)
)]