This is a report of unexpected behavior I am seeing with the 2.1-SNAPSHOT
version of Akka's cluster module.
Specifically, when the cluster leader becomes unreachable because its process was sent SIGINT, it is unable to rejoin the cluster after it is restarted. Strangely, rejoining always works when the node that is killed is not the leader.
The output listings below consist of hand-picked "important" events logged during an Akka cluster session. There are two nodes in the system: node1.mydomain.com and node2.mydomain.com. Both nodes run identical software, freshly pulled from source control.
Because of the way Akka does leader selection, node1 is always the cluster leader when both nodes are members.
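Leader selection can be modeled roughly as follows. This is a simplified sketch, not Akka's code, and it makes two assumptions. First, members are ordered by host name and then by port, which is my understanding of the 2.1 member ordering. Second, the leader is the first member in that order that is not unreachable. Under these assumptions node1.mydomain.com sorts before node2.mydomain.com, so node1 leads whenever it is reachable. node2 takes over only while node1 is unreachable, which matches the LeaderChanged events in the logs. The names `NodeAddress` and `LeaderSelection.leaderOf` are hypothetical.

```scala
// Hypothetical model of leader selection (an assumption, not Akka's source):
// members are sorted by host, then port, and the first reachable member leads.
final case class NodeAddress(system: String, host: String, port: Int)

object LeaderSelection {
  // Assumed member ordering: host name first, then port.
  implicit val addressOrdering: Ordering[NodeAddress] =
    Ordering.by((a: NodeAddress) => (a.host, a.port))

  // The leader is the smallest address among members that are not unreachable.
  def leaderOf(members: Set[NodeAddress], unreachable: Set[NodeAddress]): Option[NodeAddress] =
    (members -- unreachable).toSeq.sorted.headOption
}
```

With both nodes reachable, `leaderOf` returns node1. With node1 in the unreachable set, it returns node2.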
The Akka configuration is identical on node1 and node2.
node {
  akka {
    log-config-on-start = "on"
    actor.provider = "akka.remote.RemoteActorRefProvider"
    cluster {
      nodename = "node"
      auto-join = "on"
      auto-down = "on"
      seed-nodes = [
        "akka://node@node1.mydomain.com:32001",
        "akka://node@node2.mydomain.com:32001"
      ]
    }
    loglevel = INFO
    remote {
      transport = "akka.remote.netty.NettyRemoteTransport"
      netty {
        # uncomment to override hostname -- the default value is
        # java.net.InetAddress.getLocalHost().getHostName()
        # hostname = "mynode.mydomain.com"
        port = 32001
      }
    }
  }
}
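Each seed-nodes entry has the form akka://<system>@<host>:<port>. The sketch below uses a hypothetical helper, `SeedNodes`, which is not part of Akka's API. It parses those strings with the Scala standard library, which can help confirm that every entry names the same actor system ("node") and port (32001) as the netty section. It assumes a plain host name with no colons and only handles the akka:// scheme.

```scala
// Hypothetical helper (not Akka API): parse "akka://system@host:port" seed-node strings.
object SeedNodes {
  final case class Parsed(system: String, host: String, port: Int)

  // Scala regex extractors must match the whole string.
  private val AddressPattern = """akka://([^@]+)@([^:]+):(\d+)""".r

  def parse(s: String): Option[Parsed] = s match {
    case AddressPattern(system, host, port) => Some(Parsed(system, host, port.toInt))
    case _                                  => None
  }

  // True if every entry parses and uses the expected system name and port.
  def allMatch(seeds: Seq[String], system: String, port: Int): Boolean =
    seeds.forall(s => parse(s).exists(p => p.system == system && p.port == port))
}
```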
In the first run, both nodes are launched at approximately the same time. Once the logging output indicates that the two nodes have become peers and reached convergence, I manually send SIGINT to the instance running on node2 (not the leader) and later restart it.
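Convergence is mentioned above, and it also appears as the boolean in the LeaderChanged events. My understanding, which I have not checked against the 2.1-SNAPSHOT source, is that gossip has converged when two conditions hold. Every reachable member must have seen the latest gossip, and every unreachable member must already be marked Down. The hypothetical sketch below mirrors the values printed by CurrentClusterState: members, unreachable members, seen-by set, and leader. The field names are my own. It recomputes the convergence flag from those values and ignores the vector-clock version comparison a real implementation would also need. The result matches the first log: convergence is false while node2 is unreachable but still Up, and true again once the leader marks it Down.

```scala
// Hypothetical model (names are mine) of the values printed by CurrentClusterState.
object ConvergenceModel {
  sealed trait Status
  case object Joining extends Status
  case object Up extends Status
  case object Down extends Status

  final case class Member(address: String, status: Status)

  final case class State(
    members: Set[Member],     // reachable members
    unreachable: Set[Member], // members flagged by the failure detector
    seenBy: Set[String],      // addresses that have seen the latest gossip
    leader: Option[String]
  )

  // Assumed rule: every reachable member has seen the gossip,
  // and every unreachable member has already been marked Down.
  def converged(s: State): Boolean =
    s.members.forall(m => s.seenBy.contains(m.address)) &&
      s.unreachable.forall(_.status == Down)
}
```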
The following logging output was generated by node1 during this run.
[09/04/2012 22:46:59.236] [node-akka.actor.default-dispatcher-7] [akka://node/system/cluster/core]
Cluster Node [akka://node@node1.mydomain.com:32001] - Leader is moving node [akka://node@node2.mydomain.com:32001] from JOINING to UP
[09/04/2012 22:47:01.425] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
Cluster Peers: []
[09/04/2012 22:47:01.427] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://node@node1.mydomain.com:32001, status = Up),
Member(address = akka://node@node2.mydomain.com:32001, status = Up)
),
Set(),
true,
Set(akka://node@node1.mydomain.com:32001, akka://node@node2.mydomain.com:32001),
Some(akka://node@node1.mydomain.com:32001)
)]
[09/04/2012 22:47:03.521] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
Peer Info Update Received from [Actor[akka://node@node2.mydomain.com:32001/user/kernel/myapp]]
[09/04/2012 22:47:11.525] [node-akka.actor.default-dispatcher-8] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://node@node2.mydomain.com:32001/user/kernel/myapp]]
[09/04/2012 22:47:13.716] [node-3] [NettyRemoteTransport(akka://node@node1.mydomain.com:32001)]
RemoteClientShutdown@akka://node@node2.mydomain.com:32001
[09/04/2012 22:47:18.247] [node-akka.actor.default-dispatcher-4] [FailureDetector(akka://node)]
Phi value [Infinity] for connection [akka://node@node2.mydomain.com:32001], after [4997 ms], based on [N(979.0588235294117, 114.57361907023495)]
[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-4] [akka://node/system/cluster/core]
Cluster Node [akka://node@node1.mydomain.com:32001] - Marking node(s) as UNREACHABLE [Member(address = akka://node@node2.mydomain.com:32001, status = Up)]
[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(Member(address = akka://node@node1.mydomain.com:32001, status = Up))]
[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://node@node2.mydomain.com:32001, status = Up))]
[09/04/2012 22:47:18.249] [node-akka.actor.default-dispatcher-5] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [false]
[09/04/2012 22:47:18.251] [node-akka.actor.default-dispatcher-5] [akka://node/system/cluster/core]
Cluster Node [akka://node@node1.mydomain.com:32001] - Marking unreachable node [akka://node@node2.mydomain.com:32001] as DOWN
[09/04/2012 22:47:18.256] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://node@node2.mydomain.com:32001, status = Down))]
[09/04/2012 22:47:18.257] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [true]
[09/04/2012 22:47:21.622] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]
[09/04/2012 22:47:21.822] [node-akka.actor.default-dispatcher-6] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://node@node1.mydomain.com:32001, status = Up)
),
Set(Member(address = akka://node@node2.mydomain.com:32001, status = Down)),
true,
Set(akka://node@node1.mydomain.com:32001),
Some(akka://node@node1.mydomain.com:32001)
)]
[09/04/2012 22:47:29.663] [node-5] [NettyRemoteTransport(akka://node@node1.mydomain.com:32001)]
RemoteClientStarted@akka://node@node2.mydomain.com:32001
[09/04/2012 22:47:29.687] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(
Member(address = akka://node@node1.mydomain.com:32001, status = Up),
Member(address = akka://node@node2.mydomain.com:32001, status = Joining)
)]
[09/04/2012 22:47:29.688] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set()]
[09/04/2012 22:47:29.688] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [false]
[09/04/2012 22:47:29.732] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [true]
[09/04/2012 22:47:30.255] [node-akka.actor.default-dispatcher-7] [akka://node/system/cluster/core]
Cluster Node [akka://node@node1.mydomain.com:32001] - Leader is moving node [akka://node@node2.mydomain.com:32001] from JOINING to UP
[09/04/2012 22:47:30.256] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(
Member(address = akka://node@node1.mydomain.com:32001, status = Up),
Member(address = akka://node@node2.mydomain.com:32001, status = Up)
)]
[09/04/2012 22:47:30.256] [node-akka.actor.default-dispatcher-1] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [false]
[09/04/2012 22:47:30.468] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node1.mydomain.com:32001)] [true]
[09/04/2012 22:47:31.722] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://node@node2.mydomain.com:32001/user/kernel/myapp]]
[09/04/2012 22:47:31.923] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://node@node1.mydomain.com:32001, status = Up),
Member(address = akka://node@node2.mydomain.com:32001, status = Up)
),
Set(),
true,
Set(akka://node@node1.mydomain.com:32001, akka://node@node2.mydomain.com:32001),
Some(akka://node@node1.mydomain.com:32001)
)]
[09/04/2012 22:47:34.744] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Peer Info Update Received from [Actor[akka://node@node2.mydomain.com:32001/user/kernel/myapp]]
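The FailureDetector entry at 22:47:18.247 reports a phi value of Infinity after 4997 ms, based on N(979.06, 114.57). The sketch below shows why the value saturates. It uses the phi accrual definition phi = -log10(1 - F(t)), where F is approximated by the logistic approximation of the normal CDF. I believe this is the approximation Akka's accrual failure detector uses, but treat the exact formula as an assumption. At 4997 ms, t is about 35 standard deviations above the mean, so F(t) rounds to exactly 1.0 in double precision. That makes 1 - F(t) zero and phi +Infinity, far above any threshold. The same applies to the 4977 ms, N(998.72, 100.0) entry in the second run.

```scala
// Sketch of phi accrual suspicion. Assumption: logistic approximation of the
// normal CDF, which I believe matches Akka's detector but is not copied from it.
object PhiAccrual {
  // Approximate CDF of N(mean, stdDeviation^2) evaluated at x.
  def cdf(x: Double, mean: Double, stdDeviation: Double): Double = {
    val y = (x - mean) / stdDeviation
    1.0 / (1.0 + math.exp(-y * (1.5976 + 0.070566 * y * y)))
  }

  // phi = -log10(probability that the next heartbeat arrives later than timeDiff).
  // When cdf rounds to 1.0, log10(0.0) is -Infinity, so phi is +Infinity.
  def phi(timeDiff: Double, mean: Double, stdDeviation: Double): Double =
    -math.log10(1.0 - cdf(timeDiff, mean, stdDeviation))
}
```

For comparison, an interval of 1000 ms under the same distribution gives a phi of about 0.37.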
In the second run, both nodes are again launched at approximately the same time. Once the logging output indicates that the two nodes have become peers and reached convergence, I manually send SIGINT to the instance running on node1 (the leader) and later restart it.
The following logging output was generated by node2 during this run.
[09/05/2012 09:01:25.801] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://node@node1.mydomain.com:32001, status = Up),
Member(address = akka://node@node2.mydomain.com:32001, status = Up)
),
Set(),
true,
Set(akka://node@node1.mydomain.com:32001, akka://node@node2.mydomain.com:32001),
Some(akka://node@node1.mydomain.com:32001)
)]
[09/05/2012 09:01:25.847] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Routing table update received from [Actor[akka://node@node1.mydomain.com:32001/user/kernel/myapp]]
[09/05/2012 09:01:25.897] [node-akka.actor.default-dispatcher-2] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp], Actor[akka://node@node1.mydomain.com:32001/user/kernel/myapp]]
[09/05/2012 09:01:37.880] [node-akka.actor.default-dispatcher-3] [NettyRemoteTransport(akka://node@node2.mydomain.com:32001)]
RemoteClientShutdown@akka://node@node1.mydomain.com:32001
[09/05/2012 09:01:42.620] [node-akka.actor.default-dispatcher-6] [FailureDetector(akka://node)]
Phi value [Infinity] for connection [akka://node@node1.mydomain.com:32001], after [4977 ms], based on [N(998.7222222222222, 100.0)]
[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-6] [akka://node/system/cluster/core]
Cluster Node [akka://node@node2.mydomain.com:32001] - Marking node(s) as UNREACHABLE [Member(address = akka://node@node1.mydomain.com:32001, status = Up)]
[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
MembersChanged: [TreeSet(Member(address = akka://node@node2.mydomain.com:32001, status = Up))]
[09/05/2012 09:01:42.621] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://node@node1.mydomain.com:32001, status = Up))]
[09/05/2012 09:01:42.627] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node2.mydomain.com:32001)] [false]
[09/05/2012 09:01:42.644] [node-akka.actor.default-dispatcher-6] [akka://node/system/cluster/core]
Cluster Node [akka://node@node2.mydomain.com:32001] - Leader is marking unreachable node [akka://node@node1.mydomain.com:32001] as DOWN
[09/05/2012 09:01:42.645] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
UnreachableMembersChanged: [Set(Member(address = akka://node@node1.mydomain.com:32001, status = Down))]
[09/05/2012 09:01:42.645] [node-akka.actor.default-dispatcher-4] [akka://node/user/kernel/myapp]
LeaderChanged: [Some(akka://node@node2.mydomain.com:32001)] [true]
[09/05/2012 09:01:46.100] [node-akka.actor.default-dispatcher-3] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]
[09/05/2012 09:01:46.101] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://node@node2.mydomain.com:32001, status = Up)
),
Set(Member(address = akka://node@node1.mydomain.com:32001, status = Down)),
true,
Set(akka://node@node2.mydomain.com:32001),
Some(akka://node@node2.mydomain.com:32001)
)]
[09/05/2012 09:01:56.199] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://node@node2.mydomain.com:32001, status = Up)
),
Set(Member(address = akka://node@node1.mydomain.com:32001, status = Down)),
true,
Set(akka://node@node2.mydomain.com:32001),
Some(akka://node@node2.mydomain.com:32001)
)]
[09/05/2012 09:01:56.199] [node-akka.actor.default-dispatcher-7] [akka://node/user/kernel/myapp]
Cluster Peers: [Actor[akka://node/user/kernel/myapp]]
[INFO] [09/05/2012 09:02:16.597] [node-akka.actor.default-dispatcher-8] [akka://node/user/kernel/myapp]
CurrentClusterState: [CurrentClusterState(
TreeSet(
Member(address = akka://node@node2.mydomain.com:32001, status = Up)
),
Set(Member(address = akka://node@node1.mydomain.com:32001, status = Down)),
true,
Set(akka://node@node2.mydomain.com:32001),
Some(akka://node@node2.mydomain.com:32001)
)]
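The two runs can be compared mechanically with a hypothetical log scanner, which is not an Akka tool. It looks for a line marking a node DOWN, followed later by the leader moving that same node from JOINING to UP. It matches only the message wording that appears in these excerpts ("... as DOWN" and "Leader is moving node [...] from JOINING to UP"), so other Akka versions may need different patterns. On the first run it finds node2 being moved back to UP at 22:47:30. On the second run, no line ever moves node1 back to UP.

```scala
// Hypothetical log scanner (not an Akka tool): was `address` moved back to UP
// after it was marked DOWN? Matches only the wording seen in these excerpts.
object RejoinCheck {
  def rejoinedAfterDown(lines: Seq[String], address: String): Boolean = {
    val target = s"[$address]"
    // Index of the first line that marks the node as DOWN, or -1 if none.
    val downAt = lines.indexWhere(l => l.contains(target) && l.contains("as DOWN"))
    downAt >= 0 && lines.drop(downAt + 1).exists { l =>
      l.contains("Leader is moving node " + target) && l.contains("from JOINING to UP")
    }
  }
}
```

The input can be read with, for example, `scala.io.Source.fromFile(path).getLines().toList`, where `path` is the location of a saved log file.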