Skip to content

Instantly share code, notes, and snippets.

@ddewaele
Last active June 13, 2017 21:54
Show Gist options
  • Save ddewaele/67ca6cb95b9c894a9eb8d782b2ad99a2 to your computer and use it in GitHub Desktop.
Save ddewaele/67ca6cb95b9c894a9eb8d782b2ad99a2 to your computer and use it in GitHub Desktop.
nifi-cluster-connection-issues.md

We have a node nifi cluster running with 3 zookeeper instances (replicated) in a Docker Swarm Cluster.

Most of time the cluster is operating fine, but from time to time we notice that Nifi stops processing messages completely. It eventually resumes after a while (sometimes after a couple of seconds, sometimes after a couple of minutes).

When I do a grep o.a.n.c.l.e.CuratorLeaderElectionManager /srv/nifi/logs/nifi-app.log on the primary node, I see a lof of suspended / reconnected messages.

  2017-06-13 21:29:35,622 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 Connection State changed to SUSPENDED
  2017-06-13 21:29:35,623 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 Connection State changed to SUSPENDED
  2017-06-13 21:29:35,628 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 has been interrupted; no longer leader for role 'Cluster Coordinator'
  2017-06-13 21:29:35,628 INFO [Leader Election Notification Thread-4] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 has been interrupted; no longer leader for role 'Primary Node'
  2017-06-13 21:29:35,629 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 This node is no longer leader for role 'Cluster Coordinator'
  2017-06-13 21:29:35,629 INFO [Leader Election Notification Thread-4] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 This node is no longer leader for role 'Primary Node'
  2017-06-13 21:29:38,154 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 Connection State changed to RECONNECTED
  2017-06-13 21:29:38,155 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 Connection State changed to RECONNECTED
  2017-06-13 21:29:38,999 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 This node has been elected Leader for Role 'Cluster Coordinator'
  2017-06-13 21:29:39,000 INFO [Leader Election Notification Thread-4] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 This node has been elected Leader for Role 'Primary Node'
  2017-06-13 21:36:38,618 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 Connection State changed to SUSPENDED
  2017-06-13 21:36:38,618 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 Connection State changed to SUSPENDED
  2017-06-13 21:36:38,626 INFO [Leader Election Notification Thread-4] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 has been interrupted; no longer leader for role 'Primary Node'
  2017-06-13 21:36:38,626 INFO [Leader Election Notification Thread-4] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 This node is no longer leader for role 'Primary Node'
  2017-06-13 21:36:38,626 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 has been interrupted; no longer leader for role 'Cluster Coordinator'
  2017-06-13 21:36:38,626 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 This node is no longer leader for role 'Cluster Coordinator'
  2017-06-13 21:36:39,920 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 Connection State changed to RECONNECTED
  2017-06-13 21:36:39,920 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 Connection State changed to RECONNECTED
  2017-06-13 21:36:42,879 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@3485c1a5 This node has been elected Leader for Role 'Cluster Coordinator'
  2017-06-13 21:36:42,903 INFO [Leader Election Notification Thread-4] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@778f44c6 This node has been elected Leader for Role 'Primary Node'

Likewise on the other node, I see similar messages

  2017-06-13 21:36:21,846 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to SUSPENDED
  2017-06-13 21:36:22,183 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to SUSPENDED
  2017-06-13 21:36:36,203 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to RECONNECTED
  2017-06-13 21:36:36,203 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to RECONNECTED
  2017-06-13 21:36:38,974 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to SUSPENDED
  2017-06-13 21:36:39,022 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to SUSPENDED
  2017-06-13 21:36:39,089 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to LOST
  2017-06-13 21:36:39,092 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to LOST
  2017-06-13 21:36:40,327 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to RECONNECTED
  2017-06-13 21:36:40,327 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to RECONNECTED
  2017-06-13 21:37:07,170 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to SUSPENDED
  2017-06-13 21:37:07,170 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to SUSPENDED
  2017-06-13 21:37:08,026 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to RECONNECTED
  2017-06-13 21:37:08,027 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to RECONNECTED
  2017-06-13 21:41:40,436 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to SUSPENDED
  2017-06-13 21:41:40,585 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to SUSPENDED
  2017-06-13 21:41:42,151 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to LOST
  2017-06-13 21:41:42,151 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to LOST
  2017-06-13 21:41:43,278 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@251ef4ed Connection State changed to RECONNECTED
  2017-06-13 21:41:43,278 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@751a16d6 Connection State changed to RECONNECTED

The only real exceptions I'm seeing in the logs are these

  2017-06-13 21:03:26,829 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
  org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
          at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838)
          at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
          at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
          at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)

I also this on the UI from time to time :

  com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out 

Is there anything I can do to further debug this ? Is it normal to see that many connection state changes ? (the logs are full of them). The solution is running on 3 VMs, using Docker Swarm. Nifi is running on 2 of those 3 VMs. We have a zookeeper setup running on all 3 VMs.

I don't see any errors in the zookeeper logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment