@thaJeztah
Last active July 25, 2016 14:01

[1.12-RC4] Killing leader makes all containers end up on a single node

Note: I just re-ran this on current master (Docker version 1.12.0-dev, build 42f2205) and ended up with the same result.

Something I just ran into and can reproduce reliably:

  1. Create 4 droplets on DigitalOcean and install 1.12-RC4 on each
  2. The first node creates the swarm (docker swarm init)
  3. Nodes 2 and 3 join the swarm as managers
  4. Node 4 joins the swarm as a worker

Then create a service with 16 replicas:

docker service create --name web --replicas=16 -p 80:80 nginx:alpine

On one of the manager nodes (swarm-test-02), watch docker node ls:

ID                           HOSTNAME       MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f    swarm-test-03  Accepted    Ready   Active        Reachable
36oy4vdvxxmnuw72pfli5dv2i    swarm-test-01  Accepted    Ready   Active        Leader
37argq2z05d5719olaiqpauvh *  swarm-test-02  Accepted    Ready   Active        Reachable
bpcy1dxk7v6md85qg3m7jn8il    swarm-test-04  Accepted    Ready   Active
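
Because the status changes described further down happen quickly, it can help to log a compact view of the table rather than watch it by eye. The following is a minimal sketch, not part of the original reproduction: `node_status` and `log_node_status` are hypothetical helper names, and the field positions assume the 1.12-RC4 column layout shown above (ID, HOSTNAME, MEMBERSHIP, STATUS, ...), which may differ in other Docker versions.

```shell
#!/bin/sh
# node_status (hypothetical helper): read `docker node ls` output on stdin
# and print "HOSTNAME STATUS" for each node.
# Assumption: columns are laid out as in the 1.12-RC4 table above. The "*"
# that marks the current node is a separate field, shifting columns by one.
node_status() {
  awk 'NR > 1 && NF > 0 {
    if ($2 == "*") print $3, $5
    else           print $2, $4
  }'
}

# log_node_status (hypothetical helper): print a timestamped snapshot every
# INTERVAL seconds (default 1) until interrupted with Ctrl-C.
log_node_status() {
  interval="${1:-1}"
  while true; do
    docker node ls | node_status | sed "s/^/$(date +%T) /"
    sleep "$interval"
  done
}
```

Running `log_node_status` on a manager node would then produce one timestamped line per node per interval, which makes the order of the transitions easier to compare between runs.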

And on all nodes, watch docker ps:

CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS               NAMES
72ea95d66177        nginx:alpine        "nginx -g 'daemon off"   About a minute ago   Up About a minute   80/tcp, 443/tcp     web.2.7h8zuthv89jr2ao4c1x4r9qce
dd58f8757a06        nginx:alpine        "nginx -g 'daemon off"   About a minute ago   Up About a minute   80/tcp, 443/tcp     web.10.8bolssicpwc25op18dleducvg
59966a937870        nginx:alpine        "nginx -g 'daemon off"   About a minute ago   Up About a minute   80/tcp, 443/tcp     web.11.0shxh3ccvzpwzvjvphjrzefkh
49040cf8f0a6        nginx:alpine        "nginx -g 'daemon off"   About a minute ago   Up About a minute   80/tcp, 443/tcp     web.9.18f9rtrhyamjdmxnmd9hmokj0

### Kill the leader node

From the DigitalOcean control panel, destroy the leader node. Meanwhile, watch what happens on the remaining nodes.

### 1. Initial state (before killing leader)

ID                           HOSTNAME            MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
1msvix0hu4vz7czqcwvzi3poq *  ubuntu-2gb-ams3-01  Accepted    Ready   Active        Reachable
264qo2mcvkuagirx68iluzi88    ubuntu-2gb-ams3-02  Accepted    Ready   Active        Reachable
4z3arwkhuwvvqiqwm9rxqg165    ubuntu-2gb-ams3-01  Accepted    Ready   Active        Leader
en76v2a9mlf12sjwdtw3hyb58    ubuntu-2gb-ams3-01  Accepted    Ready   Active

### 2. Status "unknown" for all nodes

Just after the leader is killed, an RPC deadline error is shown. The node status then goes through the following stages:

ID                           HOSTNAME       MEMBERSHIP  STATUS   AVAILABILITY  MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f    swarm-test-03  Accepted    Unknown  Active        Reachable
36oy4vdvxxmnuw72pfli5dv2i    swarm-test-01  Accepted    Unknown  Active        Unreachable
37argq2z05d5719olaiqpauvh *  swarm-test-02  Accepted    Unknown  Active        Leader
bpcy1dxk7v6md85qg3m7jn8il    swarm-test-04  Accepted    Unknown  Active

### 3. Status "down" for all nodes

ID                           HOSTNAME       MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f    swarm-test-03  Accepted    Down    Active        Reachable
36oy4vdvxxmnuw72pfli5dv2i    swarm-test-01  Accepted    Down    Active        Unreachable
37argq2z05d5719olaiqpauvh *  swarm-test-02  Accepted    Down    Active        Leader
bpcy1dxk7v6md85qg3m7jn8il    swarm-test-04  Accepted    Down    Active

### 4. Status "down" for the manager that did not become leader

ID                           HOSTNAME       MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f    swarm-test-03  Accepted    Down    Active        Reachable
36oy4vdvxxmnuw72pfli5dv2i    swarm-test-01  Accepted    Down    Active        Unreachable
37argq2z05d5719olaiqpauvh *  swarm-test-02  Accepted    Ready   Active        Leader
bpcy1dxk7v6md85qg3m7jn8il    swarm-test-04  Accepted    Ready   Active

### 5. Status "ready"

ID                           HOSTNAME       MEMBERSHIP  STATUS  AVAILABILITY  MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f    swarm-test-03  Accepted    Ready   Active        Reachable
36oy4vdvxxmnuw72pfli5dv2i    swarm-test-01  Accepted    Down    Active        Unreachable
37argq2z05d5719olaiqpauvh *  swarm-test-02  Accepted    Ready   Active        Leader
bpcy1dxk7v6md85qg3m7jn8il    swarm-test-04  Accepted    Ready   Active

However, at stage 5, all containers ended up on a single node:

swarm-test-02:

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

swarm-test-03:

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

swarm-test-04:

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
06b59fd80536        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.1.7599uowuwx8kln46jzzlzrsej
5e751507a919        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.8.6wcxeir8kw4misv5hm6tbt1gb
9f871df4c0e6        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.9.6348tx8lisem60tge1n9h105p
34d922f5627b        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.10.18x3x83ay20moo4wfk6uqssuk
fe00d09420fe        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 5 minutes        80/tcp, 443/tcp     web.15.7ajzwkx125w0ilbv09kluxsh5
1a365c02c807        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.2.9g06i0ji6oby1a681hrnr7fuv
780f69e79646        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.12.0952zl8ejgp7glfr991xmiqsu
93002b1dcfe7        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.14.62cwuiykimiz6dlmiognotmcn
c96a48711179        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.3.96zsnc2bfgd9dj71bicgds0w1
7634285e896a        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.6.bqwqso18dqrkr80jc51aumonq
3ed44fe7e021        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.11.8s2va2tvmpw187s7v8ce6h3r2
23a083a91b6b        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.13.709wueza0j9631eibdm7ildu3
6d6871c931ce        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.4.1o0ay75v1y8amya2hnhnfb8y0
2a4cb84bd3ee        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.7.7ijasltqhbalctuahsly3n2cy
18bcc8cb66a1        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 5 minutes        80/tcp, 443/tcp     web.16.b7z9i8lo97fzxmtvwd06ufpxp
67fa07227b58        nginx:alpine        "nginx -g 'daemon off"   5 minutes ago       Up 4 minutes        80/tcp, 443/tcp     web.5.6ln0iyt24qxnic2nu0rg3ttxx
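
To confirm the distribution without reading through the full docker ps listing on every node, the containers belonging to the service can be counted per node. This is a minimal sketch: `count_service_containers` is a hypothetical helper name, and it relies on the `<service>.<slot>.<task-id>` container naming visible in the output above.

```shell
#!/bin/sh
# count_service_containers SERVICE (hypothetical helper): read container
# names on stdin, one per line, and print how many start with "SERVICE.".
# Assumption: swarm task containers are named <service>.<slot>.<task-id>,
# as in the docker ps output above.
count_service_containers() {
  # index(...) == 1 is a literal prefix match, so dots in the service
  # name are not treated as regex wildcards.
  awk -v prefix="$1." 'index($0, prefix) == 1 { n++ } END { print n + 0 }'
}

# Usage, run on each node:
#   docker ps --format '{{.Names}}' | count_service_containers web
```

Given the listings above, this would be expected to report 16 on swarm-test-04 and 0 on swarm-test-02 and swarm-test-03.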