Note: I just re-ran this on current master (`Docker version 1.12.0-dev, build 42f2205`) and ended up with the same result.

Something I just ran into, and can reproduce reliably:
- Create 4 droplets on DigitalOcean, and install Docker 1.12-RC4
- First node creates the swarm (`docker swarm init`)
- Nodes 2 and 3 join the swarm as managers
- Node 4 joins the swarm as a worker
Then create a service and scale it to 16 replicas:
docker service create --name web --replicas=16 -p 80:80 nginx:alpine
On one of the manager nodes (`swarm-test-02`), run `watch docker node ls`:
ID HOSTNAME MEMBERSHIP STATUS AVAILABILITY MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f swarm-test-03 Accepted Ready Active Reachable
36oy4vdvxxmnuw72pfli5dv2i swarm-test-01 Accepted Ready Active Leader
37argq2z05d5719olaiqpauvh * swarm-test-02 Accepted Ready Active Reachable
bpcy1dxk7v6md85qg3m7jn8il swarm-test-04 Accepted Ready Active
And on all nodes, run `watch docker ps`:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
72ea95d66177 nginx:alpine "nginx -g 'daemon off" About a minute ago Up About a minute 80/tcp, 443/tcp web.2.7h8zuthv89jr2ao4c1x4r9qce
dd58f8757a06 nginx:alpine "nginx -g 'daemon off" About a minute ago Up About a minute 80/tcp, 443/tcp web.10.8bolssicpwc25op18dleducvg
59966a937870 nginx:alpine "nginx -g 'daemon off" About a minute ago Up About a minute 80/tcp, 443/tcp web.11.0shxh3ccvzpwzvjvphjrzefkh
49040cf8f0a6 nginx:alpine "nginx -g 'daemon off" About a minute ago Up About a minute 80/tcp, 443/tcp web.9.18f9rtrhyamjdmxnmd9hmokj0
###Kill the leader node
From the DigitalOcean control panel, destroy the leader node. Meanwhile, watch what happens on the remaining nodes.
###1. Initial state (before killing leader)
ID HOSTNAME MEMBERSHIP STATUS AVAILABILITY MANAGER STATUS
1msvix0hu4vz7czqcwvzi3poq * ubuntu-2gb-ams3-01 Accepted Ready Active Reachable
264qo2mcvkuagirx68iluzi88 ubuntu-2gb-ams3-02 Accepted Ready Active Reachable
4z3arwkhuwvvqiqwm9rxqg165 ubuntu-2gb-ams3-01 Accepted Ready Active Leader
en76v2a9mlf12sjwdtw3hyb58 ubuntu-2gb-ams3-01 Accepted Ready Active
###2. Status "unknown" for all nodes
Just after killing the leader, an rpc deadline error is presented; the node status then goes through the following stages:
ID HOSTNAME MEMBERSHIP STATUS AVAILABILITY MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f swarm-test-03 Accepted Unknown Active Reachable
36oy4vdvxxmnuw72pfli5dv2i swarm-test-01 Accepted Unknown Active Unreachable
37argq2z05d5719olaiqpauvh * swarm-test-02 Accepted Unknown Active Leader
bpcy1dxk7v6md85qg3m7jn8il swarm-test-04 Accepted Unknown Active
###3. Status "down" for all nodes
ID HOSTNAME MEMBERSHIP STATUS AVAILABILITY MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f swarm-test-03 Accepted Down Active Reachable
36oy4vdvxxmnuw72pfli5dv2i swarm-test-01 Accepted Down Active Unreachable
37argq2z05d5719olaiqpauvh * swarm-test-02 Accepted Down Active Leader
bpcy1dxk7v6md85qg3m7jn8il swarm-test-04 Accepted Down Active
###4. Status "down" for the manager that did not become leader
ID HOSTNAME MEMBERSHIP STATUS AVAILABILITY MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f swarm-test-03 Accepted Down Active Reachable
36oy4vdvxxmnuw72pfli5dv2i swarm-test-01 Accepted Down Active Unreachable
37argq2z05d5719olaiqpauvh * swarm-test-02 Accepted Ready Active Leader
bpcy1dxk7v6md85qg3m7jn8il swarm-test-04 Accepted Ready Active
###5. Status "ready"
ID HOSTNAME MEMBERSHIP STATUS AVAILABILITY MANAGER STATUS
0j5yrijzbkgwl3bmjzp1aag1f swarm-test-03 Accepted Ready Active Reachable
36oy4vdvxxmnuw72pfli5dv2i swarm-test-01 Accepted Down Active Unreachable
37argq2z05d5719olaiqpauvh * swarm-test-02 Accepted Ready Active Leader
bpcy1dxk7v6md85qg3m7jn8il swarm-test-04 Accepted Ready Active
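
The stages above go by quickly under `watch`, so a timestamped log may be easier to compare afterwards. Below is a minimal sketch, not something I used for the output in this report. The `node_status` helper is a hypothetical name, and it assumes the column layout shown in the tables above (an optional `*` after the ID of the current node, then hostname, membership, and status).

```shell
#!/bin/sh
# Reduce `docker node ls` output (read from stdin) to "HOSTNAME STATUS" pairs.
# Assumed column layout, taken from the tables in this report:
#   ID [*] HOSTNAME MEMBERSHIP STATUS AVAILABILITY [MANAGER STATUS]
node_status() {
    awk 'NR > 1 {
        # The current node carries a "*" after its ID, shifting columns by one.
        off = ($2 == "*") ? 1 : 0
        print $(2 + off), $(4 + off)
    }'
}

# Hypothetical usage on a manager node: one timestamped snapshot per second.
# while sleep 1; do date -u +%T; docker node ls | node_status; done >> node-status.log
```

Other Docker versions may print different columns, in which case the field offsets would need adjusting.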
However, at stage 5, all containers ended up on a single node:
swarm-test-02:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
swarm-test-03:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
swarm-test-04:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
06b59fd80536 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.1.7599uowuwx8kln46jzzlzrsej
5e751507a919 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.8.6wcxeir8kw4misv5hm6tbt1gb
9f871df4c0e6 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.9.6348tx8lisem60tge1n9h105p
34d922f5627b nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.10.18x3x83ay20moo4wfk6uqssuk
fe00d09420fe nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 5 minutes 80/tcp, 443/tcp web.15.7ajzwkx125w0ilbv09kluxsh5
1a365c02c807 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.2.9g06i0ji6oby1a681hrnr7fuv
780f69e79646 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.12.0952zl8ejgp7glfr991xmiqsu
93002b1dcfe7 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.14.62cwuiykimiz6dlmiognotmcn
c96a48711179 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.3.96zsnc2bfgd9dj71bicgds0w1
7634285e896a nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.6.bqwqso18dqrkr80jc51aumonq
3ed44fe7e021 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.11.8s2va2tvmpw187s7v8ce6h3r2
23a083a91b6b nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.13.709wueza0j9631eibdm7ildu3
6d6871c931ce nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.4.1o0ay75v1y8amya2hnhnfb8y0
2a4cb84bd3ee nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.7.7ijasltqhbalctuahsly3n2cy
18bcc8cb66a1 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 5 minutes 80/tcp, 443/tcp web.16.b7z9i8lo97fzxmtvwd06ufpxp
67fa07227b58 nginx:alpine "nginx -g 'daemon off" 5 minutes ago Up 4 minutes 80/tcp, 443/tcp web.5.6ln0iyt24qxnic2nu0rg3ttxx
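
To make the skew easier to see at a glance, the per-node container count can be pulled out of the `docker ps` output. This is only a sketch: `count_tasks` is a hypothetical helper, and it assumes the container name is the last column and starts with the service name followed by a dot, as in the listings above.

```shell
#!/bin/sh
# Count containers belonging to a service in `docker ps` output read from stdin.
# Assumes NAMES is the last column and task containers are named "<service>.<slot>.<id>".
count_tasks() {
    awk -v prefix="$1." '
        NR > 1 && index($NF, prefix) == 1 { n++ }   # skip the header, match on name prefix
        END { print n + 0 }                          # print 0 when nothing matched
    '
}

# Hypothetical usage, run on each node:
# docker ps | count_tasks web
```

Going by the listings above, this would report 16 on swarm-test-04 and 0 on swarm-test-02 and swarm-test-03.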