@KostyaEsmukov
Created June 14, 2019 23:42
Lab: Docker Swarm and Network Partitions

  • Questions:
    • How does Docker Swarm behave in the different parts of a cluster during a network partition? Will it recover by itself once the partition is gone?
    • What happens to a single-instance service when the node running it gets partitioned off?
  • Requirements: Vagrant, VirtualBox
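
Before starting, it may help to confirm that the prerequisites are in place. A quick sanity check (the exact versions do not matter for this lab):

$ vagrant --version
$ VBoxManage --version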

Set up the lab and bootstrap the Docker Swarm cluster

  • Put the attached Vagrantfile (listed at the end of this document) into the current directory
  • $ vagrant plugin install vagrant-vbguest
  • $ vagrant up
  • Wait until Vagrant finishes provisioning all three nodes

Ensure that the cluster has been bootstrapped:

$ vagrant ssh node1 -- docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
mklyqd44iq77mw8mn9im2gn07 *   node1               Ready               Active              Leader              18.09.6
l7kcf6g169f46w7wgja32sxq2     node2               Ready               Active              Reachable           18.09.6
669oibd49tieprs71pa4q8dqb     node3               Ready               Active              Reachable           18.09.6
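
Another quick way to confirm the local swarm state on a manager is docker info's Go-template fields (a sketch; with this three-manager setup the counts should come out roughly like active/3/3):

$ vagrant ssh node1 -- docker info --format '{{.Swarm.LocalNodeState}}/{{.Swarm.Managers}}/{{.Swarm.Nodes}}'
active/3/3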

Partition out a single node

Add a simple service:

$ vagrant ssh node2 -- docker service create --replicas 1 --name helloworld alpine ping docker.com

Ensure it is running:

$ vagrant ssh node1 -- docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
iq8vqotfaaa6        helloworld          replicated          1/1                 alpine:latest

$ vagrant ssh node1 -- docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
y5d8388s98ta        helloworld.1        alpine:latest       node3               Running             Running 4 minutes ago

$ vagrant ssh node3 -- docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
81006e80b77e        alpine:latest       "ping docker.com"   5 minutes ago       Up 5 minutes                            helloworld.1.y5d8388s98tao0meljw43ac6u
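
Since the next step needs to know which node the single replica landed on, it can also be read directly from the task list (on a different run the replica may land on another node):

$ vagrant ssh node1 -- docker service ps helloworld --format '{{.Node}}'
node3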

Partition out node3, which is currently running the container:

$ vagrant ssh node3 -- sudo iptables -I INPUT -i eth1 -j DROP
$ vagrant ssh node3 -- sudo iptables -I OUTPUT -o eth1 -j DROP
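
To double-check that the partition is effective, pinging node3's private address (192.168.98.13 in the attached Vagrantfile) from node1 should now fail:

$ vagrant ssh node1 -- ping -c 2 -W 2 192.168.98.13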

Inspect the service:

$ vagrant ssh node1 -- docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
lly2p3ce15o5        helloworld.1        alpine:latest       node1               Running             Running 29 seconds ago
y5d8388s98ta         \_ helloworld.1    alpine:latest       node3               Shutdown            Running 8 minutes ago

$ vagrant ssh node1 -- docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
f1cab158d24e        alpine:latest       "ping docker.com"   39 seconds ago      Up 37 seconds                           helloworld.1.lly2p3ce15o5el109pz75hjzf

$ vagrant ssh node3 -- docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
81006e80b77e        alpine:latest       "ping docker.com"   8 minutes ago       Up 8 minutes                            helloworld.1.y5d8388s98tao0meljw43ac6u

So the container has been restarted in the healthy part of the cluster, but now there are 2 copies running at once. Not what I was expecting, but OK. Apparently Swarm cannot behave otherwise: the isolated node cannot be told to shut its task down while it is unreachable, see moby/swarmkit#1743.
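
One way to see the duplication explicitly is to list the running containers of the service on every node; Swarm labels the containers it starts, so a rough sketch (assuming the com.docker.swarm.service.name label is applied by this Engine version) could be:

$ for n in node1 node2 node3; do echo "== $n"; vagrant ssh $n -- docker ps --filter label=com.docker.swarm.service.name=helloworld --format '{{.Names}}'; done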

Check the node status:

$ vagrant ssh node1 -- docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
mklyqd44iq77mw8mn9im2gn07 *   node1               Ready               Active              Leader              18.09.6
l7kcf6g169f46w7wgja32sxq2     node2               Ready               Active              Reachable           18.09.6
669oibd49tieprs71pa4q8dqb     node3               Down                Active              Unreachable         18.09.6

$ vagrant ssh node3 -- docker node ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
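
This is the expected quorum behaviour: with 3 managers a leader needs a majority of 2, and the isolated node3 only sees itself. Its local Engine still considers itself a swarm member, which can be checked with (a sketch; the exact output may differ):

$ vagrant ssh node3 -- docker info --format '{{.Swarm.LocalNodeState}}'
active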

Partition out another node

Let's partition out node2 as well:

$ vagrant ssh node2 -- sudo iptables -I INPUT -i eth1 -j DROP
$ vagrant ssh node2 -- sudo iptables -I OUTPUT -o eth1 -j DROP

Check the status:

$ vagrant ssh node1 -- docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
mklyqd44iq77mw8mn9im2gn07 *   node1               Ready               Active              Leader              18.09.6
l7kcf6g169f46w7wgja32sxq2     node2               Ready               Active              Unreachable         18.09.6
669oibd49tieprs71pa4q8dqb     node3               Down                Active              Unreachable         18.09.6

$ vagrant ssh node2 -- docker node ls
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Shortly after, node1 steps down as leader too: two of the three managers are now unreachable, so there is no quorum:

$ vagrant ssh node1 -- docker node ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.

Check the containers:

$ vagrant ssh node1 -- docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
f1cab158d24e        alpine:latest       "ping docker.com"   14 minutes ago      Up 14 minutes                           helloworld.1.lly2p3ce15o5el109pz75hjzf

$ vagrant ssh node2 -- docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

$ vagrant ssh node3 -- docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
81006e80b77e        alpine:latest       "ping docker.com"   21 minutes ago      Up 21 minutes                           helloworld.1.y5d8388s98tao0meljw43ac6u
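
So even though the manager had already marked this task for Shutdown, the isolated node3 never received that instruction and keeps the container running. This can be confirmed locally (81006e80b77e is the container ID from the listing above):

$ vagrant ssh node3 -- docker inspect -f '{{.State.Status}}' 81006e80b77e
running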

Restore the connectivity:

$ vagrant ssh node2 -- sudo iptables -D INPUT -i eth1 -j DROP
$ vagrant ssh node2 -- sudo iptables -D OUTPUT -o eth1 -j DROP
$ vagrant ssh node3 -- sudo iptables -D INPUT -i eth1 -j DROP
$ vagrant ssh node3 -- sudo iptables -D OUTPUT -o eth1 -j DROP
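
Optionally verify that the private network is reachable again before checking the cluster state:

$ vagrant ssh node1 -- ping -c 2 192.168.98.12
$ vagrant ssh node1 -- ping -c 2 192.168.98.13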

Check the status:

$ vagrant ssh node1 -- docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
mklyqd44iq77mw8mn9im2gn07 *   node1               Ready               Active              Reachable           18.09.6
l7kcf6g169f46w7wgja32sxq2     node2               Ready               Active              Unreachable         18.09.6
669oibd49tieprs71pa4q8dqb     node3               Ready               Active              Reachable           18.09.6

$ vagrant ssh node1 -- docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
lly2p3ce15o5        helloworld.1        alpine:latest       node1               Running             Running 27 minutes ago
y5d8388s98ta         \_ helloworld.1    alpine:latest       node3               Shutdown            Shutdown 8 seconds ago

A few moments later a new leader has been elected (node2 in this case):

$ vagrant ssh node1 -- docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
mklyqd44iq77mw8mn9im2gn07 *   node1               Ready               Active              Reachable           18.09.6
l7kcf6g169f46w7wgja32sxq2     node2               Ready               Active              Leader              18.09.6
669oibd49tieprs71pa4q8dqb     node3               Ready               Active              Reachable           18.09.6

$ vagrant ssh node1 -- docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
f1cab158d24e        alpine:latest       "ping docker.com"   28 minutes ago      Up 28 minutes                           helloworld.1.lly2p3ce15o5el109pz75hjzf

Now that node3 is reachable again, the manager has shut down the duplicate container on it (note the Exited status):

$ vagrant ssh node3 -- docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                            PORTS               NAMES
81006e80b77e        alpine:latest       "ping docker.com"   35 minutes ago      Exited (137) About a minute ago                       helloworld.1.y5d8388s98tao0meljw43ac6u

$ vagrant ssh node1 -- docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE                 ERROR               PORTS
lly2p3ce15o5        helloworld.1        alpine:latest       node1               Running             Running 28 minutes ago
y5d8388s98ta         \_ helloworld.1    alpine:latest       node3               Shutdown            Shutdown about a minute ago
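
When done experimenting, the service and the lab VMs can be torn down; the stopped duplicate container left on node3 can be removed manually as well:

$ vagrant ssh node1 -- docker service rm helloworld
$ vagrant ssh node3 -- docker rm 81006e80b77e
$ vagrant destroy -f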

Conclusion

Answers to the questions:

  • How does Docker Swarm behave in the different parts of a cluster during a network partition? Will it recover by itself once the partition is gone?
    • The partitioned-out nodes lose the manager quorum and stop answering cluster-level queries (docker node ls returns "the swarm does not have a leader" or deadline-exceeded errors), while the healthy majority eventually marks them as Down/Unreachable
    • Once connectivity is restored, the cluster does recover by itself
  • What happens to a single-instance service when the node running it gets partitioned off?
    • Another copy of the container is started in the healthy part of the cluster, which effectively means 2 copies of the single-instance service running at the same time; the original copy is only shut down once the partitioned node rejoins. Apparently this is not configurable at the moment.

Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.box = "debian/stretch64"
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "512"
  end
  config.vm.synced_folder ".", "/vagrant",
    type: "virtualbox",
    automount: true

  config.vm.define "node1" do |m|
    m.vm.hostname = "node1"
    m.vm.network "private_network", ip: "192.168.98.11"
    m.vm.provision "docker" do |d|
      d.post_install_provision "shell",
        inline: "docker swarm init --advertise-addr 192.168.98.11 \
          && docker swarm join-token -q manager > /vagrant/manager_token"
    end
  end

  config.vm.define "node2" do |m|
    m.vm.hostname = "node2"
    m.vm.network "private_network", ip: "192.168.98.12"
    m.vm.provision "docker" do |d|
      d.post_install_provision "shell",
        inline: "docker swarm join \
          --token `cat /vagrant/manager_token` \
          192.168.98.11:2377"
    end
  end

  config.vm.define "node3" do |m|
    m.vm.hostname = "node3"
    m.vm.network "private_network", ip: "192.168.98.13"
    m.vm.provision "docker" do |d|
      d.post_install_provision "shell",
        inline: "docker swarm join \
          --token `cat /vagrant/manager_token` \
          192.168.98.12:2377"
    end
  end
end
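
For reference, the Vagrantfile shares a single manager join token through the synced /vagrant folder; if the token file gets lost, it can be regenerated on any reachable manager:

$ vagrant ssh node1 -- docker swarm join-token -q manager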