Kane's Christmas adventures in Docker Swarm storage with CEPH.

This document goes into depth on my experiments with deploying a Docker Swarm with a CEPH block storage cluster as a backend for volumes and persistent data. We'll be using what appears to be the latest stable release of CEPH, v13.2.2 "Mimic". This was a fairly mind-numbing procedure to learn and perfect; CEPH's documentation is quite complicated (maybe I'm just dumb).

Prerequisites

Here's what I deployed in the morning before setting up the cluster;

  • x5 Debian 9.6 VMs each with 1 vCPU and 1024MB of RAM
  • each VM is named sequentially, cpu-01 through to cpu-05
  • preconfigure your ssh config and Salt roster files if you need to
  • a spare 10GB block device attached to each VM as /dev/vdb
  • Docker Swarm deployed with 3 managers and 2 workers (the join commands I used are sketched after this list)
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
ixezgkrbnhbrrkwyotydfuzpc *   cpu-01              Ready               Active              Reachable           18.09.0
hvzdgrzvhe7x8bbk8aaxywwia     cpu-02              Ready               Active                                  18.09.0
wehvcsct2ixoqbhlsld7n4ey1     cpu-03              Ready               Active              Leader              18.09.0
i9ov3qg0fgioao9v5ydfj6swu     cpu-04              Ready               Active                                  18.09.0
r3fmz5ty1rhdcsy3yp3tb9boe     cpu-05              Ready               Active              Reachable           18.09.0
  • ceph-deploy installed locally on my laptop, instructions here
  • passwordless sudo and SSH access via keys predeployed to the VMs
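
For completeness, this is roughly how the swarm itself was formed. The 10.0.1.x addresses are assumptions based on the swarmnet range described below, and the join token is a placeholder you would copy from the join-token output;

# on the first manager, advertise on its swarmnet address (assumed here to be 10.0.1.11)
ssh cpu-01 docker swarm init --advertise-addr 10.0.1.11

# print the join commands for each role
ssh cpu-01 docker swarm join-token manager
ssh cpu-01 docker swarm join-token worker

# then run the appropriate join command on each remaining node, e.g.
ssh cpu-02 docker swarm join --token <token-from-above> 10.0.1.11:2377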

Each of these VMs also had three VLAN devices attached: a swarmnet (range 10.0.1.0/24), a cephnet-public (range 192.168.100.0/24) and a cephnet-private (range 192.168.200.0/24). These are supposed to represent physical NIC devices in your dedicated servers. We use dedicated networks/devices to prevent one service from saturating the overall connectivity of the server. You can see more about CEPH network configuration here. Information regarding Docker Swarm networking can be found here.
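
For reference, the two CEPH networks end up baked into the generated ceph.conf once we run ceph-deploy new in the next section; the relevant lines look roughly like this (a sketch, with the fsid and mon entries omitted);

[global]
public network = 192.168.100.0/24
cluster network = 192.168.200.0/24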

CEPH Deployment

According to the 'Create a Cluster' document, we first need to do this;

ceph-deploy new --public-network=192.168.100.0/24 \
                --cluster-network=192.168.200.0/24 \
                cpu-0{1..5}

This gives us output that looks like the following;

[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy new cpu-01 cpu-02 cpu-03 cpu-04 cpu-05
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][DEBUG ] Monitor initial members are ['cpu-01', 'cpu-02', 'cpu-03', 'cpu-04', 'cpu-05']
[ceph_deploy.new][DEBUG ] Monitor addrs are ['10.0.0.11', '10.0.0.12', '10.0.0.13', '10.0.0.14', '10.0.0.15']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...

This seems to just prepare the local ceph.conf file with the details of your initial nodes. Next, we need to run;

ceph-deploy install cpu-0{1..5}
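
If you would rather not rely on ceph-deploy's default release choice, it should also be possible to pin the release explicitly (via the --release flag in ceph-deploy 2.x);

ceph-deploy install --release mimic cpu-0{1..5}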

That command takes a while to run, depending on the number of nodes in your cluster, so it's probably a good time to get a snack. Now it's time to "deploy the initial monitors" to our cluster, so go ahead and do that;

ceph-deploy mon create-initial

Next, we need to deploy the administration keys to each of the nodes in the cluster using the command below;

ceph-deploy admin cpu-0{1..5} 
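
With the admin keyring distributed, a quick sanity check from any node should confirm that all five monitors are up and in quorum;

ssh cpu-01 sudo ceph -s
ssh cpu-01 sudo ceph quorum_status --format json-pretty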

Now we can make three of our five nodes into 'managers' (ceph-mgr daemons). These aren't quite the same as Docker Swarm managers; the quorum is actually maintained by the monitors we deployed earlier, while the manager daemons provide monitoring and management services, with one active instance and the others on standby. Deploying a few of them for redundancy is still recommended;

ceph-deploy mgr create cpu-0{1,3,5}

Hopefully your VMs each have an unused block device attached as /dev/vdb; if not, attach one now! We're going to create our initial OSDs (object storage devices) from them;

ceph-deploy osd create --data /dev/vdb cpu-01
ceph-deploy osd create --data /dev/vdb cpu-02
ceph-deploy osd create --data /dev/vdb cpu-03
ceph-deploy osd create --data /dev/vdb cpu-04
ceph-deploy osd create --data /dev/vdb cpu-05
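
Before creating any pools it's worth confirming that all five OSDs came up and are reported as 'up' and 'in';

ssh cpu-01 sudo ceph osd tree
ssh cpu-01 sudo ceph osd stat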

Once we have our OSDs provisioned we can create an RBD 'pool' for storing actual data; refer to this page for more in-depth information about all the different configuration options. I had to run these commands on one of my CEPH manager nodes;

ceph tell mon.\* injectargs '--osd_pool_default_size=3'
ceph osd pool create rbd 512
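
On Mimic the cluster may also complain that the new pool has no application enabled; if you see that health warning, tagging the pool for RBD use (either command, run on the same node, should do it) clears it;

ceph osd pool application enable rbd rbd
rbd pool init rbd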

This means we have 5 OSDs and a pool named 'rbd'. Let's install the rexray/rbd storage plugin for Docker. I used salt-ssh to do this for me because I'm lazy; you should too;

salt-ssh "cpu-0?" cmd.run 'docker plugin install --grant-all-permissions rexray/rbd RBD_DEFAULTPOOL=rbd'

Let's create some Docker volumes using our new driver. Note that the --opt=size directive is measured in gigabytes; I mistakenly set --opt=size=1024 for my first volume and attempted to create a 1TB volume! The quotes below make sure both commands run on cpu-03 rather than the second one running locally.

ssh cpu-03 'docker volume create --driver rexray/rbd --opt=size=1 --name rbd.testvol1 \
            && docker volume create --driver rexray/rbd --opt=size=2 --name rbd.testvol2'

More information regarding the creation of rexray/rbd volumes can be found here. Extended and highly useful information regarding creation of volumes in other CEPH pools can be found here.
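
As a quick illustration of the pool-prefix naming used above (rbd.testvol1): as I understand the rexray/rbd driver, prefixing the volume name with a pool name targets that pool, so assuming a second pool called fastpool already exists, something like this ought to work;

# fastpool is hypothetical here, created beforehand with e.g. 'ceph osd pool create fastpool 128' and 'rbd pool init fastpool'
ssh cpu-03 docker volume create --driver rexray/rbd --opt=size=1 --name fastpool.testvol3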

A docker volume ls from any node gives us an output like this;

DRIVER              VOLUME NAME
rexray/rbd:latest   testvol1
rexray/rbd:latest   testvol2

We can also probe rbd itself for info about these new volumes;

kane@cyberia:~|⇒  ssh cpu-05 sudo rbd ls           
testvol1
testvol2
kane@cyberia:~|⇒  ssh cpu-05 sudo rbd info testvol2
rbd image 'testvol2':
	size 2GiB in 512 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.a496b74b0dc51
	format: 2
	features: layering
	flags: 
	create_timestamp: Tue Dec 25 12:21:02 2018

Let's try to use one of these volumes to deploy a PostgreSQL service. Note: I had to run modprobe rbd on each of my servers before they were able to make use of the CEPH volumes (see the snippet after the service definition for making that persistent);

docker service create -e POSTGRES_USER=docker \
                      -e POSTGRES_PASSWORD=password \
                      --mount type=volume,source=testvol2,destination=/var/lib/postgresql/data \
                      postgres:11.1-alpine
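
Since the rbd kernel module won't load itself again after a reboot, something like this keeps it around (assuming salt-ssh runs the commands as root, as it did for the plugin install, and that the standard Debian 9 modules-load.d path applies);

salt-ssh "cpu-0?" cmd.run 'modprobe rbd'
salt-ssh "cpu-0?" cmd.run 'sh -c "echo rbd > /etc/modules-load.d/rbd.conf"'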

Useful Links

These are resources I discovered during my Christmas adventures; they detail various caveats, issues and bugs encountered when doing this deployment. You'll probably need these too.
