This document goes into depth on my experiments with deploying a Docker Swarm that uses a CEPH block storage cluster as the backend for volumes and persistent data. We'll be using what appears to be the latest stable release of CEPH, v13.2.2 "Mimic". This was a fairly mind-numbing procedure to learn and perfect; CEPH's documentation is quite complicated (maybe I'm just dumb).
Here's what I deployed in the morning before setting up the cluster;
- x5 Debian 9.6 VMs, each with 1 vCPU and 1024MB of RAM
- each VM named sequentially, cpu-01 through to cpu-05
- preconfigured SSH config and Salt roster files, if you need them
- a spare 10GB block device, /dev/vdb, attached to each VM
- Docker Swarm deployed with 3 managers and 2 workers
ID                          HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS  ENGINE VERSION
ixezgkrbnhbrrkwyotydfuzpc * cpu-01    Ready   Active        Reachable       18.09.0
hvzdgrzvhe7x8bbk8aaxywwia   cpu-02    Ready   Active                        18.09.0
wehvcsct2ixoqbhlsld7n4ey1   cpu-03    Ready   Active        Leader          18.09.0
i9ov3qg0fgioao9v5ydfj6swu   cpu-04    Ready   Active                        18.09.0
r3fmz5ty1rhdcsy3yp3tb9boe   cpu-05    Ready   Active        Reachable       18.09.0
- ceph-deploy installed locally on my laptop, instructions here
- passwordless sudo and SSH access via keys predeployed to the VMs
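If you haven't preconfigured SSH yet, a minimal ~/.ssh/config sketch for these hostnames might look like the following (the username matches my shell prompt later in this document; the key path is an assumption, adjust for your environment):

```
# ~/.ssh/config — hypothetical; adjust User and IdentityFile to your setup
Host cpu-0?
    User kane
    IdentityFile ~/.ssh/id_ed25519
```

The `cpu-0?` pattern matches cpu-01 through cpu-05, so both ssh and ceph-deploy can reach every node without per-host entries.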
Each of these VMs also had three VLAN devices attached: a swarmnet (range 10.0.1.0/24), a cephnet-public (range 192.168.100.0/24) and a cephnet-private (range 192.168.200.0/24). These are supposed to represent physical NIC devices in your dedicated servers. We use dedicated networks/devices to prevent one service saturating the overall connectivity of the server. You can see more about CEPH network configuration here. Information regarding Docker Swarm networking can be found here.
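For reference, the network split above ends up expressed in the generated ceph.conf as something like the following (a sketch — your fsid, monitor list and exact key spelling will differ, and CEPH accepts both spaces and underscores in option names):

```
# ceph.conf (excerpt) — public network carries client/monitor traffic,
# cluster network carries OSD replication/heartbeat traffic
[global]
public_network = 192.168.100.0/24
cluster_network = 192.168.200.0/24
```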
According to the 'Create a Cluster' document, we first need to do this;
ceph-deploy new --public-network=192.168.100.0/24 \
--cluster-network=192.168.200.0/24 \
cpu-0{1..5}
Which gives us an output that looks like this for each server;
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy new cpu-01 cpu-02 cpu-03 cpu-04 cpu-05
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][DEBUG ] Monitor initial members are ['cpu-01', 'cpu-02', 'cpu-03', 'cpu-04', 'cpu-05']
[ceph_deploy.new][DEBUG ] Monitor addrs are ['10.0.0.11', '10.0.0.12', '10.0.0.13', '10.0.0.14', '10.0.0.15']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
This seems to just prepare the local ceph.conf file with the details of your initial nodes. Next, we need to run;
ceph-deploy install cpu-0{1..5}
That command takes a while to run depending on the number of nodes in your cluster, so it's probably a good time to get a snack. Now it's time to "deploy the initial monitors" to our cluster, so go ahead and do that;
ceph-deploy mon create-initial
Next, we need to deploy the administration keys to each of the nodes in the cluster using the command below;
ceph-deploy admin cpu-0{1..5}
Now we can make three of our five nodes into 'managers'. CEPH manager daemons run alongside the monitors, with one active and the rest on standby; it's actually the monitors that maintain a quorum amongst themselves (much like Docker Swarm managers do), which is why an odd number of monitors is recommended;
ceph-deploy mgr create cpu-0{1,3,5}
Hopefully, your VMs each have an unused block device attached as /dev/vdb; if not, do that now! We're going to create our initial OSDs (object storage daemons) from them;
ceph-deploy osd create --data /dev/vdb cpu-01
ceph-deploy osd create --data /dev/vdb cpu-02
ceph-deploy osd create --data /dev/vdb cpu-03
ceph-deploy osd create --data /dev/vdb cpu-04
ceph-deploy osd create --data /dev/vdb cpu-05
Once we have our OSDs provisioned, we can create an RBD 'pool' for storing actual data; refer to this page for more in-depth information about all the different configuration options. I had to run these commands on one of my CEPH manager nodes;
ceph tell mon.\* injectargs '--osd_pool_default_size=3'
ceph osd pool create rbd 512
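The 512 placement group count above is simply what I picked; a common rule of thumb is (number of OSDs × 100) / replica count, rounded up to the nearest power of two. A quick shell sketch of that calculation for this cluster:

```shell
# Rule-of-thumb PG count: (OSDs * 100) / replicas, rounded up to a power of two.
osds=5
replicas=3
target=$(( osds * 100 / replicas ))   # 166 for this cluster
pgs=1
while [ "$pgs" -lt "$target" ]; do
  pgs=$(( pgs * 2 ))
done
echo "$pgs"   # 256
```

By that rule 256 would also have been a reasonable choice here; going too high can trip CEPH's too-many-PGs-per-OSD health warnings, and the PG count of an existing pool can't easily be reduced.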
This means we have 5 OSDs and a pool named 'rbd'. Let's install the rexray/rbd storage plugin for Docker. I used salt-ssh to do this for me because I'm lazy; you should too;
salt-ssh "cpu-0?" cmd.run 'docker plugin install --grant-all-permissions rexray/rbd RBD_DEFAULTPOOL=rbd'
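For completeness, a minimal Salt roster sketch that makes the `cpu-0?` target above work (the user matches my shell prompt elsewhere in this document; `sudo: True` is needed because the Docker commands run privileged — extend the same pattern for cpu-03 through cpu-05):

```
# /etc/salt/roster — hypothetical entries; adjust user to your setup
cpu-01:
  host: cpu-01
  user: kane
  sudo: True
cpu-02:
  host: cpu-02
  user: kane
  sudo: True
```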
Let's create some Docker volumes using our new driver. Note that the --opt=size option is measured in gigabytes; I mistakenly set --opt=size=1024 on my first attempt and tried to create a volume with a size of 1TB!
ssh cpu-03 'docker volume create --driver rexray/rbd --opt=size=1 --name rbd.testvol1 \
&& docker volume create --driver rexray/rbd --opt=size=2 --name rbd.testvol2'
More information regarding the creation of rexray/rbd volumes can be found here. Extended and highly useful information regarding creation of volumes in other CEPH pools can be found here. Note that the rbd. prefix in the volume names above selects the CEPH pool to create the image in, which is why the listings below show the volumes as plain testvol1 and testvol2.
A docker volume ls from any node gives us an output like this;
DRIVER VOLUME NAME
rexray/rbd:latest testvol1
rexray/rbd:latest testvol2
We can also probe rbd itself for info about these new volumes;
kane@cyberia:~|⇒ ssh cpu-05 sudo rbd ls
testvol1
testvol2
kane@cyberia:~|⇒ ssh cpu-05 sudo rbd info testvol2
rbd image 'testvol2':
size 2GiB in 512 objects
order 22 (4MiB objects)
block_name_prefix: rbd_data.a496b74b0dc51
format: 2
features: layering
flags:
create_timestamp: Tue Dec 25 12:21:02 2018
Let's try and use one of these volumes to deploy a PostgreSQL service. Note: I had to run modprobe rbd on each of my servers before they were able to make use of the CEPH volumes;
docker service create -e POSTGRES_USER=docker \
-e POSTGRES_PASSWORD=password \
--mount type=volume,source=testvol2,destination=/var/lib/postgresql/data \
postgres:11.1-alpine
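The same service can also be expressed as a Swarm stack file, which is easier to keep in version control. A sketch, assuming the testvol2 volume already exists as created earlier (declared external so Docker reuses it rather than creating a new one):

```yaml
# stack.yml — hypothetical; deploy with: docker stack deploy -c stack.yml pg
version: "3.7"
services:
  db:
    image: postgres:11.1-alpine
    environment:
      POSTGRES_USER: docker
      POSTGRES_PASSWORD: password
    volumes:
      - testvol2:/var/lib/postgresql/data

volumes:
  testvol2:
    external: true
```

To make the modprobe rbd step survive reboots, you can also list rbd in a file under /etc/modules-load.d/ on each node.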
These are resources I discovered during my Christmas adventures; they detail various caveats, issues and bugs you may hit when doing this deployment. You'll probably need these too.