Part of collection: Hyper-converged Homelab with Proxmox
One of the objectives of building my Proxmox HA cluster was to store persistent Docker volume data inside CephFS folders.
There are several options to achieve this: running Docker Swarm in LXC with bind mounts, or third-party Docker volume plugins that are hard to use and often outdated.
Another option for Docker volumes was running GlusterFS, storing the disks on local NVMe storage and not using CephFS. Although appealing, it adds complexity and unnecessary resource consumption, while I already have a highly available file system (CephFS) running!
Evaluating all the available options, it became clear to me that Docker already has everything I need on board! With VirtioFS I already mount CephFS volumes in all my Docker Swarm VMs.
mnt_pve_cephfs_docker 9.1T 198G 9.0T 3% /srv/cephfs-mounts/docker
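For reference, a VirtioFS share is mounted inside the VM roughly like this; the tag mnt_pve_cephfs_docker matches the filesystem name in the df output above, while the fstab options are an assumption about my setup:

mount -t virtiofs mnt_pve_cephfs_docker /srv/cephfs-mounts/docker
# or persistent via /etc/fstab (assumed options):
mnt_pve_cephfs_docker /srv/cephfs-mounts/docker virtiofs defaults 0 0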
I just needed something to connect 'plain' Docker Volumes to the VirtioFS CephFS mounts on my systems.
Fortunately it's possible, using the Docker local volume driver with bind options, to let Docker redirect the data of a volume it creates to a CephFS folder instead of storing it on the local filesystem. Effectively it's a local volume that points to a VirtioFS mount in the VM, which in turn is connected to CephFS.
Regardless of whether Docker Swarm or standalone host mode is used, the folder for the Docker volume must be created manually on CephFS first.
mkdir /srv/cephfs-mounts/docker/<volumename>
In this setup, the container can freely move to other Swarm nodes. No matter where it 'lands', it creates a local Docker volume, or uses an already existing one, that points to the VirtioFS mount. On every host, the volume has access to the same folder /srv/cephfs-mounts/docker/<volumename>.
version: '3.8'
services:
  web:
    hostname: <hostname>
    image: nginx
    volumes:
      - <volumename>:/var/www/html
volumes:
  <volumename>:
    name: <volumename> # Control the name of the volume
    driver: local
    driver_opts:
      o: bind
      device: /srv/cephfs-mounts/docker/<volumename>
      type: none
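Deploying this stack to the Swarm is then the usual command; the stack name web is just an example:

docker stack deploy -c docker-compose.yml web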
When using just one Docker host, or for testing purposes, the following method is simpler and more straightforward. Create the volume beforehand:
docker volume create \
--driver local \
--opt type=none \
--opt o=bind \
--opt device=/srv/cephfs-mounts/docker/<volumename> \
<volumename>
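The result can be verified immediately with the standard inspect command, analogous to the output shown further below:

docker volume inspect <volumename>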
Now the volume can be used in Docker Compose via an external reference.
version: '3.8'
services:
  web:
    hostname: <hostname>
    image: nginx
    volumes:
      - <volumename>:/var/www/html
volumes:
  <volumename>:
    external: true
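On a single host this is then brought up with a plain Compose run:

docker compose up -d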
root@dswarm01:~# docker volume inspect root_web_data
[
    {
        "CreatedAt": "2023-10-05T11:21:59+02:00",
        "Driver": "local",
        "Labels": {
            "com.docker.compose.project": "root",
            "com.docker.compose.version": "2.21.0",
            "com.docker.compose.volume": "web_data"
        },
        "Mountpoint": "/var/lib/docker/volumes/root_web_data/_data",
        "Name": "root_web_data",
        "Options": {
            "device": "/srv/cephfs-mounts/docker/web_data",
            "o": "bind",
            "type": "none"
        },
        "Scope": "local"
    }
]
Inside the container, the CephFS-backed mount mnt_pve_cephfs_docker is connected at /var/www/html:
root@4e4aa5c02bc8:/# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 20G 2.6G 16G 14% /
tmpfs 64M 0 64M 0% /dev
shm 64M 0 64M 0% /dev/shm
/dev/sda1 20G 2.6G 16G 14% /etc/hosts
tmpfs 3.9G 0 3.9G 0% /proc/acpi
tmpfs 3.9G 0 3.9G 0% /sys/firmware
mnt_pve_cephfs_docker 9.1T 197G 9.0T 3% /var/www/html
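To confirm the data really lands on CephFS, write a file inside the container and read it back on the host; the container ID is taken from the prompt above, and the file name is just an example:

docker exec 4e4aa5c02bc8 sh -c 'echo hello > /var/www/html/test.txt'
cat /srv/cephfs-mounts/docker/web_data/test.txt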
I'm using the first, 'long' YAML syntax, because that way all the logic needed to run a container lives inside the docker-compose.yml files.
The Docker Swarm VMs are pinned to their Proxmox hosts and won't migrate; only the Docker containers will move.
After I create the folder for a Docker stack's container on CephFS, the YAML does the rest, without needing any intervention.
Pools
Regarding the Ceph pools: my systems have three SSD disks each.
WD-Blue NVMe's
These are local drives without Ceph pools, holding Proxmox itself and the pinned VMs like the Docker Swarm nodes. They don't need to migrate, because the Docker containers move!
Samsung 980 Pro NVMe's
These hold a single replicated Ceph pool containing my VMs and LXCs. This is set up for speed and redundancy: each Proxmox host has a full copy of all VMs and LXCs!
WD-Green SSD's
These hold various erasure-coded pools, set up to maximize the amount of usable storage. I have multiple backups of this data (SSD, Backblaze), so it's no issue if I lose it!
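As a rough sketch of how such pools can be created with the standard Ceph CLI, assuming three hosts and a k=2/m=1 erasure-code profile (the pool names, PG counts and profile values are assumptions, not my exact commands):

# Replicated pool, one copy per host (size 3 is the Ceph default)
ceph osd pool create vm-pool 128 128 replicated
ceph osd pool set vm-pool size 3
# Erasure-coded pool: k=2 data + m=1 coding chunks fits three hosts
ceph osd erasure-code-profile set ec-2-1 k=2 m=1 crush-failure-domain=host
ceph osd pool create bulk-pool 128 128 erasure ec-2-1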