This (and related gists) captures how I created my docker swarm architecture. This is intended mostly as my own notes in case I need to re-create anything later! As such, expect some typos and possibly even an error...
Each major task has its own gist; this is to help with long-term maintainability.
- Install Debian VM for each docker host
- install Docker
- Configure Docker Swarm
- Install Portainer
- Install KeepaliveD
- glusterFS disk prep, install & config
- glusterFS plugin for docker (optional)
- example stack templates:
- adguard 2 node + adguard settings sync
- cloudflare Dynamic DNS Updater
- infinitude carrier infinity thermostat control
- Mosquitto MQTT
- Nginx Proxy Manager (NPM)
- oauth2-proxy manager
- migrate portainer agent to be managed by portainer (not recommended)
- shepherd to update swarm images
- traefik
- uPoller (unifi poller)
- watchtower
- wordpress - todo
- portception (portainer deployed by portainer - do not attempt)
- auto label nodes with name of running containers
- ensure every container stays running if any of the following fail (one VM, one hypervisor, one docker service)
- remove chance of blackhole requests (aka eliminate the use of DNS round robin to address the service)
- enable the use of replicated state so any container can start on any docker swarm node, fail over between nodes, and still see the data it needs
- enable a safe, replicated shared volume across all nodes so state is accessible from every node, including databases like mariadb, which will corrupt if placed on NFS or CIFS/SMB shares across the network
- make it easy to back up with my synology (this model enabled me to easily back up using Active Backup for Business)
- all seems to be functioning nearly a year later
- I switched fully from native nginx container to NPM
- I eliminated NFS and iSCSI and moved all containers with state to running on GlusterFS, including things with databases like wordpress
- I plan to move the VMs from Hyper-V to my new proxmox cluster
- I wanted to continue to use docker, docker-compose, docker swarm & portainer due to existing skills
- I have no interest at this time in k8s (I don't use it at work and never will)
- Start simple, even if that means I do what I shouldn't (this is just a home network)
- This is small, the containers include (nginx reverse proxy, oauth2-proxy, wordpress site + database, mqtt, upoller, cloudflare ddns) so bear that in mind: this isn't designed for super throughput or scale, it's designed for some resiliency.
- I want to deploy all services (containers) with stack templates and possibly contribute back to portainer template repo
- The clustered file system must support databases on it (like mariadb)
- Debian for my docker host VMs - I seem to gel with debian and it (and other debian derivatives) seems to play nice with most containers
- I will only use package versions included in the debian distribution (bullseye stable)
- I chose glusterfs as my clustered, replicated file system
- Gluster volumes will be deployed in dispersed mode
- I mapped separate VHDs into the docker hosts, one for the OS and one for gluster - this is to prevent the risk of infinite boot loops
- My gluster service will be installed on the docker host VMs. Best practice dictates they should be separate VMs for scale, but as all VMs share the same host CPU this really gives no benefit. If this turns out to be a bad decision I will change it.
- I won't tear down my current NFS and iSCSI mapped volumes (not shown) until glusterfs has been shown to run ok and survive reboots etc
Docker containers are ephemeral and generally lose all their data when they are stopped. For most docker containers there is some level of configuration state you need to pass to the container (variables, files, folders of data). Similarly, many containers want to persist data state (databases, files etc).
On a single node docker most people map a directory or file on the host into the container as a volume or bind mount. We also see the following more advanced techniques used:
1. mounting a shared CIFS or NFS volume at boot time on the docker hosts
2. defining a CIFS volume and mapping it into the container at runtime (this avoids editing fstab on the host)
3. same as above but with NFS
4. using configs - if you have just a single, read-only config file that needs to be read, this can be defined.
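As a rough sketch of the runtime-defined CIFS/NFS volumes and the config technique from the list above, a stack file could look something like the following. The server address, share paths, credentials, service name and config name are all placeholders I made up for illustration, and the config is assumed to have been created beforehand (for example in the portainer configs UI).

```yaml
version: "3.8"

services:
  app:                                   # hypothetical service name
    image: nginx:stable
    volumes:
      - nfs_data:/usr/share/nginx/html:ro   # NFS volume mounted on demand
      - cifs_data:/data                     # CIFS volume mounted on demand
    configs:
      - source: app_conf                    # single read-only config file
        target: /etc/nginx/conf.d/default.conf

volumes:
  nfs_data:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.0.2.10,rw,nfsvers=4"     # placeholder NFS server
      device: ":/export/appdata"            # placeholder export path
  cifs_data:
    driver: local
    driver_opts:
      type: cifs
      o: "addr=192.0.2.10,username=USER,password=PASS,file_mode=0777,dir_mode=0777"
      device: "//192.0.2.10/appdata"        # placeholder share

configs:
  app_conf:
    external: true                          # assumed to already exist in the swarm
```

Because the volumes are declared in the stack, each node mounts the share only when a container using it is scheduled there, so nothing needs to go into fstab on the hosts.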
In a swarm where you want a container to run on any node you need to find a way to make the data available on all nodes in a safe effective way.
If you have a simple container that only needs environment variables to be configured, you can do that directly when you deploy the portainer template as a portainer stack. See this cloudflare dynamic dns updater as an example.
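To illustrate the environment-variable-only case, here is a minimal sketch of such a stack. The image name and the variable names are hypothetical stand-ins, not the actual updater's settings, so check the image's own documentation for the real ones.

```yaml
version: "3.8"

services:
  ddns:
    image: example/ddns-updater:latest   # hypothetical image, not a real one
    environment:
      API_TOKEN: "changeme"              # hypothetical variable names
      ZONE: "example.com"
      SUBDOMAIN: "home"
    deploy:
      replicas: 1
      restart_policy:
        condition: any                   # restart on any node if it stops
```

With no volumes or bind mounts, the swarm can schedule this container on any node without worrying about where its data lives.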
- Only #4 offers a safe way to make this happen (the 'config' is available to all nodes) - but this is super restrictive and doesn't help with containers that need to store more state and read/write that state. See this mosquitto mqtt example
- #1 - this can work, and you can mount the shares to multiple nodes via fstab. Typically databases cannot be placed on these shares and will ultimately corrupt. You do have to be careful to only have one container writing to any given file to avoid potential issues.
- #2 and #3 - this has the advantage of not being generally mounted to the host OS but mounted on demand by the container, which reduces all the tedious mucking about in ~~hyperspace~~ fstab. You do need to use the volumes UI in portainer for this.
And for most folks NFS/CIFS shares are not replicated for high availability.
This is why in this architecture I have chosen to see if I can overcome these limitations using glusterfs.
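As a sketch of what this could look like once a gluster volume is mounted at the same path on every docker host, a stateful service can simply bind mount a directory from it. The mount point `/mnt/gluster-vol1` and the password are assumptions for illustration only.

```yaml
version: "3.8"

services:
  db:
    image: mariadb:10.11
    environment:
      MARIADB_ROOT_PASSWORD: "changeme"            # placeholder, use a secret in practice
    volumes:
      # Assumed: the gluster volume is mounted at this path on every node
      - /mnt/gluster-vol1/mariadb:/var/lib/mysql
    deploy:
      replicas: 1                                  # only one writer for the database files
```

Since the path exists on all nodes, the swarm can restart the single replica on a different node and it should find the same data there.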
I have ceph up and running on the proxmox host nodes and mounted; I store VM disks on this.
I have a cephFS pool and assume this would be the ideal place to store ~50GB bind mounts for each container.
I am struggling to understand, but let me try, are you saying the Debian docker VMs should have ceph installed on them and then mount over the network?
AKA
docker VM1 uses auto mount to mount a ceph network target on host1
docker VM2 uses auto mount to mount a ceph network target on host2
etc
and then I pin VM1 to host1 (never let it migrate), VM2 to host2 (etc)
Lastly, I don't get why I would use automount rather than doing this directly via fstab on the debian docker VM?
https://docs.ceph.com/en/nautilus/cephfs/fstab/