Minio's distributed setup can be configured to use up to 16 disks. These disks may be spread across up to 16 nodes (agents, in Mesos terms). To set up distributed Minio, we need to know the (internal) IP address or hostname of every container instance before scheduling them. For example, in a distributed Minio setup with 4 disks spread across 4 nodes with IPs IP1, IP2, IP3 and IP4 respectively, the command to run in each container is:
minio server http://IP1:9000/disk http://IP2:9000/disk http://IP3:9000/disk http://IP4:9000/disk
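Every container runs the same command with the same ordered endpoint list, so the list can be built mechanically once the node addresses are known. The sketch below assumes exactly one disk per node, mounted at /disk, and Minio listening on port 9000, as in the example; the function name minio_server_command is hypothetical.

```python
# Minio distributed mode supports at most 16 disks (limit stated above).
MAX_DISKS = 16


def minio_server_command(hosts, port=9000, path="/disk"):
    """Return the `minio server` argv, assuming one disk at `path` per host."""
    if not 1 <= len(hosts) <= MAX_DISKS:
        raise ValueError(f"expected 1..{MAX_DISKS} hosts, got {len(hosts)}")
    # Every container must receive the same endpoints, in the same order.
    endpoints = [f"http://{host}:{port}{path}" for host in hosts]
    return ["minio", "server", *endpoints]


print(" ".join(minio_server_command(["IP1", "IP2", "IP3", "IP4"])))
```

The catch, and the reason for the proposal that follows, is that the hosts argument is only known after the scheduler has placed the containers.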
Marathon sets up some environment variables for each task it launches, in addition to those set by Mesos. Currently, it exposes the host ports, one for each assigned resource port, as variables named PORT0 through PORT{N-1}, where N is the number of assigned ports.
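As an illustration of that convention, a task could collect its assigned ports like this. This is a minimal sketch that relies only on the contiguous PORT0..PORT{N-1} numbering described above; the helper name assigned_ports and the port values in the usage line are hypothetical.

```python
import os


def assigned_ports(env=None):
    """Collect PORT0..PORT{N-1} from a task environment into a list of ints."""
    env = os.environ if env is None else env
    ports = []
    # The variables are numbered contiguously from 0, so stop at the first gap.
    while f"PORT{len(ports)}" in env:
        ports.append(int(env[f"PORT{len(ports)}"]))
    return ports


print(assigned_ports({"PORT0": "31000", "PORT1": "31001"}))
```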
We propose that Marathon similarly set up task environment variables HOST0 through HOST{N-1} holding the IP addresses of the agents where the containers are scheduled, where N is the number of nodes. For example, when we launch 4 containers in a distributed Minio setup, Marathon would set HOST0 through HOST3 to the IPs of the agents where the Minio server containers are scheduled.
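A sketch of the proposed variables, as Marathon might compute them once placement is decided. The function proposed_host_env and the agent addresses are hypothetical; the key property is that every task of the app receives the same mapping, in the same order, so all Minio containers end up with an identical endpoint list.

```python
def proposed_host_env(agent_ips):
    """Hypothetical HOST0..HOST{N-1} variables, one per scheduled instance."""
    return {f"HOST{i}": ip for i, ip in enumerate(agent_ips)}


# Hypothetical agent addresses for four scheduled Minio containers.
agents = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]
print(proposed_host_env(agents))
```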
The args section of Minio's marathon.json would then look like this:
"args": [
  "server",
  "http://$HOST0:9000/disk",
  "http://$HOST1:9000/disk",
  "http://$HOST2:9000/disk",
  "http://$HOST3:9000/disk"
]
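For these placeholders to work, something has to replace each $HOSTn with its value before Minio starts. Since args entries are typically not run through a shell, this sketch assumes that expansion is done either by Marathon itself (as part of the proposal) or by a wrapper inside the container; Python's string.Template stands in for that step here. The endpoints use the http:// form of the minio server command at the top of this proposal, and expand_args is a hypothetical name.

```python
from string import Template

# marathon.json args, assuming the http:// endpoint form of `minio server`.
ARGS = ["server"] + [f"http://$HOST{i}:9000/disk" for i in range(4)]


def expand_args(args, env):
    """Replace $HOSTn placeholders with values from the task environment.

    Template.substitute raises KeyError if a referenced variable is missing,
    so a task never starts with a partially filled endpoint list.
    """
    return [Template(arg).substitute(env) for arg in args]


env = {f"HOST{i}": f"IP{i + 1}" for i in range(4)}
print(expand_args(ARGS, env))
```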
Kubernetes provides a construct called StatefulSets that gives each requested pod a predetermined, unique identity. While the pods are still free to be scheduled wherever the Kubernetes scheduler deems suitable, the application knows in advance, and deterministically, the hostnames of the pods it runs.
For example, in Minio's case the hostnames follow the pattern defined in the StatefulSet documentation: <statefulset-name>-<ordinal>.<service-name>.<namespace>.svc.cluster.local.
Here is how we pass the Minio server arguments in the Kubernetes YAML file:
args:
- server
- http://minio-0.minio.default.svc.cluster.local/data
- http://minio-1.minio.default.svc.cluster.local/data
- http://minio-2.minio.default.svc.cluster.local/data
- http://minio-3.minio.default.svc.cluster.local/data
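Because the names are fixed by the StatefulSet, the argument list can be generated from just the StatefulSet name, the governing service, the namespace and the replica count. A minimal sketch; the function name is hypothetical, and cluster.local is assumed to be the cluster domain (the Kubernetes default).

```python
def statefulset_endpoints(name, service, namespace, replicas,
                          path="/data", domain="cluster.local"):
    """Stable per-pod URLs: <name>-<ordinal>.<service>.<namespace>.svc.<domain><path>."""
    return [
        f"http://{name}-{i}.{service}.{namespace}.svc.{domain}{path}"
        for i in range(replicas)
    ]


for url in statefulset_endpoints("minio", "minio", "default", 4):
    print(url)
```

This reproduces the four endpoints listed in the YAML above, and unlike the Marathon case it needs no information from the scheduler.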
Docker Swarm lets you use service or container names directly as hostnames. Since we choose each service's name when we create it, every Minio endpoint is known before any container starts. This makes it easy to deploy distributed Minio on Docker Swarm.