this is a short guide for testing concurrent HA cluster join with kinder. currently it requires patching both kubeadm and kinder.
you need:
- go 1.12+
- docker 18.06 or 18.09 (known to work)
- clones of kubernetes/kubeadm and kubernetes/kubernetes
- a host machine with a good chunk of RAM and 2 CPU cores
PR to add retry for the kubeadm etcd client's MemberAdd: neolit123/kubernetes#2
this patch fixes errors when the etcd cluster attempts to grow concurrently. you can choose not to apply it yet if you wish to see the actual errors.
cd kubernetes
# apply patch
wget https://github.com/neolit123/kubernetes/pull/2.diff
git apply 2.diff
PR to enable kinder do with --parallel: neolit123/kubeadm#1
this allows concurrent operations like kubeadm join on multiple nodes at once.
cd kubeadm
# apply patch
wget https://github.com/neolit123/kubeadm/pull/1.diff
git apply 1.diff
to build kinder:
# build kinder
cd kubeadm/kinder
GO111MODULE=on go build
# resulting binary is `kinder`
# symlink or add to PATH
each time you make kubernetes changes (e.g. patching kubeadm) you need to build a new kind(er) node-image:
cd kubernetes
kinder build node-image --image kinder/node:latest
to create a kinder cluster (node provisioning):
kinder create cluster --image kinder/node:latest --control-plane-nodes 3 --worker-nodes 3
# adjust the number of CP and W nodes if needed
note that if you start getting api-server pod crashes the host might be out of RAM. 16GB works for a 3 CP, 3 W setup in my testing, but i'm getting errors if i try 4 CPs. it's probably a good idea to monitor memory usage.
to init the primary control plane and join the rest of the nodes:
# init primary CP
kinder do kubeadm-init
# set the KUBECONFIG to the kinder cluster
# technically you can do this once as long as the cluster re-uses the same name (by default "kind")
export KUBECONFIG=$(kinder get kubeconfig-path)
# join the worker and CP nodes.
kinder do kubeadm-join --parallel
in my testing i tried firing the join process right after init (without waiting) to stress test the components. it holds up pretty well.
note that --parallel dumps the output of all the joining nodes at once, so their stdout will be mixed together.
in a separate tab i have kubectl get po --all-namespaces running to watch for failing pods.
without the kubeadm patch a CP node join can flake with:
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: etcdserver: unhealthy cluster
with or without the kubeadm patch a worker node can flake with:
error execution phase kubelet-start: error uploading crisocket: error patching node "kind-worker3" through apiserver: etcdserver: request timed out
to destroy the cluster:
kinder delete cluster
you can also try to reset the nodes instead of re-creating the cluster:
kinder do kubeadm-reset
more docs on kinder: https://github.com/kubernetes/kubeadm/blob/master/kinder/doc/test-HA.md
there is a separate issue (the one we call "the configmap/load-balancer" issue) where kind serial CP node join can fail. kubernetes-sigs/kind#588
a couple of differences between kind and kinder:
- kind uses nginx as the LB and also containerd as the CR on the nodes.
- kinder uses haproxy as the LB and docker on the nodes.
write this config to a file called config.yaml:
# a cluster with 3 control-planes and 3 workers
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
the same system specs as in the kinder testing scenario above apply.
clone kubernetes-sigs/kind and kubernetes/kubernetes.
build kind and a node image:
cd kind
GO111MODULE=on go build
# install the kind binary to PATH or symlink
cd kubernetes
kind build node-image --kube-root=$(pwd)
create a cluster:
kind create cluster --config=<path-to-config.yaml> --image kindest/node:latest
this can flake with the following error:
I0604 19:15:10.310249 760 round_trippers.go:438] GET https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 401 Unauthorized in 1233 milliseconds
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized
to delete the cluster:
kind delete cluster