@neolit123
Last active July 5, 2020 21:13
testing-ha-kinder

testing concurrent join failures with kinder

this is a short guide for testing concurrent HA cluster join with kinder. currently it requires patching both kubeadm and kinder.

you need:

  • go 1.12+
  • docker 18.06 or 18.09 (known to work)
  • clones of kubernetes/kubeadm and kubernetes/kubernetes
  • a host machine with a good chunk of RAM and 2 CPU cores
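a quick way to sanity-check the prerequisites on a Linux host (the version hints in the comments mirror the list above):

```shell
# quick prerequisite check (assumes a Linux host)
go version || echo "go not found: install go 1.12+"
docker --version || echo "docker not found: 18.06/18.09 are known to work"
echo "cpus: $(nproc)"   # want at least 2
```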

PR to add retries to the kubeadm etcd client's MemberAdd call: neolit123/kubernetes#2

this patch fixes errors when the etcd cluster attempts to grow concurrently. you can skip applying it for now if you wish to see the actual errors.

cd kubernetes
# apply patch
wget https://github.com/neolit123/kubernetes/pull/2.diff
git apply 2.diff
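before applying a patch, `git apply --stat` and `git apply --check` let you preview what it touches; the snippet below demonstrates this on a throwaway repo (the repo, file, and diff names are made up for illustration):

```shell
# demonstrate previewing a patch on a scratch repo
repo=$(mktemp -d)
cd "$repo" && git init -q .
printf 'a\n' > demo.txt
git add demo.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm init
printf 'b\n' >> demo.txt
git diff > demo.diff               # produce a patch like the PR .diff files
git checkout -q -- demo.txt        # restore, so the patch is unapplied
git apply --stat demo.diff         # show which files/lines it touches
git apply --check demo.diff        # dry run: errors out if it would not apply
git apply demo.diff                # actually apply
```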

PR to add a --parallel flag to `kinder do`: neolit123/kubeadm#1

this allows concurrent operations like kubeadm join on multiple nodes at once.

cd kubeadm
# apply patch
wget https://github.com/neolit123/kubeadm/pull/1.diff
git apply 1.diff

to build kinder:

# build kinder (it lives in the kinder/ subdirectory of the kubeadm repo)
cd kubeadm/kinder
GO111MODULE=on go build
# resulting binary is `kinder`
# symlink or add to PATH
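one way to do the symlink step (using $HOME/bin is an assumption; any directory on your PATH works):

```shell
# put the freshly built binary on PATH
mkdir -p "$HOME/bin"
ln -sf "$(pwd)/kinder" "$HOME/bin/kinder"
export PATH="$HOME/bin:$PATH"
```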

each time you make kubernetes changes (e.g. patching kubeadm) you need to build a new kind(er) node-image:

cd kubernetes
kinder build node-image --image kinder/node:latest

to create a kinder cluster (node provisioning):

kinder create cluster --image kinder/node:latest --control-plane-nodes 3 --worker-nodes 3
# adjust the number of CP and W nodes if needed

note that if you start getting api-server pod crashes, the host might be out of RAM. 16GB works for a 3 CP + 3 W setup in my testing, but i'm getting errors if i try 4 CP. it's probably a good idea to monitor the memory usage.
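a tiny helper (the function name is made up for this guide) to sample available memory; run it in a loop in a second terminal while the cluster comes up — `docker stats --no-stream` also shows per-node-container usage, since each kinder node is a docker container:

```shell
# print a timestamped memory sample (Linux: reads /proc/meminfo)
mem_sample() {
  awk -v t="$(date +%T)" \
    '/MemAvailable/ {printf "%s  %.1f GiB available\n", t, $2/1048576}' \
    /proc/meminfo
}
mem_sample
```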

to init the primary control plane and join the rest of the nodes:

# init primary CP
kinder do kubeadm-init

# set the KUBECONFIG to the kinder cluster
# technically you can do this once as long as the cluster re-uses the same name (by default "kind")
export KUBECONFIG=$(kinder get kubeconfig-path)

# join the worker and CP nodes.
kinder do kubeadm-join --parallel

in my testing i tried firing the join process right after init (without waiting) to stress test the components. it holds up pretty well.

note that --parallel dumps the output of all joining nodes at once, so their stdout will be interleaved. in a separate tab i keep kubectl get po --all-namespaces running to watch for failing pods.

without the kubeadm patch a CP node join can flake with:

error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: etcdserver: unhealthy cluster

with or without the kubeadm patch a worker node can flake with:

error execution phase kubelet-start: error uploading crisocket: error patching node "kind-worker3" through apiserver: etcdserver: request timed out

to destroy the cluster:

kinder delete cluster

you can also try to reset the nodes instead of re-creating the cluster:

kinder do kubeadm-reset

more docs on kinder: https://github.com/kubernetes/kubeadm/blob/master/kinder/doc/test-HA.md

testing join failures with kind

there is a separate issue (the one we call "the configmap/load-balancer" issue) where a serial CP node join in kind can fail: kubernetes-sigs/kind#588

a couple of differences between kind and kinder:

  • kind uses nginx as the LB and containerd as the container runtime on the nodes.
  • kinder uses haproxy as the LB and docker as the container runtime on the nodes.

write this config to a file called config.yaml:

# a cluster with 3 control-planes and 3 workers
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
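if you want to vary the node counts without hand-editing the YAML, a small shell helper (the function name is made up for this guide; the kind/apiVersion/role values come from the config above) can generate it:

```shell
# generate a kind v1alpha3 config with $1 control-planes and $2 workers
gen_kind_config() {
  cp_nodes="$1"; workers="$2"
  echo "kind: Cluster"
  echo "apiVersion: kind.sigs.k8s.io/v1alpha3"
  echo "nodes:"
  i=0; while [ "$i" -lt "$cp_nodes" ]; do echo "- role: control-plane"; i=$((i+1)); done
  i=0; while [ "$i" -lt "$workers" ];  do echo "- role: worker";        i=$((i+1)); done
}
gen_kind_config 3 3 > config.yaml
```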

the same system specs as the above kinder testing scenarios are required.

clone kubernetes-sigs/kind and kubernetes/kubernetes.

build kind and a node image:

cd kind
GO111MODULE=on go build
# install the kind binary to PATH or symlink
cd kubernetes
kind build node-image --kube-root=$(pwd)

create a cluster:

kind create cluster --config=<path-to-config.yaml> --image kindest/node:latest

this can flake with the following error:

I0604 19:15:10.310249     760 round_trippers.go:438] GET https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 401 Unauthorized in 1233 milliseconds
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized 

to delete the cluster:

kind delete cluster