Most of this comes originally from here: https://computingforgeeks.com/deploy-kubernetes-cluster-on-ubuntu-with-kubeadm/
Do a clean install of Ubuntu 20.04 server, then apply updates:
sudo apt update
sudo apt upgrade -y
Reboot if needed
Kubernetes won’t work on a system with swap enabled – kubeadm will refuse to install if it’s on. (The short version of why: the kubelet’s memory limits and QoS accounting assume there is no swap.)
- Edit /etc/fstab
- Delete or comment out the swap line
- Disable for the current session:
sudo swapoff -a
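A quick optional sanity check that swap really is off:
# should print nothing if no swap devices are active
swapon --show
# the Swap line should show 0B total
free -h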
Docker itself is actually optional, but it is a useful tool and we use the containerd.io package from its repo anyway.
# Enable the Kubernetes repo
sudo apt -y install curl apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
# update to get new packages
sudo apt update
# install the new packages
sudo apt -y install vim git curl wget kubelet kubeadm kubectl
# Prevent kubelet, kubeadm, and kubectl from upgrading automatically -- if you skip this then an apt upgrade could update things out of order and cause issues
sudo apt-mark hold kubelet kubeadm kubectl
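As an optional check you can confirm the tools are installed and the hold took effect:
# print the installed versions
kubeadm version
kubectl version --client
# should list kubelet, kubeadm, and kubectl
apt-mark showhold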
# Enable the Docker repo
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
# Update
sudo apt update
# Install
sudo apt install -y containerd.io docker-ce docker-ce-cli
# note the addition of max_user_watches
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
fs.inotify.max_user_watches=524288
EOF
# Apply the updates
sudo sysctl --system
# Enable modules needed for containerd
sudo tee /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF
# load them for this session
sudo modprobe overlay
sudo modprobe br_netfilter
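If you want to double check that the modules are loaded and the sysctl values stuck:
# both modules should be listed
lsmod | grep -E 'overlay|br_netfilter'
# both should print 1 -- if the bridge value is missing or 0, re-run sudo sysctl --system now that br_netfilter is loaded
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables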
# the containerd.io package ships a config that disables the CRI plugin, so regenerate the default config k8s needs
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd
# enable kubelet
sudo systemctl enable kubelet
# pull container images -- optional, kubeadm will do this during init/join if you skip this
sudo kubeadm config images pull
The control plane nodes run an HTTPS REST API for managing the cluster; since we want a highly available system we need an endpoint that will reach a node that is still alive even if one goes down. Currently we’re using haproxy configured to do TCP (layer 4) load balancing on port 6443 at one of these locations:
Note that k8s uses its own CA and certificates, so it’s easiest not to try to do SSL termination at the haproxy level – managing the certificates would be annoying.
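For reference, a minimal haproxy config for this kind of layer 4 pass-through looks roughly like the sketch below. The server names and IPs are placeholders – substitute your actual control plane nodes.
# /etc/haproxy/haproxy.cfg (sketch only)
frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-api-nodes
backend k8s-api-nodes
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.0.2.11:6443 check
    server cp2 192.0.2.12:6443 check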
This is only needed if you’re setting up a new cluster!
- Update the endpoint to the load balanced endpoint we set up above
- Note that the hostname you provide here is the one that will be on the SSL certs used for the API on all control plane nodes
- Be sure to specify the CRI socket, else some versions may try to use docker instead of containerd
# Make sure you have the cri-socket so you don't use docker instead
# You need to set up a TCP load balancer which goes to this host and update the following endpoint
sudo kubeadm init \
--control-plane-endpoint=cluster.ut2.gradecam.net:6443 \
--cri-socket /run/containerd/containerd.sock
If you have missed anything it will tell you. If you make a mistake (which I usually do) you can reset it by running sudo kubeadm reset and then running init again.
If it is successful you’ll see something like the following:
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join cluster.ut2.gradecam.net:6443 --token kg5tek.20d4o4t6em6zyqb1 \
--discovery-token-ca-cert-hash sha256:5d3039e45b52303eaa8dcdac0a48913ab64fa32d20283308fe63fdb47257c984 \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join cluster.ut2.gradecam.net:6443 --token kg5tek.20d4o4t6em6zyqb1 \
--discovery-token-ca-cert-hash sha256:5d3039e45b52303eaa8dcdac0a48913ab64fa32d20283308fe63fdb47257c984
If you lose the token or it expires before you add a new node you can recreate these using instructions found by searching the web 🙂.
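The quickest way I know of (assuming you have access on an existing control plane node) is:
# prints a fresh worker join command with a new token and the correct CA cert hash
sudo kubeadm token create --print-join-command
For a control plane join you still need to add --control-plane (and copy the certs over as described above).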
Finally, since we plan to run pods on our control plane nodes as well we need to untaint them.
kubectl taint nodes --all node-role.kubernetes.io/master-
At this point you have a cluster but no network overlay or ingresses or anything. Read on 🙂
It is possible to download the yaml file and customize calico – thus far there has been no need.
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
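It takes a minute or two for the calico pods to come up; you can watch them with something like this (the label is what the standard manifest uses at the time of writing):
# wait until every calico-node pod is Running
kubectl -n kube-system get pods -l k8s-app=calico-node -w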
The metrics server allows you to run kubectl top nodes or kubectl top pods and is the most common source of metrics allowing horizontal pod autoscalers (HPAs) to work. Unfortunately by default the kubelet instances don’t have a valid serving TLS certificate and the metrics server can’t connect; you can fix it either by having the kubelets request proper certificates or by disabling strict certificate checking in the metrics server. It’s internal, but I still prefer to fix the actual problem.
We will need to do this manually for the node we already created but we can make it happen automatically (ish) for new nodes.
First edit the kubeadm-config configmap:
kubectl -n kube-system edit cm kubeadm-config
There are two changes to make. For simplicity here is an example file with comments indicating the things to add:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
        # Add enable-aggregator-routing: true. I'm not 100% sure this is needed
        enable-aggregator-routing: "true"
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: cluster.ut2.gradecam.net:6443
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.20.5
    networking:
      dnsDomain: cluster.local
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      node1-2:
        advertiseAddress: 172.19.62.141
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
  # Add this kubelet config customization, which is definitely needed
  KubeletConfiguration: |
    kind: KubeletConfiguration
    apiVersion: kubelet.config.k8s.io/v1beta1
    # the following line is what makes the real difference
    serverTLSBootstrap: true
kind: ConfigMap
Next update the kubelet-config configmap for the current version. (I think this is needed but am not 100% sure):
kubectl -n kube-system edit cm kubelet-config-1.20
At the end of data → kubelet (after volumeStatsAggPeriod), add:
enable-aggregator-routing: "true"
serverTLSBootstrap: true
For the existing node (the control plane node you just created) you need to edit /var/lib/kubelet/config.yaml and add the same two lines, then restart kubelet:
sudo systemctl restart kubelet
Once that’s done, give it a minute to make the request and then approve the cert:
# check for the certificate request
kubectl get csr
# approve the request (replace with the correct id)
kubectl certificate approve csr-7vsz9
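If there are several pending requests (e.g. you did this on a few nodes at once) you can approve them all in one shot; note this blindly approves everything that is currently pending:
# approve every pending CSR -- only do this when you expect all of them to be yours
kubectl get csr -o name | xargs kubectl certificate approve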
Install the metrics server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Give it a minute or two to spin up and you should be able to check node stats with:
kubectl top nodes
and see something like:
$ kubectl top nodes
NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node1-2   713m         5%     13473Mi         21%
node2-2   334m         4%     9320Mi          29%
For the nginx ingress controller it’s easiest to use helm – if you don’t have helm version 3 installed, install it first. Then
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
Finally we install the chart with some customizations:
kubectl create namespace ingress-nginx
helm -n ingress-nginx install \
--set controller.autoscaling.enabled=1 \
--set controller.resources.requests.memory=768Mi \
--set controller.autoscaling.minReplicas=4 \
--set controller.service.type=NodePort \
--set controller.service.nodePorts.http=32080 \
--set controller.service.nodePorts.https=32443 \
--set controller.config.enable-brotli=true \
--set controller.config.use-gzip=true \
--set controller.config.http2-max-requests=4000 \
--set controller.config.load-balance=ewma \
--set controller.config.keep-alive-requests=200 \
--set controller.config.upstream-keepalive-connections=640 \
--set controller.config.max-worker-connections=32768 \
--set controller.config.use-geoip2=true \
--set controller.config.maxmind-license-key=fwqXJS9xopQ0EvBg \
--set controller.config.use-forwarded-headers=true \
--set controller.config.proxy-read-timeout=90 \
--set controller.config.proxy-connect-timeout=60 \
--set controller.kind=Deployment \
ingress-nginx ingress-nginx/ingress-nginx
To change parameters later rerun this but change install to upgrade (there’s an example after the uninstall command below). To remove it, run
helm -n ingress-nginx uninstall ingress-nginx
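For example, an upgrade that only bumps one of the values above (the new memory number is just an illustration) would look like:
# --reuse-values keeps the existing settings and only changes what you --set here
helm -n ingress-nginx upgrade \
--reuse-values \
--set controller.resources.requests.memory=1024Mi \
ingress-nginx ingress-nginx/ingress-nginx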
Do everything you did above for the control plane node up 'til you get to the kubeadm init step. If you have the tokens and such from when you did the init great, otherwise you need to find them.
Note:
- The token and discovery token can be recovered later if needed.
- You need to override the cri-socket to make sure it uses containerd. It’s possible this isn’t needed anymore, but previously it would use docker by default if it found it (which it would).
kubeadm join cluster.ut2.gradecam.net:6443 --token kg5tek.20d4o4t6em6zyqb1 \
--discovery-token-ca-cert-hash sha256:5d3039e45b52303eaa8dcdac0a48913ab64fa32d20283308fe63fdb47257c984 \
--cri-socket /run/containerd/containerd.sock
This should join the node; run kubectl get nodes to verify that it’s there. After it’s initialized you need to approve its certificate request:
kubectl get csr
kubectl certificate approve csr-1234 # replace with the right thingy
That should be it – you’re done 🙂