CKA study notes

Exam areas

  • Cluster Architecture, Installation & Configuration, 25%
  • Workloads & Scheduling, 15%
  • Services & Networking, 20%
  • Storage, 10%
  • Troubleshooting, 30%

Cluster Architecture, Installation & Configuration (25%)

Data flow

  1. Authentication (certs, passwords, tokens)
  2. Authorization (e.g. RBAC)
  3. Admission Control: modules that act on objects being created, deleted, updated or connected (proxy), but not on reads. They can refuse requests and/or modify the contents of objects.

RBAC

RBAC is implemented mostly by leveraging Roles (ClusterRoles) and RoleBindings (ClusterRoleBindings). Docs: https://kubernetes.io/docs/reference/access-authn-authz/rbac/

Roles

Roles are limited to their own namespace.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rbac-test-role
  namespace: rbac-test
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

ClusterRoles

ClusterRoles have the same powers as Roles, but at cluster level. As such they can also manage non-namespaced resources (kubectl api-resources --namespaced=false), e.g. storageClass, persistentVolume, node, users.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # "namespace" omitted since ClusterRoles are not namespaced
  name: secret-reader
rules:
- apiGroups: [""]
  #
  # at the HTTP level, the name of the resource for accessing Secret
  # objects is "secrets"
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]

RoleBindings / ClusterRoleBindings

Bindings are what link Roles/ClusterRoles to subjects (users, groups, service accounts). A RoleBinding can reference a ClusterRole, but the granted permissions are still limited to the namespace of the binding itself.

apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "jane" to read pods in the "default" namespace.
# You need to already have a Role named "pod-reader" in that namespace.
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
# You can specify more than one "subject"
- kind: User
  name: jane # "name" is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role #this must be Role or ClusterRole
  name: pod-reader # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
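For the exam it is usually faster to create these objects imperatively and verify them with kubectl auth can-i; a quick sketch reusing the names from the examples above:

kubectl create role rbac-test-role --verb=get,list,watch --resource=pods -n rbac-test
kubectl create clusterrole secret-reader --verb=get,watch,list --resource=secrets
kubectl create rolebinding read-pods --role=pod-reader --user=jane -n default
kubectl auth can-i list pods --as=jane -n default   # verify the binding works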

Install cluster with kubeadm

Container runtime

https://kubernetes.io/docs/setup/production-environment/container-runtimes/

kubeadm,kubelet, kubectl

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

Create cluster with kubeadm

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Manage a highly-available Kubernetes cluster

Stacked etcd

(diagram: etcd members run on the same nodes as the control-plane components — the kubeadm default)

External etcd

(diagram: etcd runs on dedicated hosts, separate from the control-plane nodes)

Exercise: Check how long certificates are valid

  • Check how long the kube-apiserver server certificate is valid on cluster2-master1. Do this with openssl or cfssl. Write the expiration date into /opt/course/22/expiration.
  • Also run the correct kubeadm command to list the expiration dates and confirm both methods show the same date.
  • Write the correct kubeadm command that would renew the apiserver server certificate into /opt/course/22/kubeadm-renew-certs.sh.

on MASTER node

root@cluster2-master1:/etc/kubernetes/pki# ls -l
total 60
-rw-r--r-- 1 root root 1298 Aug  6 08:48 apiserver.crt
-rw-r--r-- 1 root root 1155 Aug  6 08:48 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Aug  6 08:48 apiserver-etcd-client.key
-rw------- 1 root root 1679 Aug  6 08:48 apiserver.key
-rw-r--r-- 1 root root 1164 Aug  6 08:48 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Aug  6 08:48 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1066 May  4 10:48 ca.crt
-rw------- 1 root root 1675 May  4 10:48 ca.key
drwxr-xr-x 2 root root 4096 May  4 10:48 etcd
-rw-r--r-- 1 root root 1078 May  4 10:48 front-proxy-ca.crt
-rw------- 1 root root 1679 May  4 10:48 front-proxy-ca.key
-rw-r--r-- 1 root root 1119 Aug  6 08:48 front-proxy-client.crt
-rw------- 1 root root 1679 Aug  6 08:48 front-proxy-client.key
-rw------- 1 root root 1679 May  4 10:48 sa.key
-rw------- 1 root root  451 May  4 10:48 sa.pub

use openssl to find out the expiration date: openssl x509 -noout -text -in /etc/kubernetes/pki/apiserver.crt | grep Validity -A2

We can also use kubeadm to get the expiration and confirm both methods agree: kubeadm certs check-expiration | grep apiserver

Write command.

# /opt/course/22/kubeadm-renew-certs.sh
kubeadm certs renew apiserver

Assessing cluster health

kubectl get componentstatuses is deprecated since v1.19. A suitable replacement is probing the API server directly. For example, on a master node, run curl -k https://localhost:6443/livez?verbose, which returns:

[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
.....etc

Provision underlying infrastructure to deploy a Kubernetes cluster

The topology choices above will influence the underlying resources that need to be provisioned. How these are provisioned are specific to the underlying cloud provider. Some generic observations:

  • Disable swap.
  • Leverage cloud capabilities for HA - ie using multiple AZ's.
  • Windows can be used for worker nodes, but not control plane.

Perform version upgrade

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
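A condensed sketch of the kubeadm upgrade flow (versions are illustrative, matching the 1.20.x packages used elsewhere in these notes):

# on the first control-plane node
apt-get update && apt-get install -y --allow-change-held-packages kubeadm=1.20.2-00
kubeadm upgrade plan
kubeadm upgrade apply v1.20.2

# then, per node (control plane and workers)
kubectl drain <node> --ignore-daemonsets
apt-get install -y --allow-change-held-packages kubelet=1.20.2-00 kubectl=1.20.2-00
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon <node>

# on worker nodes the control-plane step is replaced by
kubeadm upgrade node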

Node maintenance

Drain a node: kubectl drain <nodeName> --ignore-daemonsets

Uncordon it afterwards: kubectl uncordon <nodeName>

ETCD backup and restore

https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster

Take snapshot: ETCDCTL_API=3 etcdctl snapshot save snapshot.db --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key

Verify backup sudo ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshot.db

Restore ETCDCTL_API=3 etcdctl snapshot restore snapshot.db

Workloads & Scheduling (15%)

Understand deployments and how to perform rolling update and rollbacks

Deployment

Deployments are intended to replace Replication Controllers. They provide the same replication functions (through ReplicaSets) plus the ability to roll out changes and roll them back if necessary.

kubectl create deployment nginx-deploy --replicas=3 --image=nginx:1.19

To update an existing deployment we can perform either:

  • Rolling Update: Pods will be gradually replaced. No downtime, old and new version coexist at the same time.
  • Recreate: pods will be deleted and recreated (it will involve downtime)

Check the rollout status

kubectl -n ngx rollout status deployment/nginx-deploy
deployment "nginx-deploy" successfully rolled out

kubectl -n ngx get deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deploy   3/3     3            3           44s

Scale the number of pods to 2: kubectl scale --replicas=2 deployment/nginx-deploy

Change the image tag to 1.20: kubectl edit deployment nginx-deploy (or kubectl set image deployment/nginx-deploy nginx=nginx:1.20)

Verify that the replica set was created

k get rs
NAME                      DESIRED   CURRENT   READY   AGE
nginx-deploy-57767fb8cf   0         0         0       4m47s
nginx-deploy-7bbd8545f9   2         2         2       82s

Check the history of deployment and rollback to previous version

k rollout history deployment nginx-deploy
deployment.apps/nginx-deploy 
REVISION  CHANGE-CAUSE
1         <none>
2         <none>

k rollout undo deployment nginx-deploy
k rollout undo deployment nginx-deploy --to-revision 1

Use ConfigMaps and Secrets to configure applications

Create a pod with the latest busybox image running a sleep for 1 hour, and give it an environment variable named PLANET with the value blue

kubectl run hazelcast --image=busybox:latest --env="PLANET=blue" -- sleep 3600 
k exec -it hazelcast  -- env | grep PLANET

Create a configmap named space with two values planet=blue and moon=white.

cat << EOF > system.conf
planet=blue
moon=white
EOF

kubectl create configmap space --from-file=system.conf

- Mount the configmap to a pod and display it from the container through the path /etc/system.conf

kubectl run nginx --image=nginx -o yaml --dry-run=client > pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  volumes:
    - name: config-volume
      configMap: 
        name: space
  containers:
  - image: nginx    
    name: nginx
    volumeMounts:
      - name: config-volume
        mountPath: /etc/system.conf
        subPath: system.conf
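To verify the mount once the pod above is running:

kubectl exec nginx -- cat /etc/system.conf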

You can also expose individual configmap keys as environment variables:

apiVersion: v1
kind: Pod
metadata:
 name: config-test-pod
spec:
 containers:
 - name: test-container
   image: busybox
   command: [ "/bin/sh", "-c", "env" ]
   env:
     - name: BLOG_NAME
       valueFrom:
         configMapKeyRef:
           name: vt-cm
           key: blog
 restartPolicy: Never

- Create a secret from 2 files: a username and a password.

echo -n 'admin' > username
echo -n 'admin-pass' > password

kubectl create secret generic admin-cred --from-file=username --from-file=password

Create a pod with 2 env vars (USERNAME and PASSWORD) and populate them from the secret's values

apiVersion: v1
kind: Pod
metadata:
  name: secret1
spec:
  containers:
  - env:
    - name: USERNAME
      valueFrom:
        secretKeyRef:
          name: admin-cred
          key: username
    - name: PASSWORD
      valueFrom:
        secretKeyRef:
          name: admin-cred
          key: password
    image: nginx
    name: secret1

- Mount the secret into a pod at the /admin-cred/ folder and display it.

apiVersion: v1
kind: Pod
metadata:
  name: secret2
spec:
  containers:
  - image: nginx
    name: secret2
    volumeMounts:
      - name: admin-cred
        mountPath: /admin-cred/
  volumes:
    - name: admin-cred
      secret:
        secretName: admin-cred
  restartPolicy: Always

display

k exec secret2 -- ls -l /admin-cred/                                                                                                                                                                   
total 0
lrwxrwxrwx 1 root root 15 Aug  4 14:34 password -> ..data/password
lrwxrwxrwx 1 root root 15 Aug  4 14:34 username -> ..data/username

Understand the primitives used to create robust, self-healing, application deployments

Deployments facilitate this by employing a reconciliation loop that checks that the number of deployed pods matches what's defined in the manifest. Under the hood, deployments leverage ReplicaSets, which are primarily responsible for this feature. If a deployment uses a volume, multiple pods will have access to the same volume (no isolation); as a consequence, there can be data races if multiple pods write to it.

StatefulSets are similar to deployments; for example, they manage the deployment and scaling of a set of pods. In addition to what deployments offer, they also provide guarantees about the ordering and uniqueness of Pods. A StatefulSet maintains a sticky identity for each of its Pods. These pods are created from the same spec but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.

StatefulSets are valuable for applications that require one or more of the following.

  • Stable, unique network identifiers.
  • Stable, persistent storage.
  • Ordered, graceful deployment and scaling.
  • Ordered, automated rolling updates.

Each pod created by the StatefulSet has an ordinal value (0 through replicas - 1) and a stable network ID (of the form statefulsetname-ordinal) assigned to it.
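A minimal sketch (names are illustrative) pairing a StatefulSet with the headless Service that provides those stable network IDs:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web-headless   # headless Service backing the stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx

The pods come up in order as web-0, web-1, web-2 and are reachable as web-0.web-headless.<namespace>.svc.cluster.local, and so on.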

Understand how resource limits can affect Pod scheduling

At a namespace level, we can define resource limits.

This enables a restriction in resources, especially helpful in multi-tenancy environments and provides a mechanism to prevent pods from consuming more resources than permitted, which may have a detrimental effect on the environment as a whole.

We can define the following:

  • Default memory / CPU requests & limits for a namespace. If a container is created in a namespace with a default request/limit value and doesn't explicitly define these in the manifest, it inherits these values from the namespace
  • Minimum and Maximum memory / CPU constraints for a namespace. If a pod's requests or limits fall outside the configured range, it will be rejected at admission time.
  • Memory/CPU Quotas for a namespace. Control the total amount of CPU/memory that can be consumed in the namespace as a whole.
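A sketch of a LimitRange providing defaults together with a ResourceQuota (all values are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-defaults
spec:
  limits:
  - default:           # limit applied to containers that don't set one
      memory: 256Mi
    defaultRequest:    # request applied to containers that don't set one
      memory: 128Mi
    type: Container
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-quota
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi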

Exercise:

  1. Create a new namespace called "tenant-b-100mi"
  2. Create a memory limit of 100Mi for this namespace
  3. Create a pod with a memory request of 150Mi, ensure the limit has been set by verifying you get a error message.
kubectl create ns tenant-b-100mi

Create limit

apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-b-memlimit
  namespace: tenant-b-100mi
spec:
  limits:
  - max:
      memory: 100Mi
    type: Container

Pod

apiVersion: v1
kind: Pod
metadata:
  name: default-mem-demo
  namespace: tenant-b-100mi
spec:
  containers:
  - name: default-mem-demo
    image: nginx
    resources:
      requests:
        memory: 150Mi

It should give The Pod "default-mem-demo" is invalid: spec.containers[0].resources.requests: Invalid value: "150Mi": must be less than or equal to memory limit

Awareness of manifest management and common templating tools

The two common tools here are Kustomize (built into kubectl via the -k flag) and Helm.
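A few illustrative commands (the directory, repo and chart names are just examples):

# Kustomize: the target directory must contain a kustomization.yaml
kubectl kustomize ./overlays/prod      # render the manifests
kubectl apply -k ./overlays/prod       # render and apply

# Helm: render/install charts
helm repo add bitnami https://charts.bitnami.com/bitnami
helm template my-nginx bitnami/nginx   # render locally without installing
helm install my-nginx bitnami/nginx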

Services & Networking (20%)

Each host is responsible for one subnet of the CNI range, e.g. one host handles 10.1.1.0/24 and another 10.1.2.0/24, while the overall pod CIDR block may be something like 10.1.0.0/16. (diagram)

Virtual ethernet adapters are paired with a corresponding Pod network adapter. Kernel routing is used to enable Pods to communicate outside the host it resides in.

Understand connectivity between Pods

Every Pod gets its own IP address, which is shared between its containers.

Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):

  • Pods on a node can communicate with all pods on all nodes without NAT
  • Agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node

Note: When running workloads that leverage hostNetwork: Pods in the host network of a node can communicate with all pods on all nodes without NAT

Exercise:

  1. Deploy the following manifest
  2. Using kubectl, identify the Pod IP addresses
  3. Determine the DNS name of the service.

kubectl apply -f https://raw.githubusercontent.com/David-VTUK/CKAExampleYaml/master/nginx-svc-and-deployment.yaml

kubectl get po -l app=nginx -o wide

$ k get svc
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
nginx-service   ClusterIP   172.20.252.175   <none>        80/TCP    2m54s


$ k run --restart=Never --image=busybox --rm -it  busybox -- nslookup 172.20.252.175
Server:		172.20.0.10
Address:	172.20.0.10:53

175.252.20.172.in-addr.arpa	name = nginx-service.exam-study.svc.cluster.local

pod "busybox" deleted

Understand ClusterIP, NodePort, LoadBalancer service types and endpoint

Since pods are ephemeral, we need an entry point in front of them: a Service. Services can take the following forms:

  • ClusterIP : not exposed, internal only
  • LoadBalancer: External, requires cloud provider or custom software implementation (I.e. metalLb), unlikely to come on CKA
  • NodePort: External, requires direct access to the nodes; ports are allocated from the 30000-32767 range by default
  • Ingress Resource: L7. An Ingress can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, and offer name-based virtual hosting. An Ingress controller is responsible for fulfilling the Ingress, usually with a loadbalancer, though it may also configure your edge router or additional frontends to help handle the traffic. Note: an ingress usually routes to the pods through one of the services above.

(diagram: ingress routing)

Exercise

  1. Create three deployments of your choosing
  2. Expose one of these deployments with a service of type ClusterIP
  3. Expose one of these deployments with a service of type Nodeport
  4. Expose one of these deployments with a service of type Loadbalancer
    1. Note, this remains in pending status unless your cluster has integration with a cloud provider that provisions one for you (ie AWS ELB), or you have a software implementation such as metallb
kubectl create deployment nginx-clusterip --image=nginx --replicas 1
kubectl create deployment nginx-nodeport --image=nginx --replicas 1
kubectl create deployment nginx-loadbalancer --image=nginx --replicas 1

kubectl expose deployment nginx-clusterip --type="ClusterIP" --port="80"
kubectl expose deployment nginx-nodeport --type="NodePort" --port="80"
kubectl expose deployment nginx-loadbalancer --type="LoadBalancer" --port="80"

Know how to use Ingress controllers and Ingress resources

Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Ingress consists of two components. The Ingress Resource is a collection of rules for inbound traffic to reach Services; these are Layer 7 (L7) rules that allow hostnames (and optionally paths) to be directed to specific Services in Kubernetes. The second component is the Ingress Controller, which acts upon the rules set by the Ingress Resource, typically via an HTTP or L7 load balancer. It is vital that both pieces are properly configured to route traffic from an outside client to a Kubernetes Service.

kubectl create ingress ingressName --class=default --rule="foo.com/bar=svcName:80" -o yaml --dry-run=client

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingressName
spec:
  ingressClassName: default
  rules:
  - host: foo.com
    http:
      paths:
      - backend:
          service:
            name: svcName
            port:
              number: 80
        path: /bar
        pathType: Exact

Exercise Create an ingress object named myingress with the following specification:

  • Manages the host myingress.mydomain
  • Traffic to the base path / will be forwarded to a service called main on port 80
  • Traffic to the path /api will be forwarded to a service called api on port 8080
kubectl create ingress myingress --rule="myingress.mydomain/=main:80" --rule="myingress.mydomain/api=api:8080"
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myingress
spec:
  rules:
  - host: myingress.mydomain
    http:
      paths:
      - backend:
          service:
            name: main
            port:
              number: 80
        path: /
        pathType: Exact
      - backend:
          service:
            name: api
            port:
              number: 8080
        path: /api
        pathType: Exact

Know how to configure and use CoreDNS

Since 1.13, CoreDNS has replaced kube-dns as the facilitator of cluster DNS and runs as pods in the kube-system namespace.

Check configuration of a pod:

kubectl run busybox --image=busybox -- sleep 9000
kubectl exec -it busybox -- sh
/ # cat /etc/resolv.conf 
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local virtualthoughts.co.uk
options ndots:5

Common DNS FQDN:

  • Pods- [Pod IP separated by dashes].[Namespace].pod.cluster.local
  • Services - [ServiceName].[Namespace].svc.cluster.local

Headless services

apiVersion: v1
kind: Service
metadata:
  name: test-headless
spec:
  clusterIP: None
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: web-headless

Headless services are those without a cluster IP; instead, DNS responds with the list of IPs of the pods backing the service at that particular moment in time. This is useful if your app needs to obtain (through DNS) the list of all pod IPs running a particular service.

Override the dns config for a given pod

apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 8.8.8.8
    searches:
      - ns1.svc.cluster.local
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0

Coredns configuration

Coredns config can be found in a config map kubectl get cm coredns -n kube-system -o yaml

Exercise:

  1. Identify the configuration location of coredns
  2. Modify the coredns config file so DNS queries not resolved by itself are forwarded to the DNS server 8.8.8.8
  3. Validate the changes you have made
  4. Add additional configuration so that all DNS queries for custom.local are forwarded to the resolver 10.5.4.223
kubectl get cm coredns -n kube-system                                                
NAME      DATA   AGE
coredns   2      94d
kubectl edit cm coredns -n kube-system 

replace:
forward . /etc/resolv.conf

with
forward . 8.8.8.8

Add custom block

custom.local:53 {
        errors 
        cache 30
        forward . 10.5.4.223
        reload
    }
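For orientation, the edited Corefile ends up looking roughly like this (the exact plugin list varies per cluster):

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . 8.8.8.8
    cache 30
    loop
    reload
    loadbalance
}
custom.local:53 {
    errors
    cache 30
    forward . 10.5.4.223
    reload
}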

Choose an appropriate container network interface plugin

You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed.

https://kubernetes.io/docs/concepts/cluster-administration/addons/#networking-and-network-policy

Storage 10%

Understand storage classes, persistent volumes

StorageClass

https://kubernetes.io/docs/concepts/storage/storage-classes/ A StorageClass provides a way for administrators to describe the "classes" of storage they offer

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: localdisk
reclaimPolicy: Delete
allowVolumeExpansion: true
provisioner: kubernetes.io/no-provisioner

allowVolumeExpansion: if not set to true, it will be impossible to resize the PV

persistentVolumeReclaimPolicy:

  • Retain: when the claim is deleted, the PV and its data are kept; manual intervention is required to release the storage
  • Recycle (deprecated)
  • Delete: deletes both the k8s object and the underlying cloud volume (works only with cloud/dynamic provisioners)

PersistentVolume

A PersistentVolume represents a piece of storage in the cluster, provisioned manually by an administrator or dynamically via a StorageClass. It specifies the capacity of the storage, the access modes, and the type of volume.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: "localdisk"
  persistentVolumeReclaimPolicy: Delete
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /var/output

Understand volume mode, access modes and reclaim policies for volumes

Volume modes:

Only two exist:

  1. Block: mounted to a pod as a raw block device without a filesystem. The Pod/application needs to understand how to deal with raw block devices. Presenting it this way can yield better performance, at the expense of complexity.
  2. Filesystem: mounted into a directory inside the pod's filesystem. If the volume is backed by a block device with no filesystem, Kubernetes creates one. Compared to block devices, this method offers the highest compatibility, at the expense of performance.
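The choice is made via the volumeMode field on the PV/PVC; a minimal sketch (names are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  volumeMode: Block        # or Filesystem (the default)
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

A pod consuming a Block volume references it under volumeDevices (with a devicePath) instead of volumeMounts.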

AccessModes:

Three options exist:

  • ReadWriteOnce – The volume can be mounted as read-write by a single node
  • ReadOnlyMany – The volume can be mounted read-only by many nodes
  • ReadWriteMany – The volume can be mounted as read-write by many nodes

Understand persistent volume claims primitive

A PersistentVolume can be thought of as storage provisioned by an administrator (think of this as pre-allocation). A PersistentVolumeClaim can be thought of as storage requested by a user/workload.

To use PVs (which are abstract), we leverage a PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: localdisk
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi

When a PVC is created, Kubernetes looks for an available PV that satisfies its criteria; if one is found, the PVC is bound to it.

Once a PV is bound to a PVC, that PV is essentially tied to the PVC's namespace and cannot be bound by another PVC. There is a one-to-one mapping between PVs and PVCs. However, multiple pods in the same namespace can use the same PVC.

Know how to configure applications with persistent storage

apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage    

If leveraging a StorageClass with dynamic provisioning, a PersistentVolume object is not required; we just need a PersistentVolumeClaim.

Exercise: In this exercise, we will not be using storageClass objects

  1. Create a persistentVolume object of type hostPath with the following parameters:
    1. 1GB Capacity
    2. Path on the host is /tmp
    3. storageClassName is manual
    4. accessModes is ReadWriteOnce
  2. Create a persistentVolumeClaim to the aforementioned persistentVolume
  3. Create a pod workload to leverage this persistentVolumeClaim
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/output"
          name: task-pv-storage

Troubleshooting (30%)

Evaluate cluster and node logging

  • Node Status Check the status of nodes. All of them must be in READY state.
kubectl get nodes
kubectl describe node <nodeName>
  • Services Verify that services (kubelet, docker) are up and running
systemctl status kubelet
systemctl start kubelet
systemctl enable kubelet

ALWAYS check that services are up, running, and ENABLED

  • System pods If the cluster has been built with kubeadm, there are several pods which must be running in the kube-system namespace.
kubectl get pods -n kube-system
kubectl describe pod podName -n kube-system

Logs

You can check logs for k8s components with journalctl:

sudo journalctl -u kubelet (or -u docker)

(Shift+G to jump to the end of the output)

You can also check /var/log/kube-*.log, but with a kubeadm cluster these won't be stored on the filesystem; the control-plane components run as system pods instead (logs available with kubectl logs xxx).

Container logs: kubectl logs [-f] podName [-c containerName]

Etcd

Through componentstatuses (deprecated but still working)

kubectl get componentstatuses
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}

or (etcdctl v2 syntax; with ETCDCTL_API=3 use etcdctl endpoint health):

./etcdctl cluster-health

member 17f206fd866fdab2 is healthy: got healthy result from https://master-0.etcd.cfcr.internal:2379

Wrong static manifests

/etc/kubernetes/manifests/

Examples of misconfigurations:

  • kube-apiserver isn’t pointing to the correct mountPath or hostPath for certificates
  • Pods aren't getting scheduled because kube-scheduler.yaml is pointing to the wrong image
  • etcd isn't working because there are inconsistent TCP/IP ports
  • Container command sections have typos or don't point to the correct TCP/IP port or configuration/certificate file path

Cluster logging

At a cluster level, kubectl get events provides a good overview.

Understand how to monitor applications

Kubernetes handles and redirects any output generated from a container's stdout and stderr streams. This output goes through a logging driver, which determines where the logs are stored. Implementations differ (such as RHEL's flavor of Docker), but commonly these drivers write to a file in JSON format.

Troubleshoot application failure

This is a somewhat broad topic to cover, as how we approach troubleshooting application failures varies with the architecture of the application, which resources/API objects we're leveraging, and whether the application produces logs. However, good starting points include:

  • kubectl describe <object>
  • kubectl logs <podname>
  • kubectl get events

Troubleshoot networking

DNS Resolution

Pods and Services will automatically have a DNS record registered against coredns in the cluster, aka "A" records for IPv4 and "AAAA" for IPv6. The format of which is:

pod-ip-address.my-namespace.pod.cluster-domain.example
my-svc-name.my-namespace.svc.cluster-domain.example

To test resolution, we can run a pod with nslookup.
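For example, with a throwaway busybox pod as in the earlier exercise (the service name is illustrative):

kubectl run tmp --rm -it --restart=Never --image=busybox -- nslookup my-svc-name.my-namespace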

Cni Issues

Mainly covered earlier in acquiring logs for the CNI. However, one issue that might occur is when a CNI is incorrectly initialised, or not initialised at all. This may cause workloads to remain in Pending status:

kubectl get po -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx   0/1     Pending   0          57s   <none>   <none>   <none>           <none>

kubectl describe <pod> can help identify issues with assigning IP addresses to nodes from the CNI

On connection

kubectl completion bash > /etc/bash_completion.d/kubectl
alias k=kubectl
export do="--dry-run=client -o yaml"

Lacking areas

  • Etcd backup/restore

Installing cluster

https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/

kubeadm simplifies installation of kubernetes cluster

sudo tee -a /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

apt-get update && apt-get install -y containerd

mkdir -p /etc/containerd/
containerd config default | sudo tee /etc/containerd/config.toml

swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

apt install -y apt-transport-https curl

sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet=1.20.1-00 kubeadm=1.20.1-00 kubectl=1.20.1-00
sudo apt-mark hold kubelet kubeadm kubectl

Master

kubeadm init --pod-network-cidr=192.168.0.0/16
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

kubeadm token create --print-join-command

Core concepts

Cluster Overview

Master nodes

Manage, plan, schedule and monitor the nodes. Components:

  • Etcd cluster
  • kube-apiserver
  • kube controller manager
  • kube-scheduler

Worker Nodes

  • kubelet
  • kube-proxy
  • Container runtime engine

Etcd

ETCD is a distributed, reliable key-value store that is simple, secure and fast. It stores information about the cluster such as nodes, pods, configs, secrets, accounts, roles, bindings and others.

Stacked etcd = etcd running on the same node as control plane

If you are using kubeadm, you will find etcd pods in kube-system namespace

You can run etcdctl commands directly inside the etcd pod: kubectl exec etcd-master -n kube-system -- etcdctl get / --prefix --keys-only

Kube api server

Kube-apiserver is responsible for authenticating and validating requests, and for retrieving and updating data in the etcd key-value store. In fact, kube-apiserver is the only component that interacts directly with the etcd datastore. The other components, such as kube-scheduler, kube-controller-manager and kubelet, use the API server to update the cluster in their respective areas.

If using a cluster deployed by kubeadm, the configuration for the kube-apiserver is located in a static manifest: /etc/kubernetes/manifests/kube-apiserver.yaml

Kube controller manager

Manages several controllers in the cluster. In the case of a kubeadm cluster, the config is located in /etc/kubernetes/manifests/kube-controller-manager.yaml

Node controller

node controller diagram

Replication controller

Monitors the status of ReplicaSets, ensuring the desired number of pods is running. (diagram)

Other controllers

other controllers

Kube scheduler

The kube-scheduler is only responsible for deciding which pod goes on which node. It doesn't actually place the pod on the nodes, that's the job of the kubelet

Kubelet

The kubelet works in terms of a PodSpec. A PodSpec is a YAML or JSON object that describes a pod. The kubelet takes a set of PodSpecs that are provided through various mechanisms (primarily through the apiserver) and ensures that the containers described in those PodSpecs are running and healthy. The kubelet doesn't manage containers which were not created by Kubernetes.

kubeadm does not deploy the kubelet; we must manually download and install it on each node.

Kube-proxy

The Kubernetes network proxy runs on each node. This reflects services as defined in the Kubernetes API on each node and can do simple TCP, UDP, and SCTP stream forwarding or round robin TCP, UDP, and SCTP forwarding across a set of backends


Upgrade cluster:

apt-get install -y --allow-change-held-packages kubelet=1.20.2-00 kubectl=1.20.2-00
kubeadm upgrade plan v1.20.2

kubeadm upgrade node


Backing up etcd with etcdctl

etcd snapshot restore creates a new logical cluster

verify the connectivity ETCDCTL_API=3 etcdctl get cluster.name --endpoints=https://10.0.1.101:2379 --cacert=/home/cloud_user/etcd-certs/etcd-ca.pem --cert=/home/cloud_user/etcd-certs/etcd-server.crt --key=/home/cloud_user/etcd-certs/etcd-server.key

ETCDCTL_API=3 etcdctl snapshot restore backup.db --initial-cluster="etcd-restore=https://10.0.1.101:2380" --initial-advertise-peer-urls https://10.0.1.101:2380 --name etcd-restore --data-dir /var/lib/etcd

chown -R etcd:etcd /var/lib/etcd/


Quick creation of yaml: kubectl create deployment my-dep --image=nginx --dry-run=client -o yaml

--record flag stores the kubectl command used as an annotation on the object

RBAC

Role / ClusterRole = objects defining a set of permissions.
RoleBinding / ClusterRoleBinding = objects binding those permissions to subjects (users, groups, service accounts).


Service account = account used by container processes within pods to authenticate with the k8s API. We can bind service accounts to roles/cluster roles via role bindings / cluster role bindings.

The Kubernetes metrics server is an optional addon.

kubectl top pod --sort-by xxx --selector
kubectl top pod --sort-by cpu
kubectl top node

Raw access: kubectl get --raw /apis/metrics.k8s.io


ConfigMaps and Secrets can be passed to containers as env vars or as a configuration volume; in the volume case, each top-level key will appear as a file containing all keys below that top-level key.

apiVersion: v1
kind: Pod
metadata:
  name: env-pod
spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c', 'echo "configmap: $CONFIGMAPVAR secret: $SECRETVAR"']
    env:
    - name: CONFIGMAPVAR
      valueFrom:
        configMapKeyRef:
          name: my-configmap
          key: key1
    - name: SECRETVAR
      valueFrom:
        secretKeyRef:
          name: my-secret
          key: secretkey1

Resource requests allow you to define the amount of resources (CPU/memory) you expect a container to use. The scheduler will use that information to avoid scheduling on nodes which do not have enough available resources. Requests ONLY affect scheduling. CPU is expressed in thousandths of a core: 250m = 1/4 CPU.

containers:
- name: nginx
  resources:
    requests: xxx
    limits:
      cpu: 250m
      memory: "128Mi"

Probes: livenessProbe, readinessProbe, startupProbe.
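A minimal sketch of the three probe types on a single container (paths, ports and timings are illustrative):

containers:
- name: web
  image: nginx
  startupProbe:            # gates the other probes until the app has started
    httpGet:
      path: /
      port: 80
    failureThreshold: 30
    periodSeconds: 10
  readinessProbe:          # controls whether the pod receives Service traffic
    httpGet:
      path: /
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
  livenessProbe:           # restarts the container when it fails
    tcpSocket:
      port: 80
    initialDelaySeconds: 15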

nodeSelector - schedule onto nodes carrying a given label:

spec:
  nodeSelector:
    keylabel: "value"

nodeName - pin the pod to a specific node:

spec:
  nodeName: "nodename"


Static pod = automatically created from yaml manifest files located in the manifest path of the node. Mirror pod = the kubelet creates a mirror pod for each static pod so its status is visible via the API, but you cannot manage static pods through the API; they have to be managed directly through the kubelet.

kubeadm default location /etc/kubernetes/manifests/
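A quick illustration (file and pod names are made up): dropping a manifest into that directory is all it takes.

# on the node: any manifest placed here becomes a static pod
cat <<EOF > /etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
EOF
# the kubelet starts it; a mirror pod named static-web-<nodeName> shows up in kubectl get pods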


Deployment scaling

  1. change replica attribute in the yaml
  2. kubectl scale deployment.v1.apps/my-deployment --replicas=5

to check the status of a deployment

kubectl rollout status deployment.v1.apps/my-deployment

you can change the image

kubectl set image deployment/my-deployment container=image:tag --record

kubectl rollout history deployment/my-deployment

network policy = an object that allows you to control the flow of network communication to and from pods. It can be applied to ingress and/or egress

By default, pods are wide open. But if there is any policy attached, they are isolated and only whitelisted traffic is allowed

Available selectors:

  • podSelector
  • namespaceSelector
  • ipBlock

Rules can also restrict traffic to specific ports (see the example policy below).


kubectl label namespace np-test team=tmp-test
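A minimal sketch tying these pieces together (labels and names are illustrative; the namespace label matches the command above):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-tmp-test
  namespace: np-test
spec:
  podSelector:
    matchLabels:
      app: web              # pods this policy applies to
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: tmp-test    # namespaces labelled as above
    ports:
    - protocol: TCP
      port: 80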

Pod domain names are of the form pod-ip-address.namespace-name.pod.cluster.local.


A Service is an abstraction layer that lets clients interact with an application without needing to know anything about the underlying pods. A Service routes traffic in a load-balanced manner. Endpoints are the backend entities to which Services route traffic; there is one endpoint for each pod.

Service types:

  • ClusterIP - exposes the application inside the cluster network
  • NodePort - exposes the application outside the cluster network
  • LoadBalancer - exposes the application externally through a cloud load balancer
  • ExternalName (*not in CKA)

Service FQDN DNS: service-name.namespace.svc.cluster-domain.example. This FQDN can be used from any namespace; in the same namespace you can simply use the short service name.

Storage

Simple volumes

Volume types

  • hostPath
  • emptyDir
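
A quick sketch of a pod using both simple volume types (paths and names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: simple-volumes
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - mountPath: /scratch
      name: scratch         # emptyDir: removed together with the pod
    - mountPath: /host-data
      name: host-data       # hostPath: a directory from the node's filesystem
  volumes:
  - name: scratch
    emptyDir: {}
  - name: host-data
    hostPath:
      path: /tmp
      type: Directory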

A PersistentVolumeClaim can also be mounted in a pod like a normal volume:

apiVersion: v1
kind: Pod
metadata:
  name: pv-pod
spec:
  containers:
    - name: busybox
      image: busybox
      command: ["sh","-c", "while true; do echo Success! >> /output/success.txt;sleep 5; done"]
      volumeMounts:
      - mountPath: "/output"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: my-pvc

A PVC can be extended, as long as the allowVolumeExpansion property of its StorageClass allows it.

Troubleshooting

Basics

  • Connection refused (kubectl) If you cannot connect to the kube api server, it might be down. Verify that kubelet and docker services are up and running on control plane nodes.


Applications

You can run any command inside the pod by using kubectl exec podName [-c containerName] -- command

You can open a new session with kubectl exec -it podName [-c containerName] -- bash

Force Killing pods kubectl delete pod podName --force --grace-period=0

Networking

Check kube-proxy and dns pods in kube-system namespace

Useful container image for debugging: nicolaka/netshoot

Extra

Cpu?

#########################
kubectl run nginx --image=nginx --restart=Never
kubectl delete po nginx --grace-period=0 --force
k get po redis -w
kubectl get po nginx -o jsonpath='{.spec.containers[].image}{"\n"}'
kubectl run busybox --image=busybox --restart=Never -- ls
kubectl logs busybox -p # previous logs
 k run --image busybox busybox --restart=Never -- sleep 3600    
 kubectl get pods --sort-by=.metadata.name
kubectl exec busybox -c busybox3 -- ls
kubectl get pods --show-labels
kubectl get pods -l env=dev
kubectl get pods -l 'env in (dev,prod)'
k create deploy deploy1 --image=nginx -oyaml --dry-run=client
k run tmp --rm --image=busybox -it -- wget -O- google.com
#####

Creating objects

Pod

kubectl run --image nginx --restart=Never mypod

Deployment

kubectl create deployment my-dep --image=nginx --replicas=2 --port=80

Create service

kubectl expose pod mypod --port 80 --target-port 80
kubectl expose deployment my-dep
kubectl expose deployment my-dep