Kubernetes

Overview

  • Automatically deploying and managing containers is called container orchestration
  • K8s is a container orchestration tool/technology
  • Other alternatives to K8s are Docker Swarm and Mesos

Cluster architecture

  • A K8s cluster is a set of machines (or nodes) running in sync
  • One of the nodes is the master node, responsible for the actual orchestration
  • kube-scheduler schedules pods on nodes based on node capacity, load on the node and other policies. This runs in the kube-system namespace
  • kubelet runs on worker nodes, listens for instructions from kube-apiserver and manages containers
  • kube-proxy enables communication between services within the cluster
  • The kubectl tool is used to deploy and manage applications on k8s clusters
  • As k8s is a container orchestration tool, we also need a container runtime engine like docker

ETCD

  • ETCD is a distributed, reliable key-value store that is simple, secure and fast
  • K8s uses an etcd cluster on the master node to store information like which node is master, nodes, pods, configs, secrets, accounts, roles, bindings and other cluster state
  • etcdctl is a command line tool which comes with the ETCD server
./etcdctl set key1 value1
./etcdctl get key1
  • All the kubectl get command output comes from etcd server
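  • A hedged example of listing the keys k8s stores in etcd, assuming a kubeadm cluster where the etcd pod is named etcd-master in kube-system and certificates are at the default kubeadm paths
kubectl exec etcd-master -n kube-system -- etcdctl get / --prefix --keys-only --limit=10 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key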

Kube-API server

  • kube-apiserver is used for all management related communication in a cluster. It runs on the master node
  • When we run a command from kubectl, it reaches the kube-apiserver, which authenticates and validates the request, then interacts with the etcd server and returns back the response
  • We don't necessarily need to use kubectl, we can directly make requests (e.g. a POST request using curl) to create a pod, as sketched below
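  • A minimal curl sketch, assuming client certificates have already been generated (file names are placeholders) and pod-definition.yaml contains a pod spec
curl -X POST https://kube-apiserver:6443/api/v1/namespaces/default/pods \
  -H "Content-Type: application/yaml" \
  --data-binary @pod-definition.yaml \
  --key admin.key --cert admin.crt --cacert ca.crt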

Kube controller manager

  • A controller continuously monitors the state of the system and takes necessary action to bring the system back to the desired state in case of problems
  • The kube controller manager interacts with kube-apiserver to get cluster info. This also runs on the master node and is the brain of the cluster
  • There are many controllers in k8s, listed below are 2 of those
    • Node controller: Monitors node health and, when a node goes down, takes necessary action so that its pods are brought up elsewhere
    • Replication controller: Ensures the desired number of pods are running at all times within a set

Kube scheduler

  • Decides which pod goes to which node so that the right container ends up on the right node. It decides on the basis of the CPU/memory available on each node and required by the container, and finds the best fit
  • This also runs on master node

Kubelet

  • This runs on worker nodes and is responsible for receiving instructions from the kube-apiserver and sending back status reports for that worker node
  • kubelet needs to be installed manually on worker nodes, it is not installed automatically by kubeadm like the other components

kube proxy

  • This runs as a daemonset on each node
  • Manages networking in the k8s cluster so that each pod in the cluster is able to communicate with every other pod

Pods

  • A pod is a single instance of an application
  • The recommendation is to run a single container per pod, but a pod can have multiple containers in some cases, e.g. an application container may have helper containers which go in the same pod. Containers running in the same pod can communicate using localhost itself (see the commands below)
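  • A minimal usage sketch of creating and inspecting a pod imperatively
kubectl run nginx --image=nginx  # Create a pod named nginx with the nginx image
kubectl get pods                 # List pods
kubectl describe pod nginx       # Detailed info about the pod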

Setup

  • Install kubectl utility first to interact with k8s cluster

Minikube

  • minikube is the easiest way to install a k8s cluster, it installs all components (etcd, container runtime, ...) on a single machine/node
minikube start                          # Start minikube
minikube stop                           # Stop minikube
minikube service appname-service --url  # Get external URL of appname

Kubeadm

  • kubeadm is a more advanced tool to create a multi-node k8s cluster
  • We can use a tool like vagrant to create VMs on a machine to have multiple nodes in k8s - master and worker(s)

YAML file

  • K8s works on yaml definition files, each expects 4 top level fields
    • apiVersion
    • kind
    • metadata
    • spec
  • Below is a sample yaml file to deploy a pod with an nginx container
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
    tier: frontend
spec:
  containers:
  - name: nginx
    image: nginx
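  • To create the pod from this file and verify it (a minimal usage sketch, assuming the file is saved as pod.yaml)
kubectl apply -f pod.yaml
kubectl get pods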

Replication controllers

  • Controllers are the brain behind the k8s cluster, they are processes which monitor k8s objects and take the desired action
  • A replication controller ensures a specified number of pods are running at all times. It also helps with load balancing across pods - scaling
  • kind = ReplicationController; pod and replicas info is present in the spec section of the yaml file
apiVersion: v1
kind: ReplicationController
metadata:
  name: ...
  labels:
    ...
spec:
  template:
      <pod-definition>
  replicas: ...

Replica sets

  • This serves the same purpose as the replication controller; ReplicationController is the older technology and ReplicaSet is the recommended way
  • apiVersion = apps/v1 and kind = ReplicaSet. The spec section remains the same as above, with one more param called selector used to select which pods to monitor. It is possible that pods with the given labels already exist (or some exist); in that case the replica set won't create those pods but will just monitor them to maintain the desired number of pods
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: ...
  labels:
    ...
spec:
  template:
      <pod-definition>
  replicas: ...
  selector:
      matchLabels:
        ...
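  • To scale a replica set we can either update replicas in the definition file and re-apply, or use the scale command (file name below is a placeholder)
k replace -f replicaset-definition.yaml            # After updating replicas in the file
k scale --replicas=6 -f replicaset-definition.yaml
k scale --replicas=6 replicaset myapp-replicaset   # Using type and object name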

Deployments

  • Provides capability for rolling updates, rollback, pausing and resume changes
  • Deployments come higher in the hierarchy than replica sets (deployment > replica set > pods)
  • The yaml file is almost the same as for replica sets, but kind = Deployment for a deployment object
  • On deploying, it creates a new replica set which in turn creates pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ...
  labels:
    ...
spec:
  template:
      <pod-definition>
  replicas: ...
  selector:
      matchLabels:
        ...
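  • Useful imperative commands for deployments
k create deployment nginx --image=nginx --replicas=3  # Create a deployment
k get deployments                                     # List deployments
k get all                                             # Deployments, replica sets, pods, services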

Namespaces(ns)

  • A default ns is automatically created when a cluster is set up
  • kube-system (for system components like DNS and networking) and kube-public (for keeping public resources) are other ns created at cluster startup
  • Each ns has
    • Isolation: Each ns is isolated from the others, e.g. we can have a cluster with 2 ns, dev and prod, isolated from each other. We can access resources/services deployed in another ns by qualifying the service name with the ns, like web-app.dev.svc...
    • Policies: Each ns can have different policies
    • Resource limits: We can define a different quota of resources in each ns
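  • Common namespace commands
k create namespace dev                          # Create a namespace
k get pods -n dev                               # List pods in the dev namespace
k get pods --all-namespaces                     # List pods across all namespaces
k config set-context --current --namespace=dev  # Switch the default namespace for kubectl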

Services

  • A service is a k8s virtual object which enables communication between various internal and external components, like access from a browser or between frontend and backend services
  • Enables loose coupling between the microservices in our application

We have 3 types of services in k8s

NodePort

  • This service is used to make an internal service (like a webserver) accessible to users (the outside world) on a port. It exposes the application on a port on all nodes
  • Node ports can range from 30000 to 32767
apiVersion: v1
kind: Service
metadata:
    ...
spec:
  type: NodePort
  ports:
    - targetPort: 80
      port: 80
      nodePort: 30008
  selector:
    ...
  • This file has 3 ports - all are with respect to the service
    • targetPort: Pod port, the actual port the application listens on inside the pod
    • port: Service port
    • nodePort: Port exposed on the node to the external world
  • If multiple pods are running with the given selector, the service acts as a load balancer and distributes traffic to the pods randomly
  • For multiple nodes in a cluster, we can access the application using any node's IP and the node port; the NodePort service spans across all nodes in the cluster

ClusterIP

  • Creates an IP (and name) to communicate between sets of services, e.g. from frontend services to backend services. This is for internal access only (within the cluster, not bound to a specific node); different microservices communicate using the ClusterIP service
  • This is the default service type
  • K8s creates one ClusterIP service by default named kubernetes
apiVersion: v1
kind: Service
metadata:
    name: backend
...
spec:
  type: ClusterIP
  ports:
    - targetPort: 80
      port: 80
  selector:
    ...
  • Imperative way to create a service
# Expose pod `messaging` on port 6379, with service port 6379
# We can use a deployment, rc or rs instead of a pod
# We can also specify `--target-port` if the pod listens on a different port
k expose pod messaging --name messaging-service --port=6379

LoadBalancer

  • Used to create a single endpoint like http://some-domain.com to access the application. The application may be running on multiple nodes; this helps us create a common name to access it. Without this we would have to access the app using a specific nodeIP:port, which is hard to remember and will change when a node restarts (it may get a new IP)
apiVersion: v1
kind: Service
...
spec:
  type: LoadBalancer
  ports:
    - targetPort: 80
      port: 80
      nodePort: 30008
  selector:
    ...
  • When using LoadBalancer on cloud providers like AWS or GCP, k8s sends a request to the cloud provider to provision a load balancer which can be used to access the application

Imperative vs declarative approach

Imperative

  • Providing step-by-step instructions, written out, on what to do and how to do it
  • In k8s, anything done using a kubectl command except apply is the imperative approach, like kubectl run, edit, expose, create, scale, replace, delete, ...
  • This is faster, we just have to run the right command - a yaml file is not always required. Use this in the certification exam to save time

Declarative

  • Using tools like terraform, chef, puppet, ansible. These do a lot of error handling and maintain the state of the steps done so far

  • In k8s, this is done using the kubectl apply command, which checks the current state of the system and performs only the relevant actions

  • It is recommended not to mix the imperative and declarative approaches; a small comparison is sketched below
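# Imperative: tell k8s exactly what to do
k run nginx --image=nginx
k scale deployment nginx --replicas=3

# Declarative: describe the desired state in a file and let k8s converge to it
# (nginx-deployment.yaml is a placeholder file name)
k apply -f nginx-deployment.yaml  # Works for both create and update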

Networking

  • Each pod is assigned an IP
  • K8s does not handle pod-to-pod networking itself, so in a multi node cluster we have to set up the networking on our own using other networking software like vmware nsx, etc.

Scheduling

  • The scheduler assigns a node to a pod: when we deploy a pod, a property called nodeName (in the spec section) is set on the pod with the name of the node where the pod has to run
  • If a pod doesn't get a nodeName assigned to it, the pod remains in Pending state
  • We can also assign nodeName manually - by setting this nodeName property in our deployment/pod yaml file, as shown below
  • Note that we can't change the nodeName of a running pod; to mimic this behaviour we create a Binding object and send it in a POST request for the pod
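  • A minimal sketch of manually placing a pod on a node (node02 is a placeholder node name)
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeName: node02  # Schedule this pod on node02, bypassing the scheduler
  containers:
  - name: nginx
    image: nginx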

Taint and tolerations

  • We can taint certain nodes so that only specific pods can be scheduled on those nodes. This is useful when we want to reserve some nodes for a specific use case
  • For the specific pods which should be scheduled on tainted nodes, we add tolerations; these make the pods tolerant to the taint so they can get scheduled on the tainted nodes
  • The below command can be used to taint a node
k taint nodes node-name key=value:taint-effect  # Sample

k taint nodes node1 app=blue:NoSchedule  # Example
  • <taint-effect> specifies what happens to pods which do not tolerate this taint, it can have 3 values

    • NoSchedule: Don't schedule those pods on this node
    • PreferNoSchedule: The system will try to avoid scheduling on this node but that's not guaranteed
    • NoExecute: Don't schedule new pods, and existing pods which don't tolerate the taint will be evicted. This can happen if some pods got scheduled on the node before it was tainted
  • We can add tolerations to pods in yaml definition file in spec section

...
spec:
  ...
  tolerations:
  - key: app
    operator: Equal
    value: blue
    effect: NoSchedule
  • When we create a cluster, a taint is applied on the master node so that no pod (workload) is scheduled on master nodes. This can be checked using the below command
k describe node <master-node-name> | grep Taint
  • Untaint node
kubectl taint nodes <nodeName> node-role.kubernetes.io/master:NoSchedule-
  • Tainting a node only restricts which pods that node will accept; it doesn't guarantee that a specific pod gets scheduled on a specific node. A tolerant pod can still be scheduled on any node in the cluster. If we have a requirement to schedule some pods on specific nodes, this can be achieved using node affinity

Node selector

  • To schedule a pod on a specific node we can use nodeSelector in the spec section of the pod definition yaml file
...
spec:
  ...
  nodeSelector:
    size: Large
  • The size: Large given above is a label that we have to add on nodes using the below command
k label nodes node-1 size=Large
  • Node selectors have limitations, e.g. they don't support complex selection filters like "schedule the pod on medium or large nodes" or "don't schedule on small nodes"; for these use cases we can use node affinity

Node affinity

  • We can add affinity in spec section of pod yaml definition file, to select specific nodes for scheduling pods
...
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - Large
            - Medium
  • Other operators we can use are Exists (doesn't need a value), NotIn, etc
  • Other node affinity types are preferredDuringSchedulingIgnoredDuringExecution and (planned) requiredDuringSchedulingRequiredDuringExecution
    • Scheduling: Starting the pod - assigning a node to the pod
    • Execution: The pod is already scheduled and in running state. This part matters when a pod is already running on a node and someone changes the node labels

Resource requirements and limits

  • When the scheduler tries to schedule a pod, k8s checks the pod's resource requirements and places it on a node which has sufficient resources
  • By default a container is assumed to request 0.5 CPU and 256 Mi of RAM for scheduling; this can be modified by adding a resources section under the container in the pod yaml definition
...
spec:
  containers:
  - name: ...
    image: ...
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
  • 1 CPU = 1000m = 1 vCPU = 1 AWS vCPU = 1 GCP core = 1 Azure core = 1 hyperthread. m stands for millicore
  • It can be as low as 0.1 CPU, which is 100m
  • For memory
    • 1 K (kilobyte) = 1,000 bytes
    • 1 M = 1,000,000 bytes
    • 1 G = 1,000,000,000 bytes
    • 1 Ki (kibibyte) = 1,024 bytes
    • 1 Mi = 1,048,576 bytes
    • ...
  • While a container is running its resource usage can grow, so by default k8s sets a limit of 1 vCPU and 512 Mi on containers; this can also be changed by adding a limits section under the resources section
...
spec:
  containers:
  - name: ...
    image: ...
    resources:
      requests:
        ...
      limits:
        memory: "2Gi"
        cpu: 2
  • If a container tries to use more CPU than its limit, it is throttled; if it exceeds its memory limit, the container is terminated (OOMKilled)

Daemon sets

  • A daemon set ensures that one copy of a pod is always running on all nodes in the cluster; when a new node is added to the cluster the daemon set pod starts running on that node too
  • Some applications of daemon sets
    • Monitoring agents
    • Log collectors/viewers
  • kube-proxy runs as a daemon set
  • The YAML definition of a daemon set is similar to a replica set; the change is in the kind only, other params are the same
apiVersion: apps/v1
kind: DaemonSet
...
spec:
  ...
  template:
    ...
  • K8s (v1.12 onwards) uses nodeAffinity and the default scheduler to place daemon set pods on each node

Static pods

  • Suppose we don't have a master node in the cluster (which has the kube-apiserver, etcd server and other components), we only have worker nodes with kubelet; now we can't create resources because there is no kube-apiserver to give instructions to kubelet
  • In this scenario we place pod definition yaml files at the pod manifest path, which is by default /etc/kubernetes/manifests; kubelet watches this path and creates a pod for any definition file it finds there, and if the definition file is later deleted the pod also gets deleted. Pods created this way are called static pods
  • kubelet only understands pods, so we can only create pods this way, not deployments or replica sets
  • The pod manifest path (staticPodPath) can be configured when running the kubelet service. To find the current path, check the --config option passed to the running kubelet binary (ps -eaf | grep kubelet); --config points to the kubelet config file which contains staticPodPath and other settings
  • Once a static pod is created we can't use kubectl get pods to check it, because kubectl talks to the kube-apiserver which is not running; in this case we can use container runtime commands such as docker ps
  • This is used to deploy control plane components while bootstrapping a k8s cluster - etcd, api-server, controller-manager, scheduler (these pods have the node name - master or controlplane - appended to their name)

Multiple schedulers

  • Apart from the default k8s scheduler running on the master node, we can also deploy our own scheduler
  • A custom scheduler can be deployed just like any other pod with some name - the image should be k8s.gcr.io/kube-scheduler:v1.20.0, with a command section which contains various options like leader-elect, port, ...
  • While deploying another service pod we can select our custom scheduler using the schedulerName option in the spec section, as sketched below
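  • A minimal sketch of a pod that asks for a custom scheduler (my-custom-scheduler is a placeholder name)
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  schedulerName: my-custom-scheduler  # Use the custom scheduler instead of the default one
  containers:
  - name: nginx
    image: nginx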

Monitoring and logging

  • K8s has a monitoring component called metrics server which keeps all cluster metrics; this is an in-memory solution so we won't get historical data
  • kubelet running on each node has another component called cAdvisor (container advisor) which retrieves performance metrics from pods and exposes them to the metrics server through the kubelet APIs
  • We can install metric server using
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  • To view metrics, we can use below top commands
k top node  # See CPU and memory usage of all nodes
k top pod   # See CPU and memory usage of pods in current namespace
  • To view pod logs
k logs <podName>     # Get all logs from beginning to now
k logs -f <podName>  # Stream logs

# If a pod has multiple containers in it, specify the container name also
k logs -f <podName> <containerName>

# Check logs from all containers in a pod
k logs -f <podName> --all-containers

# Previous pod logs
k logs -f <podName> --previous

# If core components are down then kubectl commands won't work; we can use journalctl to see logs of components like kubelet
journalctl -u kubelet  # Get kubelet logs

# We can also use docker commands to get logs if kubectl is not working
docker logs <docker-id>

Application lifecycle management

Rolling updates and rollback

  • When a deployment is created, a rollout is triggered and creates a revision of the deployment. On every subsequent deployment this revision is updated, which helps us roll back to a previous version if necessary
  • To see rollouts, use below commands
k rollout status deployment/myapp-deployment   # Get status of rollout
k rollout history deployment/myapp-deployment  # Get rollout history
  • K8s supports 2 deployment strategies
    • Recreate: Delete all existing pods and then create new ones. This causes some downtime while the new version comes up. This is NOT the default deployment strategy
    • Rolling update: This takes down one object (or pod) at a time and deploys the newer version one by one. This way the application never goes down and the upgrade is seamless. This is the default deployment strategy in k8s
  • The strategy in use can also be seen by describing a deployment - we get the strategy name and how pods got updated, rolling or recreate
  • Under the hood, a deployment creates a replica set which creates the required number of pods. When the deployment is updated, a new replica set is created and the new pods are started in it while pods in the existing replica set are scaled down. To see this, use k get replicasets
  • If we notice a problem after upgrading our application/deployment, we can undo the deployment and roll back to the previous revision; it will destroy the pods in the new replica set and bring the old ones up in the old replica set
k rollout undo deployment/myapp-deployment  # Rollback last deployment
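  • To trigger a new rollout we can edit the image in the definition file and re-apply, or update it imperatively (nginx-container is a placeholder container name)
k set image deployment/myapp-deployment nginx-container=nginx:1.9.1
k apply -f deployment-definition.yaml  # After editing the image version in the file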

Commands and arguments

  • In a Dockerfile we have 2 fields
    • ENTRYPOINT: Specifies which command to run
    • CMD: Provides default arguments which go with the command given in ENTRYPOINT
  • In a pod definition file, we can override both of the above using the command and args options respectively
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleeper
spec:
  containers:
  - name: ubuntu-sleeper
    image: ubuntu-sleeper
    command: ["my-sleep"]  # Run this command (overwritten `ENTRYPOINT`)
    args: ["10"]  # Pass argument 10 with above command (overwritten `CMD`)

Configure environment variables

We can specify environment variables for a pod in its definition file using the env or envFrom parameter. There are 3 ways to provide values for env vars

Directly specifying name and value
env:
  - name: APP_COLOR
    value: pink
ConfigMap

ConfigMaps are used to keep all configuration required by an application in a central place; a ConfigMap can be referenced in the pod definition file and all its name/value pairs become available

  • ConfigMaps can be created using imperative way or declarative way
# Imperative approach
k create configmap <cm-name> --from-literal=<key1>=<value1> --from-literal=<key2>=<value2>
k create configmap <cm-name> --from-file=<path-to-file>  # Can use file with all key/val also

k get configmaps
k describe configmaps
  • Declarative approach
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_COLOR: blue
  APP_MODE: prod
  • To use above configMap in pod definition file use envFrom
envFrom:
  - configMapRef:
      name: app-config  # ConfigMap name
  • The above config injects all env vars from the configMap into the pod; we can also pick selected keys
env:
  - name: APP_COLOR
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: APP_COLOR
  • ConfigMaps can also be mounted as volumes
volumes:
  - name: app-config-volume
    configMap:
      name: app-config
Secrets

Secrets can be used to store any sensitive information. Same as configMaps, but the data is kept base64 encoded (encoded, not encrypted)

  • Imperative way to create secret
k create secret generic <secret-name> --from-literal=<key1>=<value1> --from-literal=<key2>=<value2>
k create secret generic <secret-name> --from-file=<path-to-file>  # Can use file with all key/val also

k get secrets
k describe secrets
  • Declarative way: To use the declarative way, values should be base64 encoded; if we don't want to encode to base64 we can use the stringData field instead of data
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
data:
  DB_Host: bXlzcWwK
  DB_User: cm9vdAo=
  DB_Password: YWJjMTIzCg==
  • To use above secret in pod definition file use envFrom
envFrom:
  - secretRef:
      name: app-secret  # secret name
  • The above config injects all env vars from the secret into the pod; we can also pick selected keys
env:
  - name: APP_COLOR
    valueFrom:
      secretKeyRef:
        name: app-secret
        key: DB_Password
  • Secrets can also be mounted as volumes. When mounted, one file is created inside the container for each key in the secret
volumes:
  - name: app-secret-volume
    secret:
      name: app-secret
# 3 files are created corresponding to the 3 keys in the secret
ls /opt/app-secret-volumes
DB_Host  DB_Password  DB_User

Multi container pods

  • There can be cases when we need 2 services to work together - scale up/down together, share the same network (accessible via localhost), share the same volumes. An example would be a web server and a logging service
  • Use 2 containers defined in containers section of spec
...
spec:
  containers:
  - name: sample-app
    image: sample-app:1.1
  - name: logger
    image: log-agent:1.5
  • 3 multi-container pod design patterns [discussed in CKAD course]
    • sidecar: For example using logging service with app container
    • adapter
    • ambassador

Init containers

  • Init containers are used for doing some task before the actual container starts, like waiting for another service to be up or checking out source code from a repository. They execute only once, at the beginning
  • Similar to containers but defined under the initContainers section in the spec section - it is a list, so a pod can have multiple init containers and they execute in sequence as defined
  • If an init container fails, the whole pod is restarted
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']

Cluster maintenance

OS upgrades

  • If a node goes down, k8s waits for the pod eviction timeout (default 5 mins) before rescheduling that node's pods onto other nodes. If the node comes back up before that, pods are started on the same node
  • For maintenance purposes we can drain a node so that all its pods get scheduled on other nodes. Pods which are not managed by a deployment or replicaSet are lost and not scheduled elsewhere (kubectl warns about this; they can still be deleted using the --force option)
k drain <node-name>
k drain <node-name> --ignore-daemonsets
  • When the node is back again, we can uncordon it to make the node available for scheduling new pods
k uncordon <node-name>
  • There is another command, cordon, which makes a node unschedulable for new pods; existing pods keep running on that node
k cordon <node-name>

Kubernetes versions

# This gives client and server versions
# client = kubectl version
# server = kubernetes version
k version
k version --short

k get nodes  # Also gives kubelet version running
  • Version = x.y.z, where x = major version, y = minor version, z = patch version

Cluster upgrade process

  • kube-apiserver is the main component in a k8s cluster; if it is at (minor) version X then
    • controller-manager and kube-scheduler can be at most one version lower than X
    • kubelet and kube-proxy can be at most 2 versions lower than X
    • None of them can be at a higher version than X
    • However kubectl can be at any version between X - 1 and X + 1
  • At any given point in time, the k8s community supports the latest 3 minor versions
  • 2 steps in upgrading a cluster
    • First upgrade master nodes: While the master node upgrade is in progress, workloads on worker nodes continue to work but management functions won't, e.g. we can't create or delete a pod, and if a pod crashes it won't be rescheduled
    • Then worker nodes: We have 3 strategies for this
      • Upgrade all worker nodes at the same time - requires downtime
      • Upgrade one node at a time - a kind of rolling upgrade
      • Add new nodes with the upgraded version, then remove the existing nodes
  • The recommended approach is to upgrade one minor version at a time - do not skip versions
kubeadm upgrade plan

# This command does not upgrade kubelet; we have to upgrade kubelet by going (ssh) to each node and upgrading it there
kubeadm upgrade apply <version>
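  • A hedged sketch of upgrading one worker node at a time, assuming a kubeadm cluster on Debian-based nodes (package versions are placeholders)
k drain node01 --ignore-daemonsets                       # Run from master: move workloads off the node
# On node01:
apt-get install -y kubeadm=1.28.x-00 kubelet=1.28.x-00   # Install upgraded packages
kubeadm upgrade node                                     # Upgrade the node's kubelet config
systemctl restart kubelet
# Back on master:
k uncordon node01                                        # Make the node schedulable again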

Backups and restore methods

We need to back up the below components in a cluster

  • Resource configs: Take a backup of all resources deployed (either using the imperative or declarative way)
    • We can use the below command to get all deployed resources
    k get all --all-namespaces -o yaml > all-resources.yaml
    • There are other solutions already built for taking backups of all resources, like Velero (by Heptio)
  • ETCD cluster: Stores the state of the cluster; etcd also runs as a static pod on master nodes
    • Taking a backup of ETCD also captures all resource information
    • We can take a backup of ETCD using the etcdctl command
    # trusted-ca-file, cert-file and key-file can be obtained from the description of the etcd Pod
    ETCDCTL_API=3 etcdctl --endpoints=https://<IP>:2379 \
      --cacert=<trusted-ca-file> --cert=<cert-file> --key=<key-file> \
      snapshot save <backup-file-location>
    
    # To restore this snapshot
    # 1 - we can first stop kube api server
    service kube-apiserver stop
    
    # 2 -  then restore from backup
    ETCDCTL_API=3 etcdctl restore <snapshot-name>.db --data-dir=/var/lib/etcd-from-backup
    
    # 3 - update etcd with new path, etcd is static pod so update manifests file (by default here - /etc/kubernetes/manifests/etcd.yaml)
    
    # 4 - reload service daemon and restart etcd service
    systemctl daemon-reload
    service etcd restart
    
    # 5 - start kube apiserver
    service kube-apiserver start
  • Volumes

Security

All communication between the various k8s components is TLS based

Authentication

  • kube-apiserver serves all requests to the cluster so it is responsible for authenticating those requests. A user can send requests using the kubectl command or curl
  • Authentication can be done using the methods below; it is configured while starting kube-apiserver
  • Basic authentication
    • Static password file: Username and password are kept in a csv file. When requesting with curl we can use the -u option to specify username/password
    • Static token file: Instead of a password, a token is kept in the file. This token can be sent in the header of the HTTP request
    • Note: Both of the above methods are not recommended as they are not secure, so we use certificate based authentication

TLS

  • Symmetric encryption: The same key is used for encryption and decryption. The problem is sharing that key between client and server securely
  • Asymmetric encryption: Uses 2 keys
    • Public key: For encryption
    • Private key: For decryption
  • ssh also uses asymmetric encryption - ssh-keygen generates a public and private key pair. The private key is used to log in to the server and the public key is used to lock access to the server
HTTPS flow
  • Key exchange - PKI (Public key infrastructure)
    • The server shares its public key (certificate) with the client
    • The client generates an encryption key and sends it back to the server - this key is encrypted using the public key and can only be decrypted by the server using its private key
    • Now both client and server have exchanged the encryption key securely, and it can be used to encrypt further messages
  • Domain authorization
    • Along with the public key (from server to client), a digital certificate is also sent which is signed/approved/authorized by a certificate authority (CA) to confirm that the domain actually is what it says - like xyz.com is actually xyz.com and not someone else with a fraudulent identity. Some popular CAs are symantec, digicert, globalsign, ...
    • The domain owner has to generate a certificate signing request (CSR) and send it to the CA; the CA then verifies all details and sends back the signed certificate
    • How are CAs validated? Each CA also has a pair of public and private keys (these are called root certificates); they sign certificates using the private key, and their public keys are stored in each client (like browsers), so the client can verify that a certificate was signed by an authorized CA
    • For internal use, we can host our own CA and sign certificates ourselves
  • Naming conventions
    • Public key(certificate): *.crt, *.pem
    • Private key: *.key, *-key.pem
  • Note: A private key can also be used to encrypt data which can then be decrypted with the public key, but this is never done for secrecy because anyone having the public key would be able to decrypt it
  • Everything mentioned above verifies that we are communicating with the right server using its certificate; there can also be cases when the server needs to verify that it is communicating with the correct client and can ask the client for a client certificate

TLS in k8s

Based on interaction, we can have server and client components in k8s. Each component will have its own certificate

  • Server
    • kube-apiserver: apiserver.crt, apiserver.key
    • etcd: etcdserver.crt, etcdserver.key
    • kubelet: kubelet.crt, kubelet.key
  • Client: All the below clients talk to kube-apiserver
    • User(admin): admin.crt, admin.key
    • kube-scheduler: scheduler.crt, scheduler.key
    • kube-controller-manager: controller-manager.crt, controller-manager.key
    • kube-proxy: kube-proxy.crt, kube-proxy.key
  • We also need at least one CA to sign certificates for all the above components; the CA itself also has a certificate pair - ca.crt, ca.key

TLS in k8s - certificate creation

  • Generate CA self signed certificates - root certificates
# 1. Generate keys
openssl genrsa -out ca.key 2048

# 2. Certificate signing request
openssl req -new -key ca.key -subj "/CN=KUBERNETES-CA" -out ca.csr

# 3. Sign certificates
openssl x509 -req -in ca.csr -signkey ca.key -out ca.crt

# Now for all other certificates, we will use this key pair to sign them
  • Generate certificates for other components and sign using above CA - like admin user certificate
# 1. Generate keys
openssl genrsa -out admin.key 2048

# 2. Certificate signing request
openssl req -new -key admin.key -subj "/CN=kube-admin" -out admin.csr
openssl req -new -key admin.key -subj "/CN=kube-admin/O=system:masters" -out admin.csr  # Admin user

# 3. Sign certificates - using CA key pair
openssl x509 -req -in admin.csr -CA ca.crt -CAkey ca.key -out admin.crt
  • Now that we have the admin user certificate, we can use it in 3 ways
    • curl command
    curl https://kube-apiserver:6443/api/v1/pods \
      --key admin.key --cert admin.crt \
      --cacert ca.crt
    • kubectl command
    kubectl get pods \
      --server kube-apiserver:6443 \
      --client-key admin.key \
      --client-certificate admin.crt \
      --certificate-authority ca.crt
    • Specifying certificates with each command is not very handy, so we add this information to a kubeconfig file and then specify that file with the command
    kubectl get pods --kubeconfig ~/.kube/<config-name>
    
    # By default kubeconfig file used is ~/.kube/config
  • Note: Each component should have the root certificate file (ca.crt) present with it

View certificate details

  • We should know how the cluster was set up, e.g. if the cluster was set up using kubeadm then all certificates are placed at /etc/kubernetes/pki/
  • If we want to know the details of a component's certificate, we can use the below command - it will print details like expiry, issuer, alternate names, ...
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout

# Decode CSR file
openssl req -in filename.csr -noout -text

Certificates API

  • The kubeadm tool creates a pair of CA keys (public and private) and places them on the master node, so the master becomes our CA server. All new CSRs go to the master for signing
  • When a new user wants to access the cluster, they create a CSR and send it to the admin; the admin then creates a CSR object using a yaml manifest file - kind: CertificateSigningRequest
...
kind: CertificateSigningRequest
...
spec:
  ...
  request:
    <base64 encoded CSR>
  • Now admin can use kubectl commands to view/approve CSRs
k get csr                     # Get list of all CSRs
k certificate approve <name>  # Approve CSR
k get csr <name> -o yaml      # Gives user certificate in base64 format
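  • A minimal sketch of extracting the approved certificate from the CSR object (jane is a placeholder CSR name)
k get csr jane -o jsonpath='{.status.certificate}' | base64 -d > jane.crt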
  • On the master node all certificate related operations are taken care of by the controller-manager - it has csr-approving and csr-signing controllers
  • To sign CSRs, the controller manager should have the root certificates (CA key pair) - while starting, the controller manager accepts them via --cluster-signing-cert-file and --cluster-signing-key-file

Kubeconfig

  • The kubeconfig file has 3 sections
    • Clusters: List of clusters (dev, prod) with the CA root certificate - ca.crt
    • Users: List of users (admin, readonly) with their certificate key pairs (crt and key)
    • Contexts: Combination of the above 2 - which cluster to use with which user, like readonly@prod, admin@dev, ... At the top level of the config file we also have a default (current) context to use if we don't explicitly choose one
k config view                       # See current kubeconfig file
k config use-context prod@readonly  # Change current context. This command updates the `current-context` field in kubeconfig file

# Use some other kubeconfig (default is ~/.kube/config)
export KUBECONFIG=/path/my-kube-config

# Set default context of given kubeconfig to context-1
k config --kubeconfig=/path/my-custom-config use-context context-1
  • We can also set the namespace in the context section of the kubeconfig file to point to a specific namespace; by default it points to the default ns
# Set the context namespace to dev; for subsequent commands we don't have to specify the ns name
k config set-context --current --namespace=dev
  • To debug problems with kubeconfig file, we can use cluster-info command
# Use current kubeconfig
k cluster-info

# Use custom kubeconfig
k cluster-info --kubeconfig=/path/to/kubeconfig
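  • For reference, a minimal sketch of the kubeconfig file structure (names and paths are placeholders)
apiVersion: v1
kind: Config
current-context: admin@dev
clusters:
- name: dev
  cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://kube-apiserver:6443
contexts:
- name: admin@dev
  context:
    cluster: dev
    user: admin
    namespace: default
users:
- name: admin
  user:
    client-certificate: /etc/kubernetes/pki/users/admin.crt
    client-key: /etc/kubernetes/pki/users/admin.key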

API groups

  • Objects in k8s are categorised in different API groups
    • /metrics: Getting metrics
    • /healthz: Get health information
    • /version: Get cluster version
    • /api: Interact with various core resources like pods, configMaps, namespace, etc.
    • /apis: Named APIs, further categorized into below API groups
      • /apps: /v1/deployments, /v1/replicasets, /v1/statefulsets
      • /extensions
      • /networking.k8s.io: /v1/networkpolicies
      • /storage.k8s.io
      • /authentication.k8s.io
      • /certificates.k8s.io: /v1/certificatesigningrequests
    • /logs: For fetching logs
  • Verbs are operations on the resources of an API group, like get, list, update, ...
  • To list all API groups we can do a curl on cluster domain name
curl http://<api-server>:6443

# The above command will fail because we haven't specified certificates, so we can use `kubectl` to start a proxy client which takes the certs from `kubeconfig` and runs on localhost
kubectl proxy
Starting to serve on 127.0.0.1:8001

# Now we can access the cluster with curl via this proxy - it uses credentials from kubeconfig and forwards requests to the api server
curl http://localhost:8001  # List all API groups
curl http://localhost:8001/version
curl http://localhost:8001/api/v1/pods

Authorization

  • Once a user/machine gains access to the cluster, what it can do is defined by authorization
  • Authorization mechanisms
    • Node: Used by agents inside the cluster like kubelet; these requests are authorized by the Node authorizer. If the name in a certificate has a system prefix like system:node, it is a system component and is authorized using the node authorizer
    • ABAC: Attribute based access control, for external access
      • This associates user(s) with a set of permissions
      • We can create these policies using kind: Policy
      • Management is harder because we have to update the policy for each user whenever permissions need to change
    • RBAC: Role based
      • Instead of a user(s) <> permission mapping, we create a role like developer or security-team with a set of permissions, then associate users with the role
    • Webhook: Outsource authorization to other tools like open policy agent
  • We can set authorization-mode in kube-apiserver (by default it is AlwaysAllow); it can have multiple values like Node,RBAC,Webhook - a request is checked against each module in order until access is granted or the chain ends

RBAC

  • To create a role, we create a Role object. In the rules section we add various access permissions. A Role is namespace scoped
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: testing
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "get", "create", "update", "delete"]
- apiGroups: [""]
  resources: ["ConfigMaps"]
  verbs: ["create"]
  • Link user(s) to role - using RoleBinding object
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: devuser-developer-binding
subjects:
- kind: User
  name: dev-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
  • We can also check whether a user (self or another) has access to perform some operation
k auth can-i create deployments
k auth can-i delete nodes
k auth can-i create pods --as dev-user

# Does dev-user have permission to create pods in the test namespace?
k auth can-i create pods --as dev-user --namespace test
  • We can also restrict access to specific resource instances, using the resourceNames field in rules
...
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "create", "delete"]
  resourceNames: ["blue", "green"]
  • Imperative ways
k create role pod-reader --verb=get --verb=list --verb=watch --resource=pods
k create rolebinding pod-reader-binding --role=pod-reader --user=bob --namespace=acme

Cluster role and role bindings

  • Resources in k8s can be namespaced (pods, rs, cm, roles) or cluster scoped (nodes, clusterroles) - we can get the whole list using
k api-resources --namespaced=true   # Get all namespaced resources
k api-resources --namespaced=false
  • clusterrole and clusterrolebinding have cluster scope (remember a role has ns scope) - a role created this way has cluster level access
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-administrator
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "get", "create", "delete"]
  • Link user(s) to cluster role - using ClusterRoleBinding object
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-role-binding
subjects:
- kind: User
  name: cluster-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-administrator
  apiGroup: rbac.authorization.k8s.io
  • Imperative ways
k create clusterrole pod-reader --verb=get,list,watch --resource=pods
k create clusterrolebinding pod-reader-binding --clusterrole=pod-reader --user=root
  • Although a clusterRole is cluster scoped, we can create it for namespaced resources also - it then grants access to that resource across all namespaces. For example, if we create a clusterRole for pods, the role will have access to pods across all namespaces

Service accounts

  • 2 types of accounts
    • User: used by humans like Admin, developer
    • Service: used by machines like build tools, prometheus
    k create serviceaccount dashboard-sa
    k get serviceaccounts
    k describe serviceaccounts dashboard-sa
  • Imperative way
# Grant read-only permission within "my-namespace" to the "my-sa" service account
k create rolebinding my-sa-view \
  --clusterrole=view \
  --serviceaccount=my-namespace:my-sa \
  --namespace=my-namespace
  • A service account has a token which is used by a third party service to access the cluster (the kube-apiserver, e.g. using curl); this token is kept as a secret. The sa can then be associated with a role using RBAC for specific access
  • If the third party service is running in the cluster itself, as a pod, then we can mount this secret as a volume and the pod can access it directly - use the serviceAccountName field in the spec section, as sketched below
  • A default service account is also created in each ns and is mounted into each pod if we don't specify any other
  • automountServiceAccountToken: false - don't mount the service account token into the pod
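  • A minimal sketch of a pod using the dashboard-sa service account created above (image name is a placeholder)
apiVersion: v1
kind: Pod
metadata:
  name: my-kubernetes-dashboard
spec:
  serviceAccountName: dashboard-sa       # Use this sa instead of the default one
  # automountServiceAccountToken: false  # Uncomment to skip mounting the sa token
  containers:
  - name: my-kubernetes-dashboard
    image: my-kubernetes-dashboard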

Image security

  • When we specify an image in a pod definition file, it follows the docker naming convention - image: nginx actually becomes image: docker.io/library/nginx where
    • docker.io is the default registry to look for the image
    • library is the default user/account
    • nginx is the repository name of the image
  • gcr.io is another public registry where all k8s related images are stored, e.g. for end to end testing gcr.io/kubernetes-e2e-test-image/dnsutils
  • Public cloud providers also have container registry services, like ECR from AWS
  • Private registry: Stores images which are not public and requires credentials to access - using docker login
docker login private-registry.io
docker run private-registry.io/apps/internal-app
  • To use a private registry in a pod definition file, we have to create a secret of type docker-registry and specify its name in the pod definition
k create secret docker-registry regcred \
    --docker-server=private-registry.io \
    --docker-username=registry-user \
    --docker-password=registry-password \
    --docker-email=registry-user@org.com
...
kind: Pod
spec:
  containers:
  - name: internal-app
    image: private-registry.io/apps/internal-app
  imagePullSecrets:
  - name: regcred
...

Security context

  • Security context can be set at the pod and/or container level
    • Pod level: Applies to all containers defined in this pod definition
    ...
    spec:
      securityContext:
        runAsUser: 1000  # Default is root, skip this if want to run as root
      containers:
        ...
    • Container level: Applies to a specific container. Note: if set at both pod and container level, the container level settings take precedence
    ...
    spec:
      containers:
      - name: ...
        image: ...
        securityContext:
          runAsUser: 1000
          capabilities:
            add: ["MAC_ADMIN"]
  • We can also add container capabilities, which can only be set at the container level (as in the above example)

Network policy

  • 2 types of traffic - Ingress and Egress
  • Ingress: Traffic coming into the server/network
  • Egress: Going out of server/network
  • Replying back to the client does not require egress configuration; responses are allowed by default
  • Ingress or egress is always looked at from that specific server's perspective - e.g. for a DB we only require ingress traffic
  • K8s is by default configured with an "all allow" policy, meaning any pod can communicate with any other pod/service within the cluster - using pod IP, name, etc.
  • To restrict traffic we apply a network policy to a pod; this is done by selecting pods via labels in a NetworkPolicy object. The below example applies a network policy on db so that only api-pods can connect to db on port 3306 - this restricts others, like web server pods, from accessing the db
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          name: api-pod
    ports:
    - protocol: TCP
      port: 3306
  • Network policies are enforced by the networking solution implemented on the k8s cluster, and not all networking solutions support network policies. Solutions that support them include kube-router, calico, romana, weave-net
  • We can further filter down whom to allow with a namespace selector
...
  ingress:
  - from:
    - podSelector:
        matchLabels:
          name: api-pod
      namespaceSelector:
        matchLabels:
          name: prod
...
  • For situations like allowing a backup server which is not deployed as a pod in the cluster, we can also allow a specific IP address
...
  ingress:
  - from:
    - podSelector:
        matchLabels:
          name: api-pod
    - ipBlock:
        cidr: 192.168.5.10/32
...
  • For configuring egress, from in the ingress rule becomes to and the rest remains the same, as sketched below
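  • A minimal egress sketch, assuming the db pods need to push backups to an external server at 192.168.5.10 on port 80
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-egress-policy
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 192.168.5.10/32
    ports:
    - protocol: TCP
      port: 80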

Storage

Storage in docker

Container storage interface

  • Initially k8s only worked with the docker runtime and its code was embedded into k8s, but with other container runtimes coming in (like rkt, cri-o), docker support was moved out of the k8s core and the container runtime interface (CRI) was developed
  • CRI governs the interface so that when a new runtime is developed, it knows how to communicate with k8s and k8s doesn't have to change to support it
  • Similar to CRI, the container networking interface (CNI) and container storage interface (CSI) were developed. CSI is a standard followed by storage drivers to work with any orchestration tool; some storage drivers are portworx, Amazon EBS, etc.
  • CSI defines a set of RPCs (like CreateVolume, DeleteVolume) which are called by the orchestrator and must be implemented by the storage drivers

Volumes

  • Like in containers, pod data also gets deleted when the pod is deleted, so to persist the data we use volumes and mounts
  • We attach a volume to a pod using volumeMounts, referring to one of the volumes defined in the pod
apiVersion: v1
kind: Pod
metadata:
  name: random-num
spec:
  containers:
  - image: alpine
    name: alpine
    command: ["/bin/sh", "-c"]
    args: ["shuf -i 0-100 -n 1 >> /opt/number.out;]
    volumeMounts:
    - mountPath: /opt
      name: data-volume
  volumes:
  - name: data-volume
    hostPath:
      path: /data
      type: Directory
  • Now the pod's /opt maps to the host's /data directory; whatever the pod writes to /opt will be present in the host's /data directory even if the pod dies
  • This approach is not recommended if we have a multi node cluster because the directory is specific to a node, so we use external storage solutions like NFS, AWS EBS, etc. and the corresponding option instead of hostPath; for example for AWS EBS we use awsElasticBlockStore
...
volumes:
- name: data-volume
  awsElasticBlockStore:
    volumeID: <volume-id>
    fsType: ext4

Persistent volumes(PV)

  • In the above section we saw how volumes can be created; the problem is that the volume is defined inside each pod definition. If we have a lot of pods, it is hard to add/manage volumes in each pod, so we create a PersistentVolume centrally and pods use it via a PersistentVolumeClaim to claim persistent storage
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol1
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  awsElasticBlockStore:
    volumeID: <volume-id>
    fsType: ext4

Persistent volume claims(PVC)

  • The admin creates PVs and the user creates a PVC to use the storage
  • When a PVC is created it gets bound to one of the PVs which matches the claim criteria. If the user wants to bind to a specific PV, additional filters (e.g. labels/selectors) can also be provided
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi
  • We can now use the PVC in a pod (or replicaset, deployment) definition
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
  • We cannot delete a PVC if it is being used by a pod - if we try, it stays in Terminating state until the pod is deleted

Storage classes

  • Before creating a PV, we must create the volume with the provider we are using, e.g. with AWS we must provision the EBS volume before the PV - this is called static provisioning
  • To remove this manual step we use storage classes, which take the provisioner name and create the PV automatically for us, e.g. on AWS or GCP - this is dynamic provisioning
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: google-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: regional-pd
  • Then in the PVC we can refer to this storage class using storageClassName: google-storage in the spec section; the rest of the PVC definition remains the same, as sketched below
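  • A minimal PVC sketch referring to the storage class above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: google-storage  # Use dynamic provisioning via this class
  resources:
    requests:
      storage: 500Mi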

Networking

Switching routing

  • To connect 2 hosts, we connect both hosts to a switch via each host's network interface. Using the ip link command we can check the interface(s) on a host
  • A router connects 2 switches (networks). The router IP is conventionally the first address in the network
  • We can have several routers, so a host should know, when sending a packet to a host in another network, which router to use - for this we use gateways (if the host is a room then the gateway is the door). To configure a gateway (route) on a host, we can use the below command
# To reach any IP in network 192.168.2.0/24 use gateway(router) address 192.168.1.1
# Route should be added on all hosts to send packets to hosts on other n/w
ip route add 192.168.2.0/24 via 192.168.1.1

# We can add default route for all other IPs/NW which we don't know
# Any IP for which explicit route is not added, use 192.168.1.1
ip route add default via 192.168.1.1

# See routes added on host
ip route show
route
  • Using a host as a router: Linux by default doesn't forward packets received on one interface to another; this is disabled for security reasons. We can enable it using
echo 1 > /proc/sys/net/ipv4/ip_forward
  • The above setting is not retained across reboots; to persist it, set net.ipv4.ip_forward=1 in the /etc/sysctl.conf file

DNS

  • We can add custom IP to hostname mappings in the /etc/hosts file. This translation of hostname to IP is known as name resolution
  • Managing host/IP mappings like this is hard when the number of hosts increases (and IPs of hosts can also change), so we use a DNS server and configure each host to point to this DNS server for hostname to IP lookups
  • The IP of the DNS server is added in the /etc/resolv.conf file with the nameserver field, so when a host doesn't know the IP for a hostname it asks this DNS server
  • If an entry for the same hostname is present in both /etc/hosts and the nameserver (DNS server), the host first checks the local /etc/hosts file and only goes to the configured DNS server if it is not found there. This ordering can be changed using the /etc/nsswitch.conf file
  • For public internet hosts (google.com, fb.com, ...) we can configure global DNS servers like 8.8.8.8 (by google) in /etc/resolv.conf, or configure our local DNS server to forward to 8.8.8.8 if a name is not found
  • We can add another entry called search in the /etc/resolv.conf file which appends a domain name to the host we want to look up, like
...
search mycompany.com
...

# If we ping `gitlab`, it will change the domain name to `gitlab.mycompany.com` automatically if it exists
# We can have list in search to have multiple items
  • Record types
    • A: Maps hostnames to IPv4 addresses
    • AAAA: Maps hostnames to IPv6 addresses
    • CNAME: Maps one name to another name (like fb.com is same as facebook.com)
  • Tools
    • ping: Simple, gives IP in ping traces
    • nslookup: Resolves using DNS server, it doesn't take into account local /etc/hosts mappings
    • dig: More detailed
  • Using a host as a DNS server: We have various tools for this, coreDNS is one of them. It runs on port 53, which is the default port for a DNS server

Docker networking

Refer this section: https://gist.github.com/hansrajdas/d950ffd99c3ae817b08fd11592dc82eb#docker-networking

Cluster networking

  • In a k8s cluster we can have multiple nodes - master and workers with unique IPs and mac addresses; below are some ports required to be open for each component in a cluster
  • ETCD (on master node): Port 2379, all control plane components connect to it
  • ETCD (on master node): Port 2380, only for etcd peer-to-peer connectivity
  • kube-apiserver (on master node): Port 6443
  • kubelet (on master and worker nodes): Port 10250
  • kube-scheduler (on master node): Port 10251
  • kube-controller-manager (on master node): Port 10252
  • services / NodePort (on worker nodes): Ports 30000-32767
  • NOTE: If things are not working, these ports are one of the first things to verify

Pod networking and CNI

  • K8s doesn't ship with a networking solution, but it requires that each pod gets a unique IP address and that every pod is reachable from every other pod in the cluster (across nodes as well) without having to configure any NAT rules. In small clusters with a couple of nodes we can configure networking/routing using scripts, but for large clusters it becomes hard to manage, so we use the available networking solutions (plugins) that do this, like weaveworks, flannel, cilium, vmware nsx
  • We can specify the CNI/network-plugin options in kubelet component using below args
...
    --network-plugin=cni \
    --cni-bin-dir=/opt/cni/bin \
    --cni-conf-dir=/etc/cni/net.d \
...

CNI weaveworks

  • A weave agent runs on each node and the agents communicate with each other regarding nodes, networks and pods. Each agent stores the topology of the entire setup and knows the pods and their IPs on other nodes
  • weave creates its own bridge on each node, names it weave and assigns an IP address to each network
  • Deployed as a daemon set to run on each node

IP address management - weave

  • CNI plugin (like weave) assigns IPs to pods. In CNI config file /etc/cni/net.d/net-script.conf we specify IPAM configuration, subnets, routes, etc.
  • Weave creates interface on each host with name weave, use ifconfig command to check
  • Weave default subnet is 10.32.0.0/12 which is 10.32.0.1 to 10.47.255.254, around 1,048,574 IPs for pods

Service networking

  • For services themselves refer to the Services section above; this section discusses service networking
  • kube-proxy runs on each node and listens for changes from the kube-apiserver; every time a new service is created kube-proxy gets into action and sets up forwarding for the service IP. Unlike a pod, a service spans across the cluster
  • kube-proxy creates routing rules corresponding to each service created; in these rules the port is also present, e.g. if a packet comes to IP:PORT, forward it to POD-IP. These rules can be created in 3 ways - userspace, iptables (default), ipvs; this can be configured by setting --proxy-mode in the kube-proxy config
  • Service IP range is configured in kube-api-server
kube-api-server --service-cluster-ip-range ipNet  # Default 10.0.0.0/24

# We can see the rules from NAT tables using iptables
iptables -L -t nat | grep <service-name>

# Check kube-proxy logs for routing created and mode/proxier used
cat /var/log/kube-proxy.log

DNS in kubernetes

  • k8s deploys a built-in DNS server by default when we set up a cluster
  • All pods and services are reachable using their IP addresses within the cluster
  • For each service, k8s creates a DNS record by default which maps the service name to the service IP. Within the same namespace we can access the service using just the service name; from another namespace we have to specify the namespace as well
  • Each service name is a subdomain of its namespace name
  • Each namespace is a subdomain of svc
  • svc is a subdomain of the root domain, called cluster.local by default
service name: web-service
namespace: apps

# Within same namespace
curl http://web-service

# From other namespaces, we can use any
curl http://web-service.apps
curl http://web-service.apps.svc
curl http://web-service.apps.svc.cluster.local  # FQDN
  • DNS records for pods are not created by default but can be enabled; once enabled, the record name is the pod IP with dots replaced by dashes (not the pod name). If a pod's IP is 1.2.3.4, then the record 1-2-3-4 maps to 1.2.3.4
curl http://1-2-3-4.apps.pod.cluster.local
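
To check name resolution from inside the cluster, we can run a throwaway busybox pod (web-service and apps are the example names used above):

# Resolves via the search entries kubelet puts in the pod's /etc/resolv.conf
kubectl run --rm -it dns-test --image=busybox:1.28 --restart=Never -- nslookup web-service.apps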

CoreDNS in kubernetes

  • The original k8s DNS component was kube-dns, but from v1.12 onwards CoreDNS is the recommended DNS server
  • CoreDNS is deployed as a deployment (a replicaSet, typically 2 pods) in the cluster and takes its config from a configMap; inside the CoreDNS pod the config is placed at /etc/coredns/Corefile (see the default Corefile below). CoreDNS watches for any new service (or pod, if enabled in the Corefile) and adds an entry to its database
  • To access CoreDNS, a service named kube-dns is also created. Pods are configured (by kubelet) with the kube-dns service IP in the nameserver field of /etc/resolv.conf. This file also has search entries so a FQDN can be built from just service-name or service-name.namespace
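
The default Corefile looks roughly like this (stored in the coredns configMap in kube-system; exact plugins vary by version). The kubernetes plugin block is what creates the cluster.local records:

# View it with: kubectl describe configmap coredns -n kube-system
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}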

Ingress

  • K8s object which acts as an application load balancer (Layer 7) - directs requests to different services based on URL path or host
  • It becomes a single place where SSL can be implemented, independent of all the services behind it
  • Ingress deployment - we need two things
    • Ingress controller: One of the third-party solutions like nginx, HAProxy, etc. K8s doesn't come with a default ingress controller, so one has to be installed. We will use nginx as an example and see what objects are required to deploy the nginx ingress controller
      • Deployment: The image used is a modified version of nginx: quay.io/kubernetes-ingress-controller/nginx_ingress_controller
      • Service: Of type NodePort with a selector matching the above ingress controller
      • ConfigMap: To store nginx config data
      • ServiceAccount: With the roles and bindings needed to access the required objects (Role, ClusterRoleBinding, RoleBinding)
    • Ingress resources: Configuration rules on the ingress controller to route traffic to a specific service based on URL, like p1.domain.com should go to the p1 service, p2... to p2, or domain.com/p1 to p1, and so on. This resource is created using the definition file below (a newer networking.k8s.io/v1 example is shown at the end of this section)
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: ingress-wear
    spec:
      backend:
        serviceName: wear-service  # Route all traffic to wear service
        servicePort: 80
    • We can define rules (with paths) in ingress resources to map traffic from different URLs to specific service
    ...
    spec:
      rules:
      - http:
          paths:
          - path: /wear
            backend:
              serviceName: wear-service
              servicePort: 80
          - path: /watch
            backend:
              serviceName: watch-service
              servicePort: 80
    • We can define rules (with host) in ingress resources to map traffic from different subdomains to specific service
    ...
    spec:
      rules:
      - host: wear.my-online-store.com
        http:
          paths:
          - backend:
              serviceName: wear-service
              servicePort: 80
      - host: watch.my-online-store.com
        http:
          paths:
          - backend:
              serviceName: watch-service
              servicePort: 80
    • Imperative way of creating ingress resources
    kubectl create ingress <ingress-name> --rule="host/path=service:port"
    
    # Example
    kubectl create ingress ingress-test --rule="wear.my-online-store.com/wear*=wear-service:80"
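
Note: the extensions/v1beta1 API used above has been removed in newer k8s versions; with the stable networking.k8s.io/v1 API the same path based rule looks roughly like this:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ingress-wear
    spec:
      rules:
      - http:
          paths:
          - path: /wear
            pathType: Prefix
            backend:
              service:
                name: wear-service
                port:
                  number: 80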

Designing a cluster

  • Cluster design depends on the purpose of the cluster; based on that, it can be built in different ways
    • minikube: Used to deploy a single node cluster very easily. It provisions a VM and then runs k8s on it
    • kubeadm: Used to deploy a multi node cluster. It expects the VMs to already be provisioned
  • There is no native way to run k8s on Windows; we have to provision a Linux based VM on Windows to use k8s
  • For an HA cluster, we use multiple master nodes fronted by a load balancer which directs requests to one of the master nodes. Master nodes have the below components running
    • API server: Active on all master nodes
    • Controller manager (replication & node): Only one is active, the others are on standby; the active one is chosen using leader election
    • Scheduler: Only one is active, the others are on standby; the active one is chosen using leader election
    • ETCD: It is a distributed system, so the API server can reach any of the running ETCD instances for reads or writes
  • ETCD generally runs on the master nodes, but for complex (and HA) clusters we can run ETCD on separate nodes and point the master nodes at them
  • We can run the cluster on-prem or in the cloud. In the cloud, we have the option to self-manage the cluster or use managed solutions like EKS (AWS), GKE (GCP), ...

ETCD in HA

  • ETCD is a distributed, reliable key-value store that is simple, secure and fast
  • A client can connect to any ETCD instance in the cluster and perform read/write operations. Writes go through the leader: if 2 writes come in at the same time on 2 different instances, they are forwarded to the leader, and a write is complete only once the leader gets consent from the other instances in the cluster
  • The leader is elected using the RAFT algorithm - a voting/election kind of mechanism
  • A write is considered successful once a quorum = floor(N/2) + 1 of instances has the write propagated; if fewer than quorum (a majority of) instances are available, the cluster cannot process writes
  • It is recommended to have an odd number of instances for better fault tolerance (see the table below)
  • For installation, we can download the latest binary from GitHub. The etcdctl utility can be used to access the ETCD cluster
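
Quorum vs. fault tolerance for different cluster sizes, which is why an even number of instances adds no extra fault tolerance:

Instances (N)   Quorum (floor(N/2) + 1)   Tolerable failures
1               1                         0
2               2                         0
3               2                         1
5               3                         2
7               4                         3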

Install kubernetes the "kubeadm" way

Steps to setup cluster using kubeadm tool

  • Have multiple hosts and designate one or more as master nodes - we can also use Vagrant to provision virtual machines; this Vagrantfile provisions one master and 2 worker nodes
  • Install a container runtime like docker on each host (master & worker)
  • Install kubeadm, kubelet and kubectl on all hosts (master & worker)
  • Initialize the master node - this sets up all master node components (see the command sketch after this list)
  • Set up a pod networking solution like calico, weave net, etc. on all nodes so that all pods can communicate with each other
  • Join worker nodes to the master node - the join command is printed when kubeadm init runs on the master; run it on each worker node
  • Launch applications - create pods
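
A minimal sketch of the commands involved; the pod network CIDR is an assumption that depends on the CNI chosen, and the token/hash placeholders come from the kubeadm init output:

# On the master node
kubeadm init --pod-network-cidr=10.244.0.0/16

# Configure kubectl for the current user (these commands are printed by kubeadm init)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# On each worker node, using the join command printed by kubeadm init
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>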

Debugging failures

JSON PATH

  • When dealing with a cluster with a large number of nodes and objects, it becomes hard to query each node/object and check for relevant information. Using the jsonpath option of kubectl we can print only the relevant fields, and filter and sort on specific fields
k get nodes -ojsonpath='{.items[*].metadata.name}'        # Prints only node name
k get nodes -ojsonpath='{.items[*].status.capacity.cpu}'  # Prints cpu
...

# Print node name and cpu info
k get nodes -ojsonpath='{.items[*].metadata.name}{"\n"}{.items[*].status.capacity.cpu}'

# We can format output using loops
k get nodes -ojsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.cpu}{"\n"}{end}'

# Using custom columns is another way to printing required information - as above
k get nodes -ocustom-columns=<COLUMN NAME>:<JSON PATH>

# Print node name and CPU
k get nodes -ocustom-columns=NODE:.metadata.name,CPU:.status.capacity.cpu

# We can also use sort-by option to sort according to some value (using json path)
k get nodes --sort-by=.metadata.name

# Filter based on a specific condition - get the context name for user `aws-user`
kubectl config view --kubeconfig=/root/my-kube-config -ojsonpath='{.contexts[?(@.context.user=="aws-user")].name}'

Other stuff

Delete resource stuck in Terminating state

# Example to delete a namespace
kubectl get namespace "ns1" -o json | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" | kubectl replace --raw /api/v1/namespaces/ns1/finalize -f -

Last applied configuration

  • The last applied configuration is kept alongside the live yaml configuration. It helps k8s figure out when something has been removed so it can be deleted from the deployed version. For example, if a label is removed in the new file being applied, k8s checks whether it was present in the last applied config and, if so, deletes it from the deployed version.
  • The last applied configuration is only stored when we use the kubectl apply command; with kubectl create/replace this info is not stored.
  • So 3 things are compared when using kubectl apply command
    • New yaml file
    • Deployed yaml version
    • Last applied configuration
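
The last applied configuration is stored on the live object itself, in the kubectl.kubernetes.io/last-applied-configuration annotation. Two ways to view it (myapp is a placeholder deployment name):

kubectl apply view-last-applied deployment/myapp
kubectl get deployment myapp -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}'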

Labels, selectors and annotations

  • Labels can be applied to k8s objects and used as selectors for filtering the required objects
  • Like labels, we can also have annotations, which hold metadata info like build version, contact details, etc.
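
A small example of labels and annotations on a pod, plus filtering by label (all names and values here are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: web-pod
  labels:
    app: App1
    tier: frontend
  annotations:
    buildversion: "1.34"
spec:
  containers:
  - name: nginx
    image: nginx

# Filter objects using the label as a selector
kubectl get pods --selector app=App1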

Deployments vs stateful sets

  • Deployment - You specify a PersistentVolumeClaim that is shared by all pod replicas. In other words, shared volume. The backing storage obviously must have ReadWriteMany or ReadOnlyMany accessMode if you have more than one replica pod.
  • StatefulSet - You specify a volumeClaimTemplates so that each replica pod gets a unique PersistentVolumeClaim associated with it. In other words, no shared volume. Here, the backing storage can have ReadWriteOnce accessMode. StatefulSet is useful for running things in cluster e.g Hadoop cluster, MySQL cluster, where each node has its own storage.
  • Read more here: https://stackoverflow.com/questions/41583672/kubernetes-deployments-vs-statefulsets
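
A minimal StatefulSet sketch showing volumeClaimTemplates (image, names and sizes are illustrative); each replica gets its own PVC, e.g. data-mysql-0, data-mysql-1:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql            # a headless service with this name is expected to exist
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: example        # illustrative only, use a secret in practice
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:         # one PVC per replica, no shared volume
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi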

Commands

Note: We have used nginx as the pod name in all commands; replace it with your specific pod name. We have aliased the kubectl command to k

alias k=kubectl

Create/run

# Create a pod
k run nginx --image=nginx

# Create pod with label
k run nginx --image=nginx -l tier=msg

# Create pod and expose port
kubectl run httpd --image=httpd:alpine --port=80 --expose

k create deployment httpd-frontend --image=httpd:2.4-alpine
k create namespace dev  # Create 'dev' namespace

# Doesn't create the object, only gives the yaml file
k run nginx --image=nginx --dry-run=client -o yaml > pod-definition.yaml
k create deployment nginx --image=nginx --dry-run=client -o yaml > nginx-deployment.yaml
k create service clusterip redis --tcp=6379:6379 --dry-run=client -o yaml > service-definition.yaml

# Run a pod to debug or run some command like checking nslook from a pod for a service - we can use busybox image
# --rm will delete pod once command is completed or we exit from shell prompt
kubectl run --rm -it debug1 --image=<image>  --restart=Never -- <command>
kubectl run --rm -it debug1 --image=busybox:1.28  --restart=Never -- sh  # Attach with shell

Deploy yaml file

k apply -f filename.yaml
k create -f filename.yaml

# Deploy this in given namespace. This ns info can also be added in yaml definition itself
# to avoid giving in command always, like when creating a pod, it can be added in metadata section
k create -f filename.yaml -n my-namespace

Get

k get all                       # Get all k8s objects deployed
k get pods                      # Get list of all pods in current namespace like default
k get pods -n kube-system       # Get list of all pods in 'kube-system' namespace
k get pods --all-namespaces     # Get pods in all ns
k get pods -o wide              # Gives more info like IP, node, etc.
k get pods nginx                # Get specific pod info
k get pods --show-labels        # Get labels column also
k get pods --no-headers         # Don't print header
k get pods --selector app=App1  # Get pods having "app=App1" label
k get pods -l app=App1          # -l is short for --selector

# Pods running on a node
k get pods -A --field-selector spec.nodeName=<nodeName>

# Using jq - this general command can be used to filter any other parameter
k get pods -A --field-selector spec.nodeName=<nodeName> -o json | jq -r '.items[] | [.metadata.namespace, .metadata.name] | @tsv'

k get replicationcontrollers  # Get list of replica controllers

k get replicaset
k get deployments

k get services

k get daemonsets

k get events

Describe

k describe pod
k describe pod nginx

k describe replicaset myapp-replicaset

k describe deployments
k describe services

k describe daemonsets <name>

Edit

k edit pod nginx  # Opens this pod's yaml in an editor so we can make changes

k edit replicaset myapp-replicaset

Delete

k delete pod nginx

k delete replicaset myapp-replicaset

Scale replicaSets

k replace -f replicaset-definition.yml  # Update the number of replicas in the yaml file and redeploy it

k scale --replicas=6 -f replicaset-definition.yml
k scale --replicas=6 replicaset myapp-replicaset

k scale deployment --replicas=3 httpd-frontend

Others

  • Update the image in a deployment (but take care: the deployment's definition file will still have the originally specified image version)
k set image deployment/myapp-deployment nginx=nginx:1.9.1
  • See all options available for a resource
k explain <kind>           # Format
k explain pod              # See top level options
k explain pod --recursive  # See all options

# See all tolerations options
k explain pod --recursive  | grep -A5 tolerations

# Get node summary like free persistent volume(pv) space, which we can't find with other commands
kubectl get --raw /api/v1/nodes/ip-10-3-9-207.us-west-2.compute.internal/proxy/stats/summary

Certification tip

References
