@prehnRA
Created June 17, 2018 14:54
My learning kubernetes journal

Learning Kubernetes

Questions / Resource Search

  • What are the good books?
  • Are there good online courses?
  • What about figuring out how things go wrong and how to fix them?
  • Anything for AppEng people?

Learn Docker

Learn Kubernetes

  • https://kubernetes.io/docs/tutorials/kubernetes-basics/#basics-modules
  • https://www.katacoda.com/courses/kubernetes/playground
  • https://labs.play-with-k8s.com/

Kubernetes Basics

  • What is Kubernetes?

    • "Kubernetes coordinates a highly available cluster of computers that are connected to work as a single unit."
    • "Allows you to deploy containerized applications to a cluster without tying them specifically to individual machines." - Ok, that's better.
    • "Kubernetes automates the distribution and scheduling of application containers across a cluster in a more efficient way.
  • What's a master?

    • A master manages the cluster (it coordinates activities)-- scheduling applications, maintaining state, scaling, rolling out updates.
    • QUESTION: What happens if the master dies? What're the signs? How can this be avoided?
      • It's bad.
      • The master is actually multiple parts, so it depends on what part dies.
        • If the API server dies, most kubectl commands won't work, and the kubelets can get confused (they use the API to talk to master).
        • If etcd dies, you could lose cluster configuration / data. BAD.
        • If the scheduler dies, then you probably can't get new pods scheduled on the cluster (deployments fail, apps go offline / get unhealthy as their pods fail and can't be replaced)
        • If the controller manager fails, ReplicationControllers stop working which means you probably won't get the right number of replicas of deployments. Also, services & pods probably won't get joined right, because the endpoint controller is in here.
        • If the cloud controller fails, you won't be able to interface with the cloud provider that you are using, so any operations relying on that will fail (if you use LoadBalancers provided by your cloud for instance-- you won't be able to create new ones)
      • The good news is you can configure a clustered master-- running master components across multiple nodes, monitoring the health of these nodes, and automatically replacing them as needed. https://kubernetes.io/docs/admin/high-availability/building/
  • What's a node?

    • A computer (VM or physical machine) that serves as a worker.
    • The "master" bosses these around.
    • Each one has a "kubelet" which talks to the master.
    • "The node should also have tools for handling container operations, such as Docker or rkt."
      • Docker is a container engine. So is rkt (seems to be CoreOS related and security-minded).
    • "A kubernetes cluster that handles production traffic should have a minimum of three nodes."
    • Nodes communicate with the master via the Kubernetes API.
      • TODO: Check out the kubernetes API and see if there is value in learning that directly too.
  • What's a process?

    • On Kubernetes, I think these are all containers that get "scheduled" onto nodes by the master.
  • What is minikube?

    • A way to run kubernetes directly on your machine with one node, in a VM.
  • If I have an app, and I want to push it to kubernetes: how?

    • Is this even a good question? Is this the right "layer" for that?
      • Kubernetes just cares about making deployments. You can make a deployment with the CLI or via the API. A deployment is basically docker image + config. So, to "push an app to kubernetes":
        • You bundle the app up as a docker image
        • You figure out what the necessary config is
        • You run the kubectl command to make a new deployment from that image and that config.
  • How does "distribution" work?

  • How does "scheduling" work?

  • QUESTION: What is CoreOS?

    • It's an enterprise-y kubernetes/ops thing. Red Hat owns it now.
    • A lot of it is open source, and the CoreOS team contributes to / maintains a lot of kubernetes related things.
      • Container Linux is an example-- a trimmed-down Linux designed to be a minimal base for running containers.
      • Also etcd, dex, a bunch of operators

Installing Hyperkit

Hyperkit is only for Mac. Other platforms can probably use more reasonable hypervisors for their VM.

  • Check it out from source? Run make?

  • TODO: How do we have hyperkit installed? Does this happen automatically with the new docker cask on Mac?

Install Kubectl

  • TODO: How do we have kubectl installed?

Install minikube

  • This is like installing your gentoo from source or whatever. What if this were like software or something?

  • Install hyperkit driver:

curl -LO https://storage.googleapis.com/minikube/releases/latest/docker-machine-driver-hyperkit \
&& chmod +x docker-machine-driver-hyperkit \
&& sudo mv docker-machine-driver-hyperkit /usr/local/bin/ \
&& sudo chown root:wheel /usr/local/bin/docker-machine-driver-hyperkit \
&& sudo chmod u+s /usr/local/bin/docker-machine-driver-hyperkit

Love2curlBinariesOntoMySystemWithSudo.png.exe

Install minikube itself (https://github.com/kubernetes/minikube/releases):

curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.27.0/minikube-darwin-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/

Starting it up:

minikube start --vm-driver hyperkit # otherwise it tries vbox

NOTE: "No space left on device" error-- minikube tries to claim 64 GB of disk, so you need a lot free. However, this can also happen when minikube gets into a weird state. You may need to nuke your ~/.minikube/ directory and start again.

Stopping Minikube

systemctl stop localkube # maybe
minikube stop # maybe
docker rm -f $(docker ps -aq --filter name=k8s)

Interactive Tutorial

Module 1: Verifying Our Setup

minikube version
minikube start
kubectl version
kubectl cluster-info
kubectl get nodes

Starts by checking that we have minikube, then starting minikube. Then we check that we have kubectl, and check what cluster kubectl is pointing at (should be pointed at a 192.168.x.x which is minikube). Further, if we get the nodes (kubectl get nodes) we should see just the one (the minikube master).

Module 2: Creating a Deployment

  • What is a "Deployment" in k8 vocabulary? What are the "pieces" of a deployment?

    • I think a deployment is basically an image + a configuration (particularly how many copies of it to run aka how many pods)
  • How do we make a deployment deploy?

    • kubectl run NAME --image=URL_TO_IMAGE --port=PORT_TO_RUN_ON
kubectl run kubernetes-bootcamp --image=gcr.io/google-samples/kubernetes-bootcamp:v1 --port=8080

Creates a deployment: kubernetes-bootcamp is the deployment name, --image points at a Docker image, and --port tells it which port the container listens on.
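
For reference, here's roughly the same thing written out declaratively-- a sketch of my own (not from the tutorial) of an apps/v1 Deployment you'd save to a file and kubectl apply -f:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetes-bootcamp
spec:
  replicas: 1
  selector:
    matchLabels:
      run: kubernetes-bootcamp      # kubectl run labels the pods with run=<name>
  template:
    metadata:
      labels:
        run: kubernetes-bootcamp
    spec:
      containers:
        - name: kubernetes-bootcamp
          image: gcr.io/google-samples/kubernetes-bootcamp:v1
          ports:
            - containerPort: 8080   # same as the --port flag above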

It says it is running, but I don't see it. Oh, I can't reach it on the network maybe? Try kubectl proxy. kubectl get nodes won't change because it is all running on a single node! (that's how minikube rolls!)

"Starting to server on 127.0.0.1:8001. Serve what? Looks like the k8 API?

  • Other things too supposedly, like each pod?
    • It's the k8 API. But there's also a proxy (part of the API?) that can be used to reach pods

Yeah. There are proxy routes for each pod! Example:

http://localhost:8001/api/v1/namespaces/default/pods/$POD_NAME/proxy/

your pod name will vary!

Module 3: Pods

  • Kubernetes automatically made a pod for us before.
  • A Pod is an abstraction for one or more app containers (like Docker) & some shared resources for them (disk store, a network, etc), and config (e.g. which ports).
  • Everything in a Pod is always scheduled / located together (never broken apart on different nodes) and has one IP.
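
To make the "shared resources" part concrete, here's a minimal sketch of a two-container Pod sharing an emptyDir volume (the names and images are mine, made up for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: shared-example              # hypothetical name
spec:
  volumes:
    - name: shared-data
      emptyDir: {}                  # scratch storage shared by both containers
  containers:
    - name: web
      image: nginx:1.15             # illustrative image
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: content-writer
      image: busybox:1.28           # illustrative image
      command: ["sh", "-c", "echo hello > /data/index.html && sleep 3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /data

Both containers always land on the same Node and share the Pod's single IP, so they can also talk to each other over localhost.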

Nodes

  • Pods run on Nodes. Nodes can have multiple Pods.

    • Master decides which Nodes to run Pods on based on "available resources"
      • QUESTION: How? What does this mean?
        • My impression: there's a bunch of complexity in the way this works by default. One of those "smart people doing lots of math" things.

          The kubernetes scheduler is a policy-rich, topology-aware, workload-specific function that significantly impacts availability, performance, and capacity. The scheduler needs to take into account individual and collective resource requirements, quality of service requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, deadlines, and so on. Workload-specific requirements will be exposed through the API as necessary.

        • There are two policies you can tweak if necessary: FitPredicate and PriorityFunction
          • FitPredicates are REQUIREMENTS: e.g. only run these pods on nodes matching these labels
          • PriorityFunctions determine which node, among the compatible ones (see FitPredicate), the scheduler should prefer (the pod can only run on one node, so which is the best?). The default tries to put the pod on the node with the lowest current resource utilization
        • As far as I can tell, these are changeable at the time the scheduler is compiled. Or at least that's when you have to build in the options. You can change config at runtime though (e.g. you can make the scheduler only schedule a pod on nodes with a certain label-- there's a minimal nodeSelector sketch after this list-- but you couldn't introduce a whole new type of predicate [making this up-- random assignment])
  • Nodes have a kubelet (talks to Master), and a runtime for containers (Docker or rkt)

  • Containers go together in a single pod if they are tightly coupled and need to share resources!
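
As mentioned in the scheduler notes above, a nodeSelector is the simplest "only run on nodes matching these labels" constraint. A minimal sketch (the disktype=ssd label is an invented example):

apiVersion: v1
kind: Pod
metadata:
  name: ssd-only-pod                # hypothetical name
spec:
  nodeSelector:
    disktype: ssd                   # scheduler will only place this pod on nodes labeled disktype=ssd
  containers:
    - name: app
      image: gcr.io/google-samples/kubernetes-bootcamp:v1

The label itself goes onto the node with kubectl label nodes NODE_NAME disktype=ssd.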

Troubleshooting Basics

  • kubectl get - list resources
  • kubectl describe - show detailed information about a resource
  • kubectl logs - print the logs from a container in a pod
  • kubectl exec - execute a command on a container in a pod

For example, your "deis run" function that you copied off of stackoverflow finds a particular container and pod that is running your app in deis, and does kubectl exec on it to execute the command.

  • kubectl get FOO where FOO is a resource type-- so far we've seen pods and nodes as resource types

kubectl describe pods gives a bunch of information about our pods: name, namespace, node, start time, labels, IP, controlled by (?), container(s) [with image, port, etc], conditions, volumes, recent events, "tolerations"

  • "Controlled by" indicates which ReplicationController (older) / ReplicaSet (new preferred) is in charge of making sure the right number of pods are running.

  • "The describe output is designed to be human readable, not to be scripted against."

    • QUESTION: What is to be scripted against? Well, I think you have options:
      • You can get YAML output for just about everything by adding -o yaml
      • You can do go templates for output too, like -o go-template='{{(index .spec.ports 0).nodePort}}'
      • There's an API
  • kubectl logs POD_NAME gives logs of what has happened recently

  • kubectl exec POD_NAME -- COMMAND runs COMMAND in POD_NAME

    • e.g. kubectl exec kubernetes-bootcamp-5c69669756-kg2mk -- ls -la
    • -- is important for complex commands, otherwise kubectl will try to handle all the flags, switches, arguments, etc itself
    • kubectl exec -it kubernetes-bootcamp-5c69669756-kg2mk -- bash gets an interactive shell! (-it seems to let it be interactive, otherwise it returns immediately)

Jazz Break

  • So, for "our" environment to be like heroku, we'd want
    • one namespace per app? that's how deis does it
    • automatic determination of namespace / pod names for the current app
    • porcelain around logs, exec, etc
    • exec to automatically run in a one-off pod? Or at least automatically ID a pod to run in for you.
    • automatic wrapping of an app in a container maybe? Or do we just get over it? (forcing people to use docker is kinda the whole problem)
    • IDEA: can we make a docker image that leverages bin/setup, bin/server to always do the right thing?
    • IDEA: Map heroku's functionality / workflow & translate to k8.

Module 4: Services

  • What are services?

  • Pods die. When a Node dies, pods on it are lost.

    • "ReplicationController might then dynamically drive the cluster back to desired state via creation of new Pods to keep your application running."
    • Each pod has its own IP (which changes when a pod is replaced), so the cluster does need to keep track of what is where.
  • "A Service in Kubernetes is an abstraction which defines a logical set of Pods and a policy by which to access them. Services enable a loose coupling between dependent Pods. A Service is defined using YAML (preferred) or JSON, like all Kubernetes objects. The set of Pods targeted by a Service is usually determined by a LabelSelector (see below for why you might want a Service without including selector in the spec)."

  • Ok, so what's a LabelSelector then?

  • Services are what allow your pods to be reached from outside the cluster (depending on the type)! There are different types of services (there's a hand-written Service example at the end of this list):

    • ClusterIP (default): exposes the service on an internal IP, reachable only from inside the cluster
    • NodePort: exposes the service on the same port of each node using NAT, so it's reachable from outside at NodeIP:NodePort
    • LoadBalancer: creates an external load balancer in the current cloud and gives an external IP to the service
    • ExternalName: exposes the service via a CNAME record (requires kube-dns 1.7+); externalName determines the name
  • Services let your app keep ticking if pods / nodes die [because you can load balance amongst all the others].

  • Labels and Selectors are things that Kubernetes uses to do operations on groups of things.

    • Kind of like how CSS selectors let you do styles on certain nodes?
      • Yeah, ish. A way of querying, or SELECTING certain resources based on labels.
    • Labels are key value pairs, and can be used for lots of stuff, e.g. app=Foo, labeling staging vs. prod, or different versions, or just a tag.
  • NOTE: minikube makes a default service called kubernetes

  • kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080

  • kubectl get services

  • kubectl get services/kubernetes-bootcamp shows the nodePort (in the PORT(S) column)

  • kubectl get pods -l run=kubernetes-bootcamp lists all pods labeled run=kubernetes-bootcamp

  • kubectl get services -l run=kubernetes-bootcamp lists services from that label

  • kubectl delete service -l run=kubernetes-bootcamp deletes services by that label

  • QUESTION: Does minikube support LoadBalancer yet? How close is that?
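
The kubectl expose command above is just creating a Service object. Here's a hand-written sketch of roughly the same thing-- the ports match the tutorial, but the explicit nodePort is my own pick (leave it out and Kubernetes assigns one from 30000-32767):

apiVersion: v1
kind: Service
metadata:
  name: kubernetes-bootcamp
spec:
  type: NodePort
  selector:
    run: kubernetes-bootcamp        # the LabelSelector: route traffic to pods with this label
  ports:
    - port: 8080                    # port the service listens on inside the cluster
      targetPort: 8080              # container port on the pods
      nodePort: 30080               # port opened on every node (my pick; normally auto-assigned)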

Module 5: Scaling

https://kubernetes.io/docs/tutorials/kubernetes-basics/scale/scale-intro/

"Scaling out a Deployment will ensure new Pods are created and scheduled to Nodes with available resources. "

  • Kubernetes supports autoscaling but it isn't covered in this section.

  • Services provide load balancing, so they can feed traffic to multiple pods.

    • They use endpoints to tell if a pod is up
      • Is this "health checks"?
        • Sort of. I think they are actually called "liveness and readiness probes"
          • https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
          • Liveness probes can be scripts that run (exit condition determines liveness), HTTP requests (HTTP status code determines liveness), or a TCP connection (whether it can connect on a given port determines liveness)
          • LIVENESS probes are used to determine if a pod should be killed and replaced
          • READINESS probes are used to determine if a pod can accept traffic (there's a YAML sketch with both probe types after this list)
            • Maybe a pod is overwhelmed, and you don't want to give it any more requests, but you don't want to kill it either (because it'll recover once it handles its workload)
  • You can do rolling updates without downtime

  • Under kubectl get deployments: "desired" replicas <- how many copies of the deployment's pod you asked for; "current" replicas <- how many are actually running right now

  • Scaling command example: kubectl scale deployments/kubernetes-bootcamp --replicas=4
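
As promised in the probes bullet above, a minimal sketch of liveness and readiness probes on a container. The path and timings here are placeholders I chose, not values from the tutorial:

apiVersion: v1
kind: Pod
metadata:
  name: probed-app                  # hypothetical name
spec:
  containers:
    - name: app
      image: gcr.io/google-samples/kubernetes-bootcamp:v1
      ports:
        - containerPort: 8080
      livenessProbe:                # failing this gets the container killed and restarted
        httpGet:
          path: /                   # assumed health endpoint
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:               # failing this only stops traffic; the container keeps running
        httpGet:
          path: /
          port: 8080
        periodSeconds: 5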

Module 6: Updating

https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/

  • aka how to do a rolling update with deployments
  • Kubernetes will try to replace only some of your pods, leaving others running
    • By default, it does one at a time, but you can also do a higher number or a percentage of the pods (see the strategy sketch at the end of this list)
  • The load balancer will not send traffic to the ones that are being changed out
  • You update deployments by changing the image (or config, I suppose)
    • Example: kubectl set image deployments/kubernetes-bootcamp kubernetes-bootcamp=jocatalin/kubernetes-bootcamp:v2
  • Handy: kubectl rollout status deployments/kubernetes-bootcamp tells whether a rollout has succeeded or not.
  • ROLLBACK: kubectl rollout undo deployments/kubernetes-bootcamp
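
The "one at a time vs. a higher number or percentage" knob lives in the Deployment's strategy block. A sketch (illustrative values) of what you'd add under spec in the Deployment from Module 2:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1             # at most 1 pod below the desired count during the update (can also be a percentage like "25%")
      maxSurge: 1                   # at most 1 extra pod above the desired count while new ones spin up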

That's it for the "Kubernetes Basics" tutorial!!

Jazz Break: Questions Again

  • How is our etcd backed up? Are there other places cluster data is stored?
  • Are we running a single master? Or a clustered/distributed master?
  • Thought: we may have to learn some go at some point
  • IDEA: Evaluate tectonic (from CoreOS)-- what's the business model, what's the license, is it completely open source?

What to learn next?

  • Volumes! Persistent storage wasn't covered
  • What are operators?