This workshop provides hands-on experience setting up and running an AWS Kubernetes cluster using EKS. We will use GitOps and explore Kubernetes tools to make the cluster self-driving, with automated management and remediation of common cluster-level problems. To achieve this, we will use eksctl, cluster-autoscaler, kube-prometheus (the Prometheus Operator), node-problem-detector, draino, and node-local-dns-cache.
- High Availability with Health Checks and Automated Recovery
- Scalability: Automatic horizontal and vertical scaling
- Security + Policy
- Service discovery
- Observability: Metrics, Tracing, Logs, Alerts
- Traffic Management
  - Retries
  - Timeouts
  - Load balancing
  - Rate limiting
  - Bulkheading
  - Circuit breaking
- The actually useful business purpose all those things were for!
Hint: Kubernetes and Istio (or Linkerd) Service Mesh
You either need a rich execution environment that allows your microservice to laser-focus on business value, or you need to add so much overhead to your microservice that it winds up not very micro.
If you can build a straightforward monolithic app and never think about all this asynchronous stuff, go for it! If your system is big enough that you need to refactor into microservices for sanity’s sake, or you need to scale components independently to manage load, or you need to make temporary outages survivable, then microservices with a rich execution environment are a great way to go.
Many of the most powerful architectures combine diverse workloads: a stateless web application, a persistent, stateful database, and a finite task that periodically runs to completion. Even when these pieces are packaged in containers, they still need to be coordinated. We need to deploy, manage, and scale the disparate pieces in different ways. We also want to span some pieces across many servers while presenting them as a single unit to other pieces. On top of this, managing persistent storage is a distinct problem from managing other computational resources.
There are many disparate technical solutions for managing each of the concerns of applications, computational resources, and storage resources. Kubernetes provides a single, common solution to these common problems.
As Paul Ingles said, one of Kubernetes’ greatest strengths is providing a ubiquitous language that connects applications teams and infrastructure teams. And, because it’s extensible, this can grow beyond the core concepts to more domain and business specific concepts.
We also found Kubernetes attractive because it allowed us to iterate quickly for a proof of concept, while giving us built-in resilience and an easy path to scale it in production.
Kubernetes' extensible nature, first-class support for health checks, detailed metrics, and automated recovery for applications make it very simple to automate maintenance and recovery of the Kubernetes platform and its components.
This workshop provides hands-on experience running an AWS Kubernetes cluster using EKS. We will use GitOps and explore Kubernetes tools to make the cluster self-driving, with automated management and remediation of cluster-level problems. To achieve this, we will use eksctl, cluster-autoscaler, external-dns, kube-prometheus (the Prometheus Operator), node-problem-detector, draino, node-local-dns-cache, and the Kubernetes dashboard.
While we won't focus on it, we will also provide a continuous delivery pipeline on top of Kubernetes.
See also "Why do I need Kubernetes and what can it do?" and "Why containers?" in the Kubernetes documentation.
This workshop is intended to appeal primarily to four types of people:
- Application developers looking to get an AWS Kubernetes cluster to experiment with, without needing a lot of infrastructure knowledge
- AWS DevOps people without a lot of Kubernetes experience
- Kubernetes DevOps people without a lot of AWS experience
- Full-stack, Full-cycle developers in small or large teams.
- An AWS Account (a $25 credit code will be given on the day of the workshop)
- aws cli installed and configured to access account
- kubectl installed
- aws-iam-authenticator installed
- eksctl installed
- docker installed
This workshop expects you to create your own AWS account, but participants will be given a $25 credit code to cover any costs incurred during the workshop. A pre-existing VPC is not required. Participants should create their accounts ASAP: a small percentage of accounts may be pulled into a manual verification workflow, and if an account has been deactivated for non-payment, it will take some time to reactivate once a credit card is added.
💡 Tip: Your account must have the ability to create new IAM roles and scope other IAM permissions.
- If you don't already have an AWS account with Administrator access, create one now.
- Create a billing alarm - Super important!
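If you prefer the CLI, here is a hedged sketch of creating a billing alarm (the alarm name, threshold, and SNS topic ARN are placeholders; billing metrics live in us-east-1 and must first be enabled under Billing preferences):

aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name billing-alarm \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --evaluation-periods 1 \
  --threshold 25 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts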
💡 Tip: After you have the AWS CLI installed (as below), you will need to have AWS API credentials configured. You can use the ~/.aws/credentials file or environment variables. For more information, read the AWS documentation.
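For example, either of these approaches works (all values shown are placeholders):

# Option 1: interactive setup; writes ~/.aws/credentials and ~/.aws/config
aws configure

# Option 2: environment variables
export AWS_ACCESS_KEY_ID=AKIAEXAMPLE
export AWS_SECRET_ACCESS_KEY=examplesecretkey
export AWS_DEFAULT_REGION=us-east-1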
MacOS users can use Homebrew:
brew install awscli
and Windows users can use chocolatey:
choco install awscli
If you already have pip and a supported version of Python (and ideally know how to set up a virtual environment), you can install the AWS CLI with the following command. If you have Python version 3+ installed, we recommend that you use the pip3 command.
pip3 install awscli --upgrade --user
Although it might provide an outdated version, Linux users can also install the AWS CLI with their default package manager, e.g.:
$ sudo apt-get install awscli
$ sudo yum install awscli
Linux or Mac (on a Mac, replace linux/amd64 with darwin/amd64 in the URL):
sudo curl --silent --location -o /usr/local/bin/kubectl "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
sudo chmod +x /usr/local/bin/kubectl
Or Download the Windows executable
If you have golang installed and your $PATH includes $GOPATH/bin:
go get -u -v github.com/kubernetes-sigs/aws-iam-authenticator/cmd/aws-iam-authenticator
Otherwise, download the Amazon EKS-vended aws-iam-authenticator binary from Github Releases:
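For example, on Linux (the version and date in the URL are illustrative; check the Amazon EKS documentation or the GitHub releases page for the current one):

curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.14.6/2019-08-22/bin/linux/amd64/aws-iam-authenticator
chmod +x ./aws-iam-authenticator
sudo mv ./aws-iam-authenticator /usr/local/bin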
To download the latest release of eksctl, run:
curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
Alternatively, macOS users can use Homebrew:
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
and Windows users can use chocolatey:
choco install eksctl
💡 Tip: Avoid Docker Toolbox and boot2docker. These are older packages that have been superseded by Docker for Mac.
brew cask install docker # Install Docker
open /Applications/Docker.app # Start Docker
docker.io is available from the Ubuntu repositories (as of Xenial).
# Install Docker
sudo apt install docker.io
sudo apt install docker-compose
# Start it
sudo systemctl start docker
💡 Tip: If the `docker.io` package isn't available for you, see [Get Docker CE for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/) for an alternative.
Install Windows Subsystem for Linux and choose Ubuntu as your guest OS. Install Docker as you normally would on Ubuntu (see above). After that, see these instructions for info on how to get it running.
💡 Tip: Avoid _Docker for Windows_. While it works in most cases, you'll still face NTFS limitations without WSL (e.g., lack of symlinks, which is needed for Yarn/npm to work).
For other operating systems, see: https://www.docker.com/community-edition#download
To verify that everything is installed and available on your PATH:
for command in docker kubectl aws-iam-authenticator eksctl
do
which $command &>/dev/null && echo "$command in path" || echo "$command NOT FOUND"
done
Amazon EKS works by provisioning (starting) and managing the Kubernetes control plane for you. At a high level, Kubernetes consists of two major components – a cluster of 'worker nodes' that run your containers and the 'control plane' that manages when and where containers are started on your cluster and monitors their status.
Without Amazon EKS, you have to run both the Kubernetes control plane and the cluster of worker nodes yourself. With Amazon EKS, you provision your cluster of worker nodes and AWS handles provisioning, scaling, and managing the Kubernetes control plane in a highly available and secure configuration. This removes a significant operational burden for running Kubernetes and allows you to focus on building your application instead of managing AWS infrastructure.
At present, creating a VPC, EKS cluster, and worker nodes using the web console, or using the provided Amazon Machine Image (AMI) and AWS CloudFormation scripts, leaves you with something that is difficult to operationalize and far from a frictionless developer experience.
Fortunately, both the Kubernetes Ecosystem and that of AWS are vast and dynamic, and tools have filled this gap.
While it started as a simple CLI for EKS, its clear ambition is to serve both developer use-cases and operational best practices like GitOps. At present, it already does a pretty good job of both.
GitOps takes full advantage of the move toward immutable infrastructure and declarative container orchestration. In order to minimize the risk of change after a deployment, whether intended or accidental via "configuration drift", it is essential that we maintain a reproducible and reliable deployment process.
Our whole system's desired state (aka "the source of truth") is described in Git. We use containers for immutability, as well as cloud native tools like CloudFormation and Terraform to automate and manage our configuration. These tools, together with containers and the declarative nature of Kubernetes, provide what we need for a complete recovery in the case of an entire meltdown.
Meanwhile, Developers want a quick and easy way to spin up a flexible, friendly, and frictionless continuous delivery pipeline so they can focus on delivering business value (or just doing cool stuff) without getting bogged down in yak-shaving (i.e. details, details).
The eksctl tool lets us spin up and manage a fully operational cluster with sensible defaults and a broad array of addons and configuration settings. You can choose to pass it flags on the CLI (dev mode), or specify detailed configuration via YAML checked into Git (ops mode). As it improves, it continues to better meet the use cases of both audiences.
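For instance, both modes can stand up the same cluster. A minimal sketch (the cluster name, region, and node sizes are illustrative):

# Dev mode: flags on the CLI
eksctl create cluster --name workshop --region us-east-1 --nodes 3

# Ops mode: a declarative ClusterConfig, checked into git
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: workshop
  region: us-east-1
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 3
EOF
eksctl create cluster -f cluster.yaml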
The kube-prometheus stack is meant for cluster monitoring, so it is pre-configured to collect metrics from all Kubernetes components. It includes a resource metrics API server for horizontal pod autoscaling, just as metrics-server does. In addition, it delivers a default set of dashboards and alerting rules.
It provides Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.
It includes:
- The Prometheus Operator
- Highly available Prometheus
- Highly available Alertmanager
- Prometheus node-exporter
- Prometheus Adapter for Kubernetes Metrics APIs
- kube-state-metrics
- Grafana
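A hedged sketch of installing it from the upstream manifests (the manifests/setup split follows the upstream repo layout; pin a released tag and vendor the manifests into Git in practice):

git clone https://github.com/coreos/kube-prometheus
cd kube-prometheus
kubectl create -f manifests/setup   # namespace and CRDs first, per the upstream README
kubectl create -f manifests/        # then Prometheus, Alertmanager, Grafana, and exporters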
Cluster Autoscaler is a standalone program that adjusts the size of a Kubernetes cluster to meet the current needs.
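On AWS, the autoscaler works against Auto Scaling groups. A hedged sketch with eksctl (the cluster and nodegroup names are illustrative; --asg-access adds the IAM permissions the autoscaler needs):

eksctl create nodegroup --cluster workshop --name scaled-ng \
  --nodes 3 --nodes-min 1 --nodes-max 10 --asg-access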
A node is a worker machine in Kubernetes. There are tons of node problems that could affect the pods running on the node, such as:
- Infrastructure daemon issues: ntp service down;
- Hardware issues: Bad CPU, memory, or disk;
- Kernel issues: Kernel deadlock, corrupted file system;
- Container runtime issues: Unresponsive runtime daemon;
If problems are invisible to the upstream layers in the cluster management stack, Kubernetes will continue scheduling pods to the bad nodes. The Node Problem Detector daemonset collects node problems from various daemons and makes them visible to the upstream layers. It runs as a Kubernetes addon enabled by default in GCE clusters, but we need to install it manually in EKS.
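A hedged sketch of a manual install straight from the upstream repo's deployment manifest (for real use, pin a release and check the manifest into Git):

kubectl apply -f https://raw.githubusercontent.com/kubernetes/node-problem-detector/master/deployment/node-problem-detector.yaml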
Draino is intended for use alongside the Kubernetes Node Problem Detector and Cluster Autoscaler. The Node Problem Detector can set a node condition when it detects something wrong with a node - for instance by watching node logs or running a script. The Cluster Autoscaler can be configured to delete nodes that are underutilised. Adding Draino to the mix enables autoremediation:
- The Node Problem Detector detects a permanent node problem and sets the corresponding node condition.
- Draino notices the node condition. It immediately cordons the node to prevent new pods being scheduled there, and schedules a drain of the node.
- Once the node has been drained the Cluster Autoscaler will consider it underutilised. It will be eligible for scale down (i.e. termination) by the Autoscaler after a configurable period of time.
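The wiring between the three is mostly configuration. Draino takes the node conditions to drain on as positional arguments, for example (the flag and condition names below are illustrative and must match what your node-problem-detector actually reports):

# Typically run in-cluster as a single-replica Deployment
draino --evict-daemonset-pods --evict-emptydir-pods KernelDeadlock ReadonlyFilesystem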
Inspired by Kubernetes DNS, Kubernetes' cluster-internal DNS server, ExternalDNS makes Kubernetes resources discoverable via public DNS servers. Like KubeDNS, it retrieves a list of resources (Services, Ingresses, etc.) from the Kubernetes API to determine a desired list of DNS records. Unlike KubeDNS, however, it's not a DNS server itself, but merely configures other DNS providers accordingly—e.g. AWS Route 53 or Google Cloud DNS.
In a broader sense, ExternalDNS allows you to control DNS records dynamically via Kubernetes resources in a DNS provider-agnostic way.
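For example, annotating a Service is enough for ExternalDNS to create a matching record in Route 53 (the hostname and app are illustrative):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    external-dns.alpha.kubernetes.io/hostname: nginx.example.com
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
EOF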
The node-local-dns addon runs a DNS caching agent on cluster nodes as a DaemonSet. Normally, pods in ClusterFirst DNS mode reach out to a coredns service IP for DNS queries, which is translated to a coredns endpoint via iptables rules added by kube-proxy. With node-local-dns, pods instead reach out to the DNS caching agent running on the same node, thereby avoiding iptables DNAT rules and connection tracking. The local caching agent queries coredns for cache misses of cluster hostnames (cluster.local suffix by default).
- With the current DNS architecture, it is possible that pods with the highest DNS QPS have to reach out to a different node, if there is no local kube-dns instance. Having a local cache will help improve the latency in such scenarios.
- Skipping iptables DNAT and connection tracking will help reduce conntrack races and avoid UDP DNS entries filling up the conntrack table.
- Connections from the local caching agent to kube-dns can be upgraded to TCP. TCP conntrack entries will be removed on connection close, in contrast with UDP entries that have to time out (the default nf_conntrack_udp_timeout is 30 seconds).
- Upgrading DNS queries from UDP to TCP would reduce tail latency attributed to dropped UDP packets and DNS timeouts, usually up to 30s (3 retries + 10s timeout). Since the node-local cache listens for UDP DNS queries, applications don't need to be changed.
- Metrics & visibility into DNS requests at a node level.
- Negative caching can be re-enabled, thereby reducing the number of queries to kube-dns.
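Once deployed, a quick way to confirm the caching agents are running on every node (the k8s-app label follows the upstream manifest; treat it as an assumption if you customized yours):

kubectl -n kube-system get pods -l k8s-app=node-local-dns -o wide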
From the operator documentation:
An Operator is an application-specific controller that extends the Kubernetes API to create, configure and manage instances of complex stateful applications on behalf of a Kubernetes user. It builds upon the basic Kubernetes resource and controller concepts, but also includes domain or application-specific knowledge to automate common tasks better managed by computers.
From the official Kubernetes documentation on kube-controller-manager:
In applications of robotics and automation, a control loop is a non-terminating loop that regulates the state of the system. In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the API server and makes changes attempting to move the current state towards the desired state. Examples of controllers that ship with Kubernetes today are the replication controller, endpoints controller, namespace controller, and serviceaccounts controller.
An operator is a combination of custom resource types and the controllers that take care of the reconciliation process.
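For a concrete sketch, the Prometheus Operator used by kube-prometheus defines a Prometheus custom resource type; applying one asks the operator's controller to reconcile a matching Prometheus deployment into existence (the name and replica count here are illustrative):

cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 2   # the controller reconciles toward two Prometheus pods
EOF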