Set up Kubernetes on AWS using KOPS

Kubernetes 1.18.x setup (KOPS 1.18.x) in AWS VPC with RBAC

Development

Production

Weave Networking

Prerequisites

Before continuing with the setup, you need to have kops, kubectl, awscli and jq installed on your system. Skip this part if you already have those.

MacOS

Install brew:

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install kops, kubectl, awscli and jq:

$ brew install kops kubectl awscli jq

Linux

Install awscli:

$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

For more (and up-to-date) information, please check the AWS documentation.

NOTE: some distros ship awscli in their repositories. Use the package manager for installation if that's more convenient.

Install kops:

curl -LO https://github.com/kubernetes/kops/releases/download/$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)/kops-linux-amd64
chmod +x kops-linux-amd64
sudo mv kops-linux-amd64 /usr/local/bin/kops

Install kubectl:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

Install jq:

Get it from here: https://stedolan.github.io/jq/download/
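
Optionally, verify that all the tools are on your PATH before moving on (a quick sanity check; exact version output will differ):

kops version
kubectl version --client
aws --version
jq --version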

NOTE: as an optional step, you can install terraform if you want to deploy Kubernetes with terraform.

NOTE: you need access to the AWS console or programmatic access in order to create the new IAM user.

Development Cluster

Kubernetes Setup

In this example I'm using ap-southeast-1 as the AWS region. Change it accordingly, along with VPC_ID. The example also assumes 3 pre-created subnets: kubernetes_public_subnet_1a, kubernetes_public_subnet_1b and kubernetes_public_subnet_1c.

Create a group and user named kops, attach the required policies, and generate access keys (the names can be anything, but let's stick to something intelligible):

$ aws iam create-group --group-name kops
$ aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess --group-name kops
$ aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonRoute53FullAccess --group-name kops
$ aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --group-name kops
$ aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/IAMFullAccess --group-name kops
$ aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonVPCFullAccess --group-name kops
$ aws iam create-user --user-name kops
$ aws iam add-user-to-group --user-name kops --group-name kops
$ aws iam create-access-key --user-name kops

The last command will generate access keys for the kops user, which we will use to configure awscli. Enter the keys and region when prompted:

$ aws configure --profile kops
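
To confirm the new profile works, you can ask AWS which identity it resolves to (an optional check; it should return the kops user ARN):

$ aws sts get-caller-identity --profile kops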

Create an s3 bucket for kops state store and enable versioning on it:

NOTE: for non us-east-1 regions - see http://docs.aws.amazon.com/cli/latest/reference/s3api/create-bucket.html

# --create-bucket-configuration is only needed for non us-east-1 regions
$ aws s3api create-bucket \
  --bucket my-k8s-bucket \
  --create-bucket-configuration LocationConstraint=ap-southeast-1 \
  --region ap-southeast-1

$ aws s3api put-bucket-versioning \
  --bucket my-k8s-bucket \
  --versioning-configuration Status=Enabled
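
Optionally, confirm versioning is enabled on the bucket (my-k8s-bucket is the example bucket name used above):

$ aws s3api get-bucket-versioning --bucket my-k8s-bucket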

Create an SSH key for kops to use (this writes ~/.ssh/kops and ~/.ssh/kops.pub, the public key passed to kops create cluster below):

$ ssh-keygen -t rsa -b 4096 -C kops -f ~/.ssh/kops

Launching kubernetes on Container Linux (FLATCAR)

NOTE: The --image argument is optional; if it is not specified, the default OS will be Debian.

Export the variables to your shell environment:

# Add the corresponding names of your subnets `Name=tag:Name,Values=subnet_name`. Export as many as needed
export A="kubernetes_public_subnet_1a"
export AWS_ACCESS_KEY_ID="EXAMPLEACCESSKEY"
export AWS_SECRET_ACCESS_KEY="EXAMPLESECRETKEY"
export AWS_REGION="ap-southeast-1"
export NAME="kubernetes.example.com"
export KOPS_STATE_STORE="s3://my-k8s-bucket"
export AWS_PROFILE="kops"
export VPC_ID="vpc-12345678"
export NODE_SIZE=${NODE_SIZE:-t2.medium}
export NODE_VOL_SIZE="${NODE_VOL_SIZE:-60}"
export MASTER_SIZE=${MASTER_SIZE:-t2.medium}
export MASTER_VOL_SIZE="${MASTER_VOL_SIZE:-60}"
export ZONES=${ZONES:-"${AWS_REGION}a"}
export K8S_VER="1.18.6"

# Get the ID of the subnet. This is development; we deploy into one zone
export SUBNET_A=$(aws ec2 describe-subnets --region=$AWS_REGION --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=$A" | jq -r ".Subnets[].SubnetId")

# Get the CIDR of the $VPC_ID
export CIDR=$(aws ec2 describe-vpcs --region=$AWS_REGION --vpc-ids $VPC_ID | jq -r ".Vpcs[].CidrBlock")

# Get the latest stable FlatCar image
export FLATCAR=$(aws ec2 describe-images --region=$AWS_REGION --owners 075585003325 --filters "Name=name,Values=Flatcar-stable-*-hvm" --query "sort_by(Images,&CreationDate)[-1].{id:ImageLocation}" | jq -r ".id")
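
Before creating the cluster, it's worth checking that the lookups resolved (a quick sanity check; an empty value usually means a wrong subnet name, region or VPC ID):

echo "SUBNET_A=${SUBNET_A} CIDR=${CIDR} FLATCAR=${FLATCAR}"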

Proceed with cluster creation:

kops create cluster ${NAME} \
 --node-count 1 \
 --zones $ZONES \
 --node-size $NODE_SIZE \
 --node-volume-size="${NODE_VOL_SIZE}" \
 --master-size $MASTER_SIZE \
 --master-volume-size="${MASTER_VOL_SIZE}" \
 --networking="cni" \
 --ssh-public-key ~/.ssh/kops.pub \
 --vpc=${VPC_ID} \
 --state=${KOPS_STATE_STORE} \
 --authorization RBAC \
 --api-loadbalancer-type public \
 --kubernetes-version "${K8S_VER}" \
 --image "${FLATCAR}"

Create a template of the cluster (feel free to inspect the kops cluster file by doing kops get $NAME -o yaml > $(pwd)/$NAME-orig.yaml and compare it with the one we will create).

Please note the route53 hosted zone. Replace arn:aws:route53:::hostedzone/ABCDEFGHIJKLM with the one you have in your $(pwd)/$NAME-orig.yaml. These rules are for the cluster-autoscaler addon and the appscode/voyager ingress.
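
If you are not sure which hosted zone ID to use, you can look it up by domain name (replace example.com with your domain; the ID is the part after /hostedzone/):

aws route53 list-hosted-zones-by-name --dns-name example.com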

vi $(pwd)/$NAME.yaml

Add the following:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: {{NAME}}
spec:
  docker:
    skipInstall: true
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: {{KOPS_STATE_STORE}}{{NAME}}
  # This will disable locksmithd on FLATCAR, so it won't reboot machines on OS updates
  updatePolicy: external
  additionalPolicies:
  # This is used for cluster autoscaler to shrink or expand the cluster as an additional policy.
  # Also this is used for voyager operator when deployed to masters (--run-on-master flag), to modify Route 53 zones when requesting SSL certificates (Let's Encrypt). If voyager is run on nodes, then include the route 53 policy in a separate node policy
    master: |
      [
          {
              "Effect": "Allow",
              "Action": [
                  "autoscaling:DescribeAutoScalingGroups",
                  "autoscaling:DescribeAutoScalingInstances",
                  "autoscaling:DescribeTags",
                  "autoscaling:SetDesiredCapacity",
                  "autoscaling:TerminateInstanceInAutoScalingGroup"
              ],
              "Resource": "*"
          },
          {
              "Effect": "Allow",
              "Action": [
                  "route53:GetChange",
                  "route53:ListHostedZonesByName"
              ],
              "Resource": [
                  "*"
              ]
          },
          {
              "Effect": "Allow",
              "Action": [
                  "route53:ChangeResourceRecordSets"
              ],
              "Resource": [
                  "arn:aws:route53:::hostedzone/ABCDEFGHIJKLM"
              ]
          }
      ]
  etcdClusters:
  - enableEtcdTLS: true
    enableTLSAuth: true
    etcdMembers:
    - instanceGroup: master-{{ZONES}}
      name: a
    manager: {}
    name: main
  - enableEtcdTLS: true
    enableTLSAuth: true
    etcdMembers:
    - instanceGroup: master-{{ZONES}}
      name: a
    manager: {}
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    allowPrivileged: true
    anonymousAuth: false
    apiAudiences:
    - api
    - istio-ca
    apiServerCount: 3
    auditLogMaxAge: 10
    auditLogMaxBackups: 5
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /srv/kubernetes/audit.yaml
    authorizationMode: RBAC
    bindAddress: 0.0.0.0
    cloudProvider: aws
    enableAdmissionPlugins:
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - PersistentVolumeLabel
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - NodeRestriction
    - ResourceQuota
    - PodPreset
    runtimeConfig:
      settings.k8s.io/v1alpha1: "true"
    serviceAccountIssuer: kubernetes.default.svc
    serviceAccountKeyFile:
    - /srv/kubernetes/server.key
    serviceAccountSigningKeyFile: /srv/kubernetes/server.key
    # FLATCAR Dex settings
    # oidcIssuerURL: https://dex.example.com/dex
    # oidcClientID: kubernetes
    # oidcUsernameClaim: email
    # oidcUsernamePrefix: "oidc:"
    # oidcGroupsClaim: groups
    # oidcGroupsPrefix: "oidc:"
    # oidcCAFile: /srv/kubernetes/ca.crt
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cpuCFSQuotaPeriod: 5ms
    evictionHard: imagefs.available<15%,memory.available<500Mi,nodefs.available<15%,nodefs.inodesFree<15%
    kubeReserved:
      cpu: 250m
      memory: 500Mi
    systemReserved:
      cpu: 250m
      memory: 500Mi
  kubeDNS:
    provider: CoreDNS
    nodeLocalDNS:
      enabled: true
  kubeProxy:
    ipvsScheduler: lc
    proxyMode: ipvs
  kubernetesApiAccess:
  - 10.0.0.0/8
  - 172.31.0.0/16
  kubernetesVersion: {{K8S_VER}}
  masterPublicName: api.{{NAME}}
  networkCIDR: {{CIDR}}
  networkID: {{VPC_ID}}
  networking:
    cni: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.0.0.0/8
  - 172.31.0.0/16
  subnets:
  - id: {{SUBNET_A}}
    name: {{ZONES}}
    type: Public
    zone: {{ZONES}}
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
  fileAssets:
  - name: audit
    # Note: if no path is specified, the default path is /srv/kubernetes/assets/<name>
    path: /srv/kubernetes/audit.yaml
    roles: [Master] # a list of roles to apply the asset to; an empty list defaults to all
    content: |
      apiVersion: audit.k8s.io/v1beta1
      kind: Policy
      rules:
        - level: None
          resources:
            - group: ""
              resources:
                - endpoints
                - services
                - services/status
          users:
            - 'system:kube-proxy'
          verbs:
            - watch
        - level: None
          resources:
            - group: ""
              resources:
                - nodes
                - nodes/status
          userGroups:
            - 'system:nodes'
          verbs:
            - get
        - level: None
          namespaces:
            - kube-system
          resources:
            - group: ""
              resources:
                - endpoints
          users:
            - 'system:kube-controller-manager'
            - 'system:kube-scheduler'
            - 'system:serviceaccount:kube-system:endpoint-controller'
          verbs:
            - get
            - update
        - level: None
          resources:
            - group: ""
              resources:
                - namespaces
                - namespaces/status
                - namespaces/finalize
          users:
            - 'system:apiserver'
          verbs:
            - get
        - level: None
          resources:
            - group: metrics.k8s.io
          users:
            - 'system:kube-controller-manager'
          verbs:
            - get
            - list
        - level: None
          nonResourceURLs:
            - '/healthz*'
            - /version
            - '/swagger*'
        - level: None
          resources:
            - group: ""
              resources:
                - events
        - level: Request
          omitStages:
            - RequestReceived
          resources:
            - group: ""
              resources:
                - nodes/status
                - pods/status
          users:
            - kubelet
            - 'system:node-problem-detector'
            - 'system:serviceaccount:kube-system:node-problem-detector'
          verbs:
            - update
            - patch
        - level: Request
          omitStages:
            - RequestReceived
          resources:
            - group: ""
              resources:
                - nodes/status
                - pods/status
          userGroups:
            - 'system:nodes'
          verbs:
            - update
            - patch
        - level: Request
          omitStages:
            - RequestReceived
          users:
            - 'system:serviceaccount:kube-system:namespace-controller'
          verbs:
            - deletecollection
        - level: Metadata
          omitStages:
            - RequestReceived
          resources:
            - group: ""
              resources:
                - secrets
                - configmaps
            - group: authentication.k8s.io
              resources:
                - tokenreviews
        - level: Request
          omitStages:
            - RequestReceived
          resources:
            - group: ""
            - group: admissionregistration.k8s.io
            - group: apiextensions.k8s.io
            - group: apiregistration.k8s.io
            - group: apps
            - group: authentication.k8s.io
            - group: authorization.k8s.io
            - group: autoscaling
            - group: batch
            - group: certificates.k8s.io
            - group: extensions
            - group: metrics.k8s.io
            - group: networking.k8s.io
            - group: policy
            - group: rbac.authorization.k8s.io
            - group: scheduling.k8s.io
            - group: settings.k8s.io
            - group: storage.k8s.io
          verbs:
            - get
            - list
            - watch
        - level: RequestResponse
          omitStages:
            - RequestReceived
          resources:
            - group: ""
            - group: admissionregistration.k8s.io
            - group: apiextensions.k8s.io
            - group: apiregistration.k8s.io
            - group: apps
            - group: authentication.k8s.io
            - group: authorization.k8s.io
            - group: autoscaling
            - group: batch
            - group: certificates.k8s.io
            - group: extensions
            - group: metrics.k8s.io
            - group: networking.k8s.io
            - group: policy
            - group: rbac.authorization.k8s.io
            - group: scheduling.k8s.io
            - group: settings.k8s.io
            - group: storage.k8s.io
        - level: Metadata
          omitStages:
            - RequestReceived

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: {{NAME}}
  name: master-{{ZONES}}
spec:
  image: {{FLATCAR}}
  machineType: {{MASTER_SIZE}}
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-{{ZONES}}
  role: Master
  rootVolumeSize: {{MASTER_VOL_SIZE}}
  subnets:
  - {{ZONES}}

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: {{NAME}}
  name: nodes
spec:
  image: {{FLATCAR}}
  machineType: {{NODE_SIZE}}
  maxSize: 5
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: {{NODE_VOL_SIZE}}
  subnets:
  - {{ZONES}}

Replace vars:

# MacOS sed
sed -i '' -e "s@{{AWS_REGION}}@$AWS_REGION@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{NAME}}@$NAME@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{KOPS_STATE_STORE}}@$KOPS_STATE_STORE@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{ZONES}}@$ZONES@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{VPC_ID}}@$VPC_ID@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{FLATCAR}}@$FLATCAR@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{MASTER_SIZE}}@$MASTER_SIZE@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{NODE_SIZE}}@$NODE_SIZE@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{K8S_VER}}@$K8S_VER@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{CIDR}}@$CIDR@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{SUBNET_A}}@$SUBNET_A@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{MASTER_VOL_SIZE}}@$MASTER_VOL_SIZE@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{NODE_VOL_SIZE}}@$NODE_VOL_SIZE@g" $(pwd)/$NAME.yaml

# GNU sed (Linux)
sed -i "s@{{AWS_REGION}}@$AWS_REGION@g" $(pwd)/$NAME.yaml
sed -i "s@{{NAME}}@$NAME@g" $(pwd)/$NAME.yaml
sed -i "s@{{KOPS_STATE_STORE}}@$KOPS_STATE_STORE@g" $(pwd)/$NAME.yaml
sed -i "s@{{ZONES}}@$ZONES@g" $(pwd)/$NAME.yaml
sed -i "s@{{VPC_ID}}@$VPC_ID@g" $(pwd)/$NAME.yaml
sed -i "s@{{FLATCAR}}@$FLATCAR@g" $(pwd)/$NAME.yaml
sed -i "s@{{MASTER_SIZE}}@$MASTER_SIZE@g" $(pwd)/$NAME.yaml
sed -i "s@{{NODE_SIZE}}@$NODE_SIZE@g" $(pwd)/$NAME.yaml
sed -i "s@{{K8S_VER}}@$K8S_VER@g" $(pwd)/$NAME.yaml
sed -i "s@{{CIDR}}@$CIDR@g" $(pwd)/$NAME.yaml
sed -i "s@{{SUBNET_A}}@$SUBNET_A@g" $(pwd)/$NAME.yaml
sed -i "s@{{MASTER_VOL_SIZE}}@$MASTER_VOL_SIZE@g" $(pwd)/$NAME.yaml
sed -i "s@{{NODE_VOL_SIZE}}@$NODE_VOL_SIZE@g" $(pwd)/$NAME.yaml

Replace the whole cluster file in the S3 bucket:

kops replace -f $(pwd)/$NAME.yaml

Update cluster configuration:

kops update cluster $NAME

# For terraform
kops update cluster ${NAME} --target=terraform [--out=/path/to/folder] # defaults to $(pwd)/out/terraform

Create cluster:

kops update cluster $NAME --yes

# Terraform
cd /path/to/folder
terraform init
terraform plan # inspect what will be created/modified
terraform apply # apply the changes

Wait for a few minutes for everything to be up & running, then deploy the overlay network (weave in this case):

$ kubectl apply -f weave-ds.yaml # if you have a custom yaml

or directly from weave:

$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

After this is applied, check the cluster:

$ kubectl get nodes

NAME                        STATUS    ROLES     AGE       VERSION
ip-10-4-1-229.example.com   Ready     node      12h       v1.18.6
ip-10-4-4-27.example.com    Ready     master    12h       v1.18.6

Validate your cluster with kops

After your nodes are up and running, validate the cluster:

$ kops validate cluster ${NAME}

Delete the cluster

If you deployed your cluster with terraform run:

cd /path/to/folder
terraform plan -destroy # review what will be destroyed/terminated
terraform destroy 	# destroy/terminate the resources

If you deployed your cluster with kops then execute the following:

export AWS_ACCESS_KEY_ID=EXAMPLEACCESSKEY
export AWS_SECRET_ACCESS_KEY=EXAMPLESECRETKEY
export NAME=kubernetes.example.com
export KOPS_STATE_STORE=s3://my-k8s-bucket
export AWS_PROFILE=kops

$ kops delete cluster ${NAME} # preview what will be terminated
$ kops delete cluster ${NAME} --yes # destroy cluster

NOTE: Destroying resources with terraform will NOT clean ~/.kube/config and you will still have references to the deleted cluster in the kube config file. In this case, after destroying with terraform, please run the above kops section.
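
A minimal cleanup sketch, assuming the kubeconfig context, cluster and user entries are named after the cluster (check kubectl config view and adjust the names to match):

kubectl config delete-context ${NAME}
kubectl config delete-cluster ${NAME}
kubectl config unset users.${NAME}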


Production Cluster

Assuming you have already gone through the development setup, we continue by exporting the variables for the production cluster:

# Add the corresponding names of your subnets `Name=tag:Name,Values=subnet_name`. Export as many as needed
export A="kubernetes_public_subnet_1a"
export B="kubernetes_public_subnet_1b"
export C="kubernetes_public_subnet_1c"
export AWS_ACCESS_KEY_ID="EXAMPLEACCESSKEY"
export AWS_SECRET_ACCESS_KEY="EXAMPLESECRETKEY"
export AWS_REGION="ap-southeast-1"
export NAME="k8sproduction.example.com"
export KOPS_STATE_STORE="s3://my-k8s-bucket"
export AWS_PROFILE="kops"
export VPC_ID="vpc-12345678"
export ZONES="${AWS_REGION}a","${AWS_REGION}b","${AWS_REGION}c"
export NODE_SIZE="${NODE_SIZE:-m5.large}"
export NODE_VOL_SIZE="${NODE_VOL_SIZE:-120}"
export MASTER_SIZE="${MASTER_SIZE:-m5.large}"
export MASTER_VOL_SIZE="${MASTER_VOL_SIZE:-80}"
export NODE_ZONES=${ZONES:-"${AWS_REGION}a","${AWS_REGION}b","${AWS_REGION}c"}
export MASTER_ZONES=${ZONES:-"${AWS_REGION}a","${AWS_REGION}b","${AWS_REGION}c"}
export K8S_VER="1.18.6"

# Get the ID of subnets
export SUBNET_A=$(aws ec2 describe-subnets --region=${AWS_REGION} --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=$A" | jq -r ".Subnets[].SubnetId")
export SUBNET_B=$(aws ec2 describe-subnets --region=${AWS_REGION} --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=$B" | jq -r ".Subnets[].SubnetId")
export SUBNET_C=$(aws ec2 describe-subnets --region=${AWS_REGION} --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:Name,Values=$C" | jq -r ".Subnets[].SubnetId")

# Get the CIDR of the $VPC_ID
export CIDR=$(aws ec2 describe-vpcs --region=${AWS_REGION} --vpc-ids $VPC_ID | jq -r ".Vpcs[].CidrBlock")
export FLATCAR=$(aws ec2 describe-images --region=${AWS_REGION} --owners 075585003325 --filters "Name=name,Values=Flatcar-stable-*-hvm" --query "sort_by(Images,&CreationDate)[-1].{id:ImageLocation}" | jq -r ".id")

Proceed with the cluster creation:

kops create cluster ${NAME} \
  --node-count=1 \
  --zones="${NODE_ZONES}" \
  --node-size="${NODE_SIZE}" \
  --node-volume-size="${NODE_VOL_SIZE}" \
  --master-size="${MASTER_SIZE}" \
  --master-volume-size="${MASTER_VOL_SIZE}" \
  --master-zones="${MASTER_ZONES}" \
  --networking="cni" \
  --ssh-public-key="~/.ssh/kops.pub" \
  --vpc="$VPC_ID" \
  --state="${KOPS_STATE_STORE}" \
  --api-loadbalancer-type public \
  --kubernetes-version "${K8S_VER}" \
  --encrypt-etcd-storage \
  --authorization RBAC \
  --image "${FLATCAR}"

Create a template of the cluster (feel free to inspect the kops cluster file by doing kops get $NAME -o yaml > $(pwd)/$NAME-orig.yaml and compare it with the one we will create).

Please note the route53 hosted zone. Replace arn:aws:route53:::hostedzone/ABCDEFGHIJKLM with the one you have in your $(pwd)/$NAME-orig.yaml. These rules are for the cluster-autoscaler addon and the appscode/voyager ingress.

vi $(pwd)/$NAME.yaml

Add the following:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: {{NAME}}
spec:
  docker:
    skipInstall: true
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: {{KOPS_STATE_STORE}}{{NAME}}
  # This will disable locksmithd on FLATCAR, so it won't reboot machines on OS updates
  updatePolicy: external
  additionalPolicies:
  # This is used for cluster autoscaler to shrink or expand the cluster as an additional policy.
  # Also this is used for voyager operator when deployed to masters (--run-on-master flag), to modify Route 53 zones when requesting SSL certificates (Let's Encrypt). If voyager is run on nodes, then include the route 53 policy in a separate node policy
    master: |
      [
          {
              "Effect": "Allow",
              "Action": [
                  "autoscaling:DescribeAutoScalingGroups",
                  "autoscaling:DescribeAutoScalingInstances",
                  "autoscaling:DescribeTags",
                  "autoscaling:SetDesiredCapacity",
                  "autoscaling:TerminateInstanceInAutoScalingGroup"
              ],
              "Resource": "*"
          },
          {
              "Effect": "Allow",
              "Action": [
                  "route53:GetChange",
                  "route53:ListHostedZonesByName"
              ],
              "Resource": [
                  "*"
              ]
          },
          {
              "Effect": "Allow",
              "Action": [
                  "route53:ChangeResourceRecordSets"
              ],
              "Resource": [
                  "arn:aws:route53:::hostedzone/ABCDEFGHIJKLM"
              ]
          }
      ]
  kubeAPIServer:
    allowPrivileged: true
    anonymousAuth: false
    apiAudiences:
    - api
    - istio-ca
    apiServerCount: 3
    auditLogMaxAge: 10
    auditLogMaxBackups: 5
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /srv/kubernetes/audit.yaml
    authorizationMode: RBAC
    bindAddress: 0.0.0.0
    cloudProvider: aws
    enableAdmissionPlugins:
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - PersistentVolumeLabel
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - NodeRestriction
    - ResourceQuota
    - PodPreset
    runtimeConfig:
      settings.k8s.io/v1alpha1: "true"
    serviceAccountIssuer: kubernetes.default.svc
    serviceAccountKeyFile:
    - /srv/kubernetes/server.key
    serviceAccountSigningKeyFile: /srv/kubernetes/server.key
    # auditLogFormat: "legacy"
    # FLATCAR Dex settings
    # oidcIssuerURL: https://dex.example.com/dex
    # oidcClientID: kubernetes
    # oidcUsernameClaim: email
    # oidcUsernamePrefix: "oidc:"
    # oidcGroupsClaim: groups
    # oidcGroupsPrefix: "oidc:"
    # oidcCAFile: /srv/kubernetes/ca.crt
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cpuCFSQuotaPeriod: 5ms
    evictionHard: imagefs.available<15%,memory.available<500Mi,nodefs.available<15%,nodefs.inodesFree<15%
    kubeReserved:
      cpu: 250m
      memory: 500Mi
    systemReserved:
      cpu: 250m
      memory: 500Mi
  kubeDNS:
    provider: CoreDNS
    nodeLocalDNS:
      enabled: true
  kubeProxy:
    ipvsScheduler: lc
    proxyMode: ipvs
  fileAssets:
  - name: audit
    # Note: if no path is specified, the default path is /srv/kubernetes/assets/<name>
    path: /srv/kubernetes/audit.yaml
    roles: [Master] # a list of roles to apply the asset to; an empty list defaults to all
    content: |
      apiVersion: audit.k8s.io/v1beta1
      kind: Policy
      rules:
        - level: None
          resources:
            - group: ""
              resources:
                - endpoints
                - services
                - services/status
          users:
            - 'system:kube-proxy'
          verbs:
            - watch
        - level: None
          resources:
            - group: ""
              resources:
                - nodes
                - nodes/status
          userGroups:
            - 'system:nodes'
          verbs:
            - get
        - level: None
          namespaces:
            - kube-system
          resources:
            - group: ""
              resources:
                - endpoints
          users:
            - 'system:kube-controller-manager'
            - 'system:kube-scheduler'
            - 'system:serviceaccount:kube-system:endpoint-controller'
          verbs:
            - get
            - update
        - level: None
          resources:
            - group: ""
              resources:
                - namespaces
                - namespaces/status
                - namespaces/finalize
          users:
            - 'system:apiserver'
          verbs:
            - get
        - level: None
          resources:
            - group: metrics.k8s.io
          users:
            - 'system:kube-controller-manager'
          verbs:
            - get
            - list
        - level: None
          nonResourceURLs:
            - '/healthz*'
            - /version
            - '/swagger*'
        - level: None
          resources:
            - group: ""
              resources:
                - events
        - level: Metadata
          omitStages:
            - RequestReceived
  etcdClusters:
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-{{AWS_REGION}}a
      name: a
      # volumeSize: 40
    - encryptedVolume: true
      instanceGroup: master-{{AWS_REGION}}b
      name: b
      # volumeSize: 40
    - encryptedVolume: true
      instanceGroup: master-{{AWS_REGION}}c
      name: c
      # volumeSize: 40
    enableEtcdTLS: true
    enableTLSAuth: true
    name: main
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-{{AWS_REGION}}a
      name: a
      # volumeSize: 40
    - encryptedVolume: true
      instanceGroup: master-{{AWS_REGION}}b
      name: b
      # volumeSize: 40
    - encryptedVolume: true
      instanceGroup: master-{{AWS_REGION}}c
      name: c
      # volumeSize: 40
    enableEtcdTLS: true
    enableTLSAuth: true
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 10.0.0.0/8
  - 172.31.0.0/16
  kubernetesVersion: {{K8S_VER}}
  masterPublicName: api.{{NAME}}
  networkCIDR: {{CIDR}}
  networkID: {{VPC_ID}}
  networking:
    cni: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.0.0.0/8
  - 172.31.0.0/16
  subnets:
  - id: {{SUBNET_A}}
    name: {{AWS_REGION}}a
    type: Public
    zone: {{AWS_REGION}}a
  - id: {{SUBNET_B}}
    name: {{AWS_REGION}}b
    type: Public
    zone: {{AWS_REGION}}b
  - id: {{SUBNET_C}}
    name: {{AWS_REGION}}c
    type: Public
    zone: {{AWS_REGION}}c
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: {{NAME}}
  name: master-{{AWS_REGION}}a
spec:
  image: {{FLATCAR}}
  machineType: {{MASTER_SIZE}}
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-{{AWS_REGION}}a
  role: Master
  rootVolumeSize: {{MASTER_VOL_SIZE}}
  subnets:
  - {{AWS_REGION}}a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: {{NAME}}
  name: master-{{AWS_REGION}}b
spec:
  image: {{FLATCAR}}
  machineType: {{MASTER_SIZE}}
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-{{AWS_REGION}}b
  role: Master
  rootVolumeSize: {{MASTER_VOL_SIZE}}
  subnets:
  - {{AWS_REGION}}b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: {{NAME}}
  name: master-{{AWS_REGION}}c
spec:
  image: {{FLATCAR}}
  machineType: {{MASTER_SIZE}}
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-{{AWS_REGION}}c
  role: Master
  rootVolumeSize: {{MASTER_VOL_SIZE}}
  subnets:
  - {{AWS_REGION}}c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: {{NAME}}
  name: nodes
spec:
  image: {{FLATCAR}}
  machineType: {{NODE_SIZE}}
  maxSize: 15
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: {{NODE_VOL_SIZE}}
  subnets:
  - {{AWS_REGION}}a
  - {{AWS_REGION}}b
  - {{AWS_REGION}}c

Replace vars:

# MacOS sed
sed -i '' -e "s@{{NAME}}@$NAME@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{KOPS_STATE_STORE}}@$KOPS_STATE_STORE@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{VPC_ID}}@$VPC_ID@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{FLATCAR}}@$FLATCAR@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{MASTER_SIZE}}@$MASTER_SIZE@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{NODE_SIZE}}@$NODE_SIZE@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{K8S_VER}}@$K8S_VER@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{CIDR}}@$CIDR@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{SUBNET_A}}@$SUBNET_A@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{SUBNET_B}}@$SUBNET_B@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{SUBNET_C}}@$SUBNET_C@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{AWS_REGION}}@$AWS_REGION@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{MASTER_VOL_SIZE}}@$MASTER_VOL_SIZE@g" $(pwd)/$NAME.yaml
sed -i '' -e "s@{{NODE_VOL_SIZE}}@$NODE_VOL_SIZE@g" $(pwd)/$NAME.yaml

# GNU sed (Linux)
sed -i "s@{{NAME}}@$NAME@g" $(pwd)/$NAME.yaml
sed -i "s@{{KOPS_STATE_STORE}}@$KOPS_STATE_STORE@g" $(pwd)/$NAME.yaml
sed -i "s@{{ZONES}}@$ZONES@g" $(pwd)/$NAME.yaml
sed -i "s@{{VPC_ID}}@$VPC_ID@g" $(pwd)/$NAME.yaml
sed -i "s@{{FLATCAR}}@$FLATCAR@g" $(pwd)/$NAME.yaml
sed -i "s@{{MASTER_SIZE}}@$MASTER_SIZE@g" $(pwd)/$NAME.yaml
sed -i "s@{{NODE_SIZE}}@$NODE_SIZE@g" $(pwd)/$NAME.yaml
sed -i "s@{{K8S_VER}}@$K8S_VER@g" $(pwd)/$NAME.yaml
sed -i "s@{{CIDR}}@$CIDR@g" $(pwd)/$NAME.yaml
sed -i "s@{{SUBNET_A}}@$SUBNET_A@g" $(pwd)/$NAME.yaml
sed -i "s@{{SUBNET_B}}@$SUBNET_B@g" $(pwd)/$NAME.yaml
sed -i "s@{{SUBNET_C}}@$SUBNET_C@g" $(pwd)/$NAME.yaml
sed -i "s@{{AWS_REGION}}@$AWS_REGION@g" $(pwd)/$NAME.yaml
sed -i "s@{{MASTER_VOL_SIZE}}@$MASTER_VOL_SIZE@g" $(pwd)/$NAME.yaml
sed -i "s@{{NODE_VOL_SIZE}}@$NODE_VOL_SIZE@g" $(pwd)/$NAME.yaml

Replace the whole cluster file in the S3 bucket:

kops replace -f $(pwd)/$NAME.yaml

Update cluster configuration:

kops update cluster $NAME

# For terraform
kops update cluster ${NAME} --target=terraform [--out=/path/to/folder] # defaults to $(pwd)/out/terraform

Create cluster:

kops update cluster $NAME --yes

# Terraform
cd /path/to/folder
terraform init
terraform plan # inspect what will be created/modified
terraform apply # apply the changes

Install some goodies

Cluster autoscaler

(Please see here for updated documentation - https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md)

vi $(pwd)/cluster-autoscaler.yaml

Add the following:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources:
      - "pods"
      - "services"
      - "replicationcontrollers"
      - "persistentvolumeclaims"
      - "persistentvolumes"
    verbs: ["watch", "list", "get"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "patch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create","list","watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
    verbs: ["delete", "get", "update", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      nodeSelector:
        kubernetes.io/role: master
      containers:
        - image: {{IMAGE}}
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 300m
              memory: 600Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider={{CLOUD_PROVIDER}}
            - --skip-nodes-with-local-storage=false
            - --expander=random
            - --balance-similar-node-groups=true
            - --nodes={{MIN_NODES}}:{{MAX_NODES}}:{{GROUP_NAME}}
          env:
            - name: AWS_REGION
              value: {{AWS_REGION}}
          volumeMounts:
            - name: ssl-certs
              mountPath: {{SSL_CERT_PATH}}
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: {{SSL_CERT_PATH}}

Set the values to substitute into the manifest:

CLOUD_PROVIDER=aws
IMAGE="gcr.io/google-containers/cluster-autoscaler:v1.18.2" # we have k8s v1.18
MIN_NODES=1
MAX_NODES=15
AWS_REGION=ap-southeast-1
GROUP_NAME="nodes.$NAME"
SSL_CERT_PATH="/etc/ssl/certs/ca-certificates.crt" # (/etc/ssl/certs for gce)

Replace vars:

# MacOS sed
sed -i '' -e "s@{{CLOUD_PROVIDER}}@$CLOUD_PROVIDER@g" $(pwd)/cluster-autoscaler.yaml
sed -i '' -e "s@{{IMAGE}}@$IMAGE@g" $(pwd)/cluster-autoscaler.yaml
sed -i '' -e "s@{{MIN_NODES}}@$MIN_NODES@g" $(pwd)/cluster-autoscaler.yaml
sed -i '' -e "s@{{MAX_NODES}}@$MAX_NODES@g" $(pwd)/cluster-autoscaler.yaml
sed -i '' -e "s@{{AWS_REGION}}@$AWS_REGION@g" $(pwd)/cluster-autoscaler.yaml
sed -i '' -e "s@{{GROUP_NAME}}@$GROUP_NAME@g" $(pwd)/cluster-autoscaler.yaml
sed -i '' -e "s@{{SSL_CERT_PATH}}@$SSL_CERT_PATH@g" $(pwd)/cluster-autoscaler.yaml

# GNU sed (Linux)
sed -i "s@{{CLOUD_PROVIDER}}@$CLOUD_PROVIDER@g" $(pwd)/cluster-autoscaler.yaml
sed -i "s@{{IMAGE}}@$IMAGE@g" $(pwd)/cluster-autoscaler.yaml
sed -i "s@{{MIN_NODES}}@$MIN_NODES@g" $(pwd)/cluster-autoscaler.yaml
sed -i "s@{{MAX_NODES}}@$MAX_NODES@g" $(pwd)/cluster-autoscaler.yaml
sed -i "s@{{AWS_REGION}}@$AWS_REGION@g" $(pwd)/cluster-autoscaler.yaml
sed -i "s@{{GROUP_NAME}}@$GROUP_NAME@g" $(pwd)/cluster-autoscaler.yaml
sed -i "s@{{SSL_CERT_PATH}}@$SSL_CERT_PATH@g" $(pwd)/cluster-autoscaler.yaml

Apply the manifest:

kubectl apply -f $(pwd)/cluster-autoscaler.yaml
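
Once applied, you can check that the autoscaler pod is running and inspect its logs (the app=cluster-autoscaler label comes from the deployment above):

kubectl -n kube-system get pods -l app=cluster-autoscaler
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=20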

NOTE: for the horizontal pod autoscaler (HPA) to work in k8s > 1.9 you need to deploy metrics-server, for example as shown below.
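
A minimal way to install metrics-server is to apply the upstream manifest (check the metrics-server releases page for a version compatible with your cluster version):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml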

Sealed secrets

Deploy sealed-secrets-controller onto the cluster. Please see https://github.com/bitnami-labs/sealed-secrets. The documentation is very clear and explains in detail how to create the sealed secrets.
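
As a rough sketch of the workflow (assuming the controller is installed and the kubeseal CLI is on your PATH; my-secret.yaml is a hypothetical plain Secret manifest), you seal the Secret and apply only the sealed output:

kubeseal --format yaml < my-secret.yaml > my-sealed-secret.yaml
kubectl apply -f my-sealed-secret.yaml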

Weave Networking

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.WEAVE_MTU=8912&env.IPALLOC_RANGE=100.96.0.0/11"

Make use of CoreDNS autopath feature

Edit the configmap of coredns

kubectl edit cm/coredns -n kube-system

Replace pods insecure with pods verified and add autopath @kubernetes at the same level as the cache line, as shown in the sketch below.
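
After the edit, the relevant part of the Corefile should look roughly like this (a sketch; keep the rest of the file as generated by kops):

kubernetes cluster.local in-addr.arpa ip6.arpa {
  pods verified
  fallthrough in-addr.arpa ip6.arpa
}
autopath @kubernetes
cache 30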

Kubernetes on spot instances

First, let's create an instance group that will use spot instances.

Export all relevant data for kops if you haven't done so for your specific cluster:

export AWS_ACCESS_KEY_ID="EXAMPLEACCESSKEY"
export AWS_SECRET_ACCESS_KEY="EXAMPLESECRETACCESSKEY"
export AWS_REGION="ap-southeast-1"
export NAME="kubernetes.example.com"
export KOPS_STATE_STORE="s3://my-k8s-bucket"
export AWS_PROFILE="kops"

Create the instance group

kops create instancegroup nodes-spot

You will be presented with a file to edit; after editing, it should look like this:

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: cluster.example.com
  name: nodes-spot
spec:
  image: $FLATCAR
  machineType: m5.large # The type of machine
  maxPrice: "0.12" # I've set the max price to the on-demand value; see the price check after this manifest
  maxSize: 6
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-spot
    node-role.kubernetes.io/spot-worker: "true" # Added one more label which helps k8s-spot-rescheduler
  role: Node
  # In case nodes are running in more than one AZ suspend AWS AZRebalance. See https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#suspending-scaling-processes-on-aws-autoscaling-groups
  suspendProcesses:
  - AZRebalance
  subnets:
  - ap-southeast-1c
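
To pick a sensible maxPrice, you can look at recent spot prices for the instance type (m5.large and ap-southeast-1 match the example above; the on-demand price is the effective ceiling):

aws ec2 describe-spot-price-history \
  --instance-types m5.large \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --region ap-southeast-1 \
  --query "SpotPriceHistory[*].[AvailabilityZone,SpotPrice]" \
  --output table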

Save the file, then let's update the existing worker nodes. We need to add a taint, which will "try" to prevent pod scheduling on those nodes but not forbid it. The instance group here is called nodes; yours might vary. Check with kops get ig.

kops edit ig nodes

The final config should look like this:

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: cluster.example.com
  name: nodes
spec:
  image: $FLATCAR
  machineType: m5.large
  maxSize: 5
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
    node-role.kubernetes.io/worker: "true" # Added one more label which helps k8s-spot-rescheduler
  role: Node
  rootVolumeSize: 80
  subnets:
  - ap-southeast-1c
  taints:
  - node-role.kubernetes.io/worker=true:PreferNoSchedule # Added the taint. The important bit is PreferNoSchedule

Save the file and let's update our cluster.

kops update cluster $NAME --yes
kops rolling-update cluster --yes

This will take some time, depending on how many running machines need to be rolled over.

After the rolling update is finished and the cluster is validated, clone k8s-spot-rescheduler:

git clone https://github.com/pusher/k8s-spot-rescheduler.git

Modify deploy/deployment.yaml and uncomment the line #serviceAccountName: k8s-spot-rescheduler (if you use RBAC, which I hope you do), and use the latest docker image (quay.io/pusher/k8s-spot-rescheduler:v0.2.0)

Then launch everything in that folder to your cluster:

kubectl apply -f deploy/

This deployment will reschedule your pods onto the spot instances, based on the node labels mentioned above, which is what helps k8s-spot-rescheduler do its magic.

The on-demand instances will scale down to the minimum allowed by the cluster autoscaler (which should match the minSize in your kops instance group configuration).

The spot instances will scale up, based on your workload.

If your spot instances terminate (the price goes over maxPrice), all pods will be rescheduled onto on-demand instances.

As a best practice, keep some on-demand instances up and ready to mitigate downtime, and/or create at least one more spot instance group with similar specs (e.g. m4.large and m5.large).

Spot pros:

Big $$$ savings

Spot cons:

Besides the well-known ones, there are cases when the request for spot instances fails - mostly because AWS has no more capacity (capacity oversubscribed) for that particular spot instance type in that AZ or region. The mitigation is to create more similar spot node instance groups (as mentioned above).

