@pwillis-els
Created January 8, 2022 19:56
Kubernetes notes, tips, tricks, and best practices, for both administration and development

Kubernetes Administration Best Practice

Reliability

Networking Reliability

  • The AWS VPC Container Network Interface (CNI) for Kubernetes has an inherent limit on the number of pods per instance: each pod consumes a secondary IP address on an ENI, and each instance type supports only a fixed number of ENIs and IP addresses per ENI. One workaround is the Calico CNI, which can be deployed in EKS to run alongside the VPC CNI and also provides Kubernetes Network Policy support.
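The per-instance pod limit above follows directly from the ENI math. A quick sketch of the standard EKS max-pods formula, using the values for an m5.large (3 ENIs, 10 IPv4 addresses per ENI):

```shell
# Max pods per node with the AWS VPC CNI:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# One IP per ENI is reserved for the ENI itself; +2 covers
# host-networked system pods. Values below are for an m5.large.
enis=3
ips_per_eni=10
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "$max_pods"   # prints 29
```

This is why small instance types top out at surprisingly low pod counts with the VPC CNI.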

Manifest Reliability

Use Hierarchical Namespaces

  • Uses the Hierarchical Namespace Controller (HNC)
  • Allows cascading of policies, better expressing ownership, admin delegation
  • Adds new capabilities on top of traditional namespaces
    • Delegate subnamespace creation without cluster privileges
    • Cascading policies, secrets, configmaps
    • Trusted labels for policies
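The delegated subnamespace creation above works through HNC's SubnamespaceAnchor resource; a minimal sketch, assuming HNC (v1alpha2 API) is installed and using a hypothetical parent namespace `team-a`:

```yaml
# Creating this anchor inside the parent namespace "team-a" asks HNC
# to create (and manage) a child namespace named "team-a-dev".
# Namespace names here are hypothetical.
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: team-a-dev
  namespace: team-a
```

A user who can create SubnamespaceAnchors in `team-a` can create child namespaces without holding cluster-wide namespace-creation privileges.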

Security

Securing Authentication

  • Use an admission controller as an extra layer of authZ validation.
  • Make sure that users are assigned proper roles. Assign namespace-specific permissions instead of cluster-wide privileges to minimize exposure.
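Namespace-specific permissions are granted with a Role plus RoleBinding (rather than a ClusterRole/ClusterRoleBinding); a sketch with hypothetical names:

```yaml
# A Role is scoped to one namespace; this one only allows reading pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: test1
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# Bind the role to a user within that same namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: test1
subjects:
- kind: User
  name: myuser@my-domain   # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```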

Securing Kubernetes Core Components

  • AKS allows setting a cluster as either public or private. It may be necessary to build your AKS/EKS cluster as “private” so that its API endpoint is not reachable from routable IPs (by default).
  • Network access to the core Kubernetes components, the API server (control plane), etcd (persistent storage database), the kubelet agent, and kube-dns, must use encrypted connections. Ensure that Kubernetes clusters have end-to-end TLS enabled.
  • Lock down all unsecured API server ports. Use a bastion host, configure a VPN, or use an internal network to access the nodes and other infrastructure resources.
  • Limit exposure of the Kubernetes Dashboard. Disable public access via the internet. Ensure the Dashboard Service Account is not open and accessible to users. Configure the login page and enable RBAC.

Securing Images

  • Use private container registries and tagged container images, keeping tagged images immutable.
  • Continuously scan the container images that run your application for CVEs using image-scanning tools.
  • When vulnerabilities are found, rebuild the affected images and roll them out using Kubernetes rolling updates.
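One way to keep a deployed image truly immutable is to reference it by digest instead of by tag, since tags can be re-pointed in the registry; a sketch, with the registry, repository, and digest as placeholders:

```yaml
spec:
  template:
    spec:
      containers:
      - name: demo-app
        # Pin by digest so the deployed image cannot change underneath
        # a mutable tag. Registry, repo, and digest are placeholders.
        image: registry.example.com/demo-app@sha256:0000000000000000000000000000000000000000000000000000000000000000
```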

Securing Deployments

Securing Pods

Network Access

  • By default, all pods can communicate with one another. Every Pod gets its own IP address. This means you do not need to explicitly create links between Pods and you almost never need to deal with mapping container ports to host ports. To lock down access you can either use a Network Policy or use Service Mesh. The entities that a Pod can communicate with are identified through a combination of the following 3 identifiers:
    1. Other pods that are allowed (exception: a pod cannot block access to itself)
    2. Namespaces that are allowed
    3. IP blocks (exception: traffic to and from the node where a Pod is running is always allowed, regardless of the IP address of the Pod or the node)
  • Within Azure there are two main approaches to Network Policies:
    • Azure’s own implementation, Azure Network Policies (supports Linux), backed by the Azure support team.
    • Calico Network Policies, an open-source network and network-security solution founded by Tigera (supports Linux and Windows, and the kubenet CNI).
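A common starting point for locking down pod traffic is a default-deny policy plus a narrow allow rule; a sketch, using the `test1` namespace and assuming a CNI that enforces NetworkPolicies is installed:

```yaml
# Deny all ingress to pods in this namespace by default...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: test1
spec:
  podSelector: {}       # empty selector matches every pod in the namespace
  policyTypes:
  - Ingress
---
# ...then allow ingress only from other pods in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: test1
spec:
  podSelector: {}
  ingress:
  - from:
    - podSelector: {}   # any pod in this namespace
  policyTypes:
  - Ingress
```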

User Access

  • Limit the number of users who can create pods and try not to use unknown or unmaintained libraries.

Securing Namespaces

  • Use different namespaces for different environments like (dev/test/stage/prod). 

Securing Resources

  • By default, all resources are created with unbounded CPU and memory limits. To prevent “noisy neighbors” and potential DoS (denial of service) situations, do not let containers run without an upper bound on resources. You can assign resource quota policies at the namespace level, in order to limit overconsumption of the CPU and memory resources a pod is allowed to consume.
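The namespace-level quota mentioned above is expressed as a ResourceQuota object; a sketch using the `test1` namespace and illustrative numbers:

```yaml
# Caps the total CPU/memory that all pods in the namespace may
# request or be limited to. Numbers here are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: test1
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

With a quota in place, pods that do not declare requests and limits are rejected, which also forces teams to size their workloads.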

Securing Kubernetes Instances

  • Protect the Identity and Access Management (IAM) credentials of your nodes’ IAM Instance Role. kube2iam and kiam both work by intercepting calls to the metadata endpoint and issuing limited credentials back to the pod based on your configuration. If you don’t use either, install the Calico CNI so you can add a Network Policy that blocks access to the metadata IP, 169.254.169.254.
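Blocking the metadata IP can be sketched with a standard NetworkPolicy using `ipBlock.except` (enforcement requires a CNI that supports egress policies, such as Calico); namespace name is illustrative:

```yaml
# Allow all egress except traffic to the instance metadata endpoint.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-instance-metadata
  namespace: test1
spec:
  podSelector: {}          # applies to every pod in the namespace
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32   # AWS instance metadata service
  policyTypes:
  - Egress
```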

Kubernetes Administration Notes

Connect to an EKS Cluster with Kubectl

  1. Authenticate to AWS (CLI) and connect to the AWS region your cluster is in. This example uses saml2aws to authenticate to AWS using ADFS SSO.

    $ cat >> ~/.aws/config <<'EOAWSCONF'
    [profile non-prod]
    output = json
    credential_process = saml2aws login --skip-prompt --quiet --credential-process --role arn:aws:iam::ACCOUNT-ID:role/ROLE-NAME --profile non-prod-saml2aws
    EOAWSCONF
    
    $ cat >> ~/.saml2aws <<'EOSAMLCONF'
    [default]
    app_id               =
    url                  = https://my-domain/adfs/ls/IdpInitiatedSignOn.aspx
    username             = myuser@my-domain
    provider             = ADFS
    mfa                  = Auto
    skip_verify          = false
    timeout              = 0
    aws_urn              = urn:amazon:webservices
    aws_session_duration = 3600
    # aws_session_duration = 28800
    aws_profile          = default
    resource_id          =
    subdomain            =
    #role_arn             = arn:aws:iam::ACCOUNT-ID:role/ROLE-NAME
    region               =
    http_attempts_count  =
    http_retry_delay     =
    credentials_file     =
    EOSAMLCONF
    
    $ saml2aws login
    # <NOTE: Make sure saml2aws can connect to your system keychain to
    # store your username and password>
    
    $ aws --profile non-prod --region us-east-2 sts get-caller-identity
    {
        "UserId": "ABCDEFGHIJKLMNOPQRSTUV:myuser@my-domain",
        "Account": "1234567890",
        "Arn": "arn:aws:sts::MY-ACCOUNT:assumed-role/MY-ROLE/myuser@my-domain"
    }
    
  2. Look up the name of your EKS cluster

    $ aws eks list-clusters
    {
        "clusters": [
            "dev"
        ]
    }
    
  3. Generate the kubeconfig for the cluster

    $ aws eks update-kubeconfig --name dev
    Updated context arn:aws:eks:MY-REGION:MY-ACCOUNT:cluster/dev in /home/myuser/.kube/config
    
  4. Test kubectl connection

    $ kubectl cluster-info
    Kubernetes control plane is running at https://aaaaaaaaaaaaaaaaaaaaaaa.gr7.us-east-2.eks.amazonaws.com
    CoreDNS is running at https://aaaaaaaaaaaaaaaaaaaaaaa.gr7.us-east-2.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
    metrics-server is running at https://aaaaaaaaaaaaaaaaaaaa.gr7.us-east-2.eks.amazonaws.com/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy

    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Create a namespace per deployment group

  • By default, Kubernetes will use the default namespace. This becomes difficult to manage over time as more and more components are piled together. Best practice is to create a new namespace specific to what you deploy.
  • Due to security limitations in Kubernetes, you should create a unique namespace for every one of your applications. Otherwise you will not be able to control some of the interactions between applications in the same namespace.
  • You can add hierarchical information to your namespaces, such as your Product, Subproduct, and Environment. You might not have a Subproduct, but it can also be a general term for any grouping of services, like a database and web server.
  • It is best not to include an organization or team name: those names inevitably change as organizations get reorganized, and the namespace names will become inaccurate.
  1. Create a product/subproduct/environment/application namespace (namespace names must be valid RFC 1123 labels, so join the parts with hyphens rather than slashes). You can do this manually in kubectl:

    $ kubectl create namespace product-subproduct-dev-appname
    

    Or you can do this using a version-controlled file:

    $ cat > create-namespace.yml <<'EOF'
    apiVersion: v1
    kind: Namespace
    metadata:
      name: product-subproduct-dev-appname
    EOF
    $ kubectl apply -f create-namespace.yml
    
  2. Switch your context to use the new namespace

    $ kubectl config set-context --current --namespace=product-subproduct-dev-appname
    

General Kubernetes guidance

Namespace

  • When you create an object without specifying a namespace, it is created in the current namespace, which is the default namespace unless you have manually switched to a custom one.
  • You can specify a namespace for an object in two ways: with the namespace field in the object’s specification YAML, or with the --namespace flag on the command line.
  • Kubernetes namespaces must be an RFC 1123 label (MUST consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character).
  • You can connect to Kubernetes services running in a specific namespace using the fully-qualified DNS address: <service name>.<namespace name>.svc.cluster.local
  • Network policies may limit access to services between namespaces.
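The fully-qualified service DNS name above is just the service and namespace joined onto a fixed suffix; a quick sketch with hypothetical names:

```shell
# Build the in-cluster DNS name for a service
# (service and namespace names are hypothetical).
service=my-svc
namespace=test1
fqdn="${service}.${namespace}.svc.cluster.local"
echo "$fqdn"   # prints my-svc.test1.svc.cluster.local
```

Within the same namespace, the short name `my-svc` resolves too; the fully-qualified form is what you need across namespaces.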

Create a namespace

kubectl create namespace test1

Get namespaces

kubectl get namespaces -o yaml
kubectl get namespace test1 -o yaml

Describe namespace

kubectl describe namespace test1

Set namespace

kubectl config set-context --current --namespace=test1

Find what namespace your context is set to

kubectl config get-contexts
kubectl config current-context

Create a pod in a specific namespace

  • Option 1: Specify namespace in a YAML config (this cannot be overridden)
apiVersion: v1
kind: Pod
metadata:
  name: mypod-1
  namespace: test1
  labels:
    name: mypod-1
spec:
  containers:
    - name: mypod-1
      image: bash
  • Option 2: Specify namespace in command-line
kubectl apply -f pod-create.yaml --namespace=test1

Delete a namespace

kubectl delete namespace test1

What resources are and aren’t namespaced?

kubectl api-resources --namespaced=true | rev | awk '{print $1}' | rev | tail -n +2 | xargs | fold -s -w 100
Binding ConfigMap ControllerRevision CronJob DaemonSet Deployment Endpoints EndpointSlice Event 
Event GenerateRequest HorizontalPodAutoscaler Ingress Ingress Job Lease LimitRange 
LocalSubjectAccessReview NetworkPolicy NetworkPolicy NetworkSet PersistentVolumeClaim Pod 
PodDisruptionBudget PodMetrics PodTemplate Policy PolicyReport ReplicaSet ReplicationController 
ReportChangeRequest ResourceQuota Role RoleBinding Secret SecurityGroupPolicy Service 
ServiceAccount StatefulSet VerticalPodAutoscaler VerticalPodAutoscalerCheckpoint
kubectl api-resources --namespaced=false | rev | awk '{print $1}' | rev | tail -n +2 | xargs | fold -s -w 100
APIService BGPConfiguration BGPPeer BlockAffinity CertificateSigningRequest ClusterInformation 
ClusterPolicy ClusterPolicyReport ClusterReportChangeRequest ClusterRole ClusterRoleBinding 
ComponentStatus CSIDriver CSINode CustomResourceDefinition ENIConfig FelixConfiguration FlowSchema 
GlobalNetworkPolicy GlobalNetworkSet HostEndpoint IngressClass Installation IPAMBlock IPAMConfig 
IPAMHandle IPPool KubeControllersConfiguration MutatingWebhookConfiguration Namespace Node 
NodeMetrics PersistentVolume PodSecurityPolicy PriorityClass PriorityLevelConfiguration 
RuntimeClass SelfSubjectAccessReview SelfSubjectRulesReview StorageClass SubjectAccessReview 
TigeraStatus TokenReview ValidatingWebhookConfiguration VolumeAttachment

Kubernetes Dashboard

Deploy dashboard

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml

Access dashboard

$ kubectl proxy
$ firefox http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

Service Accounts

Create a service account

$ kubectl create serviceaccount dashboard-kube-web-view --namespace=kube-system

Get a service account secret

  1. Get the service account details
$ kubectl get serviceaccounts dashboard-kube-web-view --namespace=kube-system -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    meta.helm.sh/release-name: dashboard
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2021-09-03T16:11:10Z"
  labels:
    app.kubernetes.io/app: dashboard-kube-web-view
    app.kubernetes.io/instance: dashboard
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: dashboard-kube-web-view
    app.kubernetes.io/version: 21.2.0
    elsevier.systems/managed-by: core-elsevier-platform
    helm.sh/chart: kube-web-view-1.1.0
  name: dashboard-kube-web-view
  namespace: kube-system
  resourceVersion: "129852"
  uid: 72417d4f-b3fd-4171-aced-1bb61da909b6
secrets:
- name: dashboard-kube-web-view-token-glnbv
  2. Get the secret value
$ kubectl get secret dashboard-kube-web-view-token-glnbv --namespace=kube-system  -o json \
    | jq -r .data.token | base64 -d ; echo ""

Kubectl tricks

Get specific path out of JSON output

$ kubectl get webhookrelayforwards.forward.webhookrelay.com forward-to-jenkins -o 'jsonpath={.status.publicEndpoints[0]}'

Miscellaneous tips

Inherent limits on pods in Kubernetes

All Kubernetes clusters, including EKS, have inherent static limits:

  • No more than 110 pods per node
  • No more than 5000 nodes
  • No more than 150000 total pods
  • No more than 300000 total containers

Security within Namespaces is DANGEROUS

  • Service Accounts and Secrets are freely usable within a namespace. Anyone with permission to deploy a Pod in a namespace can use any Service Account and any Secret.
  • Namespaces only isolate the control plane, not the data plane. Anything within a namespace can attack the cluster. Use Sandboxing (gVisor, Kata) to protect the data plane.
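Sandboxed runtimes are selected per pod via a RuntimeClass; a sketch, assuming the gVisor `runsc` handler is installed and configured on the nodes:

```yaml
# The handler name must match a runtime configured on the node
# (runsc is gVisor's OCI runtime).
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
# Any pod that opts in runs inside the sandbox.
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: bash
```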

Labels are not secure

  • ??? (read it on a presentation slide)

Kubernetes Developer Notes

This page will document some of the information most useful when developing apps to run on Kubernetes.

Liveness, Readiness, & Startup Probes

Your application should have liveness, readiness, and startup probes configured before you deploy it to production. The readiness probe ensures Kubernetes won’t send traffic until the application (and its dependencies) are ready to serve it; the liveness probe lets Kubernetes restart the container if it hangs; the startup probe holds off the other probes until a slow-starting application has finished initializing.
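A sketch of all three probes on one container, assuming the app exposes /healthz and /ready HTTP endpoints on port 8080 (hypothetical paths):

```yaml
spec:
  template:
    spec:
      containers:
      - name: demo-app
        ports:
        - containerPort: 8080
        # Restart the container if it stops responding.
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 10
        # Only route traffic once the app reports ready.
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
        # Give a slow-starting app up to 30 * 5s to come up
        # before the liveness probe takes over.
        startupProbe:
          httpGet:
            path: /healthz
            port: 8080
          failureThreshold: 30
          periodSeconds: 5
```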

Resource Limits

Your application should specify the amount of resources it needs to run. This has a couple of purposes:

  • If one process ends up consuming all the available CPU, it can impact other apps. CPU limits prevent one CPU-hungry app from starving the rest of the cluster.
  • If a process has runaway memory growth, memory won’t be available for other apps, and the process will eventually die when all memory is exhausted.

Requests: The requests specification is used at pod placement time: Kubernetes will look for a node that has both enough CPU and memory according to the requests configuration.

Limits: These are enforced at runtime. For CPU, usage is simply throttled: a container typically can't exceed its limit, but it won't be killed, it just can't use more CPU. If a container exceeds its memory limit, it may be terminated.

Considerations:

  • If you under-size these limits, your application may end up running out of memory or running slowly.
  • If you over-size these limits, yours or someone else’s app may not deploy correctly if the Kubernetes cluster doesn’t have enough resources to schedule all the apps with their reserved limits. You may not notice a problem unless you are watching the deployment status of your app, and problems may only show themselves after apps re-start themselves when they fail.
  • Your Operations Engineers may impose resource quotas at the Namespace level to prevent apps from taking up too many resources (or too few). Nobody wants to be limited, but we do need to make sure all the apps can run.
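Requests and limits are set per container; a sketch with illustrative numbers:

```yaml
spec:
  template:
    spec:
      containers:
      - name: demo-app
        resources:
          # Used by the scheduler to pick a node with enough capacity.
          requests:
            cpu: 250m
            memory: 256Mi
          # Enforced at runtime: CPU is throttled, exceeding the
          # memory limit can get the container killed.
          limits:
            cpu: 500m
            memory: 512Mi
```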

Pod Settings

Always set your imagePullPolicy to Always:

spec:
  template:
    spec:
      containers:
      - name: demo-app

        # ... add this line to never cache the image
        imagePullPolicy: Always

Restrict network access to only what needs to access it. The following permissive example allows all ingress and egress to the pod; tighten its ingress/egress rules to match what your app actually needs:

# pod network policies are a different resource
# add these lines at the end of deployment.yml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
spec:
  podSelector:
    matchLabels:
      app: demo-app
  ingress:
  - {}
  egress:
  - {}
  policyTypes:
  - Ingress
  - Egress

Restrict the security context of the container. You should restrict this as much as is needed for your app to still work. This example just makes the container filesystem read-only and sets a high user/group ID:

spec:
  template:
    spec:
      containers:
      - name: demo-app

        # add a security context for the container
        securityContext:
          runAsUser: 10001
          runAsGroup: 10001
          readOnlyRootFilesystem: true
