OpenShift Best Practices

A sort-of style guide and operational handbook for OpenShift and k8s.

Inspired by the Uber Go Style Guide.

Semantics

Prefer generic resource names

Most resources only need unique names within a namespace. This makes de-duplication and templating easier because there is less variance to deal with.

This especially applies to the names of things inside resources. For example, always naming the port that exposes Prometheus metrics metrics means you only need to match on a single port name in your ServiceMonitor object, likely covering every pod in that namespace.

Bad:

apiVersion: v1
kind: Secret
metadata:
  name: dev-password
  namespace: dev
...

Good:

apiVersion: v1
kind: Secret
metadata:
  name: password
  namespace: dev
...
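As a sketch of the port-naming point above (the my-app name, labels and port number are hypothetical), a Service whose metrics port is always called metrics lets a single ServiceMonitor cover every service in the namespace:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: dev
spec:
  selector:
    app: my-app
  ports:
  - name: metrics        # generic port name shared by all workloads
    port: 8080
    targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics
  namespace: dev
spec:
  selector: {}           # empty selector matches every Service in the namespace
  endpoints:
  - port: metrics        # one port name to match, regardless of the application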

Don't store secrets in version control

Your Git repository may be private, but having every developer sync sensitive data to their device just to develop is still not good practice.

Consider an external secret integration such as the External Secrets Operator, which supports most cloud providers, SaaS products, and custom integrations.
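A minimal sketch of the External Secrets Operator approach, assuming a SecretStore named vault-backend already exists and holds a key dev/credentials (both names are hypothetical):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: credentials
  namespace: dev
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend      # hypothetical external store
    kind: SecretStore
  target:
    name: credentials        # the Kubernetes Secret the operator creates
  data:
  - secretKey: username
    remoteRef:
      key: dev/credentials
      property: username
  - secretKey: password
    remoteRef:
      key: dev/credentials
      property: password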

Store both the username and password in the secret

This allows atomic updates to credentials in cases where both the username and password need to change at the same time. As part of security hardening, it also lets you use randomised usernames and treat them as pseudo-secret data.

Bad:

apiVersion: v1
kind: Secret
metadata:
  name: credentials
  ...
data:
  password: aGVsbG93b3JsZA==
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: credentials
  ...
data:
  username: myusername

Good:

apiVersion: v1
kind: Secret
metadata:
  name: credentials
  ...
data:
  username: bXl1c2VybmFtZQ==
  password: aGVsbG93b3JsZA==

Workloads should have as little knowledge of underlying infrastructure as possible

Your application will very likely have to be migrated from one cluster to another. Think carefully about how infrastructure-dependent configuration is injected into resources. If the database hostname were to change, how many places would you have to update it?

Most commonly, infrastructure details reach applications via external secrets and GitOps. Use this to your advantage to maintain a single source of truth (even if this causes duplicate information inside OpenShift).
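As a minimal sketch (the infrastructure ConfigMap, my-app Deployment and DATABASE_HOST variable are hypothetical), keeping the hostname in one place means a cluster migration only requires a single change:

apiVersion: v1
kind: ConfigMap
metadata:
  name: infrastructure
  namespace: dev
data:
  database-host: db.dev.example.com   # the only place the hostname is defined
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: dev
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:latest
        env:
        - name: DATABASE_HOST
          valueFrom:
            configMapKeyRef:
              name: infrastructure
              key: database-host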

Performance

Always specify resource requests and limits on Pods.

Always define requests and limits for workloads to ensure they schedule correctly and do not over-use node resources.

You can use a policy engine such as Kyverno to enforce this automatically.
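A minimal sketch of a container spec with both requests and limits (the values shown are illustrative, not recommendations):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: registry.example.com/my-app:latest
    resources:
      requests:
        cpu: 100m        # used by the scheduler to place the pod
        memory: 128Mi
      limits:
        cpu: 500m        # hard cap enforced at runtime
        memory: 256Mi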

Always specify Pod liveness and readiness probes.

Most applications already support some kind of health checking, typically via an HTTP health endpoint.

You can use a policy engine such as Kyverno to enforce this automatically.
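A minimal sketch, assuming the application exposes hypothetical /healthz and /ready HTTP endpoints on port 8080:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: registry.example.com/my-app:latest
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5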

Don't set the resource limits on Pods too low to start up correctly

While your application may only use 10m of CPU on average, many applications (especially those using the JVM) require 10x the average CPU usage during startup. Setting the resource limit too low can cause an application to crash loop when it cannot meet its startup deadline.
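A sketch of the idea as a container fragment (values are illustrative): keep the request close to steady-state usage, but give the limit enough headroom for startup.

containers:
- name: my-app
  image: registry.example.com/my-app:latest
  resources:
    requests:
      cpu: 10m         # close to steady-state usage
      memory: 256Mi
    limits:
      cpu: 500m        # headroom so startup (e.g. JVM warm-up) is not throttled
      memory: 512Mi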

Security

Don't expose the default route

*.apps.<cluster name>.<cluster domain> should never be open to the internet, as admission to this route is typically too relaxed and makes it easy to accidentally expose insecure applications.

If you do need an application exposed to the internet, use explicit ingress definitions, and if the application is not intended to be public, gate access with an OAuth proxy such as OpenShift OAuth Proxy.
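As a sketch (the host name and service are hypothetical), an explicit Route pins the application to a known, reviewed host rather than relying on the default wildcard:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app
  namespace: dev
spec:
  host: my-app.example.com      # explicit host, reviewed before exposure
  to:
    kind: Service
    name: my-app
  port:
    targetPort: 8080
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect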

Only provision user access via groups

...

Subscribe to Red Hat security announcements

https://listman.redhat.com/mailman/listinfo/rhsa-announce

Doing so will mean you are notified when CVEs are made public and when updates with mitigations are released. This can also be integrated with your on-call paging system to avoid missing updates.

Monitoring

Use the default alert rule severity names

...and you will be rewarded. none, info, warning and critical are almost always granular enough, and using them means you can gracefully handle the predefined alert rules that ship with OpenShift.

Remember that the severity of an alert is just a label. You can define your own labels if you need additional metadata, rather than an ever-expanding set of severities.
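A minimal sketch of this pattern (the alert name, metric and team label are hypothetical): the severity stays one of the default values, while extra metadata gets its own label.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: dev
spec:
  groups:
  - name: my-app
    rules:
    - alert: MyAppHighErrorRate
      expr: rate(http_requests_total{code=~"5.."}[5m]) > 0.1
      for: 10m
      labels:
        severity: warning     # one of the default severities
        team: payments        # extra metadata as its own label, not a new severity
      annotations:
        summary: "my-app is returning 5xx errors"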

Use an Alertmanager heartbeat to monitor for connectivity outages

How do you know that your cluster has lost internet connectivity if it has no internet connection?

The following Alertmanager configuration matches the always-firing Watchdog alert and calls a webhook every minute.

route:
  routes:                       # nested under the cluster's existing top-level route
  - receiver: watchdog
    repeat_interval: 60s        # re-notify every minute while Watchdog fires
    match:
      alertname: Watchdog
receivers:
- name: watchdog
  webhook_configs:
  - url: 'https://example.com/alertmanager-heartbeat/ping'
    send_resolved: true
    http_config:
      basic_auth:
        password: ba8c85ae-6573-4590-82db-273bb7a3e63b