A sort-of style guide and operational handbook for OpenShift and k8s.
Inspired by the Uber Go Style Guide.
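Most resources only need unique names within their namespace, so there is no need to repeat the environment or namespace in the name. Keeping names identical across namespaces makes de-duplication and templating easier because there is less variance to deal with.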
This especially applies to the names of things inside resources. For example, always naming the port that exposes Prometheus metrics `metrics` means you only need to match on one port name in your ServiceMonitor object for likely all of your pods in that namespace.
Bad | Good |
---|---|
apiVersion: v1
kind: Secret
metadata:
  name: dev-password
  namespace: dev
... |
apiVersion: v1
kind: Secret
metadata:
  name: password
  namespace: dev
... |
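The port-naming advice above can be sketched as follows. This is a minimal illustration, not a definitive setup: the Service name `my-app`, the `monitoring: enabled` label, the port number and the scrape interval are all assumptions made for the example.

```yaml
# Hypothetical Service that exposes its metrics port under the shared name.
apiVersion: v1
kind: Service
metadata:
  name: my-app                # hypothetical name
  namespace: dev
  labels:
    monitoring: enabled       # hypothetical label, used for selection below
spec:
  selector:
    app: my-app
  ports:
    - name: metrics           # same port name in every Service
      port: 9090              # illustrative port number
      targetPort: 9090
---
# One ServiceMonitor can then cover every Service carrying the label.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics
  namespace: dev
spec:
  selector:
    matchLabels:
      monitoring: enabled
  endpoints:
    - port: metrics           # refers to the port by name, not by number
      interval: 30s           # illustrative scrape interval
```

Because the endpoint is matched by name, services can listen on different port numbers without needing their own ServiceMonitor.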
Your Git repository may be private, but having every developer sync sensitive data to their own device in order to develop is not good practice.
Consider an external secret integration such as the External Secrets Operator, which supports most cloud providers, SaaS products and custom integrations.
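As a rough sketch of what this looks like with the External Secrets Operator, the manifest below asks the operator to create a Secret from an external store. The store name `my-secret-store`, the remote key `dev/my-app/database` and the refresh interval are assumptions for illustration. The `apiVersion` depends on the operator release installed (newer releases also serve `external-secrets.io/v1`).

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: credentials
  namespace: dev
spec:
  refreshInterval: 1h                 # illustrative re-sync interval
  secretStoreRef:
    kind: ClusterSecretStore
    name: my-secret-store             # hypothetical store, configured separately
  target:
    name: credentials                 # name of the Secret created in the cluster
  data:
    - secretKey: username
      remoteRef:
        key: dev/my-app/database      # hypothetical path in the external provider
        property: username
    - secretKey: password
      remoteRef:
        key: dev/my-app/database
        property: password
```

Only this reference is committed to Git. The sensitive values stay in the external provider.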
Keeping the username and password in the same Secret allows atomic updates to credentials in cases where both need to change at the same time. Using randomised usernames and treating them as pseudo-secret data can also be seen as part of security hardening.
Bad | Good |
---|---|
apiVersion: v1
kind: Secret
metadata:
  name: credentials
  ...
data:
  password: aGVsbG93b3JsZA==
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: credentials
  ...
data:
  username: myusername |
apiVersion: v1
kind: Secret
metadata:
  name: credentials
  ...
data:
  username: bXl1c2VybmFtZQ==
  password: aGVsbG93b3JsZA== |
Your application will very likely have to be migrated from one cluster to another. Think carefully about how configuration that depends on infrastructure is injected into resources. If the database hostname were to change, how many places would you have to update?
Most commonly, infrastructure details reach applications via external secrets and GitOps. Use this to your advantage to have a single source of truth (even if this causes duplicate information inside OpenShift).
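One way to keep a single source of truth is to have workloads reference the infrastructure detail rather than embed it. The sketch below assumes a Secret named `database` with a `host` key (for example, one synced from an external provider). The application name, image and environment variable name are also hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                                      # hypothetical application
  namespace: dev
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0  # hypothetical image
          env:
            - name: DATABASE_HOST                   # hypothetical variable name
              valueFrom:
                secretKeyRef:
                  name: database                    # assumed Secret holding infrastructure details
                  key: host
```

If the hostname changes, only the upstream value needs updating. Every Deployment that references the key picks up the new value on its next rollout.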
Always define requests and limits for workloads to ensure they schedule correctly and do not over-use node resources.
You can use a policy engine such as Kyverno to enforce this automatically.
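A Kyverno policy along these lines could enforce the rule. This is a sketch modelled on the commonly published "require requests and limits" sample, not a drop-in policy: the policy name and message are illustrative, and `validationFailureAction` is set to `Audit` on the assumption that you want to report violations before blocking them (newer Kyverno releases prefer a per-rule failure action field).

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits      # illustrative name
spec:
  validationFailureAction: Audit     # report only; switch to Enforce to block
  background: true
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests and a memory limit are required."
        pattern:
          spec:
            containers:
              # "?*" means the field must be present and non-empty.
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    memory: "?*"
```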
Most applications already support some kind of health checking through a health endpoint, so wiring it into liveness and readiness probes costs little.
You can use a policy engine such as Kyverno to enforce this automatically.
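A health endpoint is typically wired into a container as shown in the fragment below. The paths `/healthz` and `/ready`, the port, the image and the timings are assumptions. Use whatever your application actually exposes.

```yaml
# Fragment of a Pod template (spec.template.spec in a Deployment).
containers:
  - name: my-app                                  # hypothetical container
    image: registry.example.com/my-app:1.0.0      # hypothetical image
    ports:
      - name: http
        containerPort: 8080
    livenessProbe:                                # restart the container when this fails
      httpGet:
        path: /healthz                            # assumed health endpoint
        port: http
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:                               # stop routing traffic when this fails
      httpGet:
        path: /ready                              # assumed readiness endpoint
        port: http
      periodSeconds: 5
```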
While your application may only use 10m of CPU on average, many applications (especially those running on the JVM) need around 10x their average CPU usage during startup. Setting the resource limit too low can cause an application to crash loop when it cannot meet its startup deadline.
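One possible way to leave startup headroom is to keep the request close to steady-state usage, set the limit well above it, and give the application a generous startup window. All values below are illustrative assumptions and should be tuned against real measurements.

```yaml
# Fragment of a Pod template (spec.template.spec in a Deployment).
containers:
  - name: my-app                                  # hypothetical container
    image: registry.example.com/my-app:1.0.0      # hypothetical image
    resources:
      requests:
        cpu: 10m          # steady-state usage
        memory: 256Mi
      limits:
        cpu: 200m         # headroom for the startup burst (illustrative)
        memory: 256Mi
    startupProbe:
      httpGet:
        path: /healthz    # assumed health endpoint
        port: 8080
      periodSeconds: 5
      failureThreshold: 24   # allows up to 5s x 24 = 120s before a restart
```

The liveness probe only takes over once the startup probe has succeeded, so a slow start does not trigger restarts.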
`*.apps.<cluster name>.<cluster domain>` should never be open to the internet, as admission to this route is typically too relaxed to prevent insecure applications from being exposed by accident.
If you do need an application exposed to the internet, use explicit ingress definitions. If the application is not intended to be public, use an OAuth proxy such as OpenShift OAuth Proxy to gate access.
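An explicit definition could look like the Route below, which names its host rather than relying on the wildcard apps domain. The host `my-app.example.com`, the Service name and the port name are assumptions. How that host is routed to the cluster depends on your ingress setup.

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app                      # hypothetical name
  namespace: dev
spec:
  host: my-app.example.com          # explicit host, hypothetical
  to:
    kind: Service
    name: my-app                    # hypothetical Service
  port:
    targetPort: http                # named Service port, assumed
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect   # redirect plain HTTP to HTTPS
```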
...
https://listman.redhat.com/mailman/listinfo/rhsa-announce
Doing so will mean you are notified when CVEs are made public and when updates with mitigations are released. This can also be integrated with your on-call paging system to avoid missing updates.
...and you will be rewarded.
`none`, `info`, `warning` and `critical` are almost always granular enough, and sticking to them means you can gracefully handle the predefined alert rules that ship with OpenShift.
Remember that the severity of an alert is just a label. If you need additional metadata, define your own labels rather than an ever-expanding set of severities.
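As a sketch of that idea, the rule below keeps a standard severity and carries the extra routing information in a separate label. The alert name, the `team` label, the metric `http_requests_total` and the thresholds are all hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts               # hypothetical name
  namespace: dev
spec:
  groups:
    - name: my-app
      rules:
        - alert: MyAppHighErrorRate # hypothetical alert
          # Assumed metric and labels; the ratio of 5xx responses over 5 minutes.
          expr: |
            sum(rate(http_requests_total{job="my-app",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="my-app"}[5m])) > 0.05
          for: 10m
          labels:
            severity: warning       # one of the four standard severities
            team: payments          # extra metadata as its own label
          annotations:
            summary: More than 5% of requests to my-app are failing.
```

Alertmanager can then route on `team` without needing a severity such as `warning-payments`.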
How do you know that your cluster has lost internet connectivity if it has no internet connection?
The following Alertmanager configuration will match on the `Watchdog` alert and will call a webhook every minute.
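# Fragment: merge into your existing Alertmanager configuration.
route:
  routes:
    # Child route that forwards only the always-firing Watchdog alert.
    - receiver: watchdog
      repeat_interval: 60s
      match:
        alertname: Watchdog
receivers:
  - name: watchdog
    webhook_configs:
      - url: 'https://example.com/alertmanager-heartbeat/ping'
        send_resolved: true
        http_config:
          basic_auth:
            password: ba8c85ae-6573-4590-82db-273bb7a3e63b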