How $2.3M disappeared because someone forgot to enable network policies. A real Day 1 security story.
It was 2 AM when Sarah's phone exploded with alerts. As the security lead for a rapidly growing fintech startup, she'd seen her share of incidents. But nothing prepared her for this.
"We've been breached. They're draining customer accounts."
The post-mortem was brutal: A single misconfigured pod without network policies allowed lateral movement across the entire cluster. The attacker exploited a vulnerability in a third-party library, moved laterally to the database pods, and exfiltrated credentials. All because security was treated as a "Day 2 problem."
The lesson? Security isn't a feature you add later. It's the foundation you build on Day 1.
This is Part 5 of our Kubernetes Production Journey series, where we transform your cluster from a house of cards into Fort Knox. We'll cover everything from network segmentation to supply chain security, with battle-tested configurations from 50+ production deployments.
By the end of this article, you'll know how to:
- Layer 1: Implement zero-trust network policies with Cilium
- Layer 2: Enforce Pod Security Standards (restricted profile)
- Layer 3: Automate policy enforcement with OPA/Kyverno
- Layer 4: Manage secrets securely with HashiCorp Vault
- Layer 5: Detect runtime threats with Falco
- Layer 6: Implement least-privilege RBAC
- Layer 7: Secure your supply chain with Sigstore
- Layer 8: Maintain compliance with CIS benchmarks
All code examples are available in our GitHub repository.
Security in Kubernetes isn't a single checkbox; it's multiple layers working together. Think of it like a medieval castle: walls, moats, guards, and inner keeps. If one layer fails, others protect you.
┌─────────────────────────────────────────┐
│ Layer 8: Compliance                     │ ← CIS benchmarks
├─────────────────────────────────────────┤
│ Layer 7: Supply Chain Security          │ ← Cosign, SBOM, Provenance
├─────────────────────────────────────────┤
│ Layer 6: Access Control (RBAC)          │ ← Who can do what
├─────────────────────────────────────────┤
│ Layer 5: Runtime Security               │ ← Falco threat detection
├─────────────────────────────────────────┤
│ Layer 4: Secrets Management             │ ← Vault, encryption
├─────────────────────────────────────────┤
│ Layer 3: Policy Enforcement             │ ← OPA/Kyverno
├─────────────────────────────────────────┤
│ Layer 2: Pod Security                   │ ← PSS, security contexts
├─────────────────────────────────────────┤
│ Layer 1: Network Security               │ ← Network policies, mTLS
└─────────────────────────────────────────┘
Let's build each layer, starting from the foundation.
The Principle: Never trust, always verify. Every pod-to-pod communication must be explicitly allowed.
In the breach story above, the attacker moved from the frontend pod → backend pod → database pod without any restriction. With proper network policies, that lateral movement would have been impossible.
Cilium provides powerful network policies with L7 (HTTP/gRPC) filtering, which standard Kubernetes NetworkPolicies can't do.
# File: 01-network-policies/cilium-policies.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: deny-all-default
namespace: production
spec:
endpointSelector: {}
ingress: []
  egress: []

This creates a default-deny posture. Now we selectively allow traffic:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: frontend-to-backend
namespace: production
spec:
endpointSelector:
matchLabels:
app: frontend
egress:
- toEndpoints:
- matchLabels:
app: backend
toPorts:
- ports:
- port: "8080"
protocol: TCP
rules:
http:
- method: "GET|POST|PUT|DELETE"
path: "/api/.*"Key Features:
- L7 filtering: Only specific HTTP methods and paths allowed
- Explicit allow-listing: Frontend can only talk to backend on port 8080
- API path restrictions: Blocks access to admin endpoints
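Before moving on, it's worth smoke-testing the policy from inside the cluster. A minimal sketch, assuming Deployments named frontend and backend behind a backend Service, and that the frontend image ships curl:

# Allowed: frontend -> backend on port 8080, path under /api/
kubectl exec -n production deploy/frontend -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://backend:8080/api/health

# Blocked by the L7 rule: same service, path outside /api/
kubectl exec -n production deploy/frontend -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://backend:8080/admin
# Expect a 403 from the Cilium proxy instead of an application response

Next, block access to cloud metadata endpoints cluster-wide: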
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
name: block-metadata-service
spec:
endpointSelector: {}
egressDeny:
- toCIDR:
- 169.254.169.254/32 # AWS/Azure metadata
    - fd00:ec2::254/128    # AWS IPv6 metadata

This prevents pods from accessing cloud provider metadata services, which often contain sensitive credentials.
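To verify the block, run a throwaway pod and hit the metadata endpoint; the request should time out instead of returning credentials. A sketch:

kubectl run metadata-test --rm -it --restart=Never \
  --image=curlimages/curl -n production -- \
  curl -sS --max-time 5 http://169.254.169.254/latest/meta-data/
# Expect a timeout, not an IAM credential listing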
If you're using Istio, enable mutual TLS for encrypted pod-to-pod communication:
# File: 01-network-policies/istio-mtls.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default-strict-mtls
namespace: istio-system
spec:
mtls:
    mode: STRICT

This encrypts all traffic in the mesh and verifies pod identities automatically.
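A quick way to prove STRICT mode is enforced: send plaintext from a pod with no sidecar. A sketch, assuming the default namespace is outside the mesh and myapp is a hypothetical meshed service in production:

kubectl run plaintext-test --rm -it --restart=Never \
  --image=curlimages/curl -n default -- \
  curl -sS --max-time 5 http://myapp.production:8080/healthz
# Expect a connection reset: the server side now requires mTLS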
The Principle: Run containers with the least privileges necessary. No root, no privilege escalation, read-only filesystems.
Kubernetes 1.25 removed PodSecurityPolicy in favor of Pod Security Standards, enforced by the built-in Pod Security Admission controller. There are three profiles:
- Privileged: Unrestricted (avoid in production)
- Baseline: Minimal restrictions
- Restricted: Hardened, best for production
# File: 02-pod-security/pss-baseline.yaml
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Now any pod that violates the restricted profile will be rejected.
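Before flipping enforcement on an existing namespace, preview the blast radius with a server-side dry run, then confirm a violating pod is refused:

# Preview which running workloads would violate the restricted profile
# (no changes are made)
kubectl label --dry-run=server --overwrite ns production \
  pod-security.kubernetes.io/enforce=restricted

# Negative test: a privileged pod should now be rejected at admission
kubectl run priv-test -n production --image=busybox --restart=Never \
  --overrides='{"spec":{"containers":[{"name":"priv-test","image":"busybox","securityContext":{"privileged":true}}]}}'
# Expect: pods "priv-test" is forbidden: violates PodSecurity "restricted:latest"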
Here's a golden template for secure pods:
# File: 02-pod-security/secure-pod-template.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-app
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: secure-app
template:
metadata:
labels:
app: secure-app
spec:
# Disable automounting service account tokens
automountServiceAccountToken: false
# Pod-level security context
securityContext:
runAsNonRoot: true
runAsUser: 10001
fsGroup: 10001
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: myapp:v1.0.0
imagePullPolicy: Always
# Container-level security context
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 10001
capabilities:
drop:
- ALL
# Resource limits (prevent resource exhaustion attacks)
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
# Liveness and readiness probes
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
# Volume mounts for writable directories
volumeMounts:
- name: tmp
mountPath: /tmp
- name: cache
mountPath: /app/cache
volumes:
- name: tmp
emptyDir: {}
- name: cache
        emptyDir: {}

Security Features Explained:
- No root user: runAsNonRoot: true prevents running as root
- Read-only filesystem: readOnlyRootFilesystem: true prevents attackers from modifying files
- Drop all capabilities: removes all Linux capabilities the app doesn't need
- Seccomp profile: RuntimeDefault restricts syscalls to prevent container escapes
- No privilege escalation: allowPrivilegeEscalation: false prevents gaining higher privileges
- Resource limits: prevent resource exhaustion attacks
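Two quick checks confirm the template behaves as advertised once it's running (a sketch; assumes the image ships a shell and basic coreutils):

kubectl exec -n production deploy/secure-app -- id
# Expect uid=10001 gid=10001, never uid=0

kubectl exec -n production deploy/secure-app -- touch /test-file
# Expect "Read-only file system": only /tmp and /app/cache are writable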
The Principle: Don't rely on developers to remember security best practices. Enforce them automatically.
Kyverno is easier to learn than OPA and perfect for Kubernetes-specific policies.
# File: 03-policy-enforcement/kyverno-policies.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: restrict-image-registries
spec:
validationFailureAction: enforce
background: true
rules:
- name: validate-registries
match:
any:
- resources:
kinds:
- Pod
validate:
message: "Images must come from approved registries: registry.company.com or ghcr.io"
pattern:
spec:
containers:
- image: "registry.company.com/* | ghcr.io/*"apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-security-context
spec:
validationFailureAction: enforce
rules:
- name: check-security-context
match:
any:
- resources:
kinds:
- Pod
validate:
message: "All containers must define securityContext with runAsNonRoot and readOnlyRootFilesystem"
pattern:
spec:
containers:
- securityContext:
runAsNonRoot: true
readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
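With both policies in enforce mode, a quick negative test confirms the admission webhook is working (a sketch; docker.io is deliberately outside the approved-registry list):

kubectl run registry-test -n production --restart=Never \
  --image=docker.io/nginx:1.25
# Expect an admission error quoting the restrict-image-registries message

Kyverno can also mutate resources to add missing configurations: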
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: add-default-labels
spec:
rules:
- name: add-labels
match:
any:
- resources:
kinds:
- Deployment
- StatefulSet
mutate:
patchStrategicMerge:
metadata:
labels:
managed-by: kyverno
security-reviewed: "true"The Principle: Never store secrets in plain text. Use a centralized secrets manager with encryption, rotation, and audit logging.
Kubernetes Secrets are Base64-encoded, not encrypted. Anyone with etcd access can read them. For production, use HashiCorp Vault or cloud-native solutions (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager).
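You can see the problem for yourself in one line:

# Kubernetes Secrets are only base64-encoded; anyone who can read the
# object (or etcd) recovers the plaintext instantly
kubectl create secret generic demo --from-literal=password=hunter2
kubectl get secret demo -o jsonpath='{.data.password}' | base64 -d
# Prints: hunter2

To move beyond plain Kubernetes Secrets, deploy Vault: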
# Add Vault Helm repo
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
# Install Vault in HA mode
helm install vault hashicorp/vault \
--namespace vault \
--create-namespace \
--set server.ha.enabled=true \
--set server.ha.replicas=3 \
  --set injector.enabled=true

# Initialize Vault (save keys securely!)
kubectl exec vault-0 -n vault -- vault operator init
# Unseal Vault on every replica (repeat with enough distinct keys to
# meet the unseal threshold, 3 of 5 by default)
kubectl exec vault-0 -n vault -- vault operator unseal <key1>
kubectl exec vault-1 -n vault -- vault operator unseal <key1>
kubectl exec vault-2 -n vault -- vault operator unseal <key1>
# Enable Kubernetes auth
kubectl exec vault-0 -n vault -- vault auth enable kubernetes
# Configure Kubernetes auth
kubectl exec vault-0 -n vault -- vault write auth/kubernetes/config \
  kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443"

# Enable KV secrets engine
kubectl exec vault-0 -n vault -- vault secrets enable -path=secret kv-v2
# Store database credentials
kubectl exec vault-0 -n vault -- vault kv put secret/database/config \
username="dbadmin" \
password="super-secret-password"# File: 04-secrets-management/vault-injection.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-with-vault
namespace: production
spec:
replicas: 2
selector:
matchLabels:
app: myapp
template:
metadata:
annotations:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "myapp"
vault.hashicorp.com/agent-inject-secret-database: "secret/data/database/config"
vault.hashicorp.com/agent-inject-template-database: |
{{- with secret "secret/data/database/config" -}}
export DB_USERNAME="{{ .Data.data.username }}"
export DB_PASSWORD="{{ .Data.data.password }}"
{{- end }}
labels:
app: myapp
spec:
serviceAccountName: myapp
containers:
- name: app
image: myapp:v1.0.0
command: ["/bin/sh"]
args: ["-c", "source /vault/secrets/database && ./app"]What Happens:
- Vault Agent sidecar is automatically injected
- Agent authenticates with Kubernetes auth
- Secrets are fetched from Vault and written to /vault/secrets/database
- The application sources the file and uses the environment variables
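You can verify the injection worked by listing the pod's containers and the rendered secret file (a sketch):

kubectl get pod -n production -l app=myapp \
  -o jsonpath='{.items[0].spec.containers[*].name}'
# Expect: app vault-agent

kubectl exec -n production deploy/app-with-vault -c app -- ls /vault/secrets
# Expect: database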
For cloud-native secrets (AWS Secrets Manager, Azure Key Vault), use External Secrets Operator:
# File: 04-secrets-management/external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: aws-secretsmanager
namespace: production
spec:
provider:
aws:
service: SecretsManager
region: us-east-1
auth:
jwt:
serviceAccountRef:
name: external-secrets-sa
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: database-credentials
namespace: production
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secretsmanager
kind: SecretStore
target:
name: database-secret
creationPolicy: Owner
data:
- secretKey: username
remoteRef:
key: prod/database/credentials
property: username
- secretKey: password
remoteRef:
key: prod/database/credentials
      property: password
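Verify the sync before wiring anything to it (a sketch):

kubectl get externalsecret database-credentials -n production
# STATUS should read SecretSynced with READY True

kubectl get secret database-secret -n production -o jsonpath='{.data}'
# Expect both username and password keys, refreshed every hour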
The Principle: Detect and respond to suspicious behavior at runtime, not just at deploy time.
# Add Falco Helm repo
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
# Install Falco with Sidekick for alerts
helm install falco falcosecurity/falco \
--namespace falco-system \
--create-namespace \
--set falcosidekick.enabled=true \
  --set falcosidekick.webui.enabled=true

# File: 05-runtime-security/falco-rules.yaml
- rule: Detect Shell in Container
desc: Detect when a shell is spawned in a container
condition: >
spawned_process and
container and
proc.name in (shell_binaries)
output: >
Shell spawned in container (user=%user.name container=%container.name
shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
priority: WARNING
tags: [container, shell]
- rule: Write Below Binary Directory
desc: Detect attempts to write to system binary directories
condition: >
open_write and
container and
fd.directory in (/bin, /usr/bin, /sbin, /usr/sbin)
output: >
File write below binary directory (user=%user.name command=%proc.cmdline
file=%fd.name container=%container.name)
priority: ERROR
tags: [filesystem, container]
- rule: Outbound Connection to Known C2 Server
desc: Detect connections to known command and control servers
condition: >
outbound and
container and
fd.sip in (malicious_ips)
output: >
Outbound connection to suspicious IP (user=%user.name container=%container.name
ip=%fd.rip port=%fd.rport)
priority: CRITICAL
tags: [network, malware]
- rule: Read Sensitive File
desc: Detect reading of sensitive files like /etc/shadow
condition: >
open_read and
container and
fd.name in (/etc/shadow, /etc/sudoers, /root/.ssh/id_rsa)
output: >
Sensitive file read (user=%user.name file=%fd.name container=%container.name
command=%proc.cmdline)
priority: CRITICAL
  tags: [filesystem, security]
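To prove the pipeline works end to end, trigger the shell rule deliberately and watch Falco's logs (a sketch; the deployment name is an assumption):

# Spawn a shell in any container...
kubectl exec -it -n production deploy/secure-app -- /bin/sh -c "echo test"

# ...and the alert should appear within seconds
kubectl logs -n falco-system -l app.kubernetes.io/name=falco --tail=20 | \
  grep "Shell spawned in container"

Route Falco alerts to Slack, PagerDuty, or your SIEM: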
# File: 05-runtime-security/falcosidekick-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: falcosidekick-config
namespace: falco-system
data:
config.yaml: |
slack:
webhookurl: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
minimumpriority: "warning"
outputformat: "text"
pagerduty:
integrationkey: "YOUR_PAGERDUTY_KEY"
minimumpriority: "error"
elasticsearch:
hostport: "https://elasticsearch:9200"
index: "falco"
minimumpriority: "debug"View runtime security configs →
The Principle: Grant the minimum permissions necessary. No cluster-admin for applications.
- Never use cluster-admin for applications
- Use namespace-scoped Roles, not ClusterRoles
- Create service accounts per application
- Use Groups for human users
- Regularly audit permissions
# File: 06-rbac/developer-role.yaml
apiVersion: v1
kind: Namespace
metadata:
name: team-alpha
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: developer-sa
namespace: team-alpha
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: developer-role
namespace: team-alpha
rules:
# Read-only access to most resources
- apiGroups: ["", "apps", "batch"]
resources: ["pods", "deployments", "jobs", "services", "configmaps"]
verbs: ["get", "list", "watch"]
# Can view logs
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get", "list"]
# Can exec for debugging (carefully consider this)
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create"]
# Note: RBAC is purely additive; there is no deny rule. Secrets and
# persistentvolumes stay off-limits simply because no rule grants them.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: developer-binding
namespace: team-alpha
subjects:
- kind: ServiceAccount
name: developer-sa
namespace: team-alpha
- kind: Group
name: developers
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: developer-role
  apiGroup: rbac.authorization.k8s.io

# File: 06-rbac/cicd-role.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: cicd-deployer
namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: cicd-deployer-role
namespace: production
rules:
# Can create and update deployments
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "list", "create", "update", "patch"]
# Can create and update services
- apiGroups: [""]
resources: ["services"]
verbs: ["get", "list", "create", "update", "patch"]
# Can create and update configmaps (not secrets!)
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "create", "update", "patch"]
# Can read secrets for validation only
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "list"]
# Can view pods and logs for debugging
- apiGroups: [""]
resources: ["pods", "pods/log"]
verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: cicd-deployer-binding
namespace: production
subjects:
- kind: ServiceAccount
name: cicd-deployer
namespace: production
roleRef:
kind: Role
name: cicd-deployer-role
  apiGroup: rbac.authorization.k8s.io

#!/bin/bash
# File: 06-rbac/audit-rbac.sh
echo "=== Cluster-Wide RBAC Audit ==="
echo ""
echo "1. Users/ServiceAccounts with cluster-admin:"
kubectl get clusterrolebindings -o json | \
jq -r '.items[] | select(.roleRef.name=="cluster-admin") |
.metadata.name + " -> " + (.subjects[]?.name // "N/A")'
echo ""
echo "2. ServiceAccounts that can create pods (potential security risk):"
kubectl get rolebindings,clusterrolebindings -A -o json | \
jq -r '.items[] | select(.roleRef.name | contains("admin") or contains("edit")) |
.metadata.namespace + "/" + .metadata.name'
echo ""
echo "3. Overly permissive roles (wildcard permissions):"
kubectl get roles,clusterroles -A -o json | \
jq -r '.items[] | select(.rules[]? | .verbs[]? == "*") |
.metadata.namespace + "/" + .metadata.name'The Principle: Trust but verify. Only deploy signed, verified container images.
Recent attacks (SolarWinds, Log4Shell) showed attackers can compromise the build pipeline. Image signing ensures images haven't been tampered with.
# Install cosign
curl -LO https://github.com/sigstore/cosign/releases/latest/download/cosign-linux-amd64
sudo mv cosign-linux-amd64 /usr/local/bin/cosign
sudo chmod +x /usr/local/bin/cosign

# Generate keypair
cosign generate-key-pair
# Sign image
cosign sign --key cosign.key registry.company.com/myapp:v1.0.0
# Verify signature
cosign verify --key cosign.pub registry.company.com/myapp:v1.0.0

# File: 07-supply-chain/verify-images-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: verify-image-signatures
spec:
validationFailureAction: enforce
background: false
webhookTimeoutSeconds: 30
rules:
- name: verify-signature
match:
any:
- resources:
kinds:
- Pod
verifyImages:
- imageReferences:
- "registry.company.com/*"
attestors:
- count: 1
entries:
- keys:
publicKeys: |-
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                -----END PUBLIC KEY-----

Now any unsigned image will be rejected at admission time.
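A negative test proves the gate is closed (a sketch; the tag is hypothetical):

kubectl run unsigned-test -n production --restart=Never \
  --image=registry.company.com/unsigned-app:v0.0.1
# Expect admission to fail with a verify-image-signatures policy error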
# Install Syft
curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh
# Generate SBOM
syft registry.company.com/myapp:v1.0.0 -o json > sbom.json
# Attach SBOM to image
cosign attach sbom --sbom sbom.json registry.company.com/myapp:v1.0.0

# Install Trivy
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh
# Scan image
trivy image --severity HIGH,CRITICAL registry.company.com/myapp:v1.0.0
# Fail CI/CD if vulnerabilities found
trivy image --exit-code 1 --severity CRITICAL registry.company.com/myapp:v1.0.0
The Principle: Meet compliance requirements from Day 1. Don't scramble during audits.
# Run kube-bench
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
# View results
kubectl logs job/kube-bench
# Example output:
# [PASS] 1.2.1 Ensure that the --anonymous-auth argument is set to false
# [FAIL] 1.2.2 Ensure that the --basic-auth-file argument is not set
# [WARN] 1.2.3 Ensure that the --token-auth-file parameter is not set

# File: 08-compliance/kube-bench-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: kube-bench-scan
namespace: security
spec:
schedule: "0 2 * * *" # Daily at 2 AM
jobTemplate:
spec:
template:
spec:
hostPID: true
containers:
- name: kube-bench
            image: aquasec/kube-bench:latest  # pin a specific version in production
command: ["kube-bench", "run", "--targets", "node,policies"]
volumeMounts:
- name: var-lib-etcd
mountPath: /var/lib/etcd
readOnly: true
- name: etc-kubernetes
mountPath: /etc/kubernetes
readOnly: true
restartPolicy: Never
volumes:
- name: var-lib-etcd
hostPath:
path: /var/lib/etcd
- name: etc-kubernetes
hostPath:
        path: /etc/kubernetes

| Check | Requirement | Implementation |
|---|---|---|
| 1.2.1 | Disable anonymous auth | --anonymous-auth=false in API server |
| 1.2.6 | Enable RBAC | --authorization-mode=RBAC |
| 3.2.1 | Restrict network access | Network policies |
| 4.2.1 | Run as non-root | runAsNonRoot: true |
| 5.1.5 | Secure secret encryption | Enable encryption at rest |
| 5.2.2 | Minimize privileges | Least-privilege RBAC |
# File: 08-compliance/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
- secrets
providers:
- aescbc:
keys:
- name: key1
secret: <BASE64_ENCODED_SECRET>
  - identity: {}

Then configure the API server:
--encryption-provider-config=/etc/kubernetes/encryption-config.yaml
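Only Secrets written after the restart are encrypted, so rewrite existing ones, then read one straight out of etcd to confirm (a sketch; etcdctl certificate paths vary by cluster):

# Re-encrypt every existing Secret under the new provider
kubectl get secrets -A -o json | kubectl replace -f -

# Raw etcd value should be ciphertext prefixed with k8s:enc:aescbc:v1:
ETCDCTL_API=3 etcdctl get /registry/secrets/production/database-secret \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key | hexdump -C | head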
Use this checklist on Day 1 of every deployment:
Pre-Deployment
- Review security architecture diagram
- Ensure all team members complete security training
- Prepare incident response runbook
- Set up security monitoring dashboards
Layer 1: Network Security
- Deploy default-deny network policies
- Configure allow-list rules for inter-service communication
- Block cloud metadata service access
- Enable service mesh mTLS (if using Istio/Linkerd)
- Configure egress filtering
Layer 2: Pod Security
- Enable Pod Security Standards (restricted profile)
- Review and apply secure pod templates
- Disable privilege escalation
- Enable read-only root filesystem
- Configure SecComp profiles
- Set resource limits
Layer 3: Policy Enforcement
- Install Kyverno/OPA
- Deploy image registry restriction policies
- Deploy security context enforcement policies
- Configure automatic label/annotation injection
- Set up policy violation alerts
Layer 4: Secrets Management
- Deploy HashiCorp Vault or cloud secrets manager
- Configure Kubernetes authentication
- Migrate secrets from Kubernetes Secrets to Vault
- Enable secret rotation
- Configure audit logging
Layer 5: Runtime Security
- Deploy Falco
- Configure custom detection rules
- Set up alert routing (Slack/PagerDuty)
- Test alert notifications
- Create incident response procedures
Layer 6: Access Control (RBAC)
- Audit existing RBAC permissions
- Remove cluster-admin bindings
- Create namespace-scoped roles
- Set up service accounts per application
- Configure OIDC/LDAP integration for human users
- Enable audit logging
Layer 7: Supply Chain Security
- Set up image signing with Cosign
- Generate and attach SBOMs
- Configure Kyverno image verification
- Integrate Trivy scanning in CI/CD
- Create vulnerability management process
Layer 8: Compliance
- Run kube-bench CIS scan
- Address critical findings
- Enable encryption at rest
- Set up automated compliance scanning
- Document compliance posture
Here's a realistic timeline for implementing all security layers:
Morning (9 AM - 12 PM): Foundation
- 9:00 AM - Team briefing and security review
- 9:30 AM - Deploy network policies (Layer 1)
- 10:30 AM - Configure Pod Security Standards (Layer 2)
- 11:30 AM - Deploy Kyverno policies (Layer 3)
Afternoon (1 PM - 5 PM): Advanced Security
- 1:00 PM - Deploy and configure Vault (Layer 4)
- 2:30 PM - Install Falco and configure rules (Layer 5)
- 3:30 PM - Review and configure RBAC (Layer 6)
- 4:30 PM - Set up image signing (Layer 7)
Evening (5 PM - 6 PM): Validation
- 5:00 PM - Run CIS benchmarks (Layer 8)
- 5:30 PM - Security testing and validation
- 5:45 PM - Team handoff and documentation
Never treat security as "we'll add it later." The cost of retrofitting security is 10x higher than building it in from Day 1.
In every breach we've analyzed, multiple security controls failed. Layered defense saved the day.
Manual security reviews don't scale. Use policy engines (Kyverno/OPA) to enforce standards automatically.
Static scanning catches 70% of issues. Runtime monitoring (Falco) catches the remaining 30%.
Never put secrets in git, environment variables, or Kubernetes Secrets (unencrypted). Use Vault.
Start with zero permissions and add only what's needed. Never start with cluster-admin and try to remove permissions.
Most breaches happen through compromised dependencies. Sign images, generate SBOMs, and scan continuously.
Run chaos engineering tests that simulate breaches. Ensure your defenses actually work.
Mistake #1: No network policies
Impact: Lateral movement attacks, like the $2.3M breach above
Fix: Deploy default-deny policies on Day 1

Mistake #2: Running containers as root
Impact: Container escape vulnerabilities
Fix: Use runAsNonRoot: true and a user ID > 10000

Mistake #3: Plain-text secrets in Kubernetes
Impact: Secrets exposed via etcd access
Fix: Use Vault or enable encryption at rest

Mistake #4: No runtime detection
Impact: Breaches detected weeks or months later
Fix: Deploy Falco for real-time threat detection

Mistake #5: Overly permissive RBAC
Impact: Privilege escalation attacks
Fix: Audit RBAC regularly, use namespace-scoped roles

Mistake #6: Unsigned, unverified images
Impact: Supply chain attacks, malicious images
Fix: Implement Cosign signing and Kyverno verification

Mistake #7: No automated policy enforcement
Impact: Inconsistent security across teams
Fix: Deploy Kyverno/OPA with enforce mode

Mistake #8: Ignoring compliance until audit time
Impact: Failed audits, regulatory fines
Fix: Run kube-bench regularly, maintain documentation
Monitor these metrics to maintain security posture:
- Security Policy Violations: Track Kyverno/OPA denials
- Falco Alerts: Monitor suspicious runtime behavior
- Vulnerability Count: Track CVEs in images (Critical/High)
- RBAC Permission Changes: Audit trail of access changes
- Failed Authentication Attempts: Detect brute force attacks
- Pod Security Standard Compliance: % of pods meeting restricted profile
- Image Signature Coverage: % of images signed
- Secret Rotation Rate: How often secrets are rotated
- CIS Benchmark Score: Track compliance improvements
- Network Policy Coverage: % of pods with network policies
# Policy violations per hour
rate(kyverno_policy_results_total{status="fail"}[1h])
# Falco alerts by severity
falco_alerts_total{priority="critical"}
# Unsigned images deployed
sum(kyverno_policy_results_total{policy_name="verify-image-signatures",status="fail"})
# Pods running as root (label availability depends on your kube-state-metrics version)
count(kube_pod_container_status_running{security_context_run_as_non_root="false"})
- Cilium Network Policies
- Kyverno Policy Engine
- HashiCorp Vault
- Falco Runtime Security
- Sigstore Cosign
- Trivy Vulnerability Scanner
- kube-bench CIS Benchmarks
- KCSA (Kubernetes and Cloud Native Security Associate)
- CKS (Certified Kubernetes Security Specialist)
- CNCF Security TAG
This completes our security foundation. In the next articles, we'll cover:
- Part 6: Monitoring & Observability - Prometheus, Grafana, and the three pillars
- Part 7: Disaster Recovery - Backup strategies and DR testing
- Part 8: Cost Optimization - FinOps practices for sustainable cloud economics
All code examples are available in the GitHub repository.
What security incident taught you the importance of Day 1 security? Share your story in the comments below.
Found this helpful? ⭐ Star the GitHub repo and share with your team!
Questions? Open an issue or start a discussion on GitHub.
This is Part 5 of the Kubernetes Production Journey series. Read Part 1: Infrastructure Provisioning to start from the beginning.
About the Author: Platform engineer with 8+ years building production Kubernetes platforms across fintech, healthcare, and e-commerce. Passionate about security, automation, and sharing real-world lessons.
🔐 Remember: Security is not a destination, it's a continuous journey. Start with these foundations on Day 1, and iterate as threats evolve.
Stay secure! 🛡️