Kubernetes debugging session leveraging labels

Here's our RC, a replication controller that asks for 5 nginx replicas:

$ cat ws-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: webserver-rc
spec:
  replicas: 5
  selector:
    app: webserver
    status: serving
  template:
    metadata:
      labels:
        app: webserver
        env: prod
        status: serving
    spec:
      containers:
      - image: nginx:1.9.7
        name: nginx
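
Note that the RC's selector (app=webserver, status=serving) only uses two of the three labels from the pod template; env=prod is just extra metadata that rides along on every pod. That third label is still handy for ad-hoc queries, though. Something like this (purely illustrative, and assuming nothing else in the namespace is labeled env=prod) would list all prod pods once they exist:

$ kubectl get pods --selector="env=prod"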

Now let's fire up some pods:

$ kubectl create -f ws-rc.yaml
replicationcontrollers/webserver-rc
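
If you want to sanity-check what the RC ended up with (selector, desired replica count, events), describe is your friend; I'm omitting the output here since it's a bit chatty:

$ kubectl describe rc webserver-rc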

All pods serving traffic, yes? Let's have a look:

$ kubectl get pods --selector="status=serving"
NAME                 READY     STATUS    RESTARTS   AGE
webserver-rc-baeui   1/1       Running   0          18s
webserver-rc-dgijd   1/1       Running   0          18s
webserver-rc-ii79i   1/1       Running   0          18s
webserver-rc-lxag2   1/1       Running   0          18s <-- THIS ONE GONE BAD
webserver-rc-x5yvm   1/1       Running   0          18s
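
By the way, if you'd rather see the label values themselves than filter on them, kubectl can print them as extra columns (assuming a kubectl version with the -L/--label-columns flag):

$ kubectl get pods --selector="app=webserver" -L status,env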

OIC, one pod, webserver-rc-lxag2, has gone bad. Let's isolate it by relabeling it so that it falls out of the RC's selector:

$ kubectl label pods webserver-rc-lxag2 --overwrite status=troubleshooting
NAME                 READY     STATUS    RESTARTS   AGE
webserver-rc-lxag2   1/1       Running   0          45s
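
To double-check that the overwrite actually took, you can pull up just that pod with its status label as a column (illustrative; -L, --show-labels, or -o yaml would all do the job):

$ kubectl get pods webserver-rc-lxag2 -L status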

And how many pods do we now have serving traffic (remember, I asked the RC for 5 replicas):

$ kubectl get pods --selector="status=serving"
NAME                 READY     STATUS    RESTARTS   AGE
webserver-rc-baeui   1/1       Running   0          49s
webserver-rc-dgijd   1/1       Running   0          49s
webserver-rc-ii79i   1/1       Running   0          49s
webserver-rc-pwst1   0/1       Running   0          4s <-- BACKUP ALREADY UP AND RUNNING
webserver-rc-x5yvm   1/1       Running   0          49s
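
If you don't trust your eyeballs for counting rows, a quick way to confirm the RC is back at five matching pods (illustrative, just shell plumbing):

$ kubectl get pods --selector="status=serving" --no-headers | wc -l
5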

Sweet! Within 4s the RC has detected that the bad pod webserver-rc-lxag2 no longer matches its selector (it only counts pods labeled status=serving) and has launched a replacement somewhere. Now, how's my guinea pig doing?

$ kubectl get pods --selector="status!=serving"
NAME                 READY     STATUS    RESTARTS   AGE
webserver-rc-lxag2   1/1       Running   0          1m <-- HERE'S MY GUINEA PIG

Here is the bad pod webserver-rc-lxag2, which I can now live-debug without impacting the overall operation.
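
A couple of closing notes. First, the status!=serving selector above is fine in a quiet namespace, but it matches anything that isn't labeled status=serving; a more targeted query would combine it with the app label:

$ kubectl get pods --selector="app=webserver,status!=serving"

Second, the usual live-debugging toolbox applies to the isolated pod, and once you're done you can throw it away. Roughly (illustrative commands, not a prescription):

$ kubectl logs webserver-rc-lxag2
$ kubectl exec -ti webserver-rc-lxag2 -- sh
$ kubectl delete pod webserver-rc-lxag2

Note that if you instead relabel it back to status=serving, the RC would then see six matching pods and scale one down again.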
