@joejulian
Created April 30, 2021 17:19
1. stop the cron-job that was restarting traefik
2. deploy the traefik container patched by Jarred, reduce the number of replicas to 1, and enable debug mode and debug logging (add --loglevel=debug and -d to the argv list)
3. on each apiserver, one at a time:
   a. vi /etc/kubernetes/audit-policy/apiserver-audit-policy.yaml and add, right at the beginning of the rules section:
      - level: RequestResponse
        users: ["system:serviceaccount:kubeaddons:traefik-kubeaddons"]
   b. restart the apiserver container (steps c to e)
   c. crictl ps | grep apise
   d. crictl stop <id>
   e. crictl rm <id>
4. identify a service to use during the testing; it should have just a single pod that can be rescheduled onto a different node
5. restart traefik
kubectl -n kubeaddons rollout restart deployment traefik-kubeaddons
6. immediately afterwards, start tailing the traefik container logs so that we are not bitten by log rotation (substitute the pod name returned by the first command):
kubectl get pods -n kubeaddons | grep traefik-kubeaddons
kubectl logs -n kubeaddons traefik-kubeaddons-9fcd4f79c-m5skg --follow | tee ./traefik.log
7. confirm that the service works
8. kill the service container and wait for it to be rescheduled onto some other node
9. confirm that the service no longer works when accessed through traefik
10. fetch the diagnostics bundle and add the traefik pod logs gathered in step 6
11. restore the cronjob and the number of replicas, disable debug logging, and revert the audit policy changes
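
Steps 5 and 6 are time-sensitive and the pod name changes on every restart, so they could be wrapped in a small script. The following is only a sketch: the namespace and deployment name come from the steps above, while the function names pick_traefik_pod and restart_and_tail are hypothetical helpers invented for illustration. It assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Sketch only. NS and DEPLOY are taken from the runbook steps; the helper
# function names are hypothetical and not part of kubectl or traefik.
NS=kubeaddons
DEPLOY=traefik-kubeaddons

# Read `kubectl get pods --no-headers` output on stdin and print the name of
# the first pod of the traefik deployment whose STATUS column is "Running".
pick_traefik_pod() {
  awk -v prefix="$DEPLOY-" 'index($1, prefix) == 1 && $3 == "Running" { print $1; exit }'
}

# Restart traefik, wait for the rollout to finish, then follow the new pod's
# logs and keep a copy in ./traefik.log.
restart_and_tail() {
  kubectl -n "$NS" rollout restart deployment "$DEPLOY"
  kubectl -n "$NS" rollout status deployment "$DEPLOY"
  pod=$(kubectl -n "$NS" get pods --no-headers | pick_traefik_pod)
  kubectl -n "$NS" logs "$pod" --follow | tee ./traefik.log
}
```

Calling restart_and_tail after sourcing the file would run both steps in one go. Waiting on `rollout status` trades a few seconds of delay for not having to guess when the new pod is ready; with replicas set to 1, as in step 2, there should be only one Running pod to pick.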