@leonj1
Created August 22, 2020 01:34
kubectl get nodes
NAME    STATUS     ROLES    AGE    VERSION
node1   Ready      master   421d   v1.15.0
node2   NotReady   <none>   302d   v1.15.0
node3   NotReady   master   421d   v1.15.0
node4   Ready      <none>   421d   v1.15.0
node5   NotReady   <none>   302d   v1.15.0
node6   Ready      master   55d    v1.15.3
node7   NotReady   master   421d   v1.15.0
node8   NotReady   <none>   421d   v1.15.0
node9   NotReady   <none>   421d   v1.15.0
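# Possible next step (not part of the original capture): describe one of the NotReady
# nodes to see the kubelet's reported conditions and recent events (e.g. CNI/network
# plugin not ready); the node name below is taken from the listing above.
kubectl describe node node2
# A wide listing also shows each node's internal IP and container runtime version.
kubectl get nodes -o wide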
kubeadm alpha certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Jun 26, 2021 15:50 UTC   308d            no
apiserver                  Jun 26, 2021 15:50 UTC   308d            no
apiserver-etcd-client      Jun 26, 2021 18:17 UTC   308d            no
apiserver-kubelet-client   Jun 26, 2021 15:50 UTC   308d            no
controller-manager.conf    Jun 26, 2021 15:50 UTC   308d            no
etcd-healthcheck-client    Jun 26, 2021 18:17 UTC   308d            no
etcd-peer                  Jun 26, 2021 18:17 UTC   308d            no
etcd-server                Jun 26, 2021 18:17 UTC   308d            no
front-proxy-client         Jun 26, 2021 15:50 UTC   308d            yes
scheduler.conf             Jun 26, 2021 15:50 UTC   308d            no
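# None of the certificates above are close to expiring (308d remaining), so expired
# kubeadm certs are unlikely to explain the NotReady nodes. For reference only, a
# hedged sketch of in-place renewal on a kubeadm v1.15 control-plane node (not run
# here; the subcommand moved out of "alpha" in later kubeadm releases):
kubeadm alpha certs renew all
# Re-check expiry afterwards; control-plane components pick up renewed certs once
# their static pods are restarted.
kubeadm alpha certs check-expiration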
k get po -n kube-system | grep -E "coredns|flannel"
coredns-6d47d6497-6lfh9       0/1   Terminating         0       54d
coredns-6d47d6497-gmnkl       0/1   Terminating         0       54d
coredns-6d47d6497-mqp77       0/1   ContainerCreating   0       49d
coredns-6d47d6497-mxxbw       0/1   ContainerCreating   0       45d
kube-flannel-ds-amd64-5z2dk   0/1   CrashLoopBackOff    13778   54d
kube-flannel-ds-amd64-9xppj   0/1   CrashLoopBackOff    3250    54d
kube-flannel-ds-amd64-bmdrs   0/1   CrashLoopBackOff    2270    54d
kube-flannel-ds-amd64-g5hnf   0/1   Terminating         0       56d
kube-flannel-ds-amd64-mkxpf   0/1   Terminating         0       56d
kube-flannel-ds-amd64-s28vg   0/1   CrashLoopBackOff    13712   53d
kube-flannel-ds-amd64-sf88m   0/1   CrashLoopBackOff    1133    54d
kube-flannel-ds-amd64-tfh94   1/1   Running             2916    53d
kube-flannel-ds-amd64-wql4l   0/1   CrashLoopBackOff    13336   54d
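# Hypothetical follow-up (not part of the original capture): for one of the
# CrashLoopBackOff flannel pods, pull the log of the previous (crashed) container and
# the pod events; the pod name is taken from the listing above.
kubectl -n kube-system logs kube-flannel-ds-amd64-5z2dk --previous
kubectl -n kube-system describe pod kube-flannel-ds-amd64-5z2dk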
# journalctl -xeu kubelet (trying to see why flannel is having problems starting on a node)
...
Aug 22 00:53:03 ip-10-161-160-21 kubelet[2960]: E0822 00:53:03.890718 2960 cni.go:331] Error adding kube-system_nvidia-device-plugin-daemonset-xrspw/95197b48230757137dc006c1a7eccc0c8924039fb6095bf11a87d13085fc91ce to network flannel/cbr0: open /run/flannel/subnet.env: no such file or directory
Aug 22 00:53:03 ip-10-161-160-21 kubelet[2960]: E0822 00:53:03.892881 2960 cni.go:331] Error adding nginx-ingress_nginx-ingress-mjwnk/126ff2cc962c4d869ed7ab52727856b3b97589265f3f0d28fb524efe76f59da8 to network flannel/cbr0: open /run/flannel/subnet.env: no such file or directory
Aug 22 00:53:04 ip-10-161-160-21 kubelet[2960]: W0822 00:53:04.894743 2960 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "ebs-csi-controller-0_kube-system": CNI failed to retrieve network namespace path: cannot find network namespace for the t
Aug 22 00:53:04 ip-10-161-160-21 kubelet[2960]: W0822 00:53:04.897834 2960 pod_container_deletor.go:75] Container "b852ebdba0fac62d2cec46a10e7768f370983f444fa3e87a629c8b7c945941f2" not found in pod's containers
Aug 22 00:53:04 ip-10-161-160-21 kubelet[2960]: W0822 00:53:04.900028 2960 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "b852ebdba0fac62d2cec46a10e7768f370983f444fa3e87a629c8b7c945941f2"
Aug 22 00:53:04 ip-10-161-160-21 kubelet[2960]: E0822 00:53:04.907399 2960 pod_workers.go:190] Error syncing pod 0d6dc046-81eb-42d5-ae50-0a3bdbe6c5e9 ("kube-flannel-ds-amd64-wql4l_kube-system(0d6dc046-81eb-42d5-ae50-0a3bdbe6c5e9)"), skipping: failed to "StartContainer" for "kube-flannel" with CrashLoopBackOff: "B
Aug 22 00:53:06 ip-10-161-160-21 kubelet[2960]: E0822 00:53:05.936271 2960 kuberuntime_manager.go:883] PodSandboxStatus of sandbox "1414c93382055136f2681fd34542ee236f13d501433d156a16a487a41d58c411" for pod "ebs-csi-controller-0_kube-system(2090339b-525f-47e3-b9e5-f86cb2f012cb)" error: rpc error: code = Unknown de
Aug 22 00:53:07 ip-10-161-160-21 kubelet[2960]: E0822 00:53:07.175113 2960 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "95197b48230757137dc006c1a7eccc0c8924039fb6095bf11a87d13085fc91ce" network for pod "nvidia-device-plugin-
Aug 22 00:53:07 ip-10-161-160-21 kubelet[2960]: E0822 00:53:07.175165 2960 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-xrspw_kube-system(0930b1aa-dd39-4f16-85dc-d6906e8c75d5)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "95197b48230757137dc00
Aug 22 00:53:07 ip-10-161-160-21 kubelet[2960]: E0822 00:53:07.175192 2960 kuberuntime_manager.go:688] createPodSandbox for pod "nvidia-device-plugin-daemonset-xrspw_kube-system(0930b1aa-dd39-4f16-85dc-d6906e8c75d5)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "95197b48230757137dc0
Aug 22 00:53:07 ip-10-161-160-21 kubelet[2960]: E0822 00:53:07.175256 2960 pod_workers.go:190] Error syncing pod 0930b1aa-dd39-4f16-85dc-d6906e8c75d5 ("nvidia-device-plugin-daemonset-xrspw_kube-system(0930b1aa-dd39-4f16-85dc-d6906e8c75d5)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemons
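# Hedged check on the affected node (not from the original session): the CNI errors
# above point at a missing /run/flannel/subnet.env. That file is written by the
# flanneld container after it starts successfully, so it stays absent for as long as
# the flannel pod crash-loops, and every other pod's sandbox setup then fails too.
ls -l /run/flannel/
cat /etc/cni/net.d/10-flannel.conflist   # CNI config referencing the flannel plugin (file name may vary)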
# docker logs for flannel container
docker logs 79a162264a7e
I0822 01:15:15.603627 1 main.go:518] Determining IP address of default interface
I0822 01:15:15.603811 1 main.go:531] Using interface with name eth0 and address nodex
I0822 01:15:15.603824 1 main.go:548] Defaulting external address to interface address (nodex)
W0822 01:15:15.603831 1 client_config.go:517] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0822 01:15:45.605629 1 main.go:243] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-amd64-tfh94': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-amd64-tfh94: dial tcp 10.96.0.1:443: i/o timeout
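# Hedged check (not from the original session): 10.96.0.1:443 is the in-cluster
# "kubernetes" Service VIP, which kube-proxy translates to the real apiserver address.
# An i/o timeout here usually means kube-proxy/iptables on the node is broken, or the
# node cannot reach the apiserver at all; both can be probed from the affected node.
iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1
curl -k --connect-timeout 5 https://10.96.0.1:443/version
curl -k --connect-timeout 5 https://<apiserver-host>:6443/version   # substitute the real apiserver address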
@sumitKash

Hi, I am going to renew my cluster, which is on version 1.13.
Can you share the details of what went wrong? It would be very helpful for me.

@leonj1 (Author) commented Dec 15, 2020

After much battling and wrestling, I never managed to solve this.
Ended up tearing this cluster down and starting from scratch.
