The pods show a high number of restarts:
```
NAME                                     READY   STATUS    RESTARTS   AGE
details-v1-6798fccf5f-t7zqc              2/2     Running   0          22h
productpage-v1-5f7b97679-gn2js           2/2     Running   190        22h
ratings-v1-5675c99f79-66c96              2/2     Running   0          22h
reviews-v1-586cb488f9-cjxxz              2/2     Running   190        17h
reviews-v2-67ccbd89c7-4jgcr              2/2     Running   242        1d
reviews-v3-6fd9fddb9f-mczvf              2/2     Running   190        17h
staging-nc-nutcracker-579b75498c-lg5p7   2/2     Running   364        1d
xx-homepage-7f97cb6cdf-l5c8s             2/2     Running   190        17h
xx-homepage-7f97cb6cdf-r7cnx             2/2     Running   242        1d
xx-homepage-7f97cb6cdf-wv4lj             2/2     Running   190        17h
```
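A quick way to surface the worst offenders is to sort the listing by the RESTARTS column. This is a sketch over a few captured lines from the table above; on a live cluster you could pipe in `kubectl get pods --no-headers` instead (or use `kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'`).

```shell
# Sort by restart count (column 4), highest first, and show the top three.
sort -k4,4 -rn <<'EOF' | head -3
details-v1-6798fccf5f-t7zqc 2/2 Running 0 22h
productpage-v1-5f7b97679-gn2js 2/2 Running 190 22h
reviews-v2-67ccbd89c7-4jgcr 2/2 Running 242 1d
staging-nc-nutcracker-579b75498c-lg5p7 2/2 Running 364 1d
EOF
```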
Events from one of those pods:
```
Normal SuccessfulMountVolume 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm MountVolume.SetUp succeeded for volume "istio-envoy"
Normal SuccessfulMountVolume 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm MountVolume.SetUp succeeded for volume "default-token-92zgx"
Warning NetworkNotReady 1m (x3 over 1m) kubelet, gke-lab4-default-pool-0eb9f919-qdxm network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Normal SuccessfulMountVolume 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm MountVolume.SetUp succeeded for volume "istio-certs"
Normal SandboxChanged 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Pod sandbox changed, it will be killed and re-created.
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
Normal Created 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Created container
Normal Pulled 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Container image "docker.io/istio/proxy_init:0.7.1" already present on machine
Normal Pulled 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Container image "alpine" already present on machine
Normal Created 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Created container
Normal Pulling 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm pulling image "ipedrazas/multicluster:v0.4"
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
Normal Pulled 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Successfully pulled image "ipedrazas/multicluster:v0.4"
Normal Created 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Created container
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
Normal Pulled 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Container image "docker.io/istio/proxy:0.7.1" already present on machine
Normal Created 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Created container
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
```
Two of these events look a bit weird:
```
Warning NetworkNotReady 1m (x3 over 1m) kubelet, gke-lab4-default-pool-0eb9f919-qdxm network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Normal SandboxChanged 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Pod sandbox changed, it will be killed and re-created.
```
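Those two events can be pulled out of the noisy stream by reason (field 2 of the event listing). This is a sketch over a few captured event lines; on a live cluster something like `kubectl get events --field-selector reason=NetworkNotReady` should also work.

```shell
# Keep only the two suspicious event reasons from the stream.
awk '$2 == "NetworkNotReady" || $2 == "SandboxChanged"' <<'EOF'
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
Warning NetworkNotReady 1m (x3 over 1m) kubelet, gke-lab4-default-pool-0eb9f919-qdxm network is not ready
Normal SandboxChanged 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Pod sandbox changed, it will be killed and re-created.
EOF
```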
It seems related to:
- the node not having enough resources
```
-> % pods -owide
NAME                                     READY   STATUS        RESTARTS   AGE   IP            NODE
details-v1-6798fccf5f-t7zqc              2/2     Running       0          23h   10.8.10.5     gke-lab4-default-pool-0eb9f919-j5hm
productpage-v1-5f7b97679-26pzf           2/2     Running       0          56s   10.8.11.7     gke-lab4-default-pool-0eb9f919-w60m
productpage-v1-5f7b97679-gn2js           0/2     Terminating   194        23h   10.8.9.237    gke-lab4-default-pool-0eb9f919-qdxm
ratings-v1-5675c99f79-66c96              2/2     Running       0          23h   10.8.11.4     gke-lab4-default-pool-0eb9f919-w60m
reviews-v1-586cb488f9-cjxxz              0/2     Error         194        18h   10.8.13.240   gke-lab4-default-pool-0eb9f919-bn5z
reviews-v2-67ccbd89c7-4jgcr              2/2     Running       248        1d    10.8.7.103    gke-lab4-default-pool-0eb9f919-0qv0
reviews-v3-6fd9fddb9f-mczvf              0/2     Error         194        18h   10.8.13.243   gke-lab4-default-pool-0eb9f919-bn5z
staging-nc-nutcracker-579b75498c-lg5p7   2/2     Running       373        1d    10.8.7.96     gke-lab4-default-pool-0eb9f919-0qv0
xx-homepage-7f97cb6cdf-l5c8s             0/2     Error         194        18h   10.8.13.242   gke-lab4-default-pool-0eb9f919-bn5z
xx-homepage-7f97cb6cdf-r7cnx             2/2     Running       248        1d    10.8.7.107    gke-lab4-default-pool-0eb9f919-0qv0
xx-homepage-7f97cb6cdf-wv4lj             2/2     Running       194        18h   10.8.9.235    gke-lab4-default-pool-0eb9f919-qdxm
```
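Summing the restart counts per node makes the per-node pattern obvious (RESTARTS is column 4 and NODE is column 7 in the wide listing). A sketch over a few captured lines; on a live cluster you could pipe in `kubectl get pods -o wide --no-headers`.

```shell
# Aggregate total restarts per node.
awk '{restarts[$7] += $4} END {for (n in restarts) print n, restarts[n]}' <<'EOF'
reviews-v1-586cb488f9-cjxxz 0/2 Error 194 18h 10.8.13.240 gke-lab4-default-pool-0eb9f919-bn5z
reviews-v3-6fd9fddb9f-mczvf 0/2 Error 194 18h 10.8.13.243 gke-lab4-default-pool-0eb9f919-bn5z
xx-homepage-7f97cb6cdf-l5c8s 0/2 Error 194 18h 10.8.13.242 gke-lab4-default-pool-0eb9f919-bn5z
details-v1-6798fccf5f-t7zqc 2/2 Running 0 23h 10.8.10.5 gke-lab4-default-pool-0eb9f919-j5hm
EOF
```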
All the pods on the same node are killed at the same time, which points to a node issue:
```
reviews-v1-586cb488f9-cjxxz    0/2   Error   194   18h   10.8.13.240   gke-lab4-default-pool-0eb9f919-bn5z
reviews-v3-6fd9fddb9f-mczvf    0/2   Error   194   18h   10.8.13.243   gke-lab4-default-pool-0eb9f919-bn5z
xx-homepage-7f97cb6cdf-l5c8s   0/2   Error   194   18h   10.8.13.242   gke-lab4-default-pool-0eb9f919-bn5z
```
I wonder what will happen when the machines are recycled (1h to go):
```
-> % nodes -owide
NAME                                  STATUS   ROLES    AGE   VERSION         EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gke-lab4-default-pool-0eb9f919-0qv0   Ready    <none>   23h   v1.10.2-gke.1   35.187.15.126    Container-Optimized OS from Google   4.14.22+         docker://17.3.2
gke-lab4-default-pool-0eb9f919-bn5z   Ready    <none>   21h   v1.10.2-gke.1   35.195.192.172   Container-Optimized OS from Google   4.14.22+         docker://17.3.2
gke-lab4-default-pool-0eb9f919-j5hm   Ready    <none>   23h   v1.10.2-gke.1   35.205.184.102   Container-Optimized OS from Google   4.14.22+         docker://17.3.2
gke-lab4-default-pool-0eb9f919-qdxm   Ready    <none>   23h   v1.10.2-gke.1   35.233.89.91     Container-Optimized OS from Google   4.14.22+         docker://17.3.2
gke-lab4-default-pool-0eb9f919-w60m   Ready    <none>   23h   v1.10.2-gke.1   35.233.97.202    Container-Optimized OS from Google   4.14.22+         docker://17.3.2
```
* Going to GCP and deleting the node has no effect; pods keep dying.
* Last time we had this problem we fixed it by modifying the resources of the deployed pods. Not sure why resources are fine until suddenly they're not; it feels more like a combination of the resources available on the node and the resources used by the pods. Noisy neighbour?
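For reference, the earlier fix meant setting explicit resource requests and limits on each container so the scheduler doesn't over-pack a node. A hedged sketch of such a stanza (the values here are illustrative, not the ones we actually used):

```yaml
# Example container resources stanza; values are made up for illustration.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi
```

The equivalent change can also be applied in place with `kubectl set resources deployment/<name> --requests=cpu=100m,memory=128Mi --limits=cpu=250m,memory=256Mi`.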
After downgrading the Kubernetes version, the machines have different names and the pods show the right age.