Pods have a high number of restarts
```
NAME                                     READY   STATUS    RESTARTS   AGE
details-v1-6798fccf5f-t7zqc              2/2     Running   0          22h
productpage-v1-5f7b97679-gn2js           2/2     Running   190        22h
ratings-v1-5675c99f79-66c96              2/2     Running   0          22h
reviews-v1-586cb488f9-cjxxz              2/2     Running   190        17h
reviews-v2-67ccbd89c7-4jgcr              2/2     Running   242        1d
reviews-v3-6fd9fddb9f-mczvf              2/2     Running   190        17h
staging-nc-nutcracker-579b75498c-lg5p7   2/2     Running   364        1d
xx-homepage-7f97cb6cdf-l5c8s             2/2     Running   190        17h
xx-homepage-7f97cb6cdf-r7cnx             2/2     Running   242        1d
xx-homepage-7f97cb6cdf-wv4lj             2/2     Running   190        17h
```
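In a bigger cluster, sorting by restart count makes the offenders easier to spot; a minimal sketch using kubectl's built-in sorter (the JSONPath assumes the first container in each pod is the one restarting):

```
# List pods ordered by the restart count of their first container (highest last)
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'
```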
Events from one of those pods (via `kubectl describe pod`):
```
Normal SuccessfulMountVolume 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm MountVolume.SetUp succeeded for volume "istio-envoy"
Normal SuccessfulMountVolume 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm MountVolume.SetUp succeeded for volume "default-token-92zgx"
Warning NetworkNotReady 1m (x3 over 1m) kubelet, gke-lab4-default-pool-0eb9f919-qdxm network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Normal SuccessfulMountVolume 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm MountVolume.SetUp succeeded for volume "istio-certs"
Normal SandboxChanged 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Pod sandbox changed, it will be killed and re-created.
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
Normal Created 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Created container
Normal Pulled 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Container image "docker.io/istio/proxy_init:0.7.1" already present on machine
Normal Pulled 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Container image "alpine" already present on machine
Normal Created 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Created container
Normal Pulling 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm pulling image "ipedrazas/multicluster:v0.4"
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
Normal Pulled 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Successfully pulled image "ipedrazas/multicluster:v0.4"
Normal Created 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Created container
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
Normal Pulled 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Container image "docker.io/istio/proxy:0.7.1" already present on machine
Normal Created 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Created container
Normal Started 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Started container
```
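The events only show the sandbox being rebuilt; the actual exit reason of a restarted container lives in its previous logs. A sketch, assuming the app container in that pod is named `productpage` (the sidecar would be `istio-proxy`):

```
# Logs from the container instance that crashed before the current one
kubectl logs productpage-v1-5f7b97679-gn2js -c productpage --previous
```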
Two of these messages are a bit weird:
```
Warning NetworkNotReady 1m (x3 over 1m) kubelet, gke-lab4-default-pool-0eb9f919-qdxm network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Normal SandboxChanged 1m kubelet, gke-lab4-default-pool-0eb9f919-qdxm Pod sandbox changed, it will be killed and re-created.
```
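The warning claims kubenet has no PodCIDR for the node. Whether the allocation is actually missing can be read straight off the node object; a quick sketch:

```
# Print the PodCIDR assigned to the suspect node; empty output means no allocation
kubectl get node gke-lab4-default-pool-0eb9f919-qdxm -o jsonpath='{.spec.podCIDR}'
```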
It seems related to:
- the node not having enough resources

Checking where the pods are scheduled (`pods -owide` here is a shell alias for `kubectl get pods -o wide`):
```
-> % pods -owide
NAME                                     READY   STATUS        RESTARTS   AGE   IP            NODE
details-v1-6798fccf5f-t7zqc              2/2     Running       0          23h   10.8.10.5     gke-lab4-default-pool-0eb9f919-j5hm
productpage-v1-5f7b97679-26pzf           2/2     Running       0          56s   10.8.11.7     gke-lab4-default-pool-0eb9f919-w60m
productpage-v1-5f7b97679-gn2js           0/2     Terminating   194        23h   10.8.9.237    gke-lab4-default-pool-0eb9f919-qdxm
ratings-v1-5675c99f79-66c96              2/2     Running       0          23h   10.8.11.4     gke-lab4-default-pool-0eb9f919-w60m
reviews-v1-586cb488f9-cjxxz              0/2     Error         194        18h   10.8.13.240   gke-lab4-default-pool-0eb9f919-bn5z
reviews-v2-67ccbd89c7-4jgcr              2/2     Running       248        1d    10.8.7.103    gke-lab4-default-pool-0eb9f919-0qv0
reviews-v3-6fd9fddb9f-mczvf              0/2     Error         194        18h   10.8.13.243   gke-lab4-default-pool-0eb9f919-bn5z
staging-nc-nutcracker-579b75498c-lg5p7   2/2     Running       373        1d    10.8.7.96     gke-lab4-default-pool-0eb9f919-0qv0
xx-homepage-7f97cb6cdf-l5c8s             0/2     Error         194        18h   10.8.13.242   gke-lab4-default-pool-0eb9f919-bn5z
xx-homepage-7f97cb6cdf-r7cnx             2/2     Running       248        1d    10.8.7.107    gke-lab4-default-pool-0eb9f919-0qv0
xx-homepage-7f97cb6cdf-wv4lj             2/2     Running       194        18h   10.8.9.235    gke-lab4-default-pool-0eb9f919-qdxm
```
All the pods on the same node are killed at the same time, which points to a node issue:
```
reviews-v1-586cb488f9-cjxxz    0/2   Error   194   18h   10.8.13.240   gke-lab4-default-pool-0eb9f919-bn5z
reviews-v3-6fd9fddb9f-mczvf    0/2   Error   194   18h   10.8.13.243   gke-lab4-default-pool-0eb9f919-bn5z
xx-homepage-7f97cb6cdf-l5c8s   0/2   Error   194   18h   10.8.13.242   gke-lab4-default-pool-0eb9f919-bn5z
```
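If the node itself is the problem, its conditions and resource pressure should show it. A sketch of the usual checks (`kubectl top` needs a metrics pipeline such as Heapster, which GKE ships by default):

```
# Node conditions (MemoryPressure, DiskPressure, Ready) plus allocated resources
kubectl describe node gke-lab4-default-pool-0eb9f919-bn5z

# Live CPU/memory usage per node
kubectl top node
```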
Wonder what will happen when the machines are recycled (1h to go):
```
-> % nodes -owide
NAME                                  STATUS   ROLES    AGE   VERSION         EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gke-lab4-default-pool-0eb9f919-0qv0   Ready    <none>   23h   v1.10.2-gke.1   35.187.15.126    Container-Optimized OS from Google   4.14.22+         docker://17.3.2
gke-lab4-default-pool-0eb9f919-bn5z   Ready    <none>   21h   v1.10.2-gke.1   35.195.192.172   Container-Optimized OS from Google   4.14.22+         docker://17.3.2
gke-lab4-default-pool-0eb9f919-j5hm   Ready    <none>   23h   v1.10.2-gke.1   35.205.184.102   Container-Optimized OS from Google   4.14.22+         docker://17.3.2
gke-lab4-default-pool-0eb9f919-qdxm   Ready    <none>   23h   v1.10.2-gke.1   35.233.89.91     Container-Optimized OS from Google   4.14.22+         docker://17.3.2
gke-lab4-default-pool-0eb9f919-w60m   Ready    <none>   23h   v1.10.2-gke.1   35.233.97.202    Container-Optimized OS from Google   4.14.22+         docker://17.3.2
```
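The fixed recycling schedule suggests the pool might be running on preemptible VMs (an assumption, not confirmed here); GKE labels those nodes, so it is easy to verify:

```
# Adds a column with the gke-preemptible label; "true" means the VM is reclaimed within 24h
kubectl get nodes -L cloud.google.com/gke-preemptible
```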
* Going to GCP and deleting the node has no effect; pods keep dying.
* Last time we had this problem we fixed it by adjusting the resource requests and limits of the deployed pods. It's not clear why resources are fine until suddenly they aren't; it feels more like a combination of the resources available on the node and the resources consumed by the pods. Noisy neighbour? One option is to pin requests and limits again, as sketched below.
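If tuning pod resources fixed it last time, the same change can be applied without touching the manifests; a minimal sketch with illustrative values only (the deployment name is inferred from the pod names):

```
# Set explicit requests/limits so the scheduler and kubelet can account for the pod;
# this triggers a rolling restart of the deployment (values are placeholders to tune)
kubectl set resources deployment xx-homepage \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=500m,memory=512Mi
```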