Daein Park bysnupy

bysnupy / pleg_unhealthy_metrics_table.md
Last active October 19, 2019 07:23
PLEG is not healthy metrics table
| Metric | Description | Name as of Kubernetes 1.14 (OpenShift 4.2) |
|---|---|---|
| kubelet_pleg_relist_interval_microseconds | Interval in microseconds between "relist" calls | kubelet_pleg_relist_interval_seconds |
| kubelet_pleg_relist_latency_microseconds | Latency in microseconds of "relist" calls | kubelet_pleg_relist_duration_seconds |
| kubelet_runtime_operations | Cumulative number of runtime operations, by operation type | kubelet_runtime_operations_total |
| kubelet_runtime_operations_latency_microseconds | Latency in microseconds of runtime operations, by operation type | kubelet_runtime_operations_duration_seconds |
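Dashboards or alert rules written against the old names need the renames above. A minimal POSIX `sed` sketch, assuming the text to convert arrives on stdin; `rename_kubelet_metrics` is a hypothetical helper. It only renames: the old `*_microseconds` values are 10^6 times the new `*_seconds` values, so numeric thresholds still have to be rescaled by hand.

```shell
#!/bin/sh
# Hypothetical helper: rename the pre-1.14 kubelet metrics from the table in
# text read from stdin (PromQL, alert rules, dashboard JSON). Names only:
# *_microseconds values are 1e6 times the matching *_seconds values.
rename_kubelet_metrics() {
  sed \
    -e 's/kubelet_pleg_relist_interval_microseconds/kubelet_pleg_relist_interval_seconds/g' \
    -e 's/kubelet_pleg_relist_latency_microseconds/kubelet_pleg_relist_duration_seconds/g' \
    -e 's/kubelet_runtime_operations_latency_microseconds/kubelet_runtime_operations_duration_seconds/g' \
    -e 's/kubelet_runtime_operations\([^_[:alnum:]]\)/kubelet_runtime_operations_total\1/g' \
    -e 's/kubelet_runtime_operations$/kubelet_runtime_operations_total/'
}

# Example:
echo 'rate(kubelet_runtime_operations[5m])' | rename_kubelet_metrics
# prints: rate(kubelet_runtime_operations_total[5m])
```

The longer `kubelet_runtime_operations_latency_microseconds` is rewritten before the bare `kubelet_runtime_operations`, and the bare name is only matched when it is not followed by `_` or an alphanumeric character, so names such as `kubelet_runtime_operations_total` are left alone.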
bysnupy / pleg_unhealthy_metrics_output.md
Created October 19, 2019 07:26
PLEG is not healthy metrics output
# HELP kubelet_pleg_relist_interval_microseconds Interval in microseconds between relisting in PLEG.
# TYPE kubelet_pleg_relist_interval_microseconds summary
kubelet_pleg_relist_interval_microseconds{quantile="0.5"} 1.054052e+06
kubelet_pleg_relist_interval_microseconds{quantile="0.9"} 1.074873e+06
kubelet_pleg_relist_interval_microseconds{quantile="0.99"} 1.126039e+06
kubelet_pleg_relist_interval_microseconds_count 5146

# HELP kubelet_pleg_relist_latency_microseconds Latency in microseconds for relisting pods in PLEG.
# TYPE kubelet_pleg_relist_latency_microseconds summary
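These samples are in microseconds: the median of 1.054052e+06 means about 1.05 s between relists, which matches what I understand to be the kubelet's default one-second relist period (the "PLEG is not healthy" condition is, to my knowledge, reported when a relist has not finished within three minutes). A small `awk` sketch that prints each quantile of such a summary in seconds, for comparison with the `*_seconds` names in the table; `us_to_seconds` is a hypothetical helper that expects Prometheus text output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the quantiles of an old *_microseconds summary
# metric in seconds. $1 is the metric name; Prometheus text format on stdin.
us_to_seconds() {
  awk -v m="$1" '
    # Only sample lines such as: name{quantile="0.9"} 1.074873e+06
    index($1, m "{quantile=") == 1 {
      q = $1
      sub(/.*quantile="/, "", q)   # drop everything up to the quantile value
      sub(/".*/, "", q)            # drop the closing quote and brace
      printf "p%s %.3f s\n", q, $2 / 1e6
    }'
}

# Example with the values shown above:
us_to_seconds kubelet_pleg_relist_interval_microseconds <<'EOF'
kubelet_pleg_relist_interval_microseconds{quantile="0.5"} 1.054052e+06
kubelet_pleg_relist_interval_microseconds{quantile="0.9"} 1.074873e+06
kubelet_pleg_relist_interval_microseconds{quantile="0.99"} 1.126039e+06
kubelet_pleg_relist_interval_microseconds_count 5146
EOF
# prints:
# p0.5 1.054 s
# p0.9 1.075 s
# p0.99 1.126 s
```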
OpenShift version: 3.11
etcd information:
- cluster id: aaaaaaaaaaaaaaaa
  - 111111111111111: master1.ocp.example.com:https://10.0.1.10:2380:https://10.0.1.10:2379 (this member has failed)
  - 222222222222222: master2.ocp.example.com:https://10.0.1.20:2380:https://10.0.1.20:2379
  - 333333333333333: master3.ocp.example.com:https://10.0.1.30:2380:https://10.0.1.30:2379
# oc get pod -n kube-system
 NAME                                  READY   STATUS             RESTARTS   AGE
 :
 master-etcd-master1.ocp.example.com   0/1     CrashLoopBackOff   10         15m
 master-etcd-master2.ocp.example.com   1/1     Running            1          226d
 master-etcd-master3.ocp.example.com   1/1     Running            1          226d
# oc logs -n kube-system master-etcd-master1.ocp.example.com
 :
 2019-12-25 10:15:24.291020 C | raft: tocommit(18928) is out of range [lastIndex(13100)]. Was the raft log corrupted, truncated, or lost?
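The fatal raft message says this member was asked to commit up to index 18928 while its own log ends at index 13100, i.e. its local data is behind what the leader believes the member already has. Restarting cannot recover those entries, which is why the steps below remove the member, empty its data directory and add it back. A small sketch that pulls both numbers out of such a line; `raft_gap` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read etcd log lines on stdin, extract tocommit and
# lastIndex from the raft "out of range" message and print the gap.
raft_gap() {
  sed -n 's/.*tocommit(\([0-9]*\)) is out of range \[lastIndex(\([0-9]*\))].*/\1 \2/p' |
    while read -r tocommit last; do
      echo "tocommit=$tocommit lastIndex=$last missing=$((tocommit - last))"
    done
}

echo '2019-12-25 10:15:24.291020 C | raft: tocommit(18928) is out of range [lastIndex(13100)]. Was the raft log corrupted, truncated, or lost?' | raft_gap
# prints: tocommit=18928 lastIndex=13100 missing=5828
```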
sh-4.2# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS member remove 111111111111111
 Member 111111111111111 removed from cluster aaaaaaaaaaaaaaaa
sh-4.2# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=table member list
 +-----------------+---------+-------------------------+------------------------+------------------------+
 | ID              | STATUS  | NAME                    | PEER ADDRS             | CLIENT ADDRS           |
 +-----------------+---------+-------------------------+------------------------+------------------------+
 | 222222222222222 | started | master2.ocp.example.com | https://10.0.1.20:2380 | https://10.0.1.20:2379 |
 | 333333333333333 | started | master3.ocp.example.com | https://10.0.1.30:2380 | https://10.0.1.30:2379 |
 +-----------------+---------+-------------------------+------------------------+------------------------+
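Instead of copying the ID of the failed member by hand, it can be looked up by name. A sketch, assuming the default (simple) output of `etcdctl member list`, where columns are separated by `, ` and the ID and name are the first and third columns; `member_id_by_name` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of the member named $1, reading the
# simple output of "etcdctl member list" (ID, status, name, peer URLs, ...).
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Usage inside the etcd pod (not run here):
# id=$(etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE \
#        --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS \
#        member list | member_id_by_name master1.ocp.example.com)
# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE \
#   --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS \
#   member remove "$id"
```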
# ssh master1
 master1 ~# mv /etc/origin/node/pods/etcd.yaml .
 master1 ~# oc get pod -n kube-system
 NAME                                  READY   STATUS    RESTARTS   AGE
 :
 master-etcd-master2.ocp.example.com   1/1     Running   2          1m
 master-etcd-master3.ocp.example.com   1/1     Running   1          226d
 master1 ~# mv /var/lib/etcd /var/lib/etcd_bak
 master1 ~# mkdir /var/lib/etcd
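Moving `etcd.yaml` out of `/etc/origin/node/pods` makes the kubelet stop the static pod asynchronously, so it is worth confirming the pod is gone before touching `/var/lib/etcd`. A generic polling sketch; `wait_until_absent`, the `WAIT_INTERVAL` variable and the retry count in the usage line are assumptions. `oc get` also fails on API errors, so a zero exit status from the helper should still be confirmed with `oc get pod` by hand.

```shell
#!/bin/sh
# Hypothetical helper: run "$@" up to $1 times, WAIT_INTERVAL seconds apart
# (default 5), and succeed as soon as the command fails, e.g. because
# "oc get pod" no longer finds the pod. Returns 1 if it never fails.
wait_until_absent() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    if ! "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    # Sleep only if another attempt is left.
    [ "$attempts" -gt 0 ] && sleep "${WAIT_INTERVAL:-5}"
  done
  return 1
}

# Usage on master1 (not run here):
# wait_until_absent 24 oc get pod -n kube-system master-etcd-master1.ocp.example.com \
#   && mv /var/lib/etcd /var/lib/etcd_bak
```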
# oc rsh -n kube-system master-etcd-master2.ocp.example.com
 sh-4.2# source /etc/etcd/etcd.conf
 sh-4.2# export ETCDCTL_API=3
 sh-4.2# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS member add master1.ocp.example.com --peer-urls https://10.0.1.10:2380
 Member 444444444444444 added to cluster aaaaaaaaaaaaaaaa
 ETCD_NAME="master1.ocp.example.com"
 ETCD_INITIAL_CLUSTER="master1.ocp.example.com=https://10.0.1.10:2380,master2.ocp.example.com=https://10.0.1.20:2380,master3.ocp.example.com=https://10.0.1.30:2380"
 ETCD_INITIAL_CLUSTER_STATE="existing"
# ssh master1
 master1 ~# vim /etc/etcd/etcd.conf
 ETCD_NAME="master1.ocp.example.com"
 :
 ETCD_INITIAL_CLUSTER="master1.ocp.example.com=https://10.0.1.10:2380,master2.ocp.example.com=https://10.0.1.20:2380,master3.ocp.example.com=https://10.0.1.30:2380"
 ETCD_INITIAL_CLUSTER_STATE="existing"
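The same three settings can be applied without an editor. A sketch, assuming the `ETCD_*=` lines printed by `member add` were saved to a file on master1 and that each key already appears in `etcd.conf` (possibly commented out with `#`); `apply_etcd_settings` is a hypothetical helper that keeps a `.bak` copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper: for every KEY="value" line in file $2 (such as the
# ETCD_NAME, ETCD_INITIAL_CLUSTER and ETCD_INITIAL_CLUSTER_STATE lines printed
# by "etcdctl member add"), replace the line setting KEY in file $1.
# Keys that do not already appear in $1 are NOT added.
apply_etcd_settings() {
  conf=$1
  settings=$2
  cp "$conf" "$conf.bak" || return 1
  # Keep only ETCD_*= lines, dropping any leading whitespace.
  sed -n 's/^[[:space:]]*\(ETCD_[A-Z_]*=.*\)/\1/p' "$settings" |
    while IFS= read -r line; do
      key=${line%%=*}
      # Escape &, | and \ for the sed replacement ("|" is the delimiter below).
      repl=$(printf '%s\n' "$line" | sed 's/[&|\\]/\\&/g')
      tmp=$(mktemp) || exit 1
      # Match both "KEY=..." and a commented-out "#KEY=..." line.
      sed "s|^#*${key}=.*|${repl}|" "$conf" > "$tmp" && cat "$tmp" > "$conf"
      rm -f "$tmp"
    done
}

# Usage on master1 (the settings file name is an example):
# apply_etcd_settings /etc/etcd/etcd.conf /root/member-add.out
```

Writing through `cat "$tmp" > "$conf"` instead of `mv` keeps the original file, so its ownership and SELinux label stay as they were.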
 master1 ~# mv etcd.yaml /etc/origin/node/pods/
 master1 ~# oc get pod -n kube-system
 NAME                                  READY   STATUS    RESTARTS   AGE
 :
 master-etcd-master1.ocp.example.com   1/1     Running   0          5s
 master-etcd-master2.ocp.example.com   1/1     Running   2          17m
 master-etcd-master3.ocp.example.com   1/1     Running   1          226d
 master1 ~# oc logs -n kube-system master-etcd-master1.ocp.example.com
 :
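The `ETCD_ALL_ENDPOINTS` assignment in the next step parses `member list --write-out=fields`, which, as far as I know, prints each client URL on a line of the form `"ClientURL" : "https://10.0.1.10:2379"`. The `awk` program keeps the third field of those lines and joins them with commas. The same program, wrapped in a hypothetical `join_client_urls` function and run on a hand-written sample in that assumed layout:

```shell
#!/bin/sh
# Same awk program as in the ETCD_ALL_ENDPOINTS step, wrapped in a
# hypothetical function: join the third field of every "ClientURL" line
# with commas (no trailing newline, because the printf format adds none).
join_client_urls() {
  awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'
}

# Hand-written sample in the assumed "fields" layout (two members):
printf '%s\n' \
  '"ID" : 444444444444444' \
  '"Name" : "master1.ocp.example.com"' \
  '"PeerURL" : "https://10.0.1.10:2380"' \
  '"ClientURL" : "https://10.0.1.10:2379"' \
  '' \
  '"ID" : 222222222222222' \
  '"Name" : "master2.ocp.example.com"' \
  '"PeerURL" : "https://10.0.1.20:2380"' \
  '"ClientURL" : "https://10.0.1.20:2379"' | join_client_urls
echo
# prints: "https://10.0.1.10:2379","https://10.0.1.20:2379"
```

The double quotes around each URL are carried into the variable; to my understanding `etcdctl` still accepts such a list because `--endpoints` is parsed as comma-separated values.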
# oc rsh -n kube-system master-etcd-master2.ocp.example.com
 sh-4.2# source /etc/etcd/etcd.conf
 sh-4.2# export ETCDCTL_API=3
 sh-4.2# ETCD_ALL_ENDPOINTS=`etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=fields member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`
 sh-4.2# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=table member list
 +-----------------+---------+-------------------------+------------------------+------------------------+
 | ID              | STATUS  | NAME                    | PEER ADDRS             | CLIENT ADDRS           |
 +-----------------+---------+-------------------------+------------------------+------------------------+
 | 444444444444444 | started | master1.ocp.example.com | https://10.0.1.10:2380 | https://10.0.1.10:2379 |