@toschneck
Last active March 25, 2022 17:00
pefromance-metrics-explore.md

Grafana

Alerts history

sum(ALERTS_FOR_STATE) by (alertname)

Loki

LogQL in 5 Min

etcd requests that took too long

{namespace="kube-system"} |~ "took too long .* to execute"

counted:

count_over_time({namespace="kube-system"} |~ "took too long .* to execute" [60m])
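As a rough illustration of what the counted query computes, here is a minimal Python sketch that counts matching lines inside a 60-minute window. The sample log lines, the timestamps, and the helper name `count_over_time` are made up for illustration. The sketch ignores Loki's per-stream grouping, and its exact window boundary is an assumption.

```python
import re
from datetime import datetime, timedelta

# Same pattern as the LogQL line filter; `|~` is an unanchored regex match,
# which corresponds to re.search in Python.
PATTERN = re.compile(r"took too long .* to execute")

def count_over_time(entries, now, window=timedelta(minutes=60)):
    """Count entries whose line matches PATTERN and whose timestamp
    falls within (now - window, now]."""
    start = now - window
    return sum(
        1 for ts, line in entries
        if start < ts <= now and PATTERN.search(line)
    )

# Hypothetical sample data, not taken from a real cluster.
now = datetime(2022, 3, 25, 17, 0)
sample = [
    (now - timedelta(minutes=90), "read-only range request took too long (150ms) to execute"),
    (now - timedelta(minutes=30), "read-only range request took too long (210ms) to execute"),
    (now - timedelta(minutes=5), "apply request took too long (120ms) to execute"),
    (now - timedelta(minutes=1), "compaction finished"),
]

print(count_over_time(sample, now))  # prints 2
```

The 90-minute-old line falls outside the window and the last line does not match the pattern, so only two entries are counted.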

etcd client switching balancer to pick_first

count_over_time({namespace="kube-system"} |~ "ClientConn switching balancer to \"pick_first" [60m])

combined API server and etcd logs

{component=~"kube-apiserver|etcd",namespace="kube-system"}
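One detail worth keeping in mind with this selector: in LogQL, the `=~` label matcher is fully anchored, unlike the `|~` line filter. The following Python sketch mimics that behaviour with `re.fullmatch`. The component values in the list are hypothetical examples.

```python
import re

# Regex taken from the label matcher component=~"kube-apiserver|etcd".
SELECTOR = re.compile(r"kube-apiserver|etcd")

def matches_component(value):
    """Label matchers are anchored, so the whole value must match."""
    return SELECTOR.fullmatch(value) is not None

# Hypothetical label values for illustration.
components = ["etcd", "kube-apiserver", "etcd-events", "kube-scheduler"]

print([c for c in components if matches_component(c)])
# prints ['etcd', 'kube-apiserver']
```

A value such as `etcd-events` would only be selected with a pattern like `etcd.*`.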

K8s Event Logger

NodePort proxy failed

{namespace="kube-system", container_name="k8s-event-logger"} |~ "nodeport-proxy-envoy.*failed"

counted:

count_over_time({namespace="kube-system", container_name="k8s-event-logger"} |~ "nodeport-proxy-envoy.*failed"[1h]) > 0

SystemOOM events

{namespace="kube-system", container_name="k8s-event-logger"} |~ "SystemOOM"

counted:

count_over_time({namespace="kube-system", container_name="k8s-event-logger"} |~ "SystemOOM"[4h]) > 0

filter events, exclude the containerd prefix "stdout F", and parse JSON

{namespace="kube-system", container_name="k8s-event-logger"}
  | regexp `(?P<pre>(?s)(.+?).*stdout F )(?P<json>(?s)(.+?)$)`
  | line_format "{{.json}}"
  | json
  | reason="SystemOOM"
  | regexp `(?P<pre2>(?s)(.+?).*victim process: )(?P<victim>(?s)(.+?))\, pid`
  | line_format "{{.involvedObject_name}} {{.victim}} {{.message}}"
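To make the stages of this pipeline easier to follow, here is a minimal Python sketch that walks one log line through the same steps: strip the containerd prefix, parse the JSON, filter on the reason, extract the victim process, and format the output. The sample line, node name, and helper names are hypothetical. The regular expressions are simplified equivalents rather than copies of the LogQL ones, because Python does not accept an inline `(?s)` in the middle of a pattern.

```python
import json
import re

# Hypothetical containerd (CRI) log line as written by k8s-event-logger.
RAW = (
    '2022-03-25T16:00:00.000000000Z stdout F '
    '{"reason":"SystemOOM",'
    '"message":"System OOM encountered, victim process: java, pid: 12345",'
    '"involvedObject":{"kind":"Node","name":"worker-1"}}'
)

# Simplified patterns; re.DOTALL replaces the inline (?s) flag.
PREFIX_RE = re.compile(r"(?P<pre>.+?stdout F )(?P<json>.+?)$", re.DOTALL)
VICTIM_RE = re.compile(r"victim process: (?P<victim>.+?), pid", re.DOTALL)

def flatten(obj, prefix=""):
    """Flatten nested dicts with "_" separators, similar to how the
    LogQL json parser names labels (involvedObject_name)."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "_"))
        else:
            out[name] = value
    return out

def format_oom_event(raw):
    """Return "<node> <victim> <message>" for SystemOOM events, else None."""
    m = PREFIX_RE.search(raw)
    if m is None:
        return None
    labels = flatten(json.loads(m.group("json")))
    if labels.get("reason") != "SystemOOM":
        return None
    # The LogQL query runs this regex on the whole line; the message
    # field is enough for this sketch.
    victim = VICTIM_RE.search(labels["message"])
    victim_name = victim.group("victim") if victim else ""
    return f'{labels["involvedObject_name"]} {victim_name} {labels["message"]}'

print(format_oom_event(RAW))
# prints: worker-1 java System OOM encountered, victim process: java, pid: 12345
```

Running a few real lines through a sketch like this can help when debugging the regular expressions before pasting them into Grafana.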

Use Regex expression

Benchmarking

Tools

etcdperf:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: etcdperf
  name: etcdperf
  namespace: monitoring
spec:
  selector:
    matchLabels:
      run: etcdperf
  template:
    metadata:
      labels:
        run: etcdperf
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      containers:
      - image: quay.io/openshift-scale/etcd-perf
        name: etcdperf
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcdperf-cron
spec:
  schedule: "10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: etcdperf
            image: quay.io/openshift-scale/etcd-perf:latest
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo FIO job
          restartPolicy: OnFailure