Poison Pill

---
apiVersion: poison-pill.medik8s.io/v1alpha1
kind: PoisonPillRemediation
metadata:
  creationTimestamp: "2022-06-08T19:51:00Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-06-08T19:51:00Z"
  finalizers:
  - poison-pill.medik8s.io/ppr-finalizer
  generation: 2
  labels:
    app.kubernetes.io/part-of: node-healthcheck-controller
  name: worker-0-0
  namespace: openshift-operators
  ownerReferences:
  - apiVersion: remediation.medik8s.io/v1alpha1
    controller: false
    kind: NodeHealthCheck
    name: nodehealthcheck-multi
    uid: 06d2efe8-b16e-4d6f-963f-237720d6719b
  resourceVersion: "166950"
  uid: 0dee9649-dfe4-4ff5-b5d3-2828e55b475d
spec:
  remediationStrategy: NodeDeletion
status:
  nodeBackup:
    apiVersion: v1
    kind: Node
    metadata:
      annotations:
        is-reboot-capable.poison-pill.medik8s.io: "true"
        k8s.ovn.org/host-addresses: '["192.168.123.66"]'
        k8s.ovn.org/l3-gateway-config: '{"default":{"mode":"shared","interface-id":"br-ex_worker-0-0","mac-address":"52:54:00:9a:20:9f","ip-addresses":["192.168.123.66/24"],"ip-address":"192.168.123.66/24","next-hops":["192.168.123.1"],"next-hop":"192.168.123.1","node-port-enable":"true","vlan-id":"0"}}'
        k8s.ovn.org/node-chassis-id: 78e0cf47-c9ab-4006-bb52-2239c1498a21
        k8s.ovn.org/node-mgmt-port-mac-address: 62:6c:3e:08:d4:a7
        k8s.ovn.org/node-primary-ifaddr: '{"ipv4":"192.168.123.66/24"}'
        k8s.ovn.org/node-subnets: '{"default":"10.129.2.0/23"}'
        machine.openshift.io/machine: openshift-machine-api/ocp-edge-cluster-0-7z778-worker-0-6gw69
        machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
        machineconfiguration.openshift.io/currentConfig: rendered-worker-e2f8956a0a474adb1bf0ea83d2f01d0e
        machineconfiguration.openshift.io/desiredConfig: rendered-worker-e2f8956a0a474adb1bf0ea83d2f01d0e
        machineconfiguration.openshift.io/reason: ""
        machineconfiguration.openshift.io/state: Done
        volumes.kubernetes.io/controller-managed-attach-detach: "true"
      creationTimestamp: "2022-06-08T13:10:03Z"
      labels:
        beta.kubernetes.io/arch: amd64
        beta.kubernetes.io/os: linux
        kubernetes.io/arch: amd64
        kubernetes.io/hostname: worker-0-0
        kubernetes.io/os: linux
        node-role.kubernetes.io/worker: ""
        node.openshift.io/os_id: rhcos
      managedFields:
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              f:machineconfiguration.openshift.io/controlPlaneTopology: {}
        manager: machine-config-controller
        operation: Update
        time: "2022-06-08T13:10:03Z"
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:spec:
            f:providerID: {}
        manager: machine-controller-manager
        operation: Update
        time: "2022-06-08T13:10:03Z"
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              f:k8s.ovn.org/node-subnets: {}
        manager: master-0-0
        operation: Update
        time: "2022-06-08T13:10:03Z"
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              f:machine.openshift.io/machine: {}
        manager: nodelink-controller
        operation: Update
        time: "2022-06-08T13:10:03Z"
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              f:k8s.ovn.org/host-addresses: {}
              f:k8s.ovn.org/l3-gateway-config: {}
              f:k8s.ovn.org/node-chassis-id: {}
              f:k8s.ovn.org/node-mgmt-port-mac-address: {}
              f:k8s.ovn.org/node-primary-ifaddr: {}
        manager: worker-0-0
        operation: Update
        time: "2022-06-08T13:10:43Z"
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              f:machineconfiguration.openshift.io/currentConfig: {}
              f:machineconfiguration.openshift.io/desiredConfig: {}
              f:machineconfiguration.openshift.io/reason: {}
              f:machineconfiguration.openshift.io/state: {}
        manager: machine-config-daemon
        operation: Update
        time: "2022-06-08T13:10:53Z"
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:status:
            f:conditions:
              k:{"type":"DiskPressure"}:
                f:lastHeartbeatTime: {}
              k:{"type":"MemoryPressure"}:
                f:lastHeartbeatTime: {}
              k:{"type":"PIDPressure"}:
                f:lastHeartbeatTime: {}
              k:{"type":"Ready"}:
                f:lastHeartbeatTime: {}
            f:images: {}
        manager: Go-http-client
        operation: Update
        time: "2022-06-08T19:11:49Z"
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:spec:
            f:taints: {}
          f:status:
            f:conditions:
              k:{"type":"DiskPressure"}:
                f:lastTransitionTime: {}
                f:message: {}
                f:reason: {}
                f:status: {}
              k:{"type":"MemoryPressure"}:
                f:lastTransitionTime: {}
                f:message: {}
                f:reason: {}
                f:status: {}
              k:{"type":"PIDPressure"}:
                f:lastTransitionTime: {}
                f:message: {}
                f:reason: {}
                f:status: {}
              k:{"type":"Ready"}:
                f:lastTransitionTime: {}
                f:message: {}
                f:reason: {}
                f:status: {}
        manager: kube-controller-manager
        operation: Update
        time: "2022-06-08T19:51:00Z"
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              f:is-reboot-capable.poison-pill.medik8s.io: {}
          f:spec:
            f:unschedulable: {}
        manager: manager
        operation: Update
        time: "2022-06-08T19:51:00Z"
      name: worker-0-0
      resourceVersion: "166946"
      uid: e72804c3-b841-4087-8199-cf9fb3f8da1f
    spec:
      providerID: baremetalhost:///openshift-machine-api/openshift-worker-0-0/fd3a6c6c-2f69-4382-bc52-27690df6773c
      taints:
      - effect: NoSchedule
        key: node.kubernetes.io/unreachable
        timeAdded: "2022-06-08T19:48:52Z"
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        timeAdded: "2022-06-08T19:48:57Z"
      - effect: NoSchedule
        key: node.kubernetes.io/unschedulable
        timeAdded: "2022-06-08T19:51:00Z"
      unschedulable: true
    status:
      addresses:
      - address: 192.168.123.66
        type: InternalIP
      - address: worker-0-0
        type: Hostname
      allocatable:
        cpu: 7500m
        ephemeral-storage: "38161122446"
        hugepages-1Gi: "0"
        hugepages-2Mi: "0"
        memory: 31752504Ki
        pods: "250"
      capacity:
        cpu: "8"
        ephemeral-storage: 41407468Ki
        hugepages-1Gi: "0"
        hugepages-2Mi: "0"
        memory: 32903480Ki
        pods: "250"
      conditions:
      - lastHeartbeatTime: "2022-06-08T19:47:34Z"
        lastTransitionTime: "2022-06-08T19:48:52Z"
        message: Kubelet stopped posting node status.
        reason: NodeStatusUnknown
        status: Unknown
        type: MemoryPressure
      - lastHeartbeatTime: "2022-06-08T19:47:34Z"
        lastTransitionTime: "2022-06-08T19:48:52Z"
        message: Kubelet stopped posting node status.
        reason: NodeStatusUnknown
        status: Unknown
        type: DiskPressure
      - lastHeartbeatTime: "2022-06-08T19:47:34Z"
        lastTransitionTime: "2022-06-08T19:48:52Z"
        message: Kubelet stopped posting node status.
        reason: NodeStatusUnknown
        status: Unknown
        type: PIDPressure
      - lastHeartbeatTime: "2022-06-08T19:47:34Z"
        lastTransitionTime: "2022-06-08T19:48:52Z"
        message: Kubelet stopped posting node status.
        reason: NodeStatusUnknown
        status: Unknown
        type: Ready
      daemonEndpoints:
        kubeletEndpoint:
          Port: 10250
      images:
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5464c60a20bf697933a9b88e6ed3f01fa6f2d56f895b70c57a5bebe0179e40db
        sizeBytes: 1052658293
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a8d2a3aff8eb4d3148b9b1555417a82fdd34231b2cb5e4855c138888f832dc99
        sizeBytes: 690854570
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f5d254b52f95f0874f21a84fb0f997772039bcf25895518e419160ac513f053c
        sizeBytes: 657003652
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1e4e12fddde318344ee0f42b42fda9dd630730d9f56c5c03a3d4c709284a12c9
        sizeBytes: 629068310
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9dbcee7091d8dfbcc733f96da159ba0f50666477c14bbc21a60af3acc7a8115e
        sizeBytes: 550585947
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:968c55e2aceeea83edff635fadafc21af88c9c769bf1fa1aa10502701d000dd5
        sizeBytes: 513946200
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a3f4e4cf64df0e0fde74a50e875ae95de2d913a85d42cb10ece8df693941c328
        sizeBytes: 497395348
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:46d599ef5e8952827cf4d7a1787c5eafaa6b3003f823c32484116a86dfb273ec
        sizeBytes: 473727443
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2c28d844818f9bc139d84b86c8ec75bd2db7c8cdb80d653e7c2ee0d487ec7b99
        sizeBytes: 470558395
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e33bf3759eb15a5feba08eec0150f575a568936ad4b47cc1a56ff94465407264
        sizeBytes: 463839911
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba44dead03ea74107f90d58525106fb52d27a120b73c6cc8e2be31d37043ca1c
        sizeBytes: 458030557
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:928ef7f43a9c793ebe2a6134acedfd487e5d94d79e3507394525091dc92679dd
        sizeBytes: 426076007
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:139aadb1611f22601b2b17e7599c8b7c1d98c4f1b7a404a87a576f06e25a09e6
        sizeBytes: 416037557
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a53ca7145189af06a4a58360a4bbe84907e4ec4e598a8be0d1439ae6c5d4387
        sizeBytes: 412926579
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:917832f1fce7b620637edbadd6c3d5fe7455ba04232129ea4b570d5e65b63100
        sizeBytes: 408319802
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2b423e88cdd37f307aff51cbb0f53fc45deff9618f5b4f12bfb78bea7aff51a2
        sizeBytes: 401179516
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3df935837634adb5df8080ac7263c7fb4c4f9d8fd45b36e32ca4fb802bdeaecc
        sizeBytes: 376461231
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb70d956fa55b463d4107c7a31375b0955f22763c50c608b755079978bc6ec0b
        sizeBytes: 369747118
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2eb78fa376f816585903fd72d3403c2af3e5a31563c17ca9207d1c0fbb4a7dfb
        sizeBytes: 350343390
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ec4e9eea3829fc91e836d2976349549aab358473aa163c684a6b3a9c7d64975
        sizeBytes: 344091715
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04c5cfcb4700850a3d58767c37fcf6abf8838b7d7ccb0778004f2d164a7acb97
        sizeBytes: 342771165
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a5a3a2b75ae62a921a4ab43b55f99401e93d55da42c5ba4b5cbd98c1bc785fff
        sizeBytes: 337246617
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fcb3cb6b22fe4946336955472cfbfd0d47034b837f3113cbcf219be77ad64cae
        sizeBytes: 334948496
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0c134d4266596cde4ff501098aab2238bb7da8452a00bb4116eaf00f7bb479c
        sizeBytes: 318321441
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1dd92c848a3c88cd70b085664218a28bf14650d26e486f5c346570872906cd8
        sizeBytes: 309635970
      - names:
        - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:35f3c8509bf9450c28e69bf0e134db9342107ad92bd24847acd9d01297976767
        sizeBytes: 305239231
      - names:
        - registry.redhat.io/openshift4/poison-pill-manager-rhel8-operator@sha256:6aed5380a4f9c0e75b2a3d1a86e490ce290ad3f7bf3b73d9392b2699a24d96d3
        - registry.redhat.io/openshift4/poison-pill-manager-rhel8-operator@sha256:ce985e973bebf94acd047c0f589d7e58faff038db7e9505cb00ab5e31e4ac4ef
        sizeBytes: 274759170
      - names:
        - registry.redhat.io/openshift4/poison-pill-manager-operator-bundle@sha256:4973dd7381609107531de8045a39fc5b2c34ff975d4f6f55770cca5361e09c16
        sizeBytes: 388183
      - names:
        - registry.redhat.io/node-healthcheck-operator-tech-preview/node-healthcheck-operator-bundle@sha256:20e279fbd1a6133986708d03d65a14c68f078589ff5eb3da32f0f1454ceb823b
        sizeBytes: 217045
      nodeInfo:
        architecture: amd64
        bootID: 332bdf17-598f-49a0-bf10-cc2d3155e9e5
        containerRuntimeVersion: cri-o://1.23.2-12.rhaos4.10.git5fe1720.el8
        kernelVersion: 4.18.0-305.49.1.el8_4.x86_64
        kubeProxyVersion: v1.23.5+3afdacb
        kubeletVersion: v1.23.5+3afdacb
        machineID: 312df18bc632462ea6cca6da3de08d10
        operatingSystem: linux
        osImage: Red Hat Enterprise Linux CoreOS 410.84.202206010432-0 (Ootpa)
        systemUUID: 312df18b-c632-462e-a6cc-a6da3de08d10
  timeAssumedRebooted: "2022-06-08T19:54:05Z"

Node after reboot

---
apiVersion: v1
kind: Node
metadata:
  annotations:
    is-reboot-capable.poison-pill.medik8s.io: "true"
    k8s.ovn.org/host-addresses: '["192.168.123.66","fd00:1101:0:1:6b7b:6516:192e:1aba"]'
    k8s.ovn.org/l3-gateway-config: '{"default":{"mode":"shared","interface-id":"br-ex_worker-0-0","mac-address":"52:54:00:9a:20:9f","ip-addresses":["192.168.123.66/24"],"ip-address":"192.168.123.66/24","next-hops":["192.168.123.1"],"next-hop":"192.168.123.1","node-port-enable":"true","vlan-id":"0"}}'
    k8s.ovn.org/node-chassis-id: 78e0cf47-c9ab-4006-bb52-2239c1498a21
    k8s.ovn.org/node-mgmt-port-mac-address: 62:6c:3e:08:d4:a7
    k8s.ovn.org/node-primary-ifaddr: '{"ipv4":"192.168.123.66/24"}'
    k8s.ovn.org/node-subnets: '{"default":"10.129.2.0/23"}'
    machine.openshift.io/machine: openshift-machine-api/ocp-edge-cluster-0-7z778-worker-0-6gw69
    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
    machineconfiguration.openshift.io/currentConfig: rendered-worker-e2f8956a0a474adb1bf0ea83d2f01d0e
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-e2f8956a0a474adb1bf0ea83d2f01d0e
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2022-06-08T19:55:36Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: worker-0-0
    kubernetes.io/os: linux
    node-role.kubernetes.io/worker: ""
    node.openshift.io/os_id: rhcos
  name: worker-0-0
  resourceVersion: "172009"
  uid: db9df912-9a97-4e90-9c47-145df8af38c1
spec:
  providerID: baremetalhost:///openshift-machine-api/openshift-worker-0-0/fd3a6c6c-2f69-4382-bc52-27690df6773c
status:
  addresses:
  - address: 192.168.123.66
    type: InternalIP
  - address: worker-0-0
    type: Hostname
  allocatable:
    cpu: 7500m
    ephemeral-storage: "38161122446"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 31752504Ki
    pods: "250"
  capacity:
    cpu: "8"
    ephemeral-storage: 41407468Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 32903480Ki
    pods: "250"
  conditions:
  - lastHeartbeatTime: "2022-06-08T19:55:41Z"
    lastTransitionTime: "2022-06-08T19:55:41Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2022-06-08T19:55:41Z"
    lastTransitionTime: "2022-06-08T19:55:41Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2022-06-08T19:55:41Z"
    lastTransitionTime: "2022-06-08T19:55:41Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2022-06-08T19:55:41Z"
    lastTransitionTime: "2022-06-08T19:55:41Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5464c60a20bf697933a9b88e6ed3f01fa6f2d56f895b70c57a5bebe0179e40db
    sizeBytes: 1052658293
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1e4e12fddde318344ee0f42b42fda9dd630730d9f56c5c03a3d4c709284a12c9
    sizeBytes: 629068310
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9dbcee7091d8dfbcc733f96da159ba0f50666477c14bbc21a60af3acc7a8115e
    sizeBytes: 550585947
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:968c55e2aceeea83edff635fadafc21af88c9c769bf1fa1aa10502701d000dd5
    sizeBytes: 513946200
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a3f4e4cf64df0e0fde74a50e875ae95de2d913a85d42cb10ece8df693941c328
    sizeBytes: 497395348
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:46d599ef5e8952827cf4d7a1787c5eafaa6b3003f823c32484116a86dfb273ec
    sizeBytes: 473727443
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2c28d844818f9bc139d84b86c8ec75bd2db7c8cdb80d653e7c2ee0d487ec7b99
    sizeBytes: 470558395
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e33bf3759eb15a5feba08eec0150f575a568936ad4b47cc1a56ff94465407264
    sizeBytes: 463839911
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba44dead03ea74107f90d58525106fb52d27a120b73c6cc8e2be31d37043ca1c
    sizeBytes: 458030557
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:928ef7f43a9c793ebe2a6134acedfd487e5d94d79e3507394525091dc92679dd
    sizeBytes: 426076007
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:139aadb1611f22601b2b17e7599c8b7c1d98c4f1b7a404a87a576f06e25a09e6
    sizeBytes: 416037557
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a53ca7145189af06a4a58360a4bbe84907e4ec4e598a8be0d1439ae6c5d4387
    sizeBytes: 412926579
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:917832f1fce7b620637edbadd6c3d5fe7455ba04232129ea4b570d5e65b63100
    sizeBytes: 408319802
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2b423e88cdd37f307aff51cbb0f53fc45deff9618f5b4f12bfb78bea7aff51a2
    sizeBytes: 401179516
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3df935837634adb5df8080ac7263c7fb4c4f9d8fd45b36e32ca4fb802bdeaecc
    sizeBytes: 376461231
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb70d956fa55b463d4107c7a31375b0955f22763c50c608b755079978bc6ec0b
    sizeBytes: 369747118
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2eb78fa376f816585903fd72d3403c2af3e5a31563c17ca9207d1c0fbb4a7dfb
    sizeBytes: 350343390
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ec4e9eea3829fc91e836d2976349549aab358473aa163c684a6b3a9c7d64975
    sizeBytes: 344091715
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04c5cfcb4700850a3d58767c37fcf6abf8838b7d7ccb0778004f2d164a7acb97
    sizeBytes: 342771165
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a5a3a2b75ae62a921a4ab43b55f99401e93d55da42c5ba4b5cbd98c1bc785fff
    sizeBytes: 337246617
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fcb3cb6b22fe4946336955472cfbfd0d47034b837f3113cbcf219be77ad64cae
    sizeBytes: 334948496
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0c134d4266596cde4ff501098aab2238bb7da8452a00bb4116eaf00f7bb479c
    sizeBytes: 318321441
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b1dd92c848a3c88cd70b085664218a28bf14650d26e486f5c346570872906cd8
    sizeBytes: 309635970
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:35f3c8509bf9450c28e69bf0e134db9342107ad92bd24847acd9d01297976767
    sizeBytes: 305239231
  - names:
    - registry.redhat.io/openshift4/poison-pill-manager-rhel8-operator@sha256:6aed5380a4f9c0e75b2a3d1a86e490ce290ad3f7bf3b73d9392b2699a24d96d3
    - registry.redhat.io/openshift4/poison-pill-manager-rhel8-operator@sha256:ce985e973bebf94acd047c0f589d7e58faff038db7e9505cb00ab5e31e4ac4ef
    sizeBytes: 274759170
  nodeInfo:
    architecture: amd64
    bootID: 91cd40f3-4702-4746-8a7a-870ee2a40140
    containerRuntimeVersion: cri-o://1.23.2-12.rhaos4.10.git5fe1720.el8
    kernelVersion: 4.18.0-305.49.1.el8_4.x86_64
    kubeProxyVersion: v1.23.5+3afdacb
    kubeletVersion: v1.23.5+3afdacb
    machineID: 312df18bc632462ea6cca6da3de08d10
    operatingSystem: linux
    osImage: Red Hat Enterprise Linux CoreOS 410.84.202206010432-0 (Ootpa)
    systemUUID: 312df18b-c632-462e-a6cc-a6da3de08d10

Conclusions so far

  • The same subnet (10.129.2.0/23) is advertised before and after the node delete/add.
  • Original node creation time < poison pill creation time < node assumed-rebooted time < node re-added time
    • "2022-06-08T13:10:03Z" < "2022-06-08T19:51:00Z" < "2022-06-08T19:54:05Z" < "2022-06-08T19:55:36Z"
  • The odd part: the poison pill creation timestamp == the poison pill deletion timestamp.

After node re-creation

Finding the relevant pod

find pods/poison-pill-* -name \*.yaml -exec grep -H "nodeName: worker-0-0" {} \;
pods/poison-pill-ds-m6fv6/poison-pill-ds-m6fv6.yaml:  nodeName: worker-0-0
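
For context, the equivalent lookup on a live cluster (rather than a must-gather dump) could look like the following sketch, using the same node name:

oc get pods -A -o wide --field-selector spec.nodeName=worker-0-0 | grep poison-pill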

PoisonPill ds

cat pods/poison-pill-ds-m6fv6/poison-pill-ds-m6fv6.yaml | grep podIP
  podIP: 10.129.2.6
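
On a live cluster the pod IP can be read directly (sketch, same pod name as above):

oc get pod poison-pill-ds-m6fv6 -n openshift-operators -o jsonpath='{.status.podIP}'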

No connection to the k8s-api

cat pods/poison-pill-ds-m6fv6/manager/manager/logs/current.log 
2022-06-08T19:59:02.589897677Z 2022-06-08T19:59:02.589Z	ERROR	controller-runtime.manager	Failed to get API Group-Resources	{"error": "Get \"https://172.30.0.1:443/api?timeout=32s\": dial tcp 172.30.0.1:443: i/o timeout"}
2022-06-08T19:59:02.589897677Z github.com/go-logr/zapr.(*zapLogger).Error
2022-06-08T19:59:02.589897677Z 	/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132
2022-06-08T19:59:02.589897677Z sigs.k8s.io/controller-runtime/pkg/cluster.New
2022-06-08T19:59:02.589897677Z 	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/cluster/cluster.go:160
2022-06-08T19:59:02.589897677Z sigs.k8s.io/controller-runtime/pkg/manager.New
2022-06-08T19:59:02.589897677Z 	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/manager/manager.go:278
2022-06-08T19:59:02.589897677Z main.main
2022-06-08T19:59:02.589897677Z 	/remote-source/app/main.go:96
2022-06-08T19:59:02.589897677Z runtime.main
2022-06-08T19:59:02.589897677Z 	/opt/rh/go-toolset-1.16/root/usr/lib/go-toolset-1.16-golang/src/runtime/proc.go:225
2022-06-08T19:59:02.589897677Z 2022-06-08T19:59:02.589Z	ERROR	setup	unable to start manager	{"error": "Get \"https://172.30.0.1:443/api?timeout=32s\": dial tcp 172.30.0.1:443: i/o timeout"}
2022-06-08T19:59:02.589897677Z github.com/go-logr/zapr.(*zapLogger).Error
2022-06-08T19:59:02.589897677Z 	/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132
2022-06-08T19:59:02.589897677Z sigs.k8s.io/controller-runtime/pkg/log.(*DelegatingLogger).Error
2022-06-08T19:59:02.589897677Z 	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:144
2022-06-08T19:59:02.589897677Z main.main
2022-06-08T19:59:02.589897677Z 	/remote-source/app/main.go:105
2022-06-08T19:59:02.589897677Z runtime.main
2022-06-08T19:59:02.589897677Z 	/opt/rh/go-toolset-1.16/root/usr/lib/go-toolset-1.16-golang/src/runtime/proc.go:225
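
To reproduce the connectivity failure interactively, one option (a hedged sketch, not taken from the must-gather, and assuming the image ships curl) is to probe the service VIP from inside the poison-pill pod:

oc exec -n openshift-operators poison-pill-ds-m6fv6 -- curl -sk --connect-timeout 5 https://172.30.0.1:443/api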

OVN-K logs after node reboot

Re-creation of the node logical switch

2022-06-08T19:55:36.032526543Z I0608 19:55:36.032371       1 transact.go:41] Configuring OVN: [{Op:insert Table:Logical_Router_Port Row:map[mac:0a:58:0a:81:02:01 name:rtos-worker-0-0 networks:{GoSet:[10.129.2.1/23]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996921} {Op:mutate Table:Logical_Router Row:map[] Rows:[] Columns:[] Mutations:[{Column:ports Mutator:insert Value:{GoSet:[{GoUUID:u2596996921}]}}] Timeout:<nil> Where:[where column _uuid == {886fb0ac-aaf9-4f37-afec-54d0ebcb14ee}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:} {Op:wait Table:Logical_Switch Row:map[] Rows:[map[name:worker-0-0]] Columns:[name] Mutations:[] Timeout:0xc002ecd378 Where:[where column name == worker-0-0] Until:!= Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:} {Op:insert Table:Logical_Switch Row:map[load_balancer_group:{GoSet:[{GoUUID:d8e91fd7-72ac-4e56-b9c1-031183f12b83}]} name:worker-0-0 other_config:{GoMap:map[exclude_ips:10.129.2.2 subnet:10.129.2.0/23]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996922}]

LSP creation

2022-06-08T19:55:41.262722526Z I0608 19:55:41.262591       1 transact.go:41] Configuring OVN: [{Op:mutate Table:Address_Set Row:map[] Rows:[] Columns:[] Mutations:[{Column:addresses Mutator:insert Value:{GoSet:[10.129.2.6]}}] Timeout:<nil> Where:[where column _uuid == {ef32522b-133c-42b3-b65c-bfb9daf5888d}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:} {Op:wait Table:Logical_Switch_Port Row:map[] Rows:[map[name:openshift-operators_poison-pill-ds-m6fv6]] Columns:[name] Mutations:[] Timeout:0xc00009b9e0 Where:[where column name == openshift-operators_poison-pill-ds-m6fv6] Until:!= Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:} {Op:insert Table:Logical_Switch_Port Row:map[addresses:{GoSet:[0a:58:0a:81:02:06 10.129.2.6]} external_ids:{GoMap:map[namespace:openshift-operators pod:true]} name:openshift-operators_poison-pill-ds-m6fv6 options:{GoMap:map[iface-id-ver:03649d97-df60-4461-a61f-f6025ae649b3 requested-chassis:worker-0-0]} port_security:{GoSet:[0a:58:0a:81:02:06 10.129.2.6]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996942} {Op:mutate Table:Logical_Switch Row:map[] Rows:[] Columns:[] Mutations:[{Column:ports Mutator:insert Value:{GoSet:[{GoUUID:u2596996942}]}}] Timeout:<nil> Where:[where column _uuid == {99702e6e-b85d-40ce-b1fd-76fc18b5dbfd}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]

NAT flow creation

2022-06-08T19:55:41.257043722Z I0608 19:55:41.256966       1 transact.go:41] Configuring OVN: [{Op:insert Table:NAT Row:map[external_ip:192.168.123.66 logical_ip:10.129.2.6 options:{GoMap:map[stateless:false]} type:snat] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996947} {Op:mutate Table:Logical_Router Row:map[] Rows:[] Columns:[] Mutations:[{Column:nat Mutator:insert Value:{GoSet:[{GoUUID:u2596996947}]}}] Timeout:<nil> Where:[where column _uuid == {0457bfee-accf-4115-a1a9-2dd8cdfdf824}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]

add logical port completed

2022-06-08T19:55:41.271195132Z I0608 19:55:41.271130       1 pods.go:333] [openshift-operators/poison-pill-ds-m6fv6] addLogicalPort took 63.64089ms

Tracing the flow w/ the database

Required Tooling

Clone the openshift tooling repo.

SouthBound DB to trace the packet flow

./dev-run-ovndb -e podman ~/bz/2068910/must-gather-t1/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-319e2c4215311a1b8c50b7cb4445f49bb79fc22dcfc120d07adcc3d6cfe6db9f/network_logs/leader_sbdb s

NorthBound DB to see the logical entities

./dev-run-ovndb -e podman ~/bz/2068910/must-gather-t1/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-319e2c4215311a1b8c50b7cb4445f49bb79fc22dcfc120d07adcc3d6cfe6db9f/network_logs/leader_nbdb n
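
Both commands load the respective must-gather database into a podman container; the ovn-nbctl / ovn-sbctl / ovn-trace invocations below were run from inside such a container. A sketch of attaching to it, assuming the script leaves the container running in the background (the container ID is whatever podman assigns, e.g. 2a094cab1d7f in the prompts below):

podman ps --format '{{.ID}} {{.Image}}'   # locate the ovndb container
podman exec -it 2a094cab1d7f bash         # attach, then run ovn-nbctl / ovn-sbctl / ovn-trace inside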

Trace the packet

We first need to find:

  • IP src (logs) - 10.129.2.6
  • IP dst (logs) - 172.30.0.1
  • MAC src (logs) - 0a:58:0a:81:02:06
  • MAC dst (MAC address of the node logical switch router port) - 0a:58:0a:81:02:01
    • ovn-nbctl show ; focus on the cluster-router; check for the router port connected to the node. Grab that MAC
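
These values can also be pulled straight from the NB database; a hedged sketch using plain ovn-nbctl get commands, run inside the NB DB container:

ovn-nbctl get logical_router_port rtos-worker-0-0 mac                                 # router port MAC (eth.dst for the trace)
ovn-nbctl get logical_switch_port openshift-operators_poison-pill-ds-m6fv6 addresses  # pod MAC and IP (eth.src / ip4.src)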

The trace command:

ovn-trace --ct new 'inport=="openshift-operators_poison-pill-ds-m6fv6" && eth.dst==0a:58:0a:81:02:01 && eth.src==0a:58:0a:81:02:06 && tcp && tcp.dst==443 && ip4.src==10.129.2.6 && ip4.dst==172.30.0.1 && ip.ttl==64'
# tcp,reg14=0x5,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:06,dl_dst=0a:58:0a:81:02:01,nw_src=10.129.2.6,nw_dst=172.30.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=443,tcp_flags=0

ingress(dp="worker-0-0", inport="openshift-operators_poison-pill-ds-m6fv6")
---------------------------------------------------------------------------
 0. ls_in_port_sec_l2 (northd.c:5511): inport == "openshift-operators_poison-pill-ds-m6fv6" && eth.src == {0a:58:0a:81:02:06}, priority 50, uuid c65b84bb
    next;
 1. ls_in_port_sec_ip (northd.c:5144): inport == "openshift-operators_poison-pill-ds-m6fv6" && eth.src == 0a:58:0a:81:02:06 && ip4.src == {10.129.2.6}, priority 90, uuid 53e465fa
    next;
 5. ls_in_pre_acl (northd.c:5771): ip, priority 100, uuid 4debc782
    reg0[0] = 1;
    next;
 6. ls_in_pre_lb (northd.c:5903): ip, priority 100, uuid b54ce52b
    reg0[2] = 1;
    next;
 7. ls_in_pre_stateful (northd.c:5930): reg0[2] == 1 && ip4 && tcp, priority 120, uuid d5036eb5
    reg1 = ip4.dst;
    reg2[0..15] = tcp.dst;
    ct_lb;

ct_lb
-----
 8. ls_in_acl_hint (northd.c:6003): ct.new && !ct.est, priority 7, uuid bf9a2a46
    reg0[7] = 1;
    reg0[9] = 1;
    next;
 9. ls_in_acl (northd.c:6460): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid ae28020e
    reg0[1] = 1;
    next;
14. ls_in_stateful (northd.c:6807): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid bb73a687
    ct_commit { ct_label.blocked = 0; };
    next;
15. ls_in_pre_hairpin (northd.c:6834): ip && ct.trk, priority 100, uuid 91d8252d
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    *** chk_lb_hairpin_reply action not implemented
    next;
24. ls_in_l2_lkup (northd.c:8323): eth.dst == 0a:58:0a:81:02:01, priority 50, uuid e97f9de0
    outport = "stor-worker-0-0";
    output;

egress(dp="worker-0-0", inport="openshift-operators_poison-pill-ds-m6fv6", outport="stor-worker-0-0")
-----------------------------------------------------------------------------------------------------
 0. ls_out_pre_lb (northd.c:5661): ip && outport == "stor-worker-0-0", priority 110, uuid ed485558
    next;
 1. ls_out_pre_acl (northd.c:5661): ip && outport == "stor-worker-0-0", priority 110, uuid 2751e3eb
    next;
 3. ls_out_acl_hint (northd.c:6003): ct.new && !ct.est, priority 7, uuid fa900ef4
    reg0[7] = 1;
    reg0[9] = 1;
    next;
 4. ls_out_acl (northd.c:6463): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 88675420
    reg0[1] = 1;
    next;
 7. ls_out_stateful (northd.c:6811): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 5256a714
    ct_commit { ct_label.blocked = 0; };
    next;
 9. ls_out_port_sec_l2 (northd.c:5609): outport == "stor-worker-0-0", priority 50, uuid 39c8785d
    output;
    /* output to "stor-worker-0-0", type "patch" */

ingress(dp="ovn_cluster_router", inport="rtos-worker-0-0")
----------------------------------------------------------
 0. lr_in_admission (northd.c:10553): eth.dst == 0a:58:0a:81:02:01 && inport == "rtos-worker-0-0" && is_chassis_resident("cr-rtos-worker-0-0"), priority 50, uuid 34fdc297
    xreg0[0..47] = 0a:58:0a:81:02:01;
    next;
 1. lr_in_lookup_neighbor (northd.c:10696): 1, priority 0, uuid 36c09299
    reg9[2] = 1;
    next;
 2. lr_in_learn_neighbor (northd.c:10705): reg9[2] == 1, priority 100, uuid d0c78031
    next;
10. lr_in_ip_routing_pre (northd.c:10955): 1, priority 0, uuid babb34a3
    reg7 = 0;
    next;
11. lr_in_ip_routing (northd.c:9469): ip4.src == 10.129.2.0/23, priority 69, uuid f312a121
    ip.ttl--;
    reg8[0..15] = 0;
    reg0 = 100.64.0.7;
    reg1 = 100.64.0.1;
    eth.src = 0a:58:64:40:00:01;
    outport = "rtoj-ovn_cluster_router";
    flags.loopback = 1;
    next;
12. lr_in_ip_routing_ecmp (northd.c:11030): reg8[0..15] == 0, priority 150, uuid 45975931
    next;
13. lr_in_policy (northd.c:11163): 1, priority 0, uuid 81c1c224
    reg8[0..15] = 0;
    next;
14. lr_in_policy_ecmp (northd.c:11165): reg8[0..15] == 0, priority 150, uuid 6da818b5
    next;
15. lr_in_arp_resolve (northd.c:11553): outport == "rtoj-ovn_cluster_router" && reg0 == 100.64.0.7, priority 100, uuid 21faae89
    eth.dst = 0a:58:64:40:00:07;
    next;
19. lr_in_arp_request (northd.c:11845): 1, priority 0, uuid 023c018f
    output;

egress(dp="ovn_cluster_router", inport="rtos-worker-0-0", outport="rtoj-ovn_cluster_router")
--------------------------------------------------------------------------------------------
 0. lr_out_chk_dnat_local (northd.c:13071): 1, priority 0, uuid 938e3f6e
    reg9[4] = 0;
    next;
 6. lr_out_delivery (northd.c:11893): outport == "rtoj-ovn_cluster_router", priority 100, uuid 13335d8d
    output;
    /* output to "rtoj-ovn_cluster_router", type "patch" */

ingress(dp="join", inport="jtor-ovn_cluster_router")
----------------------------------------------------
 0. ls_in_port_sec_l2 (northd.c:5511): inport == "jtor-ovn_cluster_router", priority 50, uuid 32527aa4
    next;
 6. ls_in_pre_lb (northd.c:5658): ip && inport == "jtor-ovn_cluster_router", priority 110, uuid ef5d74cc
    next;
24. ls_in_l2_lkup (northd.c:8323): eth.dst == 0a:58:64:40:00:07, priority 50, uuid 14d8650a
    outport = "jtor-GR_worker-0-0";
    output;

egress(dp="join", inport="jtor-ovn_cluster_router", outport="jtor-GR_worker-0-0")
---------------------------------------------------------------------------------
 0. ls_out_pre_lb (northd.c:5661): ip && outport == "jtor-GR_worker-0-0", priority 110, uuid 286cbbd3
    next;
 9. ls_out_port_sec_l2 (northd.c:5609): outport == "jtor-GR_worker-0-0", priority 50, uuid 512a5b3f
    output;
    /* output to "jtor-GR_worker-0-0", type "l3gateway" */

ingress(dp="GR_worker-0-0", inport="rtoj-GR_worker-0-0")
--------------------------------------------------------
 0. lr_in_admission (northd.c:10553): eth.dst == 0a:58:64:40:00:07 && inport == "rtoj-GR_worker-0-0", priority 50, uuid 713d00d5
    xreg0[0..47] = 0a:58:64:40:00:07;
    next;
 1. lr_in_lookup_neighbor (northd.c:10696): 1, priority 0, uuid 36c09299
    reg9[2] = 1;
    next;
 2. lr_in_learn_neighbor (northd.c:10705): reg9[2] == 1 || reg9[3] == 0, priority 100, uuid 44284960
    next;
10. lr_in_ip_routing_pre (northd.c:10955): 1, priority 0, uuid babb34a3
    reg7 = 0;
    next;
11. lr_in_ip_routing (northd.c:9469): reg7 == 0 && ip4.dst == 0.0.0.0/0, priority 1, uuid 9b71401b
    ip.ttl--;
    reg8[0..15] = 0;
    reg0 = 192.168.123.1;
    reg1 = 192.168.123.66;
    eth.src = 52:54:00:9a:20:9f;
    outport = "rtoe-GR_worker-0-0";
    flags.loopback = 1;
    next;
12. lr_in_ip_routing_ecmp (northd.c:11030): reg8[0..15] == 0, priority 150, uuid 45975931
    next;
13. lr_in_policy (northd.c:11163): 1, priority 0, uuid 81c1c224
    reg8[0..15] = 0;
    next;
14. lr_in_policy_ecmp (northd.c:11165): reg8[0..15] == 0, priority 150, uuid 6da818b5
    next;
15. lr_in_arp_resolve (northd.c:11199): ip4, priority 0, uuid fb0d2c27
    get_arp(outport, reg0);
    /* MAC binding to 52:54:00:28:29:4a. */
    next;
19. lr_in_arp_request (northd.c:11845): 1, priority 0, uuid 023c018f
    output;

egress(dp="GR_worker-0-0", inport="rtoj-GR_worker-0-0", outport="rtoe-GR_worker-0-0")
-------------------------------------------------------------------------------------
 0. lr_out_chk_dnat_local (northd.c:13071): 1, priority 0, uuid 938e3f6e
    reg9[4] = 0;
    next;
 1. lr_out_undnat (northd.c:13091): ip, priority 50, uuid fb36210d
    flags.loopback = 1;
    ct_dnat;

ct_dnat /* assuming no un-dnat entry, so no change */
-----------------------------------------------------
 2. lr_out_post_undnat (northd.c:13093): ip && ct.new, priority 50, uuid 6775acd8
    ct_commit;
    next;
 3. lr_out_snat (northd.c:12772): ip && ip4.src == 10.129.2.6, priority 33, uuid 170db2a7
    ct_snat(192.168.123.66);

ct_snat(ip4.src=192.168.123.66)
-------------------------------
 6. lr_out_delivery (northd.c:11893): outport == "rtoe-GR_worker-0-0", priority 100, uuid 4e692985
    output;
    /* output to "rtoe-GR_worker-0-0", type "l3gateway" */

ingress(dp="ext_worker-0-0", inport="etor-GR_worker-0-0")
---------------------------------------------------------
 0. ls_in_port_sec_l2 (northd.c:5511): inport == "etor-GR_worker-0-0", priority 50, uuid 35825175
    next;
 6. ls_in_pre_lb (northd.c:5658): ip && inport == "etor-GR_worker-0-0", priority 110, uuid fbd73ff5
    next;
24. ls_in_l2_lkup (northd.c:7528): 1, priority 0, uuid b88d8e9f
    outport = get_fdb(eth.dst);
    next;
25. ls_in_l2_unknown (northd.c:7533): outport == "none", priority 50, uuid d2e5c12c
    outport = "_MC_unknown";
    output;

multicast(dp="ext_worker-0-0", mcgroup="_MC_unknown")
-----------------------------------------------------

    egress(dp="ext_worker-0-0", inport="etor-GR_worker-0-0", outport="br-ex_worker-0-0")
    ------------------------------------------------------------------------------------
         0. ls_out_pre_lb (northd.c:5661): ip && outport == "br-ex_worker-0-0", priority 110, uuid d3308454
            next;
         9. ls_out_port_sec_l2 (northd.c:5609): outport == "br-ex_worker-0-0", priority 50, uuid 401b3206
            output;
            /* output to "br-ex_worker-0-0", type "localnet" */

Conclusions so far

  • The packet should have reached the OVN cluster router. It did.
  • It should have been forwarded (via a load balancer) to the correct node logical switch. It was not.
  • It was instead forwarded to the join switch, and egressed the cluster.

Check the logical switches' load balancers

[root@2a094cab1d7f ~]# ovn-nbctl ls-lb-list worker-0-0 | grep 172.30.0.1:443
[root@2a094cab1d7f ~]# ovn-nbctl ls-lb-list worker-0-1 | grep 172.30.0.1:443
cab73afa-1efd-4380-b06b-d5c30a8dbee4    Service_default/    tcp        172.30.0.1:443          192.168.123.105:6443,192.168.123.63:6443,192.168.123.74:6443
[root@2a094cab1d7f ~]# ovn-nbctl ls-lb-list worker-0-2 | grep 172.30.0.1:443
cab73afa-1efd-4380-b06b-d5c30a8dbee4    Service_default/    tcp        172.30.0.1:443          192.168.123.105:6443,192.168.123.63:6443,192.168.123.74:6443

The worker-0-0 node logical switch is missing the default/kubernetes load balancer, and that is not the only load balancer it is missing:

[root@2a094cab1d7f ~]# ovn-nbctl ls-lb-list worker-0-0 | wc -l
69
[root@2a094cab1d7f ~]# ovn-nbctl ls-lb-list worker-0-1 | wc -l
82
[root@2a094cab1d7f ~]# ovn-nbctl ls-lb-list worker-0-2 | wc -l
82

It seems quite a few load-balancer attachments were not re-created for the new node logical switch.
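
To pin down exactly which load balancers the re-created switch lost, one option (a rough sketch against the same NB DB) is to diff the load_balancer references of the broken and a healthy worker switch:

# UUIDs referenced by worker-0-1's switch but missing from worker-0-0's
comm -13 <(ovn-nbctl get logical_switch worker-0-0 load_balancer | tr -d '[]' | tr ', ' '\n' | sort -u) \
         <(ovn-nbctl get logical_switch worker-0-1 load_balancer | tr -d '[]' | tr ', ' '\n' | sort -u)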

List the load-balancer entries for the kubernetes service

[root@2a094cab1d7f ~]# ovn-nbctl list load_balancer | grep "172.30.0.1:443" -B8
_uuid               : 756cdcd6-bac9-41b8-a205-53e5316e1749
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="default/kubernetes"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_default/kubernetes_TCP_node_router_master-0-2"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"172.30.0.1:443"="192.168.123.105:6443,169.254.169.2:6443,192.168.123.74:6443"}
--
_uuid               : 68ea4861-71ee-481e-b407-2e1b488e8f1f
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="default/kubernetes"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_default/kubernetes_TCP_node_router_master-0-0"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"172.30.0.1:443"="169.254.169.2:6443,192.168.123.63:6443,192.168.123.74:6443"}
--
_uuid               : 6fcfa9a3-1de7-4154-9b71-64950236552c
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="default/kubernetes"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_default/kubernetes_TCP_node_router_master-0-1"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"172.30.0.1:443"="192.168.123.105:6443,192.168.123.63:6443,169.254.169.2:6443"}
--
_uuid               : cab73afa-1efd-4380-b06b-d5c30a8dbee4
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="default/kubernetes"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_default/kubernetes_TCP_node_switch_master-0-0_merged"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"172.30.0.1:443"="192.168.123.105:6443,192.168.123.63:6443,192.168.123.74:6443"}

It seems the re-created node switch needs to point at these existing load balancers. Question: is that enough, or do they need to be re-computed?
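
As a manual experiment (a hedged sketch, not a proposed fix), the existing merged kubernetes-service load balancer could be attached to the broken node switch to check whether the trace then hits it:

ovn-nbctl ls-lb-add worker-0-0 cab73afa-1efd-4380-b06b-d5c30a8dbee4   # attach the Service_default/kubernetes merged LB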

Broken node logical switch

_uuid               : 99702e6e-b85d-40ce-b1fd-76fc18b5dbfd
acls                : [2fc3c8d9-29ff-48e6-a234-44cf6290773b]
copp                : []
dns_records         : []
external_ids        : {}
forwarding_groups   : []
load_balancer       : [89c0c3a2-e1ed-43eb-a442-ceadddb4d80f, c8dc9d33-ab20-412d-9461-5836eb3b6883, f30d59ee-e8c7-4d76-a985-af67828c9575]
load_balancer_group : [d8e91fd7-72ac-4e56-b9c1-031183f12b83]
name                : worker-0-0
other_config        : {mcast_eth_src="0a:58:0a:81:02:01", mcast_ip4_src="10.129.2.1", mcast_querier="true", mcast_snoop="true", subnet="10.129.2.0/23"}
ports               : [19411ea8-5661-41d4-8216-9889d5450e0b, 2194b342-d759-46aa-96bf-dc73b5bb435a, 256c04b1-a1d2-46d0-8c97-c486f868470a, 515b32b2-b231-4b83-909f-69a265aa326d, 65c00071-cd3f-418e-b653-599c2234c734, e70eeedd-cd93-4158-8fe2-7476ccf2079e, f141fd63-8975-4151-ad28-c250f32f6b6e]
qos_rules           : []

Good node logical switch

_uuid               : ceffef6d-efae-43cd-9b5f-c2a04141d84b
acls                : [71ab6b0e-b5e6-43bc-992d-87d243f90551]
copp                : []
dns_records         : []
external_ids        : {}
forwarding_groups   : []
load_balancer       : [04b88ce8-63ea-4689-b4f6-39860bb3f605, 05ac3177-1d51-431c-8519-3e0448e8a6dc, 268c5ee9-29da-42c3-85be-a810bf15661f, 455d3098-c7b6-456e-a234-4603e2cb17e8, 6a76a1e7-c64f-4550-bb07-e899542b8be0, 73f5d555-7e2e-4cb4-902e-38faabb7a6f4, 89c0c3a2-e1ed-43eb-a442-ceadddb4d80f, 8c35b57f-7720-49e0-902a-649b8da20e67, c769dba0-9788-4e21-a304-81478f87b273, cab73afa-1efd-4380-b06b-d5c30a8dbee4, d6966a3a-6d11-4032-a3fa-0be5c2f985f2, f7f44526-1fba-470c-842b-17cc4907c213]
load_balancer_group : [d8e91fd7-72ac-4e56-b9c1-031183f12b83]
name                : master-0-1
other_config        : {mcast_eth_src="0a:58:0a:81:00:01", mcast_ip4_src="10.129.0.1", mcast_querier="true", mcast_snoop="true", subnet="10.129.0.0/23"}
ports               : [...]  # logical ports pruned for brevity
qos_rules           : []

Conclusion

It seems the load balancers are not reconciled when a node is deleted; the only LB reconciliation in place happens when the associated Kubernetes service is deleted.

Deleting the associated load balancers whenever a node is deleted is being investigated. Does the node tracker impact any other places? Can we force service reconciliation from it?
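
One untested idea for nudging a re-sync (pure sketch: it assumes an update event on the Service is enough to make ovn-kubernetes' services controller rebuild and re-attach the per-node load balancers, which is unverified; the annotation key below is hypothetical):

# hypothetical nudge: touch the Service so the services controller processes it again
oc annotate svc kubernetes -n default ovn-debug/force-resync="$(date +%s)" --overwrite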


maiqueb commented Jun 9, 2022

Some relevant logs:

2022-06-08T19:55:36.018032475Z I0608 19:55:36.017443       1 services_controller.go:246] Processing sync for service default/kubernetes
2022-06-08T19:55:36.018117323Z I0608 19:55:36.018090       1 kube.go:285] Getting endpoints for slice default/kubernetes
2022-06-08T19:55:36.018180053Z I0608 19:55:36.018155       1 kube.go:312] Adding slice kubernetes endpoints: [192.168.123.105], port: 6443
2022-06-08T19:55:36.018239298Z I0608 19:55:36.018214       1 kube.go:312] Adding slice kubernetes endpoints: [192.168.123.63], port: 6443
2022-06-08T19:55:36.018305648Z I0608 19:55:36.018282       1 kube.go:312] Adding slice kubernetes endpoints: [192.168.123.74], port: 6443
2022-06-08T19:55:36.018365198Z I0608 19:55:36.018341       1 kube.go:328] LB Endpoints for default/kubernetes are: [192.168.123.105 192.168.123.63 192.168.123.74] / [] on port: 6443
2022-06-08T19:55:36.018602693Z I0608 19:55:36.018556       1 services_controller.go:310] Service default/kubernetes has 0 cluster-wide and 1 per-node configs, making 0 and 4 load balancers
2022-06-08T19:55:36.018803999Z I0608 19:55:36.018758       1 services_controller.go:319] Skipping no-op change for service default/kubernetes
2022-06-08T19:55:36.018872057Z I0608 19:55:36.018842       1 services_controller.go:250] Finished syncing service kubernetes on namespace default : 2.250579ms
