@dougbtv
Last active June 8, 2023 15:58
Workaround for 4.12 Upgrade when Whereabouts reconciler fails

This procedure works around a 4.12 upgrade that is stuck because the whereabouts reconciler pods are failing.

The symptom is a whereabouts reconciler pod in a CrashLoopBackOff state with an error message like:

[error] could not create the pod networks controller: Could not find node with node name ''.: resource name may not be empty

This is likely caused by a missing environment variable, as fixed in: openshift/cluster-network-operator#1829
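To confirm you are hitting this problem, you can look at the reconciler pods and their logs. The following is a minimal sketch: the function names are made up for this example, and both the `openshift-multus` namespace and the `whereabouts-reconciler` pod name prefix are assumptions, so adjust them if your cluster differs.

```shell
# Hypothetical helpers; namespace and pod name prefix are assumptions.

# Print the status line of each whereabouts reconciler pod.
reconciler_pods() {
  oc get pods -n "${1:-openshift-multus}" --no-headers \
    | grep -i 'whereabouts-reconciler'
}

# Show the last log lines of one pod, to compare with the error above.
# Usage: reconciler_logs <pod-name> [namespace]
reconciler_logs() {
  oc logs -n "${2:-openshift-multus}" "$1" --tail=20
}
```

Run `reconciler_pods` first, then pass one of the listed pod names to `reconciler_logs`.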

Summary

The gist of this procedure is to:

  • Temporarily disable the whereabouts reconciler by removing the net-attach-defs that reference whereabouts from the cluster-network-operator (CNO) Networks object.
  • Recreate those net-attach-defs by hand (so they're still usable)

In brief, the whereabouts reconciler is "opt-in": you opt in by having net-attach-defs in the Networks object that reference whereabouts.

If those are removed, the reconciler will not launch, and the upgrade should succeed.

We follow up by creating the net-attach-defs by hand (rather than through the CNO Networks object). They remain usable, and because they are no longer listed in the Networks object, we stay in the "opt-out" state.

NOTE: THIS NEEDS AN UPDATE TO RE-ENABLE THE RECONCILER, which is still a to-do

Step 1: Save and remove

First, edit the Networks object with:

oc edit networks.operator.openshift.io cluster

I recommend that you save an entire copy of this object. At a minimum, save the additionalNetworks section.
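One way to take that copy is to dump the object to a file before changing anything in the editor. This is a minimal sketch; the function name and the default file name are arbitrary choices for this example.

```shell
# Hypothetical helper: write the full Networks object to a file.
# Usage: backup_networks [output-file]
backup_networks() {
  backup_file="${1:-networks-cluster-backup.yaml}"
  oc get networks.operator.openshift.io cluster -o yaml > "$backup_file" \
    && echo "saved to $backup_file"
}
```

Calling `backup_networks` with no argument writes `networks-cluster-backup.yaml` in the current directory.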

Look for the additionalNetworks section, like so:

spec:
  additionalNetworks:
  - name: ipvlan-dynamic
    namespace: example-item-ns
    rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "ipvlan-dynamic", "type": "ipvlan",
      "mode": "l2", "master": "bond1", "ipam": { "type": "whereabouts", "range": "[removed]::/64",
      "range_start": "[removed]:4438:39ff:feff:1101", "range_end": "[removed]:4438:39ff:feff:2125"
      } }'
    type: Raw
  - name: ipvlan-static
    namespace: example-item-ns
    rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "ipvlan-static", "type": "ipvlan",
      "mode": "l2", "master": "bond1", "ipam": { "type": "static" } }'
    type: Raw
  - name: ipvlan-dynamic
    namespace: openshift-monitoring
    rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "ipvlan-dynamic", "type": "ipvlan",
      "mode": "l2", "master": "bond1", "ipam": { "type": "whereabouts", "range": "[removed]::/64",
      "range_start": "[removed]:4438:39ff:feff:3101", "range_end": "[removed]:4438:39ff:feff:3125"
      } }'
    type: Raw
  - name: daemon-network
    namespace: example-item-ns
    rawCNIConfig: '{"cniVersion": "0.3.1","name": "daemon-network","type": "ipvlan","mode":
      "l2","master": "bond1.3201","mtu": 1500,"ipam": {"type": "static"}}'
    type: Raw
  - name: daemon-network
    namespace: example-scale
    rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "daemon-network", "type": "ipvlan",
      "mode": "l2", "master": "bond1.3201", "ipam": { "type": "static" } }'
    type: Raw

IMPORTANT: SAVE THESE ITEMS BEFORE YOU REMOVE THEM

Remove every item from the additionalNetworks section that contains "type": "whereabouts". In this example that is two items: ipvlan-dynamic in two namespaces (items #1 and #3).
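If the list is long, the entries that reference whereabouts can be picked out with a small filter instead of by eye. This is a sketch under assumptions: the function names are invented for this example, and it assumes each rawCNIConfig prints on a single line (true for the example above, where YAML folds the line breaks into spaces).

```shell
# Hypothetical helper: reads "<namespace>/<name><TAB><rawCNIConfig>" lines
# on stdin and prints namespace/name for each entry that uses whereabouts.
list_whereabouts_entries() {
  grep -E '"type"[[:space:]]*:[[:space:]]*"whereabouts"' | cut -f1
}

# Hypothetical wrapper: feed the live additionalNetworks list to the filter.
list_whereabouts_networks() {
  oc get networks.operator.openshift.io cluster \
    -o jsonpath='{range .spec.additionalNetworks[*]}{.namespace}{"/"}{.name}{"\t"}{.rawCNIConfig}{"\n"}{end}' \
    | list_whereabouts_entries
}
```

For the example above, `list_whereabouts_networks` should name the ipvlan-dynamic entry once per namespace.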

Removing them leaves a section that reads:

spec:
  additionalNetworks:
  - name: ipvlan-static
    namespace: example-item-ns
    rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "ipvlan-static", "type": "ipvlan",
      "mode": "l2", "master": "bond1", "ipam": { "type": "static" } }'
    type: Raw
  - name: daemon-network
    namespace: example-item-ns
    rawCNIConfig: '{"cniVersion": "0.3.1","name": "daemon-network","type": "ipvlan","mode":
      "l2","master": "bond1.3201","mtu": 1500,"ipam": {"type": "static"}}'
    type: Raw
  - name: daemon-network
    namespace: example-scale
    rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "daemon-network", "type": "ipvlan",
      "mode": "l2", "master": "bond1.3201", "ipam": { "type": "static" } }'
    type: Raw

Save and close the editor to apply the change.

The CNO should now process this change and remove the reconciler pods.

You do not need to wait for the pods to be removed to continue.
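If you do want to check, a small test like the one below can tell you whether any reconciler pods remain. It is only a sketch: the function name is made up, and the `openshift-multus` namespace and `whereabouts-reconciler` pod name prefix are assumptions.

```shell
# Hypothetical helper: exit status 0 when no whereabouts reconciler pods
# are listed. Note that it also returns 0 if `oc` itself fails (for example
# when you are not logged in), so treat a success as a hint, not as proof.
# Usage: reconciler_pods_gone [namespace]
reconciler_pods_gone() {
  ! oc get pods -n "${1:-openshift-multus}" --no-headers 2>/dev/null \
    | grep -qi 'whereabouts-reconciler'
}
```

For example: `reconciler_pods_gone && echo "reconciler pods removed"`.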

Step 2: Recreate the net-attach-defs by hand

We will now translate the removed items into NetworkAttachmentDefinition (net-attach-def) YAML. You need three parts from each item: the name, the namespace, and the rawCNIConfig JSON, which becomes spec.config.

The resulting YAML file will look like:

---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-dynamic
  namespace: example-item-ns
spec:
  config: '{ "cniVersion": "0.3.1", "name": "ipvlan-dynamic", "type": "ipvlan",
      "mode": "l2", "master": "bond1", "ipam": { "type": "whereabouts", "range": "[removed]::/64",
      "range_start": "[removed]:4438:39ff:feff:1101", "range_end": "[removed]:4438:39ff:feff:2125"
      } }'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-dynamic
  namespace: openshift-monitoring
spec:
  config: '{ "cniVersion": "0.3.1", "name": "ipvlan-dynamic", "type": "ipvlan",
      "mode": "l2", "master": "bond1", "ipam": { "type": "whereabouts", "range": "[removed]::/64",
      "range_start": "[removed]:4438:39ff:feff:3101", "range_end": "[removed]:4438:39ff:feff:3125"
      } }'
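With many entries, the same name / namespace / rawCNIConfig mapping could be scripted rather than typed. The function below is a sketch, not part of the original procedure: its name is invented, and it assumes the config JSON is passed as a single line.

```shell
# Hypothetical helper: print one NetworkAttachmentDefinition document.
# Usage: render_nad <name> <namespace> '<rawCNIConfig JSON on one line>'
render_nad() {
  nad_name=$1
  nad_namespace=$2
  # Double any single quotes so the value stays a valid YAML
  # single-quoted scalar.
  nad_config=$(printf '%s' "$3" | sed "s/'/''/g")
  cat <<EOF
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ${nad_name}
  namespace: ${nad_namespace}
spec:
  config: '${nad_config}'
EOF
}
```

Call it once per removed item and append the output to one file, for example `render_nad ipvlan-dynamic example-item-ns "$config" >> my-net-attach-defs.yml`, where `$config` holds the JSON you saved in Step 1.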

Save the net-attach-def YAML as my-net-attach-defs.yml and then issue:

oc create -f my-net-attach-defs.yml
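Afterwards you may want to confirm that each net-attach-def now exists in its namespace. A minimal sketch, with a function name invented for this example:

```shell
# Hypothetical helper: exit status 0 if the net-attach-def exists.
# Usage: nad_exists <namespace> <name>
nad_exists() {
  oc get net-attach-def -n "$1" "$2" >/dev/null 2>&1
}
```

For the example above: `nad_exists example-item-ns ipvlan-dynamic && nad_exists openshift-monitoring ipvlan-dynamic && echo "both present"`.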

Step 3: TODO: Need a way to manually start the reconciler again.

This step is still a to-do. Steps 1 and 2 on their own should be enough to get an upgrade "un-stuck".
