@amisevsk
Last active August 4, 2022 21:49
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: devworkspace-temporary-clusterrole
rules:
- apiGroups:
  - ""
  resourceNames:
  - workspace-preferences-configmap
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - patch
- apiGroups:
  - ""
  resources:
  - configmaps
  - persistentvolumeclaims
  - pods
  - secrets
  - serviceaccounts
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - events
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create
- apiGroups:
  - ""
  resourceNames:
  - workspace-credentials-secret
  resources:
  - secrets
  verbs:
  - create
  - delete
  - get
  - patch
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - '*'
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resourceNames:
  - devworkspace-controller
  resources:
  - deployments/finalizers
  verbs:
  - update
- apiGroups:
  - apps
  - extensions
  resources:
  - deployments
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  - extensions
  resources:
  - deployments
  - replicasets
  verbs:
  - '*'
- apiGroups:
  - apps
  - extensions
  resources:
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - authorization.k8s.io
  resources:
  - localsubjectaccessreviews
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - config.openshift.io
  resourceNames:
  - cluster
  resources:
  - proxies
  verbs:
  - get
- apiGroups:
  - controller.devfile.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - controller.devfile.io
  resources:
  - devworkspaceroutings
  verbs:
  - '*'
- apiGroups:
  - controller.devfile.io
  resources:
  - devworkspaceroutings/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - get
  - update
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  verbs:
  - create
  - get
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - '*'
- apiGroups:
  - oauth.openshift.io
  resources:
  - oauthclients
  verbs:
  - create
  - delete
  - deletecollection
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - rolebindings
  - roles
  verbs:
  - create
  - get
  - list
  - update
  - watch
- apiGroups:
  - route.openshift.io
  resources:
  - routes
  verbs:
  - '*'
- apiGroups:
  - route.openshift.io
  resources:
  - routes/custom-host
  verbs:
  - create
- apiGroups:
  - workspace.devfile.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: devworkspace-temporary-clusterrolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: devworkspace-temporary-clusterrole
subjects:
- kind: ServiceAccount
  name: devworkspace-controller-serviceaccount
  namespace: openshift-operators

Fixing broken OLM install of DWO without fully removing the operator

Reproducing the issue in cluster-bot

This process uses openshift-operators as the namespace. To use an alternate namespace, replace openshift-operators in the commands below; it's also necessary to create an OperatorGroup in that namespace before installing operators:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: dwo-test
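
If an alternate namespace is used, the namespace and its OperatorGroup can be generated together. The following is a minimal sketch, not part of the original process: the function name `operatorgroup_manifest` is hypothetical, and the OperatorGroup is left without `spec.targetNamespaces` on the assumption that the operators should watch all namespaces.

```shell
# Print a Namespace and an OperatorGroup manifest for the given namespace.
# The group name dwo-test comes from the example above; the function name
# is a placeholder chosen for this sketch.
operatorgroup_manifest() {
  namespace="$1"
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${namespace}
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: dwo-test
  namespace: ${namespace}
EOF
}
```

It could be used as `operatorgroup_manifest my-operators | oc apply -f -`, where `my-operators` is an example namespace name.
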
  1. Install DWO release catalog and create a subscription to DWO at an earlier version to allow an upgrade to take place

    cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: devworkspace-operator-catalog
      namespace: openshift-operators
    spec:
      sourceType: grpc
      image: quay.io/devfile/devworkspace-operator-index:release
      publisher: Red Hat
      displayName: DevWorkspace Operator Catalog
      updateStrategy:
        registryPoll:
          interval: 5m
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: devworkspace-operator
      namespace: openshift-operators
    spec:
      channel: fast
      name: devworkspace-operator
      source: devworkspace-operator-catalog
      sourceNamespace: openshift-operators
      installPlanApproval: Manual
      startingCSV: devworkspace-operator.v0.15.1
    EOF
  2. Install the next Che Operator

    cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: che-next-operator-catalog
      namespace: openshift-operators
    spec:
      image: 'quay.io/eclipse/eclipse-che-openshift-opm-catalog:next'
      sourceType: grpc
      updateStrategy:
        registryPoll:
          interval: 5m
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: che-operator
      namespace: openshift-operators
    spec:
      channel: next
      installPlanApproval: Automatic
      name: eclipse-che-preview-openshift
      source: che-next-operator-catalog
      sourceNamespace: openshift-operators
      startingCSV: eclipse-che-preview-openshift.v7.52.0-639.next
    EOF
  3. Since the installPlanApproval for DWO is "Manual", approve the initial install via e.g. OperatorHub. Once the initial install is complete, both Operators should have updates available.

  4. Create instances of CheCluster and DevWorkspace CRs, as those are impacted by conversion webhooks

    oc new-project eclipse-che
    cat <<EOF | oc apply -f -
    apiVersion: org.eclipse.che/v2
    kind: CheCluster
    metadata:
      name: eclipse-che
      namespace: eclipse-che
    spec: {}
    EOF
    oc apply -f https://raw.githubusercontent.com/devfile/devworkspace-operator/main/samples/plain.yaml -n eclipse-che
  5. Break conversion webhooks for DevWorkspaces in a similar way to what happens when Che attempts to deploy DWO:

    oc patch crds devworkspaces.workspace.devfile.io --patch '
    metadata:
      annotations:
        service.beta.openshift.io/inject-cabundle: "true"
    spec:
      conversion:
        webhook:
          clientConfig:
            caBundle: Cg== # base64 encoding of a single newline, i.e. an effectively empty bundle
    '
  6. Verify that conversion webhooks for DWO are broken

    oc get devworkspaces.v1alpha1.workspace.devfile.io --all-namespaces
    
    Error from server: conversion webhook for workspace.devfile.io/v1alpha2, Kind=DevWorkspace failed: Post "https://devworkspace-controller-manager-service.openshift-operators.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority
    
  7. Allow the update for DevWorkspace Operator in OperatorHub (this should attempt to install DWO v0.15.2 and Che eclipse-che-preview-openshift.v7.52.0-639.next). This update will get stuck.
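
Steps 3 and 7 approve install plans through OperatorHub. The same approval can be sketched from the command line by setting `spec.approved` on the InstallPlan. The helper name `approve_installplan` and the plan name used in the usage note are hypothetical.

```shell
# Approve a single InstallPlan by name. The namespace defaults to
# openshift-operators, the namespace used throughout this process.
approve_installplan() {
  plan="$1"
  namespace="${2:-openshift-operators}"
  oc patch installplan "$plan" -n "$namespace" --type merge \
    --patch '{"spec":{"approved":true}}'
}
```

List the plans with `oc get installplan -n openshift-operators`, then run e.g. `approve_installplan install-abcde`, substituting the real plan name.
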

Congratulations, your cluster is broken.

Repairing DWO installation

Note: If this issue is encountered naturally (i.e. Dev Spaces / Eclipse Che attempts to install DevWorkspaces from YAML), delete the devworkspace-controller namespace. In this case, it is also necessary to apply the process below to the devworkspacetemplates.workspace.devfile.io CRD.

  1. Remove the inject-cabundle annotation from DevWorkspaces CRD

    oc annotate crd devworkspaces.workspace.devfile.io service.beta.openshift.io/inject-cabundle-

    This is necessary because, if the annotation is left in place, the caBundle set in the next step will be overwritten.

  2. Restore caBundle in CRD to match what is served by DevWorkspace Operator install

    # Get serving cert
    CA_BUNDLE=$(oc get secret -n openshift-operators devworkspace-controller-manager-service-cert -o json | jq -r '.data.olmCAKey')
    # Set correct caBundle and service reference
    oc patch crds devworkspaces.workspace.devfile.io --patch "
    spec:
      conversion:
        webhook:
          clientConfig:
            caBundle: ${CA_BUNDLE}
            service:
              name: devworkspace-controller-manager-service
              namespace: openshift-operators
              path: /convert
              port: 443
    "
  3. Verify that conversion webhooks for DWO are restored -- the below command should not show an error:

    oc get devworkspaces.v1alpha1.workspace.devfile.io --all-namespaces
    
    W0804 16:45:34.843990   87674 warnings.go:70] workspace.devfile.io/v1alpha1 DevWorkspace is deprecated; use workspace.devfile.io/v1alpha2 DevWorkspace
    NAMESPACE     NAME                 WORKSPACE ID                PHASE     URL
    eclipse-che   plain-devworkspace   workspace9b4ca9ef26124f9d   Running
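
Before patching the CRD in step 2, it may help to check that the value read from the secret really is a certificate and not an empty placeholder such as the `Cg==` used in the reproduction. The following is a rough sketch; the helper name `is_pem_bundle` is hypothetical, and it only looks for a PEM header rather than validating the certificate.

```shell
# Succeed only if the base64-encoded argument decodes to text that
# contains a PEM certificate header.
is_pem_bundle() {
  printf '%s' "$1" | base64 --decode 2>/dev/null \
    | grep -q -- '-----BEGIN CERTIFICATE-----'
}

# Example use with the CA_BUNDLE variable from step 2:
#   is_pem_bundle "$CA_BUNDLE" || echo "caBundle does not look like a certificate" >&2
```
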
    

Repairing OLM install of DevWorkspace Operator

The above process should also break the OLM install of DWO. In some cases (e.g. if DWO is not updated after conversion webhooks are broken) it may not, which makes this step and later steps unnecessary.

Typical signs that your cluster has this problem:

  • There are multiple CSVs for DWO in the openshift-operators namespace, with one having phase "Pending" and the other "Replacing"
    ❯ oc get csv | grep devworkspace
    devworkspace-operator.v0.15.1                    DevWorkspace Operator   0.15.1            devworkspace-operator.v0.15.0                    Replacing
    devworkspace-operator.v0.15.2                    DevWorkspace Operator   0.15.2            devworkspace-operator.v0.15.1                    Pending
    
  • In the "Installed Operators" UI, multiple entries for DWO or one entry that is stuck in a loop of Installing and InstallSucceeded
  • Inability to get new updates for DWO
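
The first sign can be checked mechanically from the `oc get csv` output. The sketch below assumes the phase is the last column, as in the sample output above; the function name `dwo_csv_stuck` is hypothetical.

```shell
# Read `oc get csv` output on stdin and succeed if DevWorkspace Operator
# CSVs are present in both the Pending and the Replacing phase.
dwo_csv_stuck() {
  awk '
    /^devworkspace-operator/ { seen[$NF] = 1 }
    END { exit !(seen["Pending"] && seen["Replacing"]) }
  '
}

# Example use:
#   oc get csv -n openshift-operators | dwo_csv_stuck && echo "DWO install looks stuck"
```
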

To fix this, it's necessary to uninstall the operator but leave all workloads in place. If the devworkspace controller deployment is removed, nothing will be available to serve webhooks. Follow these steps carefully:

  1. Apply a temporary clusterrole and clusterrolebinding to allow the DevWorkspace deployment to continue to run while we reinstall the subscription and CSVs (the link below should point at the other file in this gist)

    oc apply -f https://gist.githubusercontent.com/amisevsk/1e8b14f7dcaa50727d35144133394747/raw/e53301b3f623bdeb2f3c6deaef9130640ab8916a/dwo-clusterrole-and-binding.yaml
    

    This clusterrole and clusterrolebinding are identical to the ones supplied by OLM for the operator. Temporary copies are needed because OLM will delete the existing clusterrole and binding when the CSV is deleted.

  2. Delete the DevWorkspace Operator subscription and any CSVs created for it. Note: the --cascade=orphan option is required to avoid deleting the controller deployment and service.

    oc delete sub devworkspace-operator -n openshift-operators --cascade=orphan
    for csv in $(oc get csvs -n openshift-operators | grep devworkspace | cut -f 1 -d ' '); do
      oc delete csv "$csv" -n openshift-operators --cascade=orphan
    done
  3. Verify that conversion webhooks are still functional (if not, see steps above to repair) -- the command should not show a TLS or service unavailable error

    oc get devworkspaces.v1alpha1.workspace.devfile.io --all-namespaces
    
    W0804 16:45:34.843990   87674 warnings.go:70] workspace.devfile.io/v1alpha1 DevWorkspace is deprecated; use workspace.devfile.io/v1alpha2 DevWorkspace
    NAMESPACE     NAME                 WORKSPACE ID                PHASE     URL
    eclipse-che   plain-devworkspace   workspace9b4ca9ef26124f9d   Running
    
  4. Re-create the DWO subscription

    cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: devworkspace-operator
      namespace: openshift-operators
    spec:
      channel: fast
      name: devworkspace-operator
      source: devworkspace-operator-catalog
      sourceNamespace: openshift-operators
      installPlanApproval: Manual
      startingCSV: devworkspace-operator.v0.15.1
    EOF
  5. Approve the manual install (or let it install automatically) and ensure it succeeds, CSVs are as expected, and install functions normally. A successful update after installation is a good sign :)

  6. Delete the temporary clusterrole and clusterrolebinding created in step 1

    oc delete clusterrole devworkspace-temporary-clusterrole
    oc delete clusterrolebinding devworkspace-temporary-clusterrolebinding
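
    Deleting the temporary RBAC while the reinstalled CSV is still installing could leave the controller without permissions. A guarded cleanup can be sketched as follows; the function name `cleanup_temporary_rbac` is hypothetical, and it assumes a healthy install reports the CSV phase `Succeeded`.

```shell
# Delete the temporary clusterrole and binding only once the given CSV
# reports phase Succeeded; otherwise leave them in place.
cleanup_temporary_rbac() {
  csv="$1"
  namespace="${2:-openshift-operators}"
  phase=$(oc get csv "$csv" -n "$namespace" -o jsonpath='{.status.phase}')
  if [ "$phase" != "Succeeded" ]; then
    echo "CSV $csv is in phase '${phase:-unknown}'; keeping temporary RBAC" >&2
    return 1
  fi
  oc delete clusterrolebinding devworkspace-temporary-clusterrolebinding
  oc delete clusterrole devworkspace-temporary-clusterrole
}

# Example use:
#   cleanup_temporary_rbac devworkspace-operator.v0.15.2
```
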
    

Fixing Che Operator installation

In one case while following the process above, the Che Operator OLM installation was also broken. The general process to fix this should be the same as for fixing the DWO installation above (you may need the correct clusterrole and binding for the Che Operator instead of for DWO), but this process has not been fully tested or documented.

To test that Che Operator conversion webhooks are working, you can use the command:

❯ oc get checlusters.v1.org.eclipse.che --all-namespaces
W0804 17:04:58.752317   91479 warnings.go:70] org.eclipse.che/v1 CheCluster is deprecated and will be removed in future releases
NAMESPACE     NAME          AGE
eclipse-che   eclipse-che   50m
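
The DevWorkspace and CheCluster checks follow the same pattern, so they can be wrapped in one helper. This is a minimal sketch with a hypothetical function name, `check_conversion`. It only reports whether `oc get` succeeded, and is probably only meaningful when at least one object of that kind exists, since an empty list needs no conversion.

```shell
# Run `oc get` against a versioned resource name and report whether the
# request succeeded; any error output is passed through on stderr.
check_conversion() {
  resource="$1"
  if output=$(oc get "$resource" --all-namespaces 2>&1); then
    echo "OK: $resource"
  else
    echo "FAILED: $resource"
    echo "$output" >&2
    return 1
  fi
}

# Example use:
#   check_conversion devworkspaces.v1alpha1.workspace.devfile.io
#   check_conversion checlusters.v1.org.eclipse.che
```
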

Strange issues encountered during reinstalling Che Operator (to be investigated further)

  • InstallPlan fails with

    InstallComponentFailed install strategy failed: service che-operator-service not safe to replace: extraneous ownerreferences found

    • Fix: Manually add correct ownerref to Che Operator service (???)
      # Generate ownerref patch for the service
      SVC_PATCH=$(oc get csv -n openshift-operators -o yaml | yq -y '.items[] | select(.metadata.name | match("eclipse-che*")) | {
        "metadata": {
          "ownerReferences": [{
            "apiVersion": "operators.coreos.com/v1alpha1",
            "blockOwnerDeletion": false,
            "controller": false,
            "kind": "ClusterServiceVersion",
            "name": "eclipse-che-preview-openshift.v7.52.0-642.next",
            "uid": .metadata.uid
          }]
        }
      }')
      oc patch svc che-operator-service -n openshift-operators --patch "$SVC_PATCH"
  • Service che-operator-service is deleted somehow

    • Fix: Recreate it (this is liable to break; it's best to save a copy of the service from the cluster before attempting anything)
      cat <<EOF | oc apply -f -
      kind: Service
      apiVersion: v1
      metadata:
        name: che-operator-service
        namespace: openshift-operators
        labels:
          operators.coreos.com/eclipse-che-preview-openshift.openshift-operators: ''
      spec:
        ports:
          - name: '443'
            protocol: TCP
            port: 443
            targetPort: 9443
        type: ClusterIP
        selector:
          app: che-operator
      EOF
  • Pod cannot be created for Che Operator controller, with CreateContainerSpecError

    • Fix: Remove securityContext from che-operator deployment podSpec (???)
  • Conversion webhooks are broken by install, with no caBundle and service namespace eclipse-che

    • Fix: Repair the conversion webhooks as for DWO:
      # Get serving cert
      CA_BUNDLE=$(oc get secret -n openshift-operators che-operator-service-cert -o json | jq -r '.data.olmCAKey')
      # Set correct caBundle and service reference
      oc patch crds checlusters.org.eclipse.che --patch "
      spec:
        conversion:
          webhook:
            clientConfig:
              caBundle: ${CA_BUNDLE}
              service:
                name: che-operator-service
                namespace: openshift-operators
                path: /convert
                port: 443
      "
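
Since several of the fixes above modify or recreate cluster objects, saving copies first is cheap insurance. The following is a sketch only: the helper name `backup_resource` and the `./backup` directory are hypothetical. Server-populated fields in the saved YAML, such as `metadata.resourceVersion` and `metadata.uid`, may need to be removed before it can be re-applied.

```shell
# Save a resource as YAML before modifying or deleting it.
# Usage: backup_resource KIND NAME OUTPUT_DIR [NAMESPACE]
backup_resource() {
  kind="$1"; name="$2"; dir="$3"; namespace="${4:-}"
  mkdir -p "$dir" || return 1
  if [ -n "$namespace" ]; then
    oc get "$kind" "$name" -n "$namespace" -o yaml > "$dir/$kind-$name.yaml"
  else
    # Cluster-scoped resources such as CRDs take no namespace.
    oc get "$kind" "$name" -o yaml > "$dir/$kind-$name.yaml"
  fi
}

# Example use:
#   backup_resource svc che-operator-service ./backup openshift-operators
#   backup_resource crd checlusters.org.eclipse.che ./backup
```
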