@amisevsk
Last active June 10, 2022 14:37

Dev Spaces proxy issue

During the pre-release testing for Dev Spaces 3.0, we discovered issues around running Dev Spaces in OpenShift clusters with a proxy configured. The issue was originally reported in https://issues.redhat.com/browse/CRW-2820

Background

The root cause of this issue is that the Kubernetes library Theia uses to interact with the cluster does not support NO_PROXY entries that use CIDR ranges, e.g.

172.30.0.0/16

This is a known issue, marked wontfix in the library: kubernetes-client/javascript#233

By default, OpenShift configures the cluster Proxy object so that requests to Kubernetes services (the 172.30.0.0/16 CIDR range) are not proxied. Because the library used by Theia ignores CIDR entries, it sends requests to the Kubernetes API service (172.30.0.1) through the proxy, and those requests fail.
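To illustrate the root cause, the Go sketch below contrasts a matcher that treats every NO_PROXY entry as a hostname or domain suffix (a hypothetical simplification of the behavior described in kubernetes-client/javascript#233, not that library's actual code) with a CIDR-aware matcher built on Go's standard net package. The noProxy value is an assumed example resembling an OpenShift default; the real default may contain other entries.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// matchesHost reports whether host equals entry or ends with entry as a domain suffix.
func matchesHost(host, entry string) bool {
	return host == entry || strings.HasSuffix(host, "."+strings.TrimPrefix(entry, "."))
}

// naiveNoProxy is a hypothetical matcher that only compares hostnames and
// domain suffixes, so a CIDR entry such as "172.30.0.0/16" never matches an IP.
func naiveNoProxy(host, noProxy string) bool {
	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.TrimSpace(entry)
		if entry != "" && matchesHost(host, entry) {
			return true
		}
	}
	return false
}

// cidrAwareNoProxy additionally parses each entry as a CIDR range and checks
// whether an IP host falls inside it.
func cidrAwareNoProxy(host, noProxy string) bool {
	ip := net.ParseIP(host)
	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if _, ipNet, err := net.ParseCIDR(entry); err == nil {
			if ip != nil && ipNet.Contains(ip) {
				return true
			}
			continue
		}
		if matchesHost(host, entry) {
			return true
		}
	}
	return false
}

func main() {
	noProxy := ".cluster.local,.svc,172.30.0.0/16" // assumed example value
	apiServer := "172.30.0.1"                       // Kubernetes API service IP

	fmt.Println("naive:", naiveNoProxy(apiServer, noProxy))            // naive: false
	fmt.Println("cidr-aware:", cidrAwareNoProxy(apiServer, noProxy))   // cidr-aware: true
	// Listing the IP explicitly makes even the naive matcher skip the proxy.
	fmt.Println("naive + explicit IP:", naiveNoProxy(apiServer, noProxy+",172.30.0.1")) // naive + explicit IP: true
}
```

The CIDR entry is invisible to the naive matcher, while an explicit 172.30.0.1 entry matches even there, which is why listing the API service IP directly works around the library limitation.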

The workaround for this issue is to add 172.30.0.1 explicitly to the no-proxy list. Since the DevWorkspace Operator supports adding values to the proxy configuration, the workaround we arrived at was to add the IP to the DevWorkspaceOperatorConfig by creating the object

apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: openshift-operators
config:
  routing:
    proxyConfig:
      noProxy: 172.30.0.1

in the DevWorkspace Operator's installed namespace.

A PR was opened in the DevWorkspace Operator to explicitly add the IP specified by $KUBERNETES_SERVICE_HOST to the no-proxy list when provisioning workspace pods: devfile/devworkspace-operator#840
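The idea behind that PR can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code from devfile/devworkspace-operator#840: `appendNoProxyHost` is a hypothetical helper, and the cluster noProxy value is an assumed example. It relies only on the fact that Kubernetes sets KUBERNETES_SERVICE_HOST to the API service IP inside pods.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// appendNoProxyHost is a hypothetical helper: it adds host to a comma-separated
// noProxy list unless the list already contains it verbatim.
func appendNoProxyHost(noProxy, host string) string {
	host = strings.TrimSpace(host)
	if host == "" {
		return noProxy
	}
	for _, entry := range strings.Split(noProxy, ",") {
		if strings.TrimSpace(entry) == host {
			return noProxy // already present; leave the list unchanged
		}
	}
	if strings.TrimSpace(noProxy) == "" {
		return host
	}
	return noProxy + "," + host
}

func main() {
	// Inside a pod, Kubernetes sets KUBERNETES_SERVICE_HOST to the API service IP.
	host := os.Getenv("KUBERNETES_SERVICE_HOST")
	if host == "" {
		host = "172.30.0.1" // fallback so the sketch also runs outside a cluster
	}
	clusterNoProxy := ".cluster.local,.svc,172.30.0.0/16" // assumed example value
	fmt.Println(appendNoProxyHost(clusterNoProxy, host))
}
```

Checking for the entry first keeps the list unchanged when the IP is already present, so provisioning the same workspace repeatedly does not accumulate duplicate entries.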

Further issues

Note: This issue was reported as https://issues.redhat.com/browse/CRW-3050

Later, a further issue was found in how the DevWorkspace Operator handles the proxy configuration: devfile/devworkspace-operator#865. If the DevWorkspaceOperatorConfig specifies proxy settings when DWO starts, the cluster config is ignored and only the values in the DevWorkspaceOperatorConfig are used. This went unnoticed for a while because, when testing the workaround described above, the noProxy: 172.30.0.1 setting was added after the controller was already running.

As a result, the workaround above only works if it is applied after the DWO controller starts. If the controller restarts, the workaround must be re-applied by:

  1. Removing the noProxy setting from the DevWorkspaceOperatorConfig
  2. Restarting the DevWorkspace controller pod (by deleting the running one or rolling out the deployment)
  3. Waiting for the DevWorkspace controller pod to enter a running state
  4. Re-applying the noProxy setting in the DevWorkspaceOperatorConfig

This issue is resolved in PR devfile/devworkspace-operator#866
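The difference between the behavior described for devfile/devworkspace-operator#865 and a merged configuration can be modeled with the Go sketch below. `ProxyConfig`, `buggyResolve`, and `mergedResolve` are hypothetical simplifications, not the DevWorkspace Operator's types or functions, and the merge strategy shown (DWOC proxy URLs override the cluster's when set, DWOC noProxy entries are appended) is one plausible approach, not necessarily what devfile/devworkspace-operator#866 implements. The proxy URL is an assumed placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// ProxyConfig is a simplified, hypothetical stand-in for proxy settings.
type ProxyConfig struct {
	HTTPProxy  string
	HTTPSProxy string
	NoProxy    string
}

// buggyResolve models the reported behavior: if the DWOC specifies any proxy
// settings at startup, the cluster config is ignored entirely.
func buggyResolve(cluster, dwoc ProxyConfig) ProxyConfig {
	if dwoc != (ProxyConfig{}) {
		return dwoc
	}
	return cluster
}

// mergedResolve starts from the cluster config, lets non-empty DWOC proxy URLs
// override it, and appends DWOC noProxy entries to the cluster's list.
func mergedResolve(cluster, dwoc ProxyConfig) ProxyConfig {
	result := cluster
	if dwoc.HTTPProxy != "" {
		result.HTTPProxy = dwoc.HTTPProxy
	}
	if dwoc.HTTPSProxy != "" {
		result.HTTPSProxy = dwoc.HTTPSProxy
	}
	var lists []string
	for _, list := range []string{cluster.NoProxy, dwoc.NoProxy} {
		if strings.TrimSpace(list) != "" {
			lists = append(lists, list)
		}
	}
	result.NoProxy = strings.Join(lists, ",")
	return result
}

func main() {
	cluster := ProxyConfig{
		HTTPProxy:  "http://proxy.example.com:3128", // placeholder proxy URL
		HTTPSProxy: "http://proxy.example.com:3128",
		NoProxy:    ".cluster.local,.svc,172.30.0.0/16",
	}
	dwoc := ProxyConfig{NoProxy: "172.30.0.1"} // the workaround described above

	fmt.Printf("buggy:  %+v\n", buggyResolve(cluster, dwoc))
	fmt.Printf("merged: %+v\n", mergedResolve(cluster, dwoc))
}
```

In this model, the buggy resolution drops the cluster's HTTP(S) proxy URLs because only the DWOC noProxy value survives, while the merged resolution keeps them and adds 172.30.0.1 to the existing noProxy list.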

DevWorkspace Operator releases

Because both fixes are changes to the DevWorkspace Operator, the underlying issues will be resolved when DWO releases. Since Dev Spaces does not pin the DWO dependency version, these fixes will reach all customers as soon as DWO releases a new version.

At the time of writing, the plan is not to release a DWO version containing either fix for Dev Spaces 3.0, and instead to delay the release (likely v0.15.1) until Dev Spaces 3.1.

Workarounds and solutions

Because several bugs interact here, there are multiple ways to address this issue:

Option 1: Proceed as planned; no DWO release for Dev Spaces 3.0. Release DWO v0.15.1 timed with Dev Spaces 3.1

Impact: Any customer running Dev Spaces in a cluster that uses a Proxy (airgap or otherwise) will encounter the original issue https://issues.redhat.com/browse/CRW-2820. If they use the workaround described above, they will also encounter devfile/devworkspace-operator#865 if the controller restarts.

Workarounds:

  • Option 1: apply the noProxy: 172.30.0.1 DevWorkspaceOperatorConfig workaround described above, with the caveat that if the controller restarts for any reason, the 4-step re-application procedure is required. This may be confusing: there is typically no visible signal that the controller has restarted, so workspaces would suddenly stop working.
  • (alternate) Option 2: Instead of documenting the original workaround, document adding 172.30.0.1 to noProxy in the cluster-wide Proxy config and restarting the DevWorkspace Operator controller pod. This would resolve the issue without triggering the further bug, but has the downside of requiring a cluster-wide configuration change.
    • However, the change is low risk, as the default noProxy already covers 172.30.0.1 through the 172.30.0.0/16 CIDR range.
    • I haven't fully tested this; it should be verified that this fully resolves the problem in proxy clusters.

Option 2: Release DWO v0.15.0 timed with Dev Spaces 3.0

Impact: Releasing DWO v0.15.0 would include the fix devfile/devworkspace-operator#840, removing the need to add 172.30.0.1 to the proxy configuration. Most customers would therefore not encounter the further DWO proxy configuration issue devfile/devworkspace-operator#865: the workarounds described above would be required only for customers that need additional noProxy settings, avoiding the issue in the majority of cases.

Workarounds: The secondary issue devfile/devworkspace-operator#865 might still need to be documented as a known issue (KI), with the caveat that (IMO) very few, potentially no, customers would encounter it. Generally, additional proxy configuration on OpenShift should be done in the cluster Proxy object; this DWOC configuration field exists partly to allow proxy support on Kubernetes.

The DWO v0.15.0 release is being delayed to line up with Dev Spaces 3.1 to avoid introducing additional changes, and room for errors, into Dev Spaces 3.0.

Option 3: Release DWO v0.15.1 timed with Dev Spaces 3.0

This is Option 2, with the added work of downstreaming and releasing DWO v0.15.1 instead of v0.15.0. This would fix all underlying issues and require no workarounds for proxy clusters. However, v0.15.1 is not yet prepared for release, so this would take somewhat longer (1-2 days).

@dmytro-ndp

+1 for Option 3 to avoid unnecessary workarounds.
