During pre-release testing for Dev Spaces 3.0, we discovered issues with running Dev Spaces in OpenShift clusters that have a proxy configured. The issue was originally reported in https://issues.redhat.com/browse/CRW-2820
The root cause of this issue is that the Kubernetes library used in Theia to interact with the Kubernetes cluster does not support NO_PROXY settings that use CIDR ranges, e.g. `172.30.0.0/16`.
This is a known issue, marked as wontfix in the library: kubernetes-client/javascript#233
By default, OpenShift configures the cluster Proxy object to not proxy requests to Kubernetes services (the `172.30.0.0/16` CIDR range). Since the library used by Theia ignores this setting, it attempts to route requests to the Kubernetes API (`172.30.0.1`) through the proxy, making those requests fail.
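As a minimal sketch of the failure mode (this is not the library's actual code): clients that compare NO_PROXY entries against the target host as literal strings can never match a CIDR entry, so a host covered only by a CIDR range still goes through the proxy.

```shell
# Sketch: literal-string matching of NO_PROXY entries vs. the target host.
# A CIDR entry like 172.30.0.0/16 never equals a concrete host IP, so the
# request is (wrongly) sent through the proxy.
NO_PROXY="172.30.0.0/16"   # what the OpenShift cluster Proxy provides
host="172.30.0.1"          # the Kubernetes API service IP
decision="use proxy"
for entry in $(echo "$NO_PROXY" | tr ',' ' '); do
  if [ "$entry" = "$host" ]; then
    decision="bypass proxy"
  fi
done
echo "$decision"   # prints "use proxy": the CIDR entry never matches the host
```

A client that understands CIDR ranges would instead check whether the host IP falls inside `172.30.0.0/16` and bypass the proxy.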
The workaround for this issue is to explicitly add `172.30.0.1` to the no-proxy list. Since the DevWorkspace Operator supports adding values to the proxy configuration, the workaround arrived at was to add this to the DevWorkspaceOperatorConfig by creating the object

```yaml
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: openshift-operators
config:
  routing:
    proxyConfig:
      noProxy: 172.30.0.1
```

in the DevWorkspace Operator's installed namespace.
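For reference, the object above can be created in one step with `oc`. The `openshift-operators` namespace is assumed to be the install location; adjust it if DWO is installed elsewhere.

```shell
# Create/update the DevWorkspaceOperatorConfig carrying the noProxy workaround.
# Namespace assumed to be openshift-operators; change if DWO lives elsewhere.
oc apply -f - <<'EOF'
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: openshift-operators
config:
  routing:
    proxyConfig:
      noProxy: 172.30.0.1
EOF
```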
A PR was opened in the DevWorkspace Operator to explicitly add the IP specified by `$KUBERNETES_SERVICE_HOST` to the no-proxy list when provisioning workspace pods: devfile/devworkspace-operator#840
Note: this issue was reported as https://issues.redhat.com/browse/CRW-3050
Later, a further issue was detected in how the DevWorkspace Operator handles the proxy configuration: devfile/devworkspace-operator#865. If the DevWorkspaceOperatorConfig specifies proxy settings at DWO startup, the cluster config is ignored and only the values in the DevWorkspaceOperatorConfig are used. This went unnoticed for a while because, when testing the workaround described above, the `noProxy: 172.30.0.1` setting was added after the controller was already running.
This issue means that the workaround above only works if it is applied after the DWO controller starts. If the controller happens to restart, the workaround must be re-applied by:

- Removing the `noProxy` setting from the DevWorkspaceOperatorConfig
- Restarting the DevWorkspace controller pod (by deleting the running one or rolling out the deployment)
- Waiting for the DevWorkspace controller pod to enter a running state
- Re-applying the `noProxy` setting in the DevWorkspaceOperatorConfig
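The re-apply sequence can be sketched with `oc` as follows. The controller deployment name `devworkspace-controller-manager` and the `openshift-operators` namespace are assumptions; verify them against your install before running.

```shell
# 1. Remove the noProxy setting from the DevWorkspaceOperatorConfig
oc patch devworkspaceoperatorconfig devworkspace-operator-config \
  -n openshift-operators --type json \
  -p '[{"op":"remove","path":"/config/routing/proxyConfig/noProxy"}]'
# 2. Restart the DevWorkspace controller pod (deployment name assumed;
#    check with `oc get deploy -n openshift-operators`)
oc rollout restart deployment/devworkspace-controller-manager -n openshift-operators
# 3. Wait for the controller pod to enter a running state
oc rollout status deployment/devworkspace-controller-manager -n openshift-operators
# 4. Re-apply the noProxy setting
oc patch devworkspaceoperatorconfig devworkspace-operator-config \
  -n openshift-operators --type merge \
  -p '{"config":{"routing":{"proxyConfig":{"noProxy":"172.30.0.1"}}}}'
```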
This issue is resolved in PR devfile/devworkspace-operator#866.
As both fixes for this issue are changes to the DevWorkspace Operator, the underlying issues will be fixed when a new DWO version is released. Since Dev Spaces does not pin the DWO dependency to a fixed version, these fixes will reach all customers once DWO releases a new version.
- DevWorkspaceOperator v0.15.0 includes the original PR to fix the issue for Theia: devfile/devworkspace-operator#840
- DevWorkspaceOperator v0.15.1 could be released to also fix the operator config issue: devfile/devworkspace-operator#866
At the time of writing, the plan is not to release either DWO version for Dev Spaces 3.0, and instead to delay the release (likely of v0.15.1) until Dev Spaces 3.1.
The interaction between multiple bugs here means that there are multiple ways to fix this issue:
Option 1: Proceed as planned; no DWO release for Dev Spaces 3.0. Release DWO v0.15.1 timed with Dev Spaces 3.1.
Impact: Any customer running Dev Spaces in a cluster that uses a Proxy (airgap or otherwise) will encounter the original issue https://issues.redhat.com/browse/CRW-2820. If they use the workaround described above, they will also encounter devfile/devworkspace-operator#865 if the controller restarts.
Workarounds:
- Option 1: apply the `noProxy: 172.30.0.1` DevWorkspaceOperatorConfig as described above, with the caveat that if the controller restarts for any reason, the 4-step re-workaround will be required. This could be confusing, as there is typically no signal that the controller has restarted, and workspaces would suddenly stop working.
- (alternate) Option 2: Instead of documenting the original workaround, document adding `noProxy: 172.30.0.1` to the cluster-wide Proxy config and restarting the DevWorkspace Operator controller pod. This would resolve the issue without triggering the further bug, but has the downside of requiring a cluster-wide configuration change.
  - However, the config change is low risk, as the default proxy already includes 172.30.0.1 in noProxy through the CIDR range.
  - I haven't fully tested this; it should be verified that it fully resolves the problem in proxy clusters.
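The alternate workaround can be sketched as follows. Note that `oc patch --type merge` replaces `spec.noProxy` wholesale, so any pre-existing custom entries must be included in the new value; the controller deployment name is assumed here and should be verified.

```shell
# Add 172.30.0.1 to the cluster-wide Proxy object (requires cluster-admin).
# WARNING: this merge patch replaces spec.noProxy entirely; include any
# existing custom entries in the value you set.
oc patch proxy/cluster --type merge -p '{"spec":{"noProxy":"172.30.0.1"}}'
# Restart the DevWorkspace controller so it picks up the new proxy settings
# (deployment name assumed; verify with `oc get deploy -n openshift-operators`).
oc rollout restart deployment/devworkspace-controller-manager -n openshift-operators
```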
Option 2: Release DWO v0.15.0 for Dev Spaces 3.0.
Impact: Releasing DWO v0.15.0 would include the fix devfile/devworkspace-operator#840, removing the need to add settings to the proxy configuration. As a result, most customers would not encounter the further DWO proxy configuration issue devfile/devworkspace-operator#865, and the workarounds described above would be required only for customers that need additional `noProxy` settings, avoiding the issue in the majority of cases.
Workarounds: The secondary issue devfile/devworkspace-operator#865 would still potentially need to be documented as a known issue, with the caveat that (IMO) very few, potentially no, customers would encounter it. Generally, additional proxy configuration on OpenShift should be done in the cluster Proxy object; this DWOC configuration field exists partly to allow proxy support on Kubernetes.
The DWO v0.15.0 release is being delayed to line up with Dev Spaces 3.1 in order to avoid introducing additional changes, and room for error, into Dev Spaces 3.0.
Option 3: Release DWO v0.15.1 for Dev Spaces 3.0.
This is Option 2 with the added work of downstreaming and releasing DWO v0.15.1 instead of v0.15.0. This would fix all underlying issues and require no workarounds for proxy clusters. However, v0.15.1 is not yet prepared for release, so this would take somewhat longer (1-2 days).
+1 for Option 3 to avoid unnecessary workarounds.