@jkeam
Last active July 23, 2024 18:45
OpenShift AI Ingress Certs

OpenShift AI Ingress

Introduction

A wonderful diagram of how KServe works, from the KServe docs:

  +----------------------+        +-----------------------+      +--------------------------+
  |Istio Virtual Service |        |Istio Virtual Service  |      | K8S Service              |
  |                      |        |                       |      |                          |
  |sklearn-iris          |        |sklearn-iris-predictor |      | sklearn-iris-predictor   |
  |                      +------->|-default               +----->| -default-$revision       |
  |                      |        |                       |      |                          |
  |KServe Route          |        |Knative Route          |      | Knative Revision Service |
  +----------------------+        +-----------------------+      +------------+-------------+
   Knative Ingress Gateway           Knative Local Gateway                    Kube Proxy
   (Istio gateway)                   (Istio gateway)                          |
                                                                              |
                                                                              |
  +-------------------------------------------------------+                   |
  |  Knative Revision Pod                                 |                   |
  |                                                       |                   |
  |  +-------------------+      +-----------------+       |                   |
  |  |                   |      |                 |       |                   |
  |  |kserve-container   |<-----+ Queue Proxy     |       |<------------------+
  |  |                   |      |                 |       |
  |  +-------------------+      +--------------^--+       |
  |                                            |          |
  +-----------------------^-------------------------------+
                          | scale deployment   |
                 +--------+--------+           | pull metrics
                 |  Knative        |           |
                 |  Autoscaler     |-----------
                 |  KPA/HPA        |
                 +-----------------+

If you have seen the Knative and Istio diagrams, this will look very familiar. At a high level, the Istio Gateway sits in front of the Knative machinery that spins pods up and down. The advantage is that a very sophisticated proxy fronts the Knative service, so you can define fine-grained policies on it, and the Knative service can also participate in the service mesh.

When used in the context of OpenShift AI, this can get a little complicated because by default the DataScienceCluster (DSC) will use self-signed certs for the Istio Gateway. This is configurable, but if you're like me and find yourself in a situation where you've already installed OpenShift AI and want to change this after the fact, there's a little bit of a dance you have to do to fix this. Let's dig into it.

Certs

The first thing to do is to copy your good cert secret, which is usually named something like ingress-certs-xyz, from the openshift-ingress namespace into the istio-system namespace.

oc get secrets/ingress-certs-xyz -n openshift-ingress -o yaml > custom-knative-serving-cert.yaml

Then edit custom-knative-serving-cert.yaml and delete managedFields and the other server-generated metadata you won't need (resourceVersion, uid, creationTimestamp, and the like). Most importantly, update the namespace to istio-system and update the name to something like custom-knative-serving-cert. Then create it.

# being extra sure this is created in the istio-system namespace
oc apply -f ./custom-knative-serving-cert.yaml -n istio-system
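
As an alternative to hand-editing the exported YAML, the sketch below pulls the certificate and key out of the source secret and builds a fresh TLS secret from them, so there is no server metadata to strip. It assumes the source secret is a standard TLS secret with tls.crt and tls.key entries. copy_ingress_cert is a hypothetical helper name, and ingress-certs-xyz stands in for whatever your secret is actually called.

```shell
# Hypothetical helper: copy a TLS secret from openshift-ingress into istio-system.
# Assumption: the source secret holds the usual tls.crt and tls.key entries.
copy_ingress_cert() {
  local src_secret="$1"
  local dst_secret="${2:-custom-knative-serving-cert}"
  local workdir
  workdir="$(mktemp -d)" || return 1

  # Write each key of the source secret to a file named after that key.
  oc extract "secret/${src_secret}" -n openshift-ingress \
    --to="${workdir}" --confirm || { rm -rf "${workdir}"; return 1; }

  # Build a clean secret from the extracted files.
  oc create secret tls "${dst_secret}" -n istio-system \
    --cert="${workdir}/tls.crt" --key="${workdir}/tls.key"
  local rc=$?

  # Do not leave the private key lying around on disk.
  rm -rf "${workdir}"
  return "${rc}"
}
```

Calling copy_ingress_cert ingress-certs-xyz would create custom-knative-serving-cert in istio-system; pass a second argument to pick a different target name.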

Unmanage It

Next we need to have the OpenShift AI Operator stop watching the Knative and Istio Ingress so that we can delete them.

  1. Update the DSCInitialization: run oc edit DSCInitialization.dscinitialization.opendatahub.io/default-dsci and set managementState: Removed under serviceMesh

    spec:
      applicationsNamespace: redhat-ods-applications
      serviceMesh:
        controlPlane:
          metricsCollection: Istio
          name: data-science-smcp
          namespace: istio-system
        managementState: Removed
  2. Update the DataScienceCluster: run oc edit DataScienceCluster.datasciencecluster.opendatahub.io/default-dsc and set managementState: Removed for kserve, kserve.serving, and modelmeshserving

    kserve:
      managementState: Removed
      serving:
        ingressGateway:
          certificate:
            secretName: knative-serving-cert
            type: SelfSigned
        managementState: Removed
        name: knative-serving
    modelmeshserving:
      managementState: Removed
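
If you would rather script these two edits than make them interactively, a merge patch can set the same fields. This is only a sketch: set_management_state is a hypothetical helper name, and it assumes the kserve and modelmeshserving blocks shown above live under spec.components in the DataScienceCluster, so check the path against your own resource before running it.

```shell
# Hypothetical helper: flip managementState on the DSCInitialization and the
# DataScienceCluster in one go. Pass "Removed" or "Managed".
# Assumption: component settings sit under spec.components in the DSC.
set_management_state() {
  local state="$1"
  local dsci_patch dsc_patch

  dsci_patch="$(printf '{"spec":{"serviceMesh":{"managementState":"%s"}}}' "${state}")"
  dsc_patch="$(printf '{"spec":{"components":{"kserve":{"managementState":"%s","serving":{"managementState":"%s"}},"modelmeshserving":{"managementState":"%s"}}}}' "${state}" "${state}" "${state}")"

  # A merge patch only touches the fields named in the patch document.
  oc patch DSCInitialization.dscinitialization.opendatahub.io/default-dsci \
    --type=merge -p "${dsci_patch}" || return 1

  oc patch DataScienceCluster.datasciencecluster.opendatahub.io/default-dsc \
    --type=merge -p "${dsc_patch}"
}
```

For this section the call would be set_management_state Removed.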

Delete It

Next we need to delete the Knative and Istio Ingress.

  1. Delete ServiceMeshControlPlane

    oc delete ServiceMeshControlPlane.maistra.io/data-science-smcp -n istio-system
  2. Delete KnativeServing

    oc delete KNativeServing.operator.knative.dev/knative-serving -n knative-serving
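
Both deletes can take a little while to finish. One way to avoid moving on too early is to block until the resources are really gone, as in the sketch below. wait_for_teardown is a hypothetical helper name and the 300s default timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until both resources have been removed.
# The timeout argument uses oc's duration syntax, for example 120s or 5m.
wait_for_teardown() {
  local timeout="${1:-300s}"

  oc wait --for=delete ServiceMeshControlPlane.maistra.io/data-science-smcp \
    -n istio-system --timeout="${timeout}" || return 1

  oc wait --for=delete KnativeServing.operator.knative.dev/knative-serving \
    -n knative-serving --timeout="${timeout}"
}
```

If a resource is already gone by the time this runs, some client versions may report NotFound instead of success; treat that as completion too.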

Manage It

Next we need to have the OpenShift AI Operator manage these resources again so that it recreates the ServiceMeshControlPlane and KnativeServing.

  1. Update the DSCInitialization: run oc edit DSCInitialization.dscinitialization.opendatahub.io/default-dsci and set managementState: Managed under serviceMesh

    spec:
      applicationsNamespace: redhat-ods-applications
      serviceMesh:
        controlPlane:
          metricsCollection: Istio
          name: data-science-smcp
          namespace: istio-system
        managementState: Managed
  2. Update the DataScienceCluster: run oc edit DataScienceCluster.datasciencecluster.opendatahub.io/default-dsc and set managementState: Managed for kserve, kserve.serving, and modelmeshserving. Also set the ingressGateway certificate type to Provided and update secretName to match the secret created in the Certs section.

    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            secretName: custom-knative-serving-cert
            type: Provided
        managementState: Managed
        name: knative-serving
    modelmeshserving:
      managementState: Managed
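
The DataScienceCluster half of this step can also be expressed as a merge patch. As before this is a sketch under assumptions: use_provided_cert is a hypothetical helper name, and the spec.components path should be checked against your own resource. The DSCInitialization still needs managementState: Managed as described in step 1.

```shell
# Hypothetical helper: re-enable the serving components and point the ingress
# gateway at the copied secret instead of a self-signed certificate.
# Assumption: component settings sit under spec.components in the DSC.
use_provided_cert() {
  local secret_name="${1:-custom-knative-serving-cert}"
  local patch

  patch="$(printf '{"spec":{"components":{"kserve":{"managementState":"Managed","serving":{"managementState":"Managed","ingressGateway":{"certificate":{"secretName":"%s","type":"Provided"}}}},"modelmeshserving":{"managementState":"Managed"}}}}' "${secret_name}")"

  oc patch DataScienceCluster.datasciencecluster.opendatahub.io/default-dsc \
    --type=merge -p "${patch}"
}
```

Called with no arguments it uses custom-knative-serving-cert, the secret name suggested in the Certs section.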

Done

That's it! Wait a few minutes. Once the ServiceMeshControlPlane and KnativeServing have both been recreated, they will use the same certs that OpenShift is using rather than their own self-signed ones.
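
To confirm the change took effect, you can look at the certificate an inference endpoint actually serves. The sketch below assumes openssl is installed and that you know the external hostname of one of your models. show_served_cert is a hypothetical helper name.

```shell
# Hypothetical helper: print subject, issuer, and expiry of the certificate
# presented by a host on port 443.
show_served_cert() {
  local host="$1"

  # Redirecting stdin from /dev/null makes s_client exit after the handshake.
  openssl s_client -connect "${host}:443" -servername "${host}" \
    </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -enddate
}
```

A self-signed certificate shows the same value for subject and issuer, so after the switch the issuer should be the CA behind your OpenShift ingress certs instead.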

References

  1. KServe Docs
  2. Single Stack Serving Workshop
  3. Serving Large Models Docs