@alexcpn
Last active February 12, 2024 19:51
Making Kubeflow work in Kind

We will use the manifests method of installing Kubeflow: https://github.com/kubeflow/manifests

Create a Kind cluster whose API server is configured with a service account issuer and signing key. Kubeflow needs this to work (Istio requires it):

cat <<EOF | kind create cluster --name=kubeflow  --kubeconfig /home/alexpunnen/kindclusters/mycluster.yaml --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        "service-account-issuer": "kubernetes.default.svc"
        "service-account-signing-key-file": "/etc/kubernetes/pki/sa.key"

EOF

Save the kubeconfig for the new cluster (note that this overwrites any existing ~/.kube/config):

kind get kubeconfig --name kubeflow > ~/.kube/config

Clone the Kubeflow manifests repo and install Kubeflow using the advanced/manifests method:

git clone https://github.com/kubeflow/manifests
cd manifests

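Install all components together with a single command. It usually fails on the first tries (for example, custom resources cannot be created until their CRDs are ready), so the loop simply re-applies until it succeeds: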

while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
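If you would rather not loop forever, the same retry can be written with an upper bound. This is a minimal Python sketch, assuming `kustomize` and `kubectl` are on your PATH; `apply_until_success` and its `delay`/`max_attempts` parameters are hypothetical names of mine, not part of the Kubeflow manifests repo.

```python
import subprocess
import time


def apply_until_success(command: str, delay: float = 10.0, max_attempts: int = 30) -> int:
    """Re-run a shell pipeline until it exits with status 0.

    Returns the number of the attempt that succeeded; raises RuntimeError
    once max_attempts runs have all failed (the shell loop never gives up).
    """
    for attempt in range(1, max_attempts + 1):
        # shell=True because the command is a pipeline (kustomize ... | kubectl ...)
        result = subprocess.run(command, shell=True)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            print(f"Retrying to apply resources (attempt {attempt} failed)")
            time.sleep(delay)
    raise RuntimeError(f"still failing after {max_attempts} attempts: {command}")
```

For example, call `apply_until_success("kustomize build example | kubectl apply -f -")` from the manifests directory; the 10-second default delay mirrors the `sleep 10` in the shell loop.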

You may get errors in the Istio pods, or see them stuck in ContainerCreating with an error like the one below. Delete those pods and they should start again:

MountVolume.SetUp failed for volume "istiod-ca-cert" : configmap "istio-ca-root-cert" not found
kubectl -n istio-system delete pod cluster-local-gateway-7bf6b98855-mxgft istio-ingressgateway-78bc678876-4bzgn
pod "cluster-local-gateway-7bf6b98855-mxgft" deleted
pod "istio-ingressgateway-78bc678876-4bzgn" deleted
alexpunnen@pop-os:~/manifests$ kubectl get pods -n istio-system 
NAME                                     READY   STATUS    RESTARTS   AGE
authservice-0                            1/1     Running   0          5m19s
cluster-local-gateway-7bf6b98855-ngqz8   1/1     Running   0          3s
istio-ingressgateway-78bc678876-glw9n    1/1     Running   0          3s
istiod-755f4cc457-ndlwp                  1/1     Running   0          5m19s
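To pick out the pods that need deleting without reading the table by eye, you can filter the `kubectl get pods -n istio-system` output. A minimal sketch: `stuck_pods` and `delete_command` are hypothetical helpers, and since they only split on whitespace they assume the default column layout (NAME READY STATUS RESTARTS AGE, no NAMESPACE column).

```python
from typing import List


def stuck_pods(kubectl_output: str, status: str = "ContainerCreating") -> List[str]:
    """Return the names of pods whose STATUS column equals `status`.

    Assumes the default `kubectl get pods -n <namespace>` table:
    NAME READY STATUS RESTARTS AGE (whitespace separated).
    """
    names = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3 and fields[2] == status:
            names.append(fields[0])
    return names


def delete_command(namespace: str, pods: List[str]) -> str:
    """Build the kubectl command that deletes the given pods so they get recreated."""
    return f"kubectl -n {namespace} delete pod " + " ".join(pods)
```

Feed it the text of `kubectl get pods -n istio-system` (for instance from `subprocess.run(..., capture_output=True, text=True).stdout`) and run the command it returns, which has the same shape as the delete command above.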

There is a small bug in one of the MySQL manifest files (kubeflow/manifests#2065).

Apply the corrected PersistentVolumeClaim below so that MySQL and the pods that depend on it come up:

alex@pop-os:~/kubeflow/manifests$ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  namespace: kubeflow
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20G
EOF

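Kind nodes run containerd rather than Docker, so the default Docker-based workflow executor used by Kubeflow Pipelines does not work there. Swap in the PNS (Process Namespace Sharing) variant of the pipeline manifests; otherwise you will get an invalid socket error in your pipeline pods: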

kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user | kubectl delete -f -
kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user-pns | kubectl apply -f -

You can now access Kubeflow, including the Jupyter notebooks, through an ingress controller, a LoadBalancer, or port forwarding.

The simplest way is port forwarding:

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

Open your browser and visit http://localhost:8080. You should get the Dex login screen. Log in with the default user's credentials: the email address is user@example.com and the password is 12341234.

However, the LoadBalancer approach is also very simple, and better in the long run.

Follow these instructions to install and configure MetalLB:

https://kind.sigs.k8s.io/docs/user/loadbalancer/

After that, patch the istio-ingressgateway service to type LoadBalancer as shown below, and access the cluster through the external IP that MetalLB assigns from the Docker network range, here http://172.18.255.200/:

~/manifests$ kubectl get svc/istio-ingressgateway -n istio-system 
NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                      AGE
istio-ingressgateway   NodePort   10.96.140.228   <none>        15021:32149/TCP,80:31999/TCP,443:31895/TCP,31400:32732/TCP,15443:32015/TCP   28h

~/manifests$ kubectl patch svc/istio-ingressgateway -n istio-system -p '{"spec": {"type": "LoadBalancer"}}'
service/istio-ingressgateway patched

~/manifests$ kubectl get svc/istio-ingressgateway -n istio-system 
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                                                                      AGE
istio-ingressgateway   LoadBalancer   10.96.140.228   172.18.255.200   15021:32149/TCP,80:31999/TCP,443:31895/TCP,31400:32732/TCP,15443:32015/TCP   28h
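Once MetalLB has assigned an address, you can pull the URL out of the service listing. A minimal sketch that parses the default `kubectl get svc` table shown above; `external_url` is a hypothetical helper. For scripting, `kubectl get svc/istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}'` avoids parsing the table at all.

```python
from typing import Optional


def external_url(kubectl_svc_output: str) -> Optional[str]:
    """Return http://<EXTERNAL-IP>/ from `kubectl get svc` output, or None if no IP is assigned.

    Assumes the default table: one header row followed by one service row.
    """
    header, row = kubectl_svc_output.strip().splitlines()[:2]
    ip_column = header.split().index("EXTERNAL-IP")
    ip = row.split()[ip_column]
    if ip in ("<none>", "<pending>"):
        return None  # NodePort service, or MetalLB has not assigned an address yet
    return f"http://{ip}/"
```

Applied to the NodePort listing above it returns None, since EXTERNAL-IP is `<none>`; applied to the patched listing it returns http://172.18.255.200/.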

Create a notebook server called test2 (screenshot: https://i.imgur.com/JdKHamv.png). Note: if you are running on GCP, leave the CPU request as 0.

For the Jupyter notebook to access Kubeflow Pipelines (KFP), you need to add the following EnvoyFilter, see kubeflow/pipelines#4976 (comment).

Otherwise, connecting with kfp.Client fails with this error:

import kfp

client = kfp.Client()
print(client.list_experiments())

Internal error: Unauthenticated: Request header error: there is no user identity header.

Apply the following to let the notebook call the pipeline API. Note the namespace (kubeflow-user-example-com), the notebook name (test2) and the user ID (user@example.com), and change them to match your setup.

cat << EOF | kubectl apply -f -
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: bind-ml-pipeline-nb-kubeflow-user-example-com
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: ml-pipeline
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/kubeflow-user-example-com/sa/default-editor"]
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: kubeflow-user-example-com
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: user@example.com
  workloadSelector:
    labels:
      notebook-name: test2
EOF
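Since the namespace, notebook name and user ID have to be adjusted for each setup, it can help to template them. A minimal Python sketch using the standard `string.Template`; the `render` helper is hypothetical, and I am assuming the AuthorizationPolicy name simply follows the `bind-ml-pipeline-nb-<namespace>` pattern used in the example above.

```python
from string import Template

# Same resources as the manifest above, with the per-setup values as placeholders.
MANIFEST = Template("""\
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: bind-ml-pipeline-nb-${namespace}
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: ml-pipeline
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/${namespace}/sa/default-editor"]
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: ${namespace}
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: ${user}
  workloadSelector:
    labels:
      notebook-name: ${notebook}
""")


def render(namespace: str, notebook: str, user: str) -> str:
    """Fill in the profile namespace, notebook name and user ID header value."""
    return MANIFEST.substitute(namespace=namespace, notebook=notebook, user=user)
```

`render("kubeflow-user-example-com", "test2", "user@example.com")` yields the same resources as the manifest above; write the result to a file or pipe it into `kubectl apply -f -`.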

You can upload the following notebook to Jupyter to test the setup: https://colab.research.google.com/drive/1f_p4EVKReT57J4Maz4vRfhccJ_qVv03W?usp=sharing

Note that I faced multiple problems with the v2 beta version of KFP (kubeflow/pipelines#6390), but the v1 version works fine.

A fully working instance has the following pods in the Running state:

alexpunnen@pop-os:~/kindclusters$ kubectl get pods -A
NAMESPACE                   NAME                                                        READY   STATUS    RESTARTS   AGE
auth                        dex-6f4f4fd769-s99tz                                        1/1     Running   1          20m
cert-manager                cert-manager-7dd5854bb4-7nf5q                               1/1     Running   0          20m
cert-manager                cert-manager-cainjector-64c949654c-fln2v                    1/1     Running   0          20m
cert-manager                cert-manager-webhook-6bdffc7c9d-vm7t8                       1/1     Running   0          20m
istio-system                authservice-0                                               1/1     Running   0          20m
istio-system                cluster-local-gateway-7bf6b98855-ngqz8                      1/1     Running   0          15m
istio-system                istio-ingressgateway-78bc678876-glw9n                       1/1     Running   0          15m
istio-system                istiod-755f4cc457-ndlwp                                     1/1     Running   0          20m
knative-eventing            eventing-controller-6b4cc547b9-dt9xk                        1/1     Running   0          20m
knative-eventing            eventing-webhook-7497957865-lzb5x                           1/1     Running   0          20m
knative-eventing            imc-controller-c8d86c869-h6xt4                              1/1     Running   0          20m
knative-eventing            imc-dispatcher-7bf75b8999-dv8hx                             1/1     Running   0          20m
knative-eventing            mt-broker-controller-5596fd9c9-wd4cj                        1/1     Running   0          20m
knative-eventing            mt-broker-filter-8c699b678-4z9fb                            1/1     Running   0          20m
knative-eventing            mt-broker-ingress-f8b9b6cfc-nncw5                           1/1     Running   0          20m
knative-serving             activator-7d554f9d67-nz4j9                                  2/2     Running   1          18m
knative-serving             autoscaler-549ccd665f-f5wcv                                 2/2     Running   1          18m
knative-serving             controller-c548cfcff-xjv4p                                  2/2     Running   1          18m
knative-serving             istio-webhook-68fddcc567-hbqrq                              2/2     Running   1          18m
knative-serving             networking-istio-5664b9fb9c-dr9pm                           2/2     Running   1          18m
knative-serving             webhook-6644fdc69-prnh2                                     2/2     Running   1          18m
kube-system                 coredns-f9fd979d6-4vwgf                                     1/1     Running   0          95m
kube-system                 coredns-f9fd979d6-r7b4k                                     1/1     Running   0          95m
kube-system                 etcd-kubeflow-control-plane                                 1/1     Running   0          95m
kube-system                 kindnet-nv7n8                                               1/1     Running   0          95m
kube-system                 kube-apiserver-kubeflow-control-plane                       1/1     Running   0          95m
kube-system                 kube-controller-manager-kubeflow-control-plane              1/1     Running   0          95m
kube-system                 kube-proxy-g5m4w                                            1/1     Running   0          95m
kube-system                 kube-scheduler-kubeflow-control-plane                       1/1     Running   0          95m
kubeflow-user-example-com   ml-pipeline-ui-artifact-767659f9df-mcksg                    2/2     Running   0          4m52s
kubeflow-user-example-com   ml-pipeline-visualizationserver-6ff9f47c6b-54ktk            2/2     Running   0          4m52s
kubeflow                    admission-webhook-deployment-f5d8f47f8-hb9fm                1/1     Running   0          18m
kubeflow                    cache-deployer-deployment-6dbb64ddcd-6t9h6                  2/2     Running   1          18m
kubeflow                    cache-server-f84f6bdcc-6qn6h                                2/2     Running   0          18m
kubeflow                    centraldashboard-5fb844d56d-lhb2b                           1/1     Running   0          18m
kubeflow                    jupyter-web-app-deployment-bdfb5d69f-c8df6                  1/1     Running   0          18m
kubeflow                    katib-controller-7b98cd6865-7rvb9                           1/1     Running   0          18m
kubeflow                    katib-db-manager-7689947dc5-m28bj                           1/1     Running   2          18m
kubeflow                    katib-mysql-586f79b694-ssl98                                1/1     Running   0          18m
kubeflow                    katib-ui-64fbdf4d94-m5lqr                                   1/1     Running   0          18m
kubeflow                    kfserving-controller-manager-0                              2/2     Running   0          18m
kubeflow                    kubeflow-pipelines-profile-controller-6cfd6bf9bd-jj5sv      1/1     Running   0          18m
kubeflow                    metacontroller-0                                            1/1     Running   0          18m
kubeflow                    metadata-envoy-deployment-95b58bbbb-m9xkg                   1/1     Running   0          18m
kubeflow                    metadata-grpc-deployment-7cb87744c7-75zw8                   2/2     Running   3          18m
kubeflow                    metadata-writer-76b6b98985-ktbmd                            2/2     Running   1          18m
kubeflow                    minio-5b65df66c9-w5skh                                      2/2     Running   0          18m
kubeflow                    ml-pipeline-84858dd97b-7l7ww                                2/2     Running   1          18m
kubeflow                    ml-pipeline-persistenceagent-6ff46967ff-k5kwn               2/2     Running   1          18m
kubeflow                    ml-pipeline-scheduledworkflow-66bdf9948d-k2ccl              2/2     Running   0          18m
kubeflow                    ml-pipeline-ui-867664b965-gzm9b                             2/2     Running   0          18m
kubeflow                    ml-pipeline-viewer-crd-64dddf4597-vnxgp                     2/2     Running   1          18m
kubeflow                    ml-pipeline-visualizationserver-7f88f8b84b-qwzgq            2/2     Running   0          18m
kubeflow                    mpi-operator-d5bfb8489-dltng                                1/1     Running   0          18m
kubeflow                    mxnet-operator-6cffc568b7-p9n2p                             1/1     Running   0          18m
kubeflow                    mysql-f7b9b7dd4-5wl62                                       2/2     Running   0          18m
kubeflow                    notebook-controller-deployment-c88b44b79-g2vrl              1/1     Running   0          18m
kubeflow                    profiles-deployment-5c94fd8fbf-b9vlt                        2/2     Running   0          18m
kubeflow                    pytorch-operator-56bffbbd86-rllxr                           2/2     Running   0          18m
kubeflow                    tensorboard-controller-controller-manager-d7c68d6df-dvc5h   3/3     Running   1          18m
kubeflow                    tensorboards-web-app-deployment-59ff4c7bd8-852k9            1/1     Running   0          18m
kubeflow                    tf-job-operator-859885c8c4-hxbgn                            1/1     Running   0          18m
kubeflow                    volumes-web-app-deployment-6457c9bcfc-hdf5k                 1/1     Running   0          18m
kubeflow                    workflow-controller-7b44676dff-lqfxx                        2/2     Running   1          18m
kubeflow                    xgboost-operator-deployment-c6ddb584-878t4                  2/2     Running   1          18m
local-path-storage          local-path-provisioner-78776bfc44-lqfvv                     1/1     Running   0          95m
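Rather than compare your cluster against this listing by eye, you can flag pods that are not Running or not fully ready. A minimal sketch, assuming the default `kubectl get pods -A` columns (NAMESPACE NAME READY STATUS RESTARTS AGE); `unhealthy_pods` is a hypothetical helper. It would also flag pods of finished Jobs (status Completed), which are harmless.

```python
from typing import List, Tuple


def unhealthy_pods(kubectl_output: str) -> List[Tuple[str, str]]:
    """Return (namespace, name) for every pod not Running with all containers ready.

    Assumes the default `kubectl get pods -A` table:
    NAMESPACE NAME READY STATUS RESTARTS AGE (whitespace separated).
    """
    bad = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        namespace, name, ready, status = fields[:4]
        ready_count, total = ready.split("/")  # e.g. "1/2" -> "1", "2"
        if status != "Running" or ready_count != total:
            bad.append((namespace, name))
    return bad
```

An empty result means every pod in the listing is Running with all of its containers ready.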
@diegolovison

Tried with the latest commit and it is failing.
If you would like to help, you are welcome to.
See: https://kubeflow.slack.com/archives/C7REE0ETX/p1707764991199189

Thanks
