@sberryman
Created January 6, 2023 17:59
Kubeflow 1.5 on a Kind cluster

How to set up Kubeflow locally

This process worked for me after 5 days of hell trying to get everything running. Most of it was pulled from this excellent blog post: https://jacobtomlinson.dev/posts/2022/running-kubeflow-inside-kind-with-gpu-support/

Set up the kind cluster

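ToDo: Find the patch to kind that allows GPU attachment (kind of important, I know). The gpus field in the config below depends on that patch; as far as I know, upstream kind does not support it.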

Create a kind-gpu.yaml file

# kind-gpu.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: kubeflow-gpu
nodes:
  - role: control-plane
    image: kindest/node:v1.21.2
    gpus: True
    extraPortMappings:
    - containerPort: 31080
      listenAddress: 127.0.0.1
      hostPort: 80
kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        "service-account-issuer": "kubernetes.default.svc"
        "service-account-signing-key-file": "/etc/kubernetes/pki/sa.key"

Fire up the cluster

$ kind create cluster --config kind-gpu.yaml
Creating cluster "kubeflow-gpu" ...
 ✓ Ensuring node image (kindest/node:v1.21.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kubeflow-gpu"
You can now use your cluster with:

kubectl cluster-info --context kind-kubeflow-gpu

Have a nice day! 👋

Switch to the proper context

kubectx kind-kubeflow-gpu
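
If kubectx is not installed, plain kubectl can do the same job. Here is a minimal sketch, assuming the cluster name kubeflow-gpu from kind-gpu.yaml (kind prefixes its contexts with kind-). The helper name use_kind_context is made up for this example.

```shell
# Hypothetical helper: switch kubectl to the context kind created for a
# cluster, then confirm the switch took effect.
use_kind_context() {
  ctx="kind-${1:-kubeflow-gpu}"   # kind names contexts "kind-<cluster name>"
  kubectl config use-context "$ctx" >/dev/null || return 1
  [ "$(kubectl config current-context)" = "$ctx" ]
}
```

Running use_kind_context with no argument targets the cluster from this guide; pass another cluster name to switch elsewhere.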

Make cluster GPU aware

Next we need to install the NVIDIA GPU Operator via Helm. This adds the device plugin to the cluster so that Kubernetes can detect GPUs and schedule workloads onto them.

The NVIDIA drivers should already be installed on the host, so we disable the operator's driver install to keep it from trying to install them again.

helm repo add nvidia https://nvidia.github.io/gpu-operator \
  && helm repo update

helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=false
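
Once the operator pods settle, the node should advertise an nvidia.com/gpu resource. Below is a rough way to check, assuming the device plugin registers GPUs under that resource name (its default). The helper names gpu_allocatable and has_gpu are hypothetical.

```shell
# Hypothetical helper: print each node name and its allocatable GPU count,
# tab-separated. The count is empty until the device plugin registers GPUs.
gpu_allocatable() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}

# Hypothetical helper: succeed only if at least one node reports a GPU.
has_gpu() {
  gpu_allocatable | awk -F '\t' '$2 + 0 > 0 { found = 1 } END { exit !found }'
}
```

If has_gpu fails, kubectl get pods -n gpu-operator shows whether the operator pods are still starting or have crashed.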

Install Kubeflow

First we need to clone the manifests provided by the working group

git clone https://github.com/kubeflow/manifests.git
cd manifests
git checkout v1.5-branch

Now it is time to run the install!

$ ./hack/setup-kubeflow.sh

Go grab some coffee and watch the pods spin up (roughly 74). I had to run the setup-kubeflow.sh script three times, and the whole thing took roughly 15 minutes on a 14-core machine with 128 GB of RAM.
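
Rerunning the script by hand can be wrapped in a small retry loop. This is only a sketch, and it assumes setup-kubeflow.sh exits non-zero when some resources fail to apply, which is an assumption about the script rather than something confirmed here. The names retry and RETRY_DELAY are made up for this example.

```shell
# Hypothetical helper: rerun a command until it succeeds or attempts run out.
# Usage: retry <max_attempts> <command> [args...]
retry() {
  max_attempts=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Pause between attempts to give the cluster time to settle.
    sleep "${RETRY_DELAY:-10}"
  done
}
```

From the manifests directory, retry 5 ./hack/setup-kubeflow.sh would run the script up to five times. Running kubectl get pods -A in another terminal shows the pods coming up in the meantime.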
