This guide walks through the steps required to build a kind cluster with GPU support, then share that GPU with a vCluster running inside the kind cluster.
It expands on the tutorial at https://www.substratus.ai/blog/kind-with-gpus/ by including the steps required to make GPUs work with the containerd runtime for docker.
This guide requires an NVIDIA graphics card.
Follow the instructions for installing the NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Note
If you are using containerd as your container runtime for docker, you need to set the nvidia runtime as the default for both docker and containerd. Otherwise the nodes will not advertise an allocatable GPU, and pods that request one will be stuck in the Pending state.
If you are only using docker then:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
If using both containerd and docker then:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo nvidia-ctk runtime configure --runtime=containerd --set-as-default
sudo systemctl restart containerd docker
The NVIDIA container runtime needs to be instructed to accept visible devices as volume mounts. This involves changing the container-runtime config to uncomment the accept-nvidia-visible-devices-as-volume-mounts option and set it to true:
sudo sed -i '/accept-nvidia-visible-devices-as-volume-mounts/c\accept-nvidia-visible-devices-as-volume-mounts = true' /etc/nvidia-container-runtime/config.toml
Once this has been completed, we can spin up a kind cluster. This cluster uses the following kind.yaml configuration:
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
name: gputest
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all
  - role: worker
    extraMounts:
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all
  - role: worker
    extraMounts:
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all
  - role: worker
    extraMounts:
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all
Create the cluster with:
kind create cluster --config kind.yaml
This will create a cluster with one control-plane node and 3 workers, with the nvidia devices mounted.
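If GPUs later fail to show up on the nodes, it may be worth confirming that the host-side configuration from the earlier steps actually took effect. The following is a minimal sketch, not part of the original tutorial: the function names are hypothetical, and it assumes the config file lives at the path used in the sed command above.

```shell
# Hypothetical helpers for checking the host configuration described above.

# Returns 0 if the given config file has the volume-mount option
# uncommented and set to true.
volume_mounts_enabled() {
  config="${1:-/etc/nvidia-container-runtime/config.toml}"
  grep -Eq '^[[:space:]]*accept-nvidia-visible-devices-as-volume-mounts[[:space:]]*=[[:space:]]*true' "$config"
}

# Returns 0 if docker reports nvidia as its default runtime.
docker_default_is_nvidia() {
  [ "$(docker info --format '{{.DefaultRuntime}}')" = "nvidia" ]
}
```

Running `volume_mounts_enabled && docker_default_is_nvidia && echo "host looks ready"` on the host gives a quick yes/no answer before digging into the cluster itself.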
In Sam's article, K8s Kind with GPUs (linked above), he lists a step that symlinks ldconfig to ldconfig.real inside the cluster nodes.
I found I did not need this step with kind nodes at 1.29. However, if you are using an older container runtime, it may still be relevant for your environment.
If this is the case, run the following to add the symlink to the kind nodes.
for name in $(kubectl get no -o jsonpath="{.items[*].metadata.name}"); do
  docker exec -ti "${name}" ln -s /sbin/ldconfig /sbin/ldconfig.real
done
You should only need this step if the GPU operator fails to start.
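Because `ln -s` fails if the link already exists, re-running the loop on a cluster that has already been patched produces errors. A sketch of a re-runnable variant is shown here; the function name is hypothetical, and it assumes the cluster is named gputest as in kind.yaml. It lists nodes with `kind get nodes` rather than kubectl, so it does not depend on the current kubectl context.

```shell
# Hypothetical helper: create /sbin/ldconfig.real on every node of a kind
# cluster, skipping nodes where it already exists.
ensure_ldconfig_real() {
  cluster="${1:-gputest}"
  for node in $(kind get nodes --name "$cluster"); do
    docker exec "$node" sh -c \
      '[ -e /sbin/ldconfig.real ] || ln -s /sbin/ldconfig /sbin/ldconfig.real'
  done
}
```

Call it as `ensure_ldconfig_real gputest`.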
For your cluster to become GPU ready, you need to install the GPU Operator from NVIDIA. This can be installed via helm from nvidia/gpu-operator:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || true
helm repo update
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator --set driver.enabled=false
Depending on your system, it may then take a while for the operator to become fully available.
Once ready, the worker nodes should have the allocation nvidia.com/gpu assigned to them.
Caution
This may take a while to become ready. In testing, I saw times up to 5 minutes for a 3 node cluster.
$ kubectl get node -o yaml | yq '.items[] | [{"name": .metadata.name, "status": .status.allocatable."nvidia.com/gpu"}]'
- name: gputest-control-plane
  status: null
- name: gputest-worker
  status: "1"
- name: gputest-worker2
  status: "1"
- name: gputest-worker3
  status: "1"
If you are only running a single node cluster, this may be on the control-plane instead.
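Since the allocation can take several minutes to appear, polling for it can be more convenient than re-running the command by hand. The following is a sketch only: the function names, default timeout and polling interval are my own assumptions, and it uses kubectl's custom-columns output instead of yq.

```shell
# Hypothetical helper: print the number of nodes that advertise at least
# one allocatable nvidia.com/gpu. Nodes without the resource show "<none>".
gpu_node_count() {
  kubectl get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
    | awk '$2 ~ /^[0-9]+$/ && $2 + 0 > 0 { n++ } END { print n + 0 }'
}

# Hypothetical helper: wait until at least $1 nodes expose a GPU.
# $2 is the timeout in seconds, $3 the polling interval in seconds.
wait_for_gpu_nodes() {
  want="${1:-1}"
  timeout="${2:-600}"
  interval="${3:-10}"
  waited=0
  while [ "$waited" -le "$timeout" ]; do
    if [ "$(gpu_node_count)" -ge "$want" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

For the three-worker cluster above, `wait_for_gpu_nodes 3` returns once all workers report a GPU, or fails after ten minutes.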
Once the kind cluster is ready, we want to be able to schedule GPU workloads inside a vcluster loaded inside our kind cluster.
To make this possible, we need to instruct our vcluster nodes to read their state from the real kind cluster nodes.
First, let's install vcluster. We'll do this using Cluster API.
Note
This requires clusterctl >= 1.9.0
If you do not have clusterctl installed on your machine, you can install it
by running the following for amd64. Remember to change the architecture if
you are on a different platform.
VERSION=$(curl --silent "https://api.github.com/repos/kubernetes-sigs/cluster-api/releases/latest" | jq -r .tag_name)
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/${VERSION}/clusterctl-linux-amd64 -o clusterctl \
&& sudo install -o root -g root -m 0755 clusterctl /usr/local/bin/clusterctl
Once you have the latest clusterctl binary installed, initialise the vcluster provider in the kind cluster with:
clusterctl init --infrastructure vcluster
In order to use the GPU with vCluster, we need to expose the real kind nodes to the vcluster rather than using virtual nodes. This is achieved by syncing the real nodes to the virtual cluster, which is done by setting the following values on the vcluster chart:
sync:
  fromHost:
    nodes:
      enabled: true
Ref: https://www.vcluster.com/docs/vcluster/configure/vcluster-yaml/sync/from-host/nodes
Save the following yaml out as cluster.yaml:
Cluster API for vClusters CR
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: kind
  namespace: vcluster
spec:
  controlPlaneRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: VCluster
    name: kind
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: VCluster
    name: kind
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: VCluster
metadata:
  name: kind
  namespace: vcluster
spec:
  controlPlaneEndpoint:
    host: ""
    port: 0
  helmRelease:
    chart:
      name: vcluster
      repo: https://charts.loft.sh
      version: 0.22.1
    values: |-
      sync:
        fromHost:
          nodes:
            enabled: true
Next, create a new vcluster namespace and apply the cluster.yaml file:
kubectl create ns vcluster
kubectl apply -f cluster.yaml
Wait for the cluster to come up and then connect to it with:
vcluster connect kind -n vcluster
Note
Similar to clusterctl above, if you do not have the vcluster CLI installed, you
can install it for amd64 with:
VERSION=$(curl --silent "https://api.github.com/repos/loft-sh/vcluster/releases/latest" | jq -r .tag_name)
curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/download/${VERSION}/vcluster-linux-amd64" \
&& sudo install -c -m 0755 vcluster /usr/local/bin && rm -f vcluster
We can verify that the node has the allocation by again running:
kubectl get node -o yaml | yq '.items[] | [{"name": .metadata.name, "status": .status.allocatable."nvidia.com/gpu"}]'
- name: gputest-worker
  status: "1"
Now we need to install the GPU operator again. However, as we're running in
a vCluster and the allocations are coming from the kind cluster nodes, we
can safely skip the toolkit and install only the operator.
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=false,toolkit.enabled=false
Once the operator has started inside the vcluster, create a test pod to verify that the GPU is accessible:
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
Wait a few seconds for the pod to start and then check its logs:
$ kubectl logs cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
That's it. You should now have a GPU-enabled vCluster running inside a kind cluster.
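Rather than waiting a fixed number of seconds, the check can be scripted. This is a rough sketch with a hypothetical function name; it assumes kubectl 1.23 or newer (for `kubectl wait --for=jsonpath`) and that the current context points at the vcluster.

```shell
# Hypothetical helper: wait for the test pod to finish, then confirm that
# its logs contain the "Test PASSED" line shown above.
cuda_test_passed() {
  pod="${1:-cuda-vectoradd}"
  timeout="${2:-120s}"
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
    "pod/$pod" --timeout="$timeout" || return 1
  kubectl logs "$pod" | grep -q 'Test PASSED'
}
```

For example, `cuda_test_passed cuda-vectoradd && echo "GPU is usable from the vcluster"`.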