@bgulla
Created April 17, 2023 20:31
RKE2/K3s Nvidia GPU-Operator installation
prep:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
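optional: if you want to pin a specific chart release rather than installing the latest, list what the repo offers first (the versions shown will vary over time):

```
# show the gpu-operator chart versions published in the nvidia repo
helm search repo nvidia/gpu-operator --versions | head
```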
install:
helm install --wait nvidiagpu \
-n gpu-operator --create-namespace \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true \
nvidia/gpu-operator
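note: the values above use the K3s containerd paths. For RKE2, the agent's containerd config normally lives under /var/lib/rancher/rke2 instead, while the socket path stays the same; a sketch of the RKE2 variant, assuming a default install (double-check the paths on your nodes):

```
helm install --wait nvidiagpu \
-n gpu-operator --create-namespace \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true \
nvidia/gpu-operator
```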
delete:
helm uninstall -n gpu-operator nvidiagpu
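note: helm uninstall leaves the namespace and the operator's CRDs behind; for a fuller cleanup (assuming the default ClusterPolicy CRD name used by gpu-operator), something like:

```
# remove the leftover namespace and the ClusterPolicy CRD after uninstalling
kubectl delete namespace gpu-operator
kubectl delete crd clusterpolicies.nvidia.com
```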
cluster-info:
kubectl get nodes -o wide
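verify (a minimal sketch; run after the install settles):

```
# the operator should register a RuntimeClass named "nvidia"
kubectl get runtimeclass nvidia

# the node(s) should now advertise the nvidia.com/gpu resource
kubectl describe nodes | grep nvidia.com/gpu

# all gpu-operator pods should end up Running or Completed
kubectl get pods -n gpu-operator
```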
@gaalw commented Mar 27, 2024

Hello! It is a great script. But how do I check that the gpu-operator is working correctly?

@shan100github

You can either run `kubectl run nvidia-smi --restart=Never --rm -i --tty --image nvidia/cuda:11.0.3-base-ubuntu20.04 -- nvidia-smi`, or
find the driver pod with `kubectl get pod -n gpu-operator | grep driver` and run `kubectl exec -it nvidia-driver-daemonset-qxtlz -n gpu-operator -- nvidia-smi`:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A5000               On  |   00000000:01:00.0 Off |                  Off |
| 30%   37C    P8             28W /  230W |       1MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
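If the quick `kubectl run` test cannot see the GPU (for example when CONTAINERD_SET_AS_DEFAULT is not set to true), you may need to request the nvidia RuntimeClass explicitly; a hedged sketch using kubectl run overrides:

```
# run the same check, but pin the pod to the "nvidia" RuntimeClass explicitly
kubectl run nvidia-smi --rm -i --tty --restart=Never \
  --image=nvidia/cuda:11.0.3-base-ubuntu20.04 \
  --overrides='{"apiVersion": "v1", "spec": {"runtimeClassName": "nvidia"}}' \
  -- nvidia-smi
```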

@gecube commented Apr 3, 2024

I want to run it on the Jetson platform (ARM); as far as I know, there is no nvidia-smi there. But anyway, thanks for the precise command.

@shan100github

Probably you could try nvcr.io/nvidia/l4t-cuda:12.2.2-devel-arm64-ubuntu22.04 or images from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-cuda/tags.
Note: I am not entirely sure and haven't tried it out.
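A hedged way to try that image in-cluster (untested; the tag and the nvcc check are assumptions, and on Jetson the GPU is integrated, so there is no nvidia-smi to call):

```
# run the L4T CUDA devel image on the Jetson node and print the toolkit version
# (nvcc may live under /usr/local/cuda/bin if it is not already on PATH)
kubectl run l4t-cuda-check --rm -i --tty --restart=Never \
  --image=nvcr.io/nvidia/l4t-cuda:12.2.2-devel-arm64-ubuntu22.04 \
  --overrides='{"apiVersion": "v1", "spec": {"runtimeClassName": "nvidia"}}' \
  -- nvcc --version
```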
