# Source: nvidia-device-plugin/templates/daemonset-device-plugin.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvdp-nvidia-device-plugin
  namespace: nvidia-device-plugin
  labels:
    helm.sh/chart: nvidia-device-plugin-0.15.0
    app.kubernetes.io/name: nvidia-device-plugin
    app.kubernetes.io/instance: nvdp
    app.kubernetes.io/version: "0.15.0"
    app.kubernetes.io/managed-by: Helm
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nvidia-device-plugin
      app.kubernetes.io/instance: nvdp
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nvidia-device-plugin
        app.kubernetes.io/instance: nvdp
      annotations:
        {}
    spec:
      priorityClassName: system-node-critical
      securityContext:
        {}
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
        imagePullPolicy: IfNotPresent
        name: nvidia-device-plugin-ctr
        command: ["nvidia-device-plugin"]
        env:
          - name: MPS_ROOT
            value: "/run/nvidia/mps"
          - name: MIG_STRATEGY
            value: "mixed"
          - name: NVIDIA_MIG_MONITOR_DEVICES
            value: all
          - name: NVIDIA_VISIBLE_DEVICES
            value: all
          - name: NVIDIA_DRIVER_CAPABILITIES
            value: compute,utility
        securityContext:
          capabilities:
            add:
              - SYS_ADMIN
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
          # The MPS /dev/shm is needed to allow for MPS daemon health-checking.
          - name: mps-shm
            mountPath: /dev/shm
          - name: mps-root
            mountPath: /mps
          - name: cdi-root
            mountPath: /var/run/cdi
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: mps-root
          hostPath:
            path: /run/nvidia/mps
            type: DirectoryOrCreate
        - name: mps-shm
          hostPath:
            path: /run/nvidia/mps/shm
        - name: cdi-root
          hostPath:
            path: /var/run/cdi
            type: DirectoryOrCreate
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: feature.node.kubernetes.io/pci-10de.present
                operator: In
                values:
                - "true"
            - matchExpressions:
              - key: feature.node.kubernetes.io/cpu-model.vendor_id
                operator: In
                values:
                - NVIDIA
            - matchExpressions:
              - key: nvidia.com/gpu.present
                operator: In
                values:
                - "true"
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoSchedule
          key: nvidia.com/gpu
          operator: Exists
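---
# Illustrative addition, not part of the rendered Helm template above.
# A minimal sketch of a workload that consumes a GPU advertised by the device
# plugin, which may help verify that the DaemonSet is working.
# Assumptions (none of these come from the manifest above): the pod name
# "gpu-pod", the container name, the "default" namespace, and the CUDA sample
# image tag are hypothetical choices; substitute any CUDA-capable image you
# have access to.
# With MIG_STRATEGY "mixed", MIG-enabled GPUs are advertised under per-profile
# resource names (nvidia.com/mig-<profile>) rather than nvidia.com/gpu, so the
# resource name below likely needs adjusting on such nodes.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      # Assumed sample image; replace with an image available to your cluster.
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
      resources:
        limits:
          # Request one full (non-MIG) GPU from the device plugin.
          nvidia.com/gpu: 1
  tolerations:
    # Mirrors the nvidia.com/gpu toleration on the DaemonSet so the pod can
    # land on GPU nodes that carry that taint.
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule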