@alexeldeib
Created July 25, 2023 23:34
NVIDIA device plugin DaemonSet with GPU time-slicing on AKS
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.0
        name: nvidia-device-plugin-ctr
        env:
        - name: CONFIG_FILE
          value: "/opt/config/config.yaml"
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: config
          mountPath: "/opt/config"
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: config
        configMap:
          name: nvidia-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-config
  namespace: kube-system
  labels:
    app: nvidia
data:
  config.yaml: |-
    version: v1
    flags:
      migStrategy: "none"
      failOnInitError: false
      nvidiaDriverRoot: "/"
      plugin:
        passDeviceSpecs: true
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 10
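---
# Illustrative addition, not part of the original gist: a minimal test Pod,
# sketched on the assumption that the time-slicing config above is active, so
# each physical GPU is advertised as 10 schedulable nvidia.com/gpu units.
# The Pod name, namespace, image tag, and command are assumptions chosen for
# illustration; adjust them for your cluster, or delete this document if you
# only want to deploy the device plugin.
# Time-sliced replicas share one GPU without memory or fault isolation.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-timeslice-test   # hypothetical name
  namespace: default
spec:
  restartPolicy: Never
  tolerations:
  # Mirrors the DaemonSet's toleration; only needed if GPU nodes carry this taint.
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed image; any CUDA base image should do
    command: ["nvidia-smi", "-L"]                # lists the GPU visible to the container
    resources:
      limits:
        nvidia.com/gpu: 1   # one time-sliced replica, not a whole physical GPU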