

@allanlei
Created June 26, 2020 05:40
Custom Driver Install for NVIDIA on GKE
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-driver-installer-cos
  namespace: kube-system
  labels:
    k8s-app: nvidia-driver-installer-cos
spec:
  selector:
    matchLabels:
      k8s-app: nvidia-driver-installer-cos
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-driver-installer-cos
        k8s-app: nvidia-driver-installer-cos
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-accelerator
                operator: Exists
            - matchExpressions:
              - key: cloud.google.com/gke-os-distribution
                operator: In
                values:
                - "cos"
      tolerations:
      - operator: "Exists"
      hostNetwork: true
      hostPID: true
      volumes:
      - name: dev
        hostPath:
          path: /dev
      - name: vulkan-icd-mount
        hostPath:
          path: /home/kubernetes/bin/nvidia/vulkan/icd.d
      - name: nvidia-install-dir-host
        hostPath:
          path: /home/kubernetes/bin/nvidia
      - name: root-mount
        hostPath:
          path: /
      - name: cos-tools
        hostPath:
          path: /var/lib/cos-tools
      initContainers:
      - image: "cos-nvidia-installer:fixed"
        imagePullPolicy: Never
        name: nvidia-driver-installer
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        env:
        - name: NVIDIA_DRIVER_VERSION
          value: "440.64.00"
        - name: NVIDIA_INSTALL_DIR_HOST
          value: /home/kubernetes/bin/nvidia
        - name: NVIDIA_INSTALL_DIR_CONTAINER
          value: /usr/local/nvidia
        - name: VULKAN_ICD_DIR_HOST
          value: /home/kubernetes/bin/nvidia/vulkan/icd.d
        - name: VULKAN_ICD_DIR_CONTAINER
          value: /etc/vulkan/icd.d
        - name: ROOT_MOUNT_DIR
          value: /root
        - name: COS_TOOLS_DIR_HOST
          value: /var/lib/cos-tools
        - name: COS_TOOLS_DIR_CONTAINER
          value: /build/cos-tools
        volumeMounts:
        - name: nvidia-install-dir-host
          mountPath: /usr/local/nvidia
        - name: vulkan-icd-mount
          mountPath: /etc/vulkan/icd.d
        - name: dev
          mountPath: /dev
        - name: root-mount
          mountPath: /root
        - name: cos-tools
          mountPath: /build/cos-tools
      containers:
      - image: "gcr.io/google-containers/pause:2.0"
        name: pause
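
In the manifest above, the two `matchExpressions` entries sit in separate `nodeSelectorTerms`. Kubernetes ORs the entries of `nodeSelectorTerms` and ANDs the expressions inside a single term. As written, the DaemonSet is therefore eligible for any node that either carries the accelerator label or runs COS, which would include COS nodes without GPUs.

The Python sketch below models that rule for the `In`, `NotIn`, `Exists` and `DoesNotExist` operators only. `COMBINED_TERMS` is a hypothetical variant that puts both expressions in one term. It is shown only for comparison and is not part of the gist.

```python
"""Sketch: evaluate required node affinity terms against a node's labels.

Models Kubernetes semantics: nodeSelectorTerms are ORed, and the
matchExpressions inside one term are ANDed. Gt/Lt are not modelled.
"""
from typing import Dict, List


def expression_matches(expr: Dict, labels: Dict[str, str]) -> bool:
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A missing key satisfies NotIn.
        return key not in labels or labels[key] not in values
    raise ValueError(f"operator not modelled in this sketch: {op}")


def term_matches(term: Dict, labels: Dict[str, str]) -> bool:
    # Every expression within one term must hold (logical AND).
    return all(expression_matches(e, labels) for e in term["matchExpressions"])


def node_selected(terms: List[Dict], labels: Dict[str, str]) -> bool:
    # Satisfying any single term is enough (logical OR).
    return any(term_matches(t, labels) for t in terms)


# The two terms from the DaemonSet above, transcribed as Python dicts.
GIST_TERMS = [
    {"matchExpressions": [
        {"key": "cloud.google.com/gke-accelerator", "operator": "Exists"}]},
    {"matchExpressions": [
        {"key": "cloud.google.com/gke-os-distribution",
         "operator": "In", "values": ["cos"]}]},
]

# Hypothetical variant: both expressions in one term (AND instead of OR).
COMBINED_TERMS = [
    {"matchExpressions": [
        {"key": "cloud.google.com/gke-accelerator", "operator": "Exists"},
        {"key": "cloud.google.com/gke-os-distribution",
         "operator": "In", "values": ["cos"]}]},
]

if __name__ == "__main__":
    # Example label sets; "nvidia-tesla-t4" is just an illustrative value.
    cpu_cos_node = {"cloud.google.com/gke-os-distribution": "cos"}
    gpu_cos_node = {**cpu_cos_node,
                    "cloud.google.com/gke-accelerator": "nvidia-tesla-t4"}
    print(node_selected(GIST_TERMS, cpu_cos_node))      # True
    print(node_selected(COMBINED_TERMS, cpu_cos_node))  # False
    print(node_selected(COMBINED_TERMS, gpu_cos_node))  # True
```

Whether the OR behaviour is intended is not stated in the gist. The sketch only makes the difference between the two layouts visible.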

adityaSharma369 commented Jul 3, 2020

I used the above code, but the DaemonSet never reaches the ready state:

NAMESPACE     NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR           
kube-system   nvidia-driver-installer-cos   1         1         0       1            0           <none>    

But if I remove NVIDIA_DRIVER_VERSION from env, it runs.


adityaSharma369 commented Jul 3, 2020

Can you please help me with this, @allanlei?

allanlei (gist author) commented Jul 5, 2020

@adityaSharma369 Unfortunately, you haven't provided enough information for me to help you. It would also be very beneficial to know how to manage and debug Kubernetes before using GPUs.
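
A first debugging step for a status like the one quoted above is to read the DaemonSet's ready count programmatically. The Python sketch below assumes its input is the JSON printed by `kubectl get daemonset -n kube-system -o json`, which wraps objects in an `items` list and reports `status.desiredNumberScheduled` and `status.numberReady`. The `SAMPLE` document is a hypothetical, trimmed stand-in that mirrors the quoted table and is not real cluster output.

```python
"""Sketch: list DaemonSets whose ready pod count lags the desired count.

Input is assumed to be `kubectl get daemonset ... -o json` output.
"""
import json
from typing import List, Tuple

# Hypothetical, trimmed sample mirroring the status table quoted above.
SAMPLE = """
{
  "kind": "List",
  "items": [
    {
      "metadata": {"name": "nvidia-driver-installer-cos",
                   "namespace": "kube-system"},
      "status": {"desiredNumberScheduled": 1,
                 "currentNumberScheduled": 1,
                 "numberReady": 0}
    }
  ]
}
"""


def lagging_daemonsets(raw_json: str) -> List[Tuple[str, str, int, int]]:
    """Return (namespace, name, desired, ready) for each lagging DaemonSet."""
    lagging = []
    for ds in json.loads(raw_json).get("items", []):
        status = ds.get("status", {})
        # Treat absent counters as zero so trimmed output still parses.
        desired = status.get("desiredNumberScheduled", 0)
        ready = status.get("numberReady", 0)
        if ready < desired:
            meta = ds["metadata"]
            lagging.append((meta["namespace"], meta["name"], desired, ready))
    return lagging


if __name__ == "__main__":
    for ns, name, desired, ready in lagging_daemonsets(SAMPLE):
        print(f"{ns}/{name}: {ready}/{desired} pods ready")
```

In this manifest the real work happens in the `nvidia-driver-installer` init container. The `pause` container only starts after that init container exits successfully, so 0 ready pods most likely means the installer is still running or has failed.

Its output can be read with `kubectl -n kube-system logs <pod-name> -c nvidia-driver-installer`.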


gtato commented May 6, 2022

I tested this with "510.47.03", but the setting has no effect: it still installs the default version, "450.119.04".
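
To confirm a report like this, compare the requested `NVIDIA_DRIVER_VERSION` with the version the node actually loaded. The Python sketch below assumes `smi_output` holds the text printed by `nvidia-smi --query-gpu=driver_version --format=csv,noheader` on a GPU node, one line per GPU. It compares versions as exact strings, which is an assumption about how the two values are formatted. The function names are illustrative only.

```python
"""Sketch: check whether the loaded NVIDIA driver matches the requested one."""
from typing import Set


def loaded_versions(smi_output: str) -> Set[str]:
    # One version string per GPU line; identical GPUs collapse to one entry.
    return {line.strip() for line in smi_output.splitlines() if line.strip()}


def driver_matches(requested: str, smi_output: str) -> bool:
    """True only if every GPU reports exactly the requested version."""
    versions = loaded_versions(smi_output)
    if not versions:
        raise ValueError("no driver version found in nvidia-smi output")
    return versions == {requested}


if __name__ == "__main__":
    # Hypothetical output matching the behaviour described in this comment.
    observed = "450.119.04\n"
    print(driver_matches("510.47.03", observed))  # False
```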
