@kozikow
Last active February 28, 2018 19:22
Ubuntu driver repro steps
# Create a single-node GKE cluster with one Tesla K80 GPU on the Ubuntu node image.
export DEV_CLUSTER_ZONE=europe-west1-b
export DEV_CLUSTER_NAME=debug-ubuntu-drivers
gcloud beta container clusters create \
    --accelerator=type=nvidia-tesla-k80,count=1 \
    --zone="$DEV_CLUSTER_ZONE" \
    --num-nodes=1 \
    --cluster-version=1.9.2-gke.1 \
    --machine-type=n1-standard-8 \
    --image-type=ubuntu \
    --scopes=https://www.googleapis.com/auth/devstorage.read_write \
    "$DEV_CLUSTER_NAME"
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset.yaml
daemonset "nvidia-driver-installer" created
kubectl get daemonsets --all-namespaces
NAMESPACE     NAME                       DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                              AGE
kube-system   fluentd-gcp-v2.0.10        1         1         1         1            1           beta.kubernetes.io/fluentd-ds-ready=true   16m
kube-system   nvidia-driver-installer    1         1         0         1            0           <none>                                     35s
kube-system   nvidia-gpu-device-plugin   1         1         1         1            1           <none>                                     16m
kubectl get pods --namespace=kube-system
NAME                                     READY     STATUS     RESTARTS   AGE
event-exporter-v0.1.7-7c4f8bb746-8tdgn   2/2       Running    0          26m
fluentd-gcp-v2.0.10-pbqll                2/2       Running    0          24m
heapster-v1.5.0-54ffbcdc75-95hm2         3/3       Running    0          26m
kube-dns-6cdf767cb8-b9882                4/4       Running    0          26m
kube-dns-autoscaler-69c5cbdcdd-t7vhd     1/1       Running    0          26m
kube-proxy-dev-kozikow-instance          1/1       Running    0          24m
kubernetes-dashboard-775fc9968-xww4v     1/1       Running    0          26m
l7-default-backend-57856c5f55-sxd27      1/1       Running    0          26m
metrics-server-v0.2.0-86585d9749-hgt89   2/2       Running    0          26m
nvidia-driver-installer-wzztn            0/1       Init:0/1   0          14m
nvidia-gpu-device-plugin-96fzc           1/1       Running    0          24m
kubectl logs -f nvidia-driver-installer-wzztn --namespace=kube-system -c nvidia-driver-installer
+ NVIDIA_DRIVER_VERSION=384.111
+ NVIDIA_DRIVER_DOWNLOAD_URL_DEFAULT=https://us.download.nvidia.com/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
+ NVIDIA_DRIVER_DOWNLOAD_URL=https://us.download.nvidia.com/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
+ NVIDIA_INSTALL_DIR_HOST=/home/kubernetes/bin/nvidia
+ NVIDIA_INSTALL_DIR_CONTAINER=/usr/local/nvidia
++ basename https://us.download.nvidia.com/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
+ NVIDIA_INSTALLER_RUNFILE=NVIDIA-Linux-x86_64-384.111.run
+ ROOT_MOUNT_DIR=/root
+ set +x
Downloading kernel sources...
Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Get:2 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:4 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [102 kB]
Get:5 http://security.ubuntu.com/ubuntu xenial-security/universe Sources [72.8 kB]
Get:6 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [584 kB]
Get:7 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [12.7 kB]
Get:8 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [403 kB]
Get:9 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [3486 B]
Get:10 http://archive.ubuntu.com/ubuntu xenial/universe Sources [9802 kB]
Get:11 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:12 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:13 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:14 http://archive.ubuntu.com/ubuntu xenial/multiverse amd64 Packages [176 kB]
Get:15 http://archive.ubuntu.com/ubuntu xenial-updates/universe Sources [240 kB]
Get:16 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [951 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [13.1 kB]
Get:18 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [760 kB]
Get:19 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [18.5 kB]
Get:20 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [5153 B]
Get:21 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [7168 B]
Fetched 25.0 MB in 2s (11.4 MB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  linux-gcp-headers-4.13.0-1007
The following NEW packages will be installed:
  linux-gcp-headers-4.13.0-1007 linux-headers-4.13.0-1007-gcp
0 upgraded, 2 newly installed, 0 to remove and 46 not upgraded.
Need to get 11.5 MB of archives.
After this operation, 84.5 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 linux-gcp-headers-4.13.0-1007 all 4.13.0-1007.10 [10.7 MB]
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 linux-headers-4.13.0-1007-gcp amd64 4.13.0-1007.10 [726 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 11.5 MB in 0s (43.4 MB/s)
Selecting previously unselected package linux-gcp-headers-4.13.0-1007.
(Reading database ... 9482 files and directories currently installed.)
Preparing to unpack .../linux-gcp-headers-4.13.0-1007_4.13.0-1007.10_all.deb ...
Unpacking linux-gcp-headers-4.13.0-1007 (4.13.0-1007.10) ...
Selecting previously unselected package linux-headers-4.13.0-1007-gcp.
Preparing to unpack .../linux-headers-4.13.0-1007-gcp_4.13.0-1007.10_amd64.deb ...
Unpacking linux-headers-4.13.0-1007-gcp (4.13.0-1007.10) ...
Setting up linux-gcp-headers-4.13.0-1007 (4.13.0-1007.10) ...
Setting up linux-headers-4.13.0-1007-gcp (4.13.0-1007.10) ...
Downloading kernel sources... DONE.
Configuring installation directories...
/usr/local/nvidia /
Updating container's ld cache...
kubectl get pod nvidia-driver-installer-wzztn --namespace=kube-system  --template '{{.status.initContainerStatuses}}'
[map[containerID:docker://f07791b34ff5b9487b18acbb52d09c7ca8e4b1b984c4e955d356c8ad0a680273 image:gcr.io/google-containers/ubuntu-nvidia-driver-installer@sha256:7ffaf40fcf6bcc5bc87501b6be295a47ce74e1f7aac914a9f3e6c6fb8dd780a4 imageID:docker-pullable://gcr.io/google-containers/ubuntu-nvidia-driver-installer@sha256:7ffaf40fcf6bcc5bc87501b6be295a47ce74e1f7aac914a9f3e6c6fb8dd780a4 lastState:map[] name:nvidia-driver-installer ready:false restartCount:0 state:map[running:map[startedAt:2018-02-28T19:07:31Z]]]]
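When re-running these steps, the pod stuck in its init container can be picked out of the `kubectl get pods` listing with a small filter. This is only a sketch: it assumes the default column layout shown in the pod listing above (NAME, READY, STATUS, ...), and the function name `stuck_init_pods` is hypothetical, not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper (not a kubectl feature): print the names of pods whose
# STATUS column starts with "Init:", i.e. pods still waiting on an init
# container. Reads `kubectl get pods` output on stdin and skips the header row.
stuck_init_pods() {
    awk 'NR > 1 && $3 ~ /^Init:/ { print $1 }'
}

# Example (requires access to the cluster):
#   kubectl get pods --namespace=kube-system | stuck_init_pods
```

Fed the listing above, the filter would match only the `nvidia-driver-installer` pod, since it is the only row whose status is `Init:0/1`.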