@ritazh
Forked from wbuchwalter/docker-gpu-howto.md
Created March 8, 2017 23:28

1. Create k8s cluster

  • Use my fork of ACS-Engine on the k8s-gpu-flag branch
  • Build and run ACS-Engine, and customize the Kubernetes example template to use NC6 VMs (example)
  • Create a resource group: az group create --name k8s --location southcentralus (or another region that offers GPU VMs)
  • Deploy the generated template: az group deployment create --resource-group k8s --template-file azuredeploy.json --parameters @azuredeploy.parameters.json
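The two az commands above can be wrapped in a small script so the resource group name and region are defined in one place. This is a minimal sketch, not part of the original steps: it assumes acs-engine has already written azuredeploy.json and azuredeploy.parameters.json into TEMPLATE_DIR, and the variable names and the deploy_gpu_cluster function are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the two az commands from step 1.
# RESOURCE_GROUP, LOCATION, TEMPLATE_DIR and deploy_gpu_cluster are hypothetical names.

RESOURCE_GROUP="${RESOURCE_GROUP:-k8s}"
LOCATION="${LOCATION:-southcentralus}"  # pick a region that offers NC-series (GPU) VMs
TEMPLATE_DIR="${TEMPLATE_DIR:-.}"       # directory holding the generated azuredeploy*.json files

deploy_gpu_cluster() {
  # Create the resource group; return its exit status if it fails.
  az group create --name "$RESOURCE_GROUP" --location "$LOCATION" || return
  # Deploy the ARM template and parameters file generated by acs-engine.
  az group deployment create \
    --resource-group "$RESOURCE_GROUP" \
    --template-file "$TEMPLATE_DIR/azuredeploy.json" \
    --parameters "@$TEMPLATE_DIR/azuredeploy.parameters.json"
}
```

Usage: source the file and call deploy_gpu_cluster, optionally overriding the defaults, e.g. RESOURCE_GROUP=k8s-gpu deploy_gpu_cluster. Recent Azure CLI releases deprecate az group deployment create in favor of az deployment group create, so the second command may need updating on a newer CLI.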

2. Install drivers

TODO: Improve this script so that it installs the NVIDIA libraries and binaries into a dedicated folder. That would make it much easier to expose the drivers to the container in the next step.
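One way to approach this TODO without modifying the install script itself is to copy the driver's user-space files into a single directory after installation. The sketch below is hypothetical: the collect_nvidia_driver function, the /opt/nvidia-driver destination, and the file patterns (nvidia-* binaries, libcuda.so* and libnvidia-*.so* libraries) are assumptions about what the driver installs, so check them against your driver version.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the TODO above: gather NVIDIA driver files into one
# directory so that only this directory has to be mounted into GPU containers.
# Usage: collect_nvidia_driver /usr/bin /usr/lib/x86_64-linux-gnu /opt/nvidia-driver
collect_nvidia_driver() {
  local src_bin="$1" src_lib="$2" dest="$3"
  mkdir -p "$dest/bin" "$dest/lib" || return
  # Driver utilities such as nvidia-smi (assumed to be named nvidia-*).
  find "$src_bin" -mindepth 1 -maxdepth 1 -name 'nvidia-*' \
    -exec cp -a {} "$dest/bin/" \; || return
  # User-space driver libraries; cp -a keeps versioned symlinks as symlinks.
  find "$src_lib" -mindepth 1 -maxdepth 1 \
    \( -name 'libcuda.so*' -o -name 'libnvidia-*.so*' \) \
    -exec cp -a {} "$dest/lib/" \;
}
```

A pod could then mount only /opt/nvidia-driver instead of the host's whole /usr/bin and /usr/lib/x86_64-linux-gnu; the container would probably also need that bin directory on its PATH and the lib directory on its LD_LIBRARY_PATH.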

3. Scheduling a GPU container

  • You need to specify alpha.kubernetes.io/nvidia-gpu: 1 as both a limit and a request.
  • You need to expose the drivers to the container as volumes. If you are using the stock TensorFlow Docker image, it is based on Ubuntu 16.04, just like your cluster's VMs, so you can simply mount the host's /usr/bin and /usr/lib/x86_64-linux-gnu. It's a bit dirty, but it works. Ideally, improve the install script from the previous step so it puts the driver in a dedicated directory, and expose only that directory.
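
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  # Run nvidia-smi once; with the default policy (Always) the container is restarted after it exits.
  restartPolicy: Never
  containers:
  - image: nvidia/cuda
    name: nvidia-smi
    args:
      - nvidia-smi
    resources:
      # The GPU count must be set as both a limit and a request (alpha resource name).
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
      requests:
        alpha.kubernetes.io/nvidia-gpu: 1
    volumeMounts:
    # Mount the host's driver binaries (e.g. nvidia-smi) and libraries into the container.
    - mountPath: /usr/bin/
      name: binaries
    - mountPath: /usr/lib/x86_64-linux-gnu
      name: libraries
  volumes:
  - name: binaries
    hostPath:
      path: /usr/bin/
  - name: libraries
    hostPath:
      path: /usr/lib/x86_64-linux-gnu
```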
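To check that the GPU pod actually gets scheduled, create it and read nvidia-smi's output from its logs. A minimal sketch, assuming the manifest is saved as nvidia-smi.yaml (a hypothetical file name), kubectl points at the new cluster, and the pod spec sets restartPolicy: Never so its phase ends in Succeeded or Failed; run_gpu_smoke_test and the 60-second default timeout are also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: create the nvidia-smi pod, wait for it to finish, print its output.
run_gpu_smoke_test() {
  local manifest="${1:-nvidia-smi.yaml}" timeout_s="${2:-60}" waited=0 phase
  kubectl create -f "$manifest" || return
  # Poll the pod phase until nvidia-smi has exited or the timeout expires.
  while :; do
    phase=$(kubectl get pod nvidia-smi -o jsonpath='{.status.phase}')
    case "$phase" in
      Succeeded|Failed) break ;;
    esac
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "pod nvidia-smi still '$phase' after ${timeout_s}s; see kubectl describe pod nvidia-smi" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  # nvidia-smi's report ends up in the container logs.
  kubectl logs nvidia-smi
}
```

A pod stuck in Pending usually means no node advertises alpha.kubernetes.io/nvidia-gpu; the scheduling events in kubectl describe pod nvidia-smi show why.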

4. Limitations

  • My acs-engine fork uses the currently released version of Kubernetes, which only supports one GPU per node. I plan to upgrade soon to the 1.6 beta, which adds multi-GPU support, but haven't had time yet.