mhamilton723/Tensorflow Serving with a GPU Kubernetes cluster on Azure.md

## Tensorflow Serving with a GPU Kubernetes cluster on Azure.md

      
Tensorflow Serving with a GPU Kubernetes cluster on Azure

Prerequisites


az command line version >= 2.0.23
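
This prerequisite is easy to misjudge by eye, because comparing version strings as plain text would rank 2.0.9 above 2.0.23. Below is a minimal sketch of a numeric comparison. It assumes the installed version has already been read from the output of `az --version` and is passed in as a plain dotted string; the function names are hypothetical.

```python
def version_tuple(version):
    """Turn a dotted version string such as '2.0.23' into (2, 0, 23)."""
    return tuple(int(part) for part in version.strip().split("."))


def az_is_recent_enough(installed, required="2.0.23"):
    """Return True when `installed` is at least `required`.

    Assumption: `installed` is a purely numeric dotted version that the
    caller copied from `az --version`; suffixes like 'rc1' are not handled.
    """
    return version_tuple(installed) >= version_tuple(required)
```

Comparing tuples of integers orders each component numerically, which is why 2.0.9 correctly counts as older than 2.0.23.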

Making the Kubernetes cluster

First, make a resource group to house your deployment. Note that as of 12/22/2017 this only works in the westus2 and uksouth regions, because of where the Nvidia drivers are deployed.
az group create -n gpu-cluster-rg -l westus2 
Spin up a cluster:
az acs create \
    -n gpu-cluster \
    --orchestrator-type Kubernetes \
    -g gpu-cluster-rg \
    --agent-vm-size Standard_NC6 \
    --generate-ssh-keys \
    -l westus2
Connect to your cluster with kubectl

Install the kubectl cli:
az acs kubernetes install-cli
Get the cluster's credentials:
az acs kubernetes get-credentials --resource-group=gpu-cluster-rg --name=gpu-cluster

Quick sanity check:

To make sure that you are connected to your cluster and that the drivers are in working condition, check the following:
kubectl get nodes | grep agentpool

Should return something like:
k8s-agentpool0-98822346-0   Ready     agent     24m       v1.7.9

We can now check to make sure the cluster has the proper drivers:
kubectl describe node k8s-agentpool0-98822346-0 | grep nvidia-gpu

Should return something like:
 alpha.kubernetes.io/nvidia-gpu:  1
 alpha.kubernetes.io/nvidia-gpu:  1

The first line is your overall capacity, and the second line is the number of GPUs you have available (this will be zero if you have GPU jobs running). If your first line is alpha.kubernetes.io/nvidia-gpu:  0, you might not be in a region that has the latest GPU drivers, or your az command line might not be recent enough.
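
The check described above can be scripted once the output of `kubectl describe node` has been captured as text. The following is a minimal sketch, not part of the original walkthrough. The function name is hypothetical, and the names `capacity` and `available` simply follow the description in the paragraph above.

```python
import re

GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def gpu_counts(describe_output):
    """Return (capacity, available) parsed from `kubectl describe node` text.

    Assumption: the resource appears at least twice, first as the overall
    capacity and then as the available count, as in the sample output above.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(GPU_RESOURCE) + r":\s*(\d+)\s*$", re.MULTILINE
    )
    values = [int(match) for match in pattern.findall(describe_output)]
    if len(values) < 2:
        raise ValueError("expected two %s lines, found %d" % (GPU_RESOURCE, len(values)))
    return values[0], values[1]
```

A capacity of 0 corresponds to the driver or region problem mentioned above, while an available count of 0 with a non-zero capacity only means the GPUs are busy.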
Using kubectl's UI

You can open a graphical manager for your Kubernetes cluster using the following:
kubectl proxy

This will serve a website at localhost:8001. You can configure the port with the --port flag. Now you can navigate to localhost:8001/ui to see the manager.
Tensorflow Serving

Tensorflow Serving is a library for deploying Tensorflow models efficiently. The project supplies a GPU-enabled Dockerfile that, as of 12/22/2017, does not compile. We have built and published an earlier version of this Docker image so you can jump straight to the deployment.
Here is a yaml file for a simple tf-serving deployment. To deploy your own model, zip up an exported Tensorflow model and host it online. We use Azure blob storage and generate SAS URLs. Paste whatever URL you use in place of <YOUR_MODEL_URL> in the yaml.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tf-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tf-server
    spec:
      volumes:
        - name: bin
          hostPath: 
            path: /usr/lib/nvidia-384/bin
        - name: lib
          hostPath:
            path: /usr/lib/nvidia-384
        - name: libcuda
          hostPath:
            path: /usr/lib/x86_64-linux-gnu/libcuda.so.1
      containers:
      - name: tf-container
        image: mhamilton723/tensorflow-serving-devel-gpu
        command: ["/bin/sh", "-c"]
        args: ["MODEL_URL=\"<YOUR_MODEL_URL>\";
          MODEL_NAME=saved_model;
          PORT=9000;
          cd /serving;
          mkdir models;
          ZIP_FILE=\"model.zip\";
          curl -o \"$ZIP_FILE\" \"$MODEL_URL\";
          echo \"HERE\";
          python -m zipfile -e \"$ZIP_FILE\" /serving/models/1;
          rm \"$ZIP_FILE\";
          ls -l /serving/models/;
          tensorflow_model_server --port=\"$PORT\" --model_name=\"$MODEL_NAME\" --model_base_path=\"/serving/models/\""]
        ports:
        - containerPort: 9000
        resources:
          limits:
            alpha.kubernetes.io/nvidia-gpu: 1
        volumeMounts:
        - mountPath: /usr/local/nvidia/bin
          name: bin
        - mountPath: /usr/local/nvidia/lib64
          name: lib
        - mountPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
          name: libcuda

---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: tf-service
  name: tf-service
spec:
  ports:
  - port: 9000
    targetPort: 9000
  selector:
    app: tf-server
  type: LoadBalancer
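
The container above unpacks the zip into /serving/models/1 and points tensorflow_model_server at /serving/models/. The exported model's files (typically saved_model.pb and the variables folder) therefore most likely need to sit at the root of the archive rather than inside a wrapper folder. Below is a minimal sketch of building such an archive, assuming the export lives in a local directory; the function name and paths are hypothetical.

```python
import os
import zipfile


def zip_exported_model(export_dir, zip_path):
    """Zip the contents of export_dir so they sit at the archive root.

    Assumption: zip_path is outside export_dir, so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(export_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to export_dir, not to the filesystem root.
                archive.write(full_path, os.path.relpath(full_path, export_dir))
```

For example, `zip_exported_model("export/1", "model.zip")` would produce a file that can be uploaded and referenced by the model URL.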

Once you have modified the above yaml file to point to your model, save it to a file, such as tf-serving.yaml. Then you can deploy it to the cluster with
kubectl create -f tf-serving.yaml
Note that it takes about 1 minute to boot up the servers on the nodes and around 5 minutes to create the service endpoint.
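
Because the endpoint takes several minutes to appear, a script that deploys and then immediately queries the service needs to wait. The following is a minimal polling sketch under stated assumptions: `check` is any callable supplied by the caller (for instance one that runs `kubectl get services` and returns the external IP once it is assigned), and the helper name and default timings are hypothetical.

```python
import time


def wait_until(check, timeout_s=600.0, interval_s=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or timeout_s elapses.

    Returns the truthy value, or raises TimeoutError. The sleep and clock
    parameters exist only so the helper can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout_s)
        sleep(interval_s)
```

The default timeout of ten minutes is deliberately looser than the roughly five minutes quoted above.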
Calling your API

Grab the IP address of your new service using
kubectl get services
You should see an output like the following:
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)          AGE
kubernetes   ClusterIP      10.0.0.1       <none>          443/TCP          1d
tf-service   LoadBalancer   10.0.166.211   <YOUR_IP>       9000:30242/TCP   7m

<YOUR_IP> is the external IP address you will call in order to query your model.
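
Reading the address out of that table and confirming the port answers can also be scripted. Below is a minimal sketch, assuming the text of `kubectl get services` has already been captured. The function names are hypothetical, and the port check only confirms that something accepts TCP connections on port 9000, not that the model has loaded.

```python
import socket


def external_ip(services_output, service_name="tf-service"):
    """Return the EXTERNAL-IP of a service, or None if it is not assigned yet."""
    rows = [line.split() for line in services_output.splitlines() if line.strip()]
    if not rows or "EXTERNAL-IP" not in rows[0]:
        raise ValueError("expected the header row of `kubectl get services`")
    column = rows[0].index("EXTERNAL-IP")
    for fields in rows[1:]:
        if fields[0] == service_name:
            value = fields[column]
            # Placeholders such as <none> or <pending> mean there is no address yet.
            return None if value.startswith("<") else value
    raise KeyError(service_name)


def port_is_open(host, port=9000, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

The server started with --port speaks gRPC, so an actual prediction request still needs a gRPC client built against the model's own input signature.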
Authors


Mark Hamilton, marhamil@microsoft.com
Andrew Shonhoffer

Thanks to


William Buchwalter