GPU demos with OpenShift using Caffe2


Environment


The RHPDS catalog item you provisioned is a single-node OpenShift environment
backed by an Amazon P-type EC2 instance with one NVIDIA GPU. It is a
100% vanilla/standard OpenShift Container Platform 3.10 installation.
After installation, a few additional steps were performed using Ansible
content from the https://github.com/redhat-performance/openshift-psap
repository:


The NVIDIA hardware driver for the GPU was installed on the host

The driver provides kernel support for using the GPU.


The NVIDIA runtime hook was installed

It is a prestart hook, which means it runs right before the container's process (ENTRYPOINT/CMD)
starts. The hook bind-mounts the NVIDIA binaries, libraries, device files, and configuration files
into the container, making available the files needed to run a CUDA workload.


The NVIDIA device plugin daemonset was installed

The device plugin runs as a DaemonSet on every node in the environment and discovers the GPUs on each node.
It then advertises them as an allocatable resource (nvidia.com/gpu) that pods can request.
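The driver and device plugin described above can be spot-checked from the host after you sudo -i to root. This is a minimal sketch: check_gpu_setup is a hypothetical helper, not part of the openshift-psap content, and it assumes nvidia-smi was installed together with the driver and that the device plugin DaemonSet has "nvidia" in its name. It does not check the runtime hook.

```shell
# Hypothetical helper: spot-check the post-install GPU setup on the host.
# Assumes nvidia-smi was installed together with the NVIDIA driver and that
# the device plugin DaemonSet name contains "nvidia".
check_gpu_setup() {
  echo "== GPUs visible to the NVIDIA driver =="
  # nvidia-smi -L prints one line per GPU the driver can see.
  nvidia-smi -L || return 1

  echo "== NVIDIA device plugin DaemonSet =="
  # grep exits non-zero when no DaemonSet matches, which fails the check.
  oc get daemonsets --all-namespaces | grep -i nvidia || return 1
}
```

Run check_gpu_setup as root; a non-zero exit status points at whichever section printed last.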


After these steps, a test workload (CUDA vector add) was
deployed. This pod ran once and terminated, but it remains in the
nvidia-device-plugin OpenShift project so that you can look at its logs.


Demonstrate GPU Availability


The email you received from RHPDS provided SSH access information. Once you
SSH in to the host, you can sudo -i to root. Then you can demonstrate that
Kubernetes/OpenShift recognizes that your node has GPU capacity
with the following command:


oc describe $(oc get node -o name) | grep Capacity -A12


You should see a capacity of 1 and an allocatable count of 1 for the
nvidia.com/gpu resource. Fractional consumption of GPUs is not currently
supported: each GPU is exposed as a single, whole device, and workloads
must request GPUs in whole numbers.
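If you only want the GPU numbers rather than the whole Capacity section, oc can print selected fields with a JSONPath template. A minimal sketch, assuming your oc client supports Kubernetes JSONPath output (it shares this with kubectl); gpu_resources is a hypothetical helper name. The dot inside the resource name nvidia.com/gpu is escaped with a backslash so that JSONPath does not read it as a path separator.

```shell
# Hypothetical helper: print "<node> capacity=<n> allocatable=<n>" for each node.
# The backslash in nvidia\.com/gpu escapes the dot in the resource name so
# JSONPath treats "nvidia.com/gpu" as a single key.
gpu_resources() {
  oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" capacity="}{.status.capacity.nvidia\.com/gpu}{" allocatable="}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}
```

On this single-node cluster you would expect one line reporting capacity=1 and allocatable=1.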


Note


There is upstream work being done on slicing / "virtualizing" GPUs, but it is still in its early phases.
The following Kubernetes issue tracks that design work and our involvement:
kubernetes/kubernetes#52757
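From the workload side, a pod asks for GPUs through the nvidia.com/gpu extended resource in its container resource limits, and Kubernetes only accepts whole numbers for extended resources. The sketch below is a hypothetical helper (gpu_pod_manifest, its arguments, and the image you pass are illustrative); it prints a minimal Pod manifest and refuses values such as 0.5 before they reach the API server.

```shell
# Hypothetical helper: print a Pod manifest that requests whole GPUs.
# Usage: gpu_pod_manifest <pod-name> <image> <gpu-count>
gpu_pod_manifest() {
  local name="$1" image="$2" gpus="$3"
  # nvidia.com/gpu is an extended resource: only positive whole numbers make sense.
  case "$gpus" in
    ''|*[!0-9]*|0) echo "gpu-count must be a positive integer" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
  - name: ${name}
    image: ${image}
    resources:
      limits:
        nvidia.com/gpu: ${gpus}
EOF
}
```

For example, gpu_pod_manifest gpu-test <your-cuda-image> 1 | oc create -f - would schedule a pod onto the node's single GPU; asking for 2 on this cluster would leave the pod Pending.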


Login to the Cluster Web Console


The email you received from RHPDS provided SSH access information. Your
cluster web console is available at the same hostname, on port 8443:


https://bastion.GUID.openshiftworkshop.com:8443


Note


The environment does not currently use Let's Encrypt, so you will need to accept
the self-signed certificate.


Log in as gpu-user with any password. The cluster is set up to use
AnyPassword, which means you can log in as any user; however, only
gpu-user has been given cluster-admin rights, so make sure you use gpu-user.
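The same account works from the oc CLI on your own machine. A minimal sketch, assuming GUID is the identifier from your RHPDS email; login_gpu_user is a hypothetical helper name. Because the cluster accepts any password, the value passed is arbitrary, and because the certificate is self-signed, TLS verification is skipped, which is acceptable only for a throwaway demo environment.

```shell
# Hypothetical helper: log in to the demo cluster as gpu-user from the CLI.
# $1 is the GUID from the RHPDS email.
login_gpu_user() {
  local guid="$1"
  # Any password is accepted (AnyPassword); the certificate is self-signed,
  # so TLS verification is disabled for this demo cluster only.
  oc login "https://bastion.${guid}.openshiftworkshop.com:8443" \
    --username=gpu-user \
    --password=any-password \
    --insecure-skip-tls-verify=true
}
```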


Demonstrate GPU Consumption - CUDA Vector


The aforementioned CUDA vector pod has been deployed in the
nvidia-device-plugin project.


Navigate to that project in the web console.


Hover over Applications in the left navigation


Click Pods


You will find the terminated CUDA pod (cuda-vector-add). Click into the
pod, then go to the Logs tab and show that the pod was successful (Test
PASSED). This means that the program inside the pod was able to access
the NVIDIA GPU. It is a very basic demonstration that GPUs can, in fact,
be consumed by workloads running in OpenShift pods.
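The same check can be done from the SSH session without the web console. A minimal sketch; cuda_vector_passed is a hypothetical helper, and it assumes the pod is named exactly cuda-vector-add, as shown in the console.

```shell
# Hypothetical helper: succeed if the cuda-vector-add pod log reports success.
cuda_vector_passed() {
  # grep -q exits 0 only when "Test PASSED" appears in the log.
  oc logs -n nvidia-device-plugin pod/cuda-vector-add | grep -q 'Test PASSED'
}
```

For example: cuda_vector_passed && echo "GPU reachable from the pod"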


Caffe2 Application


The rest of the demo uses a Caffe2 (https://caffe2.ai)
workload that runs in a Jupyter (https://jupyter.org) notebook.


Deploy YAML


The following YAML defines a Pod and a Service for the application. You can load it with the Import YAML function in the web console:


https://raw.githubusercontent.com/thoraxe/openshift-psap/ocp-311-tweaks/playbooks/roles/gpu-pod/caffe2-1gpu.yaml
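If you prefer the CLI, oc create -f also accepts a URL. A minimal sketch, assuming you want the objects in the nvidia-device-plugin project (where the token scriptlet later looks for the caffe2 pod) and that the YAML does not pin a different namespace; deploy_caffe2 is a hypothetical helper name.

```shell
# Raw URL of the Pod + Service definition for the Caffe2 notebook.
CAFFE2_YAML_URL='https://raw.githubusercontent.com/thoraxe/openshift-psap/ocp-311-tweaks/playbooks/roles/gpu-pod/caffe2-1gpu.yaml'

# Hypothetical helper: create the Pod and Service straight from the URL.
deploy_caffe2() {
  oc create -n nvidia-device-plugin -f "$CAFFE2_YAML_URL"
}
```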


Expose Service


Once the app is deployed, expose it with a Route, using either the CLI
or the web console. The Service is simply called caffe2.
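From the CLI, oc expose creates a Route for an existing Service; by default the Route takes the Service's name and OpenShift generates its hostname. A minimal sketch, assuming the Service lives in the nvidia-device-plugin project; expose_caffe2 is a hypothetical helper name.

```shell
# Hypothetical helper: expose the caffe2 Service and print the Route's hostname.
expose_caffe2() {
  # Creates a Route named caffe2 that points at the caffe2 Service.
  oc expose service caffe2 -n nvidia-device-plugin || return 1
  # .spec.host holds the generated external hostname of the Route.
  oc get route caffe2 -n nvidia-device-plugin -o jsonpath='{.spec.host}{"\n"}'
}
```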


Note


The Caffe2 image is not currently pre-pulled, so while it downloads, this is a
good time to visit the Caffe2 site and talk about what Caffe2 actually is.


Obtain the Token


The Jupyter notebook server requires an access token by default.
Once the app is deployed and exposed, run the following scriptlet in the
SSH session you used earlier to look at the GPU resources. It prints a URL,
including the token, that opens the application directly:


ROUTE=$(oc get routes -n nvidia-device-plugin | grep caffe2 | awk '{print $2}')
TOKEN=$(oc logs -n nvidia-device-plugin pod/caffe2 | grep -m1 'token=' | awk -Ftoken= '{print $2}')
echo "http://$ROUTE/notebooks/caffe2/caffe2/python/tutorials/MNIST.ipynb?token=$TOKEN"


Visit the URL that is output from the scriptlet.
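Because the Caffe2 image is not pre-pulled, the scriptlet may run before Jupyter has started and print an empty token. A minimal sketch of a polling helper, assuming the pod is named caffe2 in nvidia-device-plugin (as in the scriptlet) and that the token is alphanumeric; wait_for_token and its retry defaults are hypothetical.

```shell
# Hypothetical helper: poll the caffe2 pod log until Jupyter prints its token.
# Usage: wait_for_token [max_attempts] [sleep_seconds]
wait_for_token() {
  local attempts="${1:-60}" delay="${2:-5}" i=0 token
  while [ "$i" -lt "$attempts" ]; do
    # oc logs fails while the container is still being created; ignore that.
    token=$(oc logs -n nvidia-device-plugin pod/caffe2 2>/dev/null \
      | grep -o 'token=[[:alnum:]]*' | head -n 1 | cut -d= -f2)
    if [ -n "$token" ]; then
      echo "$token"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no token found after $attempts attempts" >&2
  return 1
}
```

For example, TOKEN=$(wait_for_token) blocks for up to about five minutes with the default settings.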


Execute the Notebook


Jupyter notebooks make it easy to share and describe code. A notebook is made
up of cells, each containing either executable code or descriptive text. In the
notebook you opened:


Click Kernel, then click Restart & Clear Output.


Then click Kernel again, then click Restart & Run All.


Jupyter restarts the kernel and executes every code cell in the
notebook in order.


About the Notebook


Most ML frameworks (Caffe, Caffe2, MXNet, Torch, etc.) follow the same general workflow, and most
examples (MNIST, CIFAR-10, …) follow it regardless of framework; only the API and the dataset
change:


Get your dataset: The example uses the MNIST dataset, a collection of 70,000 images of handwritten
digits split into 60,000 training samples and 10,000 test samples.


Create the data format the framework understands: Caffe2 uses the LMDB format here. The notebook downloads
databases that were already converted to this native format in another notebook (not demonstrated).


Note


See MNIST_Dataset_and_Databases.ipynb for more information.


Create the model: The example uses the LeNet model, a convolutional neural network (CNN).
It is a standard architecture for visual recognition tasks.


Train the model: The configured model (input, layers, output) is fed training data until it reaches
a target accuracy (in benchmarks this is measured as time to accuracy, TTA) or completes a set number of
epochs/iterations. The example uses 200 iterations with a batch size of 64, so the model is trained on
200 × 64 = 12,800 samples.


Test the model: After training, the model is evaluated against the 10,000 test images and the test
accuracy is reported.


Inference: The saved model can now be used to run inference on other handwritten digits.
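The training numbers above can be checked with a little shell arithmetic: samples seen = iterations × batch size, and dividing by the 60,000 training images gives the fraction of one epoch. samples_seen and epoch_fraction are illustrative helper names.

```shell
# Samples processed during training: iterations x batch size.
samples_seen() { echo $(( $1 * $2 )); }

# Fraction of one epoch covered: samples seen / training-set size (3 decimals).
epoch_fraction() { awk -v s="$1" -v n="$2" 'BEGIN { printf "%.3f\n", s / n }'; }
```

samples_seen 200 64 prints 12800, and epoch_fraction 12800 60000 prints 0.213, so the demo run does not complete even one full pass over the training set.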


The compute-intensive part of the whole pipeline is, as you might have guessed, training the
underlying model. Training consists mainly of matrix multiplications on tensors, in both the forward
and the backward pass. Because every neuron in every layer requires many computations, these
multiplications can be massively parallelized on a GPU.
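To put a rough number on "many computations": a fully connected layer that maps a batch of B inputs of size N to M outputs is a (B×N)·(N×M) matrix multiplication, costing B·N·M multiply-accumulates (MACs) in the forward pass. The backward pass needs about twice that, one product for the input gradients and one for the weight gradients. The B·M output values are independent dot products, which is what a GPU computes in parallel. The layer sizes in the usage below are illustrative assumptions, not read from the notebook; fc_macs and fc_train_macs are hypothetical helper names.

```shell
# Forward-pass MACs for a fully connected layer: batch B, N inputs, M outputs.
fc_macs() { echo $(( $1 * $2 * $3 )); }

# One training step for that layer: forward (1x) + backward (about 2x) = about 3x forward.
fc_train_macs() {
  local fwd
  fwd=$(fc_macs "$1" "$2" "$3")
  echo $(( 3 * fwd ))
}
```

With a batch of 64 and an assumed 800-input, 500-output layer, fc_macs 64 800 500 prints 25600000 and fc_train_macs 64 800 500 prints 76800000, for a single layer in a single training step.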