Setup Nvidia Docker and TensorFlow Model Server with GPU on Ubuntu (for CUDA 9)

Install Docker

Reference: Docker's official installation guide for Ubuntu (https://docs.docker.com/engine/install/ubuntu/)

# prerequisites for the commands below (if not already installed)
sudo apt-get update
sudo apt-get install -y curl software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce

# check docker status
sudo systemctl status docker
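
Optionally, as a quick sanity check that the engine can pull and run containers, try Docker's stock hello-world image:

# should print a "Hello from Docker!" message and exit
sudo docker run --rm hello-world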


Changing default docker container location (Optional)

Stop docker

sudo systemctl stop docker

Edit /etc/docker/daemon.json

sudo vim /etc/docker/daemon.json

And then add the following lines:

{
  "graph":"/PATH/TO/YOUR/CONTAINER/STORAGE"
}
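
Note: newer Docker releases (17.05+) deprecate the "graph" key in favour of "data-root", which takes the same value:

{
  "data-root":"/PATH/TO/YOUR/CONTAINER/STORAGE"
}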

Start Docker

sudo systemctl start docker
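
To confirm Docker picked up the new location, check the storage root it reports:

# "Docker Root Dir" should now show your custom path
sudo docker info | grep "Docker Root Dir"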

Install nvidia docker

Reference: the nvidia-docker repository (https://github.com/NVIDIA/nvidia-docker)

# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
sudo docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# restart service
sudo systemctl restart docker

# Test nvidia-smi with the latest official CUDA image
sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

Check if nvidia driver works

# The test above downloaded a docker image called 'nvidia/cuda'
# we will now run a bash shell inside that image

sudo docker images
# Copy the 'IMAGE ID' of the docker 'nvidia/cuda'

# run the docker in bash
sudo docker run --runtime=nvidia -it IMAGE_ID bash

# Once inside the container, the prompt will look like "root@670724d97363:   "
# Now test that the nvidia drivers are working
nvidia-smi

# It should show something like this
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 390.116                Driver Version: 390.116                   |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  GeForce GT 1030     Off  | 00000000:1D:00.0  On |                  N/A |
# | 36%   44C    P8    N/A /  30W |    286MiB /  1998MiB |      1%      Default |
# +-------------------------------+----------------------+----------------------+
#                                                                                
# +-----------------------------------------------------------------------------+
# | Processes:                                                       GPU Memory |
# |  GPU       PID   Type   Process name                             Usage      |
# |=============================================================================|
# +-----------------------------------------------------------------------------+

# Then exit
exit

Download Tensorflow serving GPU

Note: pulling TF Serving 1.12.0 (built against CUDA 9).
Reference: the TensorFlow Serving Docker docs; pick a version from the 'tensorflow/serving' tags on Docker Hub.

sudo docker pull tensorflow/serving:1.12.0-gpu

You will now have a docker image named 'tensorflow/serving':

sudo docker images
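
Aside: if you only need to serve a single model, you can skip the image-customisation steps below and bind-mount the model from the host instead. A minimal sketch, assuming a SavedModel under /path/to/my_model on the host (the path and model name are placeholders):

# the serving image looks for models under /models/<MODEL_NAME> by default
sudo docker run --runtime=nvidia --rm -p 8501:8501 \
    -v /path/to/my_model:/models/my_model \
    -e MODEL_NAME=my_model \
    tensorflow/serving:1.12.0-gpu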

Copying your custom model or tensorflow model script to the docker image

The 'tensorflow/serving' docker image has a default ENTRYPOINT that runs the 'tensorflow_model_server' command when the container starts.
To get a shell instead, we need to override the ENTRYPOINT when running it.

# First copy the IMAGE ID of the 'tensorflow/serving' image
sudo docker run --runtime=nvidia --entrypoint bash -it 'IMAGE_ID_tensorflow/serving'

Keep the container running and open a new terminal, so that we can copy our models into it.

# Get the Container ID of the running docker
sudo docker ps
# Copy the CONTAINER ID

# Now copy your models into it
sudo docker cp <SOURCE_MODEL_PATH> <CONTAINER_ID>:<DESTINATION_MODEL_PATH>

# Once you have finished copying, save the changes by committing the container
sudo docker commit <CONTAINER_ID> <your_repo_name>:<tag_name>
# here <your_repo_name> --> tensorflow/serving
# <tag_name> --> works like a commit name

Once completed you can exit from the docker container using:

exit
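
A sketch of how the committed image might then be run with its default ENTRYPOINT, assuming the model was copied to /models/my_model inside the container (the model name is illustrative; 8500 is gRPC, 8501 is REST):

sudo docker run --runtime=nvidia --rm -p 8500:8500 -p 8501:8501 \
    -e MODEL_NAME=my_model \
    tensorflow/serving:<tag_name>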

If your custom tensorflow model server needs to run a script (usually to serve multiple models from one model server), follow this (a sketch of such a script is given after the list):

  1. Follow the same steps above to copy your script into the container
  2. Make the script executable (inside the container):
    chmod +x script_path.sh
  3. Then save the container with docker commit, as above
  4. Exit from the running container
  5. Get the IMAGE ID of the newly committed image
  6. Then run the container:
    sudo docker run --runtime=nvidia -it --entrypoint=<PATH_TO_THE_SCRIPT> IMAGE_ID
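
As an illustration, a minimal multi-model script of this kind might just start the model server with a model config file. The file names and model names below are assumptions, not part of the original setup:

#!/bin/bash
# hypothetical serve_models.sh: start the server with a multi-model config.
# /models/models.config is a TF Serving model config file (text proto), e.g.:
#   model_config_list {
#     config { name: 'model_a' base_path: '/models/model_a' model_platform: 'tensorflow' }
#     config { name: 'model_b' base_path: '/models/model_b' model_platform: 'tensorflow' }
#   }
tensorflow_model_server --port=8500 --rest_api_port=8501 \
    --model_config_file=/models/models.config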