@qin-yu
Last active April 28, 2024 19:39
Use TensorFlow with GPU support on Ubuntu with Docker

Docker is the easiest way to run TensorFlow on a GPU since the host machine only requires the NVIDIA® driver (the NVIDIA® CUDA® Toolkit is not required).

System tested

  • Ubuntu 18.04.1
  • NVRM 435.21
  • GCC 7.5.0
  • Docker 19.03.8

Install Docker Engine - Community (using the repository)

Set up the repository

  1. Update the apt package index and install packages that allow apt to use a repository over HTTPS:
    $ sudo apt-get update
    $ sudo apt-get install \
        apt-transport-https \
        ca-certificates \
        curl \
        gnupg-agent \
        software-properties-common
  2. Add Docker’s official GPG key:
    $ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    $ sudo apt-key fingerprint 0EBFCD88
  3. Set up the stable repository:
    $ sudo add-apt-repository \
       "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
       $(lsb_release -cs) \
       stable"

Install Docker Engine - Community

  1. Install the latest version of Docker Engine - Community and containerd:
    $ sudo apt-get update
    $ sudo apt-get install docker-ce docker-ce-cli containerd.io
  2. Verify that Docker Engine - Community is installed correctly:
    $ sudo docker run hello-world

The Docker daemon binds to a Unix socket instead of a TCP port. By default that Unix socket is owned by the user root and other users can only access it using sudo. The Docker daemon always runs as the root user. If you don’t want to preface the docker command with sudo, create a Unix group called docker and add users to it. When the Docker daemon starts, it creates a Unix socket accessible by members of the docker group.

  1. Create the docker group:
    $ sudo groupadd docker
  2. Add your user to the docker group:
    $ sudo usermod -aG docker $USER
  3. Log out and log back in so that your group membership is re-evaluated. On Linux, you can also run the following command to activate the changes to groups:
    $ newgrp docker
  4. Verify that you can run docker commands without sudo:
    $ docker run hello-world
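After logging back in, you can confirm the group change took effect before trying docker itself; this small sketch only inspects your current group list (`id -nG` prints the groups of the current user):

```shell
# Report whether the current user's groups include "docker" (works without sudo)
if id -nG | grep -qw docker; then
  echo "docker group: yes"
else
  echo "docker group: no"
fi
```

If it prints "docker group: no" after a fresh login, the usermod step above did not apply.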

For GPU support on Linux, install NVIDIA Docker support

Make sure you have installed the NVIDIA driver and Docker 19.03 for your Linux distribution. Note that you do not need to install the CUDA Toolkit on the host, but the driver does need to be installed.

  1. Verify driver version:
    $ cat /proc/driver/nvidia/version
    The output could be:
    NVRM version: NVIDIA UNIX x86_64 Kernel Module  435.21  Sun Aug 25 08:17:57 CDT 2019
    GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
  2. Verify the CUDA Toolkit version:
    $ nvcc -V
    This may print Command 'nvcc' not found, which is expected: the CUDA Toolkit is not needed on the host.

Install NVIDIA Container Toolkit

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
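The `distribution` variable above is built by sourcing `/etc/os-release`; you can inspect what it resolves to on your machine before adding the repository (on the system tested it would be `ubuntu18.04`):

```shell
# Resolve the distribution string used in the nvidia-docker repository URL
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
echo "$distribution"
```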

Usage

Prefix the commands below with sudo if you get a permission-denied error:

  • Test nvidia-smi with the latest official CUDA image
    $ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
  • Start a GPU enabled container on two GPUs
    $ docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
  • Start a GPU enabled container on specific GPUs
    $ docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
    $ docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi
    
  • Specify a capability (graphics, compute, ...) for the container
    $ docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi
    Note that capabilities are rarely, if ever, specified this way.
Download a GPU-enabled TensorFlow image with Jupyter support:

$ docker pull tensorflow/tensorflow:latest-gpu-py3-jupyter

To check what images are on the machine:

$ docker image ls

If you have followed this gist, you should see output like the following:

REPOSITORY              TAG                      IMAGE ID            CREATED             SIZE
tensorflow/tensorflow   latest-gpu-py3-jupyter   ce8f7398433c        2 months ago        4.26GB
nvidia/cuda             10.0-base                841d44dd4b3c        4 months ago        110MB
hello-world             latest                   fce289e99eb9        15 months ago       1.84kB

Note that if you run:

$ docker run -it --rm tensorflow/tensorflow \
   python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

a new image tensorflow/tensorflow:latest will be downloaded:

REPOSITORY              TAG                      IMAGE ID            CREATED             SIZE
tensorflow/tensorflow   latest-gpu-py3-jupyter   ce8f7398433c        2 months ago        4.26GB
tensorflow/tensorflow   latest                   9bf93bf90865        2 months ago        2.47GB
nvidia/cuda             10.0-base                841d44dd4b3c        4 months ago        110MB
hello-world             latest                   fce289e99eb9        15 months ago       1.84kB

In general:

$ docker run [-it] [--rm] [-p hostPort:containerPort] tensorflow/tensorflow[:tag] [command]
  1. Start a bash shell session within a TensorFlow-configured container:
    $ docker run -it tensorflow/tensorflow bash
  2. To run a TensorFlow program developed on the host machine within a container, mount the host directory and change the container's working directory (-v hostDir:containerDir -w workDir):
    $ docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow python ./script.py
  3. Start a Jupyter Notebook server using TensorFlow's nightly build with Python 3 support:
    $ docker run -it -p 8888:8888 tensorflow/tensorflow:nightly-py3-jupyter
For a GPU-enabled setup:

  1. Check if a GPU is available:
    $ lspci | grep -i nvidia
  2. Verify your nvidia-docker installation:
    $ docker run --gpus all --rm nvidia/cuda nvidia-smi
  3. Download and run a GPU-enabled TensorFlow image:
    $ docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
        python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
    The output is shown in the comments at the end of this gist.
  4. Use the latest TensorFlow GPU image to start a bash shell session in the container:
    $ docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash
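Before running the steps above, it can help to confirm that the needed tools are on your PATH; this minimal pre-flight sketch only reports what it finds, it installs nothing:

```shell
# Report whether docker and nvidia-smi are available on this host
for tool in docker nvidia-smi; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

If nvidia-smi is missing, revisit the NVIDIA driver installation; if docker is missing, revisit the Docker Engine section.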

Consider a case where you have a directory source and, when you build the source code, the artifacts are saved into another directory, source/target/. You want the artifacts to be available to the container at /app/, and you want the container to see a new build each time you build the source on your development host. In that case you would bind-mount the target/ directory into your container at /app/, running the command from within the source directory; the $(pwd) sub-command expands to the current working directory on Linux or macOS hosts. The command below adapts this pattern to mount the current directory at /mounteddir instead:

$ docker run -d \
    -it \
    --name msc2 \
    --mount type=bind,source="$(pwd)"/.,target=/mounteddir \
    tensorflow/tensorflow:latest-gpu-py3
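The --mount argument above can be assembled and inspected on its own, which shows how the $(pwd) sub-command expands before Docker ever sees it (a small sketch, no container is started):

```shell
# Assemble the --mount spec from the command above and print its expanded form
mountspec="type=bind,source=$(pwd)/.,target=/mounteddir"
echo "$mountspec"
```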

The -d flag runs the container detached; bring it back to the foreground with docker attach CONTAINER, where CONTAINER is the container name. Note that the command above is more complex than necessary, to illustrate fuller usage. The simplified command I use, with the port published for Jupyter Notebook, is:

$ docker run -it \
    --name msc2 \
    -p 8888:8888 \
    --mount type=bind,source="$(pwd)"/.,target=/mounteddir \
    tensorflow/tensorflow:latest-gpu-py3

Run docker rename CONTAINER NEW_NAME to rename the container, and use docker rm CONTAINER to delete the unwanted ones.

The docker run command first creates a writeable container layer over the specified image, and then starts it using the specified command. That is, docker run is equivalent to the API /containers/create then /containers/(id)/start. A stopped container can be restarted with all its previous changes intact using docker start. See docker ps -a to view a list of all containers.[*]

Next time you want to use it, with its state the same as when it exited [*]:

$ docker restart msc2
$ docker attach msc2

Now, inside the container msc2, you can run pip install notebook and launch Jupyter Notebook with [*]:

$ jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root

Jupyter Notebook will print a URL that can be opened in a browser outside the Docker container, since we published the port with -p.
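The -p 8888:8888 flag publishes the container's port on the host; the two halves of the spec are hostPort:containerPort. A sketch of how the mapping reads:

```shell
# Split a docker -p port spec into its host and container halves
spec="8888:8888"
host_port="${spec%%:*}"
container_port="${spec##*:}"
echo "host $host_port -> container $container_port"
# prints: host 8888 -> container 8888
```

Changing only the first half (e.g. -p 9999:8888) lets you reach Jupyter on host port 9999 while the server inside the container still listens on 8888.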


qin-yu commented Apr 3, 2020

$ sudo docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

gives

Unable to find image 'tensorflow/tensorflow:latest-gpu' locally
latest-gpu: Pulling from tensorflow/tensorflow
7ddbc47eeb70: Already exists 
c1bbdc448b72: Already exists 
8c3b70e39044: Already exists 
45d437916d57: Already exists 
d8f1569ddae6: Already exists 
85386706b020: Already exists 
ee9b457b77d0: Already exists 
bebfcc1316f7: Already exists 
644140fd95a9: Already exists 
d6c0f989e873: Already exists 
e0c8121d4dcf: Pull complete 
3b08fd71d6c2: Pull complete 
a2cdbf2e693e: Pull complete 
ba62da0ce990: Pull complete 
fda5c033c1ca: Pull complete 
7aa2c0b26596: Pull complete 
Digest: sha256:f2fac496eee2170722aae5e5cf4254b193b811c190da7a965c78de88ec0329f6
Status: Downloaded newer image for tensorflow/tensorflow:latest-gpu
2020-04-03 03:52:30.183849: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-04-03 03:52:30.185514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-04-03 03:52:30.627318: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-03 03:52:30.641588: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:30.642050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2020-04-03 03:52:30.642080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-03 03:52:30.642103: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-03 03:52:30.753780: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-03 03:52:30.790095: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-03 03:52:30.987332: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-03 03:52:31.015090: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-03 03:52:31.015273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-03 03:52:31.015569: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:31.017409: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:31.019232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-03 03:52:31.020124: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-03 03:52:31.056321: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3199980000 Hz
2020-04-03 03:52:31.057202: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56327c7af800 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-03 03:52:31.057236: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-03 03:52:31.253642: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:31.255142: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56327c7e22f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-03 03:52:31.255204: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2020-04-03 03:52:31.256161: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:31.257571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2020-04-03 03:52:31.257668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-03 03:52:31.257705: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-03 03:52:31.257748: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-03 03:52:31.257786: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-03 03:52:31.257824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-03 03:52:31.257860: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-03 03:52:31.257891: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-03 03:52:31.258102: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:31.259418: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:31.260595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-03 03:52:31.260694: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-03 03:52:33.656769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-03 03:52:33.656848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-04-03 03:52:33.656873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-04-03 03:52:33.657338: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:33.659115: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-03 03:52:33.660441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6890 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1)
tf.Tensor(120.103516, shape=(), dtype=float32)


qin-yu commented Apr 3, 2020

$ sudo docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash

gives

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ / 
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

root@efd3b1973ad3:/# exit
exit
