Docker is the easiest way to run TensorFlow on a GPU since the host machine only requires the NVIDIA® driver (the NVIDIA® CUDA® Toolkit is not required).
- Use TensorFlow with GPU support on Ubuntu with Docker
- Ubuntu 18.04.1
- NVRM 435.21
- GCC 7.5.0
- Docker 19.03.8
Install Docker Engine - Community (using the repository)
- To allow apt to use a repository over HTTPS:
$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
- Add Docker’s official GPG key:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
- Set up the stable repository:
$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
- Install the latest version of Docker Engine - Community and containerd:
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
- Verify that Docker Engine - Community is installed correctly:
$ sudo docker run hello-world
The Docker daemon binds to a Unix socket instead of a TCP port. By default that Unix socket is owned by the user root, and other users can only access it using sudo. The Docker daemon always runs as the root user. If you don't want to preface the docker command with sudo, create a Unix group called docker and add users to it. When the Docker daemon starts, it creates a Unix socket accessible by members of the docker group.
- Create the docker group:
$ sudo groupadd docker
- Add your user to the docker group:
$ sudo usermod -aG docker $USER
- Log out and log back in so that your group membership is re-evaluated. On Linux, you can also run the following command to activate the changes to groups:
$ newgrp docker
- Verify that you can run docker commands without sudo:
$ docker run hello-world
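As an extra sanity check after logging back in, you can confirm that the group membership took effect. A minimal sketch of the check follows; the `id -nG` output is simulated with a fixed string here so the snippet runs anywhere, but on a real host you would pipe `id -nG` instead of the echo:

```shell
# Simulated output of `id -nG` after adding the user to the docker group.
# On a real host, replace the echo with: id -nG
groups_output="adm cdrom sudo docker"
echo "$groups_output" | tr ' ' '\n' | grep -x docker   # prints: docker
```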
For GPU support on Linux, install NVIDIA Docker support.
Make sure you have installed the NVIDIA driver and Docker 19.03 for your Linux distribution. Note that you do not need to install the CUDA toolkit on the host, but the driver does need to be installed.
- Verify the driver version:
$ cat /proc/driver/nvidia/version
The output could be:
NVRM version: NVIDIA UNIX x86_64 Kernel Module  435.21  Sun Aug 25 08:17:57 CDT 2019
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
- Verify the CUDA Toolkit version:
$ nvcc -V
The output could be:
Command 'nvcc' not found
This is expected, since the CUDA toolkit does not need to be installed on the host.
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
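The $distribution variable above is built by sourcing /etc/os-release and concatenating its ID and VERSION_ID fields. A sketch of the expansion, with the values hard-coded to match the Ubuntu 18.04 host used in this gist:

```shell
# /etc/os-release defines ID and VERSION_ID; concatenating them yields
# the distribution string used in the nvidia-docker.list URL.
# Hard-coded here for illustration (Ubuntu 18.04):
ID=ubuntu
VERSION_ID=18.04
distribution=$ID$VERSION_ID
echo "$distribution"   # prints: ubuntu18.04
```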
Prefix the docker commands below with sudo if you get a permission-denied error:
- Test nvidia-smi with the latest official CUDA image
$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
- Start a GPU enabled container on two GPUs
$ docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
- Starting a GPU enabled container on specific GPUs
$ docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
$ docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi
- Specifying a capability (graphics, compute, ...) for my container
Note that this option is rarely, if ever, used this way.
$ docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi
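The quote nesting in the device= examples above matters: the outer single quotes make the shell strip only themselves and pass the inner double quotes through to Docker literally. A small sketch of what the Docker CLI actually receives as the --gpus argument:

```shell
# With --gpus '"device=1,2"', the shell removes the single quotes only,
# so the argument handed to Docker still contains the double quotes:
printf '%s\n' '"device=1,2"'   # prints: "device=1,2"
```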
Download the TensorFlow Docker image:
$ docker pull tensorflow/tensorflow:latest-gpu-py3-jupyter
To check what images are on the machine:
$ docker image ls
If you follow this gist, you should see the following output:
REPOSITORY TAG IMAGE ID CREATED SIZE
tensorflow/tensorflow latest-gpu-py3-jupyter ce8f7398433c 2 months ago 4.26GB
nvidia/cuda 10.0-base 841d44dd4b3c 4 months ago 110MB
hello-world latest fce289e99eb9 15 months ago 1.84kB
Note that if you run:
$ docker run -it --rm tensorflow/tensorflow \
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
a new image tensorflow/tensorflow:latest
will be downloaded:
REPOSITORY TAG IMAGE ID CREATED SIZE
tensorflow/tensorflow latest-gpu-py3-jupyter ce8f7398433c 2 months ago 4.26GB
tensorflow/tensorflow latest 9bf93bf90865 2 months ago 2.47GB
nvidia/cuda 10.0-base 841d44dd4b3c 4 months ago 110MB
hello-world latest fce289e99eb9 15 months ago 1.84kB
In general:
$ docker run [-it] [--rm] [-p hostPort:containerPort] tensorflow/tensorflow[:tag] [command]
- Start a bash shell session within a TensorFlow-configured container:
$ docker run -it tensorflow/tensorflow bash
- To run a TensorFlow program developed on the host machine within a container, mount the host directory and change the container's working directory (-v hostDir:containerDir -w workDir):
$ docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow python ./script.py
- Start a Jupyter Notebook server using TensorFlow's nightly build with Python 3 support:
$ docker run -it -p 8888:8888 tensorflow/tensorflow:nightly-py3-jupyter
- Check if a GPU is available:
$ lspci | grep -i nvidia
- Verify your nvidia-docker installation:
$ docker run --gpus all --rm nvidia/cuda nvidia-smi
- Download and run a GPU-enabled TensorFlow image:
$ docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
- Use the latest TensorFlow GPU image to start a bash shell session in the container:
$ docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash
Consider a case where you have a directory source and that when you build the source code, the artifacts are saved into another directory, source/target/. You want the artifacts to be available to the container at /app/, and you want the container to get access to a new build each time you build the source on your development host. Use the following command to bind-mount the target/ directory into your container at /app/. Run the command from within the source directory. The $(pwd) sub-command expands to the current working directory on Linux or macOS hosts.
$ docker run -d \
-it \
--name msc2 \
--mount type=bind,source="$(pwd)"/.,target=/mounteddir \
tensorflow/tensorflow:latest-gpu-py3
-d means detached. To bring the container to the foreground, use docker attach CONTAINER, where CONTAINER is the custom name (msc2 here). Note this command is more complex than needed in order to show a more complete usage. The simplified command I use, with the port specified for Jupyter notebook, is:
$ docker run -it \
--name msc2 \
-p 8888:8888 \
--mount type=bind,source="$(pwd)"/.,target=/mounteddir \
tensorflow/tensorflow:latest-gpu-py3
Run docker rename CONTAINER NEW_NAME to rename the container, and use docker rm CONTAINER to delete the unwanted ones.
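To see concretely what the --mount option above expands to, here is a sketch that prints the mount spec instead of invoking Docker, using /tmp as a stand-in for your source directory:

```shell
# $(pwd) expands to the current working directory, so the bind-mount
# source follows wherever you run the command from:
cd /tmp
echo "type=bind,source=$(pwd)/.,target=/mounteddir"
# prints: type=bind,source=/tmp/.,target=/mounteddir
```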
The docker run command first creates a writeable container layer over the specified image, and then starts it using the specified command. That is, docker run is equivalent to the API /containers/create followed by /containers/(id)/start. A stopped container can be restarted with all its previous changes intact using docker start. Use docker ps -a to view a list of all containers.[*]
The next time you want to use it, with the status the same as when it exited [*]:
$ docker restart msc2
$ docker attach msc2
Now, in the Docker container msc2, you can do pip install notebook and launch Jupyter notebook using [*]:
$ jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root
The Jupyter notebook software will give a URL, which can be used in browsers outside the Docker container since we specified the port with -p.
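The URL that Jupyter prints uses the container-internal address. A sketch of the substitution to make in your browser, with a made-up example token (the real token is printed by Jupyter at startup):

```shell
# Jupyter prints something like the URL below; since -p 8888:8888 maps the
# port to the host, replace 0.0.0.0 with localhost in your browser.
# The token here is a hypothetical placeholder.
url="http://0.0.0.0:8888/?token=abc123"
echo "$url" | sed 's/0\.0\.0\.0/localhost/'
# prints: http://localhost:8888/?token=abc123
```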