How to set up Ubuntu Server 20.04 with GPU to use TensorFlow and PyTorch via JupyterLab on Docker

Environment

  • Ubuntu Server 20.04
  • CUDA Driver 10.2
  • Docker 19.03
  • NVIDIA Container Toolkit 2.3.0
  • NVIDIA RTX 2080ti (or other supported GPUs)

Steps

  • Update packages
$ sudo apt update; sudo apt -y upgrade
  • Install build-essential
$ sudo apt -y install build-essential
  • Install CUDA Driver
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
$ sudo apt update
$ sudo apt -y install cuda-drivers
  • Set the default boot target back to multi-user (the equivalent of run level 3)
$ sudo systemctl set-default  multi-user.target
  • Reboot
$ sudo shutdown -r now
  • Install Docker
$ sudo apt -y install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
$ sudo apt update
$ sudo apt -y install docker-ce docker-ce-cli containerd.io
  • Update permission
$ sudo usermod -aG docker $USER
  • Reboot
$ sudo shutdown -r now
  • Install NVIDIA Container Toolkit
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt update && sudo apt -y install nvidia-container-toolkit
$ sudo systemctl restart docker
  • Check Docker
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
  • Run a Docker container with GPU access
$ docker run --gpus all --pid host nvidia/cuda:10.2-runtime nvidia-smi
Sat Jun 20 10:26:13 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:07:00.0 Off |                  N/A |
| 35%   38C    P0    40W / 250W |      0MiB / 11018MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Next Steps

URLs

How to set up TensorFlow

Assumption

TensorFlow Docker Image

  • Get TensorFlow Docker Image for GPU
$ docker pull tensorflow/tensorflow:latest-gpu
  • Run a new container using the image
    • -d: Start the container as a background service
    • --gpus all: Use all GPUs via the NVIDIA Container Toolkit
    • -u root: Log in as root (the default)
    • -p 8888:8888: Forward port 8888 so Jupyter Lab can be reached from your host browser
    • -v $(pwd):/workspace: Mount the host's current folder at /workspace inside the container to sync files
    • --shm-size: Increase the shared memory size from the default 64 MB
$ docker run -d --gpus all -it -u root -p 8888:8888 -v $(pwd):/workspace --shm-size=256m tensorflow/tensorflow:latest-gpu
  • Find out the container ID (f95b6ac4e565 is the ID here)
$ docker ps -a
CONTAINER ID        IMAGE                              COMMAND             CREATED             STATUS              PORTS                    NAMES
f95b6ac4e565        tensorflow/tensorflow:latest-gpu   "/bin/bash"         4 seconds ago       Up 2 seconds        0.0.0.0:8888->8888/tcp   competent_robinson
  • Open a bash shell in the container as root
$ docker exec -it -u root CONTAINER_ID bash

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ / 
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

root@f95b6ac4e565:/#
  • Update packages in the container
root@f95b6ac4e565:/# apt update; apt -y upgrade
  • Create some directories for users
root@f95b6ac4e565:/# mkdir .local .jupyter .cache .cache/pip
root@f95b6ac4e565:/# chmod 777 .local .jupyter .cache .cache/pip
  • Install Jupyter Lab
root@f95b6ac4e565:/# pip install jupyterlab
root@f95b6ac4e565:/# exit
  • Log in to the container as a regular user
$ docker exec -it -u $(id -u):$(id -g) CONTAINER_ID bash

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ / 
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


You are running this container as user with ID 1000 and group 1000,
which should map to the ID and group for your user on the Docker host. Great!

tf-docker / >
  • Add /.local/bin to PATH
tf-docker / > export PATH=/.local/bin:$PATH
  • Set a password for Jupyter Lab
tf-docker / > jupyter notebook password
Enter password:
Verify password:
[NotebookPasswordApp] Wrote hashed password to /.jupyter/jupyter_notebook_config.json
tf-docker / >
  • Start Jupyter Lab
tf-docker / > cd workspace
tf-docker /workspace > jupyter lab --no-browser --ip=*
/usr/local/lib/python3.6/dist-packages/IPython/paths.py:67: UserWarning: IPython parent '/' is not a writable location, using a temp directory.
  " using a temp directory.".format(parent))
[W 05:55:57.761 LabApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 05:55:57.767 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/dist-packages/jupyterlab
[I 05:55:57.767 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 05:55:57.768 LabApp] Serving notebooks from local directory: /workspace
[I 05:55:57.769 LabApp] The Jupyter Notebook is running at:
[I 05:55:57.769 LabApp] http://06982c99a734:8888/
[I 05:55:57.769 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
  • Visit YOUR_UBUNTU_SERVER_IP:8888 and type the password
  • Once you have logged in to Jupyter Lab and can see the current folder, start a new notebook with Python 3 and run the following script (a slightly fuller GPU check is sketched after the expected output below)
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
  • You should get an output like
Num GPUs Available:  1
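  • Optional: to confirm that work actually runs on the GPU, not just that one is visible, a minimal sketch like the one below should also work in the same notebook (it assumes TensorFlow 2.1+ and at least one visible GPU; exact device strings may differ)
import tensorflow as tf

# Same check as above, via the non-experimental API (TF 2.1+)
gpus = tf.config.list_physical_devices('GPU')
print("GPUs:", gpus)

# Run a small matrix multiplication explicitly on the first GPU
if gpus:
    with tf.device('/GPU:0'):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        c = tf.matmul(a, b)
    print("Result device:", c.device)  # expect a string ending in GPU:0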

How to stop the container and restart from where you left off

  • Steps to stop

    • Save your notebooks
    • Shut down Jupyter Lab
    • Exit from the container's bash
    • Stop the container with $ docker stop CONTAINER_ID
    • ⚠️ DO NOT delete the stopped container
      • If you did, don't worry. Start again from the "Run a new container using the image" step
      • If you set up the workspace correctly, all your files are still in your current folder on the host, so you can reproduce the setup
  • Steps to resume

    • Start the container with $ docker start CONTAINER_ID
    • Login to bash
    • Start Jupyter Lab

When you need a pip package

  • Type the following command in a Jupyter Notebook cell and run it (a concrete usage example follows below)
import sys
!{sys.executable} -m pip install PACKAGE_NAME
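  • For example, with an illustrative package such as pandas (not required by this setup), the pattern looks like this, and the package can be imported in the same notebook right after the install finishes
import sys
!{sys.executable} -m pip install pandas
import pandas as pd
print(pd.__version__)
  • Using {sys.executable} ensures pip installs into the same Python interpreter the notebook kernel is running, rather than whatever pip happens to be first on PATH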

URLs

How to set up PyTorch

Assumption

PyTorch Docker Image

  • Get the PyTorch Docker image
$ docker pull pytorch/pytorch
  • Run a new container using the image
    • -d: Start the container as a background service
    • --gpus all: Use all GPUs via the NVIDIA Container Toolkit
    • -u root: Log in as root (the default)
    • -p 8888:8888: Forward port 8888 so Jupyter Lab can be reached from your host browser
    • -v $(pwd):/workspace: Mount the host's current folder at /workspace inside the container to sync files
    • --shm-size: Increase the shared memory size from the default 64 MB
$ docker run -d --gpus all -it -u root -p 8888:8888 -v $(pwd):/workspace --shm-size=256m pytorch/pytorch
  • Find out the container ID (f58be4bf9639 is the ID here)
$ docker ps -a
CONTAINER ID        IMAGE                              COMMAND             CREATED             STATUS                      PORTS                    NAMES
f58be4bf9639        pytorch/pytorch                    "/bin/bash"         10 minutes ago      Up 10 minutes               0.0.0.0:8888->8888/tcp   beautiful_spence
  • Open a bash shell in the container as root
$ docker exec -it -u root CONTAINER_ID bash
root@f58be4bf9639:/workspace# 
  • Update packages in the container
root@f58be4bf9639:/workspace# apt update; apt -y upgrade
  • Create some directories for users
root@f58be4bf9639:/workspace# mkdir /.local /.jupyter /.conda /.cache /.cache/pip
root@f58be4bf9639:/workspace# chmod 777 /.local /.jupyter /.conda /.cache /.cache/pip
  • Install Jupyter Lab and other packages
root@f58be4bf9639:/workspace# conda install --yes jupyterlab matplotlib
root@f58be4bf9639:/workspace# exit
  • Log in to the container as a regular user
$ docker exec -it -u $(id -u):$(id -g) CONTAINER_ID bash
groups: cannot find name for group ID 1000
I have no name!@f58be4bf9639:/workspace$
  • Add /.local/bin to PATH
I have no name!@f58be4bf9639:/workspace$ export PATH=/.local/bin:$PATH
  • Set a password for Jupyter Lab
I have no name!@f58be4bf9639:/workspace$ jupyter notebook password
Enter password:
Verify password:
[NotebookPasswordApp] Wrote hashed password to /.jupyter/jupyter_notebook_config.json
I have no name!@f58be4bf9639:/workspace$
  • Start Jupyter Lab
I have no name!@f58be4bf9639:/workspace$ jupyter lab --no-browser --ip=*
/opt/conda/lib/python3.7/site-packages/IPython/paths.py:67: UserWarning: IPython parent '/' is not a writable location, using a temp directory.
  " using a temp directory.".format(parent))
[W 05:51:59.787 LabApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 05:51:59.793 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 05:51:59.793 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 05:51:59.795 LabApp] Serving notebooks from local directory: /workspace
[I 05:51:59.795 LabApp] The Jupyter Notebook is running at:
[I 05:51:59.795 LabApp] http://7686e1fe5a61:8888/
[I 05:51:59.795 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
  • Visit YOUR_UBUNTU_SERVER_IP:8888 and type the password
  • Once you have logged in to Jupyter Lab and can see the current folder, start a new notebook with Python 3 and run the following script (a slightly fuller GPU check is sketched after the expected output below)
import torch
print(torch.cuda.is_available())
  • You should get an output like
True
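  • Optional: to go one step beyond the boolean check, a minimal sketch like the one below should also run in the same notebook (it assumes at least one CUDA device is visible inside the container; the device name will match your GPU)
import torch

print(torch.cuda.device_count())       # number of visible GPUs
print(torch.cuda.get_device_name(0))   # e.g. a GeForce RTX 2080 Ti on this setup

# Run a small matrix multiplication on the GPU and check where the result lives
x = torch.randn(1024, 1024, device='cuda')
y = torch.randn(1024, 1024, device='cuda')
z = x @ y
print(z.device)  # expect: cuda:0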

How to stop the container and restart from where you left off

  • Steps to stop

    • Save your notebooks
    • Shut down Jupyter Lab
    • Exit from the container's bash
    • Stop the container with $ docker stop CONTAINER_ID
    • ⚠️ DO NOT delete the stopped container
      • If you did, don't worry. Start again from the "Run a new container using the image" step
      • If you set up the workspace correctly, all your files are still in your current folder on the host, so you can reproduce the setup
  • Steps to resume

    • Start the container with $ docker start CONTAINER_ID
    • Login to bash
    • Start Jupyter Lab

When you need a conda or pip package

  • Type the following command in a Jupyter Notebook cell and run it
import sys
!conda install --yes --prefix {sys.prefix} PACKAGE_NAME
  • Or if conda doesn't cut it, then try pip
import sys
!{sys.executable} -m pip install PACKAGE_NAME
  • If you get a permission error with the above method, log in to the container as root and run the command there as usual.

URLs
