How to set up Ubuntu Server 20.04 with GPU to use TensorFlow and PyTorch via JupyterLab on Docker

Environment

  • Ubuntu Server 20.04
  • CUDA Driver 10.2
  • Docker 19.03
  • NVIDIA Container Toolkit 2.3.0
  • NVIDIA RTX 2080ti (or other supported GPUs)

Steps

  • Update packages
$ sudo apt update; sudo apt -y upgrade
  • Install build-essential
$ sudo apt -y install build-essential
  • Install CUDA Driver
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
$ sudo apt update
$ sudo apt -y install cuda-drivers
  • Set the default boot target back to multi-user (the equivalent of run level 3)
$ sudo systemctl set-default  multi-user.target
  • Reboot
$ sudo shutdown -r now
  • Install Docker
$ sudo apt -y install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
$ sudo apt update
$ sudo apt -y install docker-ce docker-ce-cli containerd.io
  • Update permission
$ sudo usermod -aG docker $USER
  • Reboot
$ sudo shutdown -r now
  • Install NVIDIA Container Toolkit
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt update && sudo apt -y install nvidia-container-toolkit
$ sudo systemctl restart docker
  • Check Docker
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
  • Run a Docker container with GPU access
$ docker run --gpus all --pid host nvidia/cuda:10.2-runtime nvidia-smi
Sat Jun 20 10:26:13 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:07:00.0 Off |                  N/A |
| 35%   38C    P0    40W / 250W |      0MiB / 11018MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Next Steps

URLs

How to set up TensorFlow

Assumption

TensorFlow Docker Image

  • Get TensorFlow Docker Image for GPU
$ docker pull tensorflow/tensorflow:latest-gpu
  • Run a new container using the image
    • -d: Start the container as a background service
    • --gpus all: Use all GPUs via the NVIDIA Container Toolkit
    • -u root: Log in as root (the default)
    • -p 8888:8888: Forward port 8888 so Jupyter Lab can be reached from your host browser
    • -v $(pwd):/workspace: Mount the host's current folder at /workspace inside the container to sync files
    • --shm-size: Increase the shared memory size from the default 64 MB
$ docker run -d --gpus all -it -u root -p 8888:8888 -v $(pwd):/workspace --shm-size=256m tensorflow/tensorflow:latest-gpu
  • Find out the container ID (f95b6ac4e565 is the ID here)
$ docker ps -a
CONTAINER ID        IMAGE                              COMMAND             CREATED             STATUS              PORTS                    NAMES
f95b6ac4e565        tensorflow/tensorflow:latest-gpu   "/bin/bash"         4 seconds ago       Up 2 seconds        0.0.0.0:8888->8888/tcp   competent_robinson
  • Open a bash shell in the container as root
$ docker exec -it -u root CONTAINER_ID bash

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ / 
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

root@f95b6ac4e565:/#
  • Update packages in the container
root@f95b6ac4e565:/# apt update; apt -y upgrade
  • Create some directories for users
root@f95b6ac4e565:/# mkdir .local .jupyter .cache .cache/pip
root@f95b6ac4e565:/# chmod 777 .local .jupyter .cache .cache/pip
  • Install Jupyter Lab
root@f95b6ac4e565:/# pip install jupyterlab
root@f95b6ac4e565:/# exit
  • Log in to the container as a regular user
$ docker exec -it -u $(id -u):$(id -g) CONTAINER_ID bash

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ / 
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


You are running this container as user with ID 1000 and group 1000,
which should map to the ID and group for your user on the Docker host. Great!

tf-docker / >
  • Add /.local/bin to PATH
tf-docker / > export PATH=/.local/bin:$PATH
  • Set a password for Jupyter Lab
tf-docker / > jupyter notebook password
Enter password:
Verify password:
[NotebookPasswordApp] Wrote hashed password to /.jupyter/jupyter_notebook_config.json
tf-docker / >
  • Start Jupyter Lab
tf-docker / > cd workspace
tf-docker /workspace > jupyter lab --no-browser --ip=*
/usr/local/lib/python3.6/dist-packages/IPython/paths.py:67: UserWarning: IPython parent '/' is not a writable location, using a temp directory.
  " using a temp directory.".format(parent))
[W 05:55:57.761 LabApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 05:55:57.767 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/dist-packages/jupyterlab
[I 05:55:57.767 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 05:55:57.768 LabApp] Serving notebooks from local directory: /workspace
[I 05:55:57.769 LabApp] The Jupyter Notebook is running at:
[I 05:55:57.769 LabApp] http://06982c99a734:8888/
[I 05:55:57.769 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
  • Visit YOUR_UBUNTU_SERVER_IP:8888 and type the password
  • Once you have logged in to Jupyter Lab and can see the current folder, start a new notebook with Python 3 and run the following script (a slightly fuller GPU check is sketched after the expected output below)
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
  • You should get an output like
Num GPUs Available:  1
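  • Optional: to confirm that work actually runs on the GPU, not just that one is visible, a minimal sketch like the one below should also work in the same notebook (it assumes TensorFlow 2.1+ and at least one visible GPU; exact device strings may differ)
import tensorflow as tf

# Same check as above, via the non-experimental API (TF 2.1+)
gpus = tf.config.list_physical_devices('GPU')
print("GPUs:", gpus)

# Run a small matrix multiplication explicitly on the first GPU
if gpus:
    with tf.device('/GPU:0'):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        c = tf.matmul(a, b)
    print("Result device:", c.device)  # expect a string ending in GPU:0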

How to stop the container and restart from where you left off

  • Steps to stop

    • Save your notebooks
    • Shut down Jupyter Lab
    • Exit from the container's bash
    • Stop the container with $ docker stop CONTAINER_ID
    • ⚠️ DO NOT delete the stopped container
      • If you did, don't worry. Start again from the "Run a new container using the image" step
      • If you set up the workspace correctly, all your files are still in your current folder on the host, so you can reproduce the setup
  • Steps to resume

    • Start the container with $ docker start CONTAINER_ID
    • Login to bash
    • Start Jupyter Lab

When you need a pip package

  • Type the following command in a Jupyter Notebook cell and run it (a concrete usage example follows below)
import sys
!{sys.executable} -m pip install PACKAGE_NAME
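  • For example, with an illustrative package such as pandas (not required by this setup), the pattern looks like this, and the package can be imported in the same notebook right after the install finishes
import sys
!{sys.executable} -m pip install pandas
import pandas as pd
print(pd.__version__)
  • Using {sys.executable} ensures pip installs into the same Python interpreter the notebook kernel is running, rather than whatever pip happens to be first on PATH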

URLs

How to set up PyTorch

Assumption

PyTorch Docker Image

  • Get the PyTorch Docker image
$ docker pull pytorch/pytorch
  • Run a new container using the image
    • -d: Start the container as a background service
    • --gpus all: Use all GPUs via the NVIDIA Container Toolkit
    • -u root: Log in as root (the default)
    • -p 8888:8888: Forward port 8888 so Jupyter Lab can be reached from your host browser
    • -v $(pwd):/workspace: Mount the host's current folder at /workspace inside the container to sync files
    • --shm-size: Increase the shared memory size from the default 64 MB
$ docker run -d --gpus all -it -u root -p 8888:8888 -v $(pwd):/workspace --shm-size=256m pytorch/pytorch
  • Find out the container ID (f58be4bf9639 is the ID here)
$ docker ps -a
CONTAINER ID        IMAGE                              COMMAND             CREATED             STATUS                      PORTS                    NAMES
f58be4bf9639        pytorch/pytorch                    "/bin/bash"         10 minutes ago      Up 10 minutes               0.0.0.0:8888->8888/tcp   beautiful_spence
  • Open a bash shell in the container as root
$ docker exec -it -u root CONTAINER_ID bash
root@f58be4bf9639:/workspace# 
  • Update packages in the container
root@f58be4bf9639:/workspace# apt update; apt -y upgrade
  • Create some directories for users
root@f58be4bf9639:/workspace# mkdir /.local /.jupyter /.conda /.cache /.cache/pip
root@f58be4bf9639:/workspace# chmod 777 /.local /.jupyter /.conda /.cache /.cache/pip
  • Install Jupyter Lab and other packages
root@f58be4bf9639:/workspace# conda install --yes jupyterlab matplotlib
root@f58be4bf9639:/workspace# exit
  • Log in to the container as a regular user
$ docker exec -it -u $(id -u):$(id -g) CONTAINER_ID bash
groups: cannot find name for group ID 1000
I have no name!@f58be4bf9639:/workspace$
  • Add /.local/bin to PATH
I have no name!@f58be4bf9639:/workspace$ export PATH=/.local/bin:$PATH
  • Set a password for Jupyter Lab
I have no name!@f58be4bf9639:/workspace$ jupyter notebook password
Enter password:
Verify password:
[NotebookPasswordApp] Wrote hashed password to /.jupyter/jupyter_notebook_config.json
I have no name!@f58be4bf9639:/workspace$
  • Start Jupyter Lab
I have no name!@f58be4bf9639:/workspace$ jupyter lab --no-browser --ip=*
/opt/conda/lib/python3.7/site-packages/IPython/paths.py:67: UserWarning: IPython parent '/' is not a writable location, using a temp directory.
  " using a temp directory.".format(parent))
[W 05:51:59.787 LabApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 05:51:59.793 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 05:51:59.793 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 05:51:59.795 LabApp] Serving notebooks from local directory: /workspace
[I 05:51:59.795 LabApp] The Jupyter Notebook is running at:
[I 05:51:59.795 LabApp] http://7686e1fe5a61:8888/
[I 05:51:59.795 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
  • Visit YOUR_UBUNTU_SERVER_IP:8888 and type the password
  • Once you have logged in to Jupyter Lab and can see the current folder, start a new notebook with Python 3 and run the following script (a slightly fuller GPU check is sketched after the expected output below)
import torch
print(torch.cuda.is_available())
  • You should get an output like
True
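  • Optional: to go one step beyond the boolean check, a minimal sketch like the one below should also run in the same notebook (it assumes at least one CUDA device is visible inside the container; the device name will match your GPU)
import torch

print(torch.cuda.device_count())       # number of visible GPUs
print(torch.cuda.get_device_name(0))   # e.g. a GeForce RTX 2080 Ti on this setup

# Run a small matrix multiplication on the GPU and check where the result lives
x = torch.randn(1024, 1024, device='cuda')
y = torch.randn(1024, 1024, device='cuda')
z = x @ y
print(z.device)  # expect: cuda:0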

How to stop the container and restart from where you left off

  • Steps to stop

    • Save your notebooks
    • Shut down Jupyter Lab
    • Exit from the container's bash
    • Stop the container with $ docker stop CONTAINER_ID
    • ⚠️ DO NOT delete the stopped container
      • If you did, don't worry. Start again from the "Run a new container using the image" step
      • If you set up the workspace correctly, all your files are still in your current folder on the host, so you can reproduce the setup
  • Steps to resume

    • Start the container with $ docker start CONTAINER_ID
    • Login to bash
    • Start Jupyter Lab

When you need a conda or pip package

  • Type the following command in a Jupyter Notebook cell and run it
import sys
!conda install --yes --prefix {sys.prefix} PACKAGE_NAME
  • Or if conda doesn't cut it, then try pip
import sys
!{sys.executable} -m pip install PACKAGE_NAME
  • If you get a permission error with the above method, log in to the container as root and run the command there as usual.

URLs
