nvidia-smi -a
The output should display information about each GPU and the installed driver version.
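It can help to script this check so later steps fail early; a minimal sketch (the messages below are ours, not from any NVIDIA tool):

```shell
# Sketch: bail out early if the NVIDIA driver is not installed
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "NVIDIA driver detected:"
    # Print just the GPU name and driver version instead of the full report
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
    echo "nvidia-smi not found: install the NVIDIA driver before continuing"
fi
```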
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
nvidia-ctk --version
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
grep " name:" /etc/cdi/nvidia.yaml
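The grep above just pulls the device names out of the generated spec; on a single-GPU machine it should print entries along these lines (the fragment below is illustrative, not a real spec):

```shell
# Illustrative CDI fragment piped through the same grep (values are made up)
cat <<'EOF' | grep " name:"
cdiVersion: 0.5.0
kind: nvidia.com/gpu
devices:
  - name: "0"
  - name: all
EOF
```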
curl https://get.docker.com | sh && sudo systemctl --now enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
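The sed step rewrites each repository line so apt trusts only the dearmored key; here is what it does to a single sample entry (the input line is an assumption, not the full list file):

```shell
# Demonstrate the signed-by rewrite on one sample repository line
echo 'deb https://nvidia.github.io/libnvidia-container/stable/deb/amd64 /' | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'
```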
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
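After the configure step, `/etc/docker/daemon.json` should contain an `nvidia` runtime entry along these lines (exact contents may vary by toolkit version):

```json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
```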
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3
Or, with the legacy nvidia-docker wrapper:
nvidia-docker run -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3
Where:
- xx.xx is the container version. For example, 22.01.
- tfx is the version of TensorFlow. For example, tf1 or tf2.
For example, using version 22.04 and TensorFlow 2:
nvidia-docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm nvcr.io/nvidia/tensorflow:22.04-tf2-py3
If you run the docker command without the `--ipc=host` and `--ulimit` flags, the container's startup banner warns that they are recommended, so we pass them by default.
Check available versions here: https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/
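Since the tag is always `<version>-<variant>-py3`, the run command can be parameterized; a small sketch (the version and variant values here are just examples):

```shell
# Build the image reference from the container release and TF variant
VERSION=22.04    # container release, see the release notes linked above
VARIANT=tf2      # tf1 or tf2
IMAGE="nvcr.io/nvidia/tensorflow:${VERSION}-${VARIANT}-py3"
echo docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm "$IMAGE"
```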
Follow the steps above, but run the command:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm nvcr.io/nvidia/pytorch:xx.xx-py3
Or, with the legacy nvidia-docker wrapper:
nvidia-docker run -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm nvcr.io/nvidia/pytorch:xx.xx-py3
Where:
- xx.xx is the container version. For example, 22.01.
Test the container:
python
>>> import torch
>>> print(torch.cuda.is_available())
True
Check https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch for newer versions.
For each volume that you want to mount in the Docker container, add a `-v "<host-path>":"<container-path>"` flag.
Example:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm -v .:/Mario/Scripts/ -v "/media/marinello/HD 20002/Data/Mario":/Mario/Data nvcr.io/nvidia/tensorflow:23.04-tf2-py3
Each `-v` flag creates a bind mount inside the container at the given path.
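With several mounts the command gets long; one way to keep it readable is a shell array (the paths below are placeholders):

```shell
# Collect the -v flags in an array, then splice them into the run command
MOUNTS=(
  -v "$PWD":/workspace/scripts
  -v "/path/to/data":/workspace/data
)
echo docker run --gpus all -it --rm "${MOUNTS[@]}" nvcr.io/nvidia/tensorflow:23.04-tf2-py3
```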
Inside the container, install these dependencies (commonly needed by libraries such as OpenCV):
apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
Each image is several gigabytes, so once you move to a newer version and no longer need the old image, remove it with this command:
docker rmi nvcr.io/nvidia/tensorflow:22.04-tf2-py3
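To see which NGC images are taking up space before removing one, filter the output of `docker images`; a sketch over a captured listing (the lines below are made-up output):

```shell
# On a real machine, run: docker images | grep nvcr.io/nvidia
# Here the same filter is shown on an illustrative captured listing
cat <<'EOF' | grep 'nvcr.io/nvidia'
REPOSITORY                  TAG            SIZE
nvcr.io/nvidia/tensorflow   22.04-tf2-py3  14.1GB
nvcr.io/nvidia/tensorflow   23.04-tf2-py3  15.0GB
EOF
```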