@Marinell0
Last active June 4, 2024 14:02
How to load an NVIDIA CUDA Docker image with TensorFlow or PyTorch

Check that the NVIDIA driver is working:

nvidia-smi -a

The output should display your GPU information.

NVIDIA Docker

How to install nvidia-ctk

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base

nvidia-ctk --version

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

grep "  name:" /etc/cdi/nvidia.yaml
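The grep above lists the device names nvidia-ctk discovered. As a sketch of what that output means, the snippet below extracts the names from an assumed, abbreviated CDI spec (the real /etc/cdi/nvidia.yaml on your machine will have more fields):

```python
# Assumed, abbreviated example of a CDI spec like the one nvidia-ctk writes
# to /etc/cdi/nvidia.yaml. "0" is the first GPU; "all" selects every GPU.
sample_cdi = """\
cdiVersion: 0.5.0
kind: nvidia.com/gpu
devices:
- name: "0"
- name: all
"""

def cdi_device_names(spec: str) -> list[str]:
    """Collect device names, mirroring the grep over /etc/cdi/nvidia.yaml."""
    names = []
    for line in spec.splitlines():
        stripped = line.strip()
        if stripped.startswith("- name:"):
            names.append(stripped.split(":", 1)[1].strip().strip('"'))
    return names

print(cdi_device_names(sample_cdi))  # ['0', 'all']
```

Each name can be passed to Docker as a CDI device, e.g. `nvidia.com/gpu=all`.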

Install nvidia-container-toolkit

curl https://get.docker.com | sh && sudo systemctl --now enable docker

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update

sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker

sudo systemctl restart docker
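After the restart, `nvidia-ctk runtime configure` should have registered an `nvidia` runtime in /etc/docker/daemon.json. The sketch below parses an assumed sample of that file to show what to look for (your file may carry extra keys):

```python
import json

# Assumed sample of the entry that `nvidia-ctk runtime configure
# --runtime=docker` adds to /etc/docker/daemon.json.
sample_daemon_json = """{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}"""

config = json.loads(sample_daemon_json)
# The runtime must point at the nvidia-container-runtime binary.
print(config["runtimes"]["nvidia"]["path"])  # nvidia-container-runtime
```

If the `nvidia` runtime is missing from your real daemon.json, re-run the configure step before trying GPU containers.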

Run the TensorFlow container

Docker >= 19.03

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=87108864 -it --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3

Docker < 19.03

nvidia-docker run -it --ipc=host --ulimit memlock=-1 --ulimit stack=87108864 --rm nvcr.io/nvidia/tensorflow:xx.xx-tfx-py3

Where:

  • xx.xx is the container version. For example, 22.01.
  • tfx is the version of TensorFlow. For example, tf1 or tf2.

Using container version 22.04 and TensorFlow 2:

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=87108864 -it --rm nvcr.io/nvidia/tensorflow:22.04-tf2-py3

When you run the docker command without --ipc=host --ulimit memlock=-1 --ulimit stack=87108864, the container prints a warning asking for them, so we pass them by default.

Check available versions here: https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/
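The image tag naming scheme described above can be sketched as a small helper. This function is hypothetical (not part of the gist or of any NVIDIA tooling); it just assembles the `xx.xx-tfx-py3` pattern:

```python
# Hypothetical helper: build an NGC TensorFlow image reference from the
# container version ("xx.xx", e.g. "22.04") and the TensorFlow major
# version (1 or 2, i.e. the "tf1"/"tf2" part of the tag).
def ngc_tensorflow_image(container_version: str, tf_major: int) -> str:
    return f"nvcr.io/nvidia/tensorflow:{container_version}-tf{tf_major}-py3"

print(ngc_tensorflow_image("22.04", 2))
# nvcr.io/nvidia/tensorflow:22.04-tf2-py3
```

Whether a given combination exists must still be checked against the release notes linked above.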

Run PyTorch

Follow the steps above, but run the command:

Docker >= 19.03

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=87108864 -it --rm nvcr.io/nvidia/pytorch:xx.xx-py3

Docker < 19.03

nvidia-docker run -it --ipc=host --ulimit memlock=-1 --ulimit stack=87108864 --rm nvcr.io/nvidia/pytorch:xx.xx-py3

Where:

xx.xx is the container version. For example, 22.01.

Test the container:

python
>>> import torch
>>> print(torch.cuda.is_available())
True

See https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch for newer versions.

Run Docker with mounted volumes

For each volume that you want to mount in the container, add a `-v "<host_path>":"<container_path>"` flag.

Example:

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=87108864 -it --rm -v .:/Mario/Scripts/ -v "/media/marinello/HD 20002/Data/Mario":/Mario/Data nvcr.io/nvidia/tensorflow:23.04-tf2-py3

For each -v flag, a bind mount is created in the container.
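Host paths with spaces (like "/media/marinello/HD 20002" above) must be quoted on the shell command line. As a sketch, this hypothetical helper builds the `-v` arguments and quotes them with the standard library:

```python
import shlex

# Hypothetical helper: turn host -> container path pairs into `-v` flags.
def volume_flags(mounts: dict[str, str]) -> list[str]:
    flags = []
    for host, container in mounts.items():
        flags += ["-v", f"{host}:{container}"]
    return flags

flags = volume_flags({
    ".": "/Mario/Scripts/",
    "/media/marinello/HD 20002/Data/Mario": "/Mario/Data",
})
# shlex.quote wraps the path containing a space in single quotes,
# so the resulting string is safe to paste into a shell command.
print(" ".join(shlex.quote(f) for f in flags))
```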

How to use OpenCV inside the Docker image

Install these dependencies:

apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
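To avoid reinstalling these dependencies in every new container, you can bake them into your own image. A minimal Dockerfile sketch, assuming the 23.04 TensorFlow base image used earlier:

```dockerfile
# Assumed base image; swap in the NGC tag you actually use.
FROM nvcr.io/nvidia/tensorflow:23.04-tf2-py3

# System libraries OpenCV needs for video and GUI support.
RUN apt-get update \
    && apt-get install -y ffmpeg libsm6 libxext6 \
    && rm -rf /var/lib/apt/lists/*
```

Build it with `docker build -t my-tf-opencv .` and use that tag in the `docker run` commands above.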

Remove a downloaded image

Each image is gigantic, so once you install a newer version and stop using the old container, you can remove its image with this command:

docker rmi nvcr.io/nvidia/tensorflow:22.04-tf2-py3
