@haipnh
Last active June 1, 2022 10:15
Verify a CUDA-enabled system for ML/DL frameworks
#!/bin/bash
# This gist provides several test cases to verify whether a host machine with an NVIDIA graphics card is ready for TensorFlow and PyTorch.
### TC#1: Verify hardware information
# Expectation: The OS lists the NVIDIA device(s) on the PCI bus and shows basic display-hardware information
lspci | grep -i nvidia
sudo lshw -C display
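# Optional extra check (not part of the original gist): show which kernel driver is
# currently bound to the NVIDIA device. Before the proprietary driver is installed
# this is typically "nouveau"; after TC#2 it should read "nvidia".
lspci -nnk | grep -iA3 nvidia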
### TC#2: Verify nvidia-driver-X
# Prerequisite: Install the driver:
# sudo apt-get update && sudo apt-get install -y nvidia-driver-470   # or nvidia-driver-510
# sudo reboot now
# Expectation: nvidia-smi shows detailed information: GPU model, driver version, GPU load, VRAM usage, etc.
nvidia-smi
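# Optional extra check (not part of the original gist): query a few specific fields
# instead of the full dashboard; handy when the result is consumed by a script.
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv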
### TC#3: Verify NVIDIA Container Toolkit (nvidia-docker) installation
# Prerequisite: Installed Docker - https://docs.docker.com/engine/install/ubuntu/
#               and the NVIDIA Container Toolkit, so that Docker accepts the --gpus flag
# Expectation: nvidia-smi run inside the container shows the same detailed information as on the host: GPU model, GPU load, VRAM usage, etc.
docker run --gpus all --rm nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04 nvidia-smi
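# Optional extra check (not part of the original gist): confirm the NVIDIA container
# runtime packages are present on the host. Package names vary between setups
# (nvidia-docker2 vs. nvidia-container-toolkit), so match either.
dpkg -l | grep -E 'nvidia-(docker2|container-toolkit)'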
### TC#4: Verify tensorflow-gpu
# Expectation: TensorFlow places the computation on the GPU and prints the sum of a random 1000x1000 matrix
docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
### TC#5: Verify pytorch-gpu
# Expectation: PyTorch reports the CUDA version it was built with, detects the GPU, and prints the device count and name
docker run --gpus all -it --rm pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel \
python -c "import torch; print(torch.version.cuda); print(torch.cuda.is_available()); print(torch.cuda.device_count()); print(torch.cuda.get_device_name(0))"
docker run --gpus all -it --rm pytorch/pytorch:1.11.0-cuda11.3-cudnn8-devel \
python -c "import torch; print(torch.version.cuda); print(torch.cuda.is_available()); print(torch.cuda.device_count()); print(torch.cuda.get_device_name(0))"