How to use CUDA on Ubuntu (with PyTorch, CuPy, cudnn.h, and more)

NVIDIA Setup Guide: Driver, CUDA, cuDNN & Data Libraries on Ubuntu 22.04

This guide walks you through installing NVIDIA drivers, CUDA, cuDNN, and some popular data science libraries on Ubuntu 22.04.

Before You Begin

  • Ensure you have administrative privileges on your computer.
  • Make sure you are connected to the internet.

Installation Steps

1. Clear Out Old NVIDIA Drivers

Execute these commands to remove any existing NVIDIA drivers.

sudo apt-get purge 'nvidia*'
sudo apt-get autoremove
sudo apt-get autoclean

2. Set Up NVIDIA Driver

To install the NVIDIA driver, run:

sudo apt install nvidia-driver-535
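
The driver typically takes effect only after a reboot. Once you're back, a quick check with nvidia-smi (installed alongside the driver) confirms it loaded:

nvidia-smi   # should report driver version 535.x and list your GPU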

3. Configure CUDA Repository

Fetch and set up the CUDA repository with these commands.

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.2-535.104.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.2-535.104.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
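
A reboot is recommended at this point. To confirm the toolkit installed to the location the following steps assume, check for the nvcc binary:

ls /usr/local/cuda-12.2/bin/nvcc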

4. Modify Environment Paths

Adjust your system's PATH and library search path. Note the single quotes below: they keep $PATH and $LD_LIBRARY_PATH from being expanded at the moment the lines are written to ~/.bashrc.

echo 'export PATH=/usr/local/cuda-12.2/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

5. Install cuDNN

First, download the cuDNN package for CUDA 12 from the NVIDIA cuDNN download page (a free NVIDIA Developer account is required). Look for a file named cudnn-linux-x86_64-8.9.4.25_cuda12-archive.tar.xz. Then, run:

tar -xvf cudnn-linux-x86_64-8.9.4.25_cuda12-archive.tar.xz
sudo cp -P cudnn-linux-x86_64-8.9.4.25_cuda12-archive/include/cudnn* /usr/local/cuda-12.2/include/
sudo cp -P cudnn-linux-x86_64-8.9.4.25_cuda12-archive/lib/libcudnn* /usr/local/cuda-12.2/lib64/
sudo chmod a+r /usr/local/cuda-12.2/lib64/libcudnn*
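
As a quick sanity check, you can read the version macros out of the installed header (cuDNN 8.x keeps them in cudnn_version.h):

grep -A 2 '#define CUDNN_MAJOR' /usr/local/cuda-12.2/include/cudnn_version.h
# expect CUDNN_MAJOR 8, CUDNN_MINOR 9, CUDNN_PATCHLEVEL 4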

6. Set Up Libraries

  • For CuPy:
pip3 install cupy-cuda12x
  • For PyTorch:
pip3 install torch torchvision torchaudio
  • For General Data Science:
pip3 install numpy pandas matplotlib seaborn

Validation

After setup, confirm that everything works: check the NVIDIA driver version, verify the CUDA compiler is on your PATH, and try importing your new Python libraries.
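
A minimal smoke test of each layer might look like this (the Python one-liners assume the packages from step 6 are installed):

nvidia-smi        # driver is loaded
nvcc --version    # CUDA toolkit is on the PATH
python3 -c "import torch; print(torch.cuda.is_available())"          # expect True
python3 -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"  # expect >= 1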

Need Help?

If you run into problems, consult NVIDIA's official documentation or ask in community channels such as the NVIDIA developer forums.

Friendly Reminder: Double-check the compatibility of your system with the software versions outlined in this guide. The NVIDIA site offers comprehensive compatibility data and system requirements.

Installing the NVIDIA Container Toolkit

The NVIDIA Container Toolkit allows containers to leverage the full capabilities of NVIDIA GPUs. Follow these steps to install it:

1. Configure the NVIDIA repository:

First, add the NVIDIA GPG key and repository to your system:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

2. (Optional) Enable experimental packages:

If you want to use experimental packages, enable them by modifying the sources list:

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

3. Update and install the toolkit:

Update your package list and install the NVIDIA Container Toolkit:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
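
You can confirm the package installed correctly by asking the CLI for its version:

nvidia-ctk --version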

Configuring Docker

To use NVIDIA GPUs with Docker, you must configure the Docker daemon to use the NVIDIA runtime.

1. Configure the Docker daemon:

Use the nvidia-ctk command to configure the Docker runtime:

sudo nvidia-ctk runtime configure --runtime=docker

This command modifies /etc/docker/daemon.json to register the NVIDIA runtime with Docker (add --set-as-default if you also want it to be the default runtime).
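
If you want to see what was written, inspect the file; the NVIDIA runtime should appear under the runtimes key:

cat /etc/docker/daemon.json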

2. Restart Docker:

Apply the configuration changes by restarting the Docker daemon:

sudo systemctl restart docker

Running Docker in Rootless Mode

For users running Docker in rootless mode, follow these additional steps:

1. Configure the Docker daemon for rootless mode:

nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json

2. Restart the Rootless Docker daemon:

systemctl --user restart docker

3. Configure the NVIDIA runtime to not use cgroups (this edits /etc/nvidia-container-runtime/config.toml, which rootless mode requires):

sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
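
After these steps, rootless Docker should be able to reach the GPU without sudo, for example:

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi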

Verifying the Installation

After setting up Docker and the NVIDIA Container Toolkit, verify the installation by running a sample CUDA container.

Running a Sample CUDA Container:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

You should see output similar to this, indicating that the NVIDIA GPU is accessible from within the container:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10    Driver Version: 535.86.10    CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This guide covers the installation and configuration of Docker with NVIDIA CUDA support. Make sure your system has an NVIDIA GPU and that the appropriate NVIDIA drivers are installed.


Testing the Setup

To verify the whole stack end to end, compile and run a small cuBLAS matrix multiplication using the two files below.

run.sh

#!/bin/bash
# Compile the test program against cuBLAS, then run it
nvcc matrix_mul.cu -o matrix_mul -lcublas
./matrix_mul

matrix_mul.cu

#include <cublas_v2.h>
#include <stdio.h>
#include <cuda_runtime.h>
#include <stdlib.h> // For rand() and srand()
#include <time.h>   // For time() 

#define N 10000

void printBlock(float *matrix, int width, int blockWidth) {
    for (int i = 0; i < blockWidth; i++) {
        for (int j = 0; j < blockWidth; j++) {
            printf("%4.2f ", matrix[i * width + j]);
        }
        printf("\n");
    }
}

// Initialize matrix with random float numbers between 0 and 1
void randomInit(float *data, int size) {
    for (int i = 0; i < size; i++)
        data[i] = rand() / (float)RAND_MAX;
}

int main() {
    float *A, *B, *C;
    float *d_A, *d_B, *d_C;
    size_t size = N * N * sizeof(float);

    // Seed the random number generator
    srand(time(NULL));

    // Allocate host memory
    A = (float *)malloc(size);
    B = (float *)malloc(size);
    C = (float *)malloc(size);

    // Initialize the matrices with random values
    randomInit(A, N*N);
    randomInit(B, N*N);

    // Allocate device memory
    cudaMalloc((void **)&d_A, size);
    cudaMalloc((void **)&d_B, size);
    cudaMalloc((void **)&d_C, size);

    // Transfer matrix A and B to device
    cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);

    // Create a handle for CUBLAS
    cublasHandle_t handle;
    cublasCreate(&handle);

    float alpha = 1.0f;
    float beta = 0.0f;

    // Perform matrix multiplication: C = alpha*A*B + beta*C.
    // Note: cuBLAS assumes column-major storage, so with row-major host
    // data this call effectively computes B*A; for a randomized smoke
    // test the distinction does not matter.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N, &alpha, d_A, N, d_B, N, &beta, d_C, N);

    // Transfer the result matrix C back to host
    cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);

    // Print a block from the result matrix C
    printf("10x10 block from top-left corner of matrix C:\n");
    printBlock(C, N, 10);
    // Clean up
    free(A);
    free(B);
    free(C);

    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    cublasDestroy(handle);

    return 0;
}
