Pekka Enberg (penberg)
Executing code on a GPU vs CPU

I have had a working knowledge of GPUs for a very long time now. I remember playing around with an NVIDIA RIVA 128 or something similar via DirectX, back when GPUs were still called 3D graphics accelerators. I have also tried to keep up with the times and did some basic shader programming on a contemporary NVIDIA or AMD GPU.

However, today's GPUs matter for another reason: the explosion of AI workloads, including large language models (LLMs). From a GPU's perspective, AI workloads are just massive applications of tensor operations such as matrix addition and multiplication. But how does a modern GPU execute them, and why is that so much more efficient than running the same workloads on a CPU?

Consider CUDA, NVIDIA's programming language that extends C to exploit data parallelism on GPUs. In CUDA, you write code either for the CPU (host code) or for the GPU (device code). Host code is mostly plain C, but CUDA extends the language in two ways: it lets you define functions that run on the GPU (kernels), and it provides syntax for launching those kernels from host code.
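
For instance, a kernel that adds two vectors element-wise, together with its launch from host code, might look like the following minimal sketch (the names here are illustrative, not taken from the original post):

__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
        /* Each GPU thread computes exactly one element of the output. */
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
                c[i] = a[i] + b[i];
}

/* Host code: launch enough 256-thread blocks to cover all n elements. */
int threads_per_block = 256;
int blocks = (n + threads_per_block - 1) / threads_per_block;
vector_add<<<blocks, threads_per_block>>>(a, b, c, n);

The __global__ qualifier marks the first extension, defining a kernel; the triple angle bracket launch syntax is the second, telling the runtime how many blocks and threads to spawn for it.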

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

/* Convert a struct timespec into a single nanosecond count, using
   integer arithmetic to avoid floating-point precision loss. */
static uint64_t timespec_to_ns(struct timespec *ts)
{
        return (uint64_t)ts->tv_sec * 1000000000ull + (uint64_t)ts->tv_nsec;
}
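
The helper converts a struct timespec into nanoseconds so that elapsed time can be computed by simple subtraction. A minimal sketch of how it might be used with clock_gettime to time a CPU-side loop (the function name and setup here are assumptions, not part of the original snippet):

/* Hypothetical usage: time an n-element vector addition on the CPU. */
static void time_cpu_add(const float *a, const float *b, float *c, size_t n)
{
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (size_t i = 0; i < n; i++)
                c[i] = a[i] + b[i];
        clock_gettime(CLOCK_MONOTONIC, &end);

        printf("CPU add: %llu ns\n",
               (unsigned long long)(timespec_to_ns(&end) - timespec_to_ns(&start)));
}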