In this Gist we compare the performance of an 8-core CPU and an RTX4000 GPU on vector addition.
We all know the CPU and GPU as basic building blocks of modern computing hardware. The CPU is the general-purpose unit that runs the OS and the most common computation tasks. The Graphics Processing Unit (GPU), on the other hand, is a newer, specialized hardware unit built with graphics in mind. In recent years, however, the demand for graphics computation has been overtaken by the growing demand for AI-related computation, in particular large-scale linear algebra, i.e. matrix and vector multiplication. Why is the GPU's graphics-focused design so well suited to these highly parallel computations, and how much faster is it than the CPU?
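To make "highly parallel" concrete, here is a minimal Python sketch of vector addition. It is only an illustration, not the benchmark code of this Gist; the function names and the `num_workers` parameter are made up for this example. The point is that every output element depends only on the two input elements at the same index, so the work can be split into independent chunks without changing the result.

```python
def add_vectors(a, b):
    """Element-wise sum: out[i] depends only on a[i] and b[i]."""
    return [x + y for x, y in zip(a, b)]


def add_vectors_chunked(a, b, num_workers):
    """Same sum, computed chunk by chunk.

    The chunks are processed one after another here, but no chunk reads
    anything written by another chunk, so independent workers (hypothetical
    in this sketch) could each take one chunk at the same time.
    """
    n = len(a)
    chunk = (n + num_workers - 1) // num_workers  # ceiling division
    out = [0] * n
    for w in range(num_workers):
        start = w * chunk
        stop = min(start + chunk, n)
        for i in range(start, stop):
            out[i] = a[i] + b[i]
    return out


a = list(range(10))
b = [10 * x for x in a]
# Splitting the work does not change the result.
assert add_vectors_chunked(a, b, num_workers=4) == add_vectors(a, b)
```

Because the chunks never interact, the number of workers can in principle grow with the size of the vector, which is the property that hardware with many simple cores can exploit.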
As the CPU originated as a single-core processor, it is fundamentally designed to run a single program as quickly and efficiently as possible. This is reflected in its hierarchical architecture where each core is equipped with multiple cac