@SharanSMenon
Created April 19, 2022 00:47
Show Gist options
  • Save SharanSMenon/0ce3dfdef8543b6a3ad3692156af3d30 to your computer and use it in GitHub Desktop.
Save SharanSMenon/0ce3dfdef8543b6a3ad3692156af3d30 to your computer and use it in GitHub Desktop.
Matrix Multiplication with OpenCL. Performance comparison test.
{
"cells": [
{
"cell_type": "markdown",
"id": "1a530103-f585-4306-8416-2eb423b09ec4",
"metadata": {},
"source": [
"# Matrix Multiplication with OpenCL\n",
"\n",
"The following notebook demonstrates matrix multiplication on the GPU by using OpenCL."
]
},
{
"cell_type": "markdown",
"id": "ac559786-3ec4-45f2-afb7-b3436a2ea112",
"metadata": {},
"source": [
"> `pip install numpy pyopencl`. Make sure you have OpenCL installed on your system.\n",
"\n",
"This notebook was run on `macOS 12.0` on an `arm64` system (Apple M1)."
]
},
{
"cell_type": "markdown",
"id": "19184a0b-cc2f-41e6-be48-ae501867525b",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "323c5a1b-e33d-45b1-869d-7830a126e7c4",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pyopencl as cl\n",
"import pyopencl.array"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6ae5b58f-df47-4350-b724-8057d8887cba",
"metadata": {},
"outputs": [],
"source": [
"%load_ext pyopencl.ipython_ext"
]
},
{
"cell_type": "markdown",
"id": "728289ad-f99c-43c6-b270-9b653766a7a7",
"metadata": {},
"source": [
"Create the OpenCL context (choosing a platform and device interactively) and the command queue."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "94a25fd1-994b-4adb-a752-7e3305337e15",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Choose platform:\n",
"[0] <pyopencl.Platform 'Apple' at 0x7fff0000>\n"
]
},
{
"name": "stdin",
"output_type": "stream",
"text": [
"Choice [0]: 0\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Set the environment variable PYOPENCL_CTX='0' to avoid being asked again.\n"
]
}
],
"source": [
"ctx = cl.create_some_context(interactive=True)\n",
"queue = cl.CommandQueue(ctx)"
]
},
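{
"cell_type": "markdown",
"id": "8915b360-3c95-46b5-a87c-0d7149fbcfd5",
"metadata": {},
"source": [
"## Matrix Multiplication kernel\n",
"\n",
"The following cell contains the matrix multiplication kernel. Note that the kernel takes three **one**-dimensional arrays: `a`, `b` and `c` hold the matrices flattened in row-major order, and `n`, `m`, `p` are their dimensions (A is `n × m`, B is `m × p`, C is `n × p`)."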
]
},
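{
"cell_type": "code",
"execution_count": 4,
"id": "03cb2128-21a1-48a6-875c-1cdc468cac3c",
"metadata": {},
"outputs": [],
"source": [
"%%cl_kernel -o \"-cl-fast-relaxed-math\"\n",
"\n",
"// C = A * B for row-major, flattened matrices: A is n x m, B is m x p and\n",
"// C is n x p. One work item computes one element of C; n is not read,\n",
"// because the global work size already fixes how many elements are computed.\n",
"__kernel void multiply(ushort n,\n",
"ushort m, ushort p, __global float *a,\n",
"__global float *b, __global float *c)\n",
"{\n",
" int gid = get_global_id(0); // flat index into C\n",
" c[gid] = 0.0f;\n",
" int rowC = gid/p; // row of C (and of A)\n",
" int colC = gid%p; // column of C (and of B)\n",
" __global float *pA = &a[rowC*m]; // start of row rowC of A\n",
" __global float *pB = &b[colC];\n",
" for(int k=0; k<m; k++)\n",
" {\n",
" pB = &b[colC+k*p]; // element (k, colC) of B\n",
" c[gid] += (*(pA++))*(*pB); // A[rowC][k] * B[k][colC]\n",
" }\n",
"}"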
]
},
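{
"cell_type": "markdown",
"id": "624d5e22-41c0-490e-b6e8-bfe5a5aea796",
"metadata": {},
"source": [
"## Matrix creation\n",
"\n",
"Create two large random matrices for the CPU and GPU to multiply: A (`ha × wa` = 1892 × 1742) with integer entries in [0, 950) and B (`wa × wb` = 1742 × 922) with integer entries in [0, 50), both stored as `float32`. Each matrix is flattened and copied to the device."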
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "8c538f11-4c1d-40b2-afb7-3aba2b03b788",
"metadata": {},
"outputs": [],
"source": [
"ha = np.int32(1892)\n",
"wa = np.int32(1742)\n",
"wb = np.int32(922)\n",
"a_host = np.random.randint(0, 950, (ha, wa)).astype(np.float32)\n",
"a = cl.array.to_device(queue, a_host.flatten()) # Flattens a_host and creates a GPU buffer\n",
"b_host = np.random.randint(0, 50, (wa, wb)).astype(np.float32)\n",
"b = cl.array.to_device(queue, b_host.flatten()) # Flattens b_host and creates a GPU buffer"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "02008b82-734a-4cbd-8c0a-f37f506d0d29",
"metadata": {},
"outputs": [],
"source": [
"c = cl.array.empty(queue, (ha*wb), dtype=np.float32)"
]
},
{
"cell_type": "markdown",
"id": "47b4f3d1-ac4a-40af-b576-049f714e1dec",
"metadata": {},
"source": [
"## CPU Matrix Multiplication\n",
"\n",
"The following cell measures the performance of matrix multiplication on the CPU with NumPy. Depending on the machine, this can be very fast or take a long time."
]
},
{
"cell_type": "code",
"execution_count": 76,
"id": "1805589b-01e2-46d7-b5a1-e3e3644cd7b2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 301 ms, sys: 8.06 ms, total: 309 ms\n",
"Wall time: 54.3 ms\n"
]
}
],
"source": [
"%%time\n",
"np_out = np.matmul(a_host, b_host)"
]
},
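{
"cell_type": "markdown",
"id": "4d12c809-c741-4085-8970-ead27f4e04b2",
"metadata": {},
"source": [
"## GPU Matrix Multiplication\n",
"\n",
"The following code executes the kernel on the flattened arrays. The first three arguments are the ones every PyOpenCL kernel call takes: the command queue, the global work size `(ha*wb,)` (one work item per element of C) and the local work size (`None` lets the OpenCL implementation choose). The remaining six are the kernel's own arguments: the three dimensions, cast with `np.uint16` to match the kernel's `ushort` parameters, and the three device buffers."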
]
},
{
"cell_type": "code",
"execution_count": 80,
"id": "ae1bac2f-03bb-4b1d-a155-f1ebb1b4744a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.35 ms, sys: 17 µs, total: 1.36 ms\n",
"Wall time: 1.36 ms\n"
]
}
],
"source": [
"%%time\n",
"resg = multiply(queue, (ha*wb,), None, np.uint16(ha), np.uint16(wa), np.uint16(wb), a.data, b.data, c.data)"
]
},
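{
"cell_type": "markdown",
"id": "16a2854c-c299-4d26-a0d1-61218fbc9326",
"metadata": {},
"source": [
"The kernel call returns after about 1.36 ms, but PyOpenCL kernel launches are asynchronous: the call only enqueues the kernel and returns an event, so this figure most likely does not include the multiplication itself. The `c.get()` below blocks until the command queue has finished, so its 120 ms wall time likely covers both the kernel execution and the copy of the result back to the host.\n",
"\n",
"End to end (roughly 121 ms), the GPU path was therefore slower than NumPy on the CPU (54.3 ms wall time) for matrices of this size. To measure the kernel itself, the timed cell has to wait for it to finish, for example with `resg.wait()`."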
]
},
{
"cell_type": "code",
"execution_count": 75,
"id": "11849fcd-2658-4c0a-86c7-5a712296f0ad",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.15 ms, sys: 2.79 ms, total: 3.95 ms\n",
"Wall time: 120 ms\n"
]
}
],
"source": [
"%%time\n",
"c_ans = c.get()"
]
},
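{
"cell_type": "markdown",
"id": "feaad757-14a7-49d0-9cb1-7db2279d187a",
"metadata": {},
"source": [
"## Checks\n",
"\n",
"The following code checks that the CPU and the GPU compute the same values. If the output is 0, it means that the GPU has computed the correct value. (A very small nonzero value does not by itself indicate an error: float32 sums accumulated in a different order can round differently.)"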
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "40a54ead-5b47-4323-825c-f146d7dbc5c4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.linalg.norm(np_out - c_ans.reshape(ha, wb)) # c_ans is flat (ha*wb elements)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "5ea1dd44-962d-4131-a107-b5fe52856856",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(892, 922)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c_ans.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5d5fc98-6c45-46a7-aa9a-1c800594d7f2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
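The `multiply` kernel in the notebook above gives each work item a flat index `gid` into C and recovers the row and column as `gid/p` and `gid%p`. You can check that indexing on the CPU without an OpenCL device. Below is a minimal pure-NumPy sketch that is not part of the original notebook. The name `multiply_flat_reference` is hypothetical. The sketch runs the work items one after another instead of in parallel, and it uses small integer-valued matrices so that the comparison with `np.matmul` is exact.

```python
import numpy as np


def multiply_flat_reference(n, m, p, a, b):
    """Hypothetical serial CPU mirror of the notebook's OpenCL `multiply` kernel.

    `a` and `b` are 1-D arrays holding row-major A (n x m) and B (m x p).
    Returns a 1-D float32 array holding row-major C (n x p).
    """
    c = np.zeros(n * p, dtype=np.float32)
    for gid in range(n * p):              # one iteration per OpenCL work item
        row_c, col_c = divmod(gid, p)     # gid/p and gid%p in the kernel
        for k in range(m):
            # A[row_c][k] is a[row_c*m + k]; B[k][col_c] is b[col_c + k*p]
            c[gid] += a[row_c * m + k] * b[col_c + k * p]
    return c


# Small integer-valued matrices keep every product and sum exact in float32.
rng = np.random.default_rng(0)
n, m, p = 4, 3, 5
a_host = rng.integers(0, 10, (n, m)).astype(np.float32)
b_host = rng.integers(0, 10, (m, p)).astype(np.float32)

c_flat = multiply_flat_reference(n, m, p, a_host.flatten(), b_host.flatten())
# Reshaping the flat result recovers the n x p product computed by np.matmul.
print(np.array_equal(c_flat.reshape(n, p), np.matmul(a_host, b_host)))  # prints True
```

The inner loop evaluates the same expression the kernel computes for one `gid`. A mismatch here would therefore point at the index arithmetic rather than at the OpenCL setup.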
@SharanSMenon
Author

The last markdown cell was cut off when uploading this notebook. It said the following:

The following code checks that the CPU and the GPU compute the same values. If the output is 0, it means that the GPU has computed the correct value.
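The check in the notebook subtracts `c_ans` from `np_out`. However, `c` was allocated as a flat array of `ha*wb` elements, so `c.get()` returns a 1-D array. That array must be reshaped to `(ha, wb)` before subtracting, because NumPy cannot broadcast the two shapes together otherwise. Both results are also float32 sums whose rounding depends on summation order, so a relative tolerance is a more robust criterion than an exact 0. The sketch below illustrates this under those assumptions and is not the author's code. `compare_gpu_result` is a hypothetical helper. Since no OpenCL device is assumed, a float64 product rounded to float32 stands in for the GPU output.

```python
import numpy as np


def compare_gpu_result(np_out, c_flat, rtol=1e-5):
    """Hypothetical helper: compare a CPU product with a flat GPU result.

    `c_flat` is reshaped to `np_out.shape` first, because the device array in
    the notebook holds C as ha*wb contiguous elements in row-major order.
    Returns (relative Frobenius-norm error, np.allclose verdict).
    """
    c_mat = np.asarray(c_flat).reshape(np_out.shape)
    rel_err = np.linalg.norm(np_out - c_mat) / np.linalg.norm(np_out)
    return float(rel_err), np.allclose(c_mat, np_out, rtol=rtol)


# Stand-in for a GPU result (assumption, no OpenCL device needed): the same
# product accumulated in float64, rounded to float32 and flattened.
rng = np.random.default_rng(0)
a_host = rng.integers(0, 950, (64, 1742)).astype(np.float32)
b_host = rng.integers(0, 50, (1742, 32)).astype(np.float32)
np_out = np.matmul(a_host, b_host)
c_flat = (a_host.astype(np.float64) @ b_host.astype(np.float64)).astype(np.float32).ravel()

rel_err, ok = compare_gpu_result(np_out, c_flat)
print(f"relative error: {rel_err:.2e}, allclose: {ok}")
```

In the notebook itself, the corresponding call would be `compare_gpu_result(np_out, c_ans)`.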
