@zhongwen
Created June 4, 2014 19:45
cuBLAS speed test
{
"metadata": {
"name": "cuBLAS speed test"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pycuda.gpuarray as gpuarray\n",
"import pycuda.autoinit\n",
"import numpy as np\n",
"from scikits.cuda import linalg, misc\n",
"linalg.init()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a = np.asarray(np.random.rand(10000, 10000), np.float32)\n",
"b = np.asarray(np.random.rand(20000, 10000), np.float32)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
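"Next, I time the transfer of both arrays from main (host) memory into GPU memory. `b` is uploaded as its transpose, `b.T`, so that `a_gpu` (10000 x 10000) and `b_gpu` (10000 x 20000) have compatible shapes for the product."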
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit -n 10 a_gpu = gpuarray.to_gpu(a)\n",
"%timeit -n 10 b_gpu = gpuarray.to_gpu(b.T)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 62.5 ms per loop\n",
"10 loops, best of 3: 124 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a_gpu = gpuarray.to_gpu(a)\n",
"b_gpu = gpuarray.to_gpu(b.T)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, I time the matrix product that computes the dot products between 10k points and 20k points, each with 10k dimensions. The result is a 10000 x 20000 float32 matrix."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit -n 10 c_gpu = linalg.dot(a_gpu, b_gpu)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 1.33 s per loop\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, I time copying the result (the matrix of pairwise dot products) from GPU memory back into main memory."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"c_gpu = linalg.dot(a_gpu, b_gpu)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit -n 10 c = c_gpu.get()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 388 ms per loop\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(c_gpu.nbytes)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"800000000\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {}
}
]
}
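
The timings above are easier to compare once converted into transfer bandwidth and arithmetic throughput. The following is a minimal sketch in plain Python that does this conversion using only the array shapes and the `%timeit` results recorded in the notebook. The helper names `bandwidth_gb_s` and `gemm_gflops` are hypothetical and not part of PyCUDA or scikits.cuda, and the figure of `2*M*K*N` floating-point operations assumes a standard dense matrix product.

```python
# Shapes from the notebook: a is M x K, b.T is K x N, the product is M x N (all float32).
M, K, N = 10000, 10000, 20000
BYTES_PER_FLOAT32 = 4


def bandwidth_gb_s(n_bytes, seconds):
    """Effective transfer rate in GB/s (1 GB = 1e9 bytes). Hypothetical helper."""
    return n_bytes / seconds / 1e9


def gemm_gflops(m, k, n, seconds):
    """Throughput of a dense m x k by k x n product in GFLOP/s.

    Assumes 2*m*k*n operations: one multiply and one add per inner-product term.
    """
    return 2.0 * m * k * n / seconds / 1e9


a_bytes = M * K * BYTES_PER_FLOAT32
b_bytes = K * N * BYTES_PER_FLOAT32
c_bytes = M * N * BYTES_PER_FLOAT32  # matches the c_gpu.nbytes value printed in the notebook

# Timings copied from the notebook outputs (best of 3, seconds per loop).
print("upload a:   %.2f GB/s" % bandwidth_gb_s(a_bytes, 62.5e-3))
print("upload b.T: %.2f GB/s" % bandwidth_gb_s(b_bytes, 124e-3))
print("download c: %.2f GB/s" % bandwidth_gb_s(c_bytes, 388e-3))
print("dot:        %.0f GFLOP/s" % gemm_gflops(M, K, N, 1.33))
```

With the numbers recorded here, both uploads come out at roughly 6.4 GB/s, the download at about 2.1 GB/s, and the product at about 3 TFLOP/s, so the download is about three times slower per byte than the upload.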