@brandon-b-miller
Created January 19, 2021 16:09
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# More performance for less with cuDF Scalars"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook provides an example of a realistic scenario where a cuDF Scalar can noticeably improve the wall time of an iterative process with very few code changes. The setup is a fairly simple optimization problem: minimizing a \"parabola\" in one million dimensions using basic, out-of-the-box, no-frills gradient descent. \n",
"\n",
"We'll run the optimization algorithm twice, once using standard Python scalars, and once using cuDF scalars. Then, we'll compare the runtime and quantify the difference in performance."
]
},
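{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick orientation (a minimal sketch added for illustration, not part of the original benchmark), the cell below contrasts a plain Python float with a `cudf.Scalar` holding the same value. It assumes only the `cudf.Scalar` constructor and its `dtype` and `value` attributes; the variable names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cudf\n",
"\n",
"# A plain Python float lives in host memory\n",
"host_constant = 0.001 * 2\n",
"\n",
"# cudf.Scalar wraps the same value in a GPU-backed scalar object\n",
"device_constant = cudf.Scalar(host_constant)\n",
"print(type(host_constant).__name__, host_constant)\n",
"print(device_constant.dtype, device_constant.value)\n",
"\n",
"# Either kind of scalar can be combined with a cudf.Series\n",
"s = cudf.Series([1.0, 2.0, 3.0])\n",
"print((s * host_constant).to_pandas().tolist())\n",
"print((s * device_constant).to_pandas().tolist())"
]
},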
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import cudf\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the published version of this blog post, a single Tesla V100 GPU is used, but similar gains should be achievable on other hardware as well."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tue Jan 19 08:05:48 2021 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |\n",
"| N/A 30C P0 44W / 300W | 0MiB / 32510MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |\n",
"| N/A 30C P0 42W / 300W | 0MiB / 32510MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |\n",
"| N/A 29C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |\n",
"| N/A 28C P0 43W / 300W | 0MiB / 32510MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |\n",
"| N/A 28C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |\n",
"| N/A 30C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |\n",
"| N/A 33C P0 40W / 300W | 0MiB / 32510MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |\n",
"| N/A 30C P0 45W / 300W | 12MiB / 32510MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: GPU Memory |\n",
"| GPU PID Type Process name Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
],
"source": [
"!nvidia-smi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this example, we're using a very high-dimensional parabola. Mathematically, this works exactly the same way as a parabola in one dimension: instead of $f(x) = x^2$, we have $f(\\vec{x}) = x_{1}^2 + x_{2}^2 + \\dots + x_{n}^2 = \\sum_{j=1}^{n} x_{j}^2$, where $n$ is the number of dimensions (we'll set this to one million for this example).\n",
"\n",
"The fundamental equation of gradient descent is $\\vec{\\theta}_{i+1} = \\vec{\\theta}_{i} - \\alpha \\nabla f(\\vec{\\theta}_{i})$. This equation involves taking the derivative of the function at a point. Since $\\frac{d}{dx} x^2 = 2x$ and the derivative is a linear operator, each component of the gradient is $\\frac{\\partial f}{\\partial x_{j}} = 2x_{j}$, so the gradient of the whole function is just $\\nabla f(\\vec{x}) = 2\\vec{x}$: a constant (2) times the original vector. Multiplying that by an adjustable learning rate $\\alpha$ gives the update rule for the parameters, $\\vec{\\theta}_{i+1} = \\vec{\\theta}_{i} - 2\\alpha \\vec{\\theta}_{i}$.\n",
" \n",
"In plain English, this says: “after taking a step, the place we end up is equal to the place we started plus a step in the direction of steepest descent (the negative gradient).” We keep iterating like this until some stopping condition we choose is fulfilled, and then take the final values $\\vec{x}_f$ as our estimate of the global minimum (for this function, the origin $\\vec{0}$). As an easy example, let’s start with the vector $\\vec{x} = (1, 1, \\dots, 1)$. \n",
"\n",
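"Below is some simple code to perform this process. It performs the update step a fixed number of times (`n_iter`), so the stopping condition here is simply the iteration count. It starts from a point in space (`input_params`), accepts a learning rate (`lr`), and contains a switch (`use_cudf_scalar`) that promotes the constant $2\\alpha$ to a cuDF scalar. The motivation is that a `cudf.Scalar` is GPU-backed, so its value should not need to be moved from host memory to the device on every iteration the way a plain Python float may be. Let's run this function twice and time it: once with the cuDF scalar off, and once with it on:"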
]
},
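{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unrolling the update rule (a short worked derivation added for clarity) shows what the loop should produce. Because every step multiplies the current point by the same factor, after $n_{iter}$ steps:\n",
"\n",
"$$\\vec{\\theta}_{i+1} = (1 - 2\\alpha)\\,\\vec{\\theta}_{i} \\quad\\Rightarrow\\quad \\vec{\\theta}_{n_{iter}} = (1 - 2\\alpha)^{n_{iter}}\\,\\vec{\\theta}_{0}$$\n",
"\n",
"so for $0 < \\alpha < 1$ every component shrinks geometrically toward the minimum at $\\vec{0}$. With the values used below ($\\alpha = 0.001$, $n_{iter} = 1000$, $\\vec{\\theta}_{0} = (1, \\dots, 1)$), each component should end at $0.998^{1000} \\approx e^{-2.002} \\approx 0.1351$."
]
},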
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"DIM = int(1e6)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
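"def do_descent(n_iter, lr, input_params, use_cudf_scalar=False):\n",
"    output_params = input_params\n",
"\n",
"    # The gradient of f(x) = sum(x_j ** 2) is 2x, so each step subtracts (2 * lr) * x\n",
"    constant = lr * 2\n",
"    if use_cudf_scalar:\n",
"        # Promote the Python float to a GPU-backed cudf.Scalar once, outside the loop\n",
"        constant = cudf.Scalar(constant)\n",
"\n",
"    for _ in range(n_iter):\n",
"        output_params = output_params - (constant * output_params)\n",
"\n",
"    return output_params"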
]
},
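{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, here is a hypothetical CPU-only analog of `do_descent` (the name `do_descent_numpy` is ours, not from the original) that runs the same update on a NumPy array. It is only a sketch for checking the arithmetic on a machine without a GPU; it says nothing about the cuDF timings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def do_descent_numpy(n_iter, lr, input_params):\n",
"    # Same update as do_descent: x <- x - (2 * lr) * x\n",
"    output_params = input_params\n",
"    constant = lr * 2\n",
"    for _ in range(n_iter):\n",
"        output_params = output_params - (constant * output_params)\n",
"    return output_params\n",
"\n",
"# A short vector of ones is enough to check the values\n",
"small_result = do_descent_numpy(1000, 0.001, np.ones(5))\n",
"print(small_result)"
]
},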
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# A long vector of ones\n",
"initial_params = cudf.Series(np.ones(DIM))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Run the function once to test it\n",
"params = do_descent(1000, 0.001, initial_params)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 0.135065\n",
"1 0.135065\n",
"2 0.135065\n",
"3 0.135065\n",
"4 0.135065\n",
" ... \n",
"999995 0.135065\n",
"999996 0.135065\n",
"999997 0.135065\n",
"999998 0.135065\n",
"999999 0.135065\n",
"Length: 1000000, dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"params"
]
},
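{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every entry above has the same value because each component is updated independently by the same factor. As a quick sanity check (a sketch using only the standard library), the closed form $(1 - 2\\alpha)^{n_{iter}}$ can be evaluated directly and compared with the printed `0.135065`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"alpha, n_iter = 0.001, 1000\n",
"\n",
"# Each step multiplies every component by (1 - 2 * alpha), starting from 1.0\n",
"closed_form = (1 - 2 * alpha) ** n_iter\n",
"print(round(closed_form, 6))\n",
"\n",
"# For small alpha, (1 - 2 * alpha) ** n is close to exp(-2 * alpha * n)\n",
"print(math.exp(-2 * alpha * n_iter))"
]
},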
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.22 s ± 105 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"# Python Scalar\n",
"no_scalar = %timeit -o do_descent(1000, 0.001, initial_params)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.66 s ± 50.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"# cudf.Scalar\n",
"yes_scalar = %timeit -o do_descent(1000, 0.001, initial_params, use_cudf_scalar=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot the above results and inspect the performance improvement from using the cuDF scalar."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"2.218248599142368"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"no_scalar.average"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.6562571938016586"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"yes_scalar.average"
]
},
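{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before plotting, the two averages can also be summarized as a relative saving and a speedup factor. The sketch below hard-codes the averages printed above so that it runs on its own; in the notebook you could use `no_scalar.average` and `yes_scalar.average` directly. The variable names are illustrative. With these averages, it works out to roughly 0.56 seconds saved per run (about 25%), a speedup of about 1.34x."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Averages (seconds) reported by %timeit above\n",
"python_scalar_avg = 2.218248599142368\n",
"cudf_scalar_avg = 1.6562571938016586\n",
"\n",
"saved = python_scalar_avg - cudf_scalar_avg\n",
"percent_saved = 100 * saved / python_scalar_avg\n",
"speedup = python_scalar_avg / cudf_scalar_avg\n",
"\n",
"print(f'time saved: {saved:.3f} s ({percent_saved:.1f}%)')\n",
"print(f'speedup: {speedup:.2f}x')"
]
},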
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1440x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"runtime_diff = no_scalar.average - yes_scalar.average\n",
"\n",
"\n",
"fig, ax = plt.subplots(figsize=(20,10))\n",
"bars = ax.bar([1,2], height=[no_scalar.average, yes_scalar.average], edgecolor='black', lw=5)\n",
"diff_bar = ax.bar([2], \n",
" height=runtime_diff, \n",
" bottom=bars[1].get_height(), \n",
" color='black', \n",
" alpha=0.5, \n",
" hatch='\\\\', \n",
" edgecolor='black', \n",
" lw=5)\n",
" \n",
"\n",
"plt.xticks(fontsize=16)\n",
"plt.yticks(fontsize=16)\n",
"\n",
"\n",
"bars[0].set_color('yellow')\n",
"bars[1].set_color('green')\n",
"\n",
"bars[0].set_alpha(0.5)\n",
"bars[1].set_alpha(0.5)\n",
"\n",
"bars[0].set_edgecolor('black')\n",
"bars[1].set_edgecolor('black')\n",
"\n",
"ax.set_xticks([1,2])\n",
"ax.set_xticklabels(['Python Scalar', 'cudf.Scalar'], rotation=45)\n",
"\n",
"plt.ylabel('Average Runtime (seconds)', fontsize=16)\n",
"\n",
"ax.annotate(str(round(runtime_diff, 3)) + \" seconds\", \n",
" xy=(1.91, bars[1].get_height() + runtime_diff / 2),\n",
" xycoords='data',\n",
" color='white',\n",
" fontsize=16)\n",
"title = f\"Runtime Difference = {round(100 * runtime_diff / bars[0].get_height(), 3)}%\"\n",
"\n",
"plt.title(title, fontsize=24)\n",
"\n",
"plt.show()\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}