Skip to content

Instantly share code, notes, and snippets.

@jcrist
Last active April 22, 2016 18:41
Show Gist options
  • Save jcrist/3dfea5a77cbffbc5f6a381f991dd439e to your computer and use it in GitHub Desktop.
Save jcrist/3dfea5a77cbffbc5f6a381f991dd439e to your computer and use it in GitHub Desktop.
Binning Implementations. Best viewed at nbviewer: http://nbviewer.jupyter.org/gist/jcrist/3dfea5a77cbffbc5f6a381f991dd439e
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Problem"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given a number and a set of constant bins, what is the fastest way to do the binnning? Values outside of the bins should return `nan`, otherwise they should be inclusive on the left bound."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def to_bins(x):\n",
" if x < 0.0:\n",
" return np.nan\n",
" if x < 1.3:\n",
" return 0\n",
" elif x < 1.35:\n",
" return 1\n",
" elif x < 2.4:\n",
" return 2\n",
" # And so on ...\n",
" elif x < 84.3:\n",
" return 100\n",
" else:\n",
" return np.nan"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic test case"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def make_bins(n):\n",
" return np.cumsum(np.random.random(n))\n",
"\n",
"def make_data(n, bins):\n",
" return np.random.uniform(bins.min() - 0.5, bins.max() + 0.5, n)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bins = make_bins(10)\n",
"data = make_data(10000, bins)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.1829594 , 0.53392777, 0.87212596, 0.87952459, 1.18258028,\n",
" 1.84474015, 2.60266074, 2.84080941, 3.7679846 , 4.05534851])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bins"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.72821738, 2.38961742, 3.10126157, ..., 4.17100479,\n",
" 0.71565192, 2.08038063])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Numpy Solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implement a binary search using numpy."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def make_numpy(bins, otherwise):\n",
" def numpy(data):\n",
" return np.where((data < bins[0]) | (data >= bins[-1]), otherwise,\n",
" np.digitize(data, bins[1:]))\n",
" return numpy"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"numpy = make_numpy(bins, np.nan)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1., 5., 7., ..., nan, 1., 5.])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom linear search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implement a linear search using numba."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numba as nb\n",
"\n",
"def make_linear(bins, otherwise):\n",
" @nb.vectorize('f8(f8)', nopython=True)\n",
" def linear_search(x):\n",
" if x < bins[0]:\n",
" return otherwise\n",
" for i in range(1, bins.shape[0]):\n",
" if x < bins[i]:\n",
" return i - 1\n",
" return otherwise\n",
" return linear_search"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"linear = make_linear(bins, np.nan)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1., 5., 7., ..., nan, 1., 5.])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linear(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom Binary Search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implement a binary search using numba."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def make_binary(bins, otherwise):\n",
" @nb.vectorize('f8(f8)', nopython=True)\n",
" def binary_search(x):\n",
" low = 0\n",
" high = len(bins)\n",
" mid = (low + high) // 2\n",
" while low < high:\n",
" if x < bins[mid]:\n",
" high = mid\n",
" else:\n",
" low = mid + 1\n",
" mid = (low + high) // 2\n",
" if mid == 0 or mid == len(bins):\n",
" return otherwise\n",
" return mid - 1\n",
" return binary_search"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"binary = make_binary(bins, np.nan)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1., 5., 7., ..., nan, 1., 5.])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"binary(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Metaprogrammed binary search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unroll a binary search into cascading if statements using string metaprogramming. Probably a bad idea."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import sys\n",
"from itertools import count\n",
"\n",
"# In python 2, exec is a statement.\n",
"# This is needed for compatibility\n",
"\n",
"if sys.version_info[0] == 3:\n",
" def _exec(codestr, glbls):\n",
" exec(codestr, glbls)\n",
"else:\n",
" eval(compile(\"\"\"\n",
"def _exec(codestr, glbls):\n",
" exec codestr in glbls\n",
"\"\"\", \"<_exec>\", \"exec\"))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def make_meta(bins, otherwise, debug=False):\n",
" namespace = {}\n",
" _names = count()\n",
"\n",
" # A function for creating symbols and adding them to the namespace.\n",
" def gsym(x):\n",
" s = \"_%d\" % next(_names)\n",
" namespace[s] = x\n",
" return s\n",
"\n",
" body = _make_meta(bins, gsym(otherwise), 0, len(bins), gsym)\n",
" code = \"def meta(x):\\n{body}\".format(body=indent(body))\n",
" if debug:\n",
" print(code)\n",
" _exec(code, namespace)\n",
" return nb.vectorize('f8(f8)', nopython=True)(namespace['meta'])\n",
"\n",
"\n",
"indent = lambda s: '\\n'.join(' ' + l for l in s.splitlines())\n",
"\n",
"\n",
"def _make_meta(bins, otherwise, low, high, gsym):\n",
" mid = (high + low) // 2\n",
" if high > low:\n",
" return (\n",
" \"if x < {bound}:\\n\"\n",
" \"{left}\\n\"\n",
" \"else:\\n\"\n",
" \"{right}\").format(bound=gsym(bins[mid]),\n",
" left=indent(_make_meta(bins, otherwise, low, mid, gsym)),\n",
" right=indent(_make_meta(bins, otherwise, mid + 1, high, gsym)))\n",
" elif mid == 0 or mid == len(bins):\n",
" return 'return {0}'.format(otherwise)\n",
" return 'return {0}'.format(gsym(mid - 1))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"def meta(x):\n",
" if x < _1:\n",
" if x < _2:\n",
" if x < _3:\n",
" if x < _4:\n",
" return _0\n",
" else:\n",
" return _5\n",
" else:\n",
" return _6\n",
" else:\n",
" if x < _7:\n",
" if x < _8:\n",
" return _9\n",
" else:\n",
" return _10\n",
" else:\n",
" return _11\n",
" else:\n",
" if x < _12:\n",
" if x < _13:\n",
" if x < _14:\n",
" return _15\n",
" else:\n",
" return _16\n",
" else:\n",
" return _17\n",
" else:\n",
" if x < _18:\n",
" return _19\n",
" else:\n",
" return _0\n"
]
}
],
"source": [
"# Setting debug to True to print the generated code\n",
"meta = make_meta(bins, np.nan, debug=True)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1., 5., 7., ..., nan, 1., 5.])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"meta(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ensure Correctness"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n",
"True\n",
"True\n"
]
}
],
"source": [
"for f in [linear, binary, meta]:\n",
" print(np.allclose(numpy(data), f(data), equal_nan=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Benchmark"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Benchmark the various implementations using increasing number of bins."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from timeit import default_timer\n",
"\n",
"def time(f, *args, **kwargs):\n",
" n = kwargs.get('n', 10)\n",
" start = default_timer()\n",
" for i in range(n):\n",
" f(*args)\n",
" end = default_timer()\n",
" return (end - start) / n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"nbins = [5, 10, 50, 100, 250, 500, 1000]\n",
"\n",
"numpy_times = []\n",
"linear_times = []\n",
"binary_times = []\n",
"meta_times = []\n",
"\n",
"for b in nbins:\n",
" bins = make_bins(b)\n",
" data = make_data(1000000, bins)\n",
" # Build functions\n",
" numpy = make_numpy(bins, np.nan)\n",
" linear = make_linear(bins, np.nan)\n",
" binary = make_binary(bins, np.nan)\n",
" meta = make_meta(bins, np.nan)\n",
" # Precompile\n",
" linear(data)\n",
" binary(data)\n",
" meta(data)\n",
" # Timings\n",
" numpy_times.append(time(numpy, data))\n",
" linear_times.append(time(linear, data))\n",
" binary_times.append(time(binary, data))\n",
" meta_times.append(time(meta, data))"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/javascript": [
"\n",
"(function(global) {\n",
" function now() {\n",
" return new Date();\n",
" }\n",
"\n",
" if (typeof (window._bokeh_onload_callbacks) === \"undefined\") {\n",
" window._bokeh_onload_callbacks = [];\n",
" }\n",
"\n",
" function run_callbacks() {\n",
" window._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n",
" delete window._bokeh_onload_callbacks\n",
" console.info(\"Bokeh: all callbacks have finished\");\n",
" }\n",
"\n",
" function load_libs(js_urls, callback) {\n",
" window._bokeh_onload_callbacks.push(callback);\n",
" if (window._bokeh_is_loading > 0) {\n",
" console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n",
" return null;\n",
" }\n",
" if (js_urls == null || js_urls.length === 0) {\n",
" run_callbacks();\n",
" return null;\n",
" }\n",
" console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n",
" window._bokeh_is_loading = js_urls.length;\n",
" for (var i = 0; i < js_urls.length; i++) {\n",
" var url = js_urls[i];\n",
" var s = document.createElement('script');\n",
" s.src = url;\n",
" s.async = false;\n",
" s.onreadystatechange = s.onload = function() {\n",
" window._bokeh_is_loading--;\n",
" if (window._bokeh_is_loading === 0) {\n",
" console.log(\"Bokeh: all BokehJS libraries loaded\");\n",
" run_callbacks()\n",
" }\n",
" };\n",
" s.onerror = function() {\n",
" console.warn(\"failed to load library \" + url);\n",
" };\n",
" console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n",
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
" }\n",
" };\n",
"\n",
" var js_urls = ['https://cdn.pydata.org/bokeh/release/bokeh-0.11.1.min.js', 'https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.11.1.min.js', 'https://cdn.pydata.org/bokeh/release/bokeh-compiler-0.11.1.min.js'];\n",
"\n",
" var inline_js = [\n",
" function(Bokeh) {\n",
" Bokeh.set_log_level(\"info\");\n",
" },\n",
" \n",
" function(Bokeh) {\n",
" Bokeh.$(\"#2726e06f-9c66-4bd6-b50b-3cd7a2e0f908\").text(\"BokehJS successfully loaded\");\n",
" },\n",
" function(Bokeh) {\n",
" console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.11.1.min.css\");\n",
" Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.11.1.min.css\");\n",
" console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.11.1.min.css\");\n",
" Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.11.1.min.css\");\n",
" }\n",
" ];\n",
"\n",
" function run_inline_js() {\n",
" for (var i = 0; i < inline_js.length; i++) {\n",
" inline_js[i](window.Bokeh);\n",
" }\n",
" }\n",
"\n",
" if (window._bokeh_is_loading === 0) {\n",
" console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n",
" run_inline_js();\n",
" } else {\n",
" load_libs(js_urls, function() {\n",
" console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n",
" run_inline_js();\n",
" });\n",
" }\n",
"}(this));"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
" <div class=\"plotdiv\" id=\"9d5ef7ee-233c-4088-a7ff-ce38cd235086\"></div>\n",
"<script type=\"text/javascript\">\n",
" \n",
" (function(global) {\n",
" function now() {\n",
" return new Date();\n",
" }\n",
" \n",
" if (typeof (window._bokeh_onload_callbacks) === \"undefined\") {\n",
" window._bokeh_onload_callbacks = [];\n",
" }\n",
" \n",
" function run_callbacks() {\n",
" window._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n",
" delete window._bokeh_onload_callbacks\n",
" console.info(\"Bokeh: all callbacks have finished\");\n",
" }\n",
" \n",
" function load_libs(js_urls, callback) {\n",
" window._bokeh_onload_callbacks.push(callback);\n",
" if (window._bokeh_is_loading > 0) {\n",
" console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n",
" return null;\n",
" }\n",
" if (js_urls == null || js_urls.length === 0) {\n",
" run_callbacks();\n",
" return null;\n",
" }\n",
" console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n",
" window._bokeh_is_loading = js_urls.length;\n",
" for (var i = 0; i < js_urls.length; i++) {\n",
" var url = js_urls[i];\n",
" var s = document.createElement('script');\n",
" s.src = url;\n",
" s.async = false;\n",
" s.onreadystatechange = s.onload = function() {\n",
" window._bokeh_is_loading--;\n",
" if (window._bokeh_is_loading === 0) {\n",
" console.log(\"Bokeh: all BokehJS libraries loaded\");\n",
" run_callbacks()\n",
" }\n",
" };\n",
" s.onerror = function() {\n",
" console.warn(\"failed to load library \" + url);\n",
" };\n",
" console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n",
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
" }\n",
" };var element = document.getElementById(\"9d5ef7ee-233c-4088-a7ff-ce38cd235086\");\n",
" if (element == null) {\n",
" console.log(\"Bokeh: ERROR: autoload.js configured with elementid '9d5ef7ee-233c-4088-a7ff-ce38cd235086' but no matching script tag was found. \")\n",
" return false;\n",
" }\n",
" \n",
" var js_urls = [];\n",
" \n",
" var inline_js = [\n",
" function(Bokeh) {\n",
" Bokeh.$(function() {\n",
" var docs_json = {\"318f296a-25aa-439f-bd90-17180adc7875\":{\"roots\":{\"references\":[{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"58761fd7-eaca-4130-9aa4-0fd215fbd5f2\",\"type\":\"PanTool\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"17f4478c-92e3-4192-9f2a-bfcebae12af1\",\"type\":\"BasicTicker\"}},\"id\":\"39fc0867-922e-4f3c-9c10-f0208cb68bed\",\"type\":\"Grid\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x_values\",\"y_values\"],\"data\":{\"chart_index\":[{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"}],\"series\":[\"numpy\",\"numpy\",\"numpy\",\"numpy\",\"numpy\",\"numpy\",\"numpy\"],\"x_values\":[5,10,50,100,250,500,1000],\"y_values\":[0.02296450138092041,0.02864091396331787,0.04282679557800293,0.05167970657348633,0.06255319118499755,0.06536858081817627,0.07222120761871338]}},\"id\":\"cd0de9ba-6dda-4576-9a60-4bee09cbc51d\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x_values\",\"y_values\"],\"data\":{\"chart_index\":[{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"}],\"series\":[\"linear\",\"linear\",\"linear\",\"linear\",\"linear\",\"linear\",\"linear\"],\"x_values\":[5,10,50,100,250,500,1000],\"y_values\":[0.009573698043823242,0.012730002403259277,0.024196910858154296,0.04238870143890381,0.08969380855560302,0.1499985933303833,0.2988013982772827]}},\"id\":\"eb61ce55-2f23-4bee-871c-f0f84685657b\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"line_color\":{\"value\":\"#abdda4\"},\"line_width\":{\"value\":2},\"x\":{\"field\":\"x_values\"},\"y\":{\"field\":\"y_values\"}},\"id\":\"eaa16005-d19e-4adc-a0a3-7d129ac0550c\",\"type\":\"Line\"},{\"attributes\":{\"below\":[{\"id\":\"f79a8bd8-ebbf-49de-91bf-cf25a0a26b70\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"7202738a-0600-4522-93b6-c6d9208a953f\",\"type\":\"LinearAxis\"}],\"legend\":\"top_left\",\"renderers\":[{\"id\":\"2d06fdaf-abd4-4184-94f9-f1a9da93e3e2\",\"type\":\"BoxAnnotation\"},{\"id\":\"46a30a0c-fa7e-45ee-933b-7ea7ba48a0b9\",\"type\":\"GlyphRenderer\"},{\"id\":\"a915c778-8ae8-4120-adae-0470f9d847f5\",\"type\":\"GlyphRenderer\"},{\"id\":\"2a622bdb-95a7-44ff-ac00-4999132348e2\",\"type\":\"GlyphRenderer\"},{\"id\":\"21ad974d-c9cc-4af9-9616-619d62a26dce\",\"type\":\"GlyphRenderer\"},{\"id\":\"fb8f4e71-a2d0-422d-a0a0-71cef4b119f8\",\"type\":\"Legend\"},{\"id\":\"f79a8bd8-ebbf-49de-91bf-cf25a0a26b70\",\"type\":\"LinearAxis\"},{\"id\":\"7202738a-0600-4522-93b6-c6d9208a953f\",\"type\":\"LinearAxis\"},{\"id\":\"39fc0867-922e-4f3c-9c10-f0208cb68bed\",\"type\":\"Grid\"},{\"id\":\"f40f1a39-2214-4310-974e-361db53ef0e7\",\"type\":\"Grid\"}],\"title\":\"Binning Times\",\"title_text_font_size\":{\"value\":\"14pt\"},\"tool_events\":{\"id\":\"95e4e73c-a8e4-4e18-b8c6-bf17ac871bcb\",\"type\":\"ToolEvents\"},\"tools\":[{\"id\":\"58761fd7-eaca-4130-9aa4-0fd215fbd5f2\",\"type\":\"PanTool\"},{\"id\":\"1b9ec1e7-d2ce-448a-a9e5-4f5f89553947\",\"type\":\"WheelZoomTool\"},{\"id\":\"74008344-7e98-498e-9ad3-15336391f431\",\"type\":\"BoxZoomTool\"},{\"id\":\"48cc67d3-114b-4d99-ade9-c4c8596609ab\",\"type\":\"PreviewSaveTool\"},{\"id\":\"d0266c9e-1c0c-44ad-a083-86f6db35c576\",\"type\":\"ResizeTool\"},{\"id\":\"d006743b-1a49-40a3-a3a9-bb5474fb5644\",\"type\":\"ResetTool\"},{\"id\":\"e1e08504-86f4-41e5-adf9-6b460959dd31\",\"type\":\"HelpTool\"}],\"x_mapper_type\":\"auto\",\"x_range\":{\"id\":\"f5272d76-ba74-45b2-befe-10481545794f\",\"type\":\"Range1d\"},\"xscale\":\"auto\",\"y_mapper_type\":\"auto\",\"y_range\":{\"id\":\"271eeacd-8b37-4a69-9f36-d8721f1856c5\",\"type\":\"Range1d\"},\"yscale\":\"auto\"},\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"95e4e73c-a8e4-4e18-b8c6-bf17ac871bcb\",\"type\":\"ToolEvents\"},{\"attributes\":{\"callback\":null,\"end\":1099.5,\"start\":-94.5},\"id\":\"f5272d76-ba74-45b2-befe-10481545794f\",\"type\":\"Range1d\"},{\"attributes\":{\"overlay\":{\"id\":\"2d06fdaf-abd4-4184-94f9-f1a9da93e3e2\",\"type\":\"BoxAnnotation\"},\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"74008344-7e98-498e-9ad3-15336391f431\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"1b9ec1e7-d2ce-448a-a9e5-4f5f89553947\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"data_source\":{\"id\":\"22382249-efed-4c67-9ec5-89fe759819db\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"c5fee6df-ff3b-4c1a-9a96-60e8e4e2fb0a\",\"type\":\"Line\"},\"hover_glyph\":null,\"nonselection_glyph\":null,\"selection_glyph\":null},\"id\":\"2a622bdb-95a7-44ff-ac00-4999132348e2\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"line_color\":{\"value\":\"#2b83ba\"},\"line_width\":{\"value\":2},\"x\":{\"field\":\"x_values\"},\"y\":{\"field\":\"y_values\"}},\"id\":\"c5fee6df-ff3b-4c1a-9a96-60e8e4e2fb0a\",\"type\":\"Line\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"d006743b-1a49-40a3-a3a9-bb5474fb5644\",\"type\":\"ResetTool\"},{\"attributes\":{\"axis_label\":\"time (s)\",\"formatter\":{\"id\":\"54b9dc30-ded8-4f9f-81d2-92d0b581a3be\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"3703a3f6-3429-44cf-acb0-1dfc6eece83c\",\"type\":\"BasicTicker\"}},\"id\":\"7202738a-0600-4522-93b6-c6d9208a953f\",\"type\":\"LinearAxis\"},{\"attributes\":{\"legends\":[[\"numpy\",[{\"id\":\"46a30a0c-fa7e-45ee-933b-7ea7ba48a0b9\",\"type\":\"GlyphRenderer\"}]],[\"linear\",[{\"id\":\"a915c778-8ae8-4120-adae-0470f9d847f5\",\"type\":\"GlyphRenderer\"}]],[\"binary\",[{\"id\":\"2a622bdb-95a7-44ff-ac00-4999132348e2\",\"type\":\"GlyphRenderer\"}]],[\"meta\",[{\"id\":\"21ad974d-c9cc-4af9-9616-619d62a26dce\",\"type\":\"GlyphRenderer\"}]]],\"location\":\"top_left\",\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"fb8f4e71-a2d0-422d-a0a0-71cef4b119f8\",\"type\":\"Legend\"},{\"attributes\":{},\"id\":\"17f4478c-92e3-4192-9f2a-bfcebae12af1\",\"type\":\"BasicTicker\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x_values\",\"y_values\"],\"data\":{\"chart_index\":[{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"}],\"series\":[\"binary\",\"binary\",\"binary\",\"binary\",\"binary\",\"binary\",\"binary\"],\"x_values\":[5,10,50,100,250,500,1000],\"y_values\":[0.018324112892150878,0.02266721725463867,0.033489012718200685,0.042221808433532716,0.0470797061920166,0.05115189552307129,0.055628609657287595]}},\"id\":\"22382249-efed-4c67-9ec5-89fe759819db\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"2d06fdaf-abd4-4184-94f9-f1a9da93e3e2\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"line_color\":{\"value\":\"#d7191c\"},\"line_width\":{\"value\":2},\"x\":{\"field\":\"x_values\"},\"y\":{\"field\":\"y_values\"}},\"id\":\"417e8d66-3b70-4a7e-a19b-3bce7692dabe\",\"type\":\"Line\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x_values\",\"y_values\"],\"data\":{\"chart_index\":[{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"}],\"series\":[\"meta\",\"meta\",\"meta\",\"meta\",\"meta\",\"meta\",\"meta\"],\"x_values\":[5,10,50,100,250,500,1000],\"y_values\":[0.009740495681762695,0.013511300086975098,0.023372197151184083,0.02882649898529053,0.036826801300048825,0.039762687683105466,0.04576270580291748]}},\"id\":\"8396eed3-fb1c-44dc-b070-44cb4feb183a\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"e1e08504-86f4-41e5-adf9-6b460959dd31\",\"type\":\"HelpTool\"},{\"attributes\":{\"data_source\":{\"id\":\"eb61ce55-2f23-4bee-871c-f0f84685657b\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"eaa16005-d19e-4adc-a0a3-7d129ac0550c\",\"type\":\"Line\"},\"hover_glyph\":null,\"nonselection_glyph\":null,\"selection_glyph\":null},\"id\":\"a915c778-8ae8-4120-adae-0470f9d847f5\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"callback\":null,\"end\":0.3277241683006286,\"start\":-0.019349071979522705},\"id\":\"271eeacd-8b37-4a69-9f36-d8721f1856c5\",\"type\":\"Range1d\"},{\"attributes\":{\"line_color\":{\"value\":\"#fdae61\"},\"line_width\":{\"value\":2},\"x\":{\"field\":\"x_values\"},\"y\":{\"field\":\"y_values\"}},\"id\":\"1081696e-1cf7-46d5-b824-28233b16f59a\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"3703a3f6-3429-44cf-acb0-1dfc6eece83c\",\"type\":\"BasicTicker\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"3703a3f6-3429-44cf-acb0-1dfc6eece83c\",\"type\":\"BasicTicker\"}},\"id\":\"f40f1a39-2214-4310-974e-361db53ef0e7\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"9fa8ebcc-4c39-45ab-bc28-0a4fa2fab460\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"data_source\":{\"id\":\"8396eed3-fb1c-44dc-b070-44cb4feb183a\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1081696e-1cf7-46d5-b824-28233b16f59a\",\"type\":\"Line\"},\"hover_glyph\":null,\"nonselection_glyph\":null,\"selection_glyph\":null},\"id\":\"21ad974d-c9cc-4af9-9616-619d62a26dce\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"axis_label\":\"nbins\",\"formatter\":{\"id\":\"9fa8ebcc-4c39-45ab-bc28-0a4fa2fab460\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"17f4478c-92e3-4192-9f2a-bfcebae12af1\",\"type\":\"BasicTicker\"}},\"id\":\"f79a8bd8-ebbf-49de-91bf-cf25a0a26b70\",\"type\":\"LinearAxis\"},{\"attributes\":{\"data_source\":{\"id\":\"cd0de9ba-6dda-4576-9a60-4bee09cbc51d\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"417e8d66-3b70-4a7e-a19b-3bce7692dabe\",\"type\":\"Line\"},\"hover_glyph\":null,\"nonselection_glyph\":null,\"selection_glyph\":null},\"id\":\"46a30a0c-fa7e-45ee-933b-7ea7ba48a0b9\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"48cc67d3-114b-4d99-ade9-c4c8596609ab\",\"type\":\"PreviewSaveTool\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"d0266c9e-1c0c-44ad-a083-86f6db35c576\",\"type\":\"ResizeTool\"},{\"attributes\":{},\"id\":\"54b9dc30-ded8-4f9f-81d2-92d0b581a3be\",\"type\":\"BasicTickFormatter\"}],\"root_ids\":[\"c31037ad-6337-4dd7-9e81-262490cb179c\"]},\"title\":\"Bokeh Application\",\"version\":\"0.11.1\"}};\n",
" var render_items = [{\"docid\":\"318f296a-25aa-439f-bd90-17180adc7875\",\"elementid\":\"9d5ef7ee-233c-4088-a7ff-ce38cd235086\",\"modelid\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"notebook_comms_target\":\"2921af11-49ee-4714-9c07-001144cf4bb4\"}];\n",
" \n",
" Bokeh.embed.embed_items(docs_json, render_items);\n",
" });\n",
" },\n",
" function(Bokeh) {\n",
" }\n",
" ];\n",
" \n",
" function run_inline_js() {\n",
" for (var i = 0; i < inline_js.length; i++) {\n",
" inline_js[i](window.Bokeh);\n",
" }\n",
" }\n",
" \n",
" if (window._bokeh_is_loading === 0) {\n",
" console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n",
" run_inline_js();\n",
" } else {\n",
" load_libs(js_urls, function() {\n",
" console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n",
" run_inline_js();\n",
" });\n",
" }\n",
" }(this));\n",
"</script>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import bokeh.charts as bc\n",
"from bokeh.palettes import Spectral4\n",
"\n",
"bc.output_notebook(hide_banner=True)\n",
"\n",
"line = bc.Line(dict(nbins=nbins,\n",
" numpy=numpy_times,\n",
" linear=linear_times,\n",
" binary=binary_times,\n",
" meta=meta_times),\n",
" x='nbins', y=['numpy', 'linear', 'binary', 'meta'],\n",
" legend='top_left', palette=Spectral4,\n",
" title='Binning Times', ylabel='time (s)')\n",
"bc.show(line);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- There is a cost to looping and genericness, but this cost is a constant factor. Depending on application, this may or may not matter.\n",
"- Numpy is pretty fast, even with extra allocations. For my application, the constant factor between numpy and numba is negligible.\n",
"- Linear search is linear (no surprise).\n",
"- String metaprogramming can be pretty readable if your algorithm is written in a functional way. I'm actually pretty happy with how 1:1 the code from the binary search maps to the metaprogrammed version. Certainly not as clean as it could be in a language with macros/templates though."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment