Last active
April 22, 2016 18:41
-
-
Save jcrist/3dfea5a77cbffbc5f6a381f991dd439e to your computer and use it in GitHub Desktop.
Binning Implementations. Best viewed at nbviewer: http://nbviewer.jupyter.org/gist/jcrist/3dfea5a77cbffbc5f6a381f991dd439e
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## The Problem" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Given a number and a set of constant bins, what is the fastest way to do the binnning? Values outside of the bins should return `nan`, otherwise they should be inclusive on the left bound." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def to_bins(x):\n", | |
" if x < 0.0:\n", | |
" return np.nan\n", | |
" if x < 1.3:\n", | |
" return 0\n", | |
" elif x < 1.35:\n", | |
" return 1\n", | |
" elif x < 2.4:\n", | |
" return 2\n", | |
" # And so on ...\n", | |
" elif x < 84.3:\n", | |
" return 100\n", | |
" else:\n", | |
" return np.nan" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Basic test case" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"\n", | |
"def make_bins(n):\n", | |
" return np.cumsum(np.random.random(n))\n", | |
"\n", | |
"def make_data(n, bins):\n", | |
" return np.random.uniform(bins.min() - 0.5, bins.max() + 0.5, n)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"bins = make_bins(10)\n", | |
"data = make_data(10000, bins)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 0.1829594 , 0.53392777, 0.87212596, 0.87952459, 1.18258028,\n", | |
" 1.84474015, 2.60266074, 2.84080941, 3.7679846 , 4.05534851])" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"bins" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 0.72821738, 2.38961742, 3.10126157, ..., 4.17100479,\n", | |
" 0.71565192, 2.08038063])" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Numpy Solution" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Implement a binary search using numpy." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def make_numpy(bins, otherwise):\n", | |
" def numpy(data):\n", | |
" return np.where((data < bins[0]) | (data >= bins[-1]), otherwise,\n", | |
" np.digitize(data, bins[1:]))\n", | |
" return numpy" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"numpy = make_numpy(bins, np.nan)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 1., 5., 7., ..., nan, 1., 5.])" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"numpy(data)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Custom linear search" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Implement a linear search using numba." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numba as nb\n", | |
"\n", | |
"def make_linear(bins, otherwise):\n", | |
" @nb.vectorize('f8(f8)', nopython=True)\n", | |
" def linear_search(x):\n", | |
" if x < bins[0]:\n", | |
" return otherwise\n", | |
" for i in range(1, bins.shape[0]):\n", | |
" if x < bins[i]:\n", | |
" return i - 1\n", | |
" return otherwise\n", | |
" return linear_search" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"linear = make_linear(bins, np.nan)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 1., 5., 7., ..., nan, 1., 5.])" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"linear(data)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Custom Binary Search" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Implement a binary search using numba." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def make_binary(bins, otherwise):\n", | |
" @nb.vectorize('f8(f8)', nopython=True)\n", | |
" def binary_search(x):\n", | |
" low = 0\n", | |
" high = len(bins)\n", | |
" mid = (low + high) // 2\n", | |
" while low < high:\n", | |
" if x < bins[mid]:\n", | |
" high = mid\n", | |
" else:\n", | |
" low = mid + 1\n", | |
" mid = (low + high) // 2\n", | |
" if mid == 0 or mid == len(bins):\n", | |
" return otherwise\n", | |
" return mid - 1\n", | |
" return binary_search" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"binary = make_binary(bins, np.nan)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 1., 5., 7., ..., nan, 1., 5.])" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"binary(data)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Metaprogrammed binary search" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Unroll a binary search into cascading if statements using string metaprogramming. Probably a bad idea." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import sys\n", | |
"from itertools import count\n", | |
"\n", | |
"# In python 2, exec is a statement.\n", | |
"# This is needed for compatibility\n", | |
"\n", | |
"if sys.version_info[0] == 3:\n", | |
" def _exec(codestr, glbls):\n", | |
" exec(codestr, glbls)\n", | |
"else:\n", | |
" eval(compile(\"\"\"\n", | |
"def _exec(codestr, glbls):\n", | |
" exec codestr in glbls\n", | |
"\"\"\", \"<_exec>\", \"exec\"))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def make_meta(bins, otherwise, debug=False):\n", | |
" namespace = {}\n", | |
" _names = count()\n", | |
"\n", | |
" # A function for creating symbols and adding them to the namespace.\n", | |
" def gsym(x):\n", | |
" s = \"_%d\" % next(_names)\n", | |
" namespace[s] = x\n", | |
" return s\n", | |
"\n", | |
" body = _make_meta(bins, gsym(otherwise), 0, len(bins), gsym)\n", | |
" code = \"def meta(x):\\n{body}\".format(body=indent(body))\n", | |
" if debug:\n", | |
" print(code)\n", | |
" _exec(code, namespace)\n", | |
" return nb.vectorize('f8(f8)', nopython=True)(namespace['meta'])\n", | |
"\n", | |
"\n", | |
"indent = lambda s: '\\n'.join(' ' + l for l in s.splitlines())\n", | |
"\n", | |
"\n", | |
"def _make_meta(bins, otherwise, low, high, gsym):\n", | |
" mid = (high + low) // 2\n", | |
" if high > low:\n", | |
" return (\n", | |
" \"if x < {bound}:\\n\"\n", | |
" \"{left}\\n\"\n", | |
" \"else:\\n\"\n", | |
" \"{right}\").format(bound=gsym(bins[mid]),\n", | |
" left=indent(_make_meta(bins, otherwise, low, mid, gsym)),\n", | |
" right=indent(_make_meta(bins, otherwise, mid + 1, high, gsym)))\n", | |
" elif mid == 0 or mid == len(bins):\n", | |
" return 'return {0}'.format(otherwise)\n", | |
" return 'return {0}'.format(gsym(mid - 1))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"def meta(x):\n", | |
" if x < _1:\n", | |
" if x < _2:\n", | |
" if x < _3:\n", | |
" if x < _4:\n", | |
" return _0\n", | |
" else:\n", | |
" return _5\n", | |
" else:\n", | |
" return _6\n", | |
" else:\n", | |
" if x < _7:\n", | |
" if x < _8:\n", | |
" return _9\n", | |
" else:\n", | |
" return _10\n", | |
" else:\n", | |
" return _11\n", | |
" else:\n", | |
" if x < _12:\n", | |
" if x < _13:\n", | |
" if x < _14:\n", | |
" return _15\n", | |
" else:\n", | |
" return _16\n", | |
" else:\n", | |
" return _17\n", | |
" else:\n", | |
" if x < _18:\n", | |
" return _19\n", | |
" else:\n", | |
" return _0\n" | |
] | |
} | |
], | |
"source": [ | |
"# Setting debug to True to print the generated code\n", | |
"meta = make_meta(bins, np.nan, debug=True)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([ 1., 5., 7., ..., nan, 1., 5.])" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"meta(data)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Ensure Correctness" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"True\n", | |
"True\n", | |
"True\n" | |
] | |
} | |
], | |
"source": [ | |
"for f in [linear, binary, meta]:\n", | |
" print(np.allclose(numpy(data), f(data), equal_nan=True))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Benchmark" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Benchmark the various implementations using increasing number of bins." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from timeit import default_timer\n", | |
"\n", | |
"def time(f, *args, **kwargs):\n", | |
" n = kwargs.get('n', 10)\n", | |
" start = default_timer()\n", | |
" for i in range(n):\n", | |
" f(*args)\n", | |
" end = default_timer()\n", | |
" return (end - start) / n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"nbins = [5, 10, 50, 100, 250, 500, 1000]\n", | |
"\n", | |
"numpy_times = []\n", | |
"linear_times = []\n", | |
"binary_times = []\n", | |
"meta_times = []\n", | |
"\n", | |
"for b in nbins:\n", | |
" bins = make_bins(b)\n", | |
" data = make_data(1000000, bins)\n", | |
" # Build functions\n", | |
" numpy = make_numpy(bins, np.nan)\n", | |
" linear = make_linear(bins, np.nan)\n", | |
" binary = make_binary(bins, np.nan)\n", | |
" meta = make_meta(bins, np.nan)\n", | |
" # Precompile\n", | |
" linear(data)\n", | |
" binary(data)\n", | |
" meta(data)\n", | |
" # Timings\n", | |
" numpy_times.append(time(numpy, data))\n", | |
" linear_times.append(time(linear, data))\n", | |
" binary_times.append(time(binary, data))\n", | |
" meta_times.append(time(meta, data))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"application/javascript": [ | |
"\n", | |
"(function(global) {\n", | |
" function now() {\n", | |
" return new Date();\n", | |
" }\n", | |
"\n", | |
" if (typeof (window._bokeh_onload_callbacks) === \"undefined\") {\n", | |
" window._bokeh_onload_callbacks = [];\n", | |
" }\n", | |
"\n", | |
" function run_callbacks() {\n", | |
" window._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n", | |
" delete window._bokeh_onload_callbacks\n", | |
" console.info(\"Bokeh: all callbacks have finished\");\n", | |
" }\n", | |
"\n", | |
" function load_libs(js_urls, callback) {\n", | |
" window._bokeh_onload_callbacks.push(callback);\n", | |
" if (window._bokeh_is_loading > 0) {\n", | |
" console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", | |
" return null;\n", | |
" }\n", | |
" if (js_urls == null || js_urls.length === 0) {\n", | |
" run_callbacks();\n", | |
" return null;\n", | |
" }\n", | |
" console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", | |
" window._bokeh_is_loading = js_urls.length;\n", | |
" for (var i = 0; i < js_urls.length; i++) {\n", | |
" var url = js_urls[i];\n", | |
" var s = document.createElement('script');\n", | |
" s.src = url;\n", | |
" s.async = false;\n", | |
" s.onreadystatechange = s.onload = function() {\n", | |
" window._bokeh_is_loading--;\n", | |
" if (window._bokeh_is_loading === 0) {\n", | |
" console.log(\"Bokeh: all BokehJS libraries loaded\");\n", | |
" run_callbacks()\n", | |
" }\n", | |
" };\n", | |
" s.onerror = function() {\n", | |
" console.warn(\"failed to load library \" + url);\n", | |
" };\n", | |
" console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", | |
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n", | |
" }\n", | |
" };\n", | |
"\n", | |
" var js_urls = ['https://cdn.pydata.org/bokeh/release/bokeh-0.11.1.min.js', 'https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.11.1.min.js', 'https://cdn.pydata.org/bokeh/release/bokeh-compiler-0.11.1.min.js'];\n", | |
"\n", | |
" var inline_js = [\n", | |
" function(Bokeh) {\n", | |
" Bokeh.set_log_level(\"info\");\n", | |
" },\n", | |
" \n", | |
" function(Bokeh) {\n", | |
" Bokeh.$(\"#2726e06f-9c66-4bd6-b50b-3cd7a2e0f908\").text(\"BokehJS successfully loaded\");\n", | |
" },\n", | |
" function(Bokeh) {\n", | |
" console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.11.1.min.css\");\n", | |
" Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.11.1.min.css\");\n", | |
" console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.11.1.min.css\");\n", | |
" Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.11.1.min.css\");\n", | |
" }\n", | |
" ];\n", | |
"\n", | |
" function run_inline_js() {\n", | |
" for (var i = 0; i < inline_js.length; i++) {\n", | |
" inline_js[i](window.Bokeh);\n", | |
" }\n", | |
" }\n", | |
"\n", | |
" if (window._bokeh_is_loading === 0) {\n", | |
" console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", | |
" run_inline_js();\n", | |
" } else {\n", | |
" load_libs(js_urls, function() {\n", | |
" console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n", | |
" run_inline_js();\n", | |
" });\n", | |
" }\n", | |
"}(this));" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"\n", | |
"\n", | |
" <div class=\"plotdiv\" id=\"9d5ef7ee-233c-4088-a7ff-ce38cd235086\"></div>\n", | |
"<script type=\"text/javascript\">\n", | |
" \n", | |
" (function(global) {\n", | |
" function now() {\n", | |
" return new Date();\n", | |
" }\n", | |
" \n", | |
" if (typeof (window._bokeh_onload_callbacks) === \"undefined\") {\n", | |
" window._bokeh_onload_callbacks = [];\n", | |
" }\n", | |
" \n", | |
" function run_callbacks() {\n", | |
" window._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n", | |
" delete window._bokeh_onload_callbacks\n", | |
" console.info(\"Bokeh: all callbacks have finished\");\n", | |
" }\n", | |
" \n", | |
" function load_libs(js_urls, callback) {\n", | |
" window._bokeh_onload_callbacks.push(callback);\n", | |
" if (window._bokeh_is_loading > 0) {\n", | |
" console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", | |
" return null;\n", | |
" }\n", | |
" if (js_urls == null || js_urls.length === 0) {\n", | |
" run_callbacks();\n", | |
" return null;\n", | |
" }\n", | |
" console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", | |
" window._bokeh_is_loading = js_urls.length;\n", | |
" for (var i = 0; i < js_urls.length; i++) {\n", | |
" var url = js_urls[i];\n", | |
" var s = document.createElement('script');\n", | |
" s.src = url;\n", | |
" s.async = false;\n", | |
" s.onreadystatechange = s.onload = function() {\n", | |
" window._bokeh_is_loading--;\n", | |
" if (window._bokeh_is_loading === 0) {\n", | |
" console.log(\"Bokeh: all BokehJS libraries loaded\");\n", | |
" run_callbacks()\n", | |
" }\n", | |
" };\n", | |
" s.onerror = function() {\n", | |
" console.warn(\"failed to load library \" + url);\n", | |
" };\n", | |
" console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", | |
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n", | |
" }\n", | |
" };var element = document.getElementById(\"9d5ef7ee-233c-4088-a7ff-ce38cd235086\");\n", | |
" if (element == null) {\n", | |
" console.log(\"Bokeh: ERROR: autoload.js configured with elementid '9d5ef7ee-233c-4088-a7ff-ce38cd235086' but no matching script tag was found. \")\n", | |
" return false;\n", | |
" }\n", | |
" \n", | |
" var js_urls = [];\n", | |
" \n", | |
" var inline_js = [\n", | |
" function(Bokeh) {\n", | |
" Bokeh.$(function() {\n", | |
" var docs_json = {\"318f296a-25aa-439f-bd90-17180adc7875\":{\"roots\":{\"references\":[{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"58761fd7-eaca-4130-9aa4-0fd215fbd5f2\",\"type\":\"PanTool\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"17f4478c-92e3-4192-9f2a-bfcebae12af1\",\"type\":\"BasicTicker\"}},\"id\":\"39fc0867-922e-4f3c-9c10-f0208cb68bed\",\"type\":\"Grid\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x_values\",\"y_values\"],\"data\":{\"chart_index\":[{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"},{\"series\":\"numpy\"}],\"series\":[\"numpy\",\"numpy\",\"numpy\",\"numpy\",\"numpy\",\"numpy\",\"numpy\"],\"x_values\":[5,10,50,100,250,500,1000],\"y_values\":[0.02296450138092041,0.02864091396331787,0.04282679557800293,0.05167970657348633,0.06255319118499755,0.06536858081817627,0.07222120761871338]}},\"id\":\"cd0de9ba-6dda-4576-9a60-4bee09cbc51d\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x_values\",\"y_values\"],\"data\":{\"chart_index\":[{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"},{\"series\":\"linear\"}],\"series\":[\"linear\",\"linear\",\"linear\",\"linear\",\"linear\",\"linear\",\"linear\"],\"x_values\":[5,10,50,100,250,500,1000],\"y_values\":[0.009573698043823242,0.012730002403259277,0.024196910858154296,0.04238870143890381,0.08969380855560302,0.1499985933303833,0.2988013982772827]}},\"id\":\"eb61ce55-2f23-4bee-871c-f0f84685657b\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"line_color\":{\"value\":\"#abdda4\"},\"line_width\":{\"value\":2},\"x\":{\"field\":\"x_values\"},\"y\":{\"field\":\"y_values\"}},\"id\":\"eaa16005-d19e-4adc-a0a3-7d129ac0550c\",\"type\":\"Line\"},{\"attributes\":{\"below\":[{\"id\":\"f79a8bd8-ebbf-49de-91bf-cf25a0a26b70\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"7202738a-0600-4522-93b6-c6d9208a953f\",\"type\":\"LinearAxis\"}],\"legend\":\"top_left\",\"renderers\":[{\"id\":\"2d06fdaf-abd4-4184-94f9-f1a9da93e3e2\",\"type\":\"BoxAnnotation\"},{\"id\":\"46a30a0c-fa7e-45ee-933b-7ea7ba48a0b9\",\"type\":\"GlyphRenderer\"},{\"id\":\"a915c778-8ae8-4120-adae-0470f9d847f5\",\"type\":\"GlyphRenderer\"},{\"id\":\"2a622bdb-95a7-44ff-ac00-4999132348e2\",\"type\":\"GlyphRenderer\"},{\"id\":\"21ad974d-c9cc-4af9-9616-619d62a26dce\",\"type\":\"GlyphRenderer\"},{\"id\":\"fb8f4e71-a2d0-422d-a0a0-71cef4b119f8\",\"type\":\"Legend\"},{\"id\":\"f79a8bd8-ebbf-49de-91bf-cf25a0a26b70\",\"type\":\"LinearAxis\"},{\"id\":\"7202738a-0600-4522-93b6-c6d9208a953f\",\"type\":\"LinearAxis\"},{\"id\":\"39fc0867-922e-4f3c-9c10-f0208cb68bed\",\"type\":\"Grid\"},{\"id\":\"f40f1a39-2214-4310-974e-361db53ef0e7\",\"type\":\"Grid\"}],\"title\":\"Binning Times\",\"title_text_font_size\":{\"value\":\"14pt\"},\"tool_events\":{\"id\":\"95e4e73c-a8e4-4e18-b8c6-bf17ac871bcb\",\"type\":\"ToolEvents\"},\"tools\":[{\"id\":\"58761fd7-eaca-4130-9aa4-0fd215fbd5f2\",\"type\":\"PanTool\"},{\"id\":\"1b9ec1e7-d2ce-448a-a9e5-4f5f89553947\",\"type\":\"WheelZoomTool\"},{\"id\":\"74008344-7e98-498e-9ad3-15336391f431\",\"type\":\"BoxZoomTool\"},{\"id\":\"48cc67d3-114b-4d99-ade9-c4c8596609ab\",\"type\":\"PreviewSaveTool\"},{\"id\":\"d0266c9e-1c0c-44ad-a083-86f6db35c576\",\"type\":\"ResizeTool\"},{\"id\":\"d006743b-1a49-40a3-a3a9-bb5474fb5644\",\"type\":\"ResetTool\"},{\"id\":\"e1e08504-86f4-41e5-adf9-6b460959dd31\",\"type\":\"HelpTool\"}],\"x_mapper_type\":\"auto\",\"x_range\":{\"id\":\"f5272d76-ba74-45b2-befe-10481545794f\",\"type\":\"Range1d\"},\"xscale\":\"auto\",\"y_mapper_type\":\"auto\",\"y_range\":{\"id\":\"271eeacd-8b37-4a69-9f36-d8721f1856c5\",\"type\":\"Range1d\"},\"yscale\":\"auto\"},\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"95e4e73c-a8e4-4e18-b8c6-bf17ac871bcb\",\"type\":\"ToolEvents\"},{\"attributes\":{\"callback\":null,\"end\":1099.5,\"start\":-94.5},\"id\":\"f5272d76-ba74-45b2-befe-10481545794f\",\"type\":\"Range1d\"},{\"attributes\":{\"overlay\":{\"id\":\"2d06fdaf-abd4-4184-94f9-f1a9da93e3e2\",\"type\":\"BoxAnnotation\"},\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"74008344-7e98-498e-9ad3-15336391f431\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"1b9ec1e7-d2ce-448a-a9e5-4f5f89553947\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"data_source\":{\"id\":\"22382249-efed-4c67-9ec5-89fe759819db\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"c5fee6df-ff3b-4c1a-9a96-60e8e4e2fb0a\",\"type\":\"Line\"},\"hover_glyph\":null,\"nonselection_glyph\":null,\"selection_glyph\":null},\"id\":\"2a622bdb-95a7-44ff-ac00-4999132348e2\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"line_color\":{\"value\":\"#2b83ba\"},\"line_width\":{\"value\":2},\"x\":{\"field\":\"x_values\"},\"y\":{\"field\":\"y_values\"}},\"id\":\"c5fee6df-ff3b-4c1a-9a96-60e8e4e2fb0a\",\"type\":\"Line\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"d006743b-1a49-40a3-a3a9-bb5474fb5644\",\"type\":\"ResetTool\"},{\"attributes\":{\"axis_label\":\"time (s)\",\"formatter\":{\"id\":\"54b9dc30-ded8-4f9f-81d2-92d0b581a3be\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"3703a3f6-3429-44cf-acb0-1dfc6eece83c\",\"type\":\"BasicTicker\"}},\"id\":\"7202738a-0600-4522-93b6-c6d9208a953f\",\"type\":\"LinearAxis\"},{\"attributes\":{\"legends\":[[\"numpy\",[{\"id\":\"46a30a0c-fa7e-45ee-933b-7ea7ba48a0b9\",\"type\":\"GlyphRenderer\"}]],[\"linear\",[{\"id\":\"a915c778-8ae8-4120-adae-0470f9d847f5\",\"type\":\"GlyphRenderer\"}]],[\"binary\",[{\"id\":\"2a622bdb-95a7-44ff-ac00-4999132348e2\",\"type\":\"GlyphRenderer\"}]],[\"meta\",[{\"id\":\"21ad974d-c9cc-4af9-9616-619d62a26dce\",\"type\":\"GlyphRenderer\"}]]],\"location\":\"top_left\",\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"fb8f4e71-a2d0-422d-a0a0-71cef4b119f8\",\"type\":\"Legend\"},{\"attributes\":{},\"id\":\"17f4478c-92e3-4192-9f2a-bfcebae12af1\",\"type\":\"BasicTicker\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x_values\",\"y_values\"],\"data\":{\"chart_index\":[{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"},{\"series\":\"binary\"}],\"series\":[\"binary\",\"binary\",\"binary\",\"binary\",\"binary\",\"binary\",\"binary\"],\"x_values\":[5,10,50,100,250,500,1000],\"y_values\":[0.018324112892150878,0.02266721725463867,0.033489012718200685,0.042221808433532716,0.0470797061920166,0.05115189552307129,0.055628609657287595]}},\"id\":\"22382249-efed-4c67-9ec5-89fe759819db\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"2d06fdaf-abd4-4184-94f9-f1a9da93e3e2\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"line_color\":{\"value\":\"#d7191c\"},\"line_width\":{\"value\":2},\"x\":{\"field\":\"x_values\"},\"y\":{\"field\":\"y_values\"}},\"id\":\"417e8d66-3b70-4a7e-a19b-3bce7692dabe\",\"type\":\"Line\"},{\"attributes\":{\"callback\":null,\"column_names\":[\"x_values\",\"y_values\"],\"data\":{\"chart_index\":[{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"},{\"series\":\"meta\"}],\"series\":[\"meta\",\"meta\",\"meta\",\"meta\",\"meta\",\"meta\",\"meta\"],\"x_values\":[5,10,50,100,250,500,1000],\"y_values\":[0.009740495681762695,0.013511300086975098,0.023372197151184083,0.02882649898529053,0.036826801300048825,0.039762687683105466,0.04576270580291748]}},\"id\":\"8396eed3-fb1c-44dc-b070-44cb4feb183a\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"e1e08504-86f4-41e5-adf9-6b460959dd31\",\"type\":\"HelpTool\"},{\"attributes\":{\"data_source\":{\"id\":\"eb61ce55-2f23-4bee-871c-f0f84685657b\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"eaa16005-d19e-4adc-a0a3-7d129ac0550c\",\"type\":\"Line\"},\"hover_glyph\":null,\"nonselection_glyph\":null,\"selection_glyph\":null},\"id\":\"a915c778-8ae8-4120-adae-0470f9d847f5\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"callback\":null,\"end\":0.3277241683006286,\"start\":-0.019349071979522705},\"id\":\"271eeacd-8b37-4a69-9f36-d8721f1856c5\",\"type\":\"Range1d\"},{\"attributes\":{\"line_color\":{\"value\":\"#fdae61\"},\"line_width\":{\"value\":2},\"x\":{\"field\":\"x_values\"},\"y\":{\"field\":\"y_values\"}},\"id\":\"1081696e-1cf7-46d5-b824-28233b16f59a\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"3703a3f6-3429-44cf-acb0-1dfc6eece83c\",\"type\":\"BasicTicker\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"3703a3f6-3429-44cf-acb0-1dfc6eece83c\",\"type\":\"BasicTicker\"}},\"id\":\"f40f1a39-2214-4310-974e-361db53ef0e7\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"9fa8ebcc-4c39-45ab-bc28-0a4fa2fab460\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"data_source\":{\"id\":\"8396eed3-fb1c-44dc-b070-44cb4feb183a\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1081696e-1cf7-46d5-b824-28233b16f59a\",\"type\":\"Line\"},\"hover_glyph\":null,\"nonselection_glyph\":null,\"selection_glyph\":null},\"id\":\"21ad974d-c9cc-4af9-9616-619d62a26dce\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"axis_label\":\"nbins\",\"formatter\":{\"id\":\"9fa8ebcc-4c39-45ab-bc28-0a4fa2fab460\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"17f4478c-92e3-4192-9f2a-bfcebae12af1\",\"type\":\"BasicTicker\"}},\"id\":\"f79a8bd8-ebbf-49de-91bf-cf25a0a26b70\",\"type\":\"LinearAxis\"},{\"attributes\":{\"data_source\":{\"id\":\"cd0de9ba-6dda-4576-9a60-4bee09cbc51d\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"417e8d66-3b70-4a7e-a19b-3bce7692dabe\",\"type\":\"Line\"},\"hover_glyph\":null,\"nonselection_glyph\":null,\"selection_glyph\":null},\"id\":\"46a30a0c-fa7e-45ee-933b-7ea7ba48a0b9\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"48cc67d3-114b-4d99-ade9-c4c8596609ab\",\"type\":\"PreviewSaveTool\"},{\"attributes\":{\"plot\":{\"id\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"subtype\":\"Chart\",\"type\":\"Plot\"}},\"id\":\"d0266c9e-1c0c-44ad-a083-86f6db35c576\",\"type\":\"ResizeTool\"},{\"attributes\":{},\"id\":\"54b9dc30-ded8-4f9f-81d2-92d0b581a3be\",\"type\":\"BasicTickFormatter\"}],\"root_ids\":[\"c31037ad-6337-4dd7-9e81-262490cb179c\"]},\"title\":\"Bokeh Application\",\"version\":\"0.11.1\"}};\n", | |
" var render_items = [{\"docid\":\"318f296a-25aa-439f-bd90-17180adc7875\",\"elementid\":\"9d5ef7ee-233c-4088-a7ff-ce38cd235086\",\"modelid\":\"c31037ad-6337-4dd7-9e81-262490cb179c\",\"notebook_comms_target\":\"2921af11-49ee-4714-9c07-001144cf4bb4\"}];\n", | |
" \n", | |
" Bokeh.embed.embed_items(docs_json, render_items);\n", | |
" });\n", | |
" },\n", | |
" function(Bokeh) {\n", | |
" }\n", | |
" ];\n", | |
" \n", | |
" function run_inline_js() {\n", | |
" for (var i = 0; i < inline_js.length; i++) {\n", | |
" inline_js[i](window.Bokeh);\n", | |
" }\n", | |
" }\n", | |
" \n", | |
" if (window._bokeh_is_loading === 0) {\n", | |
" console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", | |
" run_inline_js();\n", | |
" } else {\n", | |
" load_libs(js_urls, function() {\n", | |
" console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n", | |
" run_inline_js();\n", | |
" });\n", | |
" }\n", | |
" }(this));\n", | |
"</script>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"import bokeh.charts as bc\n", | |
"from bokeh.palettes import Spectral4\n", | |
"\n", | |
"bc.output_notebook(hide_banner=True)\n", | |
"\n", | |
"line = bc.Line(dict(nbins=nbins,\n", | |
" numpy=numpy_times,\n", | |
" linear=linear_times,\n", | |
" binary=binary_times,\n", | |
" meta=meta_times),\n", | |
" x='nbins', y=['numpy', 'linear', 'binary', 'meta'],\n", | |
" legend='top_left', palette=Spectral4,\n", | |
" title='Binning Times', ylabel='time (s)')\n", | |
"bc.show(line);" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Conclusion" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- There is a cost to looping and genericness, but this cost is a constant factor. Depending on application, this may or may not matter.\n", | |
"- Numpy is pretty fast, even with extra allocations. For my application, the constant factor between numpy and numba is negligible.\n", | |
"- Linear search is linear (no surprise).\n", | |
"- String metaprogramming can be pretty readable if your algorithm is written in a functional way. I'm actually pretty happy with how 1:1 the code from the binary search maps to the metaprogrammed version. Certainly not as clean as it could be in a language with macros/templates though." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.11" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment