Skip to content

Instantly share code, notes, and snippets.

@cobaltgit
Last active January 14, 2023 15:31
Show Gist options
  • Save cobaltgit/79dd5812dcc83284d0c125178a217614 to your computer and use it in GitHub Desktop.
Save cobaltgit/79dd5812dcc83284d0c125178a217614 to your computer and use it in GitHub Desktop.
A benchmark comparing linear and binary search algorithms - Source code compatible with Python 3.5+
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear vs. Binary Search\n",
"### A brief comparison of linear and binary search algorithms\n",
"\n",
"Below is the source code for the functions we will be using for each algorithm. Binary search can be implemented in two ways - iterating using a while loop, and using a recursive function."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"\n",
"def iterative_binary_search(\n",
" arr: List[int], target: int, low: int = 0, high: Optional[int] = None\n",
") -> int:\n",
" \"\"\"Binary search an array and return the index of the target if found, -1 otherwise.\"\"\"\n",
" high = high or len(arr) - 1 # set high or last index of array\n",
" while low <= high: # repeat until target index found, else return -1\n",
" mid = (low + high) // 2 # midpoint of array indices\n",
" if arr[mid] == target:\n",
" return mid\n",
" elif arr[mid] < target:\n",
" low = mid + 1 # search second half of array\n",
" else:\n",
" high = mid - 1 # search first half of array\n",
" return -1\n",
"\n",
"\n",
"def recursive_binary_search(\n",
" arr: List[int], target: int, low: int = 0, high: Optional[int] = None\n",
") -> int:\n",
" \"\"\"Recursively binary search an array and return the index of the target if found, -1 otherwise.\"\"\"\n",
" high = high or len(arr) - 1 # set high or last index of array\n",
" if low > high:\n",
" return -1 # low index cannot be greater than high index\n",
" mid = (low + high) // 2\n",
" if arr[mid] == target:\n",
" return mid\n",
" elif arr[mid] < target:\n",
" return recursive_binary_search(\n",
" arr, target, mid + 1, high\n",
" ) # recursively search second half of array\n",
" else:\n",
" return recursive_binary_search(\n",
" arr, target, low, mid - 1\n",
" ) # recursively search first half of array\n",
"\n",
"\n",
"def linear_search(arr: List[int], target: int) -> int:\n",
" \"\"\"Iterate over an array and return the index of the target if found, -1 otherwise.\"\"\"\n",
" return next((i for i, v in enumerate(arr) if v == target), -1) # one-line implementation of linear search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Why sort?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-1"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from random import shuffle\n",
"\n",
"unsorted_arr = [*range(100)]\n",
"shuffle(unsorted_arr) # disorder array\n",
"iterative_binary_search(unsorted_arr, 32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we look at the above value, we can see that an incorrect result is returned, as if the binary search algorithm failed to find the target element \n",
"Interestingly enough, the linear search algorithm works even on unsorted data, as shown by the below value:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"66"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linear_search(unsorted_arr, 32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is because the binary search algorithm cannot work with unsorted arrays, as the array is \"halved\" repeatedly either until the target element is found (target index returned) or until it can no longer (-1 index returned) \n",
"Normally, 32 would be found at index 32 in the sorted equivalent of this array. Let's find out the outcome of using a sorted array"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"32"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted_arr = sorted(unsorted_arr) # re-sort array\n",
"iterative_binary_search(sorted_arr, 32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, if we look here, we get the correct index, great! \n",
"Binary search only works on sorted data as it repeatedly \"halves\" the input array until either the target is found (target index returned) or until it can no longer (-1 index returned) \n",
"Next, let's look at speed. We'll be using a sorted array of 250,000,000 numbers for this."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Benchmark"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.62 µs ± 81 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)\n",
"5.72 µs ± 70.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)\n",
"1.02 ms ± 5.91 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
]
}
],
"source": [
"AMOUNT = 250_000_000\n",
"HUGE_ARRAY = list(range(AMOUNT))\n",
"TARGET_VAL = 32767\n",
"\n",
"%timeit iterative_binary_search(HUGE_ARRAY, TARGET_VAL)\n",
"%timeit recursive_binary_search(HUGE_ARRAY, TARGET_VAL)\n",
"%timeit linear_search(HUGE_ARRAY, TARGET_VAL)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This shows that binary search is the best for working with very large sorted data sets due to its logarithmic time complexity (time reduces every loop) \n",
"The recursive approach is ever-so-slightly faster than recursion, but the speed difference is mere nanoseconds, too little to notice - not only that but in even larger data sets, the max recursion depth may be exceeded (if you even have to go that large) \n",
"Either way both approaches are hundreds of times faster than just looping over an array one-by-one until the target is found, which is the most common way to search an array"
]
}
],
"metadata": {
"interpreter": {
"hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
},
"kernelspec": {
"display_name": "Python 3.10.4 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment