@Shawn-Armstrong
Last active May 23, 2023 04:34

# Haskell_binary_search.md

## Overview

  • This document contains a detailed review and analysis of the binary search algorithm, using an example implemented in Haskell.

## Binary Search Algorithm

  • Binary search is an algorithm for finding a target value in a sorted list; it exploits the sorted order so that each step can exclude half of the remaining elements.
  • Given a list $L$ and a target value $T$, the algorithm does the following:
```mermaid
graph TB
    Start("Start binary search") --> IsEmpty{"Is list L empty?"}
    IsEmpty -- Yes --> NotFound["Target T does not exist within the list"]
    IsEmpty -- No --> FindMid["Find the middle of list L and call it m"]
    FindMid --> IsEqual{"Is the value at index m equivalent to T?"}
    IsEqual -- Yes --> Found["Target T exists within the list at index m"]
    IsEqual -- No --> IsGreater{"Is T greater than the value?"}
    IsGreater -- Yes --> DiscardLeft["Discard all indices on the left side of m"]
    IsGreater -- No --> DiscardRight["Discard all indices on the right side of m"]
    DiscardLeft --> IsEmpty
    DiscardRight --> IsEmpty
```
An example trace with target $T = 36$:

```
L: [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50], L[m] = 40, T < 40 -> discard RHS
L: [30,31,32,33,34,35,36,37,38,39], L[m] = 35, T > 35 -> discard LHS
L: [36,37,38,39], L[m] = 38, T < 38 -> discard RHS
L: [36,37], L[m] = 37, T < 37 -> discard RHS
L: [36], L[m] = 36, T == 36 -> found; return index 6 (the position of 36 in the original list)
```

## Haskell Implementation

Live Demo

```haskell
import Debug.Trace

binarySearch :: Int -> [Int] -> Maybe Int
binarySearch target = binarySearchHelper 0
  where
    binarySearchHelper :: Int -> [Int] -> Maybe Int
    binarySearchHelper _ [] = Nothing
    binarySearchHelper s xs =
      let midIndex = length xs `div` 2
          midValue = xs !! midIndex
          (lhs, rhs) = splitAt midIndex xs
      in trace (formatTrace s xs midValue) $
         case compare target midValue of
           LT -> binarySearchHelper s lhs
           GT -> binarySearchHelper (s + midIndex + 1) (tail rhs)
           EQ -> Just (s + midIndex)

    formatTrace :: Int -> [Int] -> Int -> String
    formatTrace s xs midValue =
      "Target: " ++ show target ++
      "\tMiddle value: " ++ show midValue ++
      "\tCurrent list: " ++ show xs

main :: IO ()
main = do
  print $ binarySearch 20 [1 .. 20]
```
  • This implementation makes use of partially applied (curried) functions, which can be tricky to read.
    • binarySearch takes an Int and returns a function of type [Int] -> Maybe Int.
    • binarySearchHelper takes an Int and a list of Ints. We apply it to 0, and the resulting function waits for a list of integers.
    • So, when we execute print $ binarySearch 20 [1 .. 20], 20 is the target and the list [1 .. 20] is ultimately handed to binarySearchHelper 0.
  • It's important to note that Haskell's lexical scoping rules allow binarySearchHelper to reference target because it is defined within binarySearch.
  • In midValue = xs !! midIndex, the !! operator retrieves the element at a specific index in a list.
  • The variable s is used to keep track of the starting index of the current sublist within the context of the original list. It is initialized to 0, because at the start, the "sublist" is the entire list and thus starts at index 0.
  • Whenever the function makes a recursive call on the right half of the list (i.e., the half with larger values), it updates s to s + midIndex + 1. This is because the start of the right sublist is midIndex + 1 elements away from the start of the current list.
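The curried shape described above can be seen in isolation with a hypothetical, trace-free variant of the same function (the names `search`, `go`, and `searchFor7` are illustrative, not from the original listing):

```haskell
-- Sketch: same curried structure as binarySearch, without Debug.Trace.
search :: Int -> [Int] -> Maybe Int
search target = go 0
  where
    -- s tracks the starting index of the current sublist in the original list.
    go :: Int -> [Int] -> Maybe Int
    go _ [] = Nothing
    go s xs =
      let m = length xs `div` 2
          v = xs !! m
          (lhs, rhs) = splitAt m xs
      in case compare target v of
           LT -> go s lhs                    -- keep s: left half starts where xs did
           GT -> go (s + m + 1) (tail rhs)   -- right half starts m + 1 positions later
           EQ -> Just (s + m)

main :: IO ()
main = do
  -- Partial application: supplying only the target yields a reusable
  -- function of type [Int] -> Maybe Int.
  let searchFor7 = search 7
  print (searchFor7 [1 .. 10])     -- Just 6
  print (searchFor7 [2, 4, 6, 8])  -- Nothing
```

Because `search 7` is itself a value, it can be stored, passed to other functions, or mapped over several candidate lists without repeating the target.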

## Time Complexity

  • In a binary search, with each step, the size of the search space is halved. This is because the algorithm either searches in the left half or the right half of the list $L$, and ignores the other half at each iteration.
  • It takes $\log_2(n)$ steps or iterations to reduce the size of the list to 1 by continuously halving it.
    • For example, if you have a list of 8 elements, it would take 3 steps to get down to 1 element (8 -> 4 -> 2 -> 1).
  • So, in the worst-case scenario (when the target value is at one of the ends of the sorted list or isn't in the list at all), the binary search algorithm needs to make $\log_2(n)$ comparisons, meaning that it is a member of $O(\log(n))$.
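The halving argument can be checked numerically. This hypothetical helper (`halvingSteps` is not part of the original listing) counts how many times a search space of size $n$ can be halved before it shrinks to a single element:

```haskell
-- Count the halving steps needed to shrink a search space of size n to 1.
halvingSteps :: Int -> Int
halvingSteps n = length (takeWhile (> 1) (iterate (`div` 2) n))

main :: IO ()
main = do
  print (halvingSteps 8)    -- 3 steps: 8 -> 4 -> 2 -> 1
  print (halvingSteps 1024) -- 10 steps, i.e. log2 of 1024
```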

## Space Complexity

  • Our Haskell implementation uses tail recursion: each recursive call is the last thing the function does, so no additional context needs to be maintained per call. Consequently, it doesn't consume excessive stack space, keeping the stack usage constant ($O(1)$) regardless of the size of the input list.
  • In a non-tail-recursive implementation, each call would leave a pending stack frame alive, giving $O(\log(n))$ space; an iterative loop would also achieve $O(1)$.
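For contrast, here is a hypothetical non-tail-recursive variant (`comparisons` is illustrative, not from the original listing). Because each branch does `1 + ...` *after* the recursive call returns, the caller's frame must stay live, which is the structural source of the $O(\log(n))$ stack usage:

```haskell
-- Non-tail-recursive: counts comparisons made while searching for target.
-- The pending `1 +` after each recursive call keeps O(log n) frames alive.
comparisons :: Int -> [Int] -> Int
comparisons _ [] = 0
comparisons target xs =
  let m = length xs `div` 2
      (lhs, rhs) = splitAt m xs
  in case compare target (xs !! m) of
       LT -> 1 + comparisons target lhs
       GT -> 1 + comparisons target (tail rhs)
       EQ -> 1

main :: IO ()
main = print (comparisons 20 [1 .. 20])  -- 4
```

(Note that in Haskell, laziness complicates raw stack accounting, but the tail-call versus pending-work distinction above still holds structurally.)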

## Summary / Key Points

  • Binary search is an effective search algorithm in terms of performance but requires the list to be sorted beforehand.
  • Whenever you see an algorithm perform an operation that appears to be "halving" something in constant time, it's a decent indicator that you might be dealing with time complexity $O(\log(n))$.
  • Our implementation is tail recursive: once we find (or fail to find) the target, the result is returned immediately, with no additional work to perform on the way back up the call stack. This is a significant optimization because it reduces our space complexity to constant space.
  • Currying is the process of transforming a function that takes multiple arguments into a sequence of functions, each taking a single argument. In Haskell, all functions are curried by default, so a function type like Int -> Int -> Int is actually a function that takes an Int and returns another function of type Int -> Int.
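The currying point can be made concrete with a minimal sketch (the names `add` and `increment` are illustrative):

```haskell
-- Int -> Int -> Int really means Int -> (Int -> Int): supplying one
-- argument returns a new function waiting for the second.
add :: Int -> Int -> Int
add x y = x + y

increment :: Int -> Int
increment = add 1  -- partially applied: waiting for its second argument

main :: IO ()
main = do
  print (increment 41)           -- 42
  print (map (add 10) [1, 2, 3]) -- [11,12,13]
```

This is exactly the mechanism that lets `binarySearch 20` act as a standalone function of type `[Int] -> Maybe Int`.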

## Master Theorem

The Master Theorem applies to recurrences of the form

$T(n) = aT\left(\frac{n}{b}\right) + f(n)$

Here,

  • $n$ is the size of the problem.
  • $a$ is the number of subproblems in the recursion.
  • $\frac{n}{b}$ is the size of each subproblem, where all subproblems are assumed to be the same size.
  • $f(n)$ is the cost of the work done outside the recursive calls, which includes the cost of dividing the problem and the cost of merging the solutions.

For binary search, we have:

$a = 1$, because each step divides the problem into one smaller problem.

$b = 2$, because we divide the problem into two halves at each step.

$f(n) = O(1)$, because the cost of comparison and halving the array is constant.

This matches case 2 of the Master Theorem, since $f(n) = O(1) = \Theta(n^{\log_{b}(a)})$ (here $n^{\log_{2}(1)} = n^{0} = 1$). The Master Theorem then states that the solution of the recurrence is

$T(n) = O(n^{\log_{b}(a)} \log_{2}(n)) = O(\log_{2}(n))$

Hence, the time complexity of binary search is $O(\log(n))$, as desired.
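The recurrence can also be unrolled directly. This hypothetical function (`t` is illustrative) evaluates $T(n) = T(n/2) + 1$ with $T(1) = 1$, and grows by exactly one per doubling of $n$, matching $O(\log_2(n))$:

```haskell
-- Unroll the binary-search recurrence T(n) = T(n `div` 2) + 1, T(1) = 1,
-- where the "+ 1" is the constant work per level (one comparison).
t :: Int -> Int
t 1 = 1
t n = t (n `div` 2) + 1

main :: IO ()
main = print [t (2 ^ k) | k <- [0 .. 5]]  -- [1,2,3,4,5,6]: one step per doubling
```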
