Instantly share code, notes, and snippets. hschafer/BinarySearch - 1-28-19.md Created Jan 28, 2019

In lecture, we talked about the idea of the binary search algorithm, but did not end up implementing it. This reading will look at the implementation for the algorithm since it's not uncommon to be asked to implement the code for this algorithm on an interview for a job involving programming. The end of the reading is a reference of the complexities of the operations we have learned so far

Binary Search

Review: Algorithm Idea

In lecture, we discussed how to search a sorted array for a value. The naive algorithm that just starts at the beginning and traverses to the end or until it finds the element. With the language we discussed in lecture, we would say that worst case this "linear search" would take O(n) time where n is the number of elements in the array.

We then discussed a smarter idea which divides the array in half repeatedly. Start by looking at the middle element of the array and compare it to the target value, there are three options

search(20)

[2, 7, 9, 9, 13, 14, 16, 20, 27, 36, 51]
|--------------|     |-----------------|
less than mid    ^    greater than mid
mid
• The middle element is the same as the target: you can return the index you found
• The middle element is larger than the target: you can eliminate the right half of the array because the array is sorted and all elements to the right of the middle will be larger than the target
• The middle element is smaller than the target: you can eliminate the left half of the array because the array is sorted and all elements to the left of the middle will be smaller than the target

In the latter two cases, we now only have half the array to search and can repeat the same process using the start and endpoint of the remaining half as the start and end of the range to search. We repeat this process until we find the target value at the mid-point or once the search range has no elements left.

Implementation

public static int binarySearch(int[] arr, int target) {
// Start looking at whole array
int low = 0;
int high = arr.length - 1;

// while low and high haven't met yet, there are still more indices to search
while (low <= high) {
// compute middle index and value
int mid = (low + high) / 2;
int midValue = arr[mid];

if (midValue < target) {
low = mid + 1;
} else if (midValue > target) {
high = mid - 1;
} else {
return mid; // value found!
}
}
}

Java's implementation of binarySearch has the added benefit of not returning -1 if the value is not found, but rather indicates where the element "should go" if inserted. To do this it returns the negative of the index that the value should be inserted at.

return -(low + 1);

Trace of the algorithm running

Consider the example above

search(20)

values  [2, 7, 9, 9, 13, 14, 16, 20, 27, 36, 51]
indices  0, 1, 2, 3,  4,  5,  6,  7,  8,  9, 10

The following is a trace of all the decisions made by the algorithm

--------- Iteration 1 ---------
low = 0, high = 10
Is low <= high? Yes
mid = 5
midValue = 14

Is midValue < target? Yes
low = 6

--------- Iteration 2 ---------
low = 6, high = 10
Is low <= high? Yes
mid = 8
midValue = 27

Is midValue < target? No
Is midValue > target? Yes
high = 7

--------- Iteration 3 ---------
low = 6, high = 7
Is low <= high? Yes
mid = 6
midValue = 16

Is midValue < target? Yes
low = 7

--------- Iteration 4 ---------
low = 7, high = 7
Is low <= high? Yes
mid = 7
midValue = 20

Is midValue < target> No
Is midValue > target? No
Else, return 7

Which gets us the value we wanted. Another way to see this is to visualize the ranges being considered in the visualization of the same trace below

Complexity

Unlike the naive solution that searches every index, the binary search algorithm is much faster because it eliminates half the array on each iteration. As discussed in lecture, this makes the complexity of the algorithm O(log(n)).

Data Structure Complexity

This section reviews the discussion of the complexities of the operations on ArrayList and LinkedList. You generally don't memorize this table, but rather know how the methods are implemented to figure out what the run-time is. For example for LinkedList.remove(index), we know we first have to traverse to the right index (worst case O(n)) and then change the references at that node to remove the value (worst case O(1)) for a total run-time of O(n).

Where the run-time complexities for the methods we implemented differs from Java's for the table, we show the complexity of our implement / complexity for Java's

add(value) O(1)* O(n) / O(1)
add(index, value) O(n) O(n)
get(index) O(1) O(n)
indexOf(value) O(n) O(n)
contains(value) O(n) O(n)
remove(index) O(n) O(n)

* Most of the time!

Looking at this table, it seems like LinkedList is always the worse of the two structures, so why do we ever use LinkedList? In certain cases, like being treated as a Queue, it is much faster. For operations a Queue needs, we can see the LinkedList is superior.