The book provides this first implementation, which as written traverses each element past i
two times per recursion:
def qsort(arr):
if len(arr) < 2:
return arr
pivot = arr[0]
less = [i for i in arr[1:] if i <= pivot]
# NOTE: this traverses all N elements twice in the list.
greater = [i for i in arr[1:] if i > pivot]
return qsort(less) + [pivot] + qsort(greater)
I wanted to optimize this, to reduce the total number of reads, so I did it the following way and added in some tracking to help visualize the output.
The key difference is here, where it traverses the list items only one time per recursion:
less = []
greater = []
for idx, value in enumerate(arr):
state["touches"] += 1
if not idx == random_index:
if value <= pivot:
less.append(value)
else:
greater.append(value)
Full listing:
# Use a random array index, as per the book's ultimate advice for reducing the number of reads
import random
def printp(pad, message):
padding = " " * pad
print(f"{padding}{message}")
def qsort(arr, parentRecursion=0, state=None):
if state is None:
state = {'recursion':0, 'touches':0, 'initialPivotIndex': None}
print("\n")
recursion = state['recursion'] + 1
state['recursion'] = recursion
if (recursion > 1):
printp(parentRecursion, f"Recursion: parent # {parentRecursion}; total #: {recursion}")
else:
printp(parentRecursion, f"Entering function for first time! (parentRecursion = {parentRecursion})")
arr_len = len(arr)
# This is obvious. If the length is 1 or 0, it's already sorted. Nothing to do
if arr_len < 2:
printp(parentRecursion, f"Found arr of 1 or fewer elements: {arr}")
return (arr, state)
else:
printp(parentRecursion, f"Found arr with > 1 elements: {arr}")
random_index = random.randint(0, arr_len - 1)
# Just take the 0th item in the list in this implementation, since we know
# there are at least indices 0 and 1 available at this point
pivot = arr[random_index]
if state['initialPivotIndex'] == None:
state['initialPivotIndex'] = random_index
printp(parentRecursion, f"the pivot arr[{random_index}]: {pivot}")
# Count the touch here
state['touches'] += 1
# Pot the the i values of the entire array in appropriate division
less = []
greater = []
for idx, value in enumerate(arr):
state["touches"] += 1
if not idx == random_index:
if value <= pivot:
less.append(value)
else:
greater.append(value)
printp(parentRecursion, f"touches after array division: {state['touches']}")
# recursively call qsort, and use array concatenation to combine to final sorted list
printp(parentRecursion, f"{less} + [{pivot}] + {greater} (less + pivot + greater)")
return (qsort(less, recursion, state)[0] + [pivot] + qsort(greater, recursion, state)[0], state)
result = qsort([10, 20, 30, 40, 50, 60, 70, 80])
print(f"Final result: {result}")
A few sample runs, redacted after the first one:
pivot = 6 => 9 recursions
Entering function for first time! (parentRecursion = 0)
Found arr with > 1 elements: [10, 20, 30, 40, 50, 60, 70, 80]
the pivot arr[6]: 70
touches after array division: 8
[10, 20, 30, 40, 50, 60] + [70] + [80] (less + pivot + greater)
Recursion: parent # 1; total #: 2
Found arr with > 1 elements: [10, 20, 30, 40, 50, 60]
the pivot arr[4]: 50
touches after array division: 14
[10, 20, 30, 40] + [50] + [60] (less + pivot + greater)
Recursion: parent # 2; total #: 3
Found arr with > 1 elements: [10, 20, 30, 40]
the pivot arr[2]: 30
touches after array division: 18
[10, 20] + [30] + [40] (less + pivot + greater)
Recursion: parent # 3; total #: 4
Found arr with > 1 elements: [10, 20]
the pivot arr[1]: 20
touches after array division: 20
[10] + [20] + [] (less + pivot + greater)
Recursion: parent # 4; total #: 5
Found arr of 1 or fewer elements: [10]
Recursion: parent # 4; total #: 6
Found arr of 1 or fewer elements: []
Recursion: parent # 3; total #: 7
Found arr of 1 or fewer elements: [40]
Recursion: parent # 2; total #: 8
Found arr of 1 or fewer elements: [60]
Recursion: parent # 1; total #: 9
Found arr of 1 or fewer elements: [80]
Final result: [10, 20, 30, 40, 50, 60, 70, 80]
More results, just showing the initial pivot and total number of recurions and touches based on that initial pivot. Keep in mind that since we are only tracking the initial pivot, there will still be variation in the total touches
count. However, the number of recursions should always stay the same, I think. (TODO: verify)
{'initialPivotIndex': 0, 'recursion': 11, 'touches': 30})
{'initialPivotIndex': 1, 'recursion': 11, 'touches': 26})
{'initialPivotIndex': 2, 'recursion': 11, 'touches': 25})
{'initialPivotIndex': 3, 'recursion': 11, 'touches': 24})
{'initialPivotIndex': 4, 'recursion': 11, 'touches': 24})
{'initialPivotIndex': 5, 'recursion': 13, 'touches': 30})
{'initialPivotIndex': 6, 'recursion': 11, 'touches': 28})
{'initialPivotIndex': 7, 'recursion': 11, 'touches': 29})
- In the output notice the parent recursion is listed followed by the total recursion count.
- The sample uses
None
as the default value forstate
because default parameters in Python functions are essentially "static". We want to avoid shared state between top-level invocations.
- The sample uses
- I idented by the parent count to get a clear sense of how the control flow occurs in relation to the total number.
- This allows you to see how the
less
array in the final, recursive functions statement,return qsort(less, recursion, state) + [pivot] + qsort(greater, recursion, state)
gets broken down all the way before we see any logging of what happens to thegreater
array. - For example in the case of pivot = 6 we can just 9 recursions, examine the relationship of each recursion to its parent recursion number. The easiest relationship to see is the "deepest" one, which shows a maximum depth of 4, where both 5 and 6 were children of the 4th recursion:
0 | 1 => 0 | * 2 => 1 | ** 3 => 2 | *** 4 => 3 | **** 5 => 4 | **** 6 => 4 | *** 7 => 3 | ** 8 => 2 | * 9 => 1
- This allows you to see how the