Heaps are an implementation of the priority queue abstract data type:
insert(entry, priority)
.priority
is an integer priority. Entries with "higher priority" will be extracted before entries with "lower priority."- Lower number often is "higher priority." This makes sense because the priority is often a "cost."
- For instance: the distance to a city.
extract
: Removes the element with with highest priority.update_priority(entry, priority)
: change priority of an already added entry.
- Priority queue can be used for scheduling work or threads.
- Want to run highest priority thread.
- But new threads are being started concurrently, and existing threads' priorities may be changing.
- Dijkstra's algorithm.
- Want to find shortest paths to cities.
- Priority is "distance" to the city. But as you "explore" the graph, you discover shorter paths to the cities.
- Keep track of
k
-th minimum in a stream. - Broadly: any problem where the "best" thing to be doing may change over time.
- Self-balancing binary search tree allows logarithmic
insert
,remove
, and access to smallest element. - Keeps things entirely sorted, which is more than you need for a priority queue.
- We'll see a PQ implementation called a binary heap which has some
advantages:
- The binary heap will have worst-case logarithmic
insert
andremove.
- The binary heap will have
O(1)
peek
to look at the highest priority entry. A BST can be modified to do this though... - Binary heap will have average case
O(1)
insert
though! This is a major advantage over BSTs. - The binary heap will not need to make nodes with pointers to heap memory. This is more cache efficient and a big advantage.
- The binary heap will have worst-case logarithmic
- Binary heap implementation is a complete binary tree. All interior levels are totally filled in. Last level is filled in left-to-right.
- The heap property is that the entry of a parent node always is higher priority than the entries in its children.
- A left child may have higher priority than the right child, or vice versa. This is the difference from BST.
- Insert: do
heapify_up
.- Append node, swap with parent entry if the new entry has higher priority.
- Repeat.
- Worst case:
O(log n)
swaps. - Average case:
O(1)
.- Hand-wavy argument: to move up from bottom level, an entry must be higher priority than 50% of current entries.
- Extract: remove root entry, replace with last leaf node element.
- Do a
heapify_down
to push down this element until it is higher priority than both its children. - Worst case:
O(log n)
. - Why do we need to first move the last node to the root?
- Answer: This avoids "holes" in the complete tree.
- Do a
- All
(entry, priority)
pairs live in an array. - First element is root. Second element is left child, third is right child.
- Fourth is left child of left child. Sixth is left child of right child.
- Simple calculation to calculate child and parent positions from
idx
. - This only works with a complete tree.
decrease_key
: just throw a hash map in front to map entries to index positions.
- Just add all the items one-by-one, then extract one-by-one.
- Clearly worst case is
O(n log n)
time. - You can do this in place. Just use the same array as the heap.
- When you extract one-by-one, just place at the end of the array each
time.
- You can either use a max heap, or just reverse at the end.
- Downside: not particularly cache friendly. Thus quicksort is often
a better choice.
- Especially hurts for external sorting.
- No pointers. Better cache behavior. Less overhead.
- Faster expected insert or
update_priority
time.- Graph algorithms often make a lot of
update_priority
calls.
- Graph algorithms often make a lot of
- Can be used for in-place sorting.
- If you want to build a heap from
n
elements, you can just insert one-by-one. - This is
O(n log n)
worst case. But on average it isO(n)
by then reasoning described above. - Floyd's algorithm ensures worst case
O(n)
building. - You put all your items in an array. Say this has
k
levels: the last is levelk-1
. - You can consider every element in level
k-2
as a 2-level heap. - Swap to fix these 2-level heaps.
- Then you can move up to level
k-3
. You need to fix a 3-level heap. - At level
k-1
there aren/2
nodes. Levelk-2
hasn/4
nodes. Levelk-3
hasn/8
nodes. - At level
k-i
you may have to push down up toi - 1
levels. - Consider infinite sum of all terms
k * (1/2**k)
. This is bounded by a constant. - Thus
O(n)
overall. - You start from bottom and "sift up".
- Binomial Heap
- Amortized
O(1)
insert
. Not justO(1)
average. - Logarithmic
peek
though.
- Amortized
- Fibonacci Heap
- Worst case
O(1)
insert. extract
is amortizedO(log n)
; before was worst case logarithmic.
- Worst case
- Neither is very practical because adds a lot of overhead.