Considering that make_heap
runs in O(n) time, the function sort_heap
dominates the complexity when sorting a
sequence from scratch, so it naturally deserves the most attention when trying to find optimizations. I did not manage
to find a better algorithm that did not use extra memory for the job, but I was nevertheless able to obtain a 8~50%
speedup by optimizing bit_floor
with compiler intrinsics.
The bit_floor
implementation shown previously runs in O(log k) time when computing the bit floor of a k-bit unsigned
integer. Using intrinsices it is possible to compute the bit floor in O(1) by computing the position of the highest set
bit and shifting a single bit to that position. The algorithm below shows how to compute the bit floor of an unsigned