The performance of modern software can be bewildering. Tiny bits of plastic and wire routinely astound, outperforming the supercomputers of yesteryear. Far from a rarity, our systems make such feats the default. We trust things to be fast, treating almost any delay as a fault, even as we are all warned about the evils of premature optimization.
This performance largely comes from leveraging a small number of important optimizations. When people hear about “parallelism” in the context of computer hardware, they think of multithreaded programs running on multicore multiprocessors. But the performance of code that isn’t so “multi” relies just as much on parallelism: when a processor runs sequential user code, operations that don’t depend on each other execute at the same time, a property known as instruction-level parallelism.
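To make this concrete, here is a minimal sketch in C (the function names and setup are illustrative, not from any particular codebase). Both functions perform the same additions, but the second splits the work across two independent accumulators, giving the core two dependency chains it can advance in the same cycle:

```c
#include <stddef.h>

/* One accumulator: every addition depends on the previous one,
   forming a single serial dependency chain. */
double sum_one(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the adds into s0 and s1 are independent,
   so a superscalar core can execute them side by side. */
double sum_two(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (n % 2)
        s0 += a[n - 1];  /* leftover element when n is odd */
    return s0 + s1;
}
```

Because floating-point addition is not associative, a compiler invoked without flags like -ffast-math must preserve the order of the additions, so sum_one really is one long serial chain; on a typical out-of-order core, sum_two tends to run noticeably faster despite doing the same arithmetic.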
To avoid leaving hardware idle while it draws power, code has to be reordered. There must always be enough independent work to do in parallel, or performance drops off quickly. The rea