torchlight/Bitonic merge sort correctness.md Secret

## Bitonic merge sort correctness.md

      
    Raw
  

              Bitonic merge sort correctness.md
            
          
    On the correctness of bitonic sort when sorting blocks of data

Introduction

Bitonic sort (aka "bitonic sorter", "bitonic merge sort", "bitonic sorting network") is a data-oblivious sorting algorithm introduced by K. E. Batcher in 1968. While having a higher comparator count than similar sorting networks (e.g. Batcher's odd-even merge sort), it has a very easily described structure on power-of-two-sized data.
While the odd-even merge sort is also correct when adapted to block sorting, the proof involves checking more cases and the structure of the sorting network is slightly more complicated, unless we include some redundant comparators, in which case the odd-even merge sort becomes as inefficient as bitonic sort.
Block sorting

In the stated problem, rather than a primitive operation of compare-and-swap, the primitive operation we have access to here is to sort two (arbitrary) blocks of data. In the extreme case where the block size is 1, this is equivalent to a usual sorting network.
The blocks are assumed to have the same size, except possibly the last block, which may be smaller than the rest.
The 0-1 principle

The 0-1 principle is an often-used theorem to simplify analyses of data-oblivious sorting algorithms and order statistic algorithms. Rather than considering all possible weak orderings of n elements (of which there are Θ(n! (1/log 2)ⁿ) cases; cf. ordered Bell numbers), the 0-1 principle asserts that checking the algorithm against sequences over the alphabet {0, 1} is sufficient to guarantee correctness for any data.
A standard proof of the 0-1 principle is a contrapositive proof: suppose there is a sequence a[0..n-1] that the comparator network fails to sort. Then there exist indices i and i+1 such that a[i] > a[i+1]. Let f be a threshold function, such that f(x) == (1 if x <= a[i] else 0). Applying f to the data gives a 0-1 sequence that the comparator network fails to sort. Taking the contrapositive, if a comparator network sorts all 0-1 sequences, it must sort all arbitrary sequences.
This carries over to block-sorting, as sorting two blocks of data can itself be performed with a sorting network, since the block sizes are known ahead of time.
Unless otherwise specified, we will henceforth restrict ourselves to considering sorting / block-sorting networks on data over the alphabet {0, 1} without any loss of generality.
Simplifications from padding

We say a comparator network is monotonic if, for every comparator (i, j), we have that i < j. Monotonicity allows the simplification of certain arguments through the use of padding, and an equivalent definition for monotonicity is that any sorted sequence is a fixed point of the comparator network.
Crucially, we note that a common representation of the bitonic sorting network is not monotonic (as the name suggests) and padding simplifications do not directly apply. Nevertheless, a monotonic version of bitonic sort will be described later.
Same-size blocks

If the last block is smaller than the other blocks, we may pad it with 1's. By monotonicity, for any block-comparator network, the padding will remain unchanged, so a block-sorting network on blocks all of equal size will also correctly sort if the last block is of smaller size. Henceforth, we will assume the blocks are all of equal size.
Out-of-bounds network reduction

Given a monotonic sorting network on N elements, for any 0 < n < N, we may obtain a sorting network on n elements by excluding the comparators referencing out-of-bounds elements; correctness follows from padding the n elements to N elements with 1's. The same argument applies for block-sorting networks. Some sorting network constructions based on the divide-and-conquer paradigm are much more easily described on power-of-two sizes, so this allows us to consider just the power-of-two-size cases.
Such reduced networks sometimes contain many redundant comparators—comparators that can be proven to have no effect after applying earlier comparators in the sorting network—but for polynomially-bounded sorting networks, this amounts to a constant-factor overhead and thus should not be much of a concern. (For sorting networks which are not polynomially bounded, replace the whole network with one that is polynomially bounded, like bitonic sort.)
The bitonic sort algorithm

Definitions

A 0-1 sequence is increasing if it is the concatenation of an all-0 sequence with an all-1 sequence, and likewise a 0-1 sequence is decreasing if it is the concatenation of an all-1 sequence with an all-0 sequence. Such sequences are collectively referred to as being monotonic if the distinction is irrelevant.
A 0-1 sequence is bi-increasing if it is the concatenation of an all-0 sequence, an all-1 sequence and an all-0 sequence; likewise, a 0-1 sequence is bi-decreasing if it is the concatenation of an all-1 sequence, an all-0 sequence and an all-1 sequence. Again, such sequences are collectively referred to as being bitonic.
Note that we allow the sequences being concatenated to be of zero length; consequently, any increasing sequence is both bi-increasing and bi-decreasing, as is any decreasing sequence. Furthermore, a bitonic sequence is always a cyclic shift of a monotonic sequence, and every cyclic shift of a monotonic sequence is bitonic.
Overview

Bitonic sort is essentially a merge sort with a data-oblivious merge routine. For a sequence with the first half sorted in one direction and the second half sorted in the opposite direction, the sequence as a whole is bitonic. The merge routine itself exploits the divide-and-conquer paradigm: at each step a bitonic sequence is partially sorted and then divided into two bitonic subsequences such that any element of the first subsequence is not greater than any element of the second subsequence, this routine recurses into itself until it reaches subsequences of length 2.
As it turns out, the odd-even merge sort can also be described as "essentially a merge sort with a data-oblivious merge routine", but the odd-even merge sort keeps both halves of the sequence sorted in the same order, and its recursive merge routine recurses into subsequences defined by the parity of the index.
Divide

(Disclaimer: This is probably not standard terminology; this stage is usually left unnamed.)
The bitonic divide comparator network for a bitonic input sequence a[0..2n-1] consists of the comparators (i, i+n) for i in range(n). Application of the comparator network divides the sequence into the subsequences a[0..n-1] and a[n..2n-1], where both subsequences are bitonic and either the first subsequence is all-0 or the second subsequence is all-1. Consequently, any element of the first subsequence is not greater than any element of the subsequence.
This may be verified by checking these two cases: (i) at least half of the elements are 0; (ii) at least half of the elements are 1. (The cases are not mutually exclusive.)
Case (i): at least half of the elements are 0

Visualise the sequence as two rows of n elements each. Any cyclic shift of the sequence will preserve the number of 1's in each column up to a cyclic shift of the columns, and since the sequence is bitonic, there exists a cyclic shift that makes the sequence increasing. With such a cyclic shift, and taking note that at most half of the elements are 1, there cannot be any column with more than one 1.
If the 1's in the sequence are all in the first row or all in the second row, then these 1's are necessarily contiguous, and applying the bitonic divide comparator network shifts all the 1's to the second row while retaining contiguity. Therefore the first row is all-0 and the second row is bi-increasing.
If the sequence is bi-increasing and has 1's in both rows, then the 1's are still contiguous, where the first row is bi-increasing and the second row is bi-decreasing. As each column has at most one 1, applying the comparator network shifts the 1's in the first row to the second row, whence the first row is all-0 and the second row is bi-decreasing.
Otherwise, the sequence is bi-decreasing, in which case the 1's at the start of the sequence stop at some point in the first row and all these 1's are shifted to the second row by the comparator network, leading to an all-0 first row and a bi-decreasing second row.
Case (ii): at least half of the elements are 1

The same argument as case (i), but conjugated with swapping 0 and 1, and reversing the sequence.
Merge

The bitonic merge algorithm for a bitonic sequence of length 2^m is to apply a bitonic divide to the whole sequence, then recursively apply a bitonic merge to both halves of the sequence. As the halves do not interact with each other, these merges can be done concurrently. The base case m = 0 should be taken to be a no-op.
The output of the merge algorithm is necessarily sorted. To see why this is the case, each time a bitonic divide is applied, half of the sequence is constant and the other half is bitonic, with the additional postcondition that the first half is less than the second half. Whatever happens to the constant half is irrelevant since it will remain constant, and we effectively halve the size of the bitonic half until it becomes a sequence of length one, at which point the whole sequence is sorted.
Note that it is not possible to exploit the fact that half of the sequence after the bitonic divide is applied is constant in order to reduce the number of comparisons needed, as that would make the algorithm not data-oblivious, in which case the 0-1 principle does not apply. The seemingly redundant comparators on the constant-value half are actually not redundant on general inputs not over the alphabet {0, 1}.
Sort

With the base case m = 1 being a no-op, for an arbitrary sequence of length 2^m, the bitonic sort algorithm consists of the following steps:

Sort the first half in ascending order.
Sort the second half in descending order.
Apply a bitonic merge.

The first two steps may be run concurrently, as they operate on disjoint subsequences. As the inductive hypothesis, suppose that after the first two steps, the first half of the sequence is increasing and the second half is decreasing; then the whole sequence is bitonic (specifically, bi-increasing), so we may apply a bitonic merge to end up with a fully sorted sequence. By induction, the bitonic sort algorithm sorts all 0-1 sequences with power-of-two lengths.
The second step requires the list to be sorted in the opposite order, which forces the bitonic sorting network to fail to be monotonic. However, we have thus far only defined the bitonic sort on power-of-two sizes, and the lack of monotonicity in the comparators prevents the usage of padding tricks. This may be overcome by reversing certain sections of the sorting network and exploiting the fact that the reverse of a bitonic sequence is still bitonic, but it may be clearer to explicitly describe a modified bitonic sort where the comparators are monotonic.
Modified divide/merge/sort

The m-divide comparator network for an input sequence a[0..2n-1] where the subsequences a[0..n-1] and a[n..2n-1] are sorted is given by the comparators (i, 2n-1-i) for i in range(n). Application of the network results in the same postconditions as the original divide comparator network (each half is bitonic and the first half is not greater than the second), as this is just the conjugation of the original divide comparator network with reversing the second half of the sequence.
The m-merge algorithm, again with the precondition that the two halves of the sequence are sorted, consists of applying the m-divide comparator network to the whole sequence, and then applying the (original) bitonic merge algorithm to each half. This results in a sorted sequence.
To conclude the section, the modified bitonic sort algorithm applies itself to both halves of the input, then applies the m-merge algorithm to the whole sequence. This results in a sorted sequence for any input, and as the comparator network is monotonic, it directly generalises to non-power-of-two sizes.
Generalising to blocks

The definition of monotonicity and bitonicity need to be slightly revised when considering block-sorting. We say that a block-sequence is increasing if it is sorted, and that it is decreasing if reversing the order of the blocks makes it sorted. Likewise, a bi-increasing block-sequence is the concatenation of an increasing block-sequence with a decreasing block-sequence, and a bi-decreasing block-sequence is the concatenation of a decreasing block sequence with an increasing block-sequence.
We need only show the correctness of the bitonic divide, as every other part depends on the correctness of divide, but not on the comparators working element-wise.
Let a[0..2bn-1] be a bitonic block-sequence (with block size b) and, without loss of generality, suppose that the majority of elements are 0. (As in the above analysis, if the majority of the elements are 1, we may conjugate with reversing the order of the blocks and swapping 0 and 1.) Arranging the elements in two rows of n blocks, we may apply the same argument as before to show that for any two blocks in the same column, the total number of 1's among the blocks must be at most n. Sorting by block-columns then results in an all-0 first row. To show that the second row is bitonic when considered as a block-sequence, consider the following cases.
Case (i): all the 1's are in the same row

If the 1's are in the second row, then since any truncation of a bitonic block-sequence is still bitonic, the second row must be bitonic. If the 1's are in the first row, then applying the bitonic divide effectively swaps the two rows, again resulting in a bitonic second row.
Case (ii): block-sequence is bi-increasing with 1's in both rows

Call a block nonconstant if it is neither all-0 nor all-1. By the definition of bitonicity and the stated assumption, there can be at most one nonconstant block in each of the two rows. If the two nonconstant blocks are in the same column, then applying the divide algorithm collates all the 1's in that block-column into one nonconstant/all-1 block with all the other blocks in the second row being all-1; thus the second row is bitonic.
If the two nonconstant blocks are in different columns or there are fewer than two nonconstant blocks, then the second row will consist of a subsequence of all-1 blocks, an optional nonconstant block, a subsequence of all-0 blocks, another optional nonconstant block, followed by a subsequence of all-1 blocks. This is again bitonic.
Case (iii): block-sequence is bi-decreasing with 1's in both rows

As above, if the two nonconstant blocks are in the same column, then applying the divide comparator network reduces the second row to being all-1 with the exception of one block possibly being nonconstant. Otherwise, the second row will be reduced to being bi-decreasing.
Other than the above three cases, we also need the divide comparator network to work correctly when n = 2. Unlike the element-wise sorting case, it is not the case that a single block is necessarily sorted. However, by definition, a block-comparator sorts the two blocks, and so the output still satisfies the same postconditions of both halves being bitonic and any element in the first half being not greater than any element in the second half.
A nonrecursive representation of the bitonic sorting network

As a notational convention, exponentiation will be represented with a caret (^) and bitwise xor will be represented with a circled plus (⊕).
Rather than a top-down, recursive view of the algorithm, we may also consider this bottom-up view, where the (modified) bitonic sort on 2^m elements consists of an m-divide over chunks of 2 elements, followed by an m-divide over chunks of 4 elements and a divide over chunks of 2 elements, followed by an m-divide over 8 elements, a divide over chunks of 4 elements and a divide over chunks of 2 elements, et cetera.
The m-divide over chunks of 2^k elements compares an index i with q * 2^k + (2^k-1) - r, where q and r are the quotient and remainder of dividing i by 2^k. This may be rewritten as simply i ⊕ (2^k-1). Similarly for the divide over chunks of 2^k elements, the element at index i is compared with q * 2^k + r + 2^(k-1) if r < 2^(k-1) and q * 2^k + r - 2^(k-1) otherwise, which can also be rewritten without a conditional as i ⊕ (2^(k-1)). We may then, for each stage, go through the indices one at a time and emit a comparator if the indices are within bounds. To wit, in Python:
def generate_bitonic_stage(n, i, j):
    # n is the total number of elements (or blocks); i and j specify the current
    # stage of the comparator network.
    chunk_size = 1 << (i-j)
    opp = (chunk_size - 1) if j == 0 else (chunk_size >> 1)
    for x in range(n):
        y = x ^ opp
        if not x < y < n:
            # if x > y, we have already emitted the comparator.
            # if y >= n, the comparator is out of bounds and may be pruned.
            continue
        yield (x, y)
At this point, to generate the whole sorting network, we need only iterate over all the stages.
def generate_bitonic_sort(n):
    if n < 2:
        return
    logn = (n-1).bit_length() # equivalent to ⌈log_2(n)⌉
    for i in range(1, logn+1):
        for j in range(i):
            yield from generate_bitonic_stage(n, i, j)
The same comparators may be used for block-sorting to produce a sorted sequence, as earlier demonstrated.
Did you really just spend over three thousand words to introduce fifteen lines of code?

Yes; yes I did. You're welcome.
References

Lang, H. W. (1998). Sequential and parallel sorting algorithms. Retrieved from http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/bitonicen.htm.