Skip to content

Instantly share code, notes, and snippets.

# avinoamr/radix.go Created May 19, 2016

Radix LSD sort implementation excercise written in Go.
 /** * Radix LSD sort implementation excercise written in Go. * * A Radix sort is a non-comparative integer sorting algorithm that sorts data * with integer keys by grouping keys by the individual digits which share the * same significant position and value. Instead of using complex logical * comparison operations, which are relatively costly, the Radix sort uses * binary operations to compare single keys in binary order. In this * implementation, the keys are generated arbitrarly by the caller, and since * they represent order-preserving binary characters, they can be converted to * integers and compared as such, digit by digit. * See: https://en.wikipedia.org/wiki/Radix_sort * * Every iteration scans the entire array (N), and there are as many iterations * as the maximum number of digits in the keys (K). Therefore, Radix sort takes * O(NK) operations. When Log N > K, it will have better theoretical performance * then the best known sorting algorithms (O(NLogN)). Since K is the number of * digits, it's capped at 21 (uint64) in the worst case. Thus Log N > K = 21 * when N > 2,097,152. Of course, it's possible (yet not implemented here) to * radix-sort using a variable length byte array that goes beyond uint64 in * size. In this implementation, keys are ints (up to 64), so K = 20, Log N > K * when N > 1,048,576. * * Beyond the theoretical performance, engineering-wise, Radix sort has the * benefit of reducing the number of CPU cycles due to its use of binary * operations instead of logical comparisons. It's also better suited for cache * memory optimization because only the key is required in the sort itself, * instead of the entire sorted structure. * * To exploit the latter, we don't sort the full input array as it's likely to * contain elaborate structures that don't participate in the sort as all we * care about is the key. Instead, we sort just key-value pairs, where the value * is a pointer (or index) back into the original array in order to complete the * sort. However, this approach may be counter-productive on some cases, see * below * * NB: After implemention, it becomes clear that redix sort, and its internal * count sort consume a large amount of memory, because neither can be done in * place. This may undermine the aforementioned potential of better cache * memory utilization. */ package main import "fmt" /** * RadixKeyer is an interface used by the callers to describe types that can be * sorted by Radix sort. It's RadixKey() function must return an order * preserving binary string, represented as an integer, the digits of which will * be conceptually compared to the digits of the other keys to determine the * sort order */ type RadixKeyer interface { RadixKey() int } // Key-value pairs that are actually being sorted, in order to utilize cache // memory better. We don't keep the value itself, but rather a pointer (or // index) to it. Initially, we using a pointer, but since we don't need to cover // the entire memory space, we can use the smaller uint // 9-byte (instead of 13 when using pointers) // // NB: perhaps we can use smaller keys and value indices based on user input? type SortPair struct { key int // 4-bytes vidx uint // 4-bytes // historical artifact, kept here just for discussion. // vptr *RadixKeyer // 8-bytes unused // temporary reusable space for memoizing the computation of the // current digit. // TODO: Reconsider. See countSort. digit uint8 // 1-byte } // Sorts the input array of Key Encoders by first reading the keys into a thin // key-value array of values, then sorting this array, digit by digit (LSD) with // count sort. Finally, the sorted pairs are used to sort the original array. func RadixSort(arr []RadixKeyer) []RadixKeyer { // in order to utilitize cache better, we don't want to sort the original // array as it may includes large values that don't participate in the sort // and will just consume cache memory space. A thinner array of key-value // pairs is used instead, where the value is just a pointer to the original // value. A fortunate side-effect is that we can compute the encoded key // once - instead of on every iteration. The down-side is that we pay in // additional main memory: 13-bytes per item (see comment in SortItem // type definition). // // In fact, since a pointer has to cover the entire memory space, it's // likely to requires more values than we actually need to retrieve the // value (assuming the input array fits in memory - otherwise, external // sorting is in order). So instead of using a pointer, we can use an index // into the original array, reducing the memory overhead from 13-bytes, to 9 // // NB: In case of simple, small primiative types (like ints, bools, etc.) // this representation will actually use up more cache memory. One var key int var N = uint(len(arr)) var max = arr.RadixKey() var items = make([]SortPair, N) for i := uint(0) ; i < N ; i += 1 { key = arr[i].RadixKey() items[i] = SortPair{key, i, 0} // calculate the max key while we're at it if key > max { max = key } } // count sort iteratively for each significant-digit (in increasing // order; LSD). Instead of passing in the actual digit number, which will // require us to count digits of every key, we use powers of 10: 10, // 100, 1000, ... until we reach a number that's greater than our maximum // key. Another benefit is that the countSort can just divide by that number // to extract the digit value of each key in the pair. for expr := 1 ; expr < max ; expr *= 10 { countSort(items, expr) } // at this point our key-value pairs are sorted, so we need to finally sort // the input array in the same order. Radix LSD can't be done in-place, so // we must use an auxiliary output array for the result aux := make([]RadixKeyer, N) for i, _ := range items { aux[i] = arr[items[i].vidx] } // finally, we can copy over the original array and free-up the auxilliary // array. copy(arr, aux) return arr } // count sort sorts the pairs array on a single digit, represented here as a // power of 10 in the input expr. Using the power of 10 simplified extracting // the digit value of the key in the pairs by simple division and modulo. // See: https://en.wikipedia.org/wiki/Counting_sort func countSort(pairs []SortPair, expr int) { // we use an auxiliary array of counters, one element per digit 0-9 counts := int{} // count the occurrences of each digit for i, _ := range pairs { // extract the current digit of the key in each pair and memoize it to // eliminate the need to compute it again later. Memoization saves 2 // additional operations (1 division + 1 modulo) per item during the // repositioning phase, at the expense of additional 8-bits per item. // // NB: 4-bits should suffice to represent a single digit (9 = 1001). So // we could look for a data structure that would optimize that. // // NB: instead of saving it on the pairs themselves, we could use // another auxiliary array of uint8 that will be used only in this scope // thus maybe saving some of the extra memory during operations outside // of this function. Seems negligible. pairs[i].digit = uint8(pairs[i].key / expr % 10) // increment the number of occurrences of that digit counts[pairs[i].digit] += 1 } // currently, for each digit in the counter we have the numebr of times it // appears in the key. We want to repurpose this counters array such that // each digit will instead map to its location in the final output array. // we observe that if there are N keys at digit 0, digit 1 will start at // index N, since the previous N indices will be populated by these keys. So // on for the rest of the digits. In order to achieve this representation, // we want to increase the of each digit counter by the sum of all counters // that preceded it. AKA Prefix sum, or Cumulative sum. // See: https://en.wikipedia.org/wiki/Prefix_sum sum := 0 // cumulative sum of all counters at this point. for i := 0 ; i < 10 ; i += 1 { // save the current value. count := counts[i] // set the new value to the current cumulative sum counts[i] = sum // increase the cumulative sum of the next digit. sum += count } // reposition the items in their correct location in the output // count-sort can't be done in-place while keeping it a stable sort. The // stabilitiy characteristic is required for the iterative approach used by // radix sort. output := make([]SortPair, len(pairs)) for i, _ := range pairs { // read the current position of the digit value of this key, and use it // as the index at which to place the pair. output[counts[pairs[i].digit]] = pairs[i] // increment the that position in order for the following pairs with the // same digit to not overwrite this pair. counts[pairs[i].digit] += 1 } // copy the output back onto the items list copy(pairs, output) } // ------ test & play // // // func main() { arr := []RadixKeyer{Int(7), Int(11), Int(5), Int(1)} fmt.Println(arr) RadixSort(arr) fmt.Println(arr) } // alias to int that implements the RadixKeyer interface by encoding each int to // itself. Construct a similar key encoding function for every type to be sorted type Int int func (i Int) RadixKey() int { return int(i) }
to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.