Last active
September 24, 2015 17:08
-
-
Save oskarth/f06156fcc7651fe1bb5e to your computer and use it in GitHub Desktop.
Question on Uniform in-place shuffle of array in C - stats vs empirical results
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#include <stdio.h> | |
#include <stdlib.h> | |
#include <time.h> | |
/* | |
We want to do an in-place uniform shuffle of an array. | |
Claim: with three elements, we have 3 calls to get_random(0, 2). Three random | |
choices and three possibilities. Total number of possible sets of choices is | |
3*3*3=27 How many possible outcomes? 3! = 6. 27 not evenly divisible with 6, so | |
not evenly distributed. | |
But empirically: | |
> gcc shuffle.c && ./a.out > out | |
> sort out | uniq -c # <30 seconds on my MBA | |
166034 1 2 3 # low | |
167677 1 3 2 # high | |
166060 2 1 3 | |
166878 2 3 1 | |
166699 3 1 2 | |
166652 3 2 1 | |
# abs(166034 - 167677) = 1643 | |
... | |
166741 1 2 3 | |
166732 1 3 2 | |
166980 2 1 3 # high | |
166917 2 3 1 | |
166212 3 1 2 # low | |
166418 3 2 1 | |
# abs(166212 - 166980) = 768 | |
Which strikes me as being uniform, both due to the low variance and the fact | |
that the high and low isn't the same. | |
Question: What is wrong? The claim, the method, or the empirical conclusion? | |
TODO for OP: Read about Fisher-Yates_shuffle proof | |
*/ | |
void swap(int *xp, int *yp) { | |
int temp; | |
temp = *xp; | |
*xp = *yp; | |
*yp = temp; | |
} | |
int get_random(int floor, int ceiling) { | |
int range, rand_range, r; | |
range = (ceiling - floor) + 1; | |
rand_range = rand() % range; | |
r = floor + rand_range; // [floor, ceiling] | |
return r; | |
} | |
void shuffle(int arr[], int n) { | |
int i, r; | |
for (i = 0; i < n; i++) { | |
r = get_random(0, n-1); | |
swap(&arr[i], &arr[r]); | |
} | |
} | |
void print_array(int *arr, int n) { | |
int i; | |
for (i = 0; i < n; i++) { | |
printf("%d ", arr[i]); | |
} | |
printf("\n"); | |
} | |
int main() { | |
int i, n; | |
int foo[3] = { 1, 2, 3}; | |
srand(time(NULL)); // seed: only call once for uniformity | |
n = sizeof foo / sizeof(foo[0]); | |
for (i = 0; i < 1; i++) { | |
shuffle(foo, n); | |
print_array(foo, n); | |
} | |
return 0; | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Minor fix: i < 1000000 in on line 82.
Problem was not resetting array, which made count uniform.