Skip to content

Instantly share code, notes, and snippets.

@teryror
Last active June 19, 2018 11:54
Show Gist options
  • Save teryror/183ecb324f8222c716d8cc1509c97d8b to your computer and use it in GitHub Desktop.
Save teryror/183ecb324f8222c716d8cc1509c97d8b to your computer and use it in GitHub Desktop.

Random Sampling Without Replacement

Last time we talked about rolling unbiased virtural dice, now let's turn to opening virtual booster packs in a collectible card game. If you're a nerd like me, you may be interested in reading about the specifics of how cards are distributed in Magic: The Gathering and Yu-Gi-Oh!, but our main concern here is to randomly select n items from an array of N options. I'm told that it's also useful in scientific computing or whatever.

Luckily, The Art of Computer Programming Volume 2: Semi-Numerical Algorithms has got us covered, with what Knuth unimaginatively calls Algorithm S, which works by iterating over the source array, and, for each item, generating a random number to decide whether or not to include it in the sample:

#define Inc(P, S) (P) = (void *)((uint8_t *)(P) + (S))

void rand_select(void * src, int N,
                 void * dst, int n,
                 size_t elem_size)
{
    assert(0 < n); assert(n <= N);
    
    while (n > 0) {
        uint32_t r = randr(N);
        if (r < n) {
            memcpy(dst, src, elem_size);
            Inc(dst, elem_size); n--;
        }
        
        Inc(src, elem_size); N--;
    }
}

In my opinion, this is fairly straightforward and intuitive; I've even been known to type it inline from memory. It has some problems, though, like the O(N) runtime and the sheer number of random samples it requires.

We can do better, as shown by Jeffrey Scott Vitter in his paper Faster Methods for Random Sampling and the follow-up An Efficienet Algorithm for Sequential Random Sampling. The main idea here is to generate a random number S, which is the gap between selected items, the crux being its probability distribution. He outlines a few different algorithms for doing this; the one he recommends for general use he calls Algorithm D. Thinking about this problem too hard seems to be really bad for the creative mind.

His benchmarks were made on an IBM 3081 and are over 30 years out of date now, so I figured I'd give this a spin myself. I will test his algorithms A, C, and D (both sub-variants), as well as the implementation of Algorithm S shown above. My benchmarks will run on a Lenovo ThinkPad running Windows 10, with an Intel i5-7200U and 8GB of RAM. The code was compiled with MSVC version 19.13.26131.1 on optimization level 2. Here's what I found:

  • For small values of N, (less than 2000 or so), Algorithm A is fastest, irrespective of the value of n.
  • While he recommends the first variant of Algorithm D, the second one seems to be faster on modern machines, at least for the majority of inputs I tried.
  • Algorithm C is blazing fast for very small n, but unreasonably slow for large n (so much so that I included it only in my first round of tests). I suspect that's because it's the only algorithm that runs in O(n*n), though my implementation could be buggy as well.

As a result, I recommend the second variant of Algorithm D, with an additional check to process small inputs with Algorithm A. Another optimization we can do is to check for n == N, in which case we can simply copy all remaining items. This is a relatively rare case, but it should accelerate sampling when n / N approaches 1, which all algorithms I tested do poorly with. Here's my implementation:

#define randf() ((double)rand() / (double)(1ULL << 32))
#define AlphaInverse 12

void rand_select(void * src, int N,
                 void * dst, int n,
                 size_t elem_size)
{
    assert(0 < n); assert(n <= N);
    if (N <= 2048) goto method_a;
    
    { // Method D:
        double V_prime = log(1.0 - randf());
        double q1 = N - n + 1;
        int threshold = AlphaInverse * n;
        
        while (n > 1 && threshold < N) {
            double q2 = (q1 - 1) / ((double)N - 1);
            double q3 = log(q2);
            
            int S; double LHS, y;
            do {
                // Generate U and X:
                for (;;) {
                    S = (int)(V_prime / q3);
                    if (S < q1) break;
                    V_prime = log(1.0 - randf());
                }
                LHS = log(1.0 - randf());
                
                // Accept S early?
                double RHS = S * (log((q1 - S) / (double)(N - S)) - q3);
                if (LHS <= RHS) {
                    V_prime = LHS - RHS; break;
                }
                
                // Accept S at all?
                y = 1.0;
                int bottom, limit;
                if (n - 1 > S) {
                    bottom = N - n;      limit = N - S;
                } else {
                    bottom = N - S - 1;  limit = (int)q1;
                }
                
                for (int top = N - 1; top >= limit; --top, --bottom) {
                    y *= (double)top / (double)bottom;
                }
                
                V_prime = log(1.0 - randf());
            } while (q3 > -(log(y) + LHS) / S);
            
            // Skip S records and select the next one
            Inc(src, elem_size * S);
            memcpy(dst, src, elem_size);
            Inc(src, elem_size);
            Inc(dst, elem_size);
            
            N -= S + 1; n -= 1; q1 -= S;
            threshold -= AlphaInverse;
        }
    }
    
    method_a: {
        int top = N - n;
        while (1 < n && n < N) {
            double V = randf();
            
            int S = 0;
            double quot = (double)top / (double)N;
            while (quot > V) {
                S++; top--; N--;
                quot *= (double)top / (double)N;
            }
            
            // Skip S records and select the next one
            Inc(src, elem_size * S);
            memcpy(dst, src, elem_size);
            Inc(src, elem_size);
            Inc(dst, elem_size);
            
            N--; n--;
        }
    }
    
    // Fast special cases:
    if (n > 1) { // n == N ; do a fast copy of all remaining items
        memcpy(dst, src, elem_size * n);
    } else {     // n == 1 ; S is uniform in [0, N)
        int S = randr(N);
        Inc(src, elem_size * S);
        memcpy(dst, src, elem_size);
    }
}

// Safe-ish macro that prevents wrong elem_size,
// with (void *) casts for C++ compatibility
#define RandSelect(Src, SrcCount, Dst, DstCount) \
    rand_select((void *) Src, SrcCount,          \
                (void *) Dst, DstCount,          \
                sizeof(*(Src)))

Of course, this is much longer, and much more complicated code, but it's also around 2 to 5 times faster for most inputs, and easily dozens of times faster when N is large and n / N approaches 0. In none of my test cases has it ever run more than a few percent slower than Algorithm S.

For more details on how this works, see Vitter's paper; I couldn't even hope to do a better job explaining it. Also, as always, use this code at your own risk; I didn't trust it myself when I first wrote it, and, while my initial tests didn't show an easily detectable bias, there may still be nasty bugs in it.

We can make this file beautiful and searchable if this error is corrected: No commas found in this CSV file in line 0.
# Cycle timings are the best of 128 runs measured with __rdtsc()
n; N; Method S; Method A; Method D1; Method D2
32; 64; 1199; 712; 498; 531
16; 64; 746; 309; 309; 317
8; 64; 713; 242; 246; 243
4; 64; 698; 137; 135; 141
2; 64; 634; 32; 31; 35
1; 64; 640; 19; 19; 21
1; 64; 639; 20; 19; 20
128; 256; 4191; 2423; 2218; 2877
64; 256; 3352; 1409; 1421; 1546
32; 256; 3019; 1178; 1142; 1255
16; 256; 2910; 880; 797; 874
8; 256; 2794; 881; 857; 914
4; 256; 2523; 472; 474; 498
2; 256; 2484; 71; 69; 73
512; 1024; 22332; 19232; 18785; 21230
256; 1024; 16946; 10079; 9778; 11500
128; 1024; 13548; 6485; 6362; 7468
64; 1024; 12167; 5573; 5400; 5886
32; 1024; 11463; 4592; 4641; 4879
16; 1024; 10592; 3345; 3434; 3495
8; 1024; 10053; 3271; 3279; 3339
2048; 4096; 95620; 102535; 102195; 106622
1024; 4096; 71894; 61242; 60366; 65682
512; 4096; 57257; 39067; 38319; 40186
256; 4096; 48362; 26725; 26616; 27808
128; 4096; 46377; 20632; 18500; 13409
64; 4096; 41243; 18090; 8670; 8168
32; 4096; 40246; 15511; 4304; 3964
8192; 16384; 389604; 423554; 423655; 444740
4096; 16384; 289127; 265239; 264952; 278870
2048; 16384; 230349; 165614; 165858; 172072
1024; 16384; 200680; 114189; 114271; 117173
512; 16384; 173540; 88382; 87892; 59351
256; 16384; 165319; 74939; 36781; 29943
128; 16384; 161071; 67645; 17515; 15404
32768; 65536; 1563605; 1702371; 1705923; 1786723
16384; 65536; 1158174; 1067576; 1071668; 1120929
8192; 65536; 921210; 666581; 666649; 692912
4096; 65536; 788762; 457985; 458199; 474218
2048; 65536; 694324; 356689; 369341; 235859
1024; 65536; 660594; 300243; 170063; 118936
512; 65536; 645255; 275020; 76850; 59609
131072; 262144; 6262652; 6849169; 6856107; 7183388
65536; 262144; 4653875; 4276142; 4266440; 4497788
32768; 262144; 3697511; 2682024; 2681406; 2786451
16384; 262144; 3216539; 1838556; 1837907; 1897022
8192; 262144; 2807249; 1426520; 1554821; 919154
4096; 262144; 2647786; 1216239; 719543; 472479
2048; 262144; 2594443; 1101944; 345988; 229614
524288; 1048576; 25134406; 27477865; 27453719; 28809979
262144; 1048576; 18572246; 17098052; 17104752; 18009254
131072; 1048576; 14782997; 10728903; 10729915; 11114613
65536; 1048576; 12863335; 7379707; 7425323; 7633840
32768; 1048576; 11562808; 5693612; 6267886; 3720767
16384; 1048576; 10785682; 4920032; 2952058; 1895571
8192; 1048576; 10533216; 4500471; 1459342; 970113
2097152; 4194304; 100647601; 109992025; 109972330; 115199561
1048576; 4194304; 74433585; 68479257; 68527102; 72172319
524288; 4194304; 59206802; 42833816; 42862236; 44583978
262144; 4194304; 51609907; 29613321; 29555045; 30536135
131072; 4194304; 47068071; 23163535; 25131119; 14882598
65536; 4194304; 44869653; 21353959; 12740620; 8453649
32768; 4194304; 44314096; 20575597; 6631770; 4930347
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment