Unbiased Range Reduction of Pseudo-Random Numbers

Suppose you're working on a virtual board game and want to simulate the roll of a die. The first programming book I ever owned suggested the following:

int result = rand() % 6;

If you've spent any time reading about random numbers in computer science, you've probably heard that this is a bad idea. The reason that's usually given is that rand() (just like the standard RNGs of many other languages, such as java.util.Random) is implemented as a linear congruential generator, the problem being that the low bits of its output exhibit the least random behaviour.
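
To see why, here's a minimal sketch (not the actual implementation of rand() in any particular C library) of an LCG with the classic constants a = 1103515245 and c = 12345 and a power-of-two modulus; because both constants are odd, the lowest bit of the state simply alternates every step:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t state = 42; // arbitrary seed
    for (int i = 0; i < 8; ++i) {
        state = state * 1103515245u + 12345u; // LCG step, modulus 2^32
        printf("%u", state & 1);              // lowest bit alternates: 10101010
    }
    printf("\n");
    return 0;
}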

The problem persists, though, even if you were to use a state-of-the-art algorithm (such as permuted congruential generators or xoroshiro; I'm not picking a side in this feud today). This becomes apparent with a naive solution to the original problem: "I'll just use the top bits, then!"

int result = (rand() >> N) % 6;

Assuming N = 29, this just leaves the top 3 bits of the output space, which then get mapped to the 6 possible results. That means that 0 and 1 are twice as likely to occur as any other result. This is called modulo bias.
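
You can see this directly by tallying where each of the eight possible 3-bit values ends up (a quick sanity-check sketch, not part of the code above):

#include <stdio.h>

int main(void) {
    int counts[6] = {0};
    for (int bits = 0; bits < 8; ++bits) { // all possible 3-bit values
        counts[bits % 6]++;
    }
    for (int i = 0; i < 6; ++i) {
        printf("%d: %d\n", i, counts[i]);  // 0 and 1 occur twice, 2 through 5 once
    }
    return 0;
}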

You can reduce it by choosing a smaller value of N (such as zero, assuming you're not using an LCG), but you can never eliminate it entirely. Worse, it always favours the low numbers of your final output range.

A common solution is to do the range reduction in floating point:

uint32_t randr(uint32_t max) {
    // Bit-level hackery would probably be preferable here,
    // but this is more readable
    double f = (double)rand() / (double)(1ULL << 32);
    
    return (uint32_t)(f * max);
}

This will distribute the bias throughout the output range more or less evenly, but it won't get rid of it entirely.
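
To see what "more or less evenly" means, here's a scaled-down sketch that maps a 3-bit input range onto 6 outputs the same way; the doubled-up results are now 0 and 3 rather than piling up at the bottom of the range:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int counts[6] = {0};
    for (uint32_t r = 0; r < 8; ++r) { // pretend our RNG only produced 3 bits
        double f = (double)r / 8.0;    // same scheme as randr() above
        counts[(uint32_t)(f * 6)]++;
    }
    for (int i = 0; i < 6; ++i) {
        printf("%d: %d\n", i, counts[i]); // 0 and 3 occur twice, the rest once
    }
    return 0;
}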

Daniel Lemire suggests an interesting alternative that does not need a division, and should therefore execute much faster, while also distributing the bias evenly:

uint32_t randr(uint32_t max) {
    uint32_t r = rand();
    return ((uint64_t)r * (uint64_t)max) >> 32;
}

He gives a rather abstract argument for why this works, but I find it easiest to interpret this as the equivalent of the previous method in 32.32 fixed point.
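
For example, r = 0xC0000000 is 0.75 in 32.32 fixed point (like the code above, this treats rand() as producing a full 32-bit value); multiplying by max = 6 and keeping the top 32 bits gives floor(0.75 * 6) = 4:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t r = 0xC0000000u;                 // 0.75 as a 32.32 fixed point fraction
    uint32_t max = 6;
    uint64_t multiresult = (uint64_t)r * max; // 4.5 in 32.32 fixed point
    printf("%u\n", (uint32_t)(multiresult >> 32)); // integer part: prints 4
    return 0;
}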

That's good enough for most applications. Just for today though, let's have some fun and let perfect be the enemy of good, shall we?

Reducing Bias

Unless the output range evenly divides the input range, getting rid of the bias with a simple mapping is impossible, no matter how clever you are about it. This should be apparent by now, I hope.

It's easy to do when we're allowed to use any number of random samples, though:

uint32_t randr(uint32_t max) {
    uint32_t result;
    do {
        result = rand();
    } while (result >= max);
    return result;
}

...don't use this code. For low values of max, it will take, on average, about exp2(32) / max iterations before returning (over 700 million for max = 6). We can make it viable with just a few small modifications, though.

First, we can choose an intermediate range that is evenly divided by our target range, and do one final range reduction at the end:

uint32_t randr(uint32_t max) {
    // There are more efficient ways of calculating these, but...
    // (note that RAND_MAX + 1 would overflow, since we assume a full 32-bit rand())
    uint32_t scale = 0xFFFFFFFFu / max;
    uint32_t scaled_max = max * scale;

    uint32_t result;
    do {
        result = rand();
    } while (result >= scaled_max);
    
    return result / scale;
}

Now the worst case occurs at max = exp2(32) / 2 + 1, where there's a near-50% chance of a generated number being outside the allowed range, and we'll take, on average, about two iterations to find one that's inside.
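
To gauge the cost for an arbitrary max, here's a quick sketch (reusing the scale calculation from above) that computes the acceptance probability and the expected number of iterations:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t max = (1u << 31) + 1; // the worst case discussed above
    uint32_t scale = 0xFFFFFFFFu / max;
    uint64_t scaled_max = (uint64_t)max * scale;
    double p_accept = (double)scaled_max / 4294967296.0;
    printf("P(accept) = %f, expected iterations = %f\n", p_accept, 1.0 / p_accept);
    return 0;
}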

Again, Professor Lemire presents an alternative method without any divisions on the common path:

uint32_t randr(uint32_t max) {
    uint64_t r = rand();
    uint64_t multiresult = r * max;
    
    uint32_t fractional = (uint32_t) multiresult;
    if (fractional < max) {
        uint32_t threshold = -max % max;
        while (fractional < threshold) {
            r = rand();
            multiresult = r * max;
            fractional = (uint32_t) multiresult;
        }
    }
    
    return multiresult >> 32;
}

He doesn't really explain this one, and it's a bit harder to understand why this works at all. The basic idea is to detect values of multiresult whose fractional parts are so low that their integral part (i.e. the would-be result of the reduction) has to be one of the values that occur too frequently. This is possible because multiresult is a multiple of max, and thus all possible values are evenly spaced.

The key insight is that each value of r will produce a unique value of fractional, so threshold is just the number of distinct values of r we mean to reject. This is, of course, always less than max, allowing us to put the slow calculation behind a branch.

Except that's not actually true when max is even, say divisible by exp2(n) but not by exp2(n + 1). In that case, fractional will always also be a multiple of exp2(n), and exp2(n) values of r will map to the same value of fractional. However, threshold will also be a multiple of the same power of two, which balances out so that there are still exactly threshold values of r such that fractional < threshold.
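
It's easier to see all of this at a smaller scale. Here's a sketch of the same scheme with a 3-bit word in place of the 32-bit one, and max = 6: threshold becomes 8 % 6 = 2, and exactly two values of r (the ones that map to the doubled-up results 0 and 3) end up with fractional < threshold:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t max = 6;
    uint32_t threshold = 8 % max;      // stands in for exp2(32) % max
    for (uint32_t r = 0; r < 8; ++r) { // pretend the word size is 3 bits
        uint32_t multiresult = r * max;
        uint32_t integral   = multiresult >> 3; // the would-be result
        uint32_t fractional = multiresult & 7;  // the low 3 bits
        printf("r = %u -> result %u, fractional %u%s\n",
               r, integral, fractional,
               fractional < threshold ? " (rejected)" : "");
    }
    return 0;
}

Note that, since max is even here, fractional only takes even values, each shared by two values of r, and threshold is even as well, exactly as described above.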

The actual threshold calculation is a bit obtuse as well: What we essentially want is exp2(32) % max, which we can't easily calculate in 32-bit arithmetic, so we instead calculate (exp2(32) - max) % max, which is of course identical, and simplifies to -max % max in C's unsigned semantics. Cheeky!
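
A quick numeric check of that identity, for max = 6 (where exp2(32) % max is 4):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t max = 6;
    printf("%u\n", (uint32_t)(-max % max));                      // prints 4
    printf("%llu\n", (unsigned long long)((1ULL << 32) % max));  // also prints 4
    return 0;
}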

That's pretty good, but if you're anything like me, that loop condition will still make you a bit queasy. If this code runs on your game server, for example, and an attacker can not only figure out the state of the RNG, but also manipulate when and how often it runs, it can potentially be made to run for many, many iterations, leaving you open to a DoS attack! Then there's the fact that it can also happen, you know, randomly, during normal operation.

We could do a lot of complicated maths to prove a definite worst-case behaviour for a particular PRNG family, but I'm not really qualified to do that. It also seems a bit pointless to me, since just knowing the behaviour isn't good enough if it turns out to still be a problem.

The obvious solution is to set a maximum number of iterations:

uint32_t randr(uint32_t max, int samples) {
    uint64_t r = rand();
    uint64_t multiresult = r * max;
    
    uint32_t fractional = (uint32_t) multiresult;
    if (fractional < max) {
        uint32_t threshold = -max % max;
        for (int i = 1; i < samples && fractional < threshold; ++i) {
            r = rand();
            multiresult = r * max;
            fractional = (uint32_t) multiresult;
        }
    }
    
    return multiresult >> 32;
}

This allows us to finely control how much bias we want to allow. Consider again the case max = exp2(32) / 2 + 1; taking only one sample, exp2(32) / 2 - 1 values are outside the target range and get mapped to other values. Two values remain unmapped, and are therefore half as likely as all the others.

Each additional sample (approximately) halves the probability of hitting this biased case; effectively, we're mixing a uniform distribution and the biased distribution with a proportion of exp2(T) - 1 : 1, where T = samples - 1. Different values of max will show different behaviour, but they will invariably either be closer to a uniform distribution or approach it faster (e.g. max = exp2(32) / 3 * 2 is twice as likely to generate one half of all possible output values as the other, but mixes in a uniform distribution with a proportion of exp3(T) - 1 : 1).
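
To put numbers on that, here's a small sketch that, under this reading, computes the rejection probability p = (exp2(32) % max) / exp2(32) and the weight w = pow(p, samples - 1) that the biased single-sample distribution keeps in the mix (so the proportion above is (1 - w) : w):

#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main(void) {
    uint32_t max = (1u << 31) + 1;               // try e.g. 0xAAAAAAAAu as well
    uint32_t threshold = -max % max;             // exp2(32) % max
    double p = (double)threshold / 4294967296.0; // chance of drawing a rejected value
    for (int samples = 1; samples <= 5; ++samples) {
        double w = pow(p, samples - 1);          // weight of the biased single-sample distribution
        printf("samples = %d: biased weight = %g\n", samples, w);
    }
    return 0;
}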

Utilizing Rejected Samples

Suppose that, in addition to rolling dice, your game also requires shuffling cards. Whether you use an explicit shuffling algorithm or amortize the cost over the course of a full game, this will put a lot of strain on your RNG, and you'd probably like to use fewer samples, if possible. You'll be glad to hear that we can save a few bits of entropy from each rejected sample. (NOTE: At this point, we're definitely in overkill land. I just think this is cool, not that you should ever do it.)

We already know that when we reject a sample, fractional is a random number in [0, threshold) and therefore contains log2(threshold) bits of entropy. By saving these bits, we can synthesize new samples to interject into our random sequence:

uint32_t randr(uint32_t max, int samples) {
    static uint64_t synthetic_sample = 0;
    static uint64_t synthetic_max = 1;
    
    uint64_t r = rand();
    uint64_t multiresult = r * max;
    
    uint32_t fractional = (uint32_t) multiresult;
    if (fractional < max) {
        uint32_t threshold = -max % max;
        for (int i = 1; i < samples && fractional < threshold; ++i) {
             synthetic_max *= threshold;
             synthetic_sample *= threshold;
             synthetic_sample += fractional;
             
             if (synthetic_max >= (1ULL << 32)) {
                 if (synthetic_sample < (1ULL << 32)) {
                     r = synthetic_sample;
                 } else {
                     r = rand();
                 }
                 
                 synthetic_sample = 0;
                 synthetic_max = 1;
             } else {
                 r = rand();
             }
             
             multiresult = r * max;
             fractional = (uint32_t) multiresult;
        }
    }
    
    return multiresult >> 32;
}

With slightly more complicated tracking logic, you could save even the left-over bits after synthesizing a sample, instead of resetting everything to zero, but I felt this was good enough to illustrate the point.
