Unbiased Range Reduction of Pseudo-Random Numbers

Suppose you're working on a virtual board game and want to simulate the roll of a die. The first programming book I ever owned suggested the following:

int result = rand() % 6;

If you've spent any time reading about random numbers in computer science, you've probably heard that this is a bad idea. The reason that's usually given is that rand() (just like the standard RNGs of many other languages, such as java.util.Random) is implemented as a linear congruential generator, the problem being that the low bits of its output exhibit the least random behaviour.
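
To see why, here's a minimal sketch (not the actual implementation of rand() in any particular C library) of an LCG with the classic constants a = 1103515245 and c = 12345 and a power-of-two modulus; because both constants are odd, the lowest bit of the state simply alternates every step:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t state = 42; // arbitrary seed
    for (int i = 0; i < 8; ++i) {
        state = state * 1103515245u + 12345u; // LCG step, modulus 2^32
        printf("%u", state & 1);              // lowest bit alternates: 10101010
    }
    printf("\n");
    return 0;
}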

The problem persists, though, even if you were to use a state-of-the-art algorithm (such as permuted congruential generators or xoroshiro; I'm not picking a side in this feud today). This becomes apparent with a naive solution to the original problem: "I'll just use the top bits, then!"

int result = (rand() >> N) % 6;

Assuming N = 29, this just leaves the top 3 bits of the output space, which then get mapped to the 6 possible results. That means that 0 and 1 are twice as likely to occur as any other result. This is called modulo bias.
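
You can see this directly by tallying where each of the eight possible 3-bit values ends up (a quick sanity-check sketch, not part of the code above):

#include <stdio.h>

int main(void) {
    int counts[6] = {0};
    for (int bits = 0; bits < 8; ++bits) { // all possible 3-bit values
        counts[bits % 6]++;
    }
    for (int i = 0; i < 6; ++i) {
        printf("%d: %d\n", i, counts[i]);  // 0 and 1 occur twice, 2 through 5 once
    }
    return 0;
}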

You can reduce it by choosing a smaller value of N (such as zero, assuming you're not using an LCG), but you can never eliminate it entirely. Worse, it always favours the low numbers of your final output range.

A common solution is to do the range reduction in floating point:

uint32_t randr(uint32_t max) {
    // Bit-level hackery would probably be preferable here,
    // but this is more readable
    double f = (double)rand() / (double)(1ULL << 32);
    
    return (uint32_t)(f * max);
}

This will distribute the bias throughout the output range more or less evenly, but it won't get rid of it entirely.
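
To see what "more or less evenly" means, here's a scaled-down sketch that maps a 3-bit input range onto 6 outputs the same way; the doubled-up results are now 0 and 3 rather than piling up at the bottom of the range:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int counts[6] = {0};
    for (uint32_t r = 0; r < 8; ++r) { // pretend our RNG only produced 3 bits
        double f = (double)r / 8.0;    // same scheme as randr() above
        counts[(uint32_t)(f * 6)]++;
    }
    for (int i = 0; i < 6; ++i) {
        printf("%d: %d\n", i, counts[i]); // 0 and 3 occur twice, the rest once
    }
    return 0;
}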

Daniel Lemire suggests an interesting alternative that does not need a division, and should therefore execute much faster, while also distributing the bias evenly:

uint32_t randr(uint32_t max) {
    uint32_t r = rand();
    return ((uint64_t)r * (uint64_t)max) >> 32;
}

He gives a rather abstract argument for why this works, but I find it easiest to interpret this as the equivalent of the previous method in 32.32 fixed point.
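
For example, r = 0xC0000000 is 0.75 in 32.32 fixed point (like the code above, this treats rand() as producing a full 32-bit value); multiplying by max = 6 and keeping the top 32 bits gives floor(0.75 * 6) = 4:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t r = 0xC0000000u;                 // 0.75 as a 32.32 fixed point fraction
    uint32_t max = 6;
    uint64_t multiresult = (uint64_t)r * max; // 4.5 in 32.32 fixed point
    printf("%u\n", (uint32_t)(multiresult >> 32)); // integer part: prints 4
    return 0;
}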

That's good enough for most applications. Just for today though, let's have some fun and let perfect be the enemy of good, shall we?

Reducing Bias

Unless the output range evenly divides the input range, getting rid of the bias with a simple mapping is impossible, no matter how clever you are about it. This should be apparent by now, I hope.

It's easy to do when we're allowed to use any number of random samples, though:

uint32_t randr(uint32_t max) {
    uint32_t result;
    do {
        result = rand();
    } while (result >= max);
    return result;
}

...don't use this code. For low values of max, it will take, on average, about exp2(32) / max iterations before returning (over 700 million for max = 6). We can make it viable with just a few small modifications, though.

First, we can choose an intermediate range that is evenly divided by our target range, and do one final range reduction at the end:

uint32_t randr(uint32_t max) {
    // There are more efficient ways of calculating these, but...
    // (note that RAND_MAX + 1 would overflow, since we assume a full 32-bit rand())
    uint32_t scale = 0xFFFFFFFFu / max;
    uint32_t scaled_max = max * scale;

    uint32_t result;
    do {
        result = rand();
    } while (result >= scaled_max);
    
    return result / scale;
}

Now the worst case occurs at max = exp2(32) / 2 + 1, where there's a near-50% chance of a generated number being outside the allowed range, and we'll take, on average, about two iterations to find one that's inside.
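
To gauge the cost for an arbitrary max, here's a quick sketch (reusing the scale calculation from above) that computes the acceptance probability and the expected number of iterations:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t max = (1u << 31) + 1; // the worst case discussed above
    uint32_t scale = 0xFFFFFFFFu / max;
    uint64_t scaled_max = (uint64_t)max * scale;
    double p_accept = (double)scaled_max / 4294967296.0;
    printf("P(accept) = %f, expected iterations = %f\n", p_accept, 1.0 / p_accept);
    return 0;
}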

Again, Professor Lemire presents an alternative method without any divisions on the common path:

uint32_t randr(uint32_t max) {
    uint64_t r = rand();
    uint64_t multiresult = r * max;
    
    uint32_t fractional = (uint32_t) multiresult;
    if (fractional < max) {
        uint32_t threshold = -max % max;
        while (fractional < threshold) {
            r = rand();
            multiresult = r * max;
            fractional = (uint32_t) multiresult;
        }
    }
    
    return multiresult >> 32;
}

He doesn't really explain this one, and it's a bit harder to understand why this works at all. The basic idea is to detect values of multiresult whose fractional parts are so low that their integral part (i.e. the would-be result of the reduction) has to be one of the values that occur too frequently. This is possible because multiresult is a multiple of max, and thus all possible values are evenly spaced.

The key insight is that each value of r will produce a unique value of fractional, so threshold is just the number of distinct values of r we mean to reject. This is, of course, always less than max, allowing us to put the slow calculation behind a branch.

Except that's not actually true when max is even, say divisible by exp2(n) but not by exp2(n + 1). In that case, fractional will always also be a multiple of exp2(n), and exp2(n) values of r will map to the same value of fractional. However, threshold will also be a multiple of the same power of two, which balances out so that there are still exactly threshold values of r such that fractional < threshold.
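
It's easier to see all of this at a smaller scale. Here's a sketch of the same scheme with a 3-bit word in place of the 32-bit one, and max = 6: threshold becomes 8 % 6 = 2, and exactly two values of r (the ones that map to the doubled-up results 0 and 3) end up with fractional < threshold:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t max = 6;
    uint32_t threshold = 8 % max;      // stands in for exp2(32) % max
    for (uint32_t r = 0; r < 8; ++r) { // pretend the word size is 3 bits
        uint32_t multiresult = r * max;
        uint32_t integral   = multiresult >> 3; // the would-be result
        uint32_t fractional = multiresult & 7;  // the low 3 bits
        printf("r = %u -> result %u, fractional %u%s\n",
               r, integral, fractional,
               fractional < threshold ? " (rejected)" : "");
    }
    return 0;
}

Note that, since max is even here, fractional only takes even values, each shared by two values of r, and threshold is even as well, exactly as described above.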

The actual threshold calculation is a bit obtuse as well: What we essentially want is exp2(32) % max, which we can't easily calculate in 32-bit arithmetic, so we instead calculate (exp2(32) - max) % max, which is of course identical, and simplifies to -max % max in C's unsigned semantics. Cheeky!
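
A quick numeric check of that identity, for max = 6 (where exp2(32) % max is 4):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t max = 6;
    printf("%u\n", (uint32_t)(-max % max));                      // prints 4
    printf("%llu\n", (unsigned long long)((1ULL << 32) % max));  // also prints 4
    return 0;
}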

That's pretty good, but if you're anything like me, that loop condition will still make you a bit queasy. If this code runs on your game server, for example, and an attacker can not only figure out the state of the RNG, but also manipulate when and how often it runs, it can potentially be made to run for many, many iterations, leaving you open to a DoS attack! Then there's the fact that it can also happen, you know, randomly, during normal operation.

We could do a lot of complicated maths to prove a definite worst-case behaviour for a particular PRNG family, but I'm not really qualified to do that. It also seems a bit pointless to me, since just knowing the behaviour isn't good enough if it turns out to still be a problem.

The obvious solution is to set a maximum number of iterations:

uint32_t randr(uint32_t max, int samples) {
    uint64_t r = rand();
    uint64_t multiresult = r * max;
    
    uint32_t fractional = (uint32_t) multiresult;
    if (fractional < max) {
        uint32_t threshold = -max % max;
        for (int i = 1; i < samples && fractional < threshold; ++i) {
            r = rand();
            multiresult = r * max;
            fractional = (uint32_t) multiresult;
        }
    }
    
    return multiresult >> 32;
}

This allows us to finely control how much bias we want to allow. Consider again the case max = exp2(32) / 2 + 1; taking only one sample, exp2(32) / 2 - 1 values are outside the target range and get mapped to other values. Two values remain unmapped, and are therefore half as likely as all the others.

Each additional sample (approximately) halves the probability of hitting this biased case; effectively, we're mixing a uniform distribution and the biased distribution with a proportion of exp2(T) - 1 : 1, where T = samples - 1. Different values of max will show different behaviour, but they will invariably either be closer to a uniform distribution or approach it faster (e.g. max = exp2(32) / 3 * 2 is twice as likely to generate one half of all possible output values as the other, but mixes in a uniform distribution with a proportion of exp3(T) - 1 : 1).
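
To put numbers on that, here's a small sketch that, under this reading, computes the rejection probability p = (exp2(32) % max) / exp2(32) and the weight w = pow(p, samples - 1) that the biased single-sample distribution keeps in the mix (so the proportion above is (1 - w) : w):

#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main(void) {
    uint32_t max = (1u << 31) + 1;               // try e.g. 0xAAAAAAAAu as well
    uint32_t threshold = -max % max;             // exp2(32) % max
    double p = (double)threshold / 4294967296.0; // chance of drawing a rejected value
    for (int samples = 1; samples <= 5; ++samples) {
        double w = pow(p, samples - 1);          // weight of the biased single-sample distribution
        printf("samples = %d: biased weight = %g\n", samples, w);
    }
    return 0;
}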

Utilizing Rejected Samples

Suppose that, in addition to rolling dice, your game also requires shuffling cards. Whether you use an explicit shuffling algorithm or amortize the cost over the course of a full game, this will put a lot of strain on your RNG, and you'd probably like to use fewer samples, if possible. You'll be glad to hear that we can save a few bits of entropy from each rejected sample. (NOTE: At this point, we're definitely in overkill land. I just think this is cool, not that you should ever do it.)

We already know that when we reject a sample, fractional is a random number in [0, threshold) and therefore contains log2(threshold) bits of entropy. By saving these bits, we can synthesize new samples to interject into our random sequence:

uint32_t randr(uint32_t max, int samples) {
    static uint64_t synthetic_sample = 0;
    static uint64_t synthetic_max = 1;
    
    uint64_t r = rand();
    uint64_t multiresult = r * max;
    
    uint32_t fractional = (uint32_t) multiresult;
    if (fractional < max) {
        uint32_t threshold = -max % max;
        for (int i = 1; i < samples && fractional < threshold; ++i) {
             synthetic_max *= threshold;
             synthetic_sample *= threshold;
             synthetic_sample += fractional;
             
             if (synthetic_max >= (1ULL << 32)) {
                 if (synthetic_sample < (1ULL << 32)) {
                     r = synthetic_sample;
                 } else {
                     r = rand();
                 }
                 
                 synthetic_sample = 0;
                 synthetic_max = 1;
             } else {
                 r = rand();
             }
             
             multiresult = r * max;
             fractional = (uint32_t) multiresult;
        }
    }
    
    return multiresult >> 32;
}

With slightly more complicated tracking logic, you could save even the left-over bits after synthesizing a sample, instead of resetting everything to zero, but I felt this was good enough to illustrate the point.
