Skip to content

Instantly share code, notes, and snippets.

@nanis
Created August 14, 2015 21:12
Show Gist options
  • Save nanis/ef22347b5b11c1390e44 to your computer and use it in GitHub Desktop.
Save nanis/ef22347b5b11c1390e44 to your computer and use it in GitHub Desktop.
Originally at http://mumble.net/~campbell/tmp/random_real.c Saved here for reference (as allowed by the copyright statement)
/*
* Uniform random floats: How to generate a double-precision
* floating-point number in [0, 1] uniformly at random given a uniform
* random source of bits.
*
* Copyright (c) 2014, Taylor R Campbell
*
* Verbatim copying and distribution of this entire article are
* permitted worldwide, without royalty, in any medium, provided
* this notice, and the copyright notice, are preserved.
*
* The naive approach of generating an integer in {0, 1, ..., 2^k - 1}
* for some k and dividing by 2^k, as used by, e.g., Boost.Random and
* GCC 4.8's implementation of C++11 uniform_real_distribution, gives a
* nonuniform distribution:
*
* - If k is 64, a natural choice for the source of bits, then because
* the set {0, 1, ..., 2^53 - 1} is represented exactly, whereas many
* subsets of {2^53, 2^53 + 1, ..., 2^64 - 1} are rounded to common
* floating-point numbers, outputs in [0, 2^-11) are underrepresented.
*
* - If k is 53, in an attempt to avoid nonuniformity due to rounding,
* then numbers in (0, 2^-53) and 1 will never be output. These
* outputs have very small, but nonnegligible, probability: the Bitcoin
* network today randomly guesses solutions every ten minutes to
* problems for which random guessing has much, much lower probability
* of success, closer to 2^-64.
*
* What is the `uniform' distribution we want, anyway? It is obviously
* not the uniform discrete distribution on the finite set of
* floating-point numbers in [0, 1] -- that would be silly. For our
* `uniform' distribution, we would like to imagine[*] drawing a real
* number in [0, 1] uniformly at random, and then choosing the nearest
* floating-point number to it.
*
* To formalize this, start with the uniform continuous distribution on
* [0, 1] in the real numbers, whose measure is mu([a, b]) = b - a for
* any [a, b] contained in [0, 1]. Next, let rho be the default
* (round-to-nearest/ties-to-even) rounding map from real numbers to
* floating-point numbers, so that, e.g., rho(0.50000000000000001) =
* 0.5.
*
* Note that the preimage under rho of any floating-point number --
* that is, for a floating-point number x, the set of real numbers that
* will be rounded to x, or { u \in [0, 1] : rho(u) = x } -- is an
* interval. Its measure, mu(rho^-1(x)), is the measure of the set of
* numbers in [0, 1] that will be rounded to x, and that is precisely
* the probability we want for x in our `uniform' distribution.
*
* Instead of drawing from real numbers in [0, 1] uniformly at random,
* we can imagine drawing from the space of infinite sequences of bits
* uniformly at random, say (0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, ...), and
* interpreting the result as the fractional part of the binary
* expansion of a real number in [0, 1]:
*
* 0*(1/2) + 0*(1/4) + 0*(1/8) + 1*(1/16) + 0*(1/32) + 1*(1/64) + ...
*
* Then if we round that number, we'll get a floating-point number with
* the desired probability. But we can't just directly compute that
* sum one term at a time from left to right: once we get to 53
* significant bits, further bits will be rounded away by addition, so
* the result will be biased to be `even', i.e. to have zero least
* significant bit. And we obviously can't choose uniformly at random
* from the space of infinite sequences of bits and round the result.
*
* We can, however, choose finite sequences of bits uniformly at
* random, and, except for a detail about ties, after a certain
* constant number of bits, no matter what further bits we choose they
* won't change what floating-point number we get by rounding. So we
* can choose a sufficiently large finite sequence of bits and round
* that, knowing that the result of rounding would not change no matter
* how many extra bits we add.
*
* The detail about ties is that if we simply choose, say, 1088 bits --
* the least multiple of 64 greater than the exponent of the smallest
* possible (subnormal) floating-point number, 2^-1074 -- and round it
* in the default round-to-nearest/ties-to-even mode, the result would
* be biased to be even.
*
* The reason is that the probability of seeing a tie in any finite
* sequence of bits is nonzero, but but real ties occur only in sets of
* measure zero -- they happen only when *every* remaining bit in the
* infinite sequence is also zero. To avoid this, before rounding, we
* can simulate a sticky bit by setting the least significant bit of
* the finite sequence we choose, in order to prevent rounding what
* merely looks approximately like a tie to even.
*
* One little, perhaps counterintuitive, detail remains: the boundary
* values, 0 and 1, have half the probability you might naively expect,
* because we exclude from consideration the half of the interval
* rho^-1(0) below 0 and the half of the interval rho^-1(1) above 1.
*
* Some PRNG libraries provide variations on the naive algorithms that
* omit one or both boundary values, in order to allow, e.g., computing
* log(x) or log(1 - x). For example, instead of drawing from {0, 1,
* ..., 2^53 - 1} and dividing by 2^53, some
*
* - divide by 2^53 - 1 instead, to allow both boundary points;
* - draw from {1, 2, ..., 2^53} instead, to omit 0 but allow 1; or
* - draw from {-2^52, -2^52 + 1, ..., 2^52 - 2, 2^52 - 1} (i.e., the
* signed rather than unsigned 53-bit integers) and then add 1/2 before
* dividing, to omit both boundary points.
*
* These algorithms still have the biases described above, however.
* Given a uniform distribution on [0, 1], you can always turn it into
* a uniform distribution on (0, 1) or (0, 1] or [0, 1) by rejection
* sampling if 0 or 1 is a problem.
*
* The code below demonstrates all three algorithms with brief notes:
* - random_real_64, which chooses and normalizes a 64-bit integer;
* - random_real_53, which chooses and normalizes a 53-bit integer; and
* - random_real, which chooses and rounds a binary expansion.
* They are written in terms of a uniform random source of 64-bit
* integers spelled random64. This is usually more convenient than a
* source of fair coin flips.
*
* WARNING: I have only proved the code correct; I have not tested it.
*
* Generating 64-bit integers uniformly at random is left as an
* exercise for the crypto-minded reader.
*
* Thanks to Steve Canon and Alex Barnett for explaining the binary
* expansion approach to me and talking the problem over.
*
* [*] For the pedantic who may stumble upon this before reading ahead,
* we cannot, of course, assign a meaningful probability to a choice of
* real number in [0, 1]. What we can do, however, is imagine that it
* were meaningful, and then formalize it in the next sentence with the
* correct sense of measure.
*/
#include <math.h>
#include <stdint.h>
uint64_t random64(void);
/*
* random_real_64: Pick an integer in {0, 1, ..., 2^64 - 1} uniformly
* at random, convert it to double, and divide it by 2^64. Values in
* [2^-11, 1] are overrepresented, small exponents have low precision,
* and exponents below -64 are not possible.
*/
double
random_real_64(void)
{
return ldexp((double)random64(), -64);
}
/*
* random_real_53: Pick an integer in {0, 1, ..., 2^53 - 1} uniformly
* at random, convert it to double, and divide it by 2^53. Many
* possible outputs are not represented: 2^-54, 1, &c. There are a
* little under 2^62 floating-point values in [0, 1], but only 2^53
* possible outputs here.
*/
double
random_real_53(void)
{
return ldexp((double)(random64() & ((1ULL << 53) - 1)), -53);
}
#define clz64 __builtin_clzll /* XXX GCCism */
/*
* random_real: Generate a stream of bits uniformly at random and
* interpret it as the fractional part of the binary expansion of a
* number in [0, 1], 0.00001010011111010100...; then round it.
*/
double
random_real(void)
{
int exponent = -64;
uint64_t significand;
unsigned shift;
/*
* Read zeros into the exponent until we hit a one; the rest
* will go into the significand.
*/
while (__predict_false((significand = random64()) == 0)) {
exponent -= 64;
/*
* If the exponent falls below -1074 = emin + 1 - p,
* the exponent of the smallest subnormal, we are
* guaranteed the result will be rounded to zero. This
* case is so unlikely it will happen in realistic
* terms only if random64 is broken.
*/
if (__predict_false(exponent < -1074))
return 0;
}
/*
* There is a 1 somewhere in significand, not necessarily in
* the most significant position. If there are leading zeros,
* shift them into the exponent and refill the less-significant
* bits of the significand. Can't predict one way or another
* whether there are leading zeros: there's a fifty-fifty
* chance, if random64 is uniformly distributed.
*/
shift = clz64(significand);
if (shift != 0) {
exponent -= shift;
significand <<= shift;
significand |= (random64() >> (64 - shift));
}
/*
* Set the sticky bit, since there is almost surely another 1
* in the bit stream. Otherwise, we might round what looks
* like a tie to even when, almost surely, were we to look
* further in the bit stream, there would be a 1 breaking the
* tie.
*/
significand |= 1;
/*
* Finally, convert to double (rounding) and scale by
* 2^exponent.
*/
return ldexp((double)significand, exponent);
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment