Created
August 14, 2015 21:12
-
-
Save nanis/ef22347b5b11c1390e44 to your computer and use it in GitHub Desktop.
Originally at http://mumble.net/~campbell/tmp/random_real.c Saved here for reference (as allowed by the copyright statement)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/* | |
* Uniform random floats: How to generate a double-precision | |
* floating-point number in [0, 1] uniformly at random given a uniform | |
* random source of bits. | |
* | |
* Copyright (c) 2014, Taylor R Campbell | |
* | |
* Verbatim copying and distribution of this entire article are | |
* permitted worldwide, without royalty, in any medium, provided | |
* this notice, and the copyright notice, are preserved. | |
* | |
* The naive approach of generating an integer in {0, 1, ..., 2^k - 1} | |
* for some k and dividing by 2^k, as used by, e.g., Boost.Random and | |
* GCC 4.8's implementation of C++11 uniform_real_distribution, gives a | |
* nonuniform distribution: | |
* | |
* - If k is 64, a natural choice for the source of bits, then because | |
* the set {0, 1, ..., 2^53 - 1} is represented exactly, whereas many | |
* subsets of {2^53, 2^53 + 1, ..., 2^64 - 1} are rounded to common | |
* floating-point numbers, outputs in [0, 2^-11) are underrepresented. | |
* | |
* - If k is 53, in an attempt to avoid nonuniformity due to rounding, | |
* then numbers in (0, 2^-53) and 1 will never be output. These | |
* outputs have very small, but nonnegligible, probability: the Bitcoin | |
* network today randomly guesses solutions every ten minutes to | |
* problems for which random guessing has much, much lower probability | |
* of success, closer to 2^-64. | |
* | |
* What is the `uniform' distribution we want, anyway? It is obviously | |
* not the uniform discrete distribution on the finite set of | |
* floating-point numbers in [0, 1] -- that would be silly. For our | |
* `uniform' distribution, we would like to imagine[*] drawing a real | |
* number in [0, 1] uniformly at random, and then choosing the nearest | |
* floating-point number to it. | |
* | |
* To formalize this, start with the uniform continuous distribution on | |
* [0, 1] in the real numbers, whose measure is mu([a, b]) = b - a for | |
* any [a, b] contained in [0, 1]. Next, let rho be the default | |
* (round-to-nearest/ties-to-even) rounding map from real numbers to | |
* floating-point numbers, so that, e.g., rho(0.50000000000000001) = | |
* 0.5. | |
* | |
* Note that the preimage under rho of any floating-point number -- | |
* that is, for a floating-point number x, the set of real numbers that | |
* will be rounded to x, or { u \in [0, 1] : rho(u) = x } -- is an | |
* interval. Its measure, mu(rho^-1(x)), is the measure of the set of | |
* numbers in [0, 1] that will be rounded to x, and that is precisely | |
* the probability we want for x in our `uniform' distribution. | |
* | |
* Instead of drawing from real numbers in [0, 1] uniformly at random, | |
* we can imagine drawing from the space of infinite sequences of bits | |
* uniformly at random, say (0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, ...), and | |
* interpreting the result as the fractional part of the binary | |
* expansion of a real number in [0, 1]: | |
* | |
* 0*(1/2) + 0*(1/4) + 0*(1/8) + 1*(1/16) + 0*(1/32) + 1*(1/64) + ... | |
* | |
* Then if we round that number, we'll get a floating-point number with | |
* the desired probability. But we can't just directly compute that | |
* sum one term at a time from left to right: once we get to 53 | |
* significant bits, further bits will be rounded away by addition, so | |
* the result will be biased to be `even', i.e. to have zero least | |
* significant bit. And we obviously can't choose uniformly at random | |
* from the space of infinite sequences of bits and round the result. | |
* | |
* We can, however, choose finite sequences of bits uniformly at | |
* random, and, except for a detail about ties, after a certain | |
* constant number of bits, no matter what further bits we choose they | |
* won't change what floating-point number we get by rounding. So we | |
* can choose a sufficiently large finite sequence of bits and round | |
* that, knowing that the result of rounding would not change no matter | |
* how many extra bits we add. | |
* | |
* The detail about ties is that if we simply choose, say, 1088 bits -- | |
* the least multiple of 64 greater than the exponent of the smallest | |
* possible (subnormal) floating-point number, 2^-1074 -- and round it | |
* in the default round-to-nearest/ties-to-even mode, the result would | |
* be biased to be even. | |
* | |
* The reason is that the probability of seeing a tie in any finite | |
* sequence of bits is nonzero, but but real ties occur only in sets of | |
* measure zero -- they happen only when *every* remaining bit in the | |
* infinite sequence is also zero. To avoid this, before rounding, we | |
* can simulate a sticky bit by setting the least significant bit of | |
* the finite sequence we choose, in order to prevent rounding what | |
* merely looks approximately like a tie to even. | |
* | |
* One little, perhaps counterintuitive, detail remains: the boundary | |
* values, 0 and 1, have half the probability you might naively expect, | |
* because we exclude from consideration the half of the interval | |
* rho^-1(0) below 0 and the half of the interval rho^-1(1) above 1. | |
* | |
* Some PRNG libraries provide variations on the naive algorithms that | |
* omit one or both boundary values, in order to allow, e.g., computing | |
* log(x) or log(1 - x). For example, instead of drawing from {0, 1, | |
* ..., 2^53 - 1} and dividing by 2^53, some | |
* | |
* - divide by 2^53 - 1 instead, to allow both boundary points; | |
* - draw from {1, 2, ..., 2^53} instead, to omit 0 but allow 1; or | |
* - draw from {-2^52, -2^52 + 1, ..., 2^52 - 2, 2^52 - 1} (i.e., the | |
* signed rather than unsigned 53-bit integers) and then add 1/2 before | |
* dividing, to omit both boundary points. | |
* | |
* These algorithms still have the biases described above, however. | |
* Given a uniform distribution on [0, 1], you can always turn it into | |
* a uniform distribution on (0, 1) or (0, 1] or [0, 1) by rejection | |
* sampling if 0 or 1 is a problem. | |
* | |
* The code below demonstrates all three algorithms with brief notes: | |
* - random_real_64, which chooses and normalizes a 64-bit integer; | |
* - random_real_53, which chooses and normalizes a 53-bit integer; and | |
* - random_real, which chooses and rounds a binary expansion. | |
* They are written in terms of a uniform random source of 64-bit | |
* integers spelled random64. This is usually more convenient than a | |
* source of fair coin flips. | |
* | |
* WARNING: I have only proved the code correct; I have not tested it. | |
* | |
* Generating 64-bit integers uniformly at random is left as an | |
* exercise for the crypto-minded reader. | |
* | |
* Thanks to Steve Canon and Alex Barnett for explaining the binary | |
* expansion approach to me and talking the problem over. | |
* | |
* [*] For the pedantic who may stumble upon this before reading ahead, | |
* we cannot, of course, assign a meaningful probability to a choice of | |
* real number in [0, 1]. What we can do, however, is imagine that it | |
* were meaningful, and then formalize it in the next sentence with the | |
* correct sense of measure. | |
*/ | |
#include <math.h> | |
#include <stdint.h> | |
uint64_t random64(void); | |
/* | |
* random_real_64: Pick an integer in {0, 1, ..., 2^64 - 1} uniformly | |
* at random, convert it to double, and divide it by 2^64. Values in | |
* [2^-11, 1] are overrepresented, small exponents have low precision, | |
* and exponents below -64 are not possible. | |
*/ | |
double | |
random_real_64(void) | |
{ | |
return ldexp((double)random64(), -64); | |
} | |
/* | |
* random_real_53: Pick an integer in {0, 1, ..., 2^53 - 1} uniformly | |
* at random, convert it to double, and divide it by 2^53. Many | |
* possible outputs are not represented: 2^-54, 1, &c. There are a | |
* little under 2^62 floating-point values in [0, 1], but only 2^53 | |
* possible outputs here. | |
*/ | |
double | |
random_real_53(void) | |
{ | |
return ldexp((double)(random64() & ((1ULL << 53) - 1)), -53); | |
} | |
#define clz64 __builtin_clzll /* XXX GCCism */ | |
/* | |
* random_real: Generate a stream of bits uniformly at random and | |
* interpret it as the fractional part of the binary expansion of a | |
* number in [0, 1], 0.00001010011111010100...; then round it. | |
*/ | |
double | |
random_real(void) | |
{ | |
int exponent = -64; | |
uint64_t significand; | |
unsigned shift; | |
/* | |
* Read zeros into the exponent until we hit a one; the rest | |
* will go into the significand. | |
*/ | |
while (__predict_false((significand = random64()) == 0)) { | |
exponent -= 64; | |
/* | |
* If the exponent falls below -1074 = emin + 1 - p, | |
* the exponent of the smallest subnormal, we are | |
* guaranteed the result will be rounded to zero. This | |
* case is so unlikely it will happen in realistic | |
* terms only if random64 is broken. | |
*/ | |
if (__predict_false(exponent < -1074)) | |
return 0; | |
} | |
/* | |
* There is a 1 somewhere in significand, not necessarily in | |
* the most significant position. If there are leading zeros, | |
* shift them into the exponent and refill the less-significant | |
* bits of the significand. Can't predict one way or another | |
* whether there are leading zeros: there's a fifty-fifty | |
* chance, if random64 is uniformly distributed. | |
*/ | |
shift = clz64(significand); | |
if (shift != 0) { | |
exponent -= shift; | |
significand <<= shift; | |
significand |= (random64() >> (64 - shift)); | |
} | |
/* | |
* Set the sticky bit, since there is almost surely another 1 | |
* in the bit stream. Otherwise, we might round what looks | |
* like a tie to even when, almost surely, were we to look | |
* further in the bit stream, there would be a 1 breaking the | |
* tie. | |
*/ | |
significand |= 1; | |
/* | |
* Finally, convert to double (rounding) and scale by | |
* 2^exponent. | |
*/ | |
return ldexp((double)significand, exponent); | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment