Skip to content

Instantly share code, notes, and snippets.

@fuglede
Last active May 14, 2018 20:56
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save fuglede/772402ecc3997ada82a03ce65361781b to your computer and use it in GitHub Desktop.
Thoughts on the .NET RNG

Randomness of System.Random

In this post, we will take a close look at the default random number generator in the .NET Framework, System.Random, and see that it is probably insufficient for use in simulation when asked to provide positive signed 32-bit integers and floating points, yet might still be useful when prompted for any signed 32-bit integers.

I should preface the post by saying that I am by no means an expert on random number generation and only put together this post to resolve some of my own questions on .NET for which no documentation appears to be available anywhere.

Background

In my work, a good amount of time is spent doing simulation modeling: Analyses of computational algorithms whose underlying variables are governed by probability distributions, and which may be studied by running the systems with concrete samples when actual analysis in infeasible. In fact, even when analytical studies of such systems are possible, simulation tends to be faster to implement. Moreover, working in a Microsoft-oriented company, much time is spent together with the .NET Framework whose class library includes a pseudo-random number generator, which we will refer to as System.Random.

For such a generator to be useful in the context of simulation, it must have certain properties. In particular we require that it has no apparent bias. For example, if a generator can produce integers, one way to use it for simulating the flipping of a fair coin would be to take such an integer, and associate it to the events of heads/tails if the integer is odd/even respectively. In this case, if the generator is biased towards even integers, it is clearly inadequate for the task at hand.

Now, one does not have to search long on StackOverflow to find claims that System.Random is not useful in the context of simulation, but what is harder to find are good reasons for why this would be the case; in particular, the Microsoft documentation of the generator makes no such disclaimers.

Here, we will take a look at some possible pitfalls when using System.Random and see how it manages to come through in the clutch when used in a very particular manner. In particular, we will see that one should probably avoid using System.Random.Next() which produces positive integers and System.Random.NextDouble() which produces floating-point numbers between 0 and 1, but that System.Random.Next(Int32.MinValue, Int32.MaxValue), which can generate any 32-bit pattern (except 011...11), performs as well on statistical tests as one could hope for and thus might actually still be useful.

Additive and subtractive number generators

To evaluate System.Random, we must take a close look at the algorithm underlying its random number generation. As Microsoft states in its documentation, this is based on an algorithm described in Donald Knuth's The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Section 3.2.2. The actual implementation in .NET Framework 4.7 is available in the Microsoft Reference Source.

Assuming that one has a procedure of determining $n-1$ random numbers $X_1, \dots, X_{n-1}$, the additive number generator, attributed by Knuth to Mitchell and Moore, determines a new random number $X_n$ by

$$X_n = X_{n-l} + X_{n-k} \mod m,$$

where $l$, $k$, and $m$ are integers to be chosen in such a way that ensures that the algorithm produces useful numbers.

The first thing we notice is that by knowing $l$, $k$ and $m$ and observing sufficiently many outputs of the generator, one can determine the outputs that follow. This property means that the generator should not be used for cryptographical purposes, which Microsoft rightfully notes in their documentation.

Given the finiteness of the space of states involved in the generation of numbers, it is clear that there exists an integer $p$, called the period, so that $X_n = X_{n+p}$ for all sufficiently large $p$. For the generator to be used for simulation, this number $p$ must be greater than the number of integers we expect to draw from the generator. If, for instance, we were in the situation that our simulation requires the sampling of 100 integers, and if $p = 100$, then we would end up getting the same results for every rerun of our simulation.

Luckily, Exercise 3.2.2.30 in Knuth's book states that as long as $f(x) = x^k \pm x^{k-l} \pm 1$ is a so-called primitive polynomial modulo 2, and $m = 2^d$ for some $d$, then the period of the generator is $2^{d-1}(2^k-1)$. In particular, this means that the least significant bit in $X_n$, which determines if $X_n$ is odd or even, has period $2^k-1$. Let us not worry about what a primitive polynomial is but instead just note that one primitive example is given by $k = 55$, $l = 24$. That is, disregarding all the issues we mention below, by using

$$X_n = X_{n-55} + X_{n-24} \mod m,$$

the exercise ensures that we can simulate $2^{55}-1$ coin tosses without seeing periodicity, and if that is not enough, higher values of $k$ can also be chosen.

Before moving on, let us note that System.Random uses a subtractive version of the algorithm, in which

$$X_n = X_{n-k} + X_{n-l} \mod m.$$

All other things equal, the solution to Exercise 3.2.2.23 in Knuth's book notes that in the case of primitive polynomials, the period is the same for the additive and subtractive generators.

Warning lights

So far everything seems alright, but we do face some concrete pitfalls.

A rather unfortunate typo

In their implementation of the Mitchel--Moore generator, Microsoft made a rather unfortunate typo, using $k = 55$, $l = 34$, rather than $k = 55$, $l = 24$. This is bad, because unlike what was the situation above, $x^{55} + x^{21} - 1$ is not a primitive polynomial, meaning that we have no a priori theoretical knowledge about the period of the generator. Nonetheless, we will see below that in the best possible case, we can generate $2^{35}$ integers without this being an obvious problem.

It might be that other arguments can be applied to make deductions about the period in this non-primitive case, but I am not particularly knowledgable about the relevant mathematics (nor do I find them particularly appealing); the solution in such cases is to ask online. At the time of writing no answer has been posted.

In fact, even if Microsoft had used $l = 24$, we would still not be able to deduce anything about the period: As mentioned above, the theory applies whenever the modulus is a power of two, but in the Microsoft implementation, all operations are modulo $2^{31}-1$. As we will see in the next section, this deviation from the original algorithm might actually make the number generation more robust.

Microsoft acknowledged the existence of the typo more than 5 years ago but have not yet fixed it due to concerns over compatibility.

Inherent bias

It is obvious from the definition of $X_n$ that if $m = 2^d$ for some $d$, then out of $X_n$, $X_{n-k}$, and $X_{n-l}$, 0 or 2 of the three numbers will be odd. Moreover, assuming perfect random generation, seeing 2 odds is three times as likely as seeing 0 odds. In Exercise 3.3.2.31, Knuth uses this fact to show that given a random sample of 79 consecutive draws from the generator, it is very likely that the sample contains more odd numbers than even numbers, which is an example of the exact kind of bias that renders simulation a scary endeavor for a provider of business critical simulations.

As mentioned above, in the Microsoft implementation, the modulus is an odd number, so Microsoft makes its way of out the odd/even bias safely. Whether this is on purpose or not is hard to tell as no part of the documentation explains this particular addition, but it certainly does look like we just got very lucky here. In fact, for the most part, the implementation is taken ad verbatim from Press, Teukolsky, Vetterling, Flannery - Numerical Recipes in C, in which the authors perform the arithmetic modulo $1000000000$; an even number.

At the same time, Knuth refers to Normand, Herrmann, Hajjar - Precise Calculation of the Dynamic Exponent of Two-Dimensional Perculation in which it is stated that the generator would produce observable bias for sample sizes of $10^{11}$, even though the period of the generator is much larger.

Enter statistics

Determining the quality of random number generators is tricky business, and besides trying to be as systematic as Knuth when defining the generator, the best bet for determining the actual quality of it is to use statistical tests. In the example given above, we saw that certain ways of generating random numbers would cause the generator to be insufficient for simulating coin flipping, since heads might occur more often than tails. On the other hand, we could easily come up with a generator creating exactly as many heads and tails, but this is probably also not desirable, since if we were to perform the experiment on an actual fair coin (noting that thought experiments on actual experiments are allowed to use nonexistent objects), we would expect some discrepancy between the number of heads and tails. But, we expect that if we see too many heads in one experiment, we would be just as likely to see too many tails once we repeat it.

Being able to compare observed outcomes with theoretical ones is at the heart of statistical hypothesis testing which quantifies the question of "Given these coin flip outcomes, can we conclude that the coin is not fair?". In the context of random number generator testing, the approach is to set up a battery of such tests, thus probing the generator for various kinds of bias.

PractRand

For the purpose of this post, we will make use of PractRand. Another popular alternative is TestU01.

In the basic operation of PractRand, one provides a sample of maybe-random bytes upon which it runs its collection of statistical tests. If no tests show significant deviations from a true uniform generator, the sample size is doubled, and the tests are run again. PractRand terminates when at least one test fails, if a sample of 32 TB shows no failures, or if the user runs out of patience and closes it.

PractRand can be used in various ways; I found that the simplest way was to download the source and compile RNG_test, which can read a stream of random bytes from standard input.

Upon first glance, the outputs from PractRand are about as incomprehensible as the statistical tests they are based on, and rather than actually using the exact outputs, we will be more interested in how they compare to the outputs from other generators. As such, this John D. Cook post, and this PCG post serve as useful references for examples of PractRand outputs.

Warming up

Let us first give an example of what PractRand output looks like when it sees something it does not like. For this purpose, we use the specification from this 2001 Dilbert comic:

Nine nine nine nine nine nine nine

To wire this into PractRand, we just need a program which continuously writes to standard output the number nine. All code is contained in the appendix at the bottom of this post.

Compiling the example into an executable called nine, the result of piping the output into RNG_test is seen below.

> nine | RNG_test stdin8
RNG_test using PractRand version 0.93
RNG = RNG_stdin8, seed = 0x7e26e416
test set = normal, folding = standard (8 bit)

rng=RNG_stdin8, seed=0x7e26e416
length= 32 megabytes (2^25 bytes), time= 2.7 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN(2+0,13-4,T)                  R=+5636982 p = 0           FAIL !!!!!!!!
  BCFN(2+1,13-4,T)                  R=+2664412 p = 0           FAIL !!!!!!!!
  BCFN(2+2,13-5,T)                  R=+1609820 p = 0           FAIL !!!!!!!!
  BCFN(2+3,13-5,T)                  R=+781907 p = 0           FAIL !!!!!!!!
  BCFN(2+4,13-5,T)                  R=+382945 p = 0           FAIL !!!!!!!!
  BCFN(2+5,13-6,T)                  R=+233930 p = 0           FAIL !!!!!!!!
  BCFN(2+6,13-6,T)                  R=+115740 p = 0           FAIL !!!!!!!!
  BCFN(2+7,13-7,T)                  R=+69900  p = 0           FAIL !!!!!!!!
  BCFN(2+8,13-8,T)                  R=+41294  p = 0           FAIL !!!!!!!!
  BCFN(2+9,13-8,T)                  R=+20560  p =  6e-5219    FAIL !!!!!!!!
  BCFN(2+10,13-9,T)                 R=+11783  p =  1e-2648    FAIL !!!!!!!!
  BCFN(2+11,13-9,T)                 R= +5870  p =  4e-1320    FAIL !!!!!!!!
  DC6-9x1Bytes-1                    R=+2205341 p = 0           FAIL !!!!!!!!
  Gap-16:A                          R=+1071012 p = 0           FAIL !!!!!!!!
  Gap-16:B                          R=+5233812 p = 0           FAIL !!!!!!!!
  FPF-14+6/16:(0,14-0)              R=+2543810 p = 0           FAIL !!!!!!!!
  FPF-14+6/16:all                   R=+2725520 p = 0           FAIL !!!!!!!!
  FPF-14+6/16:cross                 R=+3876356 p = 0           FAIL !!!!!!!!
  BRank(12):128(4)                  R= +5257  p~=  1e-2796    FAIL !!!!!!!!
  BRank(12):256(2)                  R= +7614  p~=  4e-2293    FAIL !!!!!!!!
  BRank(12):384(1)                  R= +8139  p~=  3e-2451    FAIL !!!!!!!!
  BRank(12):512(2)                  R=+15408  p~=  3e-4639    FAIL !!!!!!!!
  BRank(12):768(1)                  R=+16406  p~=  1e-4939    FAIL !!!!!!!!
  BRank(12):1K(1)                   R=+21917  p~=  1e-6598    FAIL !!!!!!!!
  [Low1/8]BCFN(2+0,13-6,T)          R=+732279 p = 0           FAIL !!!!!!!!
  [Low1/8]BCFN(2+1,13-6,T)          R=+389751 p = 0           FAIL !!!!!!!!
  [Low1/8]BCFN(2+2,13-6,T)          R=+203559 p = 0           FAIL !!!!!!!!
  [Low1/8]BCFN(2+3,13-6,T)          R=+104928 p = 0           FAIL !!!!!!!!
  [Low1/8]BCFN(2+4,13-7,T)          R=+65226  p = 0           FAIL !!!!!!!!
  [Low1/8]BCFN(2+5,13-8,T)          R=+39325  p =  4e-9981    FAIL !!!!!!!!
  [Low1/8]BCFN(2+6,13-8,T)          R=+19863  p =  7e-5042    FAIL !!!!!!!!
  [Low1/8]BCFN(2+7,13-9,T)          R=+11499  p =  7e-2585    FAIL !!!!!!!!
  [Low1/8]BCFN(2+8,13-9,T)          R= +5769  p =  1e-1297    FAIL !!!!!!!!
  [Low1/8]DC6-9x1Bytes-1            R=+483643 p = 0           FAIL !!!!!!!!
  [Low1/8]Gap-16:A                  R=+223041 p = 0           FAIL !!!!!!!!
  [Low1/8]Gap-16:B                  R=+888483 p = 0           FAIL !!!!!!!!
  [Low1/8]FPF-14+6/16:(0,14-2)      R=+545115 p = 0           FAIL !!!!!!!!
  [Low1/8]FPF-14+6/16:all           R=+590546 p = 0           FAIL !!!!!!!!
  [Low1/8]FPF-14+6/16:cross         R=+530789 p = 0           FAIL !!!!!!!!
  [Low1/8]BRank(12):128(2)          R= +3717  p~=  5e-1120    FAIL !!!!!!!!
  [Low1/8]BRank(12):256(2)          R= +7614  p~=  4e-2293    FAIL !!!!!!!!
  [Low1/8]BRank(12):384(1)          R= +8139  p~=  3e-2451    FAIL !!!!!!!!
  [Low1/8]BRank(12):512(1)          R=+10895  p~=  1e-3280    FAIL !!!!!!!!

Sometimes you can in fact be sure.

General setup

Below, we will run PractRand on variants of the System.Random implementation. For the reader who is interested in reproducing the results, we note that in all cases, we ported the .NET algorithm to C++ in the interest of performance. Indeed, we were able to achieve x15 speed-ups over our reference C# implementations before PractRand itself became a bottleneck.

To ensure that we were in fact testing the .NET algorithm, the C++ output streams would be compared to those produced by a reference C# implementation.

In each case, we would use a seed value of 0 for reproducibility.

Testing the additive generator

Our first victim will be the raw additive generator. The state initialization is that of the implementation of System.Random, except we remove the curious issues on signedness mentioned above.

additive | RNG_test stdin32
RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0xf53f2eab
test set = normal, folding = standard (32 bit)

rng=RNG_stdin32, seed=0xf53f2eab
length= 64 megabytes (2^26 bytes), time= 2.0 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low8/32]BCFN(2+1,13-5,T)         R= +31.9  p =  1.1e-12    FAIL
  [Low8/32]BCFN(2+2,13-5,T)         R= +21.9  p =  9.3e-9    VERY SUSPICIOUS
  [Low8/32]BCFN(2+3,13-5,T)         R= +10.5  p =  2.7e-4   unusual
  [Low8/32]BRank(12):512(2)         R=+124.7  p~=  1.4e-38    FAIL !!!
  [Low8/32]BRank(12):768(1)         R=+777.1  p~=  6.0e-235   FAIL !!!!!!
  [Low1/32]BCFN(2+0,13-6,T)         R= +1665  p =  1.1e-570   FAIL !!!!!!!
  [Low1/32]BCFN(2+1,13-6,T)         R=+443.5  p =  2.6e-152   FAIL !!!!!
  [Low1/32]BCFN(2+2,13-7,T)         R=+361.3  p =  2.5e-109   FAIL !!!!!
  [Low1/32]BCFN(2+3,13-7,T)         R=+312.2  p =  1.5e-94    FAIL !!!!!
  [Low1/32]BCFN(2+4,13-8,T)         R=+191.0  p =  2.0e-49    FAIL !!!!
  [Low1/32]BCFN(2+5,13-8,T)         R=+102.8  p =  4.9e-27    FAIL !!
  [Low1/32]BCFN(2+6,13-9,T)         R= +38.4  p =  9.7e-10   VERY SUSPICIOUS
  [Low1/32]DC6-9x1Bytes-1           R=+744.9  p =  9.6e-437   FAIL !!!!!!!
  [Low1/32]BRank(12):128(2)         R= +2073  p~=  4.2e-625   FAIL !!!!!!!
  [Low1/32]BRank(12):256(2)         R= +5970  p~=  3e-1798    FAIL !!!!!!!!
  [Low1/32]BRank(12):384(1)         R= +6977  p~=  2e-2101    FAIL !!!!!!!!
  ...and 92 test result(s) without anomalies

While not quite as dramatically as for the Dilbert generator, the tests still quickly fail, as we would expect given our knowledge on output bias.

Testing Next and NextDouble

The method System.Random.Next provides positive integers between $1$ and $2^{31}-1$. As mentioned, the main difference from the algorithm above is that in some cases, an odd number is added to the random value before it is used.

Before putting PractRand on the task, we note that as a 32-bit signed int generator, something which outputs only positive numbers is necessarily quite biased, so in order to get comparable results, we remove the 8 most significant bits from the generated numbers; in terms of bias, if the 24 remaining bits are biased, then the same would be the case for the full 32 bits.

>next | RNG_test stdin
RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0xb5946b40
test set = normal, folding = standard(unknown format)

rng=RNG_stdin, seed=0xb5946b40
length= 64 megabytes (2^26 bytes), time= 3.3 seconds
  Test Name                         Raw       Processed     Evaluation
  Gap-16:B                          R=  -4.2  p =1-1.2e-3   unusual
  ...and 138 test result(s) without anomalies

rng=RNG_stdin, seed=0xb5946b40
length= 128 megabytes (2^27 bytes), time= 7.1 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low4/32]DC6-9x1Bytes-1           R=  +5.7  p =  5.1e-3   unusual
  ...and 150 test result(s) without anomalies

rng=RNG_stdin, seed=0xb5946b40
length= 256 megabytes (2^28 bytes), time= 14.4 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/8]BCFN(2+0,13-4,T)          R= +15.4  p =  1.1e-6   very suspicious
  ...and 161 test result(s) without anomalies

rng=RNG_stdin, seed=0xb5946b40
length= 512 megabytes (2^29 bytes), time= 29.2 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/8]BCFN(2+0,13-3,T)          R= +25.1  p =  1.1e-11    FAIL
  ...and 170 test result(s) without anomalies

This time around, the generator puts up a fight but still succumbs faster than one would desire. At this point, let us compare the results with those obtained by patching the typo, changing inextp = 21 to inextp = 31. The results then look as follows:

>next31 | RNG_test stdin

RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0x3f53a52b
test set = normal, folding = standard(unknown format)

rng=RNG_stdin, seed=0x3f53a52b
length= 64 megabytes (2^26 bytes), time= 3.2 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/8]DC6-9x1Bytes-1            R=  +5.8  p =  6.1e-3   unusual
  ...and 138 test result(s) without anomalies

rng=RNG_stdin, seed=0x3f53a52b
length= 128 megabytes (2^27 bytes), time= 7.1 seconds
  no anomalies in 151 test result(s)

rng=RNG_stdin, seed=0x3f53a52b
length= 256 megabytes (2^28 bytes), time= 13.9 seconds
  no anomalies in 162 test result(s)

rng=RNG_stdin, seed=0x3f53a52b
length= 512 megabytes (2^29 bytes), time= 28.1 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/8]BCFN(2+0,13-3,T)          R= +15.3  p =  5.1e-7   very suspicious
  ...and 170 test result(s) without anomalies

rng=RNG_stdin, seed=0x3f53a52b
length= 1 gigabyte (2^30 bytes), time= 55.6 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/8]BCFN(2+0,13-3,T)          R= +35.0  p =  2.3e-16    FAIL !
  [Low1/8]BCFN(2+1,13-3,T)          R= +11.6  p =  2.7e-5   mildly suspicious
  ...and 181 test result(s) without anomalies

This lasts a bit longer than its $l = 34$ counterpart, although I doubt that one can conclude much on the basis of this.

The very careful reader will have noticed that we are specifying the chunk size of the produced bits. If we erroneously claimed that the random numbers above came in chunks of 32 bits, this would in fact fool the tests:

next | RNG_test stdin32  <-- Note the "32"

RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0xfc4bc868
test set = normal, folding = standard (32 bit)

rng=RNG_stdin32, seed=0xfc4bc868
length= 64 megabytes (2^26 bytes), time= 2.5 seconds
  Test Name                         Raw       Processed     Evaluation
  Gap-16:B                          R=  -4.2  p =1-1.2e-3   unusual
  ...and 107 test result(s) without anomalies

rng=RNG_stdin32, seed=0xfc4bc868
length= 128 megabytes (2^27 bytes), time= 5.7 seconds
  no anomalies in 117 test result(s)

...

rng=RNG_stdin32, seed=0xfc4bc868
length= 4 gigabytes (2^32 bytes), time= 169 seconds
  no anomalies in 156 test result(s)

rng=RNG_stdin32, seed=0xfc4bc868
length= 8 gigabytes (2^33 bytes), time= 338 seconds
  no anomalies in 165 test result(s)

As is evident from the reference source, NextDouble uses Next directly, so any worries about bias in one would also be relevant for the other. We chose to focus on Next simply to avoid having to think too hard about the impact on randomness of integer-to-floating-point conversions.

Testing Next(Int32, Int32)

Finally, let us see what happens when we sample integers within the full 32-bit range, using System.Random.Next(Int32.MinValue, Int32.MaxValue). This is a slightly different beast, since it is based on the signed-int generator from above. In practice, the generator runs twice; in the first iteration, the least significant 31 bits are determined, and in the second iteration, a new number is drawn and its least significant bit is used as the most significant bit for the first output. That is, we are calculating something similar to $Y_n = \lvert X_n \rvert (-1)^{X_{n-1} \mod 2}$, $n = 2, 4, 6, \dots$, discarding roughly half of the information from the generator. Looking only at the least significant 31 bits, this amounts to skipping every second number, but curiously, according to PractRand, this has a positive impact on randomness:

RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0x8ff01b73
test set = normal, folding = standard (32 bit)

rng=RNG_stdin32, seed=0x8ff01b73
length= 64 megabytes (2^26 bytes), time= 2.2 seconds
  no anomalies in 108 test result(s)

[... No warnings between 128MB and 32GB ... ]

rng=RNG_stdin32, seed=0x8ff01b73
length= 64 gigabytes (2^36 bytes), time= 2865 seconds
  no anomalies in 189 test result(s)

rng=RNG_stdin32, seed=0x8ff01b73
length= 128 gigabytes (2^37 bytes), time= 5857 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/32]DC6-9x1Bytes-1           R=  -4.1  p =1-6.8e-3   unusual
  ...and 195 test result(s) without anomalies

rng=RNG_stdin32, seed=0x8ff01b73
length= 256 gigabytes (2^38 bytes), time= 11670 seconds
  no anomalies in 204 test result(s)

rng=RNG_stdin32, seed=0x8ff01b73
length= 512 gigabytes (2^39 bytes), time= 23534 seconds
  no anomalies in 213 test result(s)

That is, within the limits of my own patience, no real problems were seen by PractRand. It would however be interesting to run this all the way to 32 TB to see if any difference is seen. Note that compiling the RNG into PractRand itself does significantly speed up testing:

rng=SystemRandom, seed=0x78683ddd
length= 64 gigabytes (2^36 bytes), time= 1244 seconds
  no anomalies in 189 test result(s)

The same tests were run with both the patched and unpatched value for $l$; the unpatched is the one shown above, but the patched gives almost identical results.

Conclusion

If anybody were to claim that one should avoid using vanilla additive/subtractive generators in the context of simulation, the above would suggest that they are in the right. However, if batteries of statistical tests are taken to be the gold standard for determining quality of random generation, the modifications applied by Microsoft does improve the situation; in the case of Next(Int32, Int32) so much so that no quantitative argument can be made against it.

Nevertheless, relying on using Next(Int32, Int32) with large ranges in order to obtain randomness is bound to fail in practice, and if I were asked by someone to provide a recommendation for a generator, I would first tell them to find someone more knowledgable on the topic, and when pushed a bit further, I would probably suggest having a look at something like the battle-tested Mersenne Twister (which fails at 512 GB), or the relative newcomer PCG.

Appendix: Code

Nine

#include <iostream>

int main() {
    char nine[1] = { 9 };
    while (true) fwrite(nine, 1, 1, stdout);
}

Additive

In the below, the notation is chosen to follow the .NET implementation as closely as possible.

int main() {
    // Force binary output on stdout; otherwise `fwrite` turns
    // 0x0A into 0x0D0A (\n -> \r\n). Thanks Windows.
    _setmode(_fileno(stdout), O_BINARY);

    int ii;
    unsigned int mj = 161803398;
    unsigned int mk = 1;
    unsigned int seedArray[56];
    for (int k = 1; k < 55; k++)
        seedArray[k] = 0;

    seedArray[55] = mj;
    for (int i = 1; i < 55; i++) {
        ii = (21 * i) % 55;
        seedArray[ii] = mk;
        mk = mj - mk;
        mj = seedArray[ii];
    }
    for (int k = 1; k < 5; k++) {
        for (int i = 1; i < 56; i++) {
            seedArray[i] -= seedArray[1 + (i + 30) % 55];
        }
    }

    int inext = 0;
    int inextp = 55 - 24;  // That is, k-l

    while (true) {
        if (++inext >= 56) inext = 1;
        if (++inextp >= 56) inextp = 1;
        unsigned int retVal = seedArray[inext] + seedArray[inextp];
        seedArray[inext] = retVal;

        // Convert to byte array and output
        char output[4];
        for (int i = 0; i < 4; i++)
            output[i] = (retVal >> (i * 8));
        fwrite(output, 1, 4, stdout);
    }
}

Next

C++ version:

int main() {
    _setmode(_fileno(stdout), O_BINARY);

    int seed = 0;
    const int MBIG = 2147483647;

    int ii;
    int mj = 161803398;
    int mk = 1;
    int seedArray[56];
    for (int k = 1; k < 55; k++)
        seedArray[k] = 0;

    seedArray[55] = mj;
    for (int i = 1; i < 55; i++) {
        ii = (21 * i) % 55;
        seedArray[ii] = mk;
        mk = mj - mk;
        if (mk < 0) mk += MBIG;
        mj = seedArray[ii];
    }
    for (int k = 1; k < 5; k++) {
        for (int i = 1; i < 56; i++) {
            seedArray[i] -= seedArray[1 + (i + 30) % 55];
            if (seedArray[i] < 0) seedArray[i] += MBIG;
        }
    }

    int inext = 0;
    int inextp = 21; // Note: Knuth wants this to be 31

    while (true) {
        if (++inext >= 56) inext = 1;
        if (++inextp >= 56) inextp = 1;

        int retVal = seedArray[inext] - seedArray[inextp];
        if (retVal == MBIG) retVal--;
        if (retVal < 0) retVal += MBIG;

        seedArray[inext] = retVal;
        char output[3];
        for (int i = 0; i < 3; i++)
            output[i] = (retVal >> (i * 8));
        fwrite(output, 1, 3, stdout);
    }
}

Reference .NET implementation:

static void Main(string[] args)
{
    var rng = new Random(0);
    var buffer = new byte[3];
    using (var outputStream = Console.OpenStandardOutput())
    {
        while (true)
        {
            int next = rng.Next();
            for (int i = 0; i < 3; i++)
                buffer[i] = (byte)(next >> (i * 8));
            outputStream.Write(buffer, 0, 3);
        }
    }
}

Next(Int32, Int32).

Inline ALL the things:

int main() {
    _setmode(_fileno(stdout), O_BINARY);

    int seed = 0;
    const int MSEED = 161803398;
    const int MBIG = 2147483647;
    const int MLOW = -2147483647 - 1;
    const long long RANGE = 4294967295L;

    // Begin Random.Random
    int ii;
    int mj = 161803398;
    int mk = 1;
    int seedArray[56];
    for (int k = 0; k < 55; k++) {
        seedArray[k] = 0;
    }

    seedArray[55] = mj;
    for (int i = 1; i < 55; i++) {
        ii = (21 * i) % 55;
        seedArray[ii] = mk;
        mk = mj - mk;
        if (mk < 0) mk += MBIG;
        mj = seedArray[ii];
    }
    for (int k = 1; k < 5; k++) {
        for (int i = 1; i < 56; i++) {
            seedArray[i] -= seedArray[1 + (i + 30) % 55];
            if (seedArray[i] < 0) seedArray[i] += MBIG;
        }
    }
    int inext = 0;
    int inextp = 21; // Here's the fix: Knuth uses 31; System.Random uses 21
    // End Random.Random

    int next;
    bool shouldWrite = true;
    while (true) {
        // Begin Random.Next
        // Begin Random.GetSampleForLongRange
        // Begin Random.InternalSample
        int retVal;
        int locINext = inext;
        int locINextp = inextp;

        if (++locINext >= 56) locINext = 1;
        if (++locINextp >= 56) locINextp = 1;

        retVal = seedArray[locINext] - seedArray[locINextp];

        if (retVal == MBIG) retVal--;
        if (retVal < 0) retVal += MBIG;

        seedArray[locINext] = retVal;

        inext = locINext;
        inextp = locINextp;
        //double sample = retVal * (1.0 / MBIG);

        // End Random.InternalSample
        // Begin Random.InternalSample
        // .NET runs this method again only to determine the sign of the new random value.
        int retVal2;
        int locINext2 = inext;
        int locINextp2 = inextp;

        if (++locINext2 >= 56) locINext2 = 1;
        if (++locINextp2 >= 56) locINextp2 = 1;

        retVal2 = seedArray[locINext2] - seedArray[locINextp2];

        if (retVal2 == MBIG) retVal2--;
        if (retVal2 < 0) retVal2 += MBIG;

        seedArray[locINext2] = retVal2;

        inext = locINext2;
        inextp = locINextp2;
        // End Random.InternalSample round 2
        bool negative = retVal2 % 2 == 0;
        if (negative) {
            retVal = -retVal;
        }
        double sample = retVal;
        sample += (MBIG - 1);
        sample /= 2 * (unsigned int)MBIG - 1;
        // End Random.GetSampleForLongRange
        next = (int)((long long)(sample*RANGE) + MLOW);
        // End Random.Next
        char output[4];
        for (int i = 0; i < 4; i++)
            output[i] = (next >> (i * 8));
        fwrite(output, 1, 4, stdout);
    }
}

Reference .NET implementation:

static void Main(string[] args)
{
    var rng = new Random(0);
    var buffer = new byte[4];
    using (var outputStream = Console.OpenStandardOutput())
    {
        while (true)
        {
            int next = rng.Next(Int32.MinValue, Int32.MaxValue);
            for (int i = 0; i < 4; i++)
                buffer[i] = (byte)(next >> (i * 8));
            outputStream.Write(buffer, 0, 4);
        }
    }
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment