/* Written in 2017 by Tommy Ettinger (tommy.ettinger@gmail.com) | |
To the extent possible under law, the author has dedicated all copyright | |
and related and neighboring rights to this software to the public domain | |
worldwide. This software is distributed without any warranty. | |
See <http://creativecommons.org/publicdomain/zero/1.0/>. */ | |
#include <stdint.h> | |
/* This is a 32-bit variant on the 63-bit Thrust PRNG, adapted for possible | |
usage in embedded hardware with 32-bit registers because TonyB on the prng | |
mailing list seemed like he could use this. Thrust is still in development, | |
mainly at another Gist ( https://gist.github.com/tommyettinger/e6d3e8816da79b45bfe582384c2fe14a ). | |
Mulberry uses a few multiplications compounded on the last result, much like | |
how a mulberry is a compound fruit, to help improve quality beyond what the | |
tiny state normally allows. It should have a period of 2 to the 32, exactly, | |
since the state changes to be each 32-bit unsigned integer before cycling. | |
The speed of this generator hasn't even been tested, and is probably less | |
than desirable on a 64-bit desktop processor, but it's meant for 32-bit | |
hardware. It passes gjrand's 13 tests with no failures and a total P-value | |
of 0.984 (where 1 is perfect and 0.1 or less is a failure) on 4GB of | |
generated data. That's a quarter of the full period. On the same amount, the | |
comparable SplitMix32 generator has a total P-value of 0, and had multiple | |
failures of extreme significance; these were two rank 1 minor failures but | |
also two more at rank 20 and rank 21, each an irredeemable failure on its | |
own. Testing on the full period (2 to the 32 numbers with 4 bytes each, for | |
a total of 16GB of data), Mulberry32 still does very well (for a generator | |
with 32 bits of state) with total P = 0.753 and all 13 tests passed. Since | |
SplitMix32 had failed beyond any hope of improvement at a small size, to | |
compare I tried SplitMix64, with 64 bits of state and generating 64 bits at | |
a time. On 4GB of data, SplitMix64's total P-value is 0.494, and it has two | |
rank 1 failures. On 16GB of data, SplitMix64's total P-value is 0.396, and | |
it has 1 rank 1 failure. To compare what is possible at that state size, a | |
variant on Thrust with 64 bits of state and 64 bit output passes 13 of 13 | |
tests and gets a P-value of 0.91 when testing on 4GB of data, and a P-value | |
of 0.857 while still passing all tests on 16GB of data. Both SplitMix64 and | |
the 64-bit Thrust variant use fewer operations than Mulberry32 to obtain each | |
result (and that result is twice the size); if you can use a PRNG that works | |
with 64-bit math, you generally will benefit. | |
*/ | |
uint32_t x; /* The state can be seeded with any value. */ | |
/* Call next() to get 32 pseudo-random bits, call it again to get more bits. */ | |
// It may help to make this inline, but you should see if it benefits your code. | |
uint32_t next(void) { | |
uint32_t z = (x += 0x6D2B79F5UL); | |
z = (z ^ (z >> 15)) * (z | 1UL); | |
z ^= z + (z ^ (z >> 7)) * (z | 61UL); | |
return z ^ (z >> 14); | |
} | |
/* The one large constant, 0x6D2B79F5UL, is tightly linked to the shift amounts | |
used later in the code. It is probably a bad idea to change the constant without | |
verifying that the quality is still OK in a robust testing suite like gjrand or | |
maybe TestU01's BigCrush (this may be unable to pass BigCrush simply because its | |
state size is too small). This constant was found by repeatedly editing the | |
constant used by Thrust in a 64-bit version (which in turn was found by shifting | |
the constant used for Thrust in its 63-bit version, and changing the lowest and | |
highest few bits), adjusting the shift amounts to match specific patterns that | |
could occur (or be edited into) the constant. That on its own didn't prove | |
enough, so a bitwise or with 61 was tried (along with other, smaller numbers; 29 | |
was used in an earlier version), and in conjunction with the add-to-product-then- | |
xor step, that seems to do the trick and yield high quality for the state size. | |
The constant was changed between the first and second versions, but only a few | |
bits actually changed and the shifts were all fine where they were. One of the | |
bitwise OR values did change, which proved key to bringing the 16GB test to a | |
point where it has no failures. | |
*/ |
@tommyettinger Yo, thanks for this. Needed a decent single base value step randomizer, I had been using mulberry32 for a while but found issues with it. Here's what I put in javascript:
let base = 0; // ideally a random starting number
function next () {
let z = (base += 0x9e3779b9);
z ^= z >>> 16;
z = Math.imul(z, 0x21f0aaad);
z ^= z >>> 15;
z = Math.imul(z, 0x735a2d97);
z ^= z >>> 15;
return z;
}
Wow, I am out of touch with JS; I didn't even know Math.imul existed yet. Your code looks perfectly fine; I'm glad you used >>>
instead of >>
; that difference trips me up all the time when going between C or C# and Java. I think it's time I looked into what has changed in modern JS...
I wish you luck on your journey, and from what I could see on that issue you linked, there's potential for this algorithm to be improved even further, which is cool to think about
Revisiting this 5 years later, this is not great. Mulberry32 isn't equidistributed, so it doesn't produce all possible 32-bit values, and actually can't produce about 1/3 of all possible
uint32_t
numbers. A better approach would be to do what SplitMix64 does, just for 32-bit; this means a simple additive counter with a large increment (like Mulberry32), but run through a better unary hash:uint32_t next(void) { uint32_t z = (x += 0x9E3779B9u); // 2 to the 32 divided by golden ratio; adding forms a Weyl sequence z ^= z >> 16; z *= 0x21f0aaadu; z ^= z >> 15; z *= 0x735a2d97u; z ^= z >> 15; return z; }
This uses a mixer from skeeto/hash-prospector#19 (comment) -- thanks to skeeto and TheIronBorn ! The upside here is that all
uint32_t
can be produced. The downside is that nouint32_t
will be produced more than once, or less than once, per 2-to-the-32-generations period. I haven't run this one through gjrand or PractRand. I think it's not a good sign for the tests if a generator like Mulberry32 that can't produce over a billion results out of 4294967296 can actually pass any length of either of those... PractRand does immediately fail any generator that exceeds its period, so Mulberry does fail at the 32GB mark. It's just good enough at faking it, and seeming to have a larger period than it does, that lets it pass at the 16GB mark. If that's what you want, you should use Mulberry32. If you don't want large clumps and holes in the results, consider using a different hash onz
like the one I pasted above.
I've been told that the nearly-identical SplitMix32 (using MurmurHash3 fmix32
constants) does not have a period of 2^32, and has a 'fixed point' at 0xe1aff528
, which made me wary of using it (not sure what that means though). Is that a concern for this one too?
SplitMix32 does have a period of 2 to the 32; it runs on a 32-bit counter and produces a unique output for every input, so it only repeats after it has exhausted all 2 to the 32 uint32_t values. A fixed point is when running a function on an input uint32_t A returns A without change; many mixing functions have at least one of these, where one of the fixed points is often 0. The 64-bit unary hash MX3, for example, has a fixed point at 0, and could easily have a fixed point at some other input. For the pseudo-random number generator in your quote, it doesn't have any fixed points if you compare x before next() is called with the output of that next() call. It has 3 "fixed point-like outputs" if you compare the output of next() with the state of x after that call: 0x00000000, 0xE85BC599, 0x77E180C6. These aren't really fixed points because if you give 0x00000000 as the value of x (an input), then x will have 0x9E3779B9 added to it and the result won't be 0x00000000 (the input has to be the same as the output for it to be a fixed point). The sequence is still length 2 to the 32. It sounds like whoever told you about the fixed point at 0xe1aff528 was referring to repeatedly running fmix32() on the previous output, which is different from SplitMix32's algorithm (where it is run on a counter). If you run a function on its last output rather than an internal state, then a fixed point can indeed be bad, since if you know a fixed point you can make it the only output the sequence will produce.
There's a lot of solid research into what gives functions the ideal cycle length/period, and the main thing that is needed is for the function to be reversible -- if you know a state, you have to be able to know the single possible state that came before it, since there's only one state possible after any given state. If state Z can be entered with one step from both states X and Y, then the generator won't be full-period. Having a reversible state transition doesn't guarantee that the function is full-period, it just shows that it isn't going to shrink into a smaller cycle at some point. It can still have multiple non-overlapping cycles, and any fixed points in the state transition function will have cycle length 1. This is the case for Romu-Random generators if given the (disallowed) states that are just repeated 0s -- if you start a RomuTrio generator with states 0,0,0, then any later states will be 0,0,0 and the outputs will all be 0. There's a good reason that a popular state transition function is a counter with an odd-number increment -- those have a full period since the counter will reach every possible number of its word size before it repeats, and counters with odd increments don't have any fixed points. Similarly, a linear congruential generator can be a good state transition function; that's what PCG-Random uses (LCGs don't have fixed points either, and also reach every number of their word size if constructed correctly).
If you're concerned about the period of a 32-bit generator, there is some very good news for you, because 32-bit generators take less than a minute to exhaust all possible inputs on a modern PC. My Java code I used to test the fixed-points for the above code looks like:
public void test32BitFixedPoint()
{
int i = 0x80000000, state = -1;
for (; i < 0; i++) { // runs over all negative numbers first
int y = state; // state coming into this call
int z = (state += 0x9E3779B9); // state changes, y is unchanged.
z ^= z >>> 16;
z *= 0x21f0aaad;
z ^= z >>> 15;
z *= 0x735a2d97;
z ^= z >>> 15; // z would normally be the output here
if(y == z) // if the incoming state is equal to the output, print the fixed point
System.out.printf("0x%08X, ", z);
}
for (; i >= 0; i++) { // then runs over all positive numbers, stopping on wrap from overflow
int y = state;
int z = (state += 0x9E3779B9);
z ^= z >>> 16;
z *= 0x21f0aaad;
z ^= z >>> 15;
z *= 0x735a2d97;
z ^= z >>> 15;
if(y == z)
System.out.printf("0x%08X, ", z);
}
}
To test the period, you need to not check the outputs, but the state -- when the state is identical to one encountered earlier, the generator has cycled. Normally for that I have a similar pair of loops (going over all negative, then all positive, to keep the loop size manageable), and check if the state coming out of a generation is the same as the state the generator started in. If the state transition isn't reversible, this won't necessarily ever return to the original state; that's why there's still the two loops to limit it to 2 to the 32 generations.
@tommyettinger Yo, thanks for this. Needed a decent single base value step randomizer, I had been using mulberry32 for a while but found issues with it. Here's what I put in javascript:
let base = 0; // ideally a random starting number function next () { let z = (base += 0x9e3779b9); z ^= z >>> 16; z = Math.imul(z, 0x21f0aaad); z ^= z >>> 15; z = Math.imul(z, 0x735a2d97); z ^= z >>> 15; return z; }
Hi @spacefluff432, I found a small issue with your code. Keep in mind that regular numbers in Javascript are basically doubles. Any bitwise operator coerces them into 32 bit ints, but this doesn't happen to the base
variable in your code. So base
just increments and increments well above the 32-bit range. This is not an issue at first, as the assignment to z
and its first XOR and SHIFT gets it back into 32 bits, but base
starts to lose the lower bits as it increases above 2^53, the integer limit of a double. This happens after 3,393,264 iterations.
I therefore propose this fix:
let base = 0; // ideally a random starting number
function next () {
base = (base + 0x9e3779b9) | 0; // the `| 0` coerces it into a 32-bit int
let z = base;
z ^= z >>> 16;
z = Math.imul(z, 0x21f0aaad);
z ^= z >>> 15;
z = Math.imul(z, 0x735a2d97);
z ^= z >>> 15;
return z;
}
If you purely want positive values in the range 0..0xffffffff, you can change the return statement into return z >>> 0
(which is easier if you ultimately want a floating point value between 0 and 1, so you can simply divide the result by 0x100000000).
And for good measure, my 2 cents on alternative implementations to hide that global:
function mulberry32(seed)
{
return function()
{
seed = (seed + 0x9e3779b9) | 0;
let z = seed;
z ^= z >>> 16;
z = Math.imul(z, 0x21f0aaad);
z ^= z >>> 15;
z = Math.imul(z, 0x735a2d97);
z ^= z >>> 15;
return z;
}
}
function* mulberry32_gen(seed)
{
for(;;)
{
seed = (seed + 0x9e3779b9) | 0;
let z = seed;
z ^= z >>> 16;
z = Math.imul(z, 0x21f0aaad);
z ^= z >>> 15;
z = Math.imul(z, 0x735a2d97);
z ^= z >>> 15;
yield z;
}
}
Good point, @oisyn , I've hit that issue myself when using GWT (which transpiles Java to JS, but that means a Java int
becomes a JS Number
). The | 0
works there as well.
Revisiting this 5 years later, this is not great. Mulberry32 isn't equidistributed, so it doesn't produce all possible 32-bit values, and actually can't produce about 1/3 of all possible
uint32_t
numbers. A better approach would be to do what SplitMix64 does, just for 32-bit; this means a simple additive counter with a large increment (like Mulberry32), but run through a better unary hash:This uses a mixer from skeeto/hash-prospector#19 (comment) -- thanks to skeeto and TheIronBorn ! The upside here is that all
uint32_t
can be produced. The downside is that nouint32_t
will be produced more than once, or less than once, per 2-to-the-32-generations period. I haven't run this one through gjrand or PractRand. I think it's not a good sign for the tests if a generator like Mulberry32 that can't produce over a billion results out of 4294967296 can actually pass any length of either of those... PractRand does immediately fail any generator that exceeds its period, so Mulberry does fail at the 32GB mark. It's just good enough at faking it, and seeming to have a larger period than it does, that lets it pass at the 16GB mark. If that's what you want, you should use Mulberry32. If you don't want large clumps and holes in the results, consider using a different hash onz
like the one I pasted above.