elm/random
limits
Exploration of The goal of this exploration is to find out what happens in Random.int 0 <N>
for various N, particularly along the top end.
The generator says:
Generate 32-bit integers in a given range.
This generator can produce values outside of the range
[minInt, maxInt]
but sufficient randomness is not guaranteed.
But what does that mean exactly? What randomness do we get?
Note that:
Random.maxInt
is2147483647
=2^31 - 1
Random.minInt
is-2147483648
=-(2^31)
Elm works with JS 64bit floats, which gives it safe integers (Number.isSafeInteger
) in the range -9007199254740991..9007199254740991
(-(2^53 - 1)..2^53 - 1
).
But, presumably because of the specific PCG implementation we're limited to 32 bits. How though?
Let's explore the boundaries.
Macro: Powers of 2
First, I'm going to generate lists of 1000 numbers for Random.int 0 (2^N - 1)
where N=1..64
.
ℹ️ I'm using the- 1
because that seems to be the happy path. Don't worry, I'll explore values that don't fit the happy path too.
ℹ️ I'm using the random seed0
everywhere.
These files are generated by the run.sh
Bash script and are named bits*.txt
.
We can see right away from the sampled numbers and the MD5 hashes that around N=32 we start getting a streak of the same values, duplicated across N=32..53
, and then a streak of zeros around N=54..64
:
Thus, the first two insights are:
💡 ⚠️ It's useless to useRandom.int 0 (2^N - 1)
withN > 32
(it will bring you no more randomness than withN = 32
).
💡 🚷 It's dangerous to useRandom.int 0 (2^N - 1)
withN > 53
(the algorithm will give you zeros only).
Our macro-level overview is done, but we still need to zoom into the micro-level view to see where exactly these changes happen, and whether it's safe for us to use at least the N = 32
, that is, Random.int 0 (2^32 - 1)
.
Micro: Powers' neighbourhood
To check this, I'm again going to generate lists of 1000 numbers, but this time for specific integers in the neighbourhood of the 2^N - 1
, for N=31..33
.
These files are generated by the single.sh
Bash script and are named single*.txt
.
Interestingly, for
4294967298
(= 2^32 + 2),Random.int 0 N
takes much longer than for the previous numbers, to the point where I believed it is stuck in an infinite loop of some kind. But nah, it is making progress -- if you generate a list of just 2 numbers, you will get them, it will just take long time. That's why the filesingle4294967298.txt
is missing.
Let's plot them:
Around 2^31 we can see the library start to lose its breath: things are mostly the same (but still, ever so slightly changing) as you vary the max integer. The MD5 hashes differ (although above 2^31 - 1
it's not guaranteed), so all is good so far:
But the further up we go, the more problematic things become, duplicates start happening, and we can see that 2^32 - 2, 2^32 - 1, 2^33 - 2 and 2^33 - 1 are identical:
Have you noticed though? The numbers for 2^32 - 1 go up to 4 billion!
This means the library has managed to give us a roughly uniform range of numbers up to 2^32 - 1! Hey, I'll take that.
I am no cryptographer but from the chart, this range of numbers seems good enough for me for day-to-day use.
We won't have any such luck above that though, because a wraparound happens: at 2^32 we get all zeros, at 2^32 + 1 we get the same result as for 2^1 - 1 (check the MD5 hashes for bits01.txt
and single4294967297.txt
!), and the same thing happens around 2^33:
So let's summarize these new insights:
💡 ♻️ Random.int 0 n
is roughly the same asRandom.int 0 (modBy (2^32) n)
(if we ignore the weird zeroing behaviour around 2^53 and above): the largest number we can theoretically get out ofRandom.int
is 2^32 - 1.
💡 🎉 For the maximum range of generated unsigned integers, useRandom.int 0 (2^32 - 1)
.
And that Random.int Random.minInt Random.maxInt
is there to give you the maximum range of values (Random.maxInt - Random.minInt + 1 == 2^32
) and both positive and negative integers. If you only care about non-negative ones though, you can use Random.int 0 (2^32 - 1)
without any perceivable loss of uniformness.
Exercises for the reader
(As in, I didn't need answers to the below for what I need to do, so I'm not gonna bother
- Do an analogous exploration for the negative numbers. Note that you'll get the "breakpoints" and best performance with
Random.int low high
wherehigh - low + 1
is a power of 2. My hypothesis is that withhigh = 0
the bestlow
should be -4294967295 = -(2^32 - 1). - Vary the initial random seed: do any of the findings change?
- (Purely academical and of no use whatsoever) Why does the zeroing around 2^53 - 1 happen? Where exactly does it happen? Is it only happening for the "almost" powers of 2 or is it happening in all the neighbours too?