There are a couple of core questions to this proposal:
- how much breakage is acceptable to the `rand` crate API at this point?
- how many RNGs should be included in this crate, and should "rand" functionality be split over multiple crates?
For now, I'll assume that breakage is acceptable and try to look at multiple options regarding organisation.
For cryptographic applications, there appear to be two schools of thought on pseudo-random number generators (PRNGs, often called RNGs) in relation to operating system (OS) random number provision:
- include a reasonably performant, cryptographically approved PRNG, and support code for automatically initialising from an OS source and conveniently generating numbers (basically, `thread_rng()` as it is now)
- don't use a user-space PRNG at all; pull all numbers from an OS source
I have insufficient experience to take sides on this issue. However, it's not the whole story; games, simulators and other software often have no cryptographic requirements, but may want:
- fast generation
- good distribution and independence of samples
- ability to specify the seed and reproduce results
- an implementation of a published PRNG able to reproduce its results
- ability to plug a custom generator into existing framework
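The seeding and reproducibility requirements above can be illustrated with a small sketch. `SplitMix64` here is a toy stand-in built on the published SplitMix64 mixing steps, and `from_seed` and `sequence` are hypothetical names chosen for this example, not part of the current `rand` API:

```rust
/// Toy seedable generator using the SplitMix64 mixing steps.
/// Illustration only: it is not cryptographically secure.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    /// Hypothetical constructor: the whole state is the user-supplied seed.
    pub fn from_seed(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    /// Advance the state and return the next 64-bit output.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draw `n` values from a fresh generator created with `seed`.
pub fn sequence(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = SplitMix64::from_seed(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}
```

Two calls to `sequence` with the same seed yield the same values, which is the property a simulation needs in order to replay a run.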
The current `rand` crate satisfies most of this, but leaves open a core
question: how many RNGs should the `rand` crate include?
I propose that the `Rng` trait and some generation functionality should remain
in `rand`; however, I have multiple proposals regarding other generators:
1. Include only very few cryptography-approved generators in the `rand` crate (or none if numbers are sourced directly from the OS); have a companion `rng` crate for usage when user-space generators are required
2. Accept PRs for well written and tested PRNGs (i.e. code quality is the only real requirement)
3. Somewhere in between; accept only PRNGs with good statistical qualities, external publication and a good rationale for inclusion (e.g. widely used or significantly better in some respects than other included algorithms)
Optionally, an additional crate `rng-extra` could be maintained for PRNGs not
meeting criteria for inclusion in `rand` / `rng`. The rationale for this is
that individually authored crates sometimes end up abandoned; however this is
only useful if there is incentive to keep `rand` / `rng` small and also demand
for other algorithms.
My current favourite of these proposals is (1): include enough functionality
in `rand` to cover low/medium frequency generation for cryptography and other
applications where reproducibility is not required, and maintain a companion
crate `rng` accepting most good-quality PRNG implementations, for use with
applications requiring reproducibility, fast generation or a specific algorithm.
`thread_rng` is convenient and should probably remain. I don't think it is a
sensible design for applications requiring reproducibility, and possibly not
for those requiring very fast generation; for other applications there is the
question of whether it should source each number directly from the OS or keep
its current behaviour.
`StdRng` appears to be a wrapper around a cryptography-approved PRNG for usage
when more performance is required, whose implementation could be switched out
by the `rand` crate without affecting applications. This implies it cannot
satisfy the reproducibility requirement some PRNG applications have (if the
implementing PRNG must be fixed, applications might as well use that PRNG
directly).
Practically, does `StdRng` have much utility? I suggest removing it, unless
someone can put forward a good argument for keeping it.
I propose keeping the core `next_u32`, `next_u64` and `fill_bytes` methods
unchanged. PRNGs may use a `u32` or `u64` type internally, and a "reader
generator" (returning randomness from a file, or maybe a USB device) may work
best with the `fill_bytes` method; the current approach (allowing
implementation of any, with default implementations for the latter two) is
therefore the best compromise in my opinion.
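As a rough sketch of that arrangement (not the actual `rand` definition), a trait can require only `next_u32` and derive the other two methods from it. The `Counter` generator is a hypothetical toy that exists only to make the defaults observable:

```rust
/// Sketch of the core trait: one required method, two defaulted ones.
pub trait Rng {
    /// The only method an implementor must provide in this sketch.
    fn next_u32(&mut self) -> u32;

    /// Default: combine two 32-bit outputs into one 64-bit value.
    fn next_u64(&mut self) -> u64 {
        let hi = self.next_u32() as u64;
        let lo = self.next_u32() as u64;
        (hi << 32) | lo
    }

    /// Default: fill the buffer four bytes at a time from `next_u32`.
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        for chunk in dest.chunks_mut(4) {
            let bytes = self.next_u32().to_le_bytes();
            // The last chunk may be shorter than four bytes.
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

/// Toy generator for illustration only: returns 1, 2, 3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(1);
        self.0
    }
}
```

A generator that works natively in 64-bit words or in byte buffers would override `next_u64` or `fill_bytes` instead of relying on these defaults.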
Other methods would likely always be implemented in terms of the above three
(possibly excepting `next_f32` and `next_f64`); these are therefore not of
interest to implementors of `Rng`.
The `Rng` trait controls what functionality an implementing generator exposes.
Should it also be the user-interface through which many types of random values
are generated? Personally, I think not (but am not entirely decided on the
matter). Arguments against the current design of `Rng`:
- unclear separation of implementations (PRNGs) and support functions (producing values of other types)
- support functions include both simple convertors (e.g. `gen::<i32>()`) and distributions (e.g. `gen_weighted_bool()`)
Further, the generic `gen()` method has different semantics depending on the
type generated:
- integers are uniformly distributed over the entire range
- floating points are restricted to the half-open range `[0, 1)`
- the `char` type distributes over valid codepoints, excluding some values
- `Option<T>` has 50% chance of yielding `None`, which isn't uniform distribution for any `T` besides `()`
**Proposal:** remove all methods on `Rng` other than `next_u32`, `next_u64` and
`fill_bytes`; also remove the `Rand` trait. (Replacements are suggested below.)
This implies that `rand::random()` must also be removed (it depends on `gen()`).
Add a `gen` module (`rand::gen`), including the following functions:
- `uniform<T>(&mut rng) -> T` for integers (`u32`, `i8`, etc.)
- `uniform01<T>(&mut rng) -> T` for floats `f32`, `f64`, mapping to half-open range `[0, 1)`
- `open01<T>(&mut rng) -> T`, `closed01<T>(&mut rng) -> T` for floats
- `uniform_range<T>(&mut rng) -> T` for integers and floats
- `codepoint(&mut rng) -> char`
Example:
let x: f64 = open01(&mut rng);
(A replacement for `gen::<Option<T>>()` could also go here, but the existing
functionality seems obscure and the fixed 50% chance of `None` arbitrary;
personally I don't see the need to keep this.)
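To make the three float ranges concrete, here is one possible sketch of `uniform01`, `open01` and `closed01`. It is simplified relative to the proposal: it handles `f64` only rather than being generic over the float type, it assumes a cut-down `Rng` trait with just `next_u64`, and the `Constant` generator is a hypothetical helper for probing the endpoints:

```rust
/// Cut-down stand-in for the real trait, assumed for this sketch.
pub trait Rng {
    fn next_u64(&mut self) -> u64;
}

/// Uniform `f64` in the half-open range [0, 1): 53 random bits scaled by 2^-53.
pub fn uniform01<R: Rng>(rng: &mut R) -> f64 {
    (rng.next_u64() >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
}

/// Uniform `f64` in the open range (0, 1): 52 random bits, offset by half a step
/// so that neither endpoint can be produced.
pub fn open01<R: Rng>(rng: &mut R) -> f64 {
    ((rng.next_u64() >> 12) as f64 + 0.5) * (1.0 / (1u64 << 52) as f64)
}

/// Uniform `f64` in the closed range [0, 1]: 53 random bits divided by 2^53 - 1,
/// so the largest input maps exactly to 1.0.
pub fn closed01<R: Rng>(rng: &mut R) -> f64 {
    (rng.next_u64() >> 11) as f64 / ((1u64 << 53) - 1) as f64
}

/// Hypothetical helper: always returns the same word, handy for endpoint checks.
pub struct Constant(pub u64);

impl Rng for Constant {
    fn next_u64(&mut self) -> u64 {
        self.0
    }
}
```

With these definitions, `let x: f64 = open01(&mut rng);` compiles as written in the example, and the range is visible in the function name rather than hidden in a trait impl.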
The approach currently used for `Open01` could be copied and the `Rand` trait
retained:
let Open01(x): Open01<f64> = rng.gen();
While neat, leveraging type deconstruction (pattern matching) like this is
likely confusing for newcomers, not good from a documentation perspective
(since impls cannot be documented) and open to abuse (e.g. the current
implementations for `Rand`, choosing "smart" ways to generate various types,
must ultimately make arbitrary choices).
The existing `Rand` implementations allow direct generation of tuples, e.g.
let (x, y): (f64, u32) = rng.gen();
This is neat, but again gets its brevity from undocumented impls and assumptions about how values should be generated for each type.
I propose removing this feature without direct replacement, falling back to:
let (x, y): (f64, u32) = (uniform01(&mut rng), uniform(&mut rng));
Currently, the `rand` crate allows
rng.gen_iter::<i32>().take(10).collect::<Vec<i32>>();
If the generic `gen()` function is removed this cannot remain; however, a
more verbose alternative should be possible:
rng.iter().map(|rng| uniform::<i32>(rng)).take(10).collect::<Vec<i32>>();
(In both examples, the redundant type specification can probably be removed.)
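An iterator that hands out the same `&mut rng` on every step may be awkward to express with the standard `Iterator` trait, so the `rng.iter()` call above should be read as hypothetical. One alternative that needs nothing beyond the standard library is `std::iter::repeat_with`. The sketch below assumes a cut-down `Rng` trait, a non-generic `uniform_i32` in place of `uniform::<i32>`, and a toy `Counter` generator:

```rust
use std::iter;

/// Cut-down stand-in for the real trait, assumed for this sketch.
pub trait Rng {
    fn next_u32(&mut self) -> u32;
}

/// Non-generic stand-in for the proposed `uniform::<i32>`:
/// reinterprets all 32 random bits as a signed integer.
pub fn uniform_i32<R: Rng>(rng: &mut R) -> i32 {
    rng.next_u32() as i32
}

/// Collect ten values by repeatedly calling a closure that borrows the generator.
pub fn ten_values<R: Rng>(rng: &mut R) -> Vec<i32> {
    iter::repeat_with(|| uniform_i32(&mut *rng)).take(10).collect()
}

/// Toy generator for illustration only: returns 1, 2, 3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(1);
        self.0
    }
}
```

The closure holds the mutable borrow only while the iterator is alive, so the generator can be used again once the values are collected.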
The `Rng` trait has some other functionality:
- `gen_weighted_bool(&mut self, n: u32) -> bool`
- `choose<'a, T>(&mut self, values: &'a [T]) -> Option<&'a T>`
- `choose_mut<'a, T>(&mut self, values: &'a mut [T]) -> Option<&'a mut T>`
- `shuffle<T>(&mut self, values: &mut [T])`
The first three are quite simple, but may well be useful. The fourth also has quite simple code, but is complex enough to link to a documented algorithm.
These could simply be made free functions (`rand::shuffle` etc.), alongside
the existing free function `sample`.
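A minimal sketch of what `choose` and `shuffle` might look like as free functions follows. The signatures (generator first, slice second), the `index_below` helper and the `XorShift32` toy generator are assumptions made for this example; the shuffle is a Fisher-Yates shuffle, and the index helper uses plain modulo reduction, which is slightly biased and would want replacing in real code:

```rust
/// Cut-down stand-in for the real trait, assumed for this sketch.
pub trait Rng {
    fn next_u32(&mut self) -> u32;
}

/// Hypothetical helper: index in `0..n` (requires `n > 0`).
/// Modulo reduction is slightly biased; kept simple for illustration.
fn index_below<R: Rng>(rng: &mut R, n: usize) -> usize {
    (rng.next_u32() as usize) % n
}

/// Pick a random element, or `None` if the slice is empty.
pub fn choose<'a, T, R: Rng>(rng: &mut R, values: &'a [T]) -> Option<&'a T> {
    if values.is_empty() {
        None
    } else {
        Some(&values[index_below(rng, values.len())])
    }
}

/// Fisher-Yates shuffle: walk from the end, swapping each element
/// with a randomly chosen element at or before its position.
pub fn shuffle<T, R: Rng>(rng: &mut R, values: &mut [T]) {
    for i in (1..values.len()).rev() {
        let j = index_below(rng, i + 1);
        values.swap(i, j);
    }
}

/// Toy xorshift generator for illustration only; the seed must be non-zero.
pub struct XorShift32(pub u32);

impl Rng for XorShift32 {
    fn next_u32(&mut self) -> u32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 17;
        self.0 ^= self.0 << 5;
        self.0
    }
}
```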
Possibly `sample` should be renamed to `sample_subset`.
Random samples should always be independent, aside from internal details of
PRNGs. (Examples I have heard to the contrary are things like random walks, but
these are processes, not distributions.) As such, I propose removing the current
`Sample` trait (which can modify its distribution), then renaming
`IndependentSample` to `Sample` and `ind_sample` to `sample`.
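Under that renaming, the trait might look roughly like the sketch below. Taking `&self` rather than `&mut self` is what prevents sampling from changing the distribution. The `OneIn` distribution (true with probability about 1/n, mirroring `gen_weighted_bool`) and the `Counter` generator are hypothetical examples, and the modulo reduction is slightly biased:

```rust
/// Cut-down stand-in for the real trait, assumed for this sketch.
pub trait Rng {
    fn next_u32(&mut self) -> u32;
}

/// Sketch of the renamed trait: sampling borrows the distribution immutably,
/// so successive samples cannot depend on earlier ones via the distribution.
pub trait Sample<T> {
    fn sample<R: Rng>(&self, rng: &mut R) -> T;
}

/// Hypothetical distribution: `true` roughly once in every `n` samples.
pub struct OneIn {
    pub n: u32,
}

impl Sample<bool> for OneIn {
    fn sample<R: Rng>(&self, rng: &mut R) -> bool {
        // n <= 1 always yields true; otherwise check divisibility of a random word.
        self.n <= 1 || rng.next_u32() % self.n == 0
    }
}

/// Toy generator for illustration only: returns 1, 2, 3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(1);
        self.0
    }
}
```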
`RandomSample` is an adaptor to allow types implementing `Rand` to be used in
terms of `Sample` and `IndependentSample`. Since this proposal removes `Rand`,
`RandomSample` should also be removed.
`Weighted` and `WeightedChoice` could be left as is but not marked stable for
now; I have heard some criticisms of usability but haven't looked into this.
Of the remaining distributions, `Range` is the only one supporting generic types;
most others are restricted to `f64`. I suggest `Range` could be renamed to
`UniformRange`, but it could also be left as is. I see no need to change any
other distributions.
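For illustration, a `UniformRange` over integers could avoid modulo bias by rejection sampling, as in the sketch below. This is not the existing `Range` implementation: it is restricted to `u32` rather than generic, the field names and the cut-down `Rng` trait are assumptions, and `Counter` is a toy generator:

```rust
/// Cut-down stand-in for the real trait, assumed for this sketch.
pub trait Rng {
    fn next_u32(&mut self) -> u32;
}

/// Hypothetical uniform distribution over the half-open range [low, high).
pub struct UniformRange {
    low: u32,
    span: u32,
    /// Largest raw value accepted; values above it are rejected so that
    /// the number of accepted values is an exact multiple of `span`.
    zone: u32,
}

impl UniformRange {
    /// Panics if the range is empty.
    pub fn new(low: u32, high: u32) -> Self {
        assert!(low < high, "UniformRange requires low < high");
        let span = high - low;
        // 2^32 mod span, computed without overflowing u32.
        let rem = (u32::MAX - span + 1) % span;
        UniformRange { low, span, zone: u32::MAX - rem }
    }

    /// Draw raw words until one falls inside the accepted zone.
    pub fn sample<R: Rng>(&self, rng: &mut R) -> u32 {
        loop {
            let v = rng.next_u32();
            if v <= self.zone {
                return self.low + v % self.span;
            }
        }
    }
}

/// Toy generator for illustration only: returns 1, 2, 3, ...
pub struct Counter(pub u32);

impl Rng for Counter {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(1);
        self.0
    }
}
```

Precomputing `zone` in the constructor keeps `sample` cheap when many values are drawn from the same range.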