Proposed revisions to rand
There are a couple of core questions to this proposal:
- how much breakage is acceptable to the
randcrate API at this point? - how many RNGs should be included in this crate, and should "rand" functionality be split over multiple crates?
For now, I'll assume that breakage is acceptable and try to look at multiple options regarding organisation.
Generators
Organisation
For cryptographic applications, there appear to be two schools of thought on psuedo-random-number-generators (PRNGs, often called RNGs), with relation to operating system (OS) random number provision:
- include a reasonably performant, cryptographically approved PRNG, and
support code for automatically initialising from an OS source and
conveniently generating numbers (basically,
thread_rng()as it is now) - don't use a user-space PRNG at all; pull all numbers from an OS source
I have insufficient experience to take sides on this issue. However, it's not the whole story; games, simulators and other software often have no cryptographic requirements, but may want:
- fast generation
- good distribution and independence of samples
- ability to specify the seed and reproduce results
- an implementation of a published PRNG able to reproduce its results
- ability to plug a custom generator into existing framework
The current rand crate satisfies most of this, but leaves open a core
question: how many RNGs should the rand crate include?
I propose that the Rng trait and some generation functionality should remain
in rand, however have multiple proposals regarding other generators:
- Include only very few cryptography-approved generators in the
randcrate (or none if numbers are sourced directly from the OS); have a companionrngcrate for usage when user-space generators are required - Accept PRs for well written and tested PRNGs (i.e. code quality is only real requirement)
- Somewhere in between; accept only PRNGs with good statistical qualities, external publication and a good rationale for inclusion (e.g. widely used or significantly better in some respects than other included algorithms)
Optionally, an additional crate rng-extra could be maintained for PRNGs not
meeting criteria for inclusion in rand / rng. The rationale for this is
that individually authored crates sometimes end up abandoned; however this is
only useful if there is incentive to keep rand / rng small and also demand
for other algorithms.
My current favourite of these proposals is (1): include enough functionality
in rand to cover low/medium frequency generation for cryptography and other
applications where reproducibility is not required, and maintain a companion
crate rng accepting most good-quality PRNG implementations, for use with
applications requiring reproducibility, fast generation or a specific algorithm.
Standard generation hooks/wrappers
thread_rng is convenient and should probably remain. I don't think it is a
sensible design to use for applications requiring reproducibility and possibly
not where very fast generation is required; for other applications there is the
question of whether this should source each number directly from the OS or keep
its current behaviour.
StdRng appears to be a wrapper around a cryptography-approved PRNG for usage
when more performance is required, whose implementation could be switched out
by the rand crate without affecting applications. This implies it cannot
satisfy the reproducibility requirement some PRNG applications have (if the
implementing PRNG must be fixed, applications might as well use that PRNG
directly).
Practically, does StdRng have much utility? I suggest removing it, unless
someone can put forward a good argument for keeping it.
Rng trait
I propose keeping the core next_u32, next_u64 and fill_bytes methods
unchanged. PRNGs may use u32 or u64 type internally, and a "reader
generator" (returning randomness from a file, or maybe a USB device) may work
best with the fill_bytes method, therefore the current approach (allowing
implementation of any, with default implementations for the latter two) is the
best compromise in my opinion.
Other methods would likely always be implemented in terms of the above three
(possibly excepting next_f32 and next_f64); these are therefore not of
interest to implementors of Rng.
Generation of random values
The Rng trait controls what functionality an implementing generator exposes.
Should it also be the user-interface through which many types of random values
are generated? Personally, I think not (but am not entirely decided on the
matter). Arguments against the current design of Rng:
- unclear separation of implementations (PRNGs) and support functions (producing values of other types)
- support functions include both simple convertors (e.g.
gen::<i32>()) and distributions (e.g.gen_weighted_bool())
Further, the generic gen() method has different semantics depending on the
type generated:
- integers are uniformly distributed over the entire range
- floating points are restricted to the half-open range
[0, 1) - the
chartype distributes over valid codepoints, excluding some values Option<T>has 50% chance of yieldingNone, which isn't uniform distribution for any T besides()
**Proposal: remove all methods on Rng other than next_u32, next_u64 and
fill_bytes; also remove the Rand trait. (Replacements are suggested below.)
This implies that rand::ramdom() must also be removed (it depends on gen()).
Generating simple values
Add a gen module (rand::gen), including the following functions:
uniform<T>(&mut rng) -> Tfor integers (u32,i8, etc.)uniform01<T>(&mut rng) -> Tfor floatsf32,f64, mapping to half-open range[0, 1)open01<T>(&mut rng) -> T,closed01<T>(&mut rng) -> Tfor floatsuniform_range<T>(&mut rng) -> Tfor integers and floatscodepoint(&mut rng) -> char
Example:
let x: f64 = open01(&mut rng);
(A replacement for gen::<Option<T>>() could also go here, but the existing
functionality seems obscure and the fixed 50% chance of None arbitrary;
personally I don't see the need to keep this.)
Alternative: new types and Rand implementations
The approach currently used for Open01 could be copied and the Rand trait
retained:
let Open01(x): Open01::<f64> = rng.gen();
While neat, leveraging type deconstruction (pattern matching) like this is
likely confusing for newcomers, not good from a documentation perspective
(since impls cannot be documented) and open to abuse (e.g. the current
implementations for Rand, choosing "smart" ways to generate various types,
must ultimately make arbitrary choices).
Generating tuple values
The existing Rand implementations allow direct generation of tuples, e.g.
let (x, y): (f64, u32) = rng.gen();
This is neat, but again gets its brevity from undocumented impls and assumptions about how values should be generated for each type.
I propose removing this feature without direct replacement, falling back to:
let (x, y): (f64, u32) = (uniform01(&mut rng), uniform(&mut rng));
Sequence generation
Currently, the rand crate allows
rng.gen_iter::<i32>().take(10).collect::<Vec<i32>>();
If the generic gen() function is removed this cannot remain, however, a
more verbose alternative should be possible:
rng.iter().map(|rng| uniform::<i32>(rng)).take(10).collect::<Vec<i32>>();
(In both examples, the redundant type specification can probably be removed.)
Support functions
The Rng trait has some other functionality:
fn gen_weighted_bool(&mut self, n: u32) -> boolchoose<'a, T>(&mut self, values: &'a [T]) -> Option<&'a T>choose_mut<'a, T>(&mut self, values: &'a mut [T]) -> Option<&'a mut T>shuffle<T>(&mut self, values: &mut [T])
The first three are quite simple, but may well be useful. The fourth also has quite simple code, but is complex enough to link to a documented algorithm.
These could simply be made free functions (rand::shuffle etc.), alongside
the existing free function sample.
Possibly sample should be renamed to sample_subset.
Distributions
Random samples should always be independent, aside from internal details of
PRNGs. (Examples I have heard to the contrary are things like random walks, but
these are processes, not distributions.) As such, I propose removing the current
Sample trait (which can modify its distribution), then renaming
IndepedentSample to Sample and ind_sample to sample.
RandomSample is an adaptor to allow types implementing Rand to be used in
terms of Sample and IndependentSample. Since this proposal removes Rand,
RandomSample should also be removed.
Weighted and WeightedChoice could be left as is but not marked stable for
now; I have heard some criticisms of usability but haven't looked into this.
Of the remaining distributions, Range is the only one supporting generic types;
most others are restricted to f64. I suggest Range could be renamed to
UniformRange, but it could also be left as is. I see no need to change any
other distributions.