dhardy/rand_revisions.md

## rand_revisions.md

      
    Raw
  

              rand_revisions.md
            
          
    Proposed revisions to rand

There are a couple of core questions to this proposal:

how much breakage is acceptable to the rand crate API at this point?
how many RNGs should be included in this crate, and should "rand"
functionality be split over multiple crates?

For now, I'll assume that breakage is acceptable and try to look at multiple
options regarding organisation.
Generators

Organisation

For cryptographic applications, there appear to be two schools of thought on
psuedo-random-number-generators (PRNGs, often called RNGs), with relation to
operating system (OS) random number provision:

include a reasonably performant, cryptographically approved PRNG, and
support code for automatically initialising from an OS source and
conveniently generating numbers (basically, thread_rng() as it is now)
don't use a user-space PRNG at all; pull all numbers from an OS source

I have insufficient experience to take sides on this issue. However, it's not
the whole story; games, simulators and other software often have no
cryptographic requirements, but may want:

fast generation
good distribution and independence of samples
ability to specify the seed and reproduce results
an implementation of a published PRNG able to reproduce its results
ability to plug a custom generator into existing framework

The current rand crate satisfies most of this, but leaves open a core
question: how many RNGs should the rand crate include?
I propose that the Rng trait and some generation functionality should remain
in rand, however have multiple proposals regarding other generators:

Include only very few cryptography-approved generators in the rand crate
(or none if numbers are sourced directly from the OS); have a companion
rng crate for usage when user-space generators are required
Accept PRs for well written and tested PRNGs (i.e. code quality is only
real requirement)
Somewhere in between; accept only PRNGs with good statistical qualities,
external publication and a good rationale for inclusion (e.g. widely used
or significantly better in some respects than other included algorithms)

Optionally, an additional crate rng-extra could be maintained for PRNGs not
meeting criteria for inclusion in rand / rng. The rationale for this is
that individually authored crates sometimes end up abandoned; however this is
only useful if there is incentive to keep rand / rng small and also demand
for other algorithms.
My current favourite of these proposals is (1): include enough functionality
in rand to cover low/medium frequency generation for cryptography and other
applications where reproducibility is not required, and maintain a companion
crate rng accepting most good-quality PRNG implementations, for use with
applications requiring reproducibility, fast generation or a specific algorithm.
Standard generation hooks/wrappers

thread_rng is convenient and should probably remain. I don't think it is a
sensible design to use for applications requiring reproducibility and possibly
not where very fast generation is required; for other applications there is the
question of whether this should source each number directly from the OS or keep
its current behaviour.
StdRng appears to be a wrapper around a cryptography-approved PRNG for usage
when more performance is required, whose implementation could be switched out
by the rand crate without affecting applications. This implies it cannot
satisfy the reproducibility requirement some PRNG applications have (if the
implementing PRNG must be fixed, applications might as well use that PRNG
directly).
Practically, does StdRng have much utility? I suggest removing it, unless
someone can put forward a good argument for keeping it.
Rng trait

I propose keeping the core next_u32, next_u64 and fill_bytes methods
unchanged. PRNGs may use u32 or u64 type internally, and a "reader
generator" (returning randomness from a file, or maybe a USB device) may work
best with the fill_bytes method, therefore the current approach (allowing
implementation of any, with default implementations for the latter two) is the
best compromise in my opinion.
Other methods would likely always be implemented in terms of the above three
(possibly excepting next_f32 and next_f64); these are therefore not of
interest to implementors of Rng.
Generation of random values

The Rng trait controls what functionality an implementing generator exposes.
Should it also be the user-interface through which many types of random values
are generated? Personally, I think not (but am not entirely decided on the
matter). Arguments against the current design of Rng:

unclear separation of implementations (PRNGs) and support functions
(producing values of other types)
support functions include both simple convertors (e.g. gen::<i32>()) and
distributions (e.g. gen_weighted_bool())

Further, the generic gen() method has different semantics depending on the
type generated:

integers are uniformly distributed over the entire range
floating points are restricted to the half-open range [0, 1)
the char type distributes over valid codepoints, excluding some values
Option<T> has 50% chance of yielding None, which isn't uniform
distribution for any T besides ()

**Proposal: remove all methods on Rng other than next_u32, next_u64 and
fill_bytes; also remove the Rand trait. (Replacements are suggested below.)
This implies that rand::ramdom() must also be removed (it depends on gen()).
Generating simple values

Add a gen module (rand::gen), including the following functions:

uniform<T>(&mut rng) -> T for integers (u32, i8, etc.)
uniform01<T>(&mut rng) -> T for floats f32, f64, mapping to half-open range [0, 1)
open01<T>(&mut rng) -> T, closed01<T>(&mut rng) -> T for floats
uniform_range<T>(&mut rng) -> T for integers and floats
codepoint(&mut rng) -> char

Example:
let x: f64 = open01(&mut rng);

(A replacement for gen::<Option<T>>() could also go here, but the existing
functionality seems obscure and the fixed 50% chance of None arbitrary;
personally I don't see the need to keep this.)
Alternative: new types and Rand implementations

The approach currently used for Open01 could be copied and the Rand trait
retained:
let Open01(x): Open01::<f64> = rng.gen();

While neat, leveraging type deconstruction (pattern matching) like this is
likely confusing for newcomers, not good from a documentation perspective
(since impls cannot be documented) and open to abuse (e.g. the current
implementations for Rand, choosing "smart" ways to generate various types,
must ultimately make arbitrary choices).
Generating tuple values

The existing Rand implementations allow direct generation of tuples, e.g.
let (x, y): (f64, u32) = rng.gen();

This is neat, but again gets its brevity from undocumented impls and assumptions
about how values should be generated for each type.
I propose removing this feature without direct replacement, falling back to:
let (x, y): (f64, u32) = (uniform01(&mut rng), uniform(&mut rng));

Sequence generation

Currently, the rand crate allows
rng.gen_iter::<i32>().take(10).collect::<Vec<i32>>();

If the generic gen() function is removed this cannot remain, however, a
more verbose alternative should be possible:
rng.iter().map(|rng| uniform::<i32>(rng)).take(10).collect::<Vec<i32>>();

(In both examples, the redundant type specification can probably be removed.)
Support functions

The Rng trait has some other functionality:

fn gen_weighted_bool(&mut self, n: u32) -> bool
choose<'a, T>(&mut self, values: &'a [T]) -> Option<&'a T>
choose_mut<'a, T>(&mut self, values: &'a mut [T]) -> Option<&'a mut T>
shuffle<T>(&mut self, values: &mut [T])

The first three are quite simple, but may well be useful. The fourth also has
quite simple code, but is complex enough to link to a documented algorithm.
These could simply be made free functions (rand::shuffle etc.), alongside
the existing free function sample.
Possibly sample should be renamed to sample_subset.
Distributions

Random samples should always be independent, aside from internal details of
PRNGs. (Examples I have heard to the contrary are things like random walks, but
these are processes, not distributions.) As such, I propose removing the current
Sample trait (which can modify its distribution), then renaming
IndepedentSample to Sample and ind_sample to sample.
RandomSample is an adaptor to allow types implementing Rand to be used in
terms of Sample and IndependentSample. Since this proposal removes Rand,
RandomSample should also be removed.
Weighted and WeightedChoice could be left as is but not marked stable for
now; I have heard some criticisms of usability but haven't looked into this.
Of the remaining distributions, Range is the only one supporting generic types;
most others are restricted to f64. I suggest Range could be renamed to
UniformRange, but it could also be left as is. I see no need to change any
other distributions.