Skip to content

Instantly share code, notes, and snippets.

@dhardy
Created July 27, 2017 14:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dhardy/ad73670f59c504f22941196e5d60d281 to your computer and use it in GitHub Desktop.
Save dhardy/ad73670f59c504f22941196e5d60d281 to your computer and use it in GitHub Desktop.
Proposed `rand` revisions

Proposed revisions to rand

There are a couple of core questions to this proposal:

  • how much breakage is acceptable to the rand crate API at this point?
  • how many RNGs should be included in this crate, and should "rand" functionality be split over multiple crates?

For now, I'll assume that breakage is acceptable and try to look at multiple options regarding organisation.

Generators

Organisation

For cryptographic applications, there appear to be two schools of thought on psuedo-random-number-generators (PRNGs, often called RNGs), with relation to operating system (OS) random number provision:

  • include a reasonably performant, cryptographically approved PRNG, and support code for automatically initialising from an OS source and conveniently generating numbers (basically, thread_rng() as it is now)
  • don't use a user-space PRNG at all; pull all numbers from an OS source

I have insufficient experience to take sides on this issue. However, it's not the whole story; games, simulators and other software often have no cryptographic requirements, but may want:

  • fast generation
  • good distribution and independence of samples
  • ability to specify the seed and reproduce results
  • an implementation of a published PRNG able to reproduce its results
  • ability to plug a custom generator into existing framework

The current rand crate satisfies most of this, but leaves open a core question: how many RNGs should the rand crate include?

I propose that the Rng trait and some generation functionality should remain in rand, however have multiple proposals regarding other generators:

  1. Include only very few cryptography-approved generators in the rand crate (or none if numbers are sourced directly from the OS); have a companion rng crate for usage when user-space generators are required
  2. Accept PRs for well written and tested PRNGs (i.e. code quality is only real requirement)
  3. Somewhere in between; accept only PRNGs with good statistical qualities, external publication and a good rationale for inclusion (e.g. widely used or significantly better in some respects than other included algorithms)

Optionally, an additional crate rng-extra could be maintained for PRNGs not meeting criteria for inclusion in rand / rng. The rationale for this is that individually authored crates sometimes end up abandoned; however this is only useful if there is incentive to keep rand / rng small and also demand for other algorithms.

My current favourite of these proposals is (1): include enough functionality in rand to cover low/medium frequency generation for cryptography and other applications where reproducibility is not required, and maintain a companion crate rng accepting most good-quality PRNG implementations, for use with applications requiring reproducibility, fast generation or a specific algorithm.

Standard generation hooks/wrappers

thread_rng is convenient and should probably remain. I don't think it is a sensible design to use for applications requiring reproducibility and possibly not where very fast generation is required; for other applications there is the question of whether this should source each number directly from the OS or keep its current behaviour.

StdRng appears to be a wrapper around a cryptography-approved PRNG for usage when more performance is required, whose implementation could be switched out by the rand crate without affecting applications. This implies it cannot satisfy the reproducibility requirement some PRNG applications have (if the implementing PRNG must be fixed, applications might as well use that PRNG directly). Practically, does StdRng have much utility? I suggest removing it, unless someone can put forward a good argument for keeping it.

Rng trait

I propose keeping the core next_u32, next_u64 and fill_bytes methods unchanged. PRNGs may use u32 or u64 type internally, and a "reader generator" (returning randomness from a file, or maybe a USB device) may work best with the fill_bytes method, therefore the current approach (allowing implementation of any, with default implementations for the latter two) is the best compromise in my opinion.

Other methods would likely always be implemented in terms of the above three (possibly excepting next_f32 and next_f64); these are therefore not of interest to implementors of Rng.

Generation of random values

The Rng trait controls what functionality an implementing generator exposes. Should it also be the user-interface through which many types of random values are generated? Personally, I think not (but am not entirely decided on the matter). Arguments against the current design of Rng:

  • unclear separation of implementations (PRNGs) and support functions (producing values of other types)
  • support functions include both simple convertors (e.g. gen::<i32>()) and distributions (e.g. gen_weighted_bool())

Further, the generic gen() method has different semantics depending on the type generated:

  • integers are uniformly distributed over the entire range
  • floating points are restricted to the half-open range [0, 1)
  • the char type distributes over valid codepoints, excluding some values
  • Option<T> has 50% chance of yielding None, which isn't uniform distribution for any T besides ()

**Proposal: remove all methods on Rng other than next_u32, next_u64 and fill_bytes; also remove the Rand trait. (Replacements are suggested below.)

This implies that rand::ramdom() must also be removed (it depends on gen()).

Generating simple values

Add a gen module (rand::gen), including the following functions:

  • uniform<T>(&mut rng) -> T for integers (u32, i8, etc.)
  • uniform01<T>(&mut rng) -> T for floats f32, f64, mapping to half-open range [0, 1)
  • open01<T>(&mut rng) -> T, closed01<T>(&mut rng) -> T for floats
  • uniform_range<T>(&mut rng) -> T for integers and floats
  • codepoint(&mut rng) -> char

Example:

let x: f64 = open01(&mut rng);

(A replacement for gen::<Option<T>>() could also go here, but the existing functionality seems obscure and the fixed 50% chance of None arbitrary; personally I don't see the need to keep this.)

Alternative: new types and Rand implementations

The approach currently used for Open01 could be copied and the Rand trait retained:

let Open01(x): Open01::<f64> = rng.gen();

While neat, leveraging type deconstruction (pattern matching) like this is likely confusing for newcomers, not good from a documentation perspective (since impls cannot be documented) and open to abuse (e.g. the current implementations for Rand, choosing "smart" ways to generate various types, must ultimately make arbitrary choices).

Generating tuple values

The existing Rand implementations allow direct generation of tuples, e.g.

let (x, y): (f64, u32) = rng.gen();

This is neat, but again gets its brevity from undocumented impls and assumptions about how values should be generated for each type.

I propose removing this feature without direct replacement, falling back to:

let (x, y): (f64, u32) = (uniform01(&mut rng), uniform(&mut rng));

Sequence generation

Currently, the rand crate allows

rng.gen_iter::<i32>().take(10).collect::<Vec<i32>>();

If the generic gen() function is removed this cannot remain, however, a more verbose alternative should be possible:

rng.iter().map(|rng| uniform::<i32>(rng)).take(10).collect::<Vec<i32>>();

(In both examples, the redundant type specification can probably be removed.)

Support functions

The Rng trait has some other functionality:

  • fn gen_weighted_bool(&mut self, n: u32) -> bool
  • choose<'a, T>(&mut self, values: &'a [T]) -> Option<&'a T>
  • choose_mut<'a, T>(&mut self, values: &'a mut [T]) -> Option<&'a mut T>
  • shuffle<T>(&mut self, values: &mut [T])

The first three are quite simple, but may well be useful. The fourth also has quite simple code, but is complex enough to link to a documented algorithm.

These could simply be made free functions (rand::shuffle etc.), alongside the existing free function sample.

Possibly sample should be renamed to sample_subset.

Distributions

Random samples should always be independent, aside from internal details of PRNGs. (Examples I have heard to the contrary are things like random walks, but these are processes, not distributions.) As such, I propose removing the current Sample trait (which can modify its distribution), then renaming IndepedentSample to Sample and ind_sample to sample.

RandomSample is an adaptor to allow types implementing Rand to be used in terms of Sample and IndependentSample. Since this proposal removes Rand, RandomSample should also be removed.

Weighted and WeightedChoice could be left as is but not marked stable for now; I have heard some criticisms of usability but haven't looked into this.

Of the remaining distributions, Range is the only one supporting generic types; most others are restricted to f64. I suggest Range could be renamed to UniformRange, but it could also be left as is. I see no need to change any other distributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment