@dhardy
Created July 25, 2017 17:38
Rand design questions
Questions and potential changes
=============
Organisation and policies
--------------
Should all RNGs be moved to a sub-module? E.g. `rng::isaac::IsaacRng` inside `rand` crate.
### RNG implementations
Which RNGs should be included: what's the policy for accepting new RNGs?
Should the crate allow any published RNG with a properly tested implementation?
If not, there needs to be somewhere for the rest: individual crates
(e.g. `rng-NAME`) or a catch-all-the-rest crate.
What about:
- well-known RNGs like Mersenne Twister?
- new RNGs promising better performance/quality?
- are all current RNGs worth keeping?
- allow known-poor RNGs if there is demand (reproduction)?
https://en.wikipedia.org/wiki/Pseudorandom_number_generator
### Distributions
Standard generators
-----------------
### `StdRng`
`StdRng` is supposed to be an efficient generator for the current platform.
This implies the default could be changed in the future. But which properties
is `StdRng` expected to have?
* To be approved for cryptographic use or not?
* To not have major statistical flaws like correlation or poor distribution?
Should be a given.
* For generating mostly bools, bytes, 32-bit values, or 64-bit values? This is not
such a pointless question as it might first appear: e.g. both `Isaac64` and
`MT19937_64` RNGs implement `next_u32()` with `self.next_u64() as u32`,
throwing away half the generated bits; some 64-bit generators may be able
to extract sub-sets of the generated values with little overhead, thus
performing better under this usage.
* Does this necessarily support `SeedableRng<T>`? Currently yes, for
`T = &[usize]` (which implies different seeding might be needed on different
bit-ness platforms).
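To make the `next_u32()` point concrete, here is a minimal sketch of a 64-bit generator that hands out both halves of each generated word instead of discarding the upper half. `SplitMix64` is only a stand-in source, and `HalfBuffered` is a hypothetical name; neither is part of `rand`.

```rust
/// SplitMix64, used here only as a stand-in 64-bit generator (an assumption
/// of this sketch, not one of the crate's RNGs).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical wrapper that serves two `u32` values per `u64` drawn.
pub struct HalfBuffered {
    inner: SplitMix64,
    spare: Option<u32>,
}

impl HalfBuffered {
    pub fn new(seed: u64) -> Self {
        HalfBuffered { inner: SplitMix64::new(seed), spare: None }
    }

    /// Returns the low half of a fresh word first, then the stored high half.
    pub fn next_u32(&mut self) -> u32 {
        if let Some(high) = self.spare.take() {
            return high;
        }
        let word = self.inner.next_u64();
        self.spare = Some((word >> 32) as u32);
        word as u32
    }
}
```

Whether the extra branch pays for itself probably depends on the generator and the workload, which is part of why the question matters.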
At a minimum, the documentation should state which properties `StdRng` is
required to have. It *might* be useful to have both standard crypto-approved
and non-crypto algorithms.
Required performance and cryptographic strength vary by application, so pretending
there is "one good default" does not seem useful. Some argue that cryptographic
generators shouldn't be in user space at all, in which case there may be no point in
having a default cryptographic generator; a standard generator for sims and
games could still be useful (though not really necessary).
### `ThreadRng`
Again, which properties is this generator expected to have?
There is an issue asking whether this should just use the OS generator directly.
Should there be help (instructions?) for implementing an equally easy-to-use
generator which is deterministic/repeatable? This would be useful for testing;
it's likely not something that the library can/should provide (since the algorithm
must be fixed in this case).
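As a rough idea of what such instructions might lead to, the sketch below keeps a fixed-algorithm, fixed-seed generator in thread-local storage. All names (`TestRng`, `TEST_SEED`, `test_rng_next_u64`, `reset_test_rng`) are hypothetical, and SplitMix64 is an arbitrary choice of algorithm for the example.

```rust
use std::cell::RefCell;

/// Arbitrary constant chosen for this sketch.
const TEST_SEED: u64 = 0x5EED_5EED_5EED_5EED;

/// Fixed algorithm (SplitMix64), so results do not change between releases.
pub struct TestRng {
    state: u64,
}

impl TestRng {
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

thread_local! {
    // One generator per thread, always starting from the same seed.
    static TEST_RNG: RefCell<TestRng> = RefCell::new(TestRng { state: TEST_SEED });
}

/// Draws the next value from this thread's deterministic generator.
pub fn test_rng_next_u64() -> u64 {
    TEST_RNG.with(|rng| rng.borrow_mut().next_u64())
}

/// Rewinds this thread's generator to its initial state.
pub fn reset_test_rng() {
    TEST_RNG.with(|rng| {
        rng.borrow_mut().state = TEST_SEED;
    });
}
```

A test could call `reset_test_rng()` at its start and then rely on the same sequence on every run.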
Generation support
--------------------
### `Rng`
The `next_u32`, `next_u64` and `fill_bytes` methods all deserve
to be there: missing any of these could easily result in unnecessary conversions
between some generators (sources) and some users (sinks); the default
implementations also make implementing `Rng` simple.
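For illustration, a cut-down trait in this spirit might look as follows. `MiniRng` and `Counter` are hypothetical names, and the default bodies are one plausible choice rather than the crate's actual code.

```rust
/// Hypothetical minimal trait: only `next_u32` must be implemented.
pub trait MiniRng {
    fn next_u32(&mut self) -> u32;

    /// Default: combine two 32-bit outputs, first call in the high half.
    fn next_u64(&mut self) -> u64 {
        let high = u64::from(self.next_u32());
        let low = u64::from(self.next_u32());
        (high << 32) | low
    }

    /// Default: fill the buffer four bytes at a time (little-endian),
    /// using only part of the last word if the length is not a multiple of 4.
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        for chunk in dest.chunks_mut(4) {
            let bytes = self.next_u32().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

/// Toy source for demonstration only: counts upwards, not random at all.
pub struct Counter(pub u32);

impl MiniRng for Counter {
    fn next_u32(&mut self) -> u32 {
        let value = self.0;
        self.0 = self.0.wrapping_add(1);
        value
    }
}
```

A native 64-bit generator would override `next_u64` (and probably `fill_bytes`) rather than rely on these defaults, which is the conversion cost the three methods are meant to avoid.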
The `next_f32` and `next_f64` methods *might* fall into the same category
(if there are any native floating-point generators worth using), but likely
don't.
`gen_iter` is an iterator-adapter, but could still be an external method.
All other methods are there to support generating various output types from a
random source, and are not specifically related to *generating randomness*, thus
arguably belong elsewhere. However, it may still make sense to keep some/all
for convenience or to avoid unnecessary breakage.
(In fact, `gen_weighted_bool` is a distribution, not a simple converter.)
### `Rand`
Should `Rand` implementations for many combinations of arrays, tuples, etc. be kept?
There's not *much* rationale for removing these features; however, the `Rand`
trait appears to be designed the way it is "because traits allow some cool tricks",
rather than because the functionality is important and the design well planned out.
Further, the default distribution range is type dependent, which may result in
a few surprises:
- all values for integers
- range [0, 1) for floats
- valid codepoints for char
- `Option<T>` has 50% probability of being `None`, 50% of being some generated `T`
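The sketch below mimics these type-dependent defaults with a hypothetical `MiniRand` trait over a stand-in `SplitMix64` source. The conversions shown (53 random bits for floats, rejection sampling for `char`) are assumptions made for the example, not necessarily what the crate does.

```rust
/// Stand-in 64-bit source for the sketch (not one of the crate's RNGs).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical trait: the output distribution is implied by the type alone.
pub trait MiniRand: Sized {
    fn rand(rng: &mut SplitMix64) -> Self;
}

/// Integers: every value of the type is possible.
impl MiniRand for u32 {
    fn rand(rng: &mut SplitMix64) -> u32 {
        (rng.next_u64() >> 32) as u32
    }
}

/// Floats: the half-open range [0, 1), built from 53 random bits.
impl MiniRand for f64 {
    fn rand(rng: &mut SplitMix64) -> f64 {
        const SCALE: f64 = 1.0 / ((1u64 << 53) as f64);
        (rng.next_u64() >> 11) as f64 * SCALE
    }
}

/// `char`: draw 21 bits and retry until they form a valid code point.
impl MiniRand for char {
    fn rand(rng: &mut SplitMix64) -> char {
        loop {
            if let Some(c) = char::from_u32((rng.next_u64() >> 43) as u32) {
                return c;
            }
        }
    }
}

/// `Option<T>`: `None` half of the time, otherwise a generated `T`.
impl<T: MiniRand> MiniRand for Option<T> {
    fn rand(rng: &mut SplitMix64) -> Option<T> {
        if rng.next_u64() >> 63 == 1 {
            Some(T::rand(rng))
        } else {
            None
        }
    }
}
```

Nothing at the call site, e.g. `let x: f64 = MiniRand::rand(&mut rng);`, reveals which range or probability applies, which is where the surprises come from.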
### `sample`
This selects a subset of specified length from a given sequence, and may cause some
reordering. It should possibly be in a sub-module and have a name like
`sample_from_seq`.
### Ranges
Currently there is both `Rng::gen_range` (for convenience) and
`distributions::range::Range` (probably faster for repeated uses); this
mostly makes sense, since "range" is part-way between a simple value and a full
distribution in complexity.
Should `Range` be renamed `UniformRange` or similar?
Can `Range` be modified to support some user-defined types?
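To show why a reusable range object can be faster for repeated use, here is a sketch that computes its rejection threshold once at construction. `UniformRange` is the hypothetical name floated above, `SplitMix64` is a stand-in source, and the rejection scheme is one common approach, given here as an assumption.

```rust
/// Stand-in 64-bit source for the sketch (not one of the crate's RNGs).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical uniform sampler over the half-open range [low, high).
pub struct UniformRange {
    low: u32,
    range: u32,
    /// A multiple of `range`; raw values at or above it are rejected.
    zone: u32,
}

impl UniformRange {
    /// The modulo used to find the rejection zone is paid once, here.
    pub fn new(low: u32, high: u32) -> Self {
        assert!(low < high, "UniformRange::new requires low < high");
        let range = high - low;
        let zone = u32::MAX - (u32::MAX % range);
        UniformRange { low, range, zone }
    }

    /// Rejects raw values outside the zone so that the modulo is unbiased.
    pub fn sample(&self, rng: &mut SplitMix64) -> u32 {
        loop {
            let raw = (rng.next_u64() >> 32) as u32;
            if raw < self.zone {
                return self.low + raw % self.range;
            }
        }
    }
}
```

A one-shot helper in the style of `gen_range` would have to redo the work in `new` on every call.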
### Alternatives
Explicitly-named generators would seem a good starting point, but this isn't a
full design. Ideas:
* `gen::uniform::<i32>(&mut rng)`
* `gen::uniform_i32(&mut rng)`
* `gen::uniform01::<f32>(&mut rng)`
* `gen::range01::<f32>(&mut rng)`
* `gen01::<f32>(&mut rng)`
* `gen::open01::<f32>(&mut rng)`
* `gen::char(&mut rng)`
* `gen::codepoint(&mut rng)`
Possibly all generators could be considered distributions, but with some API simplifications:
* `distributions::Uniform` struct (full range)
* `distributions::UniformRange` struct (specified range)
* `distributions::Uniform01` struct (specifically for floating point)
* `distributions::uniform` function to get a single `Uniform` value for convenience
* etc.
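A rough sketch of how the struct-plus-convenience-function idea could fit together is given below. The `Distribution` trait, its signature and the `SplitMix64` source are assumptions made for illustration; only the names `Uniform`, `Uniform01` and `uniform` come from the list above.

```rust
/// Stand-in 64-bit source for the sketch (not one of the crate's RNGs).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical trait: a distribution producing values of type `T`.
pub trait Distribution<T> {
    fn sample(&self, rng: &mut SplitMix64) -> T;
}

/// Full range of the output type.
pub struct Uniform;

/// Floating-point values in [0, 1).
pub struct Uniform01;

impl Distribution<u32> for Uniform {
    fn sample(&self, rng: &mut SplitMix64) -> u32 {
        (rng.next_u64() >> 32) as u32
    }
}

impl Distribution<u64> for Uniform {
    fn sample(&self, rng: &mut SplitMix64) -> u64 {
        rng.next_u64()
    }
}

impl Distribution<f64> for Uniform01 {
    fn sample(&self, rng: &mut SplitMix64) -> f64 {
        const SCALE: f64 = 1.0 / ((1u64 << 53) as f64);
        (rng.next_u64() >> 11) as f64 * SCALE
    }
}

/// Convenience function: a single `Uniform` value without naming the struct.
pub fn uniform<T>(rng: &mut SplitMix64) -> T
where
    Uniform: Distribution<T>,
{
    Uniform.sample(rng)
}
```

With this shape, `uniform::<u32>(&mut rng)` names the distribution explicitly at the call site, while `Uniform01.sample(&mut rng)` covers the floating-point case.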
Distributions
--------------
`WeightedChoice` appears to have ownership issues; should an owning version be added?
Should it be removed entirely?
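As a sketch of what an owning version could look like: the struct below takes its items by value and stores running weight totals. `OwnedWeightedChoice` and `pick_at` are hypothetical names, and the caller supplies the random point (for example from a range sampler) to keep the example independent of any particular RNG.

```rust
/// Hypothetical owning weighted choice: holds its items instead of borrowing.
pub struct OwnedWeightedChoice<T> {
    items: Vec<T>,
    /// Running totals of the weights, in item order.
    cumulative: Vec<u32>,
}

impl<T> OwnedWeightedChoice<T> {
    /// Takes ownership of `(weight, item)` pairs.
    pub fn new(weighted: Vec<(u32, T)>) -> Self {
        assert!(!weighted.is_empty(), "need at least one item");
        let mut items = Vec::with_capacity(weighted.len());
        let mut cumulative = Vec::with_capacity(weighted.len());
        let mut total: u32 = 0;
        for (weight, item) in weighted {
            assert!(weight > 0, "weights must be positive");
            total = total.checked_add(weight).expect("total weight overflows u32");
            cumulative.push(total);
            items.push(item);
        }
        OwnedWeightedChoice { items, cumulative }
    }

    /// Sum of all weights.
    pub fn total(&self) -> u32 {
        *self.cumulative.last().expect("constructor guarantees one item")
    }

    /// Maps a point in [0, total) to an item; a uniformly random point
    /// selects each item with probability proportional to its weight.
    pub fn pick_at(&self, point: u32) -> &T {
        assert!(point < self.total(), "point must be below the total weight");
        // Index of the first running total that exceeds `point`.
        let index = self.cumulative.partition_point(|&c| c <= point);
        &self.items[index]
    }
}
```

Because the struct owns its data, it can be returned from functions or stored in other structs without a lifetime parameter.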
Does `Sample` have a reason to exist at all? E.g. random walks are not distributions
from which someone samples, but random processes, where it can be useful to
separately (1) get the current state and (2) update it.
See [this comment](https://github.com/rust-lang-nursery/rand/pull/27#issuecomment-317393407) for more on the matter.
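The distinction can be sketched with a random walk that exposes its current state separately from the update step. `RandomWalk` and its method names are hypothetical, and `SplitMix64` is a stand-in source.

```rust
/// Stand-in 64-bit source for the sketch (not one of the crate's RNGs).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical random process: a one-dimensional walk with unit steps.
pub struct RandomWalk {
    position: i64,
}

impl RandomWalk {
    pub fn new(start: i64) -> Self {
        RandomWalk { position: start }
    }

    /// (1) Reading the current state needs no randomness and changes nothing.
    pub fn state(&self) -> i64 {
        self.position
    }

    /// (2) Updating consumes randomness and moves one step left or right.
    pub fn step(&mut self, rng: &mut SplitMix64) {
        if rng.next_u64() >> 63 == 1 {
            self.position += 1;
        } else {
            self.position -= 1;
        }
    }
}
```

A single `sample(&mut self, rng)` method would merge the two operations, so the state could not be read twice without advancing the walk.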
Can `IndependentSample` be renamed `Sample`?
Various generators use type parameters named `Support` and `Sup`; this may
be confusing since type parameters are typically named `T` (`S`, `A`, etc.);
at least this confused me.