Rand design questions

Gist by dhardy, created July 25, 2017 17:38.

Questions and potential changes
===============================

Organisation and policies
-------------------------

Should all RNGs be moved to a sub-module, e.g. `rng::isaac::IsaacRng` inside the `rand` crate?
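
A minimal sketch of what such a layout could look like, assuming one sub-module per algorithm under a hypothetical `rng` module. The struct body is a placeholder, not the real ISAAC state. A top-level re-export is one possible way to keep existing paths compiling during a transition.

```rust
// Hypothetical layout: every generator lives under `rng`, one sub-module per algorithm.
pub mod rng {
    pub mod isaac {
        /// Placeholder standing in for the real ISAAC generator state.
        pub struct IsaacRng {
            pub state: u32,
        }
    }
}

// Optional re-export so the old, shorter path keeps working.
pub use rng::isaac::IsaacRng;

/// Both paths name the same type, so this is just the identity function.
fn same_type(a: rng::isaac::IsaacRng) -> IsaacRng {
    a
}

fn main() {
    let r = same_type(rng::isaac::IsaacRng { state: 1 });
    println!("{}", r.state);
}
```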

### RNG implementations

Which RNGs should be included, i.e. what is the policy for accepting new RNGs?
Should the crate accept any published RNG with a properly tested implementation?
If not, the rest need a home elsewhere: individual crates
(e.g. `rng-NAME`) or a single catch-all crate.

What about:

- well-known RNGs like the Mersenne Twister?
- new RNGs promising better performance/quality?
- the current RNGs: are all of them worth keeping?
- known-poor RNGs, if there is demand (e.g. to reproduce existing results)?

See also: https://en.wikipedia.org/wiki/Pseudorandom_number_generator

### Distributions

Standard generators
-------------------

### `StdRng`

`StdRng` is intended to be an efficient generator for the current platform.
This implies the underlying algorithm could change in the future. But which
properties is `StdRng` expected to have?

* Should it be cryptographically secure (approved for cryptographic use) or not?
* Should it be free of major statistical flaws such as correlation or poor
  distribution? This should be a given.
* Is it mainly for generating bools, bytes, 32-bit values, or 64-bit values? This
  is not as pointless a question as it might first appear: e.g. both `Isaac64` and
  `MT19937_64` implement `next_u32()` as `self.next_u64() as u32`, throwing away
  half the generated bits; some 64-bit generators may be able to extract subsets
  of the generated values with little overhead, and thus perform better under
  this usage.
* Must it support `SeedableRng<T>`? Currently it does, with `T = &[usize]`, which
  implies that seeding may need to differ between 32-bit and 64-bit platforms.
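
To make the trade-off in the third point concrete, here is a sketch contrasting the truncating approach with one that splits each 64-bit output into two 32-bit values. The `Source64` trait, the `Split32` wrapper and the use of SplitMix64 as the underlying 64-bit generator are all assumptions for illustration; none of them is part of `rand`.

```rust
/// Hypothetical minimal trait for a generator whose native output is 64 bits.
trait Source64 {
    fn next_u64(&mut self) -> u64;
}

/// SplitMix64, a small well-known 64-bit generator, used here only as a stand-in.
struct SplitMix64(u64);

impl Source64 for SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Truncating approach: every 32-bit output costs a full 64-bit draw,
/// and the upper half is discarded.
fn next_u32_truncating<R: Source64>(rng: &mut R) -> u32 {
    rng.next_u64() as u32
}

/// Splitting approach: each 64-bit draw yields two 32-bit outputs
/// (low half first, then the buffered high half).
struct Split32<R> {
    inner: R,
    spare: Option<u32>,
}

impl<R: Source64> Split32<R> {
    fn new(inner: R) -> Self {
        Split32 { inner, spare: None }
    }

    fn next_u32(&mut self) -> u32 {
        if let Some(high) = self.spare.take() {
            return high;
        }
        let x = self.inner.next_u64();
        self.spare = Some((x >> 32) as u32);
        x as u32
    }
}

fn main() {
    let mut split = Split32::new(SplitMix64(42));
    let pair = (split.next_u32(), split.next_u32()); // both halves of one draw
    let mut trunc = SplitMix64(42);
    let single = next_u32_truncating(&mut trunc); // equals pair.0
    println!("{:?} {}", pair, single);
}
```

The splitting wrapper halves the number of 64-bit draws needed by 32-bit consumers at the cost of one extra field of state; whether that pays off depends on how expensive the underlying `next_u64` is.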

At a minimum, the documentation should state which properties `StdRng` is
required to have. It *might* be useful to provide both a standard
cryptographically secure algorithm and a non-cryptographic one.

Required performance and cryptographic strength vary by application, so pretending
there is "one good default" does not seem useful. Some argue that cryptographic
generators should not live in user space at all, in which case there may be no
point in having a default cryptographic generator; a standard generator for
simulations and games could still be useful, though it is not strictly necessary.

### `ThreadRng`

Again, which properties is this generator expected to have?
There is an issue asking whether it should simply use the OS generator directly.

Should there be guidance (documentation?) on implementing an equally easy-to-use
generator that is deterministic/repeatable? This would be useful for testing, but
it is likely not something the library can or should provide, since the algorithm
must be fixed in this case.
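
One way users could build such a generator themselves is sketched below, assuming a `thread_local!` cell holding the state of a fixed algorithm (xorshift64 here, chosen for brevity rather than quality). The names `seed_test_rng` and `test_rng_u64` are hypothetical. Because the algorithm is hard-coded in user code, the sequence stays reproducible even if a library later changes its own defaults.

```rust
use std::cell::Cell;

thread_local! {
    // Per-thread state for a fixed, deterministic algorithm; must never be zero.
    static TEST_RNG_STATE: Cell<u64> = Cell::new(0x2545_F491_4F6C_DD1D);
}

/// Reset this thread's generator to a known non-zero seed, e.g. at the start of a test.
fn seed_test_rng(seed: u64) {
    assert!(seed != 0, "xorshift64 state must be non-zero");
    TEST_RNG_STATE.with(|s| s.set(seed));
}

/// Next value from this thread's generator (xorshift64 with shifts 13, 7, 17).
fn test_rng_u64() -> u64 {
    TEST_RNG_STATE.with(|s| {
        let mut x = s.get();
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        s.set(x);
        x
    })
}

fn main() {
    seed_test_rng(7);
    let first: Vec<u64> = (0..3).map(|_| test_rng_u64()).collect();
    seed_test_rng(7);
    let second: Vec<u64> = (0..3).map(|_| test_rng_u64()).collect();
    assert_eq!(first, second); // same seed, same sequence
    println!("{:?}", first);
}
```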

Generation support
------------------

### `Rng`

The `next_u32`, `next_u64` and `fill_bytes` methods all deserve their place:
omitting any of them could easily force unnecessary conversions between some
generators (sources) and some users (sinks); the default implementations also
keep implementing `Rng` simple.
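
A sketch of that pattern, assuming a hypothetical `RandSource` trait (not `rand`'s actual `Rng`) in which only `next_u32` is required and the other two methods have defaults that a generator may override with native versions. `Counter` is a toy that makes the defaults' behaviour visible; it is not a random generator.

```rust
/// Hypothetical core trait: only `next_u32` is required; generators with a
/// native 64-bit or bulk output can override the other methods.
trait RandSource {
    fn next_u32(&mut self) -> u32;

    fn next_u64(&mut self) -> u64 {
        // Two 32-bit draws: the first becomes the high half.
        let hi = u64::from(self.next_u32());
        let lo = u64::from(self.next_u32());
        (hi << 32) | lo
    }

    fn fill_bytes(&mut self, dest: &mut [u8]) {
        // One 32-bit draw per 4-byte chunk; a short final chunk uses the
        // first little-endian bytes of its draw.
        for chunk in dest.chunks_mut(4) {
            let bytes = self.next_u32().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

/// Toy counter, not a random generator: it only makes the defaults observable.
struct Counter(u32);

impl RandSource for Counter {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(1);
        self.0
    }
}

fn main() {
    let mut c = Counter(0);
    let wide = c.next_u64(); // built from draws 1 (high) and 2 (low)
    let mut buf = [0u8; 6];
    c.fill_bytes(&mut buf); // uses draws 3 and 4
    println!("{:#x} {:?}", wide, buf);
}
```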

The `next_f32` and `next_f64` methods *might* fall into the same category (if
there are any native floating-point generators worth using), but likely don't.

`gen_iter` is an iterator adapter, but it could still be provided outside the trait.

All other methods exist to generate various output types from a random source;
they are not specifically related to *generating randomness* and thus arguably
belong elsewhere. However, it may still make sense to keep some or all of them
for convenience or to avoid unnecessary breakage. (In fact, `gen_weighted_bool`
is a distribution, not a simple converter.)
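
One possible way to move such helpers out of the core trait without losing method-call syntax is an extension trait with a blanket implementation, plus free functions for adapters like `gen_iter`. Everything here (`RandSource`, `SourceExt`, `Countdown`) is a hypothetical sketch, not `rand`'s API.

```rust
use std::iter;

/// Hypothetical core trait: only the randomness-producing method.
trait RandSource {
    fn next_u32(&mut self) -> u32;
}

/// Hypothetical extension trait: output-type conversions live outside the core
/// trait, while the blanket impl below keeps `rng.gen_bool()` call syntax.
trait SourceExt: RandSource {
    /// A fair bool taken from the most significant bit of one draw.
    fn gen_bool(&mut self) -> bool {
        self.next_u32() >> 31 == 1
    }
}

// Every source gets the extension methods automatically.
impl<R: RandSource + ?Sized> SourceExt for R {}

/// `gen_iter`-style adapter written as a free function over any source.
fn gen_iter<R: RandSource>(rng: &mut R) -> impl Iterator<Item = u32> + '_ {
    iter::from_fn(move || Some(rng.next_u32()))
}

/// Toy source that counts down (wrapping); not a random generator.
struct Countdown(u32);

impl RandSource for Countdown {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_sub(1);
        self.0
    }
}

fn main() {
    let mut src = Countdown(3);
    let first: Vec<u32> = gen_iter(&mut src).take(2).collect(); // draws 2 and 1
    let coin = src.gen_bool(); // next draw is 0, so the top bit is clear
    println!("{:?} {}", first, coin);
}
```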

### `Rand`

Should `Rand` implementations for the many combinations of arrays, tuples, etc. be kept?
There is not *much* rationale for removing these features; however, the `Rand`
trait appears to have been designed the way it is "because traits allow some cool
tricks", rather than because the functionality is important and the design well
planned out.

Further, the default distribution range is type-dependent, which may result in
a few surprises:

- all values for integers
- range [0, 1) for floats
- valid codepoints for char
- `Option<T>` has 50% probability of being `None`, 50% of being some generated `T`
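
The sketch below shows how each of these defaults might be implemented, using a hypothetical `DefaultGen` trait as a stand-in for `Rand`; it is not the crate's code. It highlights that the "default" is a per-type decision: floats scale the top 53 bits into [0, 1), `char` retries draws that are not valid scalar values, and `Option<T>` spends one draw on a coin flip before generating `T`. `Fixed` is a toy source that replays given values.

```rust
/// Hypothetical 64-bit source.
trait Source {
    fn next_u64(&mut self) -> u64;
}

/// Hypothetical stand-in for `Rand`: each type chooses its own default range.
trait DefaultGen: Sized {
    fn default_gen<S: Source>(src: &mut S) -> Self;
}

impl DefaultGen for u32 {
    // All values: keep the low 32 bits.
    fn default_gen<S: Source>(src: &mut S) -> Self {
        src.next_u64() as u32
    }
}

impl DefaultGen for f64 {
    // [0, 1): top 53 bits (the f64 mantissa width) scaled by 2^-53.
    fn default_gen<S: Source>(src: &mut S) -> Self {
        (src.next_u64() >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
    }
}

impl DefaultGen for char {
    // Valid Unicode scalar values only: take the top 21 bits and retry
    // on surrogates and on values above 0x10FFFF.
    fn default_gen<S: Source>(src: &mut S) -> Self {
        loop {
            if let Some(c) = char::from_u32((src.next_u64() >> 43) as u32) {
                return c;
            }
        }
    }
}

impl<T: DefaultGen> DefaultGen for Option<T> {
    // One coin flip (the top bit), then generate `T` only when it is set.
    fn default_gen<S: Source>(src: &mut S) -> Self {
        if src.next_u64() >> 63 == 0 {
            None
        } else {
            Some(T::default_gen(src))
        }
    }
}

/// Toy source that replays a fixed list of values.
struct Fixed(Vec<u64>);

impl Source for Fixed {
    fn next_u64(&mut self) -> u64 {
        self.0.remove(0)
    }
}

fn main() {
    let mut src = Fixed(vec![u64::MAX, 0x41u64 << 43, 1u64 << 63, 7]);
    let x = f64::default_gen(&mut src);
    let c = char::default_gen(&mut src);
    let o = Option::<u32>::default_gen(&mut src);
    println!("{} {:?} {:?}", x, c, o);
}
```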

### `sample`

This selects a subset of specified length from a given sequence; the selected
elements may come out reordered. It should possibly live in a sub-module and
have a name like `sample_from_seq`.
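
For illustration, here is reservoir sampling (Algorithm R), a standard one-pass technique for this task that also shows where reordering can come from: a later element may overwrite any slot of the result. This is a sketch, not necessarily how `rand` implements `sample`; to stay dependency-free, the uniform index source is a caller-supplied closure `below(n)`, assumed to return a value in [0, n).

```rust
/// Reservoir sampling (Algorithm R): choose `amount` items from an iterator of
/// possibly unknown length in a single pass. `below(n)` must return a uniformly
/// distributed index in [0, n); it is a parameter so the sketch needs no RNG crate.
fn sample_from_seq<T, I, F>(iter: I, amount: usize, mut below: F) -> Vec<T>
where
    I: IntoIterator<Item = T>,
    F: FnMut(usize) -> usize,
{
    let mut iter = iter.into_iter();
    // Start with the first `amount` items (fewer if the input is shorter).
    let mut reservoir: Vec<T> = iter.by_ref().take(amount).collect();
    // The item at overall position k = amount + i is kept with probability
    // amount / (k + 1), replacing a uniformly chosen slot.
    for (i, item) in iter.enumerate() {
        let j = below(amount + i + 1);
        if j < amount {
            reservoir[j] = item;
        }
    }
    reservoir
}

fn main() {
    // Deterministic stand-in for a uniform index: always pick slot 0.
    let picked = sample_from_seq(0..5, 3, |_| 0);
    println!("{:?}", picked); // [4, 1, 2]
}
```

With the closure always returning 0, elements 3 and 4 successively replace slot 0, so element 4 ends up ahead of elements 1 and 2.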

### Ranges

Currently there are both `Rng::gen_range` (for convenience) and
`distributions::range::Range` (probably faster for repeated use). This
mostly makes sense: a "range" sits part-way between a simple value and a full
distribution in complexity.

Should `Range` be renamed `UniformRange` or similar?
Can `Range` be modified to support some user-defined types?
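
A sketch of why a reusable range object can be faster: unbiased sampling by rejection needs a threshold that depends only on the range, so it can be computed once in the constructor and reused, whereas a one-shot `gen_range`-style call recomputes it every time. `UniformRange` here is a hypothetical `u32`-only type, and the random source is a caller-supplied closure returning `u32`.

```rust
/// Hypothetical uniform range over the half-open interval [low, high) of `u32`.
struct UniformRange {
    low: u32,
    span: u64,  // high - low
    limit: u64, // largest multiple of `span` that is <= 2^32; draws >= limit are rejected
}

impl UniformRange {
    /// Do the per-range work once: compute the span and the rejection limit.
    fn new(low: u32, high: u32) -> Self {
        assert!(low < high, "empty range");
        let span = u64::from(high - low);
        let limit = (1u64 << 32) - (1u64 << 32) % span;
        UniformRange { low, span, limit }
    }

    /// Draw until a value falls below `limit`, so every residue modulo `span`
    /// is equally likely (no modulo bias).
    fn sample(&self, mut next_u32: impl FnMut() -> u32) -> u32 {
        loop {
            let x = u64::from(next_u32());
            if x < self.limit {
                return self.low + (x % self.span) as u32;
            }
        }
    }
}

/// One-shot convenience in the spirit of `gen_range`: rebuilds the range on every call.
fn gen_range(low: u32, high: u32, next_u32: impl FnMut() -> u32) -> u32 {
    UniformRange::new(low, high).sample(next_u32)
}

fn main() {
    let mut draws = vec![0u32, 1, 2, 3, 4].into_iter();
    let mut next = move || draws.next().unwrap();
    let die = UniformRange::new(1, 7); // built once, reusable
    let a = die.sample(&mut next);
    let b = gen_range(10, 13, &mut next);
    println!("{} {}", a, b);
}
```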

### Alternatives

Explicitly named generators seem a good starting point, but this is not a
full design. Ideas:

* `gen::uniform::<i32>(&mut rng)`
* `gen::uniform_i32(&mut rng)`
* `gen::uniform01::<f32>(&mut rng)`
* `gen::range01::<f32>(&mut rng)`
* `gen01::<f32>(&mut rng)`
* `gen::open01::<f32>(&mut rng)`
* `gen::char(&mut rng)`
* `gen::codepoint(&mut rng)`

Possibly all generators could be considered distributions, but with some API simplifications:

* `distributions::Uniform` struct (full range)
* `distributions::UniformRange` struct (specified range)
* `distributions::Uniform01` struct (specifically for floating point)
* `distributions::uniform` function to get a single `Uniform` value for convenience
* etc.
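
A minimal sketch of that idea, assuming a hypothetical `Distribution<T>` trait and a 64-bit `Source` trait (neither is the crate's current API). The types would live in `distributions`; they are shown at top level to keep the example short. `uniform::<T>` is the convenience function from the list, available for any `T` that `Uniform` can produce.

```rust
/// Hypothetical 64-bit random source.
trait Source {
    fn next_u64(&mut self) -> u64;
}

/// Hypothetical distribution trait: anything that can produce a `T` from a source.
trait Distribution<T> {
    fn sample<S: Source>(&self, src: &mut S) -> T;
}

/// Full range of the output type.
struct Uniform;
/// Floating point in [0, 1).
struct Uniform01;

impl Distribution<u64> for Uniform {
    fn sample<S: Source>(&self, src: &mut S) -> u64 {
        src.next_u64()
    }
}

impl Distribution<i32> for Uniform {
    fn sample<S: Source>(&self, src: &mut S) -> i32 {
        // Keep the low 32 bits and reinterpret them as signed.
        src.next_u64() as u32 as i32
    }
}

impl Distribution<f32> for Uniform01 {
    fn sample<S: Source>(&self, src: &mut S) -> f32 {
        // Top 24 bits (the f32 mantissa width) scaled by 2^-24.
        (src.next_u64() >> 40) as f32 * (1.0 / (1u32 << 24) as f32)
    }
}

/// Convenience: one value from `Uniform`, with the output type chosen by the caller.
fn uniform<T, S: Source>(src: &mut S) -> T
where
    Uniform: Distribution<T>,
{
    <Uniform as Distribution<T>>::sample(&Uniform, src)
}

/// Toy source that always returns the same value.
struct Constant(u64);

impl Source for Constant {
    fn next_u64(&mut self) -> u64 {
        self.0
    }
}

fn main() {
    let mut src = Constant(u64::MAX);
    let a: i32 = uniform(&mut src);
    let b: f32 = Uniform01.sample(&mut src);
    println!("{} {}", a, b);
}
```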

Distributions
-------------

`WeightedChoice` appears to have ownership issues; should an owning version be
added, or should it be removed entirely?
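
An owning version might look like the sketch below: it stores the items and their running weight totals in its own `Vec`s, so it needs no lifetime parameter, and it picks an item by binary search. `OwnedWeightedChoice` and `pick` are hypothetical names; `pick` takes a value `r` assumed to be uniform in [0, total) (e.g. from a range sample) so the sketch needs no RNG.

```rust
/// Hypothetical owning weighted choice: items and running weight totals are
/// stored in owned `Vec`s, so there is no borrowed lifetime parameter.
struct OwnedWeightedChoice<T> {
    items: Vec<T>,
    cumulative: Vec<u64>, // cumulative[i] = sum of the weights of items 0..=i
    total: u64,
}

impl<T: Clone> OwnedWeightedChoice<T> {
    fn new(pairs: Vec<(T, u32)>) -> Self {
        let mut items = Vec::with_capacity(pairs.len());
        let mut cumulative = Vec::with_capacity(pairs.len());
        let mut total = 0u64;
        for (item, weight) in pairs {
            total += u64::from(weight);
            items.push(item);
            cumulative.push(total);
        }
        assert!(total > 0, "total weight must be positive");
        OwnedWeightedChoice { items, cumulative, total }
    }

    /// `r` must be uniform in [0, total); returns the item whose weight
    /// interval contains `r`. Zero-weight items are never returned.
    fn pick(&self, r: u64) -> T {
        assert!(r < self.total, "r out of range");
        // Index of the first running total strictly greater than `r`.
        let idx = self.cumulative.partition_point(|&c| c <= r);
        self.items[idx].clone()
    }
}

fn main() {
    let choice = OwnedWeightedChoice::new(vec![("a", 1), ("b", 0), ("c", 2)]);
    // In real use, `r` would come from a uniform sample over [0, total).
    for r in 0..choice.total {
        println!("{} -> {}", r, choice.pick(r));
    }
}
```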

Does `Sample` have a reason to exist at all? For example, random walks are not
distributions to sample from but random processes, where it can be useful to
separately (1) get the current state and (2) update it.
See [this comment](https://github.com/rust-lang-nursery/rand/pull/27#issuecomment-317393407) for more on the matter.
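
A sketch of a random walk modelled as a process rather than a distribution, with separate methods for reading and advancing the state. `RandomWalk`, `state` and `step` are hypothetical names; the coin flip is passed in so the example needs no RNG.

```rust
/// Hypothetical simple random walk on the integers.
struct RandomWalk {
    position: i64,
}

impl RandomWalk {
    fn new(start: i64) -> Self {
        RandomWalk { position: start }
    }

    /// (1) Read the current state without changing it.
    fn state(&self) -> i64 {
        self.position
    }

    /// (2) Advance one step: +1 on heads, -1 on tails.
    fn step(&mut self, heads: bool) {
        self.position += if heads { 1 } else { -1 };
    }
}

fn main() {
    let mut walk = RandomWalk::new(0);
    for &flip in &[true, true, false, true] {
        walk.step(flip);
    }
    println!("{}", walk.state());
}
```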

Can `IndependentSample` be renamed `Sample`?

Various generators use type parameters named `Support` and `Sup`; this may be
confusing, since type parameters are conventionally named `T` (`S`, `A`, etc.);
at least, this confused me.