@WarrenWeckesser
Last active December 12, 2018 02:26
An alternative to numpy.random.choice
import numpy as np


def random_select(items, nsample=None, p=None, size=None):
    """
    Select random samples from `items`.

    The function randomly selects `nsample` items from `items` without
    replacement.

    Parameters
    ----------
    items : sequence
        The collection of items from which the selection is made.
    nsample : int, optional
        Number of items to select without replacement in each draw.
        It must be between 0 and len(items), inclusive.
    p : array-like of floats, same length as `items`, optional
        Probabilities of the items.  If this argument is not given,
        the elements in `items` are assumed to have equal probability.
    size : int or tuple of ints, optional
        Number of variates to draw.

    Notes
    -----
    `size=None` means "generate a single selection".

    If `size` is None, the result is equivalent to
        numpy.random.choice(items, size=nsample, p=p, replace=False)

    `nsample=None` means draw one (scalar) sample.

    If `nsample` is None, the function acts (almost) like nsample=1 (see
    below for more information), and the result is equivalent to
        numpy.random.choice(items, size=size, p=p)
    In effect, it does choice with replacement.  The case `nsample=None`
    can be interpreted as "each sample is a scalar", and `nsample=k`
    means each sample is a sequence with length k.

    If `nsample` is not None, it must be a nonnegative integer with
    0 <= nsample <= len(items).

    If `size` is not None, it must be an integer or a tuple of integers.
    When `size` is an integer, it is treated as the tuple ``(size,)``.

    When both `nsample` and `size` are not None, the result
    has shape ``size + (nsample,)``.

    Examples
    --------
    Make 6 choices with replacement from [10, 20, 30, 40].  (This is
    equivalent to "Make 1 choice without replacement from [10, 20, 30, 40];
    do it six times.")

    >>> random_select([10, 20, 30, 40], size=6)
    array([20, 20, 40, 10, 40, 30])

    Choose two items from [10, 20, 30, 40] without replacement.  Do it six
    times.

    >>> random_select([10, 20, 30, 40], nsample=2, size=6)
    array([[40, 10],
           [20, 30],
           [10, 40],
           [30, 10],
           [10, 30],
           [10, 20]])

    When `nsample` is an integer, there is always an axis at the end of the
    result with length `nsample`, even when `nsample=1`.  For example, the
    shape of the array returned in the following call is (2, 3, 1).

    >>> random_select([10, 20, 30, 40], nsample=1, size=(2, 3))
    array([[[10],
            [30],
            [20]],

           [[10],
            [40],
            [20]]])

    When `nsample` is None, it acts like `nsample=1`, but the trivial
    dimension is not included.  The shape of the array returned in the
    following call is (2, 3).

    >>> random_select([10, 20, 30, 40], size=(2, 3))
    array([[20, 40, 30],
           [30, 20, 40]])

    """
    # This implementation is a proof of concept, and provides a demonstration
    # of a possible API.  Efficiency was not considered.  The actual
    # implementation would probably use Cython or C.
    if nsample is None:
        # Scalar samples: this is plain choice with replacement.
        return np.random.choice(items, size=size, p=p)
    if size is None:
        size = ()
    elif np.isscalar(size):
        size = (size,)

    # `tmp` has a zero-length last axis, so `apply_along_axis` calls `draw`
    # once for each index in `size`; each call returns `nsample` items.
    tmp = np.empty(size + (0,))

    def draw(_):
        return np.random.choice(items, size=nsample, p=p, replace=False)

    return np.apply_along_axis(draw, -1, tmp)
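
The proof of concept above calls `np.random.choice` once per draw. For the equal-probability case only (no `p`), the same ``size + (nsample,)`` result shape can be produced in a single vectorized step by sorting random keys. The sketch below is an illustration, not part of the original gist: the name `random_select_uniform` and the `rng` parameter are hypothetical, `items` is assumed to be one-dimensional, and `np.random.default_rng` assumes NumPy 1.17 or later.

```python
import numpy as np


def random_select_uniform(items, nsample, size=None, rng=None):
    """Hypothetical uniform-only variant of random_select (no `p` support)."""
    items = np.asarray(items)  # assumed to be 1-D
    if not 0 <= nsample <= len(items):
        raise ValueError("nsample must satisfy 0 <= nsample <= len(items)")
    if size is None:
        size = ()
    elif np.isscalar(size):
        size = (size,)
    else:
        size = tuple(size)
    rng = np.random.default_rng() if rng is None else rng
    # One independent random key per item per draw.  Sorting the keys along
    # the last axis yields a uniformly random permutation of the indices,
    # and the first `nsample` of them form a sample without replacement.
    keys = rng.random(size + (len(items),))
    idx = np.argsort(keys, axis=-1)[..., :nsample]
    return items[idx]
```

For example, `random_select_uniform([10, 20, 30, 40], nsample=2, size=6)` returns an array of shape (6, 2) whose rows each contain two distinct items. Sorting costs O(n log n) per draw, so this trades some work per draw for the removal of the Python-level loop; weighted sampling without replacement would need a different approach.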