[TIx 8] Weighted Random Sampling in Ruby

One of the many reasons I love working with Ruby is that it has a rich vocabulary that allows you to accomplish your goals with a minimal amount of code. If there isn't a method that does exactly what you want, it's usually possible to build an elegant solution yourself.

Let's take the example of simulating the rolling of a die.

We can represent a die as an array of its faces.

die = [*?⚀..?⚅]
# => ["⚀", "⚁", "⚂", "⚃", "⚄", "⚅"]

We can simulate a single roll of the die using Array#sample.

die.sample
# => "⚁"

There isn't any built-in method for rolling our die several times (i.e., sampling with replacement), but we can use Enumerable#map for this:

5.times.map { die.sample }
# => ["⚄", "⚂", "⚃", "⚁", "⚂"]

10.times.map { die.sample }
# => ["⚂", "⚀", "⚄", "⚁", "⚃", "⚁", "⚁", "⚁", "⚄", "⚀"]

We can also use Enumerable#each_with_object to observe the frequency distribution of rolling our die a large number of times:

360.times.each_with_object(Hash.new(0)) { |_, freq| freq[die.sample] += 1 }
# => {"⚃"=>55, "⚄"=>56, "⚂"=>67, "⚁"=>64, "⚅"=>65, "⚀"=>53}

3600.times.each_with_object(Hash.new(0)) { |_, freq| freq[die.sample] += 1 }
# => {"⚂"=>612, "⚃"=>600, "⚅"=>571, "⚄"=>593, "⚁"=>652, "⚀"=>572}

freq = 36_000.times.each_with_object(Hash.new(0)) { |_, freq| freq[die.sample] += 1 }
# => {"⚂"=>6061, "⚀"=>6018, "⚄"=>6087, "⚃"=>5940, "⚅"=>5891, "⚁"=>6003}

freq.values.map { |roll_count| roll_count / 36_000.0 }
# => [0.1683611111111111, 0.16716666666666666, 0.16908333333333334, 0.165, 0.1636388888888889, 0.16675]

Here we can see that as the number of rolls grows, the observed frequency of each face converges to 1/6: every face is equally likely. This is an example of uniform random sampling, and Ruby's Array#sample makes it very easy to simulate.

But what if we want to simulate rolling a loaded die? Given an array of associated probabilities for each die face, is there an elegant way to calculate weighted random samples in Ruby?

My inclination would be to construct the cumulative distribution function from the probabilities and work from there. But there is an even simpler way using Enumerable#max_by and a remarkable result due to Efraimidis and Spirakis.
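
For comparison, the cumulative-distribution approach can be sketched like this. It's a minimal sketch, assuming normalized probabilities; the `cdf_roll` lambda name is just for illustration:

```ruby
die = [*?⚀..?⚅]
ps  = [0.05, 0.05, 0.1, 0.1, 0.2, 0.5]

# Build the running totals: [0.05, 0.1, 0.2, 0.3, 0.5, 1.0]
cdf = ps.each_with_object([]) { |p, acc| acc << p + (acc.last || 0.0) }

# Pick the first face whose cumulative probability exceeds a uniform draw.
# The fallback index guards against floating-point rounding in the last bin.
cdf_roll = -> { r = rand; die[cdf.index { |c| r < c } || die.size - 1] }

cdf_roll.call
# most often "⚅", since it carries half the probability mass
```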

Suppose we want to simulate our loaded die with the probabilities below:

ps = [0.05, 0.05, 0.1, 0.1, 0.2, 0.5]

ps.reduce(:+)
# => 1.0

Note that our weights here sum to 1. In the event we have raw weights, we can normalize them like so:

weights = [5, 5, 10, 10, 20, 50]

ps = weights.map { |w| Float(w) / weights.reduce(:+) }
# => [0.05, 0.05, 0.1, 0.1, 0.2, 0.5]

Now we can represent our loaded die as a hash where each face is a key whose value is the probability of rolling it.

loaded_die = die.zip(ps).to_h
# => {"⚀"=>0.05, "⚁"=>0.05, "⚂"=>0.1, "⚃"=>0.1, "⚄"=>0.2, "⚅"=>0.5}

A single roll of the loaded die can be simulated like this:

loaded_die.max_by { |_, weight| rand ** (1.0 / weight) }.first
# => "⚅"

Weighted random sampling with replacement can be achieved like this:

wrs = -> (freq) { freq.max_by { |_, weight| rand ** (1.0 / weight) }.first }

5.times.map { wrs[loaded_die] }
# => ["⚅", "⚅", "⚁", "⚁", "⚁"]

10.times.map { wrs[loaded_die] }
# => ["⚀", "⚄", "⚀", "⚄", "⚅", "⚅", "⚁", "⚅", "⚁", "⚅"]

And to confirm that our loaded die behaves as we expect, we can inspect the frequency distribution as we did above:

freq = 100_000.times.each_with_object(Hash.new(0)) { |_, freq| freq[wrs[loaded_die]] += 1 }
# => {"⚅"=>49980, "⚁"=>5045, "⚄"=>20171, "⚂"=>9840, "⚀"=>5031, "⚃"=>9933}

freq.map { |face, roll_count| [face, roll_count / 100_000.0] }.to_h
# => {"⚅"=>0.4998, "⚁"=>0.05045, "⚄"=>0.20171, "⚂"=>0.0984, "⚀"=>0.05031, "⚃"=>0.09933}

If you found this anywhere near as intriguing as I did, I'd encourage you to peruse the links below. The paper that introduces the algorithm is a svelte four pages and well worth a read.

Further reading:

  1. Ruby-Doc for Enumerable#max_by — specifically the wsample example
  2. Weighted Random Sampling by Efraimidis and Spirakis (2005) which introduces the algorithm
  3. New features for Array#sample, Array#choice which mentions the intention of adding weighted random sampling to Array#sample and reintroducing Array#choice for sampling with replacement.

@mikegee commented Apr 17, 2015:

5.times.map { die.sample } is the simpler version of:
5.times.each_with_object([]) { |_, arr| arr << die.sample }

@O-I (owner) commented May 22, 2015:

@mikegee Excellent point. Updated. Thank you! 😊

@MatiasFMolinari commented Oct 13, 2015:

These examples are awesome! Thank you for this.

@O-I (owner) commented Oct 19, 2015:

@MatiasFMolinari Thanks. Glad it was of use. 😊

@leoebrown commented Oct 6, 2016:

Thank you for pointing me to #zip! This is cool. @O-I, do you think there are some situations where it is adequate to simply duplicate elements in the array in proportion to their weight and then draw a random sample?

For example, I need to select one food_id from an inventory array, but I want a user's favorite foods to be more likely. So I just take an array of the favorites' food_ids, add it to the inventory a few times, and voilà! It's weighted.

I think this method could be problematic if the array becomes too large (because it's large in the first place, because favorites are weighted heavily, because there are many favorites, or some combination), or if I need extremely precise weights, which would likely result in a large and/or clumsily built array. But maybe my method would work for a simpler case where precision is not too important, or when I'm assured the numbers won't get large enough to slow the performance of sample. What do you think?

Edit: Maybe I've answered my own question: my method is only convenient if you are really unconcerned with exactly what the weights are, or if your weights happen to be nice round numbers.
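
The duplication approach described above can be sketched as follows. The food IDs and weights here are hypothetical, and the weights must be small non-negative integers:

```ruby
# Repeat each element in proportion to its integer weight,
# then sample uniformly from the expanded pool.
inventory = { "apple" => 1, "pizza" => 3, "kale" => 1 }  # hypothetical weights

pool = inventory.flat_map { |food_id, weight| [food_id] * weight }
# => ["apple", "pizza", "pizza", "pizza", "kale"]

pool.sample  # "pizza" is three times as likely as "apple" or "kale"
```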

@dlains commented Oct 20, 2016:

Exactly what I needed! Thank you!

@seancmonahan commented Sep 21, 2017:

The weights do not in fact have to sum to 1. As long as they are all non-negative, the algorithm described by Efraimidis and Spirakis (which you've implemented here) produces the same results as if the weights had been normalized.

Thanks for the gist!

die = ["a05", "b05", "c10", "d10", "e20", "f50"]
weights = [5, 5, 10, 10, 20, 50]
loaded_die = die.zip(weights).to_h
wrs = -> (freq) { freq.max_by { |_, weight| rand ** (1.0 / weight) }.first }
freq = 100_000.times.each_with_object(Hash.new(0)) { |_, freq| freq[wrs[loaded_die]] += 1 }
freq.map { |face, roll_count| [face, roll_count / 100_000.0] }.to_h

=> {"f50"=>0.50023, "e20"=>0.19908, "c10"=>0.0997, "d10"=>0.10042, "b05"=>0.04986, "a05"=>0.05071}

@O-I (owner) commented Mar 28, 2018:

@seancmonahan Oh. That is very interesting. I did not know that! Thanks for sharing.
