The creation of random numbers, or the random selection of elements in a set (or population), is an important part of statistics and data science. From simulating coin tosses to selecting potential respondents for a survey, we have a heavy reliance on random number generation.
R offers us a variety of solutions for random number generation; here's a quick overview of some of the options.
One simple solution is to use the runif function, which generates a stated number of values between two end points (but not the end points themselves!). The function uses the continuous uniform distribution, meaning that every value between the two end points has an equal probability of being sampled.
Here's the code to produce 100 values between 1 and 100, and then print them.
RandomNumbers <- runif(100, 1, 100)
RandomNumbers
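As a quick check on the point about end points, here is a minimal sketch that draws a larger sample and confirms every value falls strictly inside the interval. The object names and the seed value are my own choices for illustration. It also shows one common way to turn the continuous draws into whole numbers, although for integers the sample function covered further down is usually the more direct route.

```r
set.seed(42)  # arbitrary seed, used only to make this sketch repeatable

# draw 1,000 values from the continuous uniform distribution on (1, 100)
UniformCheck <- runif(1000, min = 1, max = 100)

# the smallest and largest draws should sit strictly inside the end points
range(UniformCheck)
all(UniformCheck > 1 & UniformCheck < 100)

# rounding down turns the draws into integers from 1 to 99
UniformIntegers <- floor(UniformCheck)
head(table(UniformIntegers))
```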
R helpfully has random generators from a plethora of distributions (see http://cran.r-project.org/web/views/Distributions.html under the heading "Random Number Generators"). For example, the equivalent function to pull random numbers from the binomial distribution is rbinom. In the following example, the code generates 100 iterations of a single trial where there's a 0.5 (50/50) probability of success -- as you would get with one hundred coin tosses. So let's call the object OneHundredCoinTosses. The table function then gives us a count of the zeros and ones in the object.
OneHundredCoinTosses <- rbinom(100, 1, 0.5)
OneHundredCoinTosses
table(OneHundredCoinTosses)
In this variant, we'll toss our coin again, but this time it will be 100 iterations of 10 trials each. R will generate the number of successes in each set of 10 trials. A histogram would show how, with enough iterations, we'd get something that looks very much like a normal distribution curve.
OneHundredCoinTrials <- rbinom(100, 10, 0.5)
OneHundredCoinTrials
table(OneHundredCoinTrials)
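To see the histogram mentioned above, here's a rough sketch that bumps the number of iterations up to 10,000. The object name, the seed, and the plot labels are assumptions of mine, not part of the earlier code.

```r
set.seed(42)  # arbitrary seed for repeatability

# 10,000 iterations of 10 coin tosses; each value is a count of successes
ManyCoinTrials <- rbinom(10000, 10, 0.5)

# the average should be close to size * prob = 10 * 0.5 = 5
mean(ManyCoinTrials)

# histogram with one bar per possible outcome (0 through 10)
hist(ManyCoinTrials, breaks = seq(-0.5, 10.5, by = 1),
     main = "10,000 iterations of 10 coin tosses",
     xlab = "Number of successes")
```

Setting the breaks at the half-integers keeps each possible count in its own bar, which makes the bell shape easier to see.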
And there's rnorm for the normal distribution. In this case, the second number in the function is the mean and the third is the standard deviation. With this example, the code generates 100 values from a normal distribution with a mean of 50 and a standard deviation of 12.5.
RandomNormal <- rnorm(100, 50, 12.5)
RandomNormal
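A simple sanity check, sketched here with a larger sample and named arguments (the object name and seed are mine), is to compare the sample mean and standard deviation against the values that were requested.

```r
set.seed(42)  # arbitrary seed for repeatability

# 10,000 draws, with the mean and standard deviation named explicitly
RandomNormalLarge <- rnorm(10000, mean = 50, sd = 12.5)

# both should land close to the requested 50 and 12.5
mean(RandomNormalLarge)
sd(RandomNormalLarge)

# share of values within one standard deviation of the mean;
# for a normal distribution this should be roughly 68%
mean(abs(RandomNormalLarge - 50) <= 12.5)
```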
Another approach to randomization is the sample function, which pulls elements from an object (such as a vector) of defined values or, alternatively, can be specified to select cases from a sequence of integers. The function also has the option of specifying whether replacement will be used or not. (See http://cran.r-project.org/web/packages/sampling/index.html)
In the first example of sample, we'll generate 100 values (the second value specified in the function) from the integers between 1 and 99 (the first value specified), with replacement -- so there's a possibility of duplicates. The code adds the sort function so that we can easily spot the duplicates.
RandomSample <- sort(sample(99, 100, replace=TRUE))
RandomSample
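Rather than spotting duplicates by eye, you can ask R to find them. This is only a sketch (the object name and seed are my own), but it's worth noting that with 100 draws from just 99 possible values, at least one duplicate is guaranteed.

```r
set.seed(42)  # arbitrary seed for repeatability

DuplicateCheck <- sort(sample(99, 100, replace = TRUE))

# TRUE if any value appears more than once
anyDuplicated(DuplicateCheck) > 0

# the values that were drawn more than once
unique(DuplicateCheck[duplicated(DuplicateCheck)])

# how many of the 99 possible integers were drawn at least once
length(unique(DuplicateCheck))
```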
In a second example, we'll generate 5 values (the second value specified) from a list of 13 names that we predefine, without replacement. Note that the default setting in sample is "without replacement", so there should be no duplicates.
# the list of party-goers
dwarves <- c("Fíli", "Kíli", "Balin", "Dwalin", "Óin", "Glóin", "Bifur", "Bofur", "Bombur", "Ori", "Nori", "Dori", "Thorin")
# draw a sorted sample of 5 without replacement
Party <- sort(sample(dwarves, 5))
# print the names
Party
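One consequence of the "without replacement" default is that you can't ask for more elements than the object contains. Here's a small sketch of that behaviour, using a shortened, made-up guest list rather than the full list above; tryCatch is used so the error message is captured instead of stopping the script.

```r
guests <- c("Balin", "Dwalin", "Thorin")  # shortened list, for illustration only

# asking for all 3 names without replacement simply shuffles the list
sample(guests, 3)

# asking for 4 names without replacement is an error;
# tryCatch captures the message rather than halting
TooMany <- tryCatch(sample(guests, 4),
                    error = function(e) conditionMessage(e))
TooMany

# with replacement the same request works, and a repeat is unavoidable
sample(guests, 4, replace = TRUE)
```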
Another variant, this time sampling with replacement from a vector of scores and writing the result to a CSV file:
# RANDOM NUMBER GENERATION FOR SAMPLE
set.seed(2357)
#
# the list of possible scores
scores <- c(0, 25, 50, 75, 100)
# draw a sorted sample of 100 with replacement
RandomList <- sort(sample(scores, 100, replace = TRUE))
# print the list
RandomList
write.csv(RandomList, "RandomList.csv")
There is also the sample.int variant which insists on integers for the values. Here's the code to randomly select 6 numbers between 1 and 49, without replacement.
six49numbers <- sort(sample.int(49, 6, replace=FALSE))
six49numbers
It sounds like an oxymoron -- how can you control something that is random? The answer is that in many computer programs and programming languages, R included, many of the functions that are dubbed random number generation really aren't. I won't get into the arcana, but runif (and its ilk) and sample all rely on pseudo-random approaches, methods that are close enough to being truly random for most purposes. (If you want to investigate this further in the context of R, I suggest starting with John Ramey's post at http://www.r-bloggers.com/pseudo-random-vs-random-numbers-in-r-2/ )
With the set.seed command, an integer is used to start a random number generation, allowing the same sequence of "random" numbers to be selected repeatedly. In this example, we'll use the code written earlier to sample 6 numbers between 1 and 49, and repeat it three times.
The first time through, set.seed will define the starting seed as 1, then for the second time through, the seed will be set to 13, leading to a different set of 6 numbers. The third iteration will reset the starting seed to 1, and the third sample set of 6 numbers will be the same as the first sample.
set.seed(1)
six49numbers <- sort(sample.int(49, 6))
six49numbers
set.seed(13)
six49numbers <- sort(sample.int(49, 6))
six49numbers
set.seed(1)
six49numbers <- sort(sample.int(49, 6))
six49numbers
The first and third draws contain the same 6 integers.
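If you'd rather have R confirm the match than compare the printed output by eye, a sketch along these lines stores each draw under its own name (the names are mine) and compares them with identical.

```r
set.seed(1)
FirstDraw <- sort(sample.int(49, 6))

set.seed(13)
SecondDraw <- sort(sample.int(49, 6))

set.seed(1)
ThirdDraw <- sort(sample.int(49, 6))

# same seed, same draw
identical(FirstDraw, ThirdDraw)

# different seeds, so a match here would be a coincidence
identical(FirstDraw, SecondDraw)
```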
Another control of the random number generation is RNGkind. This command defines the random number generation method, from an extensive list of methodologies. The default is Mersenne Twister (http://en.wikipedia.org/wiki/Mersenne_twister), and a variety of others are available.
The R documentation page on Random, with both set.seed and RNGkind, can be found here: http://stat.ethz.ch/R-manual/R-devel/library/base/html/Random.html
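Here's a minimal sketch of RNGkind in use. It records the current setting, switches to Wichmann-Hill (one of the alternatives listed on that documentation page; I picked it arbitrarily), and then switches back. The object name is my own.

```r
# RNGkind() with no arguments reports the generators currently in use
OriginalKind <- RNGkind()
OriginalKind

# switch the main generator to Wichmann-Hill and draw three values
RNGkind("Wichmann-Hill")
set.seed(1)
runif(3)

# restore the earlier setting; the first element is the main generator
RNGkind(OriginalKind[1])
set.seed(1)
runif(3)
```

The same seed is used both times, so any difference between the two sets of three values comes from the change of generator rather than from the seed.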
While the methods above are pseudo-random, there are methods available that generate truly random numbers. One is the service provided by random.org (http://www.random.org/).
The R package random (documentation here: http://cran.r-project.org/web/packages/random/) uses the random.org service to generate random numbers and return them into an R object. The functions in the package can return random integers, randomized sequences, and random strings, and have the flexibility to define the shape of the returned matrix (i.e. the number of columns).
It's worth noting that free users of random.org are confronted by daily limits to the volume of calls they can make to the service (paying customers don't have these limits).
Here's an example to generate 20 random numbers from random.org, defined as being between 100 and 999 (that is to say, three digit numbers) and present them in two columns.
# load random
if (!require(random)) install.packages("random")
library(random)
#
twentytruerandom <- randomNumbers(n=20, min=100, max=999, col=2, check=TRUE)
# note: the "check=" sets whether quota at server should be checked first
twentytruerandom
#
Paul Teetor's R Cookbook (O'Reilly, 2011) has a chapter on probability (Chapter 8) that includes good examples of various random number generation in R.
Jim Albert & Maria Rizzo, R by Example (Springer, 2012), Chapter 11 "Simulation Experiments" and Chapter 13 "Monte Carlo Methods", contain a variety of applications of random number generation using sample and rbinom to approximate and understand probability experiments.
For an in-depth look at random sampling in the context of survey design, see Thomas Lumley, Complex Surveys: A Guide to Analysis Using R (Wiley, 2010).
If you're interested in testing a random number generator, check out http://www.johndcook.com/blog/2010/12/06/how-to-test-a-random-number-generator-2/
Joseph Rickert's blog entry at blog.revolutionanalytics.com gives a good rundown of the applications and approach for parallel random number generation http://blog.revolutionanalytics.com/2013/06/intro-to-parallel-random-number-generation-with-revoscaler.html
-30-