@mrdwab
Created May 21, 2011 17:06
R stratified random sampling from a data frame
stratified = function(df, group, size) {
  # USE: * Specify your data frame and grouping variable (as column
  #        number) as the first two arguments.
  #      * Decide on your sample size. For a sample proportional to the
  #        population, enter "size" as a decimal. For an equal number
  #        of samples from each group, enter "size" as a whole number.
  #
  # Example 1: To sample 10% of each group from a data frame named "z",
  #            where the grouping variable is the fourth variable, use:
  #
  #   > stratified(z, 4, .1)
  #
  # Example 2: To sample 5 observations from each group from a data frame
  #            named "z", where the grouping variable is the third variable, use:
  #
  #   > stratified(z, 3, 5)
  #
  library(sampling)  # provides strata() and getdata(); fails loudly if missing
  # Sort by the grouping column so that the per-group sizes below, which
  # follow table() order, line up with the order of the groups in the data.
  temp = df[order(df[[group]]), ]
  if (size < 1) {
    # Proportional: take this fraction of each group, rounded up
    size = ceiling(table(temp[[group]]) * size)
  } else {
    # Fixed: take the same number of rows from every group
    size = rep(size, times = length(table(temp[[group]])))
  }
  # Simple random sampling without replacement ("srswor") within each group
  strat = strata(temp, stratanames = names(temp[group]),
                 size = size, method = "srswor")
  getdata(temp, strat)
}
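
To make the two meanings of "size" concrete, here is a minimal base-R sketch of the same idea. It is not part of the gist and does not use the sampling package: the name stratified_base and the toy data frame z are assumptions made for illustration only, and the returned rows lack the extra columns that getdata() adds.

```r
# Hypothetical helper (not part of the gist): stratified sampling in base R.
stratified_base <- function(df, group, size) {
  # Split the row indices by the values of the grouping column
  idx_by_group <- split(seq_len(nrow(df)), df[[group]])
  picked <- lapply(idx_by_group, function(idx) {
    # size < 1: a proportion of the group (rounded up); otherwise a fixed count
    n <- if (size < 1) ceiling(length(idx) * size) else size
    # Index into idx instead of calling sample(idx, n), which would
    # sample from 1:idx when a group has only one row
    idx[sample.int(length(idx), n)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

set.seed(1)  # only to make the illustration repeatable
# Toy data: three groups with 8, 20 and 40 rows
z <- data.frame(id  = 1:68,
                grp = rep(c("a", "b", "c"), times = c(8, 20, 40)))

table(stratified_base(z, 2, 0.25)$grp)  # 25% of each group: 2, 5 and 10 rows
table(stratified_base(z, 2, 5)$grp)     # 5 rows from each group
```

As in the function above, sampling is without replacement, so a whole-number "size" larger than the smallest group makes sample.int() stop with an error.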

mrdwab commented Mar 15, 2012

If you want to use this, you can copy and paste the function above, or you can use the following:

library(RCurl)  # getURL() fetches the raw gist over HTTPS
temp = getURL("https://raw.github.com/gist/984691/fb8e0483b093caa871444db162ed11210a1bac5b/Stratified.R")
source(textConnection(temp))  # evaluates the downloaded text, defining stratified()


mrdwab commented Sep 29, 2014
