@justmytwospence
Last active December 31, 2015 04:59
I was surprised to find that R doesn't have a base function for stratified random sampling. I couldn't even find a well-known package that does this in a straightforward way, so here's my own. It is essentially a wrapper for a ddply call that samples each subset and then combines them. If the size argument is less than 1, it will be int…
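stratified_sample <- function(df, size = .5, .by, seed = 37L) {
  # ddply() comes from plyr; fail early with a clear message if it is missing.
  if (!requireNamespace("plyr", quietly = TRUE)) stop("Package 'plyr' is required.")
  set.seed(seed)  # fixed seed so the same call returns the same sample
  df.sample <- plyr::ddply(df, .by,
    function(x) {
      # size < 1: fraction of each stratum (rounded down).
      # Otherwise: row count per stratum, capped at the stratum size.
      n <- if (size < 1) floor(size * nrow(x)) else min(size, nrow(x))
      x[sample.int(nrow(x), size = n), , drop = FALSE]
    },
    .progress = 'text')
  return(df.sample)
}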
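For comparison, the same split, sample, and combine idea can be sketched in base R without plyr. This is only an illustration and not part of the gist. The name `stratified_sample_base` and its `by` argument are hypothetical, and the built-in `iris` data set stands in for real data. As assumed here, a `size` below 1 is treated as a fraction of each stratum and anything else as a per-stratum row count, capped at the stratum size.

```r
# Hypothetical base-R counterpart of stratified_sample() (not from the gist).
stratified_sample_base <- function(df, size = 0.5, by, seed = 37L) {
  set.seed(seed)  # fixed seed so repeated calls give the same sample
  # Split the row indices of df by the stratum column(s) named in `by`.
  groups <- split(seq_len(nrow(df)), df[by], drop = TRUE)
  picked <- lapply(groups, function(idx) {
    # Fraction of the stratum if size < 1, otherwise a capped row count.
    n <- if (size < 1) floor(size * length(idx)) else min(size, length(idx))
    idx[sample.int(length(idx), size = n)]
  })
  # Combine the sampled indices, keeping the original row order of df.
  df[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

# iris has 50 rows per species, so half of each stratum is 25 rows.
half <- stratified_sample_base(iris, size = 0.5, by = "Species")
table(half$Species)

# A size of 1 or more is read as a row count per stratum: 5 rows x 3 species.
five <- stratified_sample_base(iris, size = 5, by = "Species")
nrow(five)
```

Unlike the ddply version, this sketch returns rows in their original order rather than grouped by stratum, and it keeps the original row names.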
@RanaivosonHerimanitra

Hi, interesting. Here is my own version (Optimal SRS) using the 'data.table' package:
https://gist.github.com/RanaivosonHerimanitra/8470213
