@ttimbers
Created March 29, 2019 21:59
How between_SS / total_SS changes with sample size
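set.seed(123)  # fix the RNG seed so the run is reproducible
# n is the number of points drawn per cluster, so each fit uses 2 * n points
for (n in c(50, 200, 500, 1000, 2000, 10000)) {
  # two spherical Gaussian clusters (sd = 1) centred at (3, 3) and (-2, -2)
  x1 <- data.frame(x1 = rnorm(n, 3), x2 = rnorm(n, 3))
  x2 <- data.frame(x1 = rnorm(n, -2), x2 = rnorm(n, -2))
  x <- rbind(x1, x2)
  a <- kmeans(x, centers = 2)
  # share of total variation explained by the clustering, as a percentage
  print(paste0('Sample size: ', n, ' - between_SS / total_SS: ', round(100 * a$betweenss / a$totss, 2)))
}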
[1] "Sample size: 50 - between_SS / total_SS: 88.67"
[1] "Sample size: 200 - between_SS / total_SS: 86.34"
[1] "Sample size: 500 - between_SS / total_SS: 86.68"
[1] "Sample size: 1000 - between_SS / total_SS: 86.23"
[1] "Sample size: 2000 - between_SS / total_SS: 86.1"
[1] "Sample size: 10000 - between_SS / total_SS: 86.11"
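A rough back-of-the-envelope check on why the ratio settles near 86% rather than drifting with n. This is a sketch under one assumption that is not in the original snippet: that kmeans recovers the two generating groups almost exactly, which seems plausible here because the centres are about 7 standard deviations apart. The symbols mu_1, mu_2, m, B and W are introduced only for this illustration.

```latex
% Generating model in the code: equal-sized clusters, unit variance per coordinate
\mu_1 = (3, 3), \qquad \mu_2 = (-2, -2), \qquad
m = \tfrac{1}{2}(\mu_1 + \mu_2) = (0.5,\ 0.5)

% Expected between-cluster sum of squares per point
B = \lVert \mu_k - m \rVert^2 = 2.5^2 + 2.5^2 = 12.5 \quad (k = 1, 2)

% Expected within-cluster sum of squares per point (two coordinates, variance 1 each)
W = 1 + 1 = 2

% Large-sample value of between_SS / total_SS
\frac{B}{B + W} = \frac{12.5}{14.5} \approx 0.862
```

Both B and W are per-point quantities, so the number of points cancels out of the ratio. The values printed above for the larger samples sit close to 86.2%, and the remaining differences between runs look like ordinary sampling noise rather than a trend in n.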