Created December 13, 2017 19:55
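You could use min-max normalization, which rescales each column to the [0, 1] range. It is not the same thing as KNN (k-nearest neighbors), but it is a common preprocessing step before distance-based methods such as KNN: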
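```r
# Vacation days per employee
vacdays <- c(21, 14, 7)
# Working days since each employee was hired
dayshired <- c(260, 520, 1040)
# check.names = TRUE (the default) turns "Working Days since hired"
# into the column name Working.Days.since.hired
df <- data.frame("VacationDays" = vacdays, "Working Days since hired" = dayshired, stringsAsFactors = FALSE)

# Min-max normalization: rescale each column to [0, 1]
# (gives NaN for a constant column, because max(x) - min(x) is 0)
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
dfKNN <- as.data.frame(lapply(df, normalize))
# Equivalent here: select the columns by position, e.g. df[1:2]
dfKNN <- as.data.frame(lapply(df[1:2], normalize))
```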
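To sanity-check the result by hand: VacationDays runs from 7 to 21, so 14 maps to (14 - 7) / (21 - 7) = 0.5. The sketch below recomputes the example and adds a guard for a column whose values are all equal, where `normalize()` would divide 0 by 0. The helper name `normalize_safe` and the choice to map a constant column to zeros are assumptions for illustration, not part of the original snippet.

```r
# Example data from above
vacdays <- c(21, 14, 7)
dayshired <- c(260, 520, 1040)
df <- data.frame("VacationDays" = vacdays, "Working Days since hired" = dayshired)

# Hypothetical helper (not in the original): same as normalize(), but a
# constant column is mapped to all zeros instead of NaN (0 / 0)
normalize_safe <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) return(rep(0, length(x)))
  (x - min(x)) / rng
}

dfMinMax <- as.data.frame(lapply(df, normalize_safe))
dfMinMax$VacationDays       # 1.0 0.5 0.0
dfMinMax[[2]]               # 0.0000000 0.3333333 1.0000000
normalize_safe(c(5, 5, 5))  # 0 0 0
```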
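I would prefer z-score standardization when outliers are present. Min-max divides by the range, which a single extreme value sets on its own, so the remaining values get squeezed into a narrow band of [0, 1]. Z-scores divide by the standard deviation, to which every value contributes, so one outlier compresses the other values less (mean and standard deviation are still pulled by outliers, so this is not fully outlier-robust either):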
```r
# Z-score standardization: (x - mean(x)) / sd(x) for each column
dfZScore <- as.data.frame(scale(df[1:2]))
```
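`scale()` centers each column on its mean and divides by its sample standard deviation (`sd()`, denominator n - 1). The sketch below reproduces that by hand, then uses a made-up column (1 to 20, then the same column with its last value replaced by 1000) to compare how much each method's divisor grows because of one outlier. The helper `zscore` and the illustrative numbers are assumptions, not data from the original.

```r
# Example data from above
vacdays <- c(21, 14, 7)
dayshired <- c(260, 520, 1040)
df <- data.frame("VacationDays" = vacdays, "Working Days since hired" = dayshired)

# z-score by hand: subtract the column mean, divide by the sample sd (n - 1)
zscore <- function(x) (x - mean(x)) / sd(x)
dfZManual <- as.data.frame(lapply(df, zscore))
dfZManual$VacationDays  # 1 0 -1 (mean 14, sd 7)

# Should match scale(); compare values only, ignoring attributes
dfZScore <- as.data.frame(scale(df))
all.equal(as.matrix(dfZManual), as.matrix(dfZScore), check.attributes = FALSE)  # TRUE

# Illustrative (made-up) column, without and with a single outlier
clean   <- 1:20
outlier <- c(1:19, 1000)

# Growth of each method's divisor caused by the outlier
range_growth <- diff(range(outlier)) / diff(range(clean))  # min-max divides by the range
sd_growth    <- sd(outlier) / sd(clean)                    # z-score divides by the sd
c(range_growth = range_growth, sd_growth = sd_growth)
```

With these numbers the range grows by a factor of roughly 53 and the standard deviation by roughly 37, so the ordinary values keep more of their spread under z-scores. For a single outlier the gap generally widens as the column gets longer, since each value carries less weight in the standard deviation. Both methods are linear rescalings, though, so neither removes the outlier's influence entirely.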