@datalove
Last active August 29, 2015 14:08
Finds the Mahalanobis Distance for a set of columns
###################################################################
# Takes an arbitrarily long list of input columns (input1, input2, ...)
# and returns the squared Mahalanobis distance of each row. Rows with
# missing values get NA.
###################################################################
# create vector of inputs
inputs <- grep("^input[0-9]+$",ls(), value = TRUE)
# capture columns as a matrix (one column per input variable)
x <- sapply(inputs, function(y) get(y))
# find complete cases
cc <- complete.cases(x)
# keep only the complete rows (drop = FALSE preserves the matrix shape)
xcc <- x[cc, , drop = FALSE]
# vector of squared Mahalanobis distances, NA for incomplete rows
dists <- rep(NA, nrow(x))
dists[cc] <- mahalanobis(xcc, colMeans(xcc), cov(xcc))
# capture the output
output <- dists
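
The script above returns distances rather than an outlier flag. One common way to get a flag is to compare the distances with a chi-square quantile, since `mahalanobis()` returns squared distances, which are approximately chi-square distributed with `ncol(x)` degrees of freedom when the data are multivariate normal. The following is a minimal sketch under that assumption; the example data, the 0.975 cutoff, and the names `cutoff` and `is_outlier` are illustrative and not part of the original gist.

```r
# Hypothetical example data: three numeric columns with one planted
# outlier and one missing value (names and values are illustrative only).
set.seed(1)
x <- cbind(input1 = rnorm(100), input2 = rnorm(100), input3 = rnorm(100))
x[5, ]   <- c(8, -8, 8)  # planted outlier
x[10, 2] <- NA           # incomplete row

# Same steps as the script: distances for complete rows, NA elsewhere
cc  <- complete.cases(x)
xcc <- x[cc, , drop = FALSE]
dists <- rep(NA_real_, nrow(x))
dists[cc] <- mahalanobis(xcc, colMeans(xcc), cov(xcc))

# Assumed cutoff: 97.5% quantile of a chi-square distribution with
# ncol(x) degrees of freedom (a convention, not a rule from the gist).
cutoff <- qchisq(0.975, df = ncol(x))

# TRUE/FALSE per row; stays NA where the row had missing values
is_outlier <- dists > cutoff
```

Because the mean and covariance are estimated from the same rows being tested, extreme points can mask themselves. A robust covariance estimate may be preferable when many outliers are expected.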