@carlbfrederick
Last active January 11, 2019 19:48
dataMaid check function: deal with "key" variables that do not uniquely identify rows
#' dataMaid_isIDvar.R
#'
#' This function identifies "key" and similar variables (character or factor variables with
#' more than 49 unique values) and stops dataMaid::makeDataReport from creating
#' visualizations and/or other inappropriate summaries. Instead it outputs a table with minimal information
#' (see https://cran.rstudio.com/web/packages/dataMaid/vignettes/extending_dataMaid.html for inspiration).
#'
#' @examples
#' makeDataReport(toyData, output = "html",
#'                preChecks = c("isKey", "isSingular", "isSupported", "isIDvar"))
isIDvar <- function(v, nMax = NULL, ...) {
  out <- list(problem = FALSE, message = "")
  # is.character()/is.factor() always return a single logical, whereas
  # class(v) %in% ... has length > 1 for e.g. ordered factors and breaks if()
  if (is.character(v) || is.factor(v)) {
    # Check if there are more than 49 unique non-missing values
    nVals <- length(unique(v[!is.na(v)]))
    if (nVals > 49) {
      nMiss <- sum(is.na(v))
      out$problem <- TRUE
      out$message <- paste("This variable is an ID, Name, or other string-formatted variable with too many unique values to list.\n\n",
                           "-------------------------------------------------------\n",
                           "Feature Result\n",
                           "---------------------------- --------------------------\n",
                           # collapse so multi-class objects still yield one string
                           "Variable Type: ", paste(class(v), collapse = ", "), "\n\n",
                           "Number of missing obs.: ", nMiss, "(", round(nMiss / length(v) * 100, 2), "%)\n\n",
                           "Number of unique values: ", nVals, "\n\n",
                           "-------------------------------------------------------\n")
    }
  }
  return(out)
}
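library(dataMaid) # provides checkFunction() and makeDataReport()

# Register isIDvar as a dataMaid check function so it can be listed in preChecks
isIDvar <- checkFunction(isIDvar,
                         description = "Check for likely ID vars that don't uniquely ID the rows.",
                         classes = c("character", "factor"))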
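
# Illustrative sketch, not part of the original check: the unique-value rule
# used by isIDvar() tried on made-up vectors, in base R only.
# exceedsUniqueThreshold(), ids and grp are hypothetical names introduced here;
# the threshold of 49 is the one hard-coded in isIDvar().
exceedsUniqueThreshold <- function(v, threshold = 49) {
  # Only character and factor vectors are considered, as in isIDvar()
  (is.character(v) || is.factor(v)) &&
    length(unique(v[!is.na(v)])) > threshold
}

ids <- c(sprintf("ID%03d", 1:60), NA)    # 60 distinct strings plus one NA
grp <- factor(rep(c("a", "b", "c"), 20)) # only 3 distinct levels

exceedsUniqueThreshold(ids)  # flagged: 60 unique non-missing values
exceedsUniqueThreshold(grp)  # not flagged: 3 unique values
exceedsUniqueThreshold(1:60) # not flagged: numeric vectors are skipped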