@TonyLadson
Created August 22, 2016 02:35
Function to count the censored values in each column of a data frame. It counts the values that contain '<' (left-censored) or '>' (right-censored).
library(stringr)  # provides str_detect()

Count_censored <- function(my.df) {

  # number left censored (values containing '<')
  .Count_leftCensored <- function(x) {
    sum(str_detect(x, '[<]'), na.rm = TRUE)
  }

  # number right censored (values containing '>')
  .Count_rightCensored <- function(x) {
    sum(str_detect(x, '[>]'), na.rm = TRUE)
  }

  # percentage censored; NA values are counted in the denominator
  .Pc_censored <- function(x) {
    100 * sum(str_detect(x, '[<>]'), na.rm = TRUE) / length(x)
  }

  # apply each helper to every column
  num_leftCensored <- unlist(lapply(my.df, .Count_leftCensored))
  num_rightCensored <- unlist(lapply(my.df, .Count_rightCensored))
  pc_censored <- unlist(lapply(my.df, .Pc_censored))
  pc_censored <- round(pc_censored, 2)

  # one row per column of my.df
  out <- data.frame(variable = names(my.df),
                    left_censored = num_leftCensored,
                    right_censored = num_rightCensored,
                    pc_censored = pc_censored,
                    stringsAsFactors = FALSE)

  # Only retain rows where at least one count is greater than zero
  out <- out[out$left_censored > 0 | out$right_censored > 0, ]
  row.names(out) <- NULL
  out
}
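To illustrate the kind of table the function returns, here is a minimal, self-contained sketch. The data frame `wq` and its columns are made up for illustration, and so is the helper `count_symbol`. So that it runs without stringr, the sketch redoes the counting with base R `grepl()` instead of calling `Count_censored()`. It is a rough equivalent under those assumptions, not the gist's own code.

```r
# Hypothetical water-quality data. Values are stored as text so that
# censored results such as "<0.01" or ">100" can be recorded.
wq <- data.frame(
  site     = c("A", "B", "C", "D"),
  nitrate  = c("<0.01", "0.05", "<0.01", NA),
  coliform = c("12", ">100", "35", "<1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count values containing a literal symbol.
# grepl() returns FALSE for NA, so missing values are not counted.
count_symbol <- function(x, symbol) sum(grepl(symbol, x, fixed = TRUE))

left  <- vapply(wq, count_symbol, integer(1), symbol = "<")
right <- vapply(wq, count_symbol, integer(1), symbol = ">")

# Percentage of values in each column containing '<' or '>'
# (NA values count in the denominator, as in .Pc_censored above).
pc <- vapply(wq, function(x) 100 * mean(grepl("[<>]", x)), numeric(1))

out <- data.frame(variable = names(wq),
                  left_censored = left,
                  right_censored = right,
                  pc_censored = round(pc, 2),
                  stringsAsFactors = FALSE)

# Drop columns with no censored values (here, 'site')
out <- out[out$left_censored > 0 | out$right_censored > 0, ]
row.names(out) <- NULL
print(out)
# nitrate:  2 left-censored, 0 right-censored, 50% censored
# coliform: 1 left-censored, 1 right-censored, 50% censored
```

With stringr installed, `Count_censored(wq)` should give the same rows for this input.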