@DavZim
Last active October 27, 2017 12:41
# set parameters for the dataset
possible_values <- c(1:10, 99, NA)
n_rows <- 1000
n_cols <- 10
col_names <- letters[1:n_cols]
# sample the values into a matrix; it is converted to a data.frame in the next step
mwe_matrix <- matrix(sample(possible_values, n_rows * n_cols, replace = TRUE),
                     nrow = n_rows, ncol = n_cols)
# convert the matrix into a data.frame
df <- as.data.frame(mwe_matrix)
names(df) <- col_names
head(df)
#    a  b  c  d  e  f  g  h  i  j
# 1  6  7  7  5  7 99  5  3 99  3
# 2  4 99  8 99 10  7 99  4 NA  8
# 3 NA  7  9 99  2  4  4  1  5  5
# 4  8  9 10  6  3  2  5 10  5  2
# 5  4  9 99 10  2  4 NA  5  7 99
# 6  1  2  7  3  5  5  2  6  5  3
# Mean per column
# wrong attempt 1: NAs are kept, so every column mean is NA
apply(df, 2, mean)
#  a  b  c  d  e  f  g  h  i  j
# NA NA NA NA NA NA NA NA NA NA
# wrong attempt 2: NAs are dropped, but the 99s still inflate the means
apply(df, 2, mean, na.rm = TRUE)
# a b c d e f g h i j
# 12.77802 13.66377 14.87859 14.09660 13.15812 12.59477 13.21288 13.84785 14.21098 13.08490
# define a mean function that drops certain values before averaging;
# it excludes 99 by default, but more values can be passed via exclude_nums
mean_fun <- function(x, exclude_nums = c(99)) {
  # NA %in% exclude_nums is FALSE, so NAs survive the filter and are removed by na.rm
  mean(x[!x %in% exclude_nums], na.rm = TRUE)
}
apply(df, 2, mean_fun)
# a b c d e f g h i j
# 5.338061 5.444709 5.600490 5.472793 5.511876 5.462264 5.562426 5.497579 5.671801 5.452581
# also exclude 10, 9, and 8
apply(df, 2, mean_fun, exclude_nums = c(99, 10, 9, 8))
# a b c d e f g h i j
# 3.968801 4.088235 3.851852 3.891798 3.957265 4.023333 4.061329 3.984321 4.015817 3.907216
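# An alternative sketch, not part of the original gist: instead of filtering
# inside the mean function, the placeholder values can be recoded to NA once,
# after which colMeans() with na.rm = TRUE gives the same kind of result.
# The helper name recode_to_na and the small toy data.frame are assumptions
# made up for illustration only.

```r
# hypothetical helper: replace every sentinel value (99 by default) with NA
recode_to_na <- function(data, sentinels = c(99)) {
  # data[] <- keeps the data.frame structure while replacing each column
  data[] <- lapply(data, function(col) replace(col, col %in% sentinels, NA))
  data
}

# small fixed example so the result can be checked by hand
toy <- data.frame(a = c(1, 2, 99, NA), b = c(99, 4, 6, 99))
toy_clean <- recode_to_na(toy)

# column means over the remaining values: a uses 1 and 2, b uses 4 and 6
col_means <- colMeans(toy_clean, na.rm = TRUE)
print(col_means)
```

# Recoding once may be convenient when several summaries (mean, median, sd)
# are needed, since each of them then only has to handle NA.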