test <- mtcars
test[2, ] <- NA
head(test)
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag       NA  NA   NA  NA   NA    NA    NA NA NA   NA   NA
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Now let's figure out how to grab three columns. We can do this in at least three ways :)
head(test[c('mpg', 'disp', 'hp')])
#                    mpg disp  hp
# Mazda RX4         21.0  160 110
# Mazda RX4 Wag       NA   NA  NA
# Datsun 710        22.8  108  93
# Hornet 4 Drive    21.4  258 110
# Hornet Sportabout 18.7  360 175
# Valiant           18.1  225 105
head(test[, c('mpg', 'disp', 'hp')])
#                    mpg disp  hp
# Mazda RX4         21.0  160 110
# Mazda RX4 Wag       NA   NA  NA
# Datsun 710        22.8  108  93
# Hornet 4 Drive    21.4  258 110
# Hornet Sportabout 18.7  360 175
# Valiant           18.1  225 105
library(dplyr)
test %>% select(mpg, disp, hp) %>% head()
#                    mpg disp  hp
# Mazda RX4         21.0  160 110
# Mazda RX4 Wag       NA   NA  NA
# Datsun 710        22.8  108  93
# Hornet 4 Drive    21.4  258 110
# Hornet Sportabout 18.7  360 175
# Valiant           18.1  225 105
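The two base R forms give the same data frame when several columns are requested, but they behave differently for a single column: with a comma, the result drops to a plain vector unless drop = FALSE is supplied. The following is a minimal sketch of that difference. It re-creates test so that it runs on its own, and the names cols, same_multi, one_df, one_vec and one_keep are only illustrative.

```r
# Re-create the example data so this block is self-contained
test <- mtcars
test[2, ] <- NA

cols <- c('mpg', 'disp', 'hp')

# List-style and matrix-style indexing agree when several columns are selected
same_multi <- identical(test[cols], test[, cols])

# With a single column, matrix-style indexing drops to a vector by default
one_df   <- test['mpg']                  # data frame with one column
one_vec  <- test[, 'mpg']                # plain numeric vector
one_keep <- test[, 'mpg', drop = FALSE]  # data frame again

same_multi
class(one_df)
class(one_vec)
class(one_keep)
```

This matters for the next step, because sapply over a data frame works column by column, while sapply over a plain vector works element by element.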
Now let's apply the mean function to each of these columns with sapply and deal with the NAs. We can do this in several ways.
We can remove the NAs by modifying the data itself, as in the following example with base R:
cleantest <- na.omit(test)
sapply(cleantest[, c('mpg', 'disp', 'hp')], mean)
#       mpg      disp        hp
#  20.06129 233.00323 147.87097
The same approach works in a dplyr pipeline (we will learn about dplyr soon). In both cases we modify the data first and then apply the mean function to it.
test %>% select(mpg, disp, hp) %>% na.omit() %>% sapply(mean)
#       mpg      disp        hp
#  20.06129 233.00323 147.87097
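The choice between lapply and sapply changes the shape of the result. lapply always returns a list with one element per column, while sapply simplifies that list to a named vector when it can, which is the form printed above. Here is a small base R sketch of the difference. The names clean, as_list and as_vector are only illustrative.

```r
# Re-create the example data so this block is self-contained
test <- mtcars
test[2, ] <- NA
clean <- na.omit(test[, c('mpg', 'disp', 'hp')])

as_list   <- lapply(clean, mean)  # list with one element per column
as_vector <- sapply(clean, mean)  # simplified to a named numeric vector

class(as_list)
class(as_vector)

# unlist() turns the lapply result into the same named vector
identical(unlist(as_list), as_vector)
```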
Instead of modifying the data, we can modify the function. Rather than removing the NAs first and then applying the default mean function, we leave the data as it is and pass na.rm = TRUE so that mean ignores the NAs itself. Two of the many ways to do this are shown below:
sapply(test[, c('mpg', 'disp', 'hp')], mean, na.rm = TRUE)
#       mpg      disp        hp
#  20.06129 233.00323 147.87097
test %>% select(mpg, disp, hp) %>% sapply(mean, na.rm = TRUE)
#       mpg      disp        hp
#  20.06129 233.00323 147.87097
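The two strategies give the same numbers here only because the NAs fill one entire row. na.omit drops every row that contains any NA, while na.rm = TRUE drops NAs separately within each column, so the results can differ when NAs are scattered. The sketch below uses a hypothetical variant of the data (scattered, which is not part of the example above) to illustrate this.

```r
# Hypothetical variant of the data: NAs in different rows of different columns
scattered <- mtcars[, c('mpg', 'disp', 'hp')]
scattered[2, 'mpg']  <- NA
scattered[3, 'disp'] <- NA

# na.omit() drops rows 2 and 3 entirely, so every mean uses the same 30 rows
by_row <- sapply(na.omit(scattered), mean)

# na.rm = TRUE drops NAs column by column:
# 31 values for mpg and disp, all 32 values for hp
by_col <- sapply(scattered, mean, na.rm = TRUE)

nrow(na.omit(scattered))
by_row
by_col
```

Which one is appropriate depends on whether the columns need to be summarised over the same set of rows.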