@josephsdavid
Last active May 26, 2019 22:42
Example R for DDS

Example for homework 3

Making a MWE (minimal working example)

test <- mtcars
test[2,] <- NA
head(test)
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag       NA  NA   NA  NA   NA    NA    NA NA NA   NA   NA
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Now let's figure out some ways to grab three columns. We can do this in at least three ways :)

Grabbing three columns in base R

head(test[c('mpg','disp','hp')])
#                    mpg disp  hp
# Mazda RX4         21.0  160 110
# Mazda RX4 Wag       NA   NA  NA
# Datsun 710        22.8  108  93
# Hornet 4 Drive    21.4  258 110
# Hornet Sportabout 18.7  360 175
# Valiant           18.1  225 105

Grabbing three columns using [row, column] indexing in base R

head(test[,c('mpg','disp','hp')])
#                    mpg disp  hp
# Mazda RX4         21.0  160 110
# Mazda RX4 Wag       NA   NA  NA
# Datsun 710        22.8  108  93
# Hornet 4 Drive    21.4  258 110
# Hornet Sportabout 18.7  360 175
# Valiant           18.1  225 105
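
As a quick check, the two base R forms above return the same data frame, and numeric column positions work as well. This is only a minimal sketch: the positions 1, 3 and 4 assume the standard column order of mtcars, and the names by_name, by_matrix and by_position are made up for illustration.

```r
test <- mtcars
test[2, ] <- NA

# List-style indexing: a single argument selects columns by name
by_name <- test[c('mpg', 'disp', 'hp')]

# Matrix-style indexing: empty row slot, columns after the comma
by_matrix <- test[, c('mpg', 'disp', 'hp')]

# Numeric positions (assumes mpg, disp, hp are columns 1, 3 and 4)
by_position <- test[, c(1, 3, 4)]

# Compare the three results
identical(by_name, by_matrix)
identical(by_name, by_position)
```

One difference to keep in mind: with a single column, test[, 'mpg'] drops to a plain vector, while test['mpg'] stays a data frame.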

Bonus: Using dplyr %>% pipes %>%

library(dplyr)
test %>% select(mpg,disp,hp) %>% head
#                    mpg disp  hp
# Mazda RX4         21.0  160 110
# Mazda RX4 Wag       NA   NA  NA
# Datsun 710        22.8  108  93
# Hornet 4 Drive    21.4  258 110
# Hornet Sportabout 18.7  360 175
# Valiant           18.1  225 105

Now let's do the lapply statement (using sapply, a variant of lapply that simplifies the result to a named vector) and remove some NAs. We can do this in several ways:

Modifying the object

We can remove the NAs by modifying the data itself, as in the following example with base R:

cleantest <- na.omit(test)
sapply(cleantest[,c('mpg','disp','hp')],mean)
#       mpg      disp        hp 
#  20.06129 233.00323 147.87097 

The same works with dplyr (which we will learn about soon). In both cases we modify the data first and then apply the mean function to it.

test %>% select(mpg,disp,hp) %>% na.omit %>% sapply(mean)
#       mpg      disp        hp 
#  20.06129 233.00323 147.87097 
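
The text mentions lapply while the base R example uses sapply, so it may help to see how the two differ: lapply always returns a list, while sapply simplifies the result to a named vector when it can. A minimal sketch, where cols, as_list and as_vector are illustrative names only:

```r
test <- mtcars
test[2, ] <- NA
cols <- na.omit(test)[, c('mpg', 'disp', 'hp')]

# lapply always returns a list, one element per column
as_list <- lapply(cols, mean)
class(as_list)

# sapply simplifies that list to a named numeric vector when possible
as_vector <- sapply(cols, mean)
class(as_vector)

# Flattening the list gives the same values as sapply
all.equal(unlist(as_list), as_vector)
```

If a list is what you need downstream, lapply is the safer choice; sapply is convenient when you just want to print or compare the numbers.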

Modifying the function

Instead of modifying the data, we may want to modify the function. Rather than cleaning our data and applying the default mean function to it, we can change how mean is called and apply it to the unmodified data. Two of the many ways to do this are shown below:

Base R

sapply(test[,c('mpg','disp','hp')], mean, na.rm = TRUE)
#       mpg      disp        hp 
#  20.06129 233.00323 147.87097 

Bonus dplyr method:

test %>% select(mpg,disp,hp) %>% sapply(mean, na.rm = TRUE)
#       mpg      disp        hp 
#  20.06129 233.00323 147.87097 
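
The two approaches give the same answer here only because the whole second row is NA. In general, na.omit drops every row that contains any NA, whereas na.rm = TRUE skips missing values column by column. A small sketch with a hypothetical data frame df (not part of the homework data) shows the difference:

```r
# Hypothetical data: each column has an NA in a different row
df <- data.frame(a = c(1, NA, 3), b = c(10, 20, NA))

# Modifying the data: na.omit keeps only complete rows (row 1 here)
sapply(na.omit(df), mean)
#  a  b 
#  1 10 

# Modifying the function: each column drops only its own NAs
sapply(df, mean, na.rm = TRUE)
#  a  b 
#  2 15 
```

Which one is right depends on the question: use na.omit when every statistic should come from the same set of rows, and na.rm = TRUE when each column should use all the values it has.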