R Interview Questions

1.) If I have a data.frame `df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))`...

1a.) How do I select the `c(4, 5, 6)`?

1b.) How do I select the `1`?

1c.) How do I select the `5`?

1d.) What is `df[, 3]`?

1e.) What is `df[1,]`?

1f.) What is `df[2, 2]`?

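Answers: (a) `df[[2]]` or `df$b`, (b) `df[[1]][[1]]` or `df$a[[1]]`, (c) `df[[2]][[2]]` or `df$b[[2]]`, (d) the third column as a vector, `7 8 9`, (e) the first row as a one-row data.frame with the values `1 4 7`, (f) `5`.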
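The selections above can be checked with a short sketch. The third column is named `c` here for convenience, which is an assumption; the variable names on the left are illustrative only.

```r
# Third column named `c` for convenience (an assumption).
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

col_b  <- df[[2]]        # whole column b as a numeric vector; same as df$b
first  <- df[[1]][[1]]   # first element of column a
middle <- df$b[[2]]      # second element of column b
third  <- df[, 3]        # third column, dropped to a plain vector
row1   <- df[1, ]        # first row, still a data.frame
cell   <- df[2, 2]       # row 2, column 2
```

Note that `df[, 3]` drops to a vector while `df[1, ]` stays a data.frame, because a row can mix column types.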

2.) What is the difference between a matrix and a dataframe?

Answer: A dataframe can hold heterogeneous column types and a matrix cannot. (The columns of a dataframe can be characters, integers, and even other dataframes, but every element of a matrix must be the same type.)
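A minimal sketch of the difference, using made-up `id` and `name` columns: building a matrix from mixed inputs coerces everything to one type, while a dataframe keeps each column's type.

```r
# cbind() builds a matrix, so the numbers are coerced to character.
m <- cbind(id = c(1, 2), name = c("a", "b"))

# data.frame() keeps one type per column.
# stringsAsFactors = FALSE keeps `name` as character on R versions before 4.0.
d <- data.frame(id = c(1, 2), name = c("a", "b"), stringsAsFactors = FALSE)

matrix_type <- typeof(m)      # "character"
id_class    <- class(d$id)    # "numeric"
name_class  <- class(d$name)  # "character"
```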

3a.) If I concatenate a number and a character together, what will the class of the resulting vector be?

3b.) What if I concatenate a number and a logical?

3c.) What if I concatenate a number and `NA`?

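Answers: (a) character (the number is coerced to a string), (b) numeric (`TRUE` becomes 1 and `FALSE` becomes 0), (c) numeric (`NA` takes on the type of the rest of the vector).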
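These coercion rules are easy to confirm directly. A small sketch, with illustrative variable names:

```r
num_chr <- c(1, "a")   # number is coerced to the string "1"
num_lgl <- c(1, TRUE)  # TRUE is coerced to 1
num_na  <- c(1, NA)    # NA becomes a numeric NA

class(num_chr)  # "character"
class(num_lgl)  # "numeric"
class(num_na)   # "numeric"
```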

4.) What is the difference between `sapply` and `lapply`? When should you use one versus the other? Bonus: When should you use `vapply`?

Answer: Use `lapply` when you want the output to be a list, and `sapply` when you want the output simplified to a vector or a matrix. Note that `sapply` silently falls back to a list when it cannot simplify. Generally `vapply` is preferred over `sapply` because you specify the type and length of each result, and `vapply` raises an error if a result does not match. The drawback is that `vapply` is more verbose.
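A short sketch of the three functions on the same input (the list `xs` is made up for illustration). The empty-input case at the end is one place where the type stability of `vapply` shows.

```r
xs <- list(a = c(1, 2, 3), b = c(4, 5, 6))

as_list <- lapply(xs, sum)    # a list with one sum per element
as_vec  <- sapply(xs, sum)    # simplified to a named numeric vector
as_mat  <- sapply(xs, range)  # each result has length 2, so a 2 x 2 matrix
checked <- vapply(xs, sum, FUN.VALUE = numeric(1))  # errors unless each result is one number

# On empty input, sapply returns an empty list but vapply keeps the declared type.
empty_s <- sapply(list(), sum)
empty_v <- vapply(list(), sum, FUN.VALUE = numeric(1))
```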

5.) What is the difference between `seq(4)` and `seq_along(4)`?

Answer: `seq(4)` produces a vector from 1 to 4 (`c(1, 2, 3, 4)`), whereas `seq_along(4)` produces one index per element of its argument. Since `length(4)` is 1, the result is `c(1)`.
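The distinction matters most when looping over a vector that may be empty. A brief sketch, with illustrative variable names:

```r
x     <- c(10, 20, 30)
empty <- numeric(0)

s1 <- seq(4)            # 1 2 3 4
s2 <- seq_along(4)      # 1, because the argument has one element
s3 <- seq_along(x)      # 1 2 3, one index per element
s4 <- seq_along(empty)  # integer(0), so a for loop over it runs zero times
s5 <- 1:length(empty)   # 1 0, a common pitfall: the loop would run twice
```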

6.) What is `f(3)` where:

```r
y <- 5
f <- function(x) { y <- 2; y^2 + g(x) }
g <- function(x) { x + y }
```

Why?

Answer: 12. In `f(3)`, `y` is 2, so `y^2` is 4. When evaluating `g(3)`, `y` is the globally scoped `y` (5) instead of the `y` that is locally scoped to `f`, so `g(3)` evaluates to 3 + 5, or 8. This is lexical scoping: `g` looks up `y` in the environment where `g` was defined, not where it was called. The rest is just 4 + 8, or 12.
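For contrast, here is a sketch of a variant in which the helper is defined inside the calling function. The names `f_nested` and `g_local` are hypothetical and not part of the question.

```r
y <- 5
g <- function(x) { x + y }
f <- function(x) { y <- 2; y^2 + g(x) }

# Hypothetical variant: g_local is defined inside f_nested,
# so its free variable y resolves to the local y (2).
f_nested <- function(x) {
  y <- 2
  g_local <- function(x) { x + y }
  y^2 + g_local(x)
}

original <- f(3)         # 4 + (3 + 5) = 12
nested   <- f_nested(3)  # 4 + (3 + 2) = 9
```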

7.) I want to know all the values in `c(1, 4, 5, 9, 10)` that are not in `c(1, 5, 10, 11, 13)`. How do I do that with one built-in function in R? How could I do it if that function didn't exist?

Answer: `setdiff(c(1, 4, 5, 9, 10), c(1, 5, 10, 11, 13))` and `c(1, 4, 5, 9, 10)[!c(1, 4, 5, 9, 10) %in% c(1, 5, 10, 11, 13)]`.
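Both approaches can be run side by side. In this sketch the vectors are stored as `a` and `b` (names chosen for illustration) to avoid repeating them.

```r
a <- c(1, 4, 5, 9, 10)
b <- c(1, 5, 10, 11, 13)

via_setdiff <- setdiff(a, b)  # 4 9
via_in      <- a[!a %in% b]   # 4 9
```

One difference worth knowing: `setdiff` also removes duplicates from its result, while the `%in%` version keeps them.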

8.) Can you write me a function in R that replaces all missing values of a vector with the mean of that vector?

Answer: `mean_impute <- function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x }`
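A quick usage sketch of this function. The input vector is made up; the function is repeated so the block runs on its own.

```r
mean_impute <- function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x }

# The mean of the non-missing values (1, 2, 6) is 3, so the NA becomes 3.
imputed <- mean_impute(c(1, 2, NA, 6))

# A vector with no missing values comes back unchanged.
untouched <- mean_impute(c(1, 2, 3))
```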

9.) How do you test R code? Can you write a test for the function you wrote in #8?

Answer: You can use Hadley's testthat package. A test might look like this:

```r
library(testthat)

test_that("it imputes the mean correctly", {
  # The mean of 1, 2, and 6 is 3, so the NA is replaced with 3.
  expect_equal(mean_impute(c(1, 2, NA, 6)), c(1, 2, 3, 6))
})
```

10.) Say I have...

`fn <- function(a, b, c, d, e) a + b * c - d / e`

How do I call `fn` on the vector `c(1, 2, 3, 4, 5)` so that I get the same result as `fn(1, 2, 3, 4, 5)`? (No need to tell me the result, just how to do it.)

Answer: `do.call(fn, as.list(c(1, 2, 3, 4, 5)))`
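A runnable sketch of this call. The names `args`, `via_do_call`, and `direct` are illustrative.

```r
fn <- function(a, b, c, d, e) a + b * c - d / e
args <- c(1, 2, 3, 4, 5)

# as.list() turns the vector into a list of five elements,
# and do.call() passes each element as a separate argument.
via_do_call <- do.call(fn, as.list(args))
direct      <- fn(1, 2, 3, 4, 5)
```

Calling `fn(args)` directly would not work, since the whole vector would be bound to `a` and the remaining arguments would be missing.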

11.)

```r
dplyr <- "ggplot2"
library(dplyr)
```

Why does the dplyr package get loaded and not ggplot2?

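Answer: `library` uses non-standard evaluation. By default it does not evaluate its argument; it captures the expression you typed and turns it into a string, in effect `deparse(substitute(dplyr))`, which gives `"dplyr"`. The value of the variable is never looked up. To load the package named by the variable, use `library(dplyr, character.only = TRUE)`.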
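The mechanism can be sketched without loading any package. Here `show_name` is a hypothetical helper written for illustration; it is not how `library` is implemented, only the same capture idea.

```r
# Hypothetical helper: returns the text of its argument instead of its value.
show_name <- function(pkg) deparse(substitute(pkg))

dplyr <- "ggplot2"

captured  <- show_name(dplyr)  # "dplyr": the symbol as typed
evaluated <- dplyr             # "ggplot2": the value of the variable

# With character.only = TRUE, library() evaluates its argument instead:
# library(dplyr, character.only = TRUE)  # would load ggplot2 (not run here)
```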

12.)

```r
mystery_method <- function(x) { function(z) Reduce(function(y, w) w(y), x, z) }
fn <- mystery_method(c(function(x) x + 1, function(x) x * x))
fn(3)
```

What is the value of `fn(3)`? Can you explain what is happening at each step?

Answer: 16. This is best seen in steps.

`fn(3)` requires `mystery_method` to be evaluated first.

`mystery_method(c(function(x) x + 1, function(x) x * x))` evaluates to...

`function(z) Reduce(function(y, w) w(y), c(function(x) x + 1, function(x) x * x), z)`

Now, we can see the 3 in `fn(3)` is bound to `z`, giving us...

`Reduce(function(y, w) w(y), c(function(x) x + 1, function(x) x * x), 3)`

This `Reduce` call is wonky, taking three arguments. The third argument to `Reduce` is `init`, the starting value, so the reduction initializes at 3.

The inner function, `function(y, w) w(y)` is meant to take an argument and a function and apply that function to the argument. Luckily for us, we have some functions to apply.

That means we initialize at 3 and apply the first function, `function(x) x + 1`. 3 + 1 = 4.

We then take the value 4 and apply the second function. 4 * 4 = 16.
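The intermediate values can be made visible with the `accumulate` argument of `Reduce`. In this sketch, `steps` and `trace` are illustrative names.

```r
steps <- c(function(x) x + 1, function(x) x * x)  # c() on functions gives a list

# Same call as in the question: start at 3, apply each function in turn.
final <- Reduce(function(y, w) w(y), steps, 3)

# accumulate = TRUE keeps every intermediate value, including the start.
trace <- unlist(Reduce(function(y, w) w(y), steps, 3, accumulate = TRUE))
# trace holds 3, 4, 16
```

In other words, `mystery_method` builds a pipeline: it takes a list of functions and returns a function that feeds its input through each of them, left to right.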

