Created September 27, 2013 20:24
My first stab at a basic R programming curriculum. I think teaching just these topics without overall motivating examples would be extremely boring, but if you're a self-taught R user, this might be useful to help spot your gaps.

Notes:

• I've tried to break this up into separate pieces, but it's not always possible: e.g. knowledge of data structures and subsetting are tightly intertwined.

• The level of Bloom's taxonomy is listed in square brackets (see e.g. http://bit.ly/15gqPEx). Few categories currently assess components higher in the taxonomy.

# Programming R curriculum

## Data structures

• basic data structures (vector, matrix, list and data frame):

• list and describe their differences (dimensionality, homogeneous vs. heterogeneous) [knowledge]

• pick the best data structure for a given problem [application]

• recall functions to coerce data structures between different forms [knowledge], and recognise which coercions are lossy [comprehension]

• match data types and the functions that identify them, and remember common gotchas (is.vector, is.numeric etc.) [comprehension]

• `str`:

• interpret the output of `str` [comprehension]

• use `str` and subsetting to extract desired pieces from an arbitrary object (for example, extract the r squared value from a linear model) [application]

• vectors:

• recognise which types of data correspond to the four common atomic vectors (character, double, integer, logical) [knowledge]

• recognise the use of `L` to create integer vectors [knowledge]

• create new vectors with `c()`, and correctly predict vector type when multiple types are mixed (e.g. what is the type of `c(1, 1L, F)`) [application]

• create named vectors with `c()`, recognise how named vectors are printed and how to extract values with character subsetting [application]

• employ implicit logical to numerical coercion to compute number and proportion of TRUEs in a vector (e.g. what proportion of values are missing?) [application]

• predict how missing values propagate [application], and discuss why `is.na()` is necessary [synthesis]

• data frames:

• use `data.frame()` to create a data frame from multiple vectors, and control the names of the generated columns [application]

• describe the situations under which strings are coerced to factors, and recall how to use `I()`, `as.is = TRUE` or `stringsAsFactors = FALSE` to prevent conversion [knowledge]

• combine two or more data frames with `cbind()` and `rbind()`, and describe what conditions must be true for the combination to work [knowledge]

• use `head()`, `tail()`, `summary()` and `str()` to get an overview of a data frame [application]

• describe how 1d and 2d subsetting of data frame differ, and enumerate the circumstances under which subsetting a data frame will return a column instead of a data frame [comprehension]

• matrices

• contrast 1d vector operations and 2d matrix operations (e.g. `names()` vs. `colnames()` & `rownames()`, `length()` vs `nrow()` and `ncol()`). [analysis]

• predict the output when a matrix is coerced into a vector (i.e. remember that R matrices are stored column-wise)

• lists

• create a new list with `list()`, and selectively name components [application]

• convert a list into a vector with `unlist()`, and apply implicit coercion rules to predict the type of the output [application]

• NULL

• strings vs. factors vs. ordered factors

• recall the key differences (cardinality, ordering) between strings, factors and ordered factors [knowledge]

• select the most appropriate type for a given variable [analysis]

• describe the operation of `drop = TRUE`, when it is needed, and remedies if you are using it frequently [application]

• match data types with conversion and testing functions, and list common gotchas (e.g. converting an ordered factor to a factor) [knowledge]

• know enough about floating point math to predict the output of `sqrt(2) ^ 2 - 2 == 0` and spot potentially hazardous use of equality comparisons [application]
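A few of the data structure objectives above, sketched in base R. This is only an illustration: the variable names are made up, and the built-in `mtcars` data set stands in for "an arbitrary object".

```r
# Mixed types in c() are coerced to the most flexible type present.
x <- c(1, 1L, FALSE)
typeof(x)                         # "double"

# Logicals count as 0/1, so sum() and mean() give the number and
# proportion of TRUEs, e.g. how many values are missing.
y <- c(3, NA, 7, NA, 10)
n_missing <- sum(is.na(y))        # 2
prop_missing <- mean(is.na(y))    # 0.4

# Missing values propagate: NA == NA is NA, hence is.na().
na_compare <- NA == NA

# Inspect an object with str(), then subset out the piece you need.
fit <- lm(mpg ~ wt, data = mtcars)
str(summary(fit), max.level = 1)
r_squared <- summary(fit)$r.squared

# Matrices are stored column by column, whatever the fill order was.
m <- matrix(1:6, nrow = 2, byrow = TRUE)
m_flat <- as.vector(m)            # 1 4 2 5 3 6

# unlist() applies the same coercion rules as c().
u <- unlist(list(1, "a", TRUE))
typeof(u)                         # "character"

# Subsetting a factor keeps unused levels unless drop = TRUE.
f <- factor(c("a", "b", "c"))
levels(f[1:2])                    # "a" "b" "c"
levels(f[1:2, drop = TRUE])       # "a" "b"

# Floating point: the difference is tiny but not exactly zero.
fp_exact <- sqrt(2) ^ 2 - 2 == 0              # FALSE
fp_close <- isTRUE(all.equal(sqrt(2) ^ 2, 2)) # TRUE
```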

## Subsetting

• types of subsetting

• match the six types of subsetting objects with their results [knowledge]

• compare and contrast the use of subsetting, `match` and `%in%` when looking for matching values across two vectors [application]

• use integer subsetting to order multidimensional structures [application]

• apply De Morgan's rule to simplify a complicated double negation [application]

• identify uses of `which()` that are redundant (i.e. you only need `which()` when you want the position of the nth TRUE) [analysis]

• use repeated values in numeric indexing to create a "subset" that is larger than the original set [application]

• use character subsetting to create a lookup table [application]

• understand how 1d subsetting generalises to 2d subsetting [comprehension]

• describe the difference between simplifying and preserving subsetting (`[` vs `[[`, and when `drop = FALSE` is necessary) [analysis]

• understand the difference between `x$y` and `x[["y"]]` and know when to use each form [application]

• use subsetting with assignment to change multiple values in a data structure at once [application]

• use subsetting with assignment and NULL to remove elements from a list/data frame [application]

• identify when subsetting + assignment will fail because the number of values to assign does not match the number of values in the subset [analysis]

• use R's boolean operators to recreate English expressions (e.g. x is less than 50 and more than 25). Recall the difference between R's "or" and "or" in everyday English. [application]

• compare and contrast `&` and `|` with `&&` and `||` [analysis]
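Some of the subsetting objectives above, as a rough sketch in base R. All object names (`lookup`, `codes`, `df`, and so on) are invented for the example.

```r
x <- c(a = 10, b = 20, c = 30)

# Repeated indices give a "subset" that is longer than the original.
bigger <- x[c(1, 1, 3, 3, 3)]

# Character subsetting as a lookup table.
lookup <- c(m = "Male", f = "Female", u = NA)
codes <- c("m", "f", "u", "f")
decoded <- unname(lookup[codes])   # "Male" "Female" NA "Female"

# De Morgan: !(a & b) is the same as !a | !b.
v <- c(5, 30, 60)
same <- identical(!(v > 25 & v < 50), v <= 25 | v >= 50)

# which() is redundant when you only want the matching values...
identical(v[which(v > 25)], v[v > 25])
# ...but useful when you want a position, e.g. the first TRUE.
first_big <- which(v > 25)[1]      # 2

# %in% returns a logical vector, match() returns positions.
in_set <- c("b", "z") %in% names(x)   # TRUE FALSE
pos <- match(c("b", "z"), names(x))   # 2 NA

# Simplifying vs. preserving subsetting on a data frame.
df <- data.frame(id = c(3, 1, 2), y = c("c", "a", "b"))
class(df[, "id"])                  # "numeric" (simplified)
class(df[, "id", drop = FALSE])    # "data.frame" (preserved)
class(df["id"])                    # "data.frame"
class(df[["id"]])                  # "numeric", same as df$id

# Integer subsetting to order rows.
sorted <- df[order(df$id), ]

# Subsetting plus assignment: change several values, or remove a column.
df$id[df$id > 1] <- 0
df$y <- NULL

# & is vectorised; && works on single values and short-circuits.
c(TRUE, FALSE) & c(TRUE, TRUE)     # TRUE FALSE
TRUE && FALSE                      # FALSE
```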

## Input and output

• identify the correct function to read/write a data frame to/from disk (csv, tab delimited or fixed width file) [application]

• use common arguments (`na.strings`, `sep`, `header`) to deal with files that have unusual structure [analysis]

• recognise the lack of symmetry between `read.csv()` and `write.csv()`, and describe which options should be used by default [knowledge]

• use `subset()` and `transform()` to reduce the amount of typing for common data manipulation operations [knowledge]

• use `readRDS`/`saveRDS` to cache binary R objects that were expensive to compute [application]

• understand what `save()` and `load()` do, how they differ from `readRDS()` and `saveRDS()` [knowledge] and when to use them instead of the single object variants [evaluation]
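A small round trip through the functions named above, using temporary files. The data and file contents are made up for illustration; `row.names = FALSE` is shown as one sensible default, not necessarily the only one.

```r
df <- data.frame(id = 1:3, score = c(2.5, NA, 4))

# write.csv() adds row names by default, which read.csv() then
# reads back as an extra column.
p1 <- tempfile(fileext = ".csv")
write.csv(df, p1)
extra_col <- read.csv(p1)
ncol(extra_col)                    # 3, not 2

# Dropping the row names makes the round trip symmetric.
p2 <- tempfile(fileext = ".csv")
write.csv(df, p2, row.names = FALSE)
df_back <- read.csv(p2)

# An unusual file: semicolon separated, "-" for missing values.
p3 <- tempfile(fileext = ".txt")
writeLines(c("id;score", "1;2.5", "2;-", "3;4"), p3)
odd <- read.table(p3, header = TRUE, sep = ";", na.strings = "-")

# saveRDS()/readRDS() store a single object; you choose the name on reading.
p4 <- tempfile(fileext = ".rds")
saveRDS(df, p4)
cached <- readRDS(p4)

# save()/load() store several objects and restore them under
# their original names (here into a separate environment).
a <- 1
b <- "two"
p5 <- tempfile(fileext = ".RData")
save(a, b, file = p5)
e <- new.env()
load(p5, envir = e)
sort(ls(e))                        # "a" "b"
```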

## Functions

• convert a simple script into parameterised functions [synthesis]

• describe a simple R function in words [synthesis]

• describe R's argument matching semantics (position, partial, exact) [knowledge], predict how they apply in a specific situation [application], and evaluate good and less-good use of the three different types [evaluation]

• describe the parts of a function using correct terminology: body, formal arguments, return value [comprehension]

• use scoping rules to predict how names are mapped to values [application]

• describe short-circuiting and its impact on expressions like `is.null(x) || all(is.na(x))` or `TRUE || stop("!")`

• execute a script of R code with `source()`
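A sketch of the function objectives above. `rescale()` and `is_empty()` are hypothetical helpers written for this example, not functions from any package.

```r
# A parameterised function: formal arguments x, lower, upper;
# the value of the last expression is the return value.
rescale <- function(x, lower = 0, upper = 1) {
  rng <- range(x, na.rm = TRUE)
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

# Argument matching: by position, by exact name, by partial name.
by_position <- rescale(c(1, 2, 3))              # 0 0.5 1
by_name     <- rescale(c(1, 2, 3), upper = 10)  # 0 5 10
by_partial  <- rescale(c(1, 2, 3), up = 10)     # works, but harder to read

# The parts of a function.
names(formals(rescale))            # "x" "lower" "upper"
body(rescale)

# Scoping: a name is looked up where the function was defined.
y <- 10
f <- function() {
  y <- 1
  g <- function() y + 1
  g()
}
scoped <- f()                      # 2, not 11

# Short-circuiting: the right-hand side of || is only evaluated
# when the left-hand side is FALSE.
is_empty <- function(x) is.null(x) || all(is.na(x))
is_empty(NULL)                     # TRUE
is_empty(c(NA, NA))                # TRUE
is_empty(1)                        # FALSE
no_error <- TRUE || stop("!")      # TRUE; stop() is never called
```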

## Control flow

• describe the structure of an if statement [comprehension]

• use a for loop to repeat the same operation on different elements of a data structure [application]

• convert a for loop to a while loop [analysis]

• illustrate why `1:length(x)` is dangerous and suggest a safer way [application]

• correct the indenting and spacing of a piece of poorly formatted source code [application]
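The loop objectives above in miniature. The vectors are arbitrary example data; `seq_along()` is offered as one safer alternative to `1:length(x)`.

```r
# 1:length(x) counts backwards when x is empty.
x <- numeric(0)
bad_index <- 1:length(x)           # 1 0
good_index <- seq_along(x)         # integer(0)

# With the bad index the loop body runs twice on an empty vector;
# with seq_along() it does not run at all.
runs <- 0
for (i in seq_along(x)) runs <- runs + 1

# A for loop over the elements of a vector...
vals <- c(2, 4, 6)
total_for <- 0
for (i in seq_along(vals)) {
  total_for <- total_for + vals[i]
}

# ...and the same loop written as a while loop.
total_while <- 0
i <- 1
while (i <= length(vals)) {
  total_while <- total_while + vals[i]
  i <- i + 1
}

# An if statement: condition, then branch, optional else branch.
if (total_for == total_while) {
  msg <- "same"
} else {
  msg <- "different"
}
```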

## Vectorisation/recycling

• describe what vectorisation means, distinguish internal and external vectorisation, and describe the performance consequences of each [knowledge]

• use vectorised operations instead of for loops to perform simple mathematical operations (log, addition, subtraction etc.) [application]

• use `lapply()`, `sapply()` and `apply()` to vectorise operations that are not already vectorised. [analysis]

• convert an `lapply()` call to a for loop [application]

• recognise a for-loop that can be rewritten to use `lapply` [knowledge]

• match common non-vectorised functions to their vectorised equivalents (e.g. `min()` and `pmin()`, `sum()` and `cumsum()` or `colSums()`) [knowledge]

• describe basic recycling rules, and know how to avoid them when necessary [knowledge]
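The vectorisation objectives above, sketched with toy inputs. The equivalence shown between the for loop and `lapply()` holds for this simple case; it is not a general recipe.

```r
# Vectorised arithmetic: no loop needed.
x <- c(1, 10, 100)
logs <- log10(x)                   # 0 1 2

# A for loop that fills a list...
squares_loop <- vector("list", 3)
for (i in 1:3) {
  squares_loop[[i]] <- i ^ 2
}
# ...and the equivalent lapply() call.
squares_apply <- lapply(1:3, function(i) i ^ 2)

# sapply() simplifies the result to a vector where it can.
squares_vec <- sapply(1:3, function(i) i ^ 2)   # 1 4 9

# apply() works over the rows (1) or columns (2) of a matrix.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)       # 9 12
col_sums <- colSums(m)             # 3 7 11

# Summary functions and their element-wise or cumulative relatives.
min(c(1, 5), c(3, 2))              # 1
pmin(c(1, 5), c(3, 2))             # 1 2
sum(1:4)                           # 10
cumsum(1:4)                        # 1 3 6 10

# Recycling: the shorter vector is repeated to match the longer one.
recycled <- 1:6 + 1:2              # 2 4 4 6 6 8
```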

## Recovering from errors

• recognise and remedy simple syntax errors (missing quotes, missing parentheses etc.) [comprehension]

• use `try()` to recover from an error [application]

• interpret the output of `traceback()` to identify where an error occurred [application]

• initiate an interactive debugger with `browser()` or `options(error = recover)` [application]

• list the commands used to control `browser()`/`recover()` [knowledge]

• use `options(warn = 2)` to convert warnings into errors for debugging

• create a minimal reproducible example to get help from others [synthesis]

• find help for a function, data set, and package [knowledge]

• read and interpret the documentation of a function [analysis]

• use Google to identify the name of a function that performs a given task
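The non-interactive parts of the error handling objectives above, as a sketch. The inputs are made up; `traceback()`, `browser()` and `options(error = recover)` only make sense in an interactive session, so they appear as comments.

```r
# try() lets a script carry on after an error.
res <- try(stop("something went wrong"), silent = TRUE)
inherits(res, "try-error")         # TRUE

# Typical use: apply a function to many inputs and see which failed.
inputs <- list(1, "a", 3)
results <- lapply(inputs, function(v) try(log(v), silent = TRUE))
failed <- vapply(results, inherits, logical(1), what = "try-error")
# failed is FALSE TRUE FALSE

# options(warn = 2) turns warnings into errors, so they can be
# located with the same tools as any other error.
old <- options(warn = 2)
res2 <- try(as.integer("x"), silent = TRUE)
options(old)                       # restore the previous setting
inherits(res2, "try-error")        # TRUE

# Interactive only, not run here:
# traceback()                # call stack of the last error
# browser()                  # pause inside a function
# options(error = recover)   # pick a frame to inspect after an error

# Finding help.
# ?mean
# help(package = "stats")
```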

## Package management

• install a package with `install.packages()` [comprehension]

• load a package with `library()` or `require()` [comprehension]

• determine which packages are out of date [application]

• understand lifetime of `install.packages`/`library` effects [comprehension]

• use `::` to refer to a function in a specific package
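A short sketch of the package objectives above. It sticks to the `stats` package that ships with R so that it runs offline; the package name `notARealPackage123` is a deliberately made-up placeholder, and the calls that need a network connection are left as comments.

```r
# Installing is done once per library; loading is done once per session.
# install.packages("stringr")      # needs a network connection, not run
# old.packages()                   # lists installed packages with updates, not run

# library() attaches a package to the search path for this session.
library(stats)
attached <- "package:stats" %in% search()

# library() fails with an error if the package is missing;
# require() returns FALSE (with a warning) instead.
has_fake <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)

# :: refers to a function in a specific package without attaching it.
med <- stats::median(c(1, 3, 10))  # 3
```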

### hadley commented Sep 30, 2013

@vsbuffalo good point - I'll add a bullet

### wabarr commented Oct 6, 2013

What about file system commands like copying, listing files in directory etc? And string manipulation (base and/or stringr)?