

@hadley
Created September 27, 2013 20:24
My first stab at a basic R programming curriculum. I think teaching just these topics without overall motivating examples would be extremely boring, but if you're a self-taught R user, this might be useful to help spot your gaps.

Notes:

  • I've tried to break the material up into separate pieces, but it's not always possible: e.g. knowledge of data structures and subsetting are tightly intertwined.

  • The level of Bloom's taxonomy is listed in square brackets (see e.g. http://bit.ly/15gqPEx). Few items currently assess the higher levels of the taxonomy.

Programming R curriculum

Data structures

  • basic data structures (vector, matrix, list and data frame):

    • list and describe their differences (dimensionality, homogeneous vs. heterogeneous) [knowledge]

    • pick the best data structure for a given problem [application]

    • recall functions to coerce data structures between different forms [knowledge], and recognise which coercions are lossy [comprehension]

    • match data types and the functions that identify them, and remember common gotchas (is.vector, is.numeric etc.) [comprehension]

  • str:

    • interpret the output of str [comprehension]

    • use str and subsetting to extract desired pieces from an arbitrary object (for example, extract the R-squared value from a linear model) [application]

  • vectors:

    • recognise which types of data correspond to the four common atomic vectors (character, double, integer, logical) [knowledge]

    • recognise the use of L to create integer vectors [knowledge]

    • create new vectors with c(), and correctly predict vector type when multiple types are mixed (e.g. what is the type of c(1, 1L, F)) [application]

    • create named vectors with c(), recognise how named vectors are printed and how to extract values with character subsetting [application]

    • employ implicit logical to numerical coercion to compute number and proportion of TRUEs in a vector (e.g. what proportion of values are missing?) [application]

    • predict how missing values propagate [application], and discuss why is.na() is necessary [synthesis]

  • data frames:

    • use data.frame() to create a data frame from multiple vectors, and control the names of the generated columns [application]

    • describe the situations under which strings are coerced to factors, and recall how to use I(), as.is = TRUE or stringsAsFactors = FALSE to prevent conversion [knowledge]

    • combine two or more data frames with cbind() and rbind(), and describe what conditions must be true for the combination to work [knowledge]

    • use head(), tail(), summary() and str() to get an overview of a data frame [application]

    • describe how 1d and 2d subsetting of data frames differ, and enumerate the circumstances under which subsetting a data frame will return a single column as a vector instead of a data frame [comprehension]

  • matrices

    • contrast 1d vector operations and 2d matrix operations (e.g. names() vs. colnames() & rownames(), length() vs nrow() and ncol()). [analysis]

    • predict the output when a matrix is coerced into a vector (i.e. remember that R matrices are stored col-wise)

  • lists

    • create a new list with list(), and selectively name components [application]

    • convert a list into a vector with unlist, and apply implicit coercion rules to predict type of output [application]

  • NULL

  • strings vs. factors vs. ordered factors

    • recall the key differences (cardinality, ordering) between strings, factors and ordered factors [knowledge]

    • select the most appropriate type for a given variable [analysis]

    • describe the operation of drop = TRUE, when it is needed, and remedies if you are using it frequently [application]

    • match data types with conversion and testing functions, and list common gotchas (e.g. converting an ordered factor to a factor) [knowledge]

  • know enough about floating point math to predict the output of sqrt(2)^2 - 2 == 0 and spot potentially hazardous use of equality comparisons [application]
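Several of the data structure items above can be checked directly at the console. The following is a minimal sketch using only base R (plus the built-in mtcars data set); the variable names are arbitrary, and the commented results are what base R is expected to print.

```r
# Mixed types are coerced upwards: logical < integer < double < character
x <- c(1, 1L, FALSE)
typeof(x)                  # "double"
typeof(1L)                 # "integer"

# unlist() applies the same coercion rules to the components of a list
y <- unlist(list(1L, "a", TRUE))
typeof(y)                  # "character"

# Implicit logical-to-numeric coercion: count and proportion of missing values
z <- c(3, NA, 7, NA, 10)
sum(is.na(z))              # 2
mean(is.na(z))             # 0.4

# Missing values propagate, so x == NA is always NA; that is why is.na() exists
z == NA                    # NA NA NA NA NA

# Matrices are stored column-wise, so coercion to a vector reads down columns
m <- matrix(1:6, nrow = 2)
as.vector(m)               # 1 2 3 4 5 6
m[1, ]                     # 1 3 5

# Subsetting a factor keeps unused levels unless drop = TRUE
f <- factor(c("lo", "hi", "lo"), levels = c("lo", "hi"))
levels(f[f == "lo"])                # "lo" "hi"
levels(f[f == "lo", drop = TRUE])   # "lo"

# str() reveals an object's structure, e.g. where a model summary keeps R-squared
fit <- lm(mpg ~ wt, data = mtcars)
str(summary(fit), max.level = 1)
r2 <- summary(fit)$r.squared

# Floating point: exact equality is hazardous, compare with a tolerance instead
sqrt(2)^2 - 2 == 0                  # FALSE
isTRUE(all.equal(sqrt(2)^2, 2))     # TRUE
```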

Subsetting

  • types of subsetting

    • match the six types of subsetting objects with their results [knowledge]

    • compare and contrast the use of subsetting, match and %in% when looking for matching values across two vectors [application]

    • use integer subsetting to order multidimensional structures [application]

    • apply De Morgan's laws to simplify a complicated double negation [application]

    • identify uses of which() that are redundant (i.e. you only need which() when you want the position of the nth TRUE) [analysis]

    • use repeated values in numeric indexing to create a "subset" that is larger than the original set [application]

    • use character subsetting to create a lookup table [application]

  • understand how 1d subsetting generalises to 2d subsetting [comprehension]

  • describe the difference between simplifying and preserving subsetting ([ vs. [[, and when drop = FALSE is necessary) [analysis]

  • understand the difference between x$y and x[["y"]] and know when to use each form [application]

  • use subsetting with assignment to change multiple values in a data structure at once [application]

  • use subsetting with assignment and NULL to remove elements from a list/data frame [application]

  • identify when subsetting + assignment will fail because the number of values to assign does not match the number of values in the subset [analysis]

  • use R's boolean operators to recreate English expressions (e.g. x is less than 50 and more than 25). Recall the difference between R's | (or) and "or" in everyday English. [application]

  • compare and contrast & and | with && and || [analysis]
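A compact tour of the subsetting items above, as a minimal base R sketch. The six index types in the comments follow the usual classification (positive integers, negative integers, logical, character, nothing, zero); the lookup table codes and all variable names are made up for illustration.

```r
x <- c(a = 10, b = 20, c = 30, d = 40)

x[c(1, 3)]        # positive integers: keep elements 1 and 3
x[-1]             # negative integers: drop element 1
x[x > 15]         # logical: keep elements where the condition is TRUE
x[c("d", "a")]    # character: select by name
x[]               # nothing: return x unchanged
x[0]              # zero: a zero-length vector
x[c(1, 1, 1)]     # repeated indices give a result longer than x

# match() gives positions, %in% gives a logical vector
match(c("b", "z"), c("a", "b", "c"))   # 2 NA
c("b", "z") %in% c("a", "b", "c")      # TRUE FALSE

# which() is only needed when you want positions, e.g. of the first TRUE
which(c(FALSE, TRUE, TRUE))[1]         # 2

# Character subsetting as a lookup table
lookup <- c(m = "Male", f = "Female", u = NA)
codes <- c("m", "f", "u", "f", "f")
unname(lookup[codes])  # "Male" "Female" NA "Female" "Female"

# Simplifying vs. preserving subsetting
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
df[, "x"]                # 2d, simplifies to an integer vector
df[, "x", drop = FALSE]  # 2d, preserves a one-column data frame
df["x"]                  # 1d subsetting of a data frame also preserves
l <- list(p = 1, q = "two")
l["p"]                   # [ returns a list
l[["p"]]                 # [[ returns the element itself

# Subsetting with assignment
y <- 1:10
y[y > 5] <- 0L           # change several values at once
l$q <- NULL              # remove a list element

# Boolean operators: "x is less than 50 and more than 25"
v <- c(10, 30, 60)
v < 50 & v > 25          # FALSE TRUE FALSE (element-wise)
!(v < 20 | v > 40)       # De Morgan: same as v >= 20 & v <= 40
# && and || expect single values and stop evaluating as soon as the result is known
```

The explicit stringsAsFactors = FALSE keeps the example behaving the same on older R versions, where data.frame() converted strings to factors by default.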

Input and output

  • identify the correct function to read/write a data frame to/from disk (csv, tab delimited or fixed width file) [application]

  • use common arguments (na.strings, sep, header) to deal with files that have unusual structure [analysis]

  • recognise the lack of symmetry between read.csv() and write.csv(), and describe which options should be used by default [knowledge]

  • use subset & transform to reduce the amount of typing for common data manipulation operations [knowledge]

  • use readRDS/saveRDS to cache binary R objects that were expensive to compute [application]

  • understand what save() and load() do, how they differ from readRDS() and saveRDS() [knowledge] and when to use them instead of the single object variants [evaluation]
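A minimal base R sketch of the input and output items above. It writes only to temporary files from tempfile(); the semicolon-delimited text with "-" for missing values is an invented example of an unusually structured file.

```r
df <- data.frame(id = 1:3, score = c(2.5, NA, 4), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")

# write.csv() writes row names by default, but read.csv() does not read them
# back as row names, so an extra column "X" appears on the round trip
write.csv(df, path)
names(read.csv(path))              # "X" "id" "score"
write.csv(df, path, row.names = FALSE)
back <- read.csv(path)
names(back)                        # "id" "score"

# sep, header and na.strings handle unusual files; here "-" marks a missing value
txt <- "id;score\n1;2.5\n2;-\n3;4"
odd <- read.table(text = txt, sep = ";", header = TRUE, na.strings = "-")
odd$score                          # 2.5  NA 4.0

# saveRDS()/readRDS() cache a single object; you choose the name when reading
rds <- tempfile(fileext = ".rds")
saveRDS(df, rds)
cached <- readRDS(rds)

# save()/load() store several objects and restore them under their original names
rda <- tempfile(fileext = ".rda")
a <- 1
b <- "two"
save(a, b, file = rda)
rm(a, b)
load(rda)                          # a and b exist again
```

The asymmetry between the two pairs is the usual reason to prefer saveRDS() for caching: load() silently recreates (and can overwrite) variables in the current environment.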

Functions & control flow

  • convert a simple script into parameterised functions [synthesis]

  • describe a simple R function in words [synthesis]

  • describe R's argument matching semantics (position, partial, exact) [knowledge], predict how they apply in a specific situation [application], and evaluate good and less-good use of the three different types [evaluation]

  • describe the parts of a function using correct terminology: body, formal arguments, return value [comprehension]

  • use scoping rules to predict how names are mapped to values [application]

  • describe short-circuiting and its impact on expressions like is.null(x) || all(is.na(x)) or TRUE || stop("!")

  • execute a script of R code with source()
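The function items above can be illustrated with one small function. This is a minimal sketch: rescale() and add_offset() are hypothetical helpers written for illustration, not functions from any package, and the sourced script is a throwaway temporary file.

```r
# A parameterised function: formal arguments, body, and return value
rescale <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])   # the last evaluated expression is returned
}
formals(rescale)    # the formal arguments: x and na.rm
body(rescale)       # the body

rescale(c(0, 5, 10))                      # 0.0 0.5 1.0

# Argument matching: exact names, then partial names, then position
rescale(c(0, 5, NA, 10), na.rm = TRUE)    # exact match
rescale(c(0, 5, NA, 10), na = TRUE)       # partial match: works, but fragile
rescale(na.rm = TRUE, c(0, 10))           # position fills the remaining argument

# Lexical scoping: a free variable is looked up where the function was defined
offset <- 100
add_offset <- function(x) x + offset
add_offset(1)                             # 101

# Short-circuiting: the right-hand side is evaluated only if needed
TRUE || stop("!")                         # TRUE; stop() never runs
x <- NULL
is.null(x) || all(is.na(x))               # TRUE; all(is.na(x)) is never evaluated

# source() runs a script file in the current session
script <- tempfile(fileext = ".R")
writeLines("sourced_value <- 2 + 2", script)
source(script)
sourced_value                             # 4
```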

Control flow

  • describe the structure of an if statement [comprehension]

  • use a for loop to repeat the same operation on different elements of a data structure [application]

  • convert a for loop to a while loop [analysis]

  • illustrate why 1:length(x) is dangerous and suggest a safer way [application]

  • correct the indentation and spacing of a piece of poorly formatted source code [application]
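A minimal base R sketch of the control flow items: an if statement, the same loop written with for and while, and the reason seq_along(x) is safer than 1:length(x). The input vector is arbitrary.

```r
x <- c(4, 9, 16)

# if statement: a condition, a branch, and an optional else branch
if (length(x) > 0) {
  message("x has ", length(x), " elements")
} else {
  message("x is empty")
}

# for loop: repeat the same operation on each element
out <- numeric(length(x))
for (i in seq_along(x)) {
  out[i] <- sqrt(x[i])
}

# The same loop rewritten as a while loop
out2 <- numeric(length(x))
i <- 1
while (i <= length(x)) {
  out2[i] <- sqrt(x[i])
  i <- i + 1
}

# Why 1:length(x) is dangerous: for an empty vector it counts down
empty <- numeric(0)
1:length(empty)     # 1 0, so a loop over it would run twice
seq_along(empty)    # integer(0), so a loop over it never runs
```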

Vectorisation/recycling

  • describe what vectorisation means, distinguish internal and external vectorisation, and describe the performance consequences of each [knowledge]

  • use vectorised operations instead of for loops to perform simple mathematical operations (log, addition, subtraction etc.) [application]

  • use lapply(), sapply() and apply() to vectorise operations that are not already vectorised. [analysis]

  • convert an lapply() call to a for loop [application]

  • recognise a for-loop that can be rewritten to use lapply [knowledge]

  • match common non-vectorised functions to their vectorised equivalents (e.g. min() to pmin(), sum() to cumsum() and colSums()) [knowledge]

  • describe basic recycling rules, and know how to avoid them when necessary [knowledge]
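A minimal base R sketch of the vectorisation items. is_big() is a hypothetical helper that is deliberately not vectorised (it uses if(), which expects a single value), so that sapply() and lapply() have something to do.

```r
x <- c(1, 10, 100)

# Built-in vectorised operations replace explicit loops
log10(x)                  # 0 1 2
x + 1                     # 2 11 101

# sapply()/lapply() apply a non-vectorised function to each element
is_big <- function(v) if (v > 5) "big" else "small"
sapply(x, is_big)         # "small" "big" "big"

# The same lapply() call written as a for loop
res <- vector("list", length(x))
for (i in seq_along(x)) res[[i]] <- is_big(x[i])
identical(res, lapply(x, is_big))   # TRUE

# apply() works over the rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)          # 9 12

# Non-vectorised summaries and their vectorised counterparts
min(3, 1, 2)              # 1: a single value
pmin(c(3, 1), c(2, 2))    # 2 1: element-wise minimum
cumsum(1:4)               # 1 3 6 10
colSums(m)                # 3 7 11

# Recycling: the shorter vector is repeated to match the longer one
1:6 + c(0, 10)            # 1 12 3 14 5 16
rep_len(c(0, 10), 6)      # makes the repetition explicit
```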

Recovering from errors

  • recognise and remedy simple syntax errors (missing quotes, missing parentheses etc.) [comprehension]

  • use try() to recover from an error [application]

  • interpret the output of traceback() to identify where an error occurred [application]

  • initiate an interactive debugger with browser() or options(error = recover) [application]

  • list the commands used to control browser()/recover() [knowledge]

  • use options(warn = 2) to convert warnings into errors for debugging

  • create a minimal reproducible example to get help from others [synthesis]

  • find help for a function, data set, and package [knowledge]

  • read and interpret the documentation of a function [analysis]

  • use Google to identify the name of a function that performs a given task
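A minimal base R sketch of the non-interactive error handling items. The interactive tools (traceback(), browser(), recover(), help) only make sense at the console, so they appear as comments; f() and g() there are hypothetical functions.

```r
# try() lets code continue after an error; check the class of the result
res <- try(sqrt("four"), silent = TRUE)
inherits(res, "try-error")     # TRUE
cat(res)                       # the error message that was caught

# options(warn = 2) turns warnings into errors, so they can be located
old <- options(warn = 2)
res2 <- try(as.integer("x"), silent = TRUE)   # normally only a warning
options(old)                   # restore the previous setting
inherits(res2, "try-error")    # TRUE

# Interactive tools, run at the console rather than in a script:
# g <- function(x) stop("bad input"); f <- function(x) g(x)
# f(1); traceback()          # call stack, most recent call first
# options(error = recover)   # pick a frame to browse after any error
# browser()                  # inside a function: n next, s step in, c continue, Q quit
# ?mean; help(package = "stats"); ??regression   # finding help
```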

Package management

  • install a package with install.packages() [comprehension]

  • load a package with library() or require() [comprehension]

  • determine which packages are out of date [application]

  • understand the lifetime of the effects of install.packages() and library() [comprehension]

  • use :: to refer to a function in a specific package
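A minimal sketch of the package management items using packages that ship with base R. Calls that need network access are left as comments, stringr is only an example name, and "notARealPackage" is a deliberately nonexistent, hypothetical package name.

```r
# install.packages() copies a package into a library on disk once; library()
# attaches it for the current session only, so it is rerun in every session.
# install.packages("stringr")   # needs network access, so commented out
# old.packages()                # lists installed packages with newer versions available

library(stats)                       # attach a package (stats is attached by default)
"package:stats" %in% search()        # TRUE: attached packages are on the search path

# require() returns FALSE instead of throwing an error when a package is missing
ok <- suppressWarnings(require("notARealPackage", quietly = TRUE))
ok                                   # FALSE

# :: calls a function from a specific package without attaching it
tools::file_ext("data.csv")          # "csv"
```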

@hadley
Author

hadley commented Sep 30, 2013

@vsbuffalo good point - I'll add a bullet

@wabarr

wabarr commented Oct 6, 2013

What about file system commands like copying, listing files in directory etc? And string manipulation (base and/or stringr)?
