@csgillespie
Last active October 11, 2017 21:35
odsc

Overview

This session will be a mixture of lectures and short practical sessions using R.

Software

Please make sure you are using the latest version of R (the current version is 3.4.1). The first two numbers are the essential ones; the final digit is the patch level. You can check which version of R you are running via

R.version.string
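If you would rather test the version programmatically than read the string, a minimal sketch is shown below. It assumes that any 3.4.x release is recent enough, which follows from the note above that only the first two numbers matter; the variable name `recent_enough` is just for illustration.

```r
# getRversion() returns the running R version as a comparable object,
# so it can be checked against a minimum version given as a string.
recent_enough = getRversion() >= "3.4.0"

# TRUE means the first two numbers are at least 3.4
print(recent_enough)

# The individual components are also available if needed
print(R.version$major)
print(R.version$minor)
```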

Please install the latest version of RStudio (https://www.rstudio.com/products/rstudio/download/), or another suitable IDE.

We'll also need a couple of packages:

install.packages(c("mvtnorm", "glmnet"))
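To confirm the installation worked, something like the following can be run afterwards. This is only a sketch: the variable names `pkgs` and `installed` are illustrative, and the check reports whether each package can be loaded without attaching it.

```r
# Packages required for the session
pkgs = c("mvtnorm", "glmnet")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Names of any packages that still need installing
missing_pkgs = pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
}
```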

We'll also use the following function in the lectures:

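# Simulate a regression data set with n rows and total_p predictors,
# of which only the first real_p affect the response, then split it
# into training and test sets.
get_data = function(n = 100, total_p = 50, real_p = 15, 
                    train_prop = 0.66, 
                    sd = 1,
                    seed = NULL) {
  # Optionally fix the seed so the simulation is reproducible
  if(!is.null(seed)) set.seed(seed)
  
  # Standard normal predictors
  x = matrix(rnorm(n*total_p), nrow=n, ncol=total_p)
  # Response: sum of the first real_p predictors plus Gaussian noise
  # (drop = FALSE keeps the matrix shape when real_p is 1)
  y = rowSums(x[, seq_len(real_p), drop = FALSE]) + rnorm(n, sd = sd)
  
  # Randomly choose the rows that form the training set
  train_rows = sample(1:n, train_prop * n)
  l = list()
  l[["x_train"]] = x[train_rows, ]
  l[["x_test"]] = x[-train_rows, ]
  
  l[["y_train"]] = y[train_rows]
  l[["y_test"]] = y[-train_rows]
  l
}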
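To get a feel for what the function returns, the sketch below repeats the same recipe inline with the default arguments, so that it runs on its own, and then fits an ordinary least squares model as a simple baseline. The baseline fit is an illustration only and is not necessarily how the data are used in the lectures; the seed value and the names `fit`, `pred` and `mse` are arbitrary choices.

```r
# Same simulation steps as get_data(), using its default arguments
set.seed(1)
n = 100; total_p = 50; real_p = 15
x = matrix(rnorm(n * total_p), nrow = n, ncol = total_p)
y = rowSums(x[, 1:real_p]) + rnorm(n, sd = 1)

# Random train/test split: 66 training rows, 34 test rows
train_rows = sample(1:n, 0.66 * n)
x_train = x[train_rows, ]
x_test = x[-train_rows, ]
y_train = y[train_rows]
y_test = y[-train_rows]

dim(x_train)  # 66 50
dim(x_test)   # 34 50

# Illustrative baseline: least squares using all 50 predictors
fit = lm(y_train ~ x_train)

# Predict on the test set (a column of ones accounts for the intercept)
pred = cbind(1, x_test) %*% coef(fit)

# Mean squared prediction error on the held-out rows
mse = mean((y_test - pred)^2)
print(mse)
```

Since only the first 15 of the 50 predictors carry signal, a fit that uses all of them has many unnecessary coefficients to estimate from 66 rows.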

Slides & Notes

  • Links to be added