title: List Column Switch
output:
  html_notebook:
    toc: true
    toc_float: true

Summary

@jaeddy
jaeddy / rstudio_ami_guide.md
Last active January 22, 2024 21:36
steps for creating and configuring a new AMI with RStudio Server

Building a new RStudio Server AMI

The steps below create a new AMI for Amazon EC2 instances that includes the latest versions of R, RStudio, and RStudio Server. The idea is inspired by the work of Louis Aslett, who creates and hosts his own public AMIs for RStudio. My own goal was to create an AMI with RStudio v1.0.0 or higher so that I could use the recent R Notebooks feature. However, the instructions should apply generally whenever you are impatient to access the latest version of R-related software on AWS (via an interactive browser interface...).

Getting started

  1. Create a new EC2 instance with the latest Ubuntu AMI (a Spot instance should be fine); following Louis Aslett's AMI, I opted to include a general purpose SSD EBS volume with 10 GB of storage space
  2. SSH into the instance

Downloading/installing RStudio Server

Data science / big data techniques, described in 40 words or less

Collection of common data science terms, tools, and concepts with definitions, as assembled by Vincent Granville in an analyticbridge blog post. (accessed 07/24/2014)

Adjusted R^2 (R-Square)

The method preferred by statisticians for determining which variables to include in a model. It is a modified version of R^2 that penalizes each new variable on the basis of how many have already been admitted. By construction, R^2 never decreases as you add new variables, which results in models that over-fit the data and have poor predictive ability. Adjusted R^2 yields more parsimonious models that admit new variables only if the improvement in fit is larger than the penalty, which serves the ultimate goal of out-of-sample prediction. (Submitted by Santiago Perez)
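
For reference, the penalty described above is usually written as follows. This is the standard textbook form, not a formula taken from the original post; the symbols n (number of observations) and p (number of predictors, excluding the intercept) are notation assumed here. The numbers in the worked example are made up for illustration.

```latex
% Adjusted R^2 (assumed notation: n observations, p predictors excluding the intercept)
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}

% Illustrative example with hypothetical values, n = 50:
%   p = 5, R^2 = 0.800:
\bar{R}^2 = 1 - 0.200 \cdot \frac{49}{44} \approx 0.7773
%   p = 6, R^2 = 0.802 (a sixth variable adds only 0.002 to R^2):
\bar{R}^2 = 1 - 0.198 \cdot \frac{49}{43} \approx 0.7744
```

In this illustration R^2 rises slightly when the sixth variable is added, but the adjusted value falls, so the variable would not be admitted.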

Cluster Analysis