Cosmin Catalin SANDA cosmincatalin
@cosmincatalin
cosmincatalin / readme.md
Last active October 27, 2022 11:07
AWS EMR bootstrap to install R packages from CRAN

This bootstrap is useful when you deploy SparkR applications that run arbitrary R code on the EMR cluster's workers, for example through functions such as gapply or dapply. Every package that code depends on must already be installed on each worker; otherwise the job fails.
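As a rough illustration of what such a bootstrap does on each node, here is a minimal sketch. It is not the gist's actual script: the function name `build_install_expr`, the CRAN mirror URL, and the example package names are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical sketch, not the gist's actual script.
# Builds the R expression that installs the given CRAN packages.
build_install_expr() {
  local repo="https://cloud.r-project.org"  # assumed CRAN mirror
  local quoted=""
  local pkg
  for pkg in "$@"; do
    # Append each package as a single-quoted R string, comma-separated.
    quoted="${quoted:+$quoted, }'${pkg}'"
  done
  printf "install.packages(c(%s), repos='%s')" "$quoted" "$repo"
}

# On a cluster node, a bootstrap could then run something like
# (sudo is an assumption about the node's permissions):
#   sudo Rscript -e "$(build_install_expr data.table zoo)"
```

Because EMR runs bootstrap actions on every node when it is provisioned, the packages would be in place before any gapply or dapply call reaches a worker.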

How to use the bootstrap

  1. Download the gist to a file, then upload it to an S3 bucket of your choice.
  2. In the AWS EMR Console, create a cluster and choose the advanced options.
  3. In Step 3 you can configure your bootstrap actions. Choose to Configure and add a Custom action
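
The console steps above could also be approximated with the AWS CLI. The sketch below only builds the value for the `--bootstrap-actions` option and shows the surrounding commands as comments. The bucket name, script file name, action name, release label, instance settings, and the idea of passing package names as arguments are all placeholders and assumptions, not taken from the gist.

```shell
#!/bin/bash
# Hypothetical names throughout: bucket and script file are placeholders.
BUCKET="my-bucket"
SCRIPT="install-r-packages.sh"

# Builds the shorthand value for `aws emr create-cluster --bootstrap-actions`.
# First argument: S3 path of the script; remaining arguments: script arguments.
bootstrap_action() {
  local path="$1"; shift
  local args=""
  local a
  for a in "$@"; do
    args="${args:+$args,}${a}"
  done
  printf 'Path=%s,Name=InstallRPackages,Args=[%s]' "$path" "$args"
}

# Commands one might run with configured AWS credentials (not executed here):
#   aws s3 cp "$SCRIPT" "s3://$BUCKET/$SCRIPT"
#   aws emr create-cluster --name "sparkr-cluster" \
#     --release-label emr-5.8.0 --applications Name=Spark \
#     --instance-type m4.large --instance-count 3 --use-default-roles \
#     --bootstrap-actions "$(bootstrap_action "s3://$BUCKET/$SCRIPT" data.table zoo)"
```

Whether the bootstrap script accepts package names as arguments depends on the script itself, so check it before relying on the `Args` list.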
@cosmincatalin
cosmincatalin / install-jupyter.sh
Last active April 17, 2023 14:23
AWS EMR bootstraps to install Jupyter (R, SparkR, Python 2, Python 3, PySpark)
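#!/bin/bash
# Versions of Miniconda, pandas and scikit-learn used by this script.
MINICONDA_VERSION="4.3.21"
PANDAS_VERSION="0.20.3"
SCIKIT_VERSION="0.19.0"
# Parse command-line options while more than one argument remains.
# -gt compares numerically; > inside [[ ]] would compare strings.
while [[ $# -gt 1 ]]; do
  key="$1"
  case $key in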