@hakanilter
Last active April 9, 2021 19:56
AWS EMR Examples Master Setup
#!/bin/bash
# install git
sudo yum install -y git
# install maven from the EPEL Apache Maven repo
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
# pin the repo's $releasever placeholder to 6
sudo sed -i 's/\$releasever/6/g' /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven
mvn --version
# interactive: choose the JDK that maven should use
sudo alternatives --config java
# build code
git clone https://github.com/hakanilter/aws-emr-examples
cd aws-emr-examples
mvn clean package -Pmake-jar
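# upload the built jar (assumed step, not in the original script): the submit step below
# reads the jar from s3://datapyro-main/lib/, so it probably needs to be copied there first.
# The target/ path is an assumption about where the make-jar profile writes its output.
DIST_JAR=target/aws-emr-examples-1.0.0-SNAPSHOT-dist.jar
if [ -f "$DIST_JAR" ]; then
  aws s3 cp "$DIST_JAR" s3://datapyro-main/lib/
else
  echo "jar not found at $DIST_JAR, check the target/ directory" >&2
fi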
# download test data
wget https://s3.amazonaws.com/hw-sandbox/tutorial1/NYSE-2000-2001.tsv.gz
aws s3 cp NYSE-2000-2001.tsv.gz s3://datapyro-main/test/
# submit job
CLUSTER_ID=j-3E9EBVSDO0MF6
CLASS_NAME=com.datapyro.emr.spark.SparkS3BinaryData
JAR_LOCATION=s3://datapyro-main/lib/aws-emr-examples-1.0.0-SNAPSHOT-dist.jar
# quoted so the shell never expands the * against local files
INPUT_FOLDER='s3://datapyro-main/test/NYSE*'
OUTPUT_FOLDER=s3://datapyro-main/test/parquet
aws emr add-steps --cluster-id "$CLUSTER_ID" --steps "Type=spark,Name=EmrExample,Args=[--deploy-mode,cluster,--class,$CLASS_NAME,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=false,$JAR_LOCATION,$INPUT_FOLDER,$OUTPUT_FOLDER],ActionOnFailure=CONTINUE"
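# check step status (optional follow-up, not in the original script): a minimal sketch
# that lists the cluster's steps to confirm the step was accepted. STEP_QUERY is a
# hypothetical helper variable. Because spark.yarn.submit.waitAppCompletion=false,
# the step likely finishes as soon as the application is submitted, so its state
# may not reflect the final result of the Spark job.
STEP_QUERY='Steps[].{Id:Id,Name:Name,State:Status.State}'
aws emr list-steps --cluster-id "$CLUSTER_ID" --query "$STEP_QUERY" --output table \
  || echo "could not list steps for cluster $CLUSTER_ID" >&2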