@mrtns
Last active July 10, 2019 13:45
AWS EMR Spark

Install Pre-Requisites

Install AWS CLI

sudo pip install awscli

Configure AWS CLI Keys

aws configure
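
To confirm the configured keys actually work before creating any resources, one option is to ask STS which account they belong to. This is a minimal sketch; the function name `aws_account_id` is hypothetical and not part of the original notes.

```shell
# Hypothetical helper: print the AWS account ID the configured keys resolve to.
# A failure here usually means the keys from `aws configure` are missing or invalid.
aws_account_id() {
  aws sts get-caller-identity \
    --query 'Account' \
    --output text
}

# Usage (requires configured AWS credentials):
#   aws_account_id
```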

Create KeyPair

aws ec2 create-key-pair --key-name aws_nrd-io_mrtn_keypair --query 'KeyMaterial' --output text > ~/aws_nrd-io_mrtn_keypair.pem

chmod 400 ~/aws_nrd-io_mrtn_keypair.pem

aws ec2 describe-key-pairs --key-names aws_nrd-io_mrtn_keypair

Create EMR Default Roles

aws emr create-default-roles

Launch Spark Cluster

aws emr create-cluster --name "Spark" --release-label emr-4.2.0 --applications Name=Spark --ec2-attributes KeyName=aws_nrd-io_mrtn_keypair,InstanceProfile=EMR_EC2_DefaultRole --instance-type m3.xlarge --instance-count 3 --configurations file://./spark_emr_config.json --no-termination-protected --no-auto-terminate --visible-to-all-users --service-role EMR_DefaultRole
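
The create-cluster call returns the new cluster ID, which the Admin commands need. A possible sketch for capturing it, assuming the key pair, default roles, and `./spark_emr_config.json` already exist; the function name `launch_spark_cluster` is hypothetical, and the options are the same as in the command above plus `--query` and `--output` to print only the ID.

```shell
# Hypothetical helper: launch the cluster and print only its ID (e.g. j-XXXXXXXXXXXXX).
launch_spark_cluster() {
  aws emr create-cluster \
    --name "Spark" \
    --release-label emr-4.2.0 \
    --applications Name=Spark \
    --ec2-attributes KeyName=aws_nrd-io_mrtn_keypair,InstanceProfile=EMR_EC2_DefaultRole \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --configurations file://./spark_emr_config.json \
    --no-termination-protected \
    --no-auto-terminate \
    --visible-to-all-users \
    --service-role EMR_DefaultRole \
    --query 'ClusterId' \
    --output text
}

# Usage (requires configured AWS credentials):
#   CLUSTER_ID=$(launch_spark_cluster)
#   aws emr wait cluster-running --cluster-id "$CLUSTER_ID"
```

The `aws emr wait cluster-running` call in the usage comment blocks until the cluster is up, which avoids polling by hand.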

TODO:

  • Define custom service role
  • Define custom instance profile
  • Define custom VPC
  • Bootstrap Zeppelin on Master
  • Configure EMRFS Consistent View
  • Configure S3 logging
  • Configure outbound JDBC connection policy (IAM?) for master, slaves

Admin

Query Cluster Status

aws emr describe-cluster --cluster-id j-LKB1W2TOTQ0H
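
describe-cluster prints the full cluster description. If only the state is of interest, a sketch like the following narrows the output with `--query`; the function name `cluster_state` is hypothetical.

```shell
# Hypothetical helper: print only the cluster state
# (for example STARTING, WAITING, or TERMINATED).
cluster_state() {
  aws emr describe-cluster \
    --cluster-id "$1" \
    --query 'Cluster.Status.State' \
    --output text
}

# Usage:
#   cluster_state j-LKB1W2TOTQ0H
```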

Terminate Cluster

aws emr terminate-clusters --cluster-ids j-LKB1W2TOTQ0H
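
terminate-clusters returns as soon as the request is accepted, not when the instances are gone. A sketch that also waits for termination to finish, assuming the cluster ID is passed as the first argument; the function name `terminate_and_wait` is hypothetical.

```shell
# Hypothetical helper: request termination, then block until the cluster
# reports a terminated state.
terminate_and_wait() {
  aws emr terminate-clusters --cluster-ids "$1" &&
    aws emr wait cluster-terminated --cluster-id "$1"
}

# Usage:
#   terminate_and_wait j-LKB1W2TOTQ0H
```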

spark_emr_config.json

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]