@michaeljfazio
Created May 13, 2020 15:07
GeoWave AWS
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  Environment:
    Type: String
    Default: Production
    Description: The name of the environment this cluster is being deployed to.
  EmrVersion:
    Type: String
    Default: emr-5.29.0
    AllowedValues:
      - emr-5.29.0
      - emr-5.28.1
      - emr-5.28.0
      - emr-5.27.0
      - emr-5.26.0
    Description: The Amazon EMR release label to deploy.
  GeoWaveVersion:
    Type: String
    Default: 1.1.0
    AllowedValues:
      - 1.1.0
      - 0.9.8
      - 0.9.7
      - 0.9.6
      - 0.9.5
      - 0.9.4
      - 0.9.3
    Description: >
      The version of GeoWave that will be deployed to the EMR cluster.
  S3Bucket:
    Type: String
    Description: >
      The bucket that will be used to persist ALL data associated with this cluster. This includes the
      HBase root folder, logs, and any associated Jupyter Notebooks.
  SubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: The subnet that the EMR cluster will be deployed in.
  Ec2KeyName:
    Type: AWS::EC2::KeyPair::KeyName
    Description: >
      The name of the EC2 key pair that will be used to provision the master node.
Resources:
  # TODO: Performance tuning as per https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html
  GeoWaveCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Name: !Sub '${AWS::StackName}-${Environment}'
      JobFlowRole: !Ref EmrEc2InstanceProfile
      ServiceRole: !Ref EmrRole
      ReleaseLabel: !Ref EmrVersion
      LogUri: !Sub "s3://${S3Bucket}/logs"
      Applications:
        - Name: Hadoop
        - Name: Hue
        - Name: Hive
        - Name: Pig
        - Name: Spark
        - Name: HBase
        - Name: ZooKeeper
        - Name: Livy
        - Name: JupyterHub
      Configurations:
        - Classification: 'hbase-site'
          ConfigurationProperties:
            'hbase.rootdir': !Sub "s3://${S3Bucket}/hbase-root"
        - Classification: 'hbase'
          ConfigurationProperties:
            'hbase.emr.storageMode': 's3'
        - Classification: 'emrfs-site'
          ConfigurationProperties:
            'fs.s3.consistent': 'true'
        - Classification: 'jupyter-s3-conf'
          ConfigurationProperties:
            's3.persistence.enabled': 'true'
            's3.persistence.bucket': !Ref S3Bucket
        - Classification: 'hive-site'
          ConfigurationProperties:
            'hive.metastore.client.factory.class': 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory'
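      # Sketch for the performance-tuning TODO above (not part of the original
      # template). Standard HBase settings of the kind that guide discusses could
      # be merged into the 'hbase-site' ConfigurationProperties. The property
      # names below are regular HBase keys; the '<...>' values are placeholders
      # and an assumption to be replaced after testing against the real workload.
      # Everything stays commented out so the template's behavior is unchanged.
      #
      #   'hbase.regionserver.handler.count': '<handler-threads>'
      #   'hbase.bucketcache.size': '<bucket-cache-size-in-MB>'
      #   'hbase.hregion.memstore.flush.size': '<flush-size-in-bytes>'
      #   'hbase.hstore.blockingStoreFiles': '<max-store-files>'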
      BootstrapActions:
        - Name: GeoWave Bootstrap
          ScriptBootstrapAction:
            Path: !Sub 's3://geowave/${GeoWaveVersion}/scripts/emr/hbase/bootstrap-geowave.sh'
        - Name: GeoWave Jupyter Notebook Bootstrap
          ScriptBootstrapAction:
            Path: !Sub 's3://geowave/${GeoWaveVersion}/scripts/emr/jupyter/bootstrap-jupyter.sh'
      EbsRootVolumeSize: 32
      Instances:
        MasterInstanceFleet:
          Name: Master
          TargetOnDemandCapacity: 1
          InstanceTypeConfigs:
            - InstanceType: m5.xlarge
        CoreInstanceFleet:
          Name: Core
          TargetSpotCapacity: 1
          InstanceTypeConfigs:
            - InstanceType: m4.xlarge
              BidPriceAsPercentageOfOnDemandPrice: 35
        Ec2KeyName: !Ref Ec2KeyName
        Ec2SubnetId: !Ref SubnetId
  EmrRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2008-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: elasticmapreduce.amazonaws.com
            Action: 'sts:AssumeRole'
      Path: /
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole'
      Tags:
        - Key: Environment
          Value: !Ref Environment
  EmrEc2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: 'sts:AssumeRole'
      Path: /
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role'
        - 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
      Tags:
        - Key: Environment
          Value: !Ref Environment
  EmrEc2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref EmrEc2Role
Outputs:
  GeoWaveCluster:
    Value: !Ref GeoWaveCluster
  EmrEc2Role:
    Value: !Ref EmrEc2Role
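  # Sketch of one more output (not part of the original template): the outputs
  # above expose the cluster ID and the EC2 role name, so exposing the master
  # node's DNS name may be convenient for SSH or for reaching JupyterHub. The
  # output name 'MasterPublicDns' is a hypothetical choice; 'MasterPublicDNS'
  # is the attribute that AWS::EMR::Cluster returns through Fn::GetAtt.
  MasterPublicDns:
    Description: Public DNS name of the EMR master node.
    Value: !GetAtt GeoWaveCluster.MasterPublicDNS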