
@maatthc
Last active February 10, 2022 18:02
CloudFormation example for a Glue Spark job with metrics and a scheduler
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation example for a Glue Spark job with metrics and a scheduler
Parameters:
  ArtifactBucket:
    Description: Globally deployable artefact bucket that holds the job script
    Type: String
    Default: artefacts
  ServiceName:
    Description: Name of the service that owns the stack
    Type: String
    Default: awesome-service
  Environment:
    Description: Name of the environment the service is being deployed to
    Type: String
    Default: dev
Resources:
  MyGlueSparkJob:
    Type: AWS::Glue::Job
    Properties:
      Role: !Ref MyGlueSparkJobRole
      # TODO: Size the cluster for your workload. Each unit is one DPU: 4 vCPUs and 16 GB of memory.
      # AllocatedCapacity is deprecated in the Glue API; MaxCapacity replaces it.
      AllocatedCapacity: 5
      Command:
        Name: glueetl
        ScriptLocation: !Sub s3://${ArtifactBucket}/${ServiceName}/glue_spark_etl_job.py
      DefaultArguments:
        "--environment": !Ref Environment
        "--enable-metrics": 'true'
        "--output_bucket_uri": !Sub "s3://${ServiceName}/etl_job_results_output"
        "--input_bucket_uri": !Sub "s3://${ServiceName}/etl_job_results_input"
      ExecutionProperty:
        MaxConcurrentRuns: 1
      MaxRetries: 1
      Name: !Sub ${Environment}-${ServiceName}-generator
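  MyGlueSparkJobJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Type: SCHEDULED
      Description: !Sub "Daily schedule for the ${Environment}-${ServiceName}-generator job"
      # AWS cron fields: minutes hours day-of-month month day-of-week year, evaluated in UTC.
      # This runs the job every day at 00:00 UTC.
      Schedule: cron(0 0 * * ? *)
      # Without this, a SCHEDULED trigger is created but not started.
      StartOnCreation: true
      Actions:
        # !Ref on an AWS::Glue::Job returns the job name and makes
        # CloudFormation create the job before the trigger.
        - JobName: !Ref MyGlueSparkJob
      Name: !Sub ${Environment}-${ServiceName}-generator-trigger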
  MyGlueSparkJobRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: arn:aws:logs:*:*:*
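              # IAM needs ARNs, not s3:// URIs. This covers the bucket named after the
              # service, which holds the input and output prefixes in DefaultArguments.
              - Effect: Allow
                Action:
                  - s3:*
                Resource:
                  - !Sub "arn:aws:s3:::${ServiceName}"
                  - !Sub "arn:aws:s3:::${ServiceName}/*"
              # The job role also has to read the script from the artefact bucket.
              - Effect: Allow
                Action:
                  - s3:GetObject
                Resource:
                  - !Sub "arn:aws:s3:::${ArtifactBucket}/${ServiceName}/*"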
              - Effect: Allow
                Action:
                  - s3:*
                Resource:
                  - arn:aws:s3:::aws-glue-*/*
                  - arn:aws:s3:::*/*aws-glue-*/*
                  - arn:aws:s3:::aws-glue-*
              - Effect: Allow
                Action:
                  - glue:CreateDatabase
                  - glue:DeleteDatabase
                  - glue:GetDatabase
                  - glue:GetDatabases
                  - glue:UpdateDatabase
                  - glue:CreateTable
                  - glue:DeleteTable
                  - glue:BatchDeleteTable
                  - glue:UpdateTable
                  - glue:GetTable
                  - glue:GetTables
                  - glue:BatchCreatePartition
                  - glue:CreatePartition
                  - glue:DeletePartition
                  - glue:BatchDeletePartition
                  - glue:UpdatePartition
                  - glue:GetPartition
                  - glue:GetPartitions
                  - glue:BatchGetPartition
                Resource: "*"
              - Effect: Allow
                Action:
                  - 'athena:*'
                Resource: '*'
              - Effect: Allow
                Action:
                  - cloudwatch:*
                Resource: '*'
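The job script `glue_spark_etl_job.py` is not included in this gist. Glue passes each `DefaultArguments` entry to the script on the command line as a `--key value` pair, alongside arguments that Glue adds itself. Inside a Glue job, these arguments are normally read with `awsglue.utils.getResolvedOptions(sys.argv, [...])`. Because `awsglue` is provided by the Glue runtime, what follows is only a local stand-in built on `argparse`.

This is a sketch, not the author's script. The helper `resolve_job_args` and the list of expected names are hypothetical. It assumes the script needs `environment` and `output_bucket_uri`. Like `getResolvedOptions`, it treats every requested name as required and ignores everything else.

```python
import argparse
from typing import Dict, List, Sequence

# Hypothetical: the job parameters this sketch assumes the script reads,
# written without the leading "--" used in DefaultArguments.
REQUIRED_ARGS: List[str] = ["environment", "output_bucket_uri"]


def resolve_job_args(argv: Sequence[str], names: Sequence[str] = REQUIRED_ARGS) -> Dict[str, str]:
    """Return {name: value} for each requested --name option in argv.

    Local stand-in for awsglue.utils.getResolvedOptions: options that were not
    requested (e.g. --enable-metrics or Glue's own run arguments) are ignored,
    and a missing requested option makes argparse exit with an error.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in names:
        parser.add_argument(f"--{name}", required=True)
    # argv[0] is the script path, as in sys.argv; parse_known_args collects
    # unrequested options and their values instead of rejecting them.
    known, _ignored = parser.parse_known_args(list(argv[1:]))
    return vars(known)


if __name__ == "__main__":
    sample_argv = [
        "glue_spark_etl_job.py",
        "--JOB_RUN_ID", "jr_example",  # hypothetical Glue-added argument
        "--environment", "dev",
        "--enable-metrics", "true",
        "--output_bucket_uri", "s3://awesome-service/etl_job_results_output",
    ]
    print(resolve_job_args(sample_argv))
```

Inside Glue, the `argparse` helper would be swapped for the `getResolvedOptions` call. The rest of the script only sees a plain dictionary either way.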
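IAM policy statements name S3 resources by ARN, not by `s3://` URI. `arn:aws:s3:::bucket` identifies the bucket itself, and `arn:aws:s3:::bucket/key` identifies objects. The following minimal sketch shows how a URI such as the job's `--output_bucket_uri` maps onto the two ARNs an S3 statement usually lists. `s3_uri_to_arns` is a hypothetical helper, and it assumes the standard `aws` partition.

```python
from typing import Tuple
from urllib.parse import urlparse


def s3_uri_to_arns(uri: str) -> Tuple[str, str]:
    """Map s3://bucket/prefix to (bucket ARN, object ARN pattern under the prefix).

    Assumes the standard "aws" partition; other partitions (e.g. GovCloud,
    China) use a different prefix in the ARN.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # Objects under the prefix, or every object when the URI has no prefix.
    object_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"
    return bucket_arn, object_arn


if __name__ == "__main__":
    for uri in ("s3://awesome-service/etl_job_results_output", "s3://artefacts"):
        print(s3_uri_to_arns(uri))
```

Bucket-level actions such as `s3:ListBucket` match the bucket ARN. Object-level actions such as `s3:GetObject` and `s3:PutObject` match the object ARN. A statement that grants both kinds of action therefore needs both entries.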
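The trigger's `cron(0 0 * * ? *)` uses the AWS six-field cron format: minutes, hours, day-of-month, month, day-of-week, year. Here `?` means "no specific value", and times are evaluated in UTC, so the job fires once a day at midnight UTC.

The sketch below computes the next fire time for that daily case only. It is not a general cron parser, and `next_daily_run` is a hypothetical name.

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_run(now: datetime, at: time = time(0, 0)) -> datetime:
    """Next fire time of a once-a-day UTC schedule such as cron(0 0 * * ? *).

    Handles only the "every day at a fixed hour and minute" case.
    `now` must be timezone-aware; the result is in UTC.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), at, tzinfo=timezone.utc)
    # A run scheduled for exactly `now` is treated as already started.
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # 2022-02-10 09:00 at UTC+11 is 2022-02-09 22:00 UTC.
    utc_plus_11 = timezone(timedelta(hours=11))
    print(next_daily_run(datetime(2022, 2, 10, 9, 0, tzinfo=utc_plus_11)))
    # 2022-02-10 00:00:00+00:00
```

For example, a team working at UTC+11 would see the scheduled run start at 11:00 local time.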