CloudFormation example for a Glue Spark job with metrics and a scheduler
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation example for a Glue Spark job with metrics and a scheduler
Parameters:
  ArtifactBucket:
    Description: A global deployable artefact bucket
    Type: String
    Default: artefacts
  ServiceName:
    Description: Name of the service that owns the stack
    Type: String
    Default: awesome-service
  Environment:
    Description: Name of the environment the service is being deployed to
    Type: String
    Default: dev
Resources:
  MyGlueSparkJob:
    Type: AWS::Glue::Job
    Properties:
      Role: !Ref MyGlueSparkJobRole
      # TODO: Size your cluster accordingly. Each unit (DPU) provides 4 vCPUs and 16 GB of RAM.
      AllocatedCapacity: 5
      Command:
        Name: glueetl
        ScriptLocation: !Sub s3://${ArtifactBucket}/${ServiceName}/glue_spark_etl_job.py
      DefaultArguments:
        "--environment": !Ref Environment
        "--enable-metrics": 'true'
        "--output_bucket_uri": !Sub "s3://${ServiceName}/etl_job_results_output"
        "--input_bucketuri": !Sub "s3://${ServiceName}/etl_job_results_input"
      ExecutionProperty:
        MaxConcurrentRuns: 1
      MaxRetries: 1
      Name: !Sub ${Environment}-${ServiceName}-generator
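  # A hedged sketch, not part of the original template: on newer Glue versions,
  # capacity is typically expressed with GlueVersion, WorkerType and
  # NumberOfWorkers rather than AllocatedCapacity. The version and worker
  # values below are illustrative assumptions, not recommendations.
  #
  #   MyGlueSparkJob:
  #     Type: AWS::Glue::Job
  #     Properties:
  #       GlueVersion: '3.0'
  #       WorkerType: G.1X      # one DPU per worker: 4 vCPUs, 16 GB RAM
  #       NumberOfWorkers: 5    # assumed to mirror the 5 units allocated above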
  MyGlueSparkJobJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Type: SCHEDULED
      Description: Runs the Glue Spark job every day at 00:00 UTC
      Schedule: cron(0 0 * * ? *)
      # Scheduled triggers are created inactive unless StartOnCreation is true.
      StartOnCreation: true
      Actions:
        # !Ref returns the job name and makes the trigger wait for the job to be created.
        - JobName: !Ref MyGlueSparkJob
      Name: !Sub ${Environment}-${ServiceName}-generator-trigger
  MyGlueSparkJobRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: arn:aws:logs:*:*:*
              - Effect: Allow
                Action:
                  - s3:*
                Resource:
                  # IAM expects bucket ARNs here, not s3:// URIs or Fn::ImportValue.
                  # Assumes the data bucket is named after ServiceName, as in the job
                  # arguments; this covers both the input and output prefixes.
                  - !Sub arn:aws:s3:::${ServiceName}
                  - !Sub arn:aws:s3:::${ServiceName}/*
              - Effect: Allow
                Action:
                  - s3:*
                Resource:
                  - arn:aws:s3:::aws-glue-*/*
                  - arn:aws:s3:::*/*aws-glue-*/*
                  - arn:aws:s3:::aws-glue-*
              - Effect: Allow
                Action:
                  - glue:CreateDatabase
                  - glue:DeleteDatabase
                  - glue:GetDatabase
                  - glue:GetDatabases
                  - glue:UpdateDatabase
                  - glue:CreateTable
                  - glue:DeleteTable
                  - glue:BatchDeleteTable
                  - glue:UpdateTable
                  - glue:GetTable
                  - glue:GetTables
                  - glue:BatchCreatePartition
                  - glue:CreatePartition
                  - glue:DeletePartition
                  - glue:BatchDeletePartition
                  - glue:UpdatePartition
                  - glue:GetPartition
                  - glue:GetPartitions
                  - glue:BatchGetPartition
                Resource: "*"
              - Effect: Allow
                Action:
                  - 'athena:*'
                Resource: '*'
              - Effect: Allow
                Action:
                  - cloudwatch:*
                Resource: '*'