@ritesh
Created July 21, 2020 21:07
A sample glue job
AWSTemplateFormatVersion: "2010-09-09"
Description: "Create a glue job to process S3 Data events"
Parameters:
  LogBucket:
    Type: String
  GlueAssetsBucket:
    Type: String
  RawDBName:
    Type: String
  RawTableName:
    Type: String
  ConvertedTableName:
    Type: String
  ConvertedDBName:
    Type: String
Resources:
  GlueJobRole:
    Type: AWS::IAM::Role
    Properties:
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        -
          PolicyName: "inline"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action:
                  - "s3:ListAllMyBuckets"
                  - "s3:GetBucketLocation"
                Resource: "*"
              -
                Effect: "Allow"
                Action:
                  - "s3:GetObject*"
                Resource:
                  - !Sub |-
                    arn:aws:s3:::${LogBucket}/AWSLogs/*
                  - !Sub |-
                    arn:aws:s3:::${GlueAssetsBucket}/*
              - Effect: "Allow"
                Action:
                  - "s3:GetObject*"
                  - "s3:PutObject*"
                  - "s3:Delete*"
                Resource:
                  - !Sub |-
                    arn:aws:s3:::${LogBucket}/converted2/*
                  - !Sub |-
                    arn:aws:s3:::${LogBucket}/temp2/*
  MyJob:
    Type: AWS::Glue::Job
    Properties:
      Command:
        Name: glueetl
        ScriptLocation: !Sub "s3://${GlueAssetsBucket}/agsl/glue_scripts/sample_cloudtrail_s3_job.py"
      DefaultArguments:
        "--job-bookmark-option": "job-bookmark-enable"
        "--raw_database_name": !Sub "${RawDBName}"
        "--raw_table_name": !Sub "${RawTableName}"
        "--converted_database_name": !Sub "${ConvertedDBName}"
        "--converted_table_name": !Sub "${ConvertedTableName}"
        "--TempDir": !Sub "s3://${LogBucket}/temp2"
        "--s3_converted_target": !Sub "s3://${LogBucket}/converted2/s3_events_parquet/"
        "--s3_source_location": !Sub "s3://${LogBucket}/AWSLogs/${AWS::AccountId}/CloudTrail/"
        "--extra-py-files": !Sub "s3://${GlueAssetsBucket}/agsl/glue_scripts/athena_glue_converter_latest.zip"
      ExecutionProperty:
        MaxConcurrentRuns: 2
      MaxRetries: 0
      Name: S3DataEventConverter
      Role: !Ref GlueJobRole
  MyJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: "S3DataEventConverterTrigger"
      Description: "Runs the S3 Data event converter every 4 hours"
      Type: SCHEDULED
      Schedule: cron(0 */4 * * ? *)
      Actions:
        - JobName: !Ref MyJob
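
The job script itself (sample_cloudtrail_s3_job.py) is not part of this gist, so the following is only a rough sketch of how the custom DefaultArguments above might be read once Glue passes them to the script as "--name value" pairs on the command line. Inside the Glue runtime the usual route is getResolvedOptions from awsglue.utils; that module is not available locally, so this sketch assumes plain argparse instead. The helper name parse_job_args and the JOB_ARGS list are hypothetical and chosen only for illustration.

```python
import argparse

# Custom argument names, copied from the DefaultArguments keys in the template.
JOB_ARGS = [
    "raw_database_name",
    "raw_table_name",
    "converted_database_name",
    "converted_table_name",
    "s3_converted_target",
    "s3_source_location",
]


def parse_job_args(argv):
    """Return a dict with the job's custom arguments found in argv.

    Hypothetical helper: arguments it does not know about (for example
    --JOB_NAME, --TempDir or --job-bookmark-option) are ignored, not rejected.
    """
    # allow_abbrev=False keeps argparse from matching shortened option names.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in JOB_ARGS:
        parser.add_argument(f"--{name}", required=True)
    known, _unknown = parser.parse_known_args(argv)
    return vars(known)
```

In a script this would typically be called as parse_job_args(sys.argv[1:]); a missing required argument makes argparse exit with a usage error, which surfaces a mistyped DefaultArguments key early.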