@ritesh
Created July 31, 2020 11:08
VPCFlowlogs
[
  {
    "ParameterKey": "RawDBName",
    "ParameterValue": "raw_db_vpc_flow_logs"
  },
  {
    "ParameterKey": "RawTableName",
    "ParameterValue": "raw_table_vpc_flow_logs"
  },
  {
    "ParameterKey": "ConvertedDBName",
    "ParameterValue": "converted_db_vpc_flow_logs"
  },
  {
    "ParameterKey": "ConvertedTableName",
    "ParameterValue": "converted_table_vpc_flow_logs"
  },
  {
    "ParameterKey": "GlueJobRole",
    "ParameterValue": "the-IAM-role-you-want-glue-to-use"
  },
  {
    "ParameterKey": "FlowLogBucket",
    "ParameterValue": "where-the-vpc-flow-logs-are"
  },
  {
    "ParameterKey": "GlueSecurityConfiguration",
    "ParameterValue": "S3KMS"
  },
  {
    "ParameterKey": "GlueAssetsBucket",
    "ParameterValue": "yourglueassetsbucket"
  },
  {
    "ParameterKey": "SecurityAccount",
    "ParameterValue": "1232424314314"
  }
]
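Every ParameterKey in this file has to match a name declared in the template's Parameters section. The following is a minimal sketch of a check to run before deploying. The names `EXPECTED_KEYS`, `OPTIONAL_KEYS`, `check_parameters`, and `check_parameter_file` are hypothetical helpers and not part of the gist; the key set is copied from the template. It also flags a SecurityAccount value that is not 12 digits, since AWS account IDs are 12 digits long.

```python
import json

# Parameter names declared in the template's Parameters section.
EXPECTED_KEYS = {
    "GlueAssetsBucket", "RawDBName", "RawTableName", "ConvertedTableName",
    "ConvertedDBName", "SecurityAccount", "FlowLogBucket", "GlueJobRole",
    "GlueSecurityConfiguration",
}
# Parameters that have a Default in the template may be left out of the file.
OPTIONAL_KEYS = {"GlueSecurityConfiguration"}


def check_parameters(entries):
    """Return a list of human-readable problems found in a parameter list."""
    problems = []
    keys = [entry["ParameterKey"] for entry in entries]
    for key in sorted(set(keys)):
        if keys.count(key) > 1:
            problems.append(f"duplicate key: {key}")
        if key not in EXPECTED_KEYS:
            problems.append(f"unknown key: {key}")
    for key in sorted(EXPECTED_KEYS - OPTIONAL_KEYS - set(keys)):
        problems.append(f"missing key: {key}")
    for entry in entries:
        if entry["ParameterKey"] != "SecurityAccount":
            continue
        value = entry["ParameterValue"]
        if not (value.isascii() and value.isdigit() and len(value) == 12):
            problems.append("SecurityAccount is not a 12-digit account ID")
    return problems


def check_parameter_file(path):
    """Load a CloudFormation parameter file (a JSON list) and check it."""
    with open(path) as handle:
        return check_parameters(json.load(handle))
```

Run against the sample values above, this flags the SecurityAccount placeholder, which has 13 digits; replace it with a real account ID before deploying.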
AWSTemplateFormatVersion: "2010-09-09"
Description: |
  Creates a Glue job that parses VPC flow logs. This requires a KMS key ID and an IAM role for Glue that has access
  to the bucket containing the flow logs. You also need to provide a GlueAssetsBucket that contains the ETL scripts that do the conversion. See:
  https://github.com/awslabs/athena-glue-service-logs. You also need to create a GlueSecurityConfiguration (which tells Glue to use KMS encryption)
  ahead of time. SecurityAccount is the account that should have access to the Catalog that this creates.
Parameters:
  GlueAssetsBucket:
    Type: String
  RawDBName:
    Type: String
  RawTableName:
    Type: String
  ConvertedTableName:
    Type: String
  ConvertedDBName:
    Type: String
  SecurityAccount:
    Type: String
  FlowLogBucket:
    Type: String
  GlueJobRole:
    Type: String
  GlueSecurityConfiguration:
    Type: String
    Default: "S3KMS"
Resources:
  MyJob:
    Type: AWS::Glue::Job
    Properties:
      SecurityConfiguration: !Ref GlueSecurityConfiguration
      Command:
        Name: glueetl
        ScriptLocation: !Sub "s3://${GlueAssetsBucket}/agsl/glue_scripts/sample_vpc_flow_job.py"
      DefaultArguments:
        "--job-bookmark-option": "job-bookmark-enable"
        "--raw_database_name": !Sub "${RawDBName}"
        "--raw_table_name": !Sub "${RawTableName}"
        "--converted_database_name": !Sub "${ConvertedDBName}"
        "--converted_table_name": !Sub "${ConvertedTableName}"
        "--TempDir": !Sub "s3://${FlowLogBucket}/temp"
        "--s3_converted_target": !Sub "s3://${FlowLogBucket}/converted/"
        "--s3_source_location": !Sub "s3://${FlowLogBucket}/AWSLogs/${AWS::AccountId}/vpcflowlogs/"
        "--extra-py-files": !Sub "s3://${GlueAssetsBucket}/agsl/glue_scripts/athena_glue_converter_latest.zip"
      ExecutionProperty:
        MaxConcurrentRuns: 1
      MaxRetries: 0
      Name: VpcFlowLogConverter
      Role: !Ref GlueJobRole
  MyJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: "VPCFlowLogTrigger"
      Description: "Runs the VPC flow log converter every 4 hours"
      Type: SCHEDULED
      StartOnCreation: True
      Schedule: cron(0 */4 * * ? *)
      Actions:
        - JobName: !Ref MyJob