AWSTemplateFormatVersion: 2010-09-09
Description: Noom DMS base stack. This stack exports shared resources that teams' DMS stacks will use.
Parameters:
  DatalakeS3BucketName:
    Type: String
    Description: Output S3 datalake bucket name.
  NetworkStackName:
    Type: String
    Default: noom-fargate-networking
    Description: >-
      DO NOT CHANGE. The name of the parent Fargate networking stack that you
      created. Necessary to locate and reference resources created by that
      stack.
Resources:
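  # IAM role that the DMS service (dms.amazonaws.com) assumes in order to
  # list the datalake bucket and write objects under its /data/ prefix.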
  DatalakeAccessRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - dms.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: DMSDatalakeWrite
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:ListBucket
                Resource:
                  - !Join
                    - ""
                    - - "arn:aws:s3:::"
                      - !Ref DatalakeS3BucketName
              - Effect: Allow
                Action:
                  - s3:PutObject
                Resource:
                  - !Join
                    - ""
                    - - "arn:aws:s3:::"
                      - !Ref DatalakeS3BucketName
                      - "/data/*"
  S3DatalakeTargetEndpointRealtime:
    Type: AWS::DMS::Endpoint
    Properties:
      EndpointType: target
      EngineName: s3
      ExtraConnectionAttributes: !Join
        - ";"
        - - "dataFormat=parquet"
          - "timestampColumnName=dms_timestamp"
          - "parquetTimestampInMillisecond=true"
          - "parquetVersion=PARQUET_1_0"
          - "cdcMaxBatchInterval=60"
          - "cdcMinFileSize=32000"
      S3Settings:
        BucketFolder: ""
        BucketName: !Ref DatalakeS3BucketName
        CompressionType: GZIP
        ServiceAccessRoleArn:
          Fn::GetAtt:
            - DatalakeAccessRole
            - Arn
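  # Default S3 target endpoint: same Parquet settings, but flushes every
  # 10 minutes or once the buffer reaches ~1 GB, trading latency for fewer,
  # larger files.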
  S3DatalakeTargetEndpoint:
    Type: AWS::DMS::Endpoint
    Properties:
      EndpointType: target
      EngineName: s3
      ExtraConnectionAttributes: !Join
        - ";"
        - - "dataFormat=parquet"
          - "timestampColumnName=dms_timestamp"
          - "parquetTimestampInMillisecond=true"
          - "parquetVersion=PARQUET_1_0"
          - "cdcMaxBatchInterval=600"
          - "cdcMinFileSize=1048576"
      S3Settings:
        BucketFolder: ""
        BucketName: !Ref DatalakeS3BucketName
        CompressionType: GZIP
        ServiceAccessRoleArn:
          Fn::GetAtt:
            - DatalakeAccessRole
            - Arn
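  # Security group for DMS replication instances, created in the VPC exported
  # by the networking stack. No ingress rules are declared here.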
  DMSSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: DMS security group
      VpcId:
        Fn::ImportValue:
          !Sub "${NetworkStackName}:VPCId"
  DMSClusterSubnetGroup:
    Type: AWS::DMS::ReplicationSubnetGroup
    Properties:
      ReplicationSubnetGroupDescription: DMS subnet group
      SubnetIds:
        - Fn::ImportValue:
            !Sub "${NetworkStackName}:PrivateSubnetOne"
        - Fn::ImportValue:
            !Sub "${NetworkStackName}:PrivateSubnetTwo"
Outputs:
  DMSClusterSubnetGroupName:
    Value: !Ref DMSClusterSubnetGroup
    Export:
      Name: !Sub '${AWS::StackName}:DMSClusterSubnetGroupName'
  DMSSecurityGroupName:
    Value: !Ref DMSSecurityGroup
    Export:
      Name: !Sub '${AWS::StackName}:DMSSecurityGroupName'
  S3DatalakeTargetEndpointRealtimeArn:
    Description: >-
      S3 datalake target endpoint that DMS tasks should use to write output.
      The realtime endpoint flushes a new file every 60 seconds, or earlier if
      the buffer exceeds 32 MB.
    Value: !Ref S3DatalakeTargetEndpointRealtime
    Export:
      Name: !Sub '${AWS::StackName}:S3DatalakeTargetEndpointRealtime'
  S3DatalakeTargetEndpointArn:
    Description: >-
      S3 datalake target endpoint that DMS tasks should use to write output.
      The default endpoint flushes a new file every 10 minutes, or earlier if
      the buffer exceeds 1 GB.
    Value: !Ref S3DatalakeTargetEndpoint
    Export:
      Name: !Sub '${AWS::StackName}:S3DatalakeTargetEndpoint'
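
# --- Usage sketch (not part of this template) --------------------------------
# A hypothetical snippet of a team's DMS stack consuming the exports above,
# assuming this stack is deployed under the name "noom-dms-base". Resource
# names, the instance class, and the table mapping are illustrative only.
#
#   ReplicationInstance:
#     Type: AWS::DMS::ReplicationInstance
#     Properties:
#       ReplicationInstanceClass: dms.t3.medium
#       ReplicationSubnetGroupIdentifier:
#         Fn::ImportValue: "noom-dms-base:DMSClusterSubnetGroupName"
#       VpcSecurityGroupIds:
#         - Fn::ImportValue: "noom-dms-base:DMSSecurityGroupName"
#
#   ReplicationTask:
#     Type: AWS::DMS::ReplicationTask
#     Properties:
#       MigrationType: full-load-and-cdc
#       ReplicationInstanceArn: !Ref ReplicationInstance  # Ref returns the ARN
#       SourceEndpointArn: !Ref SomeSourceEndpoint        # defined by the team
#       TargetEndpointArn:
#         Fn::ImportValue: "noom-dms-base:S3DatalakeTargetEndpoint"
#       TableMappings: >-
#         {"rules": [{"rule-type": "selection", "rule-id": "1",
#         "rule-name": "include-all", "object-locator":
#         {"schema-name": "%", "table-name": "%"}, "rule-action": "include"}]}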