@maxgr0 · Created December 30, 2020 14:59
resources:
  Resources:
    # S3 bucket that stores the data lake objects.
    DataLakeBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${self:custom.dataLakeBucketName}

    # Glue Data Catalog database for the data lake.
    GlueDataLake:
      Type: AWS::Glue::Database
      Properties:
        CatalogId: ${self:provider.environment.AWS_ACCOUNT_ID}
        DatabaseInput:
          Name: ${self:custom.dataLakeIdentifier}

    # External Glue table over the Parquet files in S3, partitioned hourly by `dt`
    # and using partition projection, so no crawler or MSCK REPAIR is needed.
    GlueDataLakeInteractionsTable:
      DependsOn: GlueDataLake
      Type: AWS::Glue::Table
      Properties:
        CatalogId: ${self:provider.environment.AWS_ACCOUNT_ID}
        DatabaseName: ${self:custom.dataLakeIdentifier}
        TableInput:
          Name: interactions
          TableType: EXTERNAL_TABLE
          Parameters:
            classification: parquet
            projection.enabled: true
            projection.dt.format: yyyy-MM-dd-HH
            projection.dt.interval: 1
            projection.dt.interval.unit: HOURS
            projection.dt.range: 2020-12-01-00,NOW
            projection.dt.type: date
            storage.location.template:
              Fn::Join:
                - ''
                - - 's3://'
                  - ${self:custom.dataLakeBucketName}
                  - '/'
                  - 'interactions'
                  - '/dt=${dt}'
          PartitionKeys:
            - Name: dt
              Type: string
          StorageDescriptor:
            OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
            InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
            SerdeInfo:
              SerializationLibrary: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
            Location:
              Fn::Join:
                - ''
                - - 's3://'
                  - ${self:custom.dataLakeBucketName}
                  - '/'
                  - 'interactions'
                  - '/'
            Columns:
              - Name: id
                Type: string
              - Name: created_at
                Type: timestamp
              - Name: created_by
                Type: string
              - Name: entity
                Type: string
              - Name: type
                Type: string

    # Firehose delivery stream that converts incoming JSON records to Parquet
    # using the Glue table schema and writes them under the table's dt= prefix.
    InteractionsDataDeliveryStream:
      DependsOn:
        - DataLakeKinesisFirehoseS3Role
        - DataLakeBucket
      Type: AWS::KinesisFirehose::DeliveryStream
      Properties:
        DeliveryStreamType: DirectPut
        DeliveryStreamName: ${self:custom.dataDeliveryStreamNameInteractions}
        ExtendedS3DestinationConfiguration:
          RoleARN:
            Fn::GetAtt:
              - DataLakeKinesisFirehoseS3Role
              - Arn
          BucketARN:
            Fn::GetAtt:
              - DataLakeBucket
              - Arn
          Prefix:
            Fn::Join:
              - ''
              - - Ref: GlueDataLakeInteractionsTable
                - '/dt=!{timestamp:yyyy}-!{timestamp:MM}-!{timestamp:dd}-!{timestamp:HH}/'
          ErrorOutputPrefix:
            Fn::Join:
              - ''
              - - Ref: GlueDataLakeInteractionsTable
                - '/error/!{firehose:error-output-type}/dt=!{timestamp:yyyy}-!{timestamp:MM}-!{timestamp:dd}-!{timestamp:HH}/'
          BufferingHints:
            SizeInMBs: 128
            IntervalInSeconds: 900
          CloudWatchLoggingOptions:
            Enabled: false
          S3BackupMode: Disabled
          DataFormatConversionConfiguration:
            Enabled: true
            SchemaConfiguration:
              CatalogId: ${self:provider.environment.AWS_ACCOUNT_ID}
              RoleARN:
                Fn::GetAtt:
                  - DataLakeKinesisFirehoseS3Role
                  - Arn
              DatabaseName:
                Ref: GlueDataLake
              TableName:
                Ref: GlueDataLakeInteractionsTable
              Region: ${self:provider.region}
              VersionId: LATEST
            InputFormatConfiguration:
              Deserializer:
                OpenXJsonSerDe: {}
            OutputFormatConfiguration:
              Serializer:
                ParquetSerDe: {}

    # IAM role assumed by Firehose: S3 read/write on the data lake bucket plus
    # read access to the Glue table schema for the Parquet conversion.
    DataLakeKinesisFirehoseS3Role:
      Type: AWS::IAM::Role
      DependsOn: DataLakeBucket
      Properties:
        RoleName:
          Fn::Join:
            - '-'
            - - ${self:custom.dataLakeIdentifier}
              - 'kinesis-firehose-s3-role'
        AssumeRolePolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Sid: ''
              Effect: Allow
              Principal:
                Service: firehose.amazonaws.com
              Action: 'sts:AssumeRole'
              Condition:
                StringEquals:
                  'sts:ExternalId': ${self:provider.environment.AWS_ACCOUNT_ID}
        Path: '/'
        Policies:
          - PolicyName:
              Fn::Join:
                - '-'
                - - ${self:custom.dataLakeIdentifier}
                  - 'kinesis-firehose-s3-policy'
            PolicyDocument:
              Version: '2012-10-17'
              Statement:
                - Effect: Allow
                  Action:
                    - 's3:AbortMultipartUpload'
                    - 's3:GetBucketLocation'
                    - 's3:GetObject'
                    - 's3:ListBucket'
                    - 's3:ListBucketMultipartUploads'
                    - 's3:PutObject'
                  Resource:
                    - Fn::Join:
                        - ''
                        - - 'arn:aws:s3:::'
                          - Ref: DataLakeBucket
                    - Fn::Join:
                        - ''
                        - - 'arn:aws:s3:::'
                          - Ref: DataLakeBucket
                          - '/*'
                - Effect: Allow
                  Action: 'glue:GetTableVersions'
                  Resource: '*'
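
# The resource definitions above reference values from the service's `custom` block
# and `provider.environment`, which are not included in this gist. A minimal sketch of
# what those sections might look like; all names and values below are purely
# illustrative assumptions, not part of the original configuration:
#
# custom:
#   dataLakeIdentifier: interactions-data-lake
#   dataLakeBucketName: interactions-data-lake-bucket
#   dataDeliveryStreamNameInteractions: interactions-delivery-stream
#
# provider:
#   region: eu-central-1
#   environment:
#     AWS_ACCOUNT_ID: '123456789012'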