@chrisj-au
Created December 3, 2020 01:04
CloudWatch Metric, Alarm and Event for SNS and Kinesis Firehose
## Example Lambda error tracking using CloudWatch (SNS and Kinesis Firehose targets)
## Here SNS is for email notification while Firehose is for relaying to Splunk
## This template was cut and pasted from a working template but is untested; it is likely to contain errors.
## Assumes Secrets Manager holds the HEC token and Parameter Store holds the Splunk URL
Parameters:
  LambdaName:
    Type: String
    Description: Lambda function name
  CreateAlarmSNS:
    Type: String
    Description: Create SNS topic for alarm?
    AllowedValues:
      - 'True'
      - 'False'
  SNSEmail:
    Type: String
    Default: ''
    Description: Email address to subscribe to the SNS topic; leave empty to skip creating the topic
  hecSsmPath:
    Type: String
    Description: SSM path and version for the HEC URL, e.g. /MyApp/Splunk/HECURL:6
  hecToken:
    Type: String
    Description: Name of the Secrets Manager secret holding the HEC token; the key within the secret must be called token
Conditions:
  # True if SNSEmail is not equal to ""
  EmailOnLambdaError: !Not [!Equals [!Ref SNSEmail, ""]]
Resources:
  LambdaFunc:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Ref LambdaName
      Runtime: python3.8
      # more....
  LambdaLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub "/aws/lambda/${LambdaFunc}"
      RetentionInDays: 14
  FailedAuthMetricFilter:
    Type: 'AWS::Logs::MetricFilter'
    Properties:
      LogGroupName: !Ref LambdaLogGroup
      FilterPattern: >-
        "[ERROR]"
      MetricTransformations:
        - MetricValue: '1'
          DefaultValue: '0'
          MetricNamespace: Lambda-Error-Metrics
          MetricName: !Sub ${LambdaName}-Error-Metrics
  LambdaErrorAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmName: !Sub ${LambdaName}-Error-Alarm
      AlarmDescription: >-
        A CloudWatch alarm that triggers when the metric filter counts errors logged by the Lambda
      # MetricName and Namespace must match the MetricTransformations of FailedAuthMetricFilter
      MetricName: !Sub ${LambdaName}-Error-Metrics
      Namespace: Lambda-Error-Metrics
      Statistic: Sum
      Period: '300'
      EvaluationPeriods: '1'
      Threshold: '1'
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - "Fn::If": [EmailOnLambdaError, !Ref NotifySNS, !Ref "AWS::NoValue"]
      TreatMissingData: notBreaching
  NotifySNS:
    Type: AWS::SNS::Topic
    Condition: EmailOnLambdaError
    Properties:
      DisplayName: !Sub ${LambdaName}-Lambda-Error
      Subscription:
        - Endpoint: !Ref SNSEmail
          Protocol: email
      TopicName: !Sub ${LambdaName}Error
  IAMCWEvent:
    Type: AWS::IAM::Role
    Properties:
      RoleName: CWEventRead
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: 'sts:AssumeRole'
      Description: IAM role for CloudWatch Events to write to Firehose
      Policies:
        - PolicyName: allow-put-to-firehose
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - firehose:PutRecord
                  - firehose:PutRecordBatch
                Resource: !GetAtt SplunkFireHose.Arn
  LambdaFailureCWEvent:
    Type: AWS::Events::Rule
    Properties:
      Description: Relays the Lambda error alarm's ALARM state change to Firehose
      EventPattern:
        source:
          - aws.cloudwatch
        detail-type:
          - CloudWatch Alarm State Change
        detail:
          state:
            value:
              - ALARM
          alarmName:
            - !Ref LambdaErrorAlarm
      State: ENABLED
      Targets:
        - Arn: !GetAtt SplunkFireHose.Arn
          Id: !Ref SplunkFireHose
          # The rule must assume the role that trusts events.amazonaws.com
          RoleArn: !GetAtt IAMCWEvent.Arn
  IAMFireHose:
    Type: AWS::IAM::Role
    Properties:
      RoleName: SplunkFirehose
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: firehose.amazonaws.com
            Action: 'sts:AssumeRole'
      Description: IAM Role for firehose to write to S3 and CW
      Policies:
        - PolicyName: allow-access-to-cw-logs
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:AbortMultipartUpload
                  - s3:GetBucketLocation
                  - s3:GetObject
                  - s3:ListBucket
                  - s3:ListBucketMultipartUploads
                  - s3:PutObject
                Resource:
                  - !Sub arn:aws:s3:::${FireHoseBucket}
                  - !Sub arn:aws:s3:::${FireHoseBucket}/*
              - Effect: Allow
                Action:
                  - logs:PutLogEvents
                Resource:
                  - !GetAtt FireHoseLogs.Arn
  FireHoseLogs:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: 30
      LogGroupName: /aws/kinesisfirehose/SendCWToSplunk
  FireHoseLogStream:
    Type: AWS::Logs::LogStream
    Properties:
      LogGroupName: !Ref FireHoseLogs
      LogStreamName: FireHoseDelivery
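  # A possible companion to the log stream above (a sketch, not part of the original template).
  # The S3 backup logging in SplunkFireHose names a second stream, FireHoseS3logs, which nothing
  # in this template creates. Assuming Firehose does not create that stream itself, it could be
  # declared as below. The logical ID FireHoseS3LogStream is hypothetical.
  FireHoseS3LogStream:
    Type: AWS::Logs::LogStream
    Properties:
      LogGroupName: !Ref FireHoseLogs
      LogStreamName: FireHoseS3logs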
  SplunkFireHose:
    Type: AWS::KinesisFirehose::DeliveryStream
    DependsOn:
      - FireHoseBucket
      - IAMFireHose
    Properties:
      DeliveryStreamName: LambdaErrorsToSplunk
      DeliveryStreamType: DirectPut
      SplunkDestinationConfiguration:
        HECEndpoint: !Sub 'https://{{resolve:ssm:${hecSsmPath}}}/services/collector/raw?index=devops&sourcetype=testing'
        HECEndpointType: Raw
        HECToken: !Sub '{{resolve:secretsmanager:${hecToken}:SecretString:token}}'
        S3Configuration:
          BucketARN: { Fn::GetAtt: [ FireHoseBucket, Arn ] }
          RoleARN: { Fn::GetAtt: [ IAMFireHose, Arn ] }
          CloudWatchLoggingOptions:
            Enabled: true
            LogGroupName: !Ref FireHoseLogs
            LogStreamName: FireHoseS3logs
        CloudWatchLoggingOptions:
          Enabled: true
          LogGroupName: !Ref FireHoseLogs
          LogStreamName: FireHoseDelivery
  FireHoseBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: firehose-to-splunk-err
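## Optional sketch, not part of the original template: an Outputs section so other stacks or
## scripts can look up what this stack created. The output names below are hypothetical.
## NotifySNSArn is only emitted when the EmailOnLambdaError condition is true.
Outputs:
  LambdaErrorAlarmName:
    Description: Name of the CloudWatch alarm that tracks Lambda errors
    Value: !Ref LambdaErrorAlarm
  SplunkFireHoseArn:
    Description: ARN of the Firehose delivery stream that relays alarm events to Splunk
    Value: !GetAtt SplunkFireHose.Arn
  NotifySNSArn:
    Condition: EmailOnLambdaError
    Description: ARN of the SNS topic used for email notification
    Value: !Ref NotifySNS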