Skip to content

Instantly share code, notes, and snippets.

@tphummel
Last active September 11, 2017 17:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tphummel/c3ebfc1964eacfa5ca2f70b31ecbc507 to your computer and use it in GitHub Desktop.
Save tphummel/c3ebfc1964eacfa5ca2f70b31ecbc507 to your computer and use it in GitHub Desktop.
Athena, S3, and HiveQL for CloudTrail Analysis

Doing CloudTrail Analysis with Athena, S3, HiveQL

Problem

I discovered a security group which got opened too widely. I want to figure out when it happened and who did it.

Prerequisites

This article assumes you have CloudTrail enabled and there is a complete history of your account activity sitting in an S3 bucket.

Approach

AWS has a product called Athena that let's you run Hive queries against data in S3 without needing to set up your own Hadoop resources.

An AWS blog post lists all of the steps to do this type of analysis.

See attached sql files for the approach I took. Replace your bucket and account number in dml.sql and your security group id in security-group-ingress.sql

Next Steps

Consolidating CloudTrail logs from multiple AWS accounts

Validating the integrity of CloudTrail log files

Security Monkey and AWS Config are two tools which can help teams discover misconfigured resources more proactively.

CREATE EXTERNAL TABLE cloudtrail_logs (
eventversion STRING,
userIdentity STRUCT< type:STRING,
principalid:STRING,
arn:STRING,
accountid:STRING,
invokedby:STRING,
accesskeyid:STRING,
userName:STRING,
sessioncontext:STRUCT< attributes:STRUCT< mfaauthenticated:STRING,
creationdate:STRING>,
sessionIssuer:STRUCT< type:STRING,
principalId:STRING,
arn:STRING,
accountId:STRING,
userName:STRING>>>,
eventTime STRING,
eventSource STRING,
eventName STRING,
awsRegion STRING,
sourceIpAddress STRING,
userAgent STRING,
errorCode STRING,
errorMessage STRING,
requestParameters STRING,
responseElements STRING,
additionalEventData STRING,
requestId STRING,
eventId STRING,
resources ARRAY<STRUCT< ARN:STRING,
accountId: STRING,
type:STRING>>,
eventType STRING,
apiVersion STRING,
readOnly STRING,
recipientAccountId STRING,
serviceEventDetails STRING,
sharedEventID STRING,
vpcEndpointId STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde' STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://<bucket-name>/AWSLogs/<account-number>/'
SELECT eventname,
useridentity.username,
sourceIPAddress,
eventtime,
requestparameters
FROM cloudtrail_logs
WHERE (requestparameters LIKE '{"groupId":"sg-1234abcd"%')
AND (eventname LIKE '%SecurityGroupIngress%')
AND eventtime > '2016-01-01T00:00:00Z'
ORDER BY eventtime asc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment