Skip to content

Instantly share code, notes, and snippets.

@anton-yurchenko
Last active April 16, 2022 08:17
Show Gist options
  • Save anton-yurchenko/24b4d81cb7314b26c9d57898b431cdf9 to your computer and use it in GitHub Desktop.
Save anton-yurchenko/24b4d81cb7314b26c9d57898b431cdf9 to your computer and use it in GitHub Desktop.
Query AWS Organization CloudTrail Logs Using Athena

Notice: if you want to query logs for multiple days (week or so), you might need to transform them to pargquet first

Table Creation DDL:

CREATE EXTERNAL TABLE cloudtrail_logs(
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        invokedBy: STRING,
        accessKeyId: STRING,
        userName: STRING,
        sessionContext: STRUCT<
            attributes: STRUCT<
                mfaAuthenticated: STRING,
                creationDate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    readOnly STRING,
    resources ARRAY<STRUCT<
        arn: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING
  )
PARTITIONED BY (
   `accountid` string,
   `region` string,
   `timestamp` string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://<bucket-name>/AWSLogs/<org-id>/'
TBLPROPERTIES (
  'projection.enabled'='true', 
  'projection.region.type' = 'enum',
  'projection.region.values' = 'ap-northeast-1,ap-northeast-2,ap-northeast-3,ap-south-1,ap-southeast-1,ap-southeast-2,ca-central-1,eu-central-1,eu-north-1,eu-west-1,eu-west-2,eu-west-3,sa-east-1,us-east-1,us-east-2,us-west-1,us-west-2',
  'projection.accountid.type' = 'integer',
  'projection.accountid.range' = '0,999999999999',
  'projection.timestamp.format'='yyyy/MM/dd', 
  'projection.timestamp.interval'='1', 
  'projection.timestamp.interval.unit'='DAYS', 
  'projection.timestamp.range'='2021/08/01,NOW', 
  'projection.timestamp.type'='date', 
  'storage.location.template'='s3://<bucket-name>/AWSLogs/<org-id>/${accountid}/CloudTrail/${region}/${timestamp}')

Query Example:

SELECT * FROM "default"."cloudtrail_logs" 
WHERE timestamp = '2021/08/02'
AND accountid = '<account-id>'
AND region = '<region>'
LIMIT 50;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment