Skip to content

Instantly share code, notes, and snippets.

@hectcastro
Created June 22, 2017 02:58
Show Gist options
  • Save hectcastro/3031dcbfeb594e5685f21458b07c0bf2 to your computer and use it in GitHub Desktop.
Save hectcastro/3031dcbfeb594e5685f21458b07c0bf2 to your computer and use it in GitHub Desktop.
A DDL to convert S3 access logs into an AWS Athena table.
CREATE EXTERNAL TABLE IF NOT EXISTS azavea_datahub_logs.azavea_datahub (
bucket_owner string,
bucket string,
requested_at string,
remote_ip string,
requester string,
request_id string,
operation string,
key string,
request_uri_operation string,
request_uri_key string,
request_uri_http_version string,
http_status string,
error_code string,
bytes_sent string,
object_size string,
total_time string,
turn_around_time string,
referrer string,
user_agent string,
version_id string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)$' )
LOCATION 's3://azavea-datahub-logs/azavea-datahub/';
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment