Skip to content

Instantly share code, notes, and snippets.

@szinck
Created August 13, 2017 02:21
Show Gist options
  • Save szinck/d456fbf691483ab77d2453c316db3371 to your computer and use it in GitHub Desktop.
Save szinck/d456fbf691483ab77d2453c316db3371 to your computer and use it in GitHub Desktop.
Athena Table for ALB Logs
-- This creates an athena table that can parse ALB logs.
-- Advantage of this over others are this works when the log ends with a trailing space
-- plus it also breaks the http request into route and params for easier grouping
CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
type string,
timestamp string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code string,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
protocol string,
host string,
port int,
route string,
params string,
httpversion string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string )
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1','input.regex' =
'(.*?) (.*?) (.*?) ([0-9.]+):([0-9]*) ([0-9.]+):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) ([-0-9]*) "(.*?) (.*?)://(.*?):([0-9]+)([^? ]*)(\\x3f?.*?) (.*?)" "(.*?)" (.*?) (.*?) (.*?) "(.*?)" *$'
)
LOCATION 's3://bucket/api-alb/external/AWSLogs/accountid/elasticloadbalancing/us-west-2/'
@usefksa
Copy link

usefksa commented Aug 13, 2017

Thanks a lot

@mpilar
Copy link

mpilar commented Oct 6, 2017

They were changed again (adding two fields for https) we ended up adding a 'logline_tail' column and just making it (.*) at the end.

@foragerr
Copy link

foragerr commented Dec 1, 2017

@mpilar Would you mind sharing your version of the query? Thanks in advance!

@dadvir
Copy link

dadvir commented Dec 22, 2018

doesn't work... i'm getting zero records... can someone advise on the possible problem? running the aws tutorial worked

@arpit728
Copy link

arpit728 commented Mar 4, 2020

Is there a way to import time as timestamp instead of string, this make queries slow as each record requires parsing.

@anirudhdggl
Copy link

doesn't work... i'm getting zero records... can someone advise on the possible problem? running the aws tutorial worked

I was facing the same issue. It's because the location does not hold any logs. Check the location

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment