Skip to content

Instantly share code, notes, and snippets.

@farski
Last active November 18, 2021 17:52
Show Gist options
  • Save farski/2035d193d8e30ab70cb2212cff162b66 to your computer and use it in GitHub Desktop.
Save farski/2035d193d8e30ab70cb2212cff162b66 to your computer and use it in GitHub Desktop.

More snippets

Standard Access Logs

The included SQL and CloudFormation YAML snippets create AWS Glue tables for standard access logs from Amazon CloudFront, which can be queried using Amazon Athena. The snippets are intended to be equivalent.

The tables that are created are not partitioned, as the native filenames of CloudFront access logs are not compatible with Athena partition projections. The speed and cost of querying these tables will depend on how much data is in the S3 bucket location it's anlyzing. If the bucket is configured to expire data after several days, using an unpartitioned table may be acceptable. If your bucket contains too much data, partitions can be created using methods other than partition projections.

CREATE EXTERNAL TABLE IF NOT EXISTS __DATABASE_NAME__.__TABLE_NAME__ (
`date` date COMMENT "UTC date as YYYY-MM-DD",
`time` string COMMENT "UTC time when the response finished as HH:MM:SS",
location string COMMENT "Code for the edge location that served the request as XYZ1",
bytes bigint COMMENT "Total number of bytes that the server sent to the viewer",
request_ip string COMMENT "The IP address of the viewer that made the request",
method string COMMENT "The HTTP method of the request",
host string COMMENT "Domain name of the CloudFront distribution (e.g., d111111abcdef8.cloudfront.net)",
uri string COMMENT "The path portion of the request URL, without query parameters",
status int COMMENT "The HTTP status code of the response, or 000 is the viewer closed the connection",
referrer string COMMENT "The value of the Referer header in the request",
user_agent string COMMENT "The value of the User-Agent header in the request",
query_string string COMMENT "The query string portion of the request path, or a hyphen",
cookie string COMMENT "The value of the Cookie header in the request",
result_type string COMMENT "A classification of the response, such as Hit, Miss, and Error",
request_id string COMMENT "An opaque string that uniquely identifies a request",
host_header string COMMENT "The value of the Host header in the request",
request_protocol string COMMENT "The protocol of the viewer request, http, https, ws, wss ",
request_bytes bigint COMMENT "The total number of bytes in the request",
time_taken float COMMENT "Number of seconds the request took (e.g., 0.123)",
xforwarded_for string COMMENT "The value of the X-Forwarded-For header in the request, or a hyphen",
ssl_protocol string COMMENT "The SSL/TLS protocol that the viewer and server negotiated, or a hyphen",
ssl_cipher string COMMENT "The SSL/TLS cipher that the viewer and server negotiated, or a hyphen",
response_result_type string COMMENT "A classification of the response, such as Hit, Miss, and Error",
http_version string COMMENT "The HTTP version of the request",
fle_status string COMMENT "A code that indicates whether the request body was successfully processed, or a hyphen",
fle_encrypted_fields int COMMENT "The number of encrypted fields forwarded to the origin, or a hyphen",
c_port int COMMENT "The port number of the request",
time_to_first_byte float COMMENT "The number of seconds between receiving the request and writing the first byte of the response",
x_edge_detailed_result_type string COMMENT "When the result type is an error, this is the type of error, othewise it's the result type",
sc_content_type string COMMENT "The value of the Content-Type header in the response",
sc_content_len bigint COMMENT "The value of the Content-Length header in the response",
sc_range_start bigint COMMENT "The start value of the Content-Range header in the response, or a hyphen",
sc_range_end bigint COMMENT "The end value of the Content-Range header in the response, or a hyphen"
)
COMMENT "TKTKTK A description of the table"
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
WITH SERDEPROPERTIES (
"field.delim" = "\t",
"serialization.format" = "\t"
)
STORED AS
INPUTFORMAT "org.apache.hadoop.mapred.TextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION "s3://__BUCKET_AND_PREFIX__/"
TBLPROPERTIES (
"skip.header.line.count" = "2"
);
AccessLogsGlueTable:
Type: AWS::Glue::Table
Properties:
CatalogId: !Ref AWS::AccountId
DatabaseName: __DATABASE_NAME__
TableInput:
Description: TKTKTK A description of the table
Name: __TABLE_NAME__
Parameters:
skip.header.line.count: "2"
StorageDescriptor:
Columns:
- Name: date
Type: date
Comment: UTC date as YYYY-MM-DD
- Name: time
Type: string
Comment: UTC time when the response finished as HH:MM:SS
- Name: location
Type: string
Comment: Code for the edge location that served the request as XYZ1
- Name: bytes
Type: bigint
Comment: Total number of bytes that the server sent to the viewer
- Name: request_ip
Type: string
Comment: The IP address of the viewer that made the request
- Name: method
Type: string
Comment: The HTTP method of the request
- Name: host
Type: string
Comment: Domain name of the CloudFront distribution (e.g., d111111abcdef8.cloudfront.net)
- Name: uri
Type: string
Comment: The path portion of the request URL, without query parameters
- Name: status
Type: int
Comment: The HTTP status code of the response, or 000 is the viewer closed the connection
- Name: referrer
Type: string
Comment: The value of the Referer header in the request
- Name: user_agent
Type: string
Comment: The value of the User-Agent header in the request
- Name: query_string
Type: string
Comment: The query string portion of the request path, or a hyphen
- Name: cookie
Type: string
Comment: The value of the Cookie header in the request
- Name: result_type
Type: string
Comment: A classification of the response, such as Hit, Miss, and Error
- Name: request_id
Type: string
Comment: An opaque string that uniquely identifies a request
- Name: host_header
Type: string
Comment: The value of the Host header in the request
- Name: request_protocol
Type: string
Comment: The protocol of the viewer request, http, https, ws, wss
- Name: request_bytes
Type: bigint
Comment: The total number of bytes in the request
- Name: time_taken
Type: float
Comment: Number of seconds the request took (e.g., 0.123)
- Name: xforwarded_for
Type: string
Comment: The value of the X-Forwarded-For header in the request, or a hyphen
- Name: ssl_protocol
Type: string
Comment: The SSL/TLS protocol that the viewer and server negotiated, or a hyphen
- Name: ssl_cipher
Type: string
Comment: The SSL/TLS cipher that the viewer and server negotiated, or a hyphen
- Name: response_result_type
Type: string
Comment: A classification of the response, such as Hit, Miss, and Error
- Name: http_version
Type: string
Comment: The HTTP version of the request
- Name: fle_status
Type: string
Commen: A code that indicates whether the request body was successfully processed, or a hyphen
- Name: fle_encrypted_fields
Type: int
Comment: The number of encrypted fields forwarded to the origin, or a hyphen
- Name: c_port
Type: int
Comment: The port number of the request
- Name: time_to_first_byte
Type: float
Comment: The number of seconds between receiving the request and writing the first byte of the response
- Name: x_edge_detailed_result_type
Type: string
Comment: When the result type is an error, this is the type of error, othewise it's the result type
- Name: sc_content_type
Type: string
Comment: The value of the Content-Type header in the response
- Name: sc_content_len
Type: bigint
Comment: The value of the Content-Length header in the response
- Name: sc_range_start
Type: bigint
Comment: The start value of the Content-Range header in the response, or a hyphen
- Name: sc_range_end
Type: bigint
Comment: The end value of the Content-Range header in the response, or a hyphen
InputFormat: org.apache.hadoop.mapred.TextInputFormat
Location: !Sub s3://__BUCKET_AND_PREFIX__/
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
SerdeInfo:
Parameters:
field.delim: "\t"
serialization.format: "\t"
SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TableType: EXTERNAL_TABLE
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment