@urjitbhatia
Created September 12, 2019 00:55
Parse CloudFront logs via awk
#!/bin/bash
HTTP_STATUS_CODE=421
CF_DISTRIBUTION_NAME=foobar
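# S3_BUCKET and LOGS_PREFIX must be set in the environment before running.
# The original command had no s3cmd subcommand; "get --recursive" is assumed here.
s3cmd get --recursive "s3://${S3_BUCKET}/${LOGS_PREFIX}/" .
## OR s3cmd get --recursive "s3://${S3_BUCKET}/${LOGS_PREFIX}/${CF_DISTRIBUTION_NAME}" . to download only specific distributions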
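# Log fields are tab-separated and header lines start with '#'.
# Prints cs(Host), sc-status, x-host-header and cs(Referer) for matching requests.
zcat *.gz | awk -F'\t' -v status="$HTTP_STATUS_CODE" -v dist="$CF_DISTRIBUTION_NAME" '!/^#/ && $9 == status && index($7, dist) { print $7" "$9" "$16" "$10 }'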
### https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#LogFileFormat
# Field Number Field Name Description
# 1 date The date on which the event occurred in the format yyyy-mm-dd, for example, 2015-06-30. The date and time are in Coordinated Universal Time (UTC). For WebSocket connections, this is the date when the connection is closed.
# 2 time The time when the CloudFront server finished responding to the request (in UTC), for example, 01:42:39. For WebSocket connections, this is the time when the connection is closed.
# 3 x-edge-location The edge location that served the request. Each edge location is identified by a three-letter code and an arbitrarily assigned number, for example, DFW3. The three-letter code typically corresponds with the International Air Transport Association airport code for an airport near the edge location. (These abbreviations might change in the future.) For a list of edge locations, see the Amazon CloudFront detail page, http://aws.amazon.com/cloudfront.
# 4 sc-bytes The total number of bytes that CloudFront served to the viewer in response to the request, including headers, for example, 1045619. For WebSocket connections, this is the total number of bytes sent from the server to the client through the connection.
# 5 c-ip The IP address of the viewer that made the request, for example, 192.0.2.183 or 2001:0db8:85a3:0000:0000:8a2e:0370:7334. If the viewer used an HTTP proxy or a load balancer to send the request, the value of c-ip is the IP address of the proxy or load balancer. See also X-Forwarded-For in field 20.
# 6 cs-method The HTTP access method: DELETE, GET, HEAD, OPTIONS, PATCH, POST, or PUT.
# 7 cs(Host) The domain name of the CloudFront distribution, for example, d111111abcdef8.cloudfront.net.
# 8 cs-uri-stem The portion of the URI that identifies the path and object, for example, /images/daily-ad.jpg.
# 9 sc-status One of the following values:
#   - An HTTP status code (for example, 200). For a list of HTTP status codes, see RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1, section 10, Status Code Definitions. For more information, see How CloudFront Processes and Caches HTTP 4xx and 5xx Status Codes from Your Origin.
#   - 000, which indicates that the viewer closed the connection (for example, closed the browser tab) before CloudFront could respond to a request.
#   If the viewer closes the connection after CloudFront starts to send the object, the log contains the applicable HTTP status code.
# 10 cs(Referer) The name of the domain that originated the request. Common referrers include search engines, other websites that link directly to your objects, and your own website.
# 11 cs(User-Agent) The value of the User-Agent header in the request. The User-Agent header identifies the source of the request, such as the type of device and browser that submitted the request and, if the request came from a search engine, which search engine. For more information, see User-Agent Header.
# 12 cs-uri-query
# The query string portion of the URI, if any. When a URI doesn't contain a query string, the value of cs-uri-query is a hyphen (-).
# For more information, see Caching Content Based on Query String Parameters.
# 13 cs(Cookie)
# The cookie header in the request, including name-value pairs and the associated attributes. If you enable cookie logging, CloudFront logs the cookies in all requests regardless of which cookies you choose to forward to the origin: none, all, or a whitelist of cookie names. When a request doesn't include a cookie header, the value of cs(Cookie) is a hyphen (-).
# For more information about cookies, see Caching Content Based on Cookies.
# 14 x-edge-result-type
# How CloudFront classifies the response after the last byte left the edge location. In some cases, the result type can change between the time that CloudFront is ready to send the response and the time that CloudFront has finished sending the response.
# For example, in HTTP streaming, suppose CloudFront finds a segment in the edge cache. The value of x-edge-response-result-type, the result type immediately before CloudFront begins to respond to the request, is Hit. However, if the user closes the viewer before CloudFront has delivered the entire segment, the final result type—the value of x-edge-result-type—changes to Error.
# As another example, WebSocket connections will appear as a Miss since the content is not cacheable and is proxied directly back to the origin server.
# Possible values include:
# Hit – CloudFront served the object to the viewer from the edge cache.
# For information about a situation in which CloudFront classifies the result type as Hit even though the response from the origin contains a Cache-Control: no-cache header, see Simultaneous Requests for the Same Object (Traffic Spikes).
# RefreshHit – CloudFront found the object in the edge cache but it had expired, so CloudFront contacted the origin to determine whether the cache has the latest version of the object and, if not, to get the latest version.
# Miss – The request could not be satisfied by an object in the edge cache, so CloudFront forwarded the request to the origin server and returned the result to the viewer.
# LimitExceeded – The request was denied because a CloudFront limit was exceeded.
# CapacityExceeded – CloudFront returned an HTTP 503 status code (Service Unavailable) because the CloudFront edge server was temporarily unable to respond to requests.
# Error – Typically, this means the request resulted in a client error (sc-status is 4xx) or a server error (sc-status is 5xx). If sc-status is 200, the HTTP request was successful but the client disconnected before they downloaded all of the bytes.
# Redirect – CloudFront redirects from HTTP to HTTPS.
# If sc-status is 403 and you configured CloudFront to restrict the geographic distribution of your content, the request might have come from a restricted location. For more information about geo restriction, see Restricting the Geographic Distribution of Your Content.
# If the value of x-edge-result-type is Error and the value of x-edge-response-result-type is not Error, the client disconnected before finishing the download.
# 15 x-edge-request-id An encrypted string that uniquely identifies a request. In the response header, this is x-amz-cf-id.
# 16 x-host-header
# The value that the viewer included in the Host header for this request. This is the domain name in the request:
# If you're using the CloudFront domain name in your object URLs, such as http://d111111abcdef8.cloudfront.net/logo.png, the x-host-header field contains that domain name.
# If you're using alternate domain names in your object URLs, such as http://example.com/logo.png, the x-host-header field contains the alternate domain name, such as example.com. To use alternate domain names, you must add them to your distribution. For more information, see Using Custom URLs for Files by Adding Alternate Domain Names (CNAMEs).
# If you're using alternate domain names, see cs(Host) in field 7 for the domain name that is associated with your distribution.
# 17 cs-protocol The protocol that the viewer specified in the request: http, https, ws, or wss.
# 18 cs-bytes The number of bytes of data that the viewer included in the request (client to server bytes), including headers. For WebSocket connections, this is the total number of bytes sent from the client to the server on the connection.
# 19 time-taken The number of seconds (to the thousandth of a second, for example, 0.002) between the time that a CloudFront edge server receives a viewer's request and the time that CloudFront writes the last byte of the response to the edge server's output queue as measured on the server. From the perspective of the viewer, the total time to get the full object will be longer than this value due to network latency and TCP buffering.
# 20 x-forwarded-for
# If the viewer used an HTTP proxy or a load balancer to send the request, the value of c-ip in field 5 is the IP address of the proxy or load balancer. In that case, x-forwarded-for is the IP address of the viewer that originated the request.
# If the viewer did not use an HTTP proxy or a load balancer, the value of x-forwarded-for is a hyphen (-).
# Note
# The X-Forwarded-For header contains IPv4 addresses (such as 192.0.2.44) and IPv6 addresses (such as 2001:0db8:85a3:0000:0000:8a2e:0370:7334), as applicable.
# 21 ssl-protocol When cs-protocol in field 17 is https, the SSL protocol that the client and CloudFront negotiated for transmitting the request and response. When cs-protocol is http, the value for ssl-protocol is a hyphen (-).
# Possible values include the following:
#   - SSLv3
#   - TLSv1
#   - TLSv1.1
#   - TLSv1.2
# 22 ssl-cipher
# When cs-protocol in field 17 is https, the SSL cipher that the client and CloudFront negotiated for encrypting the request and response. When cs-protocol is http, the value for ssl-cipher is a hyphen (-).
# Possible values include the following:
#   - ECDHE-RSA-AES128-GCM-SHA256
#   - ECDHE-RSA-AES128-SHA256
#   - ECDHE-RSA-AES128-SHA
#   - ECDHE-RSA-AES256-GCM-SHA384
#   - ECDHE-RSA-AES256-SHA384
#   - ECDHE-RSA-AES256-SHA
#   - AES128-GCM-SHA256
#   - AES256-GCM-SHA384
#   - AES128-SHA256
#   - AES256-SHA
#   - AES128-SHA
#   - DES-CBC3-SHA
#   - RC4-MD5
# 23 x-edge-response-result-type
# How CloudFront classified the response just before returning the response to the viewer. See also x-edge-result-type in field 14.
# Possible values include:
# Hit – CloudFront served the object to the viewer from the edge cache.
# RefreshHit – CloudFront found the object in the edge cache but it had expired, so CloudFront contacted the origin to verify that the cache has the latest version of the object.
# Miss – The request could not be satisfied by an object in the edge cache, so CloudFront forwarded the request to the origin server and returned the result to the viewer.
# LimitExceeded – The request was denied because a CloudFront limit was exceeded.
# CapacityExceeded – CloudFront returned a 503 error because the edge location didn't have enough capacity at the time of the request to serve the object.
# Error – Typically, this means the request resulted in a client error (sc-status is 4xx) or a server error (sc-status is 5xx).
# Redirect – CloudFront redirects from HTTP to HTTPS.
# If sc-status is 403 and you configured CloudFront to restrict the geographic distribution of your content, the request might have come from a restricted location. For more information about geo restriction, see Restricting the Geographic Distribution of Your Content.
# If the value of x-edge-result-type is Error and the value of x-edge-response-result-type is not Error, the client disconnected before finishing the download.
# 24 cs-protocol-version The HTTP version that the viewer specified in the request. Possible values include HTTP/0.9, HTTP/1.0, HTTP/1.1, and HTTP/2.0.
# 25 fle-status
# When field-level encryption is configured for a distribution, a code that indicates whether the request body was successfully processed. If field-level encryption is not configured for the distribution, the value of fle-status is a hyphen (-).
# When CloudFront successfully processes the request body, encrypts values in the specified fields, and forwards the request to the origin, the value of the fle-status column is Processed. The value of x-edge-result-type, column 14, can still indicate a client-side or server-side error.
# If the request exceeds a field-level encryption limit, fle-status contains one of the following error codes, and CloudFront returns HTTP status code 400 to the viewer. For a list of the current limits on field-level encryption, see Limits on Field-Level Encryption.
# FieldLengthLimitClientError – A field that is configured to be encrypted exceeded the length limit
# FieldNumberLimitClientError – A request that CloudFront is configured to encrypt contains more than the number of fields allowed
# RequestLengthLimitClientError – The length of the request body exceeded the limit when field-level encryption is configured
# Other possible values for fle-status include the following:
# ForwardedByContentType – CloudFront forwarded the request to the origin without parsing or encryption because no content type was configured.
# ForwardedByQueryArgs – CloudFront forwarded the request to the origin without parsing or encryption because the request contains a query argument that wasn't in the configuration for field-level encryption.
# ForwardedDueToNoProfile – CloudFront forwarded the request to the origin without parsing or encryption because no profile was specified in the configuration for field-level encryption.
# MalformedContentTypeClientError – CloudFront rejected the request and returned an HTTP 400 status code to the viewer because the value of the Content-Type header was in an invalid format.
# MalformedInputClientError – CloudFront rejected the request and returned an HTTP 400 status code to the viewer because the request body was in an invalid format.
# MalformedQueryArgsClientError – CloudFront rejected the request and returned an HTTP 400 status code to the viewer because a query argument was empty or in an invalid format.
# RejectedByContentType – CloudFront rejected the request and returned an HTTP 400 status code to the viewer because no content type was specified in the configuration for field-level encryption.
# RejectedByQueryArgs – CloudFront rejected the request and returned an HTTP 400 status code to the viewer because no query argument was specified in the configuration for field-level encryption.
# ServerError – The server returned an error.
# 26 fle-encrypted-fields
# The number of fields that CloudFront encrypted and forwarded to the origin. CloudFront streams the processed request to the origin as it encrypts data, so fle-encrypted-fields can have a value even if the value of fle-status is an error. If field-level encryption is not configured for the distribution, the value of fle-encrypted-fields is a hyphen (-).
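
The field reference above suggests a few other filters in the same awk style. The following is a minimal sketch, not part of the original gist: the function names (cf_client_disconnects, cf_viewer_ip, cf_result_counts) are hypothetical, and it assumes decompressed, tab-separated log lines on stdin using the 26-field layout listed above.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not from the original gist).
# Each one reads decompressed, tab-separated CloudFront log lines on stdin
# and skips the '#Version' / '#Fields' header lines.

# Requests where the viewer disconnected before finishing the download:
# x-edge-result-type (field 14) is Error while
# x-edge-response-result-type (field 23) is not.
# Prints date, time, cs-uri-stem and sc-status.
cf_client_disconnects() {
  awk -F'\t' '!/^#/ && $14 == "Error" && $23 != "Error" { print $1" "$2" "$8" "$9 }'
}

# Originating viewer IP: x-forwarded-for (field 20) when it is set,
# otherwise c-ip (field 5).
cf_viewer_ip() {
  awk -F'\t' '!/^#/ { ip = ($20 != "-" && $20 != "") ? $20 : $5; print ip }'
}

# Number of requests per x-edge-result-type (field 14), sorted by type name.
cf_result_counts() {
  awk -F'\t' '!/^#/ { n[$14]++ } END { for (t in n) print t, n[t] }' | sort
}
```

Possible usage, assuming the log files have already been downloaded into the current directory: `zcat *.gz | cf_result_counts` or `zcat *.gz | cf_client_disconnects`.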