@csuhta

csuhta/gzip.md Secret

Last active September 28, 2023 07:02
R2 double-gzip issue

⚠️ UPDATE: This issue was fixed, and these bucket links are not live.

I generated two files locally:

  • example-plain.json is a plaintext JSON file
  • example-gzip.json is a JSON file that I gzipped locally
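The two files can be produced with Ruby's standard library alone. This is a minimal sketch: the payload is a made-up placeholder (the original file contents are not shown), and the files are written to a temporary directory rather than the working directory.

```ruby
require "json"
require "tmpdir"
require "zlib"

# Hypothetical sample payload; stands in for the real JSON document.
payload = JSON.pretty_generate({ "example" => true, "items" => (1..100).to_a })

dir = Dir.mktmpdir("r2-gzip-issue")
plain_path = File.join(dir, "example-plain.json")
gzip_path  = File.join(dir, "example-gzip.json")

# Plaintext JSON file.
File.write(plain_path, payload)

# The same JSON, gzipped exactly once. The file keeps its .json name:
# the compression is signalled by the Content-Encoding header at upload time.
Zlib::GzipWriter.open(gzip_path) { |gz| gz.write(payload) }
```

Note that nothing in the file name marks the second file as compressed; only the `content_encoding: "gzip"` option at upload time does.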

I uploaded both files using the AWS SDK for Ruby:

@r2_client.put_object(
  bucket: "r2-gzip-issue",
  key: "example-plain.json",
  body: File.open("example-plain.json"),
  content_type: "application/json",
)
#=> #<struct Aws::S3::Types::PutObjectOutput expiration=nil, etag="\"90cef12c848ede5f1c6b17c422160a4a\"", checksum_crc32=nil, checksum_crc32c=nil, checksum_sha1=nil, checksum_sha256=nil, server_side_encryption=nil, version_id="ee69c423aee74d109b940336a9f55d5c", sse_customer_algorithm=nil, sse_customer_key_md5=nil, ssekms_key_id=nil, ssekms_encryption_context=nil, bucket_key_enabled=nil, request_charged=nil>

@r2_client.put_object(
  bucket: "r2-gzip-issue",
  key: "example-gzip.json",
  body: File.open("example-gzip.json"),
  content_type: "application/json",
  content_encoding: "gzip", # <- Critical part
)
#=> #<struct Aws::S3::Types::PutObjectOutput expiration=nil, etag="\"ab1875caa0f4c5bd456a868442a3fc63\"", checksum_crc32=nil, checksum_crc32c=nil, checksum_sha1=nil, checksum_sha256=nil, server_side_encryption=nil, version_id="7e27a64bec764faa938b9cebb9cf7575", sse_customer_algorithm=nil, sse_customer_key_md5=nil, ssekms_key_id=nil, ssekms_encryption_context=nil, bucket_key_enabled=nil, request_charged=nil>

Cloudflare may use these two keys, the domains, and the bucket to diagnose the issue.

What should happen:

On Amazon S3, when you upload an object with Content-Encoding: gzip, S3 handles that encoding sensibly when end users request the file. If the client requests gzip, the stored bytes are streamed to it from disk without modification. This lets you upload already-compressed files so they take up less space on disk.

This is probably (?) a special code path for Amazon S3's servers.

What happens on R2:

For example-gzip.json, Cloudflare applies whatever compression encoding the client negotiates, so it gzips the file a second time if the client requests gzip. You receive a gzip stream of an already-gzipped file instead of a gzip stream of the JSON file "passed through" from disk. You can repair the downloaded file by decompressing it once more, but clients like web browsers won't do this.
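The double encoding can be reproduced and detected locally with Ruby's Zlib. The sketch below is an illustration only: the helper names `gzip?` and `peel_gzip` and the sample JSON are hypothetical, and the second `Zlib.gzip` call merely simulates what the response body looked like.

```ruby
require "zlib"

# True when the bytes start with the gzip magic number (0x1f 0x8b).
def gzip?(bytes)
  bytes.getbyte(0) == 0x1f && bytes.getbyte(1) == 0x8b
end

# Peels gzip layers until the payload no longer looks like gzip.
# Returns [decoded_bytes, number_of_layers_removed].
def peel_gzip(bytes, max_layers: 5)
  layers = 0
  while gzip?(bytes) && layers < max_layers
    bytes = Zlib.gunzip(bytes)
    layers += 1
  end
  [bytes, layers]
end

json          = '{"hello":"world"}'       # hypothetical document
stored_object = Zlib.gzip(json)           # uploaded with Content-Encoding: gzip
wire_body     = Zlib.gzip(stored_object)  # simulated response: gzipped a second time

# A client that honours Content-Encoding: gzip decodes exactly one layer,
# so it is left holding gzip bytes rather than JSON.
after_client_decode = Zlib.gunzip(wire_body)

decoded, layers = peel_gzip(wire_body)
```

Here `after_client_decode` still starts with the gzip magic bytes, which is the `<GARBAGE>` a browser would try to parse as JSON, while `peel_gzip` needs two passes to get back to the document.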

Further clarification: Is this an issue with Cloudflare's S3 API?

Not exactly. R2's API happily accepts and stores the file that later misbehaves.

This is more an issue with how R2 serves HTTP responses. R2 needs to check the Content-Encoding that the object was uploaded with before it forwards the object to whatever performs Cloudflare's transparent compression.
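The check described above could look roughly like the following. This is a hypothetical sketch, not Cloudflare's or Amazon's implementation: `serve_object` is an invented name, the Accept-Encoding parsing ignores q-values, and decompressing for clients that do not accept gzip is just one possible choice.

```ruby
require "zlib"

# Hypothetical helper: decides how an edge server could serve a stored object.
# stored_encoding: the Content-Encoding saved with the object (nil if none).
# accept_encoding: the raw Accept-Encoding request header (nil if absent).
# Returns [response_headers, body].
def serve_object(body, stored_encoding:, accept_encoding:)
  # Simplified parsing: q-values such as "gzip;q=0" are not interpreted.
  client_accepts_gzip = accept_encoding.to_s.split(",").any? do |token|
    token.split(";").first.to_s.strip.casecmp?("gzip")
  end

  headers = { "Vary" => "Accept-Encoding" }

  if stored_encoding == "gzip"
    if client_accepts_gzip
      # Already compressed at upload time: pass the bytes through untouched.
      headers["Content-Encoding"] = "gzip"
      [headers, body]
    else
      # One possible choice: decode for clients that cannot handle gzip.
      [headers, Zlib.gunzip(body)]
    end
  elsif client_accepts_gzip
    # Uncompressed object: transparent compression is safe here.
    headers["Content-Encoding"] = "gzip"
    [headers, Zlib.gzip(body)]
  else
    [headers, body]
  end
end

json   = '{"hello":"world"}' # hypothetical document
stored = Zlib.gzip(json)

passthrough_headers, passthrough_body =
  serve_object(stored, stored_encoding: "gzip", accept_encoding: "gzip, br")
```

The important branch is the first one: when the stored object already carries `Content-Encoding: gzip`, the body is never handed to the compressor again, so `passthrough_body` is byte-for-byte the uploaded file.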

S3 clients written for AWS expect that files uploaded with Content-Encoding: gzip are handled in this special way for end-user HTTP responses. (For example, a deployment script might compress frontend assets before uploading them to AWS S3.)

This isn't something formally documented for S3-compatible APIs; it's just a situation that AWS S3 handles gracefully.

cURL examples:

curl "https://r2-gzip-issue.scryfall.io/example-plain.json" --include --header "Accept-Encoding: gzip" --compressed --output - 
HTTP/1.1 200 OK
Date: Sat, 17 Dec 2022 00:14:19 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
ETag: W/"90cef12c848ede5f1c6b17c422160a4a"
Last-Modified: Fri, 16 Dec 2022 23:53:57 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=31536000
CF-Cache-Status: MISS
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=gHQl0nYcF1RnAZMY64z3HAeTPeXRYvgVEEajS%2BVqh1Ao6SKTO1f7sm%2BPw2z8GEO2K2E6LgL7BW6kNXg1bgENalEncLQzYhWBvaegvwSyXd7arzacec2qCYZL9%2BIwWLfcPPiaSIMjjkT%2Fj%2Bk%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Strict-Transport-Security: max-age=15552000; includeSubDomains; preload
X-Content-Type-Options: nosniff
Server: cloudflare
CF-RAY: 77ab75388d0913c9-IAD
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400

<JSON OUTPUT>

curl "https://r2-gzip-issue.scryfall.io/example-gzip.json" --include --header "Accept-Encoding: gzip" --compressed --output -
HTTP/1.1 200 OK
Date: Sat, 17 Dec 2022 00:14:52 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
ETag: W/"ab1875caa0f4c5bd456a868442a3fc63"
Last-Modified: Fri, 16 Dec 2022 23:54:07 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=31536000
CF-Cache-Status: HIT
Age: 558
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=omnnAAN3oWRGK1psZlL2Qg%2B0p8lrsbqyTETK%2BypAYgvLmIOvG2ojMcnAxoSrD3z7sn%2BQ8B0UcPHz8PhpyHwbv6M6fxRlXzU2KguVs6g2akRzV8Gh%2FzVemYKt6FAdwamZvalVncAV9%2Bu%2FSTY%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Strict-Transport-Security: max-age=15552000; includeSubDomains; preload
X-Content-Type-Options: nosniff
Server: cloudflare
CF-RAY: 77ab760c9e465b1c-IAD
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400

<GARBAGE>