Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save jacopotagliabue/3fc8aaf95ac2caf99e96bf5dadd7df05 to your computer and use it in GitHub Desktop.
Save jacopotagliabue/3fc8aaf95ac2caf99e96bf5dadd7df05 to your computer and use it in GitHub Desktop.
Query 13 from TPC-H on parquet files stored in S3
SELECT
c_count,
count(*) AS custdist
FROM (
SELECT
c_custkey,
count(o_orderkey)
FROM
read_parquet('s3://bucket/customer.parquet') AS customer
LEFT OUTER JOIN
read_parquet('s3://bucket/orders.parquet') AS orders
ON c_custkey = o_custkey
AND o_comment NOT LIKE '%special%requests%'
GROUP BY
c_custkey) AS c_orders (c_custkey, c_count)
GROUP BY
c_count
ORDER BY
custdist DESC,
c_count DESC;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment