Skip to content

Instantly share code, notes, and snippets.

@thomasdarimont
Created July 30, 2018 20:56
Show Gist options
  • Save thomasdarimont/cb9e4ff142af3c795652172dfcb132fb to your computer and use it in GitHub Desktop.
Save thomasdarimont/cb9e4ff142af3c795652172dfcb132fb to your computer and use it in GitHub Desktop.
Deduplicate results with ARRAY_AGG(..) in BigQuery
#standardSQL
# take the one name associated with a SKU
WITH product_query AS (
SELECT
DISTINCT
v2ProductName,
productSKU
FROM `data-to-insights.ecommerce.all_sessions_raw`
WHERE v2ProductName IS NOT NULL
)
SELECT k.* FROM (
# aggregate the products into an array and
# only take 1 result
SELECT ARRAY_AGG(x LIMIT 1)[OFFSET(0)] k
FROM product_query x
GROUP BY productSKU # this is the field we want deduplicated
);
@seniorpreacher
Copy link

It may be a new feature of BigQuery, but now we can use: ARRAY_AGG(DISTINCT column_name)

@BorFour
Copy link

BorFour commented May 11, 2023

This is already a feature! Check the BigQuery docs for ARRAY_AGG

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment