Skip to content

Instantly share code, notes, and snippets.

@hui-zheng
Created March 14, 2020 19:49
Show Gist options
  • Save hui-zheng/b52eaebaff6737385e3791fb89ea897f to your computer and use it in GitHub Desktop.
Save hui-zheng/b52eaebaff6737385e3791fb89ea897f to your computer and use it in GitHub Desktop.
[BigQuery Advanced SQL] the most flexible script to detect and display duplicate records and remove duplicates (dedup)
-- base_time has t
WITH rows_by_key AS(
SELECT
surrogate_key,
array_agg(base_table) as _rows,
count(*) as _count
FROM `gcp_project.data_set.original_table` as base_table
WHERE stamp BETWEEN "2020-03-12T00:00:00" AND "2020-03-14T00:00:00"
GROUP BY surrogate_key
)
SELECT
*
FROM rows_by_key
where _count > 1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment