Skip to content

Instantly share code, notes, and snippets.

@katopz
Created August 1, 2020 16:05
Show Gist options
  • Save katopz/1261007bd7ce8ae8f2b90ff4bb8364f2 to your computer and use it in GitHub Desktop.
Save katopz/1261007bd7ce8ae8f2b90ff4bb8364f2 to your computer and use it in GitHub Desktop.
BigQuery delete deduplication

Ref

https://medium.com/google-cloud/bigquery-deduplication-14a1206efdbb

Deduplicate by recreate the Table

Slower

CREATE OR REPLACE TABLE `transactions.testdata`
PARTITION BY date
AS SELECT DISTINCT * FROM `transactions.testdata`;

Deduplicate by using MERGE

Faster

MERGE `transactions.testdata` t
USING (
  SELECT DISTINCT *
  FROM `transactions.testdata`
  WHERE date=CURRENT_DATE()
)
ON FALSE
WHEN NOT MATCHED BY SOURCE AND date=CURRENT_DATE() THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT ROW
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment