Skip to content

Instantly share code, notes, and snippets.

@Mr--John-Doe
Last active April 2, 2022 13:56
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Mr--John-Doe/989d359f9fc6f79388335196d562e050 to your computer and use it in GitHub Desktop.
Save Mr--John-Doe/989d359f9fc6f79388335196d562e050 to your computer and use it in GitHub Desktop.
Gets a list of source/target tables for any query (only available if bundle 2021_10 is enabled)
WITH table_lineage_history AS (
SELECT r.value:"objectName"::varchar AS upstream_table_name
, r.value:"objectDomain"::varchar AS upstream_table_domain
, r.value:"columns" AS upstream_table_columns
, w.value:"objectName"::varchar AS downstream_table_name
, w.value:"objectDomain"::varchar AS downstream_table_domain
, w.value:"columns" AS downstream_table_columns
, t.query_start_time AS query_start_time
, t.QUERY_ID AS query_id
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY t
, LATERAL FLATTEN(INPUT => t.BASE_OBJECTS_ACCESSED) r
, LATERAL FLATTEN(INPUT => t.OBJECTS_MODIFIED) w
WHERE r.value:"objectId" IS NOT NULL
AND w.value:"objectId" IS NOT NULL
AND t.QUERY_START_TIME >= (CURRENT_DATE - 1)::timestamp -- filtering query_history a little
)
SELECT QH.SESSION_ID
, QH.QUERY_ID
, QH.QUERY_TYPE
, QH.QUERY_TEXT
, lin.upstream_table_name
, lin.downstream_table_name
, lin.upstream_table_columns
, lin.downstream_table_columns
FROM table_lineage_history lin
INNER JOIN SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY QH
ON QH.QUERY_ID = lin.query_id
AND qh.START_TIME >= (CURRENT_DATE - 1)::timestamp -- filtering query_history a little
WHERE lin.downstream_table_domain = 'Table' -- only interested in tables as target
AND QH.QUERY_TYPE != 'DELETE' -- DELETES are not really lineage
@CristianDuca
Copy link

hello ! this is an interesting script. I've tried it and it doesnt work on some of the queries. For example I have a query to Create a view and it just doesnt show me the modified objects. do you happen to know why ?

@Mr--John-Doe
Copy link
Author

Hi Cristian,
that is because this table only captures "data reads" and a view is only metadata (it's a fancy with clause, where you store code to run later). If you query from the view, you should see the query being run in the DB and showing in this table. Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment