Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save maiconbaum/dc3124eaaecffdba82935a57eb3f6b03 to your computer and use it in GitHub Desktop.
Save maiconbaum/dc3124eaaecffdba82935a57eb3f6b03 to your computer and use it in GitHub Desktop.
AWS Glue Catalog Table for centralized Databricks Audit Logs using Partition Projection.
CREATE EXTERNAL TABLE IF NOT EXISTS databricks_audit_logs (
`version` string,
`timestamp` bigint,
`sourceIPAddress` string,
`userAgent` string,
`sessionId` string,
`userIdentity` STRUCT<
email: STRING
>,
`requestParams` MAP<string, string>,
`response` STRUCT<
errorMessage: STRING,
statusCode: INT,
result: STRING
>,
`serviceName` string,
`actionName` string,
`requestId` string,
`accountId` string,
`auditLevel` string
)
PARTITIONED BY (
`workspaceid` string,
`date` string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION
's3://<bucket-name>/audit-logs/'
TBLPROPERTIES (
'projection.enabled'='true',
'projection.date.format'='yyyy-MM-dd',
'projection.date.interval'='1',
'projection.date.interval.unit'='DAYS',
'projection.date.range'='2020-01-01,NOW',
'projection.date.type'='date',
'projection.workspaceid.type' = 'injected',
'storage.location.template'='s3://<bucket-name>/audit-logs/workspaceId=${workspaceid}/date=${date}'
)
@maiconbaum
Copy link
Author

@maiconbaum
Copy link
Author

⚠️ This table uses partition projection with the workspaceidbeing an injected data type, which means that you must provide the workspaceid in the WHERE clause. E.g.:

SELECT *
FROM "default"."databricks_audit_logs"
WHERE workspaceid = '123456789123456'
AND date = '2023-03-23'
limit 10;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment