"""
AWS Glue Streaming Job: MSK → Iceberg with lag monitoring.

Reads from MSK via Spark Structured Streaming, writes to an Iceberg table
in the Glue Data Catalog, and tracks consumer lag via a dedicated Kafka
consumer group using the LagMonitorListener.

Usage:
    Upload this file + streaming_lag_monitor.py to S3 and reference them
    in your Glue job configuration (see infra/main.tf).
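
The docstring above mentions tracking consumer lag for a dedicated Kafka consumer group. As a rough illustration of what "lag" usually means in that setting, the sketch below computes per-partition lag as the latest end offset minus the committed offset. It is not the LagMonitorListener implementation, which is not shown here; the helper names `partition_lag` and `total_lag` and the `(topic, partition)` tuple key are hypothetical, and treating a partition with no committed offset as fully unread is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key type for this sketch: (topic name, partition number).
TopicPartition = Tuple[str, int]


def partition_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, Optional[int]],
) -> Dict[TopicPartition, int]:
    """Return lag per partition as end offset minus committed offset.

    Assumption: a partition with no committed offset counts as fully
    unread, so its lag is taken to be its end offset.
    """
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp)
        if committed is None:
            lag[tp] = end
        else:
            # Clamp at zero in case the committed offset is ahead of a
            # stale end-offset reading.
            lag[tp] = max(end - committed, 0)
    return lag


def total_lag(lag_by_partition: Dict[TopicPartition, int]) -> int:
    """Sum the lag over all partitions."""
    return sum(lag_by_partition.values())


if __name__ == "__main__":
    end = {("events", 0): 120, ("events", 1): 75}
    committed = {("events", 0): 100, ("events", 1): None}
    lag = partition_lag(end, committed)
    print(lag)             # {('events', 0): 20, ('events', 1): 75}
    print(total_lag(lag))  # 95
```

In a real job the two offset maps would come from the Kafka client (end offsets for the topic and the committed offsets of the monitoring consumer group); how they are fetched depends on the client library in use and is left out here.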
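import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lit}

// `spark` and `target` are assumed to be defined earlier (not shown here).
// Existing snapshot: mark every row with last = false.
val oldData = spark.read.parquet(target + "/old")
  .withColumn("last", lit(false))
oldData.cache()

// Incoming updates: mark every row with last = true and align the
// column order with oldData.
val newData = spark.read.parquet(target + "/updates")
  .withColumn("last", lit(true))
  .select(oldData.columns.map(col(_)): _*)
newData.cache()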