Spark Structured Streaming Kinesis Watermarks
from pyspark.sql.functions import window
# configure reading from the stream
kinesis_df = spark.readStream.format("kinesis") \
    .option("streamName", KINESIS_STREAM_NAME) \
    .option("region", AWS_REGION) \
    .option("roleArn", KINESIS_ACCESS_ROLE_ARN) \
    .option("initialPosition", "latest") \
    .load()
# configure custom logic for watermarks / event time windows
kinesis_df.withWatermark("eventTime", "10 minutes") \
    .groupBy(
        "userId",
        window("eventTime", "10 minutes", "5 minutes"))
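
To make the 10-minute window with a 5-minute slide more concrete, here is a rough plain-Python sketch (no Spark required) of which windows a single event lands in, and when a window falls behind a 10-minute watermark. The helper names `sliding_windows` and `window_is_expired` are made up for this note and are not part of the PySpark API. The expiry check is a simplification: it assumes windows are aligned to the Unix epoch and ignores that Spark only advances the watermark between micro-batches.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def sliding_windows(event_time,
                    window=timedelta(minutes=10),
                    slide=timedelta(minutes=5)):
    """Return the [start, end) windows containing event_time, earliest first.

    Hypothetical helper: assumes window starts are aligned to the Unix epoch.
    """
    # Start of the latest window that begins at or before the event.
    start = event_time - (event_time - EPOCH) % slide
    windows = []
    # Walk backwards one slide at a time while the window still covers the event.
    while start + window > event_time:
        windows.append((start, start + window))
        start -= slide
    return sorted(windows)


def window_is_expired(window_end, max_event_time,
                      delay=timedelta(minutes=10)):
    """Simplified watermark check (an assumption, not Spark's exact logic):
    a window counts as expired once its end is at or behind the watermark,
    where watermark = latest event time seen - allowed delay.
    """
    watermark = max_event_time - delay
    return window_end <= watermark


if __name__ == "__main__":
    # An event at 12:07 belongs to two overlapping windows.
    for start, end in sliding_windows(datetime(2024, 1, 1, 12, 7)):
        print(start.time(), "->", end.time())
```

Because each window is twice as long as the slide, every event is counted in two overlapping windows per `userId`, which is worth keeping in mind when reading the aggregated output.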