@zsxwing
Last active March 22, 2018 17:22
watermark = 1 hour
First batch (max event time = null):
2017-06-07 10:00:00.000
StateStore will store 2017-06-07 10:00:00.000
Second batch (max event time = 2017-06-07 10:00:00.000):
2017-06-07 11:00:00.000
StateStore will store 2017-06-07 10:00:00.000 and 2017-06-07 11:00:00.000
StateStore will evict rows <= max event time - 1 hour (2017-06-07 09:00:00.000)
Third batch (max event time = 2017-06-07 11:00:00.000):
2017-06-07 12:00:00.000
StateStore will store 2017-06-07 10:00:00.000, 2017-06-07 11:00:00.000 and 2017-06-07 12:00:00.000
StateStore will evict rows <= max event time - 1 hour (2017-06-07 10:00:00.000)
So 2017-06-07 10:00:00.000 is evicted in the third batch, and only now can you see it in the output.
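The walkthrough above can be replayed with a small toy model. This is only a sketch of the rule as described here (the eviction threshold is the max event time seen in previous batches minus the watermark delay, and a row shows up in the output when it is evicted); it is not Spark's actual StateStore code, and the function name `simulate` is hypothetical.

```python
from datetime import datetime, timedelta


def simulate(batches, delay):
    """Replay the walkthrough: return the rows evicted (and output) per batch.

    Toy model only, not Spark's implementation. `batches` is a list of
    lists of event times; `delay` is the watermark delay.
    """
    state = []             # rows currently held in the (simulated) state store
    max_event_time = None  # max event time seen in *previous* batches
    emitted_per_batch = []
    for batch in batches:
        state.extend(batch)  # new rows are stored first
        emitted = []
        if max_event_time is not None:
            threshold = max_event_time - delay
            # evict rows <= max event time - delay; evicted rows are output
            emitted = sorted(t for t in state if t <= threshold)
            state = [t for t in state if t > threshold]
        emitted_per_batch.append(emitted)
        # the max event time is updated only after the batch is processed
        batch_max = max(batch, default=None)
        if batch_max is not None:
            if max_event_time is None or batch_max > max_event_time:
                max_event_time = batch_max
    return emitted_per_batch


if __name__ == "__main__":
    hour = lambda h: datetime(2017, 6, 7, h)
    result = simulate([[hour(10)], [hour(11)], [hour(12)]], timedelta(hours=1))
    for i, rows in enumerate(result, start=1):
        print(f"batch {i}: output = {[str(r) for r in rows]}")
```

Because the threshold uses the max event time from earlier batches, the 10:00 row is output one batch after the 11:00 row arrives, which matches the three batches listed above.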
@messenjah00

Is it possible to shorten the time until data is output by using a negative delayThreshold for the watermark? In my use case, I assign current_timestamp() to eventTime, so there is no late data and no point in waiting for it to arrive.
