jmwilli25 / Spark23_StreamStream-LeftOuterJoin.scala
Last active January 26, 2018 15:59
Spark 2.3 Stream-Stream LeftOuterJoin
////////
// Left-side records without a match are not written out until a left record with an event time greater than the left watermark has been processed.
// Matches are written out immediately.
// Right-side records without a match are dropped once a right record with an event time greater than the right watermark has been processed.
////////
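// A minimal sketch of the kind of join the note above describes, assuming Spark 2.3+ on the classpath.
// The column names (txnId, txnTime, scoreTxnId, scoreTime), the watermark delays and the
// one-minute time range are hypothetical placeholders, not values taken from this gist.
// It is wrapped in a helper so it does not clash with any vals defined elsewhere in this file.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

def leftOuterJoinSketch(txn: DataFrame, score: DataFrame): DataFrame = {
  // Stream-stream outer joins need a watermark plus an event-time constraint,
  // so Spark can decide when an unmatched left row can no longer find a match.
  val left = txn.withWatermark("txnTime", "10 seconds")        // assumed event-time column and delay
  val right = score.withWatermark("scoreTime", "10 seconds")   // assumed event-time column and delay
  left.join(
    right,
    expr("""
      txnId = scoreTxnId AND
      scoreTime >= txnTime AND
      scoreTime <= txnTime + interval 1 minute
    """),
    "left_outer")
}
// Usage (hypothetical): leftOuterJoinSketch(txnStream, scoreStream).writeStream.format("console").start()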
// Get schemas with a batch read (streaming file sources need an explicit schema)
val txnSchema = spark.read.parquet("s3a://testset-a109/txn/parquet").schema
val scoreSchema = spark.read.parquet("s3a://testset-a109/score/parquet").schema
// Read data as streams
val txnSDf = spark.readStream.schema(txnSchema).parquet("s3a://testset-a109/txn/parquet")