Skip to content

Instantly share code, notes, and snippets.

@nfarah86
Created April 19, 2023 04:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nfarah86/2f1e8d4d6ba64d5fe81e58478cec09f5 to your computer and use it in GitHub Desktop.
Save nfarah86/2f1e8d4d6ba64d5fe81e58478cec09f5 to your computer and use it in GitHub Desktop.
import org.apache.spark.sql.SaveMode._
val basePath = "/tmp/hudi/stocks"
val stocksDF1 = spark.read.json("docker/demo/data/batch_1.json")
val stocksDF2 = spark.read.option("multiline", "true").json("docker/demo/data/batch_2.json").limit(1)
stocksDF1.write.format("hudi").
option("hoodie.datasource.write.recordkey.field", "symbol").
option("hoodie.datasource.write.partitionpath.field", "date").
option("hoodie.datasource.write.precombine.field", "ts").
option("hoodie.table.name", "stocks").
mode(Overwrite).
save(basePath)
stocksDF2.write.format("hudi").
option("hoodie.datasource.write.recordkey.field", "symbol").
option("hoodie.datasource.write.partitionpath.field", "date").
option("hoodie.datasource.write.precombine.field", "ts").
mode(Append).
save(basePath)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment