@scalactic
Created July 2, 2021 17:33
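// Launch spark-shell with dynamic partition overwrite enabled:
//   spark-shell --conf spark.sql.sources.partitionOverwriteMode=dynamic
// (or, in a running session: spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic"))
// Enter the multi-line statements below via :paste, so that lines starting with "."
// are not evaluated on their own by the REPL.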
import org.apache.spark.sql.SaveMode
val data = Seq((1,2,"20210701"),(1,3,"20210701"),(3,4,"20210702"),(3,5,"20210702"))
val df = spark.createDataFrame(data).toDF("col_1", "col_2", "prt_date")
// This creates the table, partitioned by prt_date, with partitions 20210701 and 20210702.
df
  .write
  .partitionBy("prt_date")
  .mode(SaveMode.Overwrite)
  .format("orc")
  .option("orc.compress", "zlib")
  .saveAsTable("default.dynamicPartition")
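val data1 = Seq((4,4,"20210702"),(4,5,"20210702"))
// df1 holds only rows for partition 20210702.
val df1 = spark.createDataFrame(data1).toDF("col_1", "col_2", "prt_date")
// With partitionOverwriteMode=dynamic this overwrites partition 20210702 only and leaves
// 20210701 untouched; in the default static mode the whole table would be replaced.
// insertInto matches columns by position, not by name, so the column order must match
// the table schema (col_1, col_2, then the partition column prt_date).
df1
  .write
  .mode(SaveMode.Overwrite)
  .format("orc")
  .option("orc.compress", "zlib")
  .insertInto("default.dynamicPartition")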
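
To make the difference between the two overwrite modes concrete without a Spark session, here is a minimal plain-Scala sketch. It assumes a partitioned table can be modelled as a Map from the prt_date value to that partition's rows. The object PartitionOverwriteModel and its functions byPartition, staticOverwrite and dynamicOverwrite are hypothetical names for illustration only; they are not part of Spark's API and do not reflect how Spark implements the overwrite.

```scala
// Hypothetical plain-Scala model (no Spark needed) of an INSERT OVERWRITE into a
// partitioned table under spark.sql.sources.partitionOverwriteMode = static / dynamic.
// A table is modelled as Map[prt_date, rows in that partition]; illustration only.
object PartitionOverwriteModel {
  type Row = (Int, Int, String)        // (col_1, col_2, prt_date)
  type Table = Map[String, Seq[Row]]   // prt_date -> rows stored in that partition

  // Group rows by their partition column (the last field of the tuple).
  def byPartition(rows: Seq[Row]): Table = rows.groupBy(_._3)

  // static (Spark's default): all existing partitions are dropped, so the
  // table ends up containing only the incoming rows. `table` is ignored on purpose.
  def staticOverwrite(table: Table, incoming: Seq[Row]): Table =
    byPartition(incoming)

  // dynamic: only partitions present in the incoming rows are replaced;
  // Map ++ Map keeps the right-hand value for keys that appear on both sides.
  def dynamicOverwrite(table: Table, incoming: Seq[Row]): Table =
    table ++ byPartition(incoming)

  def main(args: Array[String]): Unit = {
    val data  = Seq((1, 2, "20210701"), (1, 3, "20210701"), (3, 4, "20210702"), (3, 5, "20210702"))
    val data1 = Seq((4, 4, "20210702"), (4, 5, "20210702"))
    val table = byPartition(data)

    for ((prt, rows) <- dynamicOverwrite(table, data1).toSeq.sortBy(_._1))
      println(s"dynamic  $prt -> $rows")
    for ((prt, rows) <- staticOverwrite(table, data1).toSeq.sortBy(_._1))
      println(s"static   $prt -> $rows")
  }
}
```

Run with the gist's data and data1, the dynamic model keeps partition 20210701 as it was and replaces 20210702 with the two new rows, while the static model leaves only partition 20210702.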