@Arnold1
Created February 19, 2018 03:45
copy schema from one dataframe to another dataframe
// reference: https://stackoverflow.com/questions/36795680/copy-schema-from-one-dataframe-to-another-dataframe?rq=1
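import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import spark.implicits._ // needed for toDF(); already in scope in spark-shell

case class Person(Dummy: String, Name: String, Timestamp: String, Age: Int)
val personDF = spark.sparkContext.parallelize(Seq(Person("dummy", "Ray", "12345", 23), Person("dummy", "John", "12345", 44))).toDF()

// Target schema: keep only Name and Age
val personSchema = StructType(Seq(
  StructField("Name", StringType, true),
  StructField("Age", IntegerType, true)))

// Empty DataFrame that carries the target schema
val emptyRDD = spark.sparkContext.emptyRDD[Row]
val personSchemaDF = spark.createDataFrame(emptyRDD, personSchema)

// Match columns by name. Comparing whole StructFields also compares nullability,
// so Age (non-nullable in personDF, nullable in the target) would not match and would be replaced by null.
val exprs = personSchemaDF.schema.fields.map { f =>
  if (personDF.columns.contains(f.name)) col(f.name).cast(f.dataType).alias(f.name)
  else lit(null).cast(f.dataType).alias(f.name)
}
val finalDF = personDF.select(exprs: _*)
finalDF.show(false)
finalDF.printSchema()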