@scalactic
Last active July 14, 2021 15:12
Generate a schema from a case class in Spark
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{ArrayType, StructType}
import org.apache.spark.sql.catalyst.ScalaReflection
/** Simple schema */
case class A(key: String, time: java.sql.Timestamp, date: java.sql.Date, decimal: java.math.BigDecimal, map: Map[String, Int], nested: Seq[Map[String, Seq[Int]]])
val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType]
schema.printTreeString()
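Once generated, a schema can also be round-tripped through JSON via `StructType.json` and `DataType.fromJson`, which is handy for persisting it alongside the data it describes. A minimal sketch, using a small standalone schema for illustration:

```scala
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructField, StructType}

// Illustrative two-field schema (any StructType works the same way)
val s = StructType(Seq(
  StructField("key", StringType),
  StructField("count", IntegerType)
))

// Serialize to a JSON string, then restore the StructType from it
val asJson: String = s.json
val restored: StructType = DataType.fromJson(asJson).asInstanceOf[StructType]
assert(restored == s)
```

The JSON form is plain text, so it can be stored in a file or a metadata table and parsed back when the data is read.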
/** Array schema */
val arrSchema = ScalaReflection.schemaFor[Seq[A]].dataType.asInstanceOf[ArrayType]
/** When aUDF returns A, pass the StructType as the UDF's return type; the lambda must produce a Row whose fields match the schema (placeholder nulls shown here) */
val aUDF = udf(() => org.apache.spark.sql.Row(null, null, null, null, null, null), schema)
/** When arrUDF returns Seq[A], pass the ArrayType; the lambda must produce a Seq of such Rows */
val arrUDF = udf(() => Seq.empty[org.apache.spark.sql.Row], arrSchema)
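Note that `ScalaReflection` lives under `org.apache.spark.sql.catalyst`, which is Spark-internal and can change between releases. The same derivation is available through the public `Encoders` API; a sketch of that alternative:

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Same case class as above
case class A(key: String, time: java.sql.Timestamp, date: java.sql.Date,
             decimal: java.math.BigDecimal, map: Map[String, Int],
             nested: Seq[Map[String, Seq[Int]]])

// Encoders.product derives the StructType from the case class via a stable, public API
val schemaFromEncoder: StructType = Encoders.product[A].schema
schemaFromEncoder.printTreeString()
```

For Spark 3+, `Encoders.product[A].schema` is generally the safer choice, since it does not depend on catalyst internals.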