@spektom
Created May 3, 2018 13:47
Generate Hive schema from Spark DataFrame
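import org.apache.spark.sql.DataFrame

// Builds a Hive CREATE TABLE statement from the DataFrame's schema.
def dataFrameToDDL(dataFrame: DataFrame, tableName: String): String = {
  // One "name TYPE" entry per top-level column; nested types are rendered inline.
  val columns = dataFrame.schema.map { field =>
    " " + field.name + " " + field.dataType.simpleString.toUpperCase
  }
  s"CREATE TABLE $tableName (\n${columns.mkString(",\n")}\n)"
}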
// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import spark.sqlContext.implicits._

// Example of hierarchical structure:
case class Model(`type`: String)
case class Device(`type`: String, model: Model, serial: Long)
case class Event(device: Device, timestamp: Long)

val df = Seq(
  Event(Device("Android", Model("Huawei"), 1), 1525354897L)
).toDF()

dataFrameToDDL(df, "events")
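
Because `dataFrameToDDL` uppercases the whole `simpleString`, nested field names are uppercased along with the type keywords (for example `STRUCT<TYPE:STRING,...>`), and names such as `type` are emitted unquoted. The following is a minimal sketch of a variant that walks the schema recursively and backtick-quotes every name. The helpers `hiveType` and `schemaToDDL` are hypothetical names that are not part of the original gist, and the sketch assumes the target Hive version accepts backtick-quoted identifiers inside `STRUCT<...>` definitions.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: render a Spark DataType as a Hive type string.
// Type keywords are uppercased; field names keep their case and are quoted.
def hiveType(dataType: DataType): String = dataType match {
  case s: StructType =>
    s.fields
      .map(f => s"`${f.name}`:${hiveType(f.dataType)}")
      .mkString("STRUCT<", ",", ">")
  case a: ArrayType => s"ARRAY<${hiveType(a.elementType)}>"
  case m: MapType   => s"MAP<${hiveType(m.keyType)},${hiveType(m.valueType)}>"
  // Atomic types: simpleString yields names such as "bigint" or "decimal(10,2)".
  case other        => other.simpleString.toUpperCase
}

// Hypothetical variant of dataFrameToDDL that takes the schema directly,
// so it can be exercised without a running SparkSession.
def schemaToDDL(schema: StructType, tableName: String): String = {
  val columns = schema.fields.map(f => s"  `${f.name}` ${hiveType(f.dataType)}")
  s"CREATE TABLE $tableName (\n${columns.mkString(",\n")}\n)"
}
```

With a DataFrame in hand, it would be called as `schemaToDDL(df.schema, "events")`.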