@tonyfraser
Created September 4, 2019 20:38
Dynamically create a column in a Spark DataFrame if it does not already exist
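// An example of dynamically adding a column if it does not exist.
// Assumes a spark-shell session, or a SparkSession in scope named `spark`.
import org.apache.spark.sql.functions.lit
import spark.implicits._ // provides toDF on Seq; already imported in spark-shell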
val df = Seq(
("channel_one", "my_show", "episode1"),
("channel_one", "my_show", "episode2")
).toDF("network_name", "show_name", "episode")
// there is no rank column, so add one
val newdf = df.columns match {
  case a if a contains "rank" => df
  case _ => df.withColumn("rank", lit("0"))
}
newdf.show
// +------------+---------+--------+----+
// |network_name|show_name| episode|rank|
// +------------+---------+--------+----+
// | channel_one|  my_show|episode1|   0|
// | channel_one|  my_show|episode2|   0|
// +------------+---------+--------+----+
// there is already a show_name column, so df is returned unchanged
val newdf2 = df.columns match {
  case a if a contains "show_name" => df
  case _ => df.withColumn("rank", lit("0"))
}
newdf2.show
// +------------+---------+--------+
// |network_name|show_name| episode|
// +------------+---------+--------+
// | channel_one|  my_show|episode1|
// | channel_one|  my_show|episode2|
// +------------+---------+--------+
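
The pattern match above can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the original gist: the object name `EnsureColumn`, the function `withColumnIfMissing`, and the local-mode session settings are all hypothetical names chosen for illustration. It uses the same check as the gist (`df.columns contains name`), which is a case-sensitive string comparison, whereas Spark resolves column names case-insensitively by default, so callers with mixed-case schemas may want a different check.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object EnsureColumn {
  // Hypothetical helper: return df unchanged when `name` is already a column,
  // otherwise add `name` filled with the literal `default`.
  def withColumnIfMissing(df: DataFrame, name: String, default: Any): DataFrame =
    if (df.columns.contains(name)) df
    else df.withColumn(name, lit(default))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for the demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ensure-column")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("channel_one", "my_show", "episode1"),
      ("channel_one", "my_show", "episode2")
    ).toDF("network_name", "show_name", "episode")

    // "rank" is missing, so it is added with the string default "0".
    withColumnIfMissing(df, "rank", "0").show()

    // "show_name" exists, so the DataFrame comes back unchanged.
    withColumnIfMissing(df, "show_name", "0").show()

    spark.stop()
  }
}
```

Because the helper takes and returns a `DataFrame`, several calls can be chained with `foldLeft` over a list of required column names if more than one column may be missing.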