A somewhat tricky Spark SQL result: the same-looking `where` clause selects different rows depending on whether it runs before or after a column-swapping `select`. (Tested with Spark 2.3.0.)
/////////////////////////////////////////////////////////////////////////////////////////////
// 1. select with swapping columns, and apply where
/////////////////////////////////////////////////////////////////////////////////////////////
import spark.implicits._
import org.apache.spark.sql.DataFrame
case class Hello(id: Int, name1: String, name2: String)
val ds = List(Hello(1, "Alice", "Bob"), Hello(2, "Bob", "Alice")).toDS
// After this select, output column "name2" holds the original name1,
// and output column "name1" holds the original name2.
val swapRenamedDF: DataFrame = ds.select($"id", $"name1".as("name2"), $"name2".as("name1"))
swapRenamedDF.explain
swapRenamedDF.where($"name1" === "Alice").show
swapRenamedDF.where($"name1" === "Alice").explain
swapRenamedDF.where($"name2" === "Alice").show
swapRenamedDF.where($"name2" === "Alice").explain
/*
> swapRenamedDF.explain
== Physical Plan ==
LocalTableScan [id#3, name2#7, name1#8]
> swapRenamedDF.where($"name1" === "Alice").show
+---+-----+-----+
| id|name2|name1|
+---+-----+-----+
| 2| Bob|Alice|
+---+-----+-----+
> swapRenamedDF.where($"name1" === "Alice").explain
== Physical Plan ==
*(1) Project [id#3, name1#4 AS name2#7, name2#5 AS name1#8]
+- *(1) Filter (isnotnull(name2#5) && (name2#5 = Alice))
+- LocalTableScan [id#3, name1#4, name2#5]
> swapRenamedDF.where($"name2" === "Alice").show
+---+-----+-----+
| id|name2|name1|
+---+-----+-----+
| 1|Alice| Bob|
+---+-----+-----+
> swapRenamedDF.where($"name2" === "Alice").explain
== Physical Plan ==
*(1) Project [id#3, name1#4 AS name2#7, name2#5 AS name1#8]
+- *(1) Filter (isnotnull(name1#4) && (name1#4 = Alice))
+- LocalTableScan [id#3, name1#4, name2#5]
*/
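Why does filtering the swapped frame on `name1` produce `Filter (name2#5 = Alice)` in the plan? Catalyst pushes the filter below the Project and substitutes each attribute with the expression its alias was defined as. A minimal plain-Scala sketch of that alias substitution (no Spark involved; the map and function names here are illustrative, not Catalyst internals):

```scala
// Alias map introduced by: select($"id", $"name1".as("name2"), $"name2".as("name1"))
// output column name -> input column it is an alias of
val aliasOf: Map[String, String] =
  Map("id" -> "id", "name2" -> "name1", "name1" -> "name2")

// Pushing `where(col === "Alice")` below the projection rewrites the
// predicate through the alias map, matching the Filter nodes shown above.
def pushedPredicate(col: String): String = s"${aliasOf(col)} = Alice"

pushedPredicate("name1") // "name2 = Alice", as in the first explain output
pushedPredicate("name2") // "name1 = Alice", as in the second explain output
```

So the filter is applied against the renamed (post-select) schema, which is consistent with how the column names appear in `show` output.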
/////////////////////////////////////////////////////////////////////////////////////////////
// 2. apply where, and select with swapping columns
/////////////////////////////////////////////////////////////////////////////////////////////
import spark.implicits._
import org.apache.spark.sql.DataFrame
case class Hello(id: Int, name1: String, name2: String)
val ds = List(Hello(1, "Alice", "Bob"), Hello(2, "Bob", "Alice")).toDS
// Here where() runs before the rename, so $"name1" and $"name2" still refer
// to the original columns.
val applyWhereToName1ThenSelectDF = ds.where($"name1" === "Alice").select($"id", $"name1".as("name2"), $"name2".as("name1"))
applyWhereToName1ThenSelectDF.show
applyWhereToName1ThenSelectDF.explain
val applyWhereToName2ThenSelectDF = ds.where($"name2" === "Alice").select($"id", $"name1".as("name2"), $"name2".as("name1"))
applyWhereToName2ThenSelectDF.show
applyWhereToName2ThenSelectDF.explain
/*
> applyWhereToName1ThenSelectDF.show
+---+-----+-----+
| id|name2|name1|
+---+-----+-----+
| 1|Alice| Bob|
+---+-----+-----+
> applyWhereToName1ThenSelectDF.explain
== Physical Plan ==
*(1) Project [id#3, name1#4 AS name2#8, name2#5 AS name1#9]
+- *(1) Filter (isnotnull(name1#4) && (name1#4 = Alice))
+- LocalTableScan [id#3, name1#4, name2#5]
> applyWhereToName2ThenSelectDF.show
+---+-----+-----+
| id|name2|name1|
+---+-----+-----+
| 2| Bob|Alice|
+---+-----+-----+
> applyWhereToName2ThenSelectDF.explain
== Physical Plan ==
*(1) Project [id#3, name1#4 AS name2#26, name2#5 AS name1#27]
+- *(1) Filter (isnotnull(name2#5) && (name2#5 = Alice))
+- LocalTableScan [id#3, name1#4, name2#5]
*/
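Putting the two sections side by side: the identical-looking `.where($"name1" === "Alice")` returns id=2 when applied after the swapping select (it resolves against the renamed schema, i.e. the original name2) but id=1 when applied before it. The same contrast can be reproduced with plain Scala collections, no Spark needed (the `Person`/`SwappedPerson` case classes below are illustrative stand-ins for the gist's `Hello`):

```scala
case class Person(id: Int, name1: String, name2: String)
case class SwappedPerson(id: Int, name2: String, name1: String)

val people = List(Person(1, "Alice", "Bob"), Person(2, "Bob", "Alice"))

// select($"id", $"name1".as("name2"), $"name2".as("name1")) as a plain function:
// positionally, name2 := p.name1 and name1 := p.name2
def swapNames(p: Person): SwappedPerson = SwappedPerson(p.id, p.name1, p.name2)

// Section 1: swap first, then filter -- name1 now holds the original name2
val swapThenFilter = people.map(swapNames).filter(_.name1 == "Alice")

// Section 2: filter first, then swap -- name1 is still the original name1
val filterThenSwap = people.filter(_.name1 == "Alice").map(swapNames)

swapThenFilter.map(_.id) // List(2), matching section 1
filterThenSwap.map(_.id) // List(1), matching section 2
```

The takeaway: after an alias-swapping `select`, subsequent `where` clauses resolve against the new column names, so the operation order changes which rows survive.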