Skip to content

Instantly share code, notes, and snippets.

@ttomasz
Created November 2, 2022 22:33
Show Gist options
  • Save ttomasz/7fd8ee45f342dd841c7da1905fa1fe34 to your computer and use it in GitHub Desktop.
Save ttomasz/7fd8ee45f342dd841c7da1905fa1fe34 to your computer and use it in GitHub Desktop.
Scala2 implicit conversion to achieve fluent interface on spark dataframes
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "2.12.17"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.2.2",
"org.apache.spark" %% "spark-sql" % "3.2.2",
)
import org.apache.spark.sql.functions.{col, lower}
import org.apache.spark.sql.{DataFrame, SparkSession}
object Main {
def main(args: Array[String]): Unit = {
// initialize
val spark = SparkSession.builder.master("local[1]").appName("Simple Application").getOrCreate()
// allows .toDf conversion
import spark.implicits._
// create sample DataFrame
val columns = Seq("language", "users_count")
val data = Seq(("Java", "20000"), ("Python", "100000"), ("Scala", "3000"))
val sample_df: DataFrame = data.toDF(columns:_*)
// print sample data frame
println(
sample_df
.show
)
// print data frame after modifying it
println(
sample_df
.langToLowercase
.renameLangColumn
.show
)
}
implicit class Functions(df: DataFrame) {
def langToLowercase: DataFrame = df.withColumn("language", lower(col("language")))
def renameLangColumn: DataFrame = df.withColumnRenamed("language", "language_name")
}
}

Implicit conversions allow extending DataFrame class/trait without inheritance. Code is converted to function calls during compilation.

For example:

sample_df
  .langToLowercase
  .renameLangColumn

is converted to: renameLangColumn(langToLowercase(sample_df))

Running this code outputs:

+--------+-----------+ |language|users_count| +--------+-----------+ | Java| 20000| | Python| 100000| | Scala| 3000| +--------+-----------+

+-------------+-----------+ |language_name|users_count| +-------------+-----------+ | java| 20000| | python| 100000| | scala| 3000| +-------------+-----------+

In case of error about StorageUtils see https://stackoverflow.com/questions/72230174/java-17-solution-for-spark-java-lang-noclassdeffounderror-could-not-initializ

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment