@dongjinahn
Last active July 5, 2019 01:11
import org.apache.spark.sql.{Column, functions}

// A three-argument function (Int, Long, String) => Int, wrapped as a Spark UDF.
type myFunctionType = (Int, Long, String) => Int
val myFunction: myFunctionType = (i, l, str) => i + l.toInt + Integer.parseInt(str)
val myUDF = functions.udf(myFunction)
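
// A UDF that returns a tuple; Spark exposes the result as a struct with fields _1 and _2.
// java.lang.Double is the boxed type, so a null can reach the function; `d + l` unboxes d and fails on null.
type myFunction2Type = (Long, java.lang.Double, String) => Tuple2[Double, String]
val myFunction2: myFunction2Type = (l, d, str) => (d + l, str)
val myUDF2 = functions.udf(myFunction2)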
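
// originalDF is assumed to be an existing DataFrame containing these three columns.
// The column order must match the parameter order of myFunction2.
val colList = List[Column](
  originalDF("long_column"),
  originalDF("double_column"),
  originalDF("string_column")
)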

// Apply the UDF to the column list, then flatten the resulting struct into two columns.
val processedDF = originalDF
  .withColumn("newCol", myUDF2(colList: _*))
  .select("newCol._1", "newCol._2")