@tonyfraser
Created September 23, 2019 17:00
Spark/Scala: Convert all empty-string values in a DataFrame to null.
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

// Map null or whitespace-only strings to None so Spark stores them as null.
def emptyToNull(str: String): Option[String] =
  str match {
    case s if s == null || s.trim.isEmpty => None
    case s => Some(s)
  }

val emptyToNullUdf: UserDefinedFunction = udf(emptyToNull _)

// Usage: apply the UDF to every column, keeping the original column names.
// df.select(df.columns.map(c => emptyToNullUdf(col(c)).alias(c)): _*)

// https://stackoverflow.com/questions/34037889/apply-same-function-to-all-fields-of-spark-dataframe-row/55032886#55032886
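A minimal usage sketch, assuming the definitions above and a local SparkSession; the sample DataFrame and its column names (name, city) are purely illustrative.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").appName("emptyToNull").getOrCreate()
import spark.implicits._

// Illustrative input with empty and whitespace-only cells.
val df = Seq(("alice", "nyc"), ("bob", ""), ("  ", "boston")).toDF("name", "city")

// Apply the UDF to every column, preserving the original column names.
val cleaned = df.select(df.columns.map(c => emptyToNullUdf(col(c)).alias(c)): _*)

cleaned.show()
// +-----+------+
// | name|  city|
// +-----+------+
// |alice|   nyc|
// |  bob|  null|
// | null|boston|
// +-----+------+

Because the UDF returns Option[String], Spark converts None to a SQL null automatically, so no explicit null literal or cast is needed.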