@jlln
Last active March 9, 2021 06:00
How to apply a function to every row in a Spark DataFrame.
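import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{callUDF, struct}

// Returns the comma-separated indices of the null fields in a row, or "-1" if the row has none.
def findNull(row: Row): String = {
  if (row.anyNull) {
    val indices = (0 until row.length).filter(i => row.isNullAt(i))
    indices.mkString(",")
  }
  else "-1"
}
// Register the function under a name so it can be invoked through callUDF.
sqlContext.udf.register("findNull", findNull _)
// Pack every column into a single struct so the UDF receives the whole row as a Row.
// The result is bound to a new val because reassigning df only compiles if df is a var.
val dfWithMissing = df.withColumn("MissingGroups", callUDF("findNull", struct(df.columns.map(df(_)): _*)))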
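To see what the MissingGroups column would contain, the same null-index logic can be tried without a Spark session. This is only a rough sketch: `nullIndices` is a hypothetical helper (not part of the gist or of Spark) that works on a plain `Seq[Any]` standing in for `row.toSeq`, and the sample rows are made up for illustration.

```scala
// Hypothetical helper: mirrors findNull, but takes a plain Seq standing in for Row.toSeq.
def nullIndices(values: Seq[Any]): String = {
  // Collect the positions whose value is null.
  val indices = values.indices.filter(i => values(i) == null)
  // Same convention as findNull: "-1" means the row has no nulls.
  if (indices.isEmpty) "-1" else indices.mkString(",")
}

// Made-up sample rows with three fields each.
val rows: Seq[Seq[Any]] = Seq(
  Seq("a", 1, 2.0),
  Seq(null, 1, null),
  Seq("c", null, 3.0)
)

rows.map(nullIndices).foreach(println)
// prints:
// -1
// 0,2
// 1
```

The indices are zero-based positions in the row, so they follow the column order of `df.columns` when the struct is built as in the snippet above.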