@ebuildy
Created June 28, 2019 07:51
Apache Spark SQL UDF that reads a value from a struct by dot-notation path, returning a default value when the path does not exist
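// Assumes a spark-shell or notebook session where `spark` is the active SparkSession.
import org.apache.spark.sql.Row

// Registers struct_def(struct, path, defaultValue) for use in Spark SQL.
// `path` is a dot-separated list of field names (e.g. "address.city"): every
// segment but the last must name a nested struct, and the last a string field.
// Returns defaultValue when an intermediate segment is missing, null, or not a
// struct, or when the leaf field is missing or null.
spark.udf.register("struct_def", (root: Row, path: String, defaultValue: String) => {
  val fields = path.split("\\.")
  val lastItem = fields.last

  // Walk down the intermediate segments; buffer becomes null as soon as one is
  // missing from the schema, null, or not a struct.
  var buffer: Row = root
  fields.dropRight(1).foreach { field =>
    if (buffer != null) {
      buffer =
        if (buffer.schema.fieldNames.contains(field)) {
          buffer.get(buffer.fieldIndex(field)) match {
            case r: Row => r    // nested struct: descend into it
            case _      => null // null value or non-struct field
          }
        } else {
          null
        }
    }
  }

  // Read the leaf, falling back to the default if it is absent or null.
  if (buffer == null || !buffer.schema.fieldNames.contains(lastItem)) {
    defaultValue
  } else {
    val idx = buffer.fieldIndex(lastItem)
    if (buffer.isNullAt(idx)) defaultValue else buffer.getString(idx)
  }
})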
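For comparison, here is a sketch of the same lookup written without a mutable buffer, plus a small local program to exercise it from SQL. It assumes a Spark 2.x/3.x `spark-sql` dependency on the classpath; the `StructDefExample` object, its `structDef` helper, the `people` view and the sample data are hypothetical, and returning the default for a null leaf is a choice of this sketch.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical standalone object: a functional variant of struct_def plus a
// local session to call it from Spark SQL.
object StructDefExample {

  // Same lookup as the UDF, but the walk is a foldLeft over Option:
  // None means "the path broke somewhere", so no mutable buffer is needed.
  def structDef(root: Row, path: String, defaultValue: String): String = {
    val fields = path.split("\\.")

    // Returns the named child struct, or None if it is absent, null, or not a struct.
    def child(row: Row, name: String): Option[Row] = {
      if (row.schema != null && row.schema.fieldNames.contains(name)) {
        row.get(row.fieldIndex(name)) match {
          case r: Row => Some(r)
          case _      => None
        }
      } else {
        None
      }
    }

    // Descend through every segment except the last one.
    val parent = fields.dropRight(1).foldLeft(Option(root)) { (acc, name) =>
      acc.flatMap(child(_, name))
    }

    // Read the leaf as a string; fall back to the default if absent or null.
    val leaf = parent.flatMap { row =>
      if (row.schema != null && row.schema.fieldNames.contains(fields.last)) {
        val idx = row.fieldIndex(fields.last)
        if (row.isNullAt(idx)) None else Some(row.getString(idx))
      } else {
        None
      }
    }
    leaf.getOrElse(defaultValue)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("struct_def-demo").getOrCreate()
    spark.udf.register("struct_def",
      (root: Row, path: String, defaultValue: String) => structDef(root, path, defaultValue))

    // Hypothetical sample data: the second row has a null `address` struct.
    spark.sql(
      """SELECT named_struct('address', named_struct('city', 'Paris')) AS info
        |UNION ALL
        |SELECT named_struct('address', CAST(NULL AS STRUCT<city: STRING>)) AS info""".stripMargin
    ).createOrReplaceTempView("people")

    // "address.zip" is not in the schema, so every row gets the default for it.
    spark.sql(
      """SELECT struct_def(info, 'address.city', 'unknown') AS city,
        |       struct_def(info, 'address.zip', 'n/a') AS zip
        |FROM people""".stripMargin
    ).show()

    spark.stop()
  }
}
```

Folding over `Option` makes the "path is broken" state an explicit value, so each step only has to say how to move one level down, and the null and missing-field cases collapse into the single `getOrElse` at the end.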