@johnmuller87
Last active February 1, 2018 13:23
Using a Scala UDF in PySpark
# Before Spark 2.1, use the tag 'pre-2.1' and call the Scala registerUDF helper through the JVM gateway
spark._jvm.com.ing.wbaa.spark.udf.ValidateIBAN.registerUDF(spark._jsparkSession)
# Spark 2.1 and 2.2, use the tag '2.1+' (sqlContext.registerJavaFunction is deprecated from Spark 2.3)
from pyspark.sql.types import BooleanType
sqlContext.registerJavaFunction("validate_iban", "com.ing.wbaa.spark.udf.ValidateIBAN", BooleanType())
# Spark 2.3+, use the tag '2.1+'
from pyspark.sql.types import BooleanType
spark.udf.registerJavaFunction("validate_iban", "com.ing.wbaa.spark.udf.ValidateIBAN", BooleanType())
# Use your UDF!
spark.sql("""SELECT validate_iban('NL20INGB0001234567')""").show()
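The three registration variants above differ only by Spark version, so the choice could be made from `spark.version` at runtime. The following is a minimal sketch, not part of the original gist: the helper names `pick_registration` and `register_validate_iban` are hypothetical, and the code assumes the jar built from the matching tag is already on the classpath and that a `sqlContext` is passed in for Spark 2.1 and 2.2.

```python
import re

UDF_NAME = "validate_iban"
UDF_CLASS = "com.ing.wbaa.spark.udf.ValidateIBAN"


def pick_registration(version):
    """Map a Spark version string to one of the three registration styles.

    Hypothetical helper: returns "spark.udf" for 2.3+, "sqlContext" for
    2.1 and 2.2, and "jvm" for anything older.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognised Spark version: %r" % (version,))
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor >= (2, 3):
        return "spark.udf"
    if major_minor >= (2, 1):
        return "sqlContext"
    return "jvm"


def register_validate_iban(spark, sql_context=None):
    """Register the Scala UDF using the call that fits the running Spark.

    Hypothetical helper: `spark` is a SparkSession; `sql_context` is only
    needed on Spark 2.1 and 2.2.
    """
    style = pick_registration(spark.version)
    if style == "spark.udf":
        from pyspark.sql.types import BooleanType
        spark.udf.registerJavaFunction(UDF_NAME, UDF_CLASS, BooleanType())
    elif style == "sqlContext":
        if sql_context is None:
            raise ValueError("Spark 2.1/2.2 needs a sqlContext to register the UDF")
        from pyspark.sql.types import BooleanType
        sql_context.registerJavaFunction(UDF_NAME, UDF_CLASS, BooleanType())
    else:
        # Jar built from the 'pre-2.1' tag: the Scala side registers itself.
        spark._jvm.com.ing.wbaa.spark.udf.ValidateIBAN.registerUDF(spark._jsparkSession)
    return style
```

After `register_validate_iban(spark)` (or `register_validate_iban(spark, sqlContext)` on 2.1 and 2.2), the `spark.sql(...)` query above should work unchanged.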