@icexelloss
Last active March 1, 2018 15:24
from pyspark.sql.functions import pandas_udf, PandasUDFType


# Use pandas_udf to define a Pandas UDF.
# Input and output are both a pandas.Series of doubles.
@pandas_udf('double', PandasUDFType.SCALAR)
def pandas_plus_one(v):
    return v + 1


# df is assumed to be an existing Spark DataFrame with a double column 'v'.
df.withColumn('v2', pandas_plus_one(df.v))
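A scalar Pandas UDF receives a column in batches, each as a pandas.Series, and must return a Series of the same length. The sketch below runs the same function body on a plain Series, without Spark, to show that behaviour. It assumes pandas is installed; `plus_one` is a hypothetical undecorated copy of the UDF body, and the sample values are illustrative, not from the gist.

```python
import pandas as pd


# Hypothetical undecorated copy of the UDF body, so it can be called directly.
def plus_one(v: pd.Series) -> pd.Series:
    # The addition is vectorized: it applies to every element of the batch at once.
    return v + 1


# Stand-in for one batch of the 'v' column (illustrative values only).
batch = pd.Series([1.0, 2.0, 3.0])
result = plus_one(batch)

print(result.tolist())  # [2.0, 3.0, 4.0]
```

Checking the body on a small Series like this can be a quick way to debug the logic before wrapping it with `pandas_udf`.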