@gcsfred
Created November 17, 2018 19:32
Concatenate two columns of a DataFrame using a UDF
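import numpy as np
import pyspark.sql.functions as f
import pyspark.sql.types as t
# ...
def udf_concat_vec(a, b):
    # a and b are of type SparseVector; densify each and join them end to end
    return np.concatenate((a.toArray(), b.toArray())).tolist()
# Wrap the function as a Spark UDF that returns an array of floats
my_udf_concat_vec = f.udf(udf_concat_vec, t.ArrayType(t.FloatType()))
# df is an existing DataFrame with vector columns 'columnA' and 'columnB'
df2 = df.withColumn("togetherAB", my_udf_concat_vec('columnA', 'columnB'))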
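
The UDF above turns both vectors into dense arrays before joining them. If the sparsity should be kept, the same concatenation can be done on the index and value lists by shifting the indices of the second vector by the size of the first. The following is a minimal sketch in plain Python, not part of the original gist: `concat_sparse` and `to_dense` are hypothetical helpers, and each vector is assumed to be a `(size, indices, values)` triple mirroring the `size`, `indices` and `values` attributes of a `SparseVector`.

```python
# Hypothetical helper, not part of the gist: concatenates two sparse vectors
# given as (size, indices, values) triples without densifying them.
def concat_sparse(a, b):
    size_a, idx_a, val_a = a
    size_b, idx_b, val_b = b
    # Entries of b keep their values but move past the end of a.
    shifted = [i + size_a for i in idx_b]
    return (size_a + size_b, list(idx_a) + shifted, list(val_a) + list(val_b))


def to_dense(vec):
    # Expand a (size, indices, values) triple into a plain list of floats.
    size, indices, values = vec
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = float(v)
    return dense


a = (3, [0, 2], [1.0, 3.0])   # dense form: [1.0, 0.0, 3.0]
b = (2, [1], [5.0])           # dense form: [0.0, 5.0]
together = concat_sparse(a, b)
print(together)            # (5, [0, 2, 4], [1.0, 3.0, 5.0])
print(to_dense(together))  # [1.0, 0.0, 3.0, 0.0, 5.0]
```

Inside Spark, the resulting triple could presumably be passed to `SparseVector(size, indices, values)` from `pyspark.ml.linalg` and returned from a UDF declared with `VectorUDT()` as its return type. That wiring is an assumption here and is not tested against a running cluster.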