PySpark: create a struct column from existing columns
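# simple example: pack existing columns into a struct column
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
# reuse the active session if one exists (e.g. in a notebook or the pyspark shell), else start one
spark = SparkSession.builder.getOrCreate()
data = [[1, 'mplah', 'gogo'], [2, 'mplah2', 'gogo2'], [3, 'mplah3', 'gogo3']]
df = spark.createDataFrame(data, schema=['x', 'y', 'z'])
# keep x as a top-level column and also pack x and y (renamed _x and _y) into a struct named _xy
res = df.select(F.col('x'), F.struct(F.col('x').alias('_x'), F.col('y').alias('_y')).alias('_xy'))
res.show()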
# +---+-----------+
# |  x|        _xy|
# +---+-----------+
# |  1| {1, mplah}|
# |  2|{2, mplah2}|
# |  3|{3, mplah3}|
# +---+-----------+
res.printSchema()
# root
#  |-- x: long (nullable = true)
#  |-- _xy: struct (nullable = false)
#  |    |-- _x: long (nullable = true)
#  |    |-- _y: string (nullable = true)