@dreyco676
Created October 9, 2018 04:22
PySpark List Column to Boolean Columns for each value
from pyspark.sql.functions import split, explode, lit, first
# split the comma-separated 'ROOF' string column into an array column
df = df.withColumn('roof_list', split(df['ROOF'], ', '))
# explode the array so each value gets its own row
# (rows where 'ROOF' is null produce no output rows)
ex_df = df.withColumn('ex_roof_list', explode(df['roof_list']))
# add a constant column of 1s to aggregate on later
ex_df = ex_df.withColumn('constant_val', lit(1))
# pivot on the exploded values: one column per distinct value, 1 where the value is present
# (the original single-argument coalesce() was a no-op, so first() alone is enough)
piv_df = ex_df.groupBy('NO').pivot('ex_roof_list').agg(first('constant_val'))
# combinations that never occur come back as null; replace them with 0
piv_df = piv_df.fillna(0)
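To make the expected result concrete, here is a plain-Python sketch of the same split, explode, pivot, and fill steps, runnable without Spark. The column names 'NO' and 'ROOF' come from the snippet above; the sample rows and the helper name `list_column_to_flags` are hypothetical and only for illustration.

```python
# Hypothetical sample rows; the values are made up for illustration.
rows = [
    {"NO": 1, "ROOF": "Asphalt Shingles, Pitched"},
    {"NO": 2, "ROOF": "Flat"},
    {"NO": 3, "ROOF": "Pitched, Flat"},
]


def list_column_to_flags(rows, key="NO", column="ROOF", sep=", "):
    """Return one dict per key with a 0/1 flag for every distinct list value."""
    # split + explode: one (key, value) pair per list element
    pairs = [(r[key], v) for r in rows for v in r[column].split(sep)]
    # pivot: every distinct value becomes its own column
    values = sorted({v for _, v in pairs})
    keys = sorted({k for k, _ in pairs})
    present = set(pairs)
    # equivalent of fillna(0): absent combinations get 0 instead of null
    return [
        {key: k, **{v: int((k, v) in present) for v in values}}
        for k in keys
    ]


flags = list_column_to_flags(rows)
for row in flags:
    print(row)
```

Each printed row has the 'NO' key plus one 0/1 column per distinct roof value, which is the shape `piv_df` should have for the same input.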