@canimus
Created May 19, 2022 18:17
PySpark Map Transform for Null Counts
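from pyspark.sql import functions as F

# Assumes `df` is an existing DataFrame with columns c1, c2 and c3.
df.select(
    # Pack the columns into one map and wrap it in a one-element array
    F.array(
        F.create_map(
            F.lit("k1"), F.col("c1"),
            F.lit("k2"), F.col("c2"),
            F.lit("k3"), F.col("c3"),
        )
    ).alias("losing_bids")
).select(
    # Replace every map value with 1 if it is null, otherwise 0
    F.transform(
        "losing_bids",
        lambda m: F.transform_values(m, lambda k, v: v.isNull().cast("integer")),
    ).alias("nulls_in_maps")
).select(
    # Sum the 0/1 flags of each map, then take the single array element
    F.transform(
        "nulls_in_maps",
        lambda x: F.aggregate(F.map_values(x), F.lit(0.0), lambda acc, y: acc + y),
    )
    .getItem(0)
    .cast("integer")
    .alias("total_nulls")
).show(truncate=False)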
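
The per-row logic of the snippet above can be modeled in plain Python, which may make the three `select` stages easier to follow. This is only an illustrative sketch, not Spark code: the helper name `count_nulls_in_row` and the sample rows are hypothetical, and Python's `None` stands in for a SQL null.

```python
from functools import reduce
from typing import Any, Dict, List, Optional


def count_nulls_in_row(row: Dict[str, Optional[Any]]) -> int:
    """Hypothetical plain-Python model of what the Spark pipeline computes per row."""
    # Stage 1 (like F.create_map): key/value pairs for a single row
    as_map = dict(row)
    # Stage 2 (like F.transform_values): 1 where the value is null, else 0
    null_flags = {k: int(v is None) for k, v in as_map.items()}
    # Stage 3 (like F.aggregate over F.map_values): fold the flags starting at 0.0
    total = reduce(lambda acc, y: acc + y, null_flags.values(), 0.0)
    # Final cast back to an integer, like .cast("integer")
    return int(total)


# Made-up sample rows, keyed like the map in the Spark snippet
rows: List[Dict[str, Optional[Any]]] = [
    {"k1": 1, "k2": None, "k3": "x"},
    {"k1": None, "k2": None, "k3": None},
    {"k1": 1, "k2": 2, "k3": 3},
]
totals = [count_nulls_in_row(r) for r in rows]
print(totals)  # [1, 3, 0]
```

In the Spark version the map sits inside a one-element array, so the outer `F.transform` calls and the final `.getItem(0)` only wrap and unwrap that single element; the counting itself matches the three stages sketched here.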