@melissakou
Created October 1, 2021 15:16
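from pyspark.sql import functions as F

# Assumes an active SparkContext `sc` and SparkSession (e.g. the pyspark shell).
# Build 20 rows with a skewed key: index 0 and 1 get "key-0" and "key-1"; the other 18 rows share "key-2".
rdd2 = sc.parallelize(range(20)).map(lambda x: [x])
df2 = rdd2.toDF(schema=["index"]) \
    .withColumn("key", F.when(F.col("index") < 2, F.concat_ws("-", F.lit("key"), F.col("index"))).otherwise("key-2")) \
    .cache()
df2.groupBy("key").count().show()  # row count per key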