@melissakou
Last active October 1, 2021 15:17
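# Assumed setup: in a PySpark shell or notebook, `spark` and `sc` already exist and only the F import is needed.
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
# 1e8 rows in 200 partitions; index 0 -> "key-0", index 1 -> "key-1", every other row -> "key-2" (heavily skewed key).
rdd1 = sc.parallelize(range(int(1e8)), 200).map(lambda x: [x])
df1 = rdd1.toDF(schema=["index"]) \
    .withColumn("key", F.when(F.col("index") < 2, F.concat_ws("-", F.lit("key"), F.col("index"))).otherwise("key-2")) \
    .cache()
df1.groupBy("key").count().show()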
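
The key rule in the snippet above can be checked without a cluster. The following is a minimal plain-Python sketch of the same `when`/`otherwise` logic on a small range; the helper names `assign_key` and `key_counts` are hypothetical and not part of the gist. By the same arithmetic, the full 1e8-row DataFrame should have one row each for `key-0` and `key-1` and 99,999,998 rows for `key-2`.

```python
from collections import Counter


def assign_key(index: int) -> str:
    # Mirrors F.when(index < 2, "key-<index>").otherwise("key-2"):
    # indices 0 and 1 get their own key, every other index shares "key-2".
    return f"key-{index}" if index < 2 else "key-2"


def key_counts(n: int) -> Counter:
    # Count rows per key for indices 0..n-1, like groupBy("key").count().
    return Counter(assign_key(i) for i in range(n))


if __name__ == "__main__":
    for key, count in sorted(key_counts(1000).items()):
        print(key, count)
```

Almost all rows land on a single key, so a shuffle keyed on this column sends nearly all the data to one task.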