@1ambda
Created December 21, 2021 23:26
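from pyspark.sql.functions import asc, col, collect_set

# `df` is assumed to be an existing DataFrame with the columns
# event_time, brand, product_id, category_code, and category_id.
(
    df
    # Filter first, while `brand` and `category_code` are still plain input columns.
    .where(col("brand").isNotNull() & col("category_code").isNotNull())
    .selectExpr(
        "CAST(event_time AS DATE) AS event_date",
        "brand",
        "product_id",
        "ARRAY(category_code, category_id) AS category",
    )
    # Collect the distinct products and categories seen on each day.
    .groupBy("event_date")
    .agg(
        collect_set("product_id").alias("product_id_set"),
        collect_set("category").alias("category_set"),
    )
    # Show only the earliest date, without truncating the collected sets.
    .orderBy(asc("event_date"))
    .limit(1)
    .show(truncate=False)
)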