@melissakou
Last active October 2, 2021 17:46
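from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
sales = spark.read.option("header", True).csv("sales_train_evaluation.csv")
cols = sales.columns[6:]  # every column after the six identifier columns

# Unpivot the value columns into (d, amount) rows with stack(), then total per state.
stack_expr = "stack({}, {}) as (d, amount)".format(
    len(cols), ", ".join("'{}', {}".format(c, c) for c in cols))
groupby_state = (
    sales
    .selectExpr("id", "item_id", "dept_id", "cat_id", "store_id", "state_id", stack_expr)
    .groupBy("state_id")
    .agg(F.sum("amount").alias("amt_tot"))
    .orderBy(F.col("amt_tot").desc()))
groupby_state.show()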
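
The `stack(...)` expression is assembled as a string, so it can be hard to see what Spark actually receives. The sketch below builds the same expression for three columns without needing a Spark session. The helper name `build_stack_expr` and the column names `d_1`, `d_2`, `d_3` are assumptions made for illustration; they stand in for `sales.columns[6:]`.

```python
def build_stack_expr(value_cols, key_name="d", value_name="amount"):
    """Return a Spark SQL stack() expression that unpivots value_cols into rows.

    Hypothetical helper: it mirrors the string formatting used in the snippet
    above and is not part of PySpark.
    """
    # Each column contributes a pair: its name as a string literal, then the column itself.
    pairs = ", ".join("'{0}', {0}".format(c) for c in value_cols)
    return "stack({}, {}) as ({}, {})".format(
        len(value_cols), pairs, key_name, value_name)


# Assumed column names, used only to show the shape of the generated expression.
expr = build_stack_expr(["d_1", "d_2", "d_3"])
print(expr)  # stack(3, 'd_1', d_1, 'd_2', d_2, 'd_3', d_3) as (d, amount)
```

Passed to `selectExpr` next to the identifier columns, an expression of this shape produces one output row per value column for each input row, with the column name in `d` and its value in `amount`.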