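# Explicit imports: wildcard imports from pyspark.sql.functions shadow
# Python's built-in round() and sum().
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# `spark` is the SparkSession provided by the pyspark shell or a notebook.
orderItems = spark. \
    read. \
    json('/Users/itversity/Research/data/retail_db_json/order_items')

# Lower the number of shuffle partitions from the default of 200 to 2.
spark.conf.set('spark.sql.shuffle.partitions', '2')

# Window covering all items that belong to the same order.
# Equivalent: Window.partitionBy('order_item_order_id')
spec = Window.partitionBy(orderItems.order_item_order_id)

# Add the per-order total to every order item row, rounded to 2 decimals.
orderItemsWithRevenue = orderItems. \
    withColumn('order_revenue', F.round(F.sum(orderItems.order_item_subtotal).over(spec), 2))

orderItemsWithRevenue.printSchema()
orderItemsWithRevenue.show()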
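
To make the semantics of the window aggregation concrete, here is a minimal plain-Python sketch of what `sum(...).over(Window.partitionBy(...))` computes: every input row is kept, and each row gets the total of its own partition. The sample rows and the helper name `with_order_revenue` are hypothetical and for illustration only; they are not part of the gist or of the retail_db data.

```python
from collections import defaultdict

# Hypothetical sample rows; the values are illustrative, not taken from retail_db.
order_items = [
    {"order_item_order_id": 1, "order_item_subtotal": 10.0},
    {"order_item_order_id": 2, "order_item_subtotal": 20.0},
    {"order_item_order_id": 2, "order_item_subtotal": 5.5},
]


def with_order_revenue(rows):
    """Attach the per-order subtotal sum to every row (hypothetical helper)."""
    # First pass: total the subtotals per order id (the "partition").
    totals = defaultdict(float)
    for row in rows:
        totals[row["order_item_order_id"]] += row["order_item_subtotal"]
    # Second pass: unlike a group-by, no rows are collapsed; each row
    # receives the total of the order it belongs to.
    return [
        {**row, "order_revenue": round(totals[row["order_item_order_id"]], 2)}
        for row in rows
    ]


result = with_order_revenue(order_items)
for row in result:
    print(row)
```

This is the main difference from a `groupBy` aggregation, which would return one row per order instead of one row per order item.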