@anbento0490
Last active April 27, 2021 08:41
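# APPLY MANIPULATION TO CREATE NEW DF
from pyspark.sql.functions import col, to_date

# Rename each kept column to snake_case (add the new column, then drop the old one),
# parse the order date, and cast the numeric columns to float.
df_final = (
    df.withColumn("order_id", col("Order ID")).drop("Order ID")
    # Parse date strings in month/day/year form (pattern "M/d/yyyy").
    .withColumn("order_date", to_date(col("Order Date"), "M/d/yyyy")).drop("Order Date")
    .withColumn("item_type", col("Item Type")).drop("Item Type")
    .withColumn("sales_channel", col("Sales Channel")).drop("Sales Channel")
    .withColumn("units_sold", col("Units Sold").cast("float")).drop("Units Sold")
    .withColumn("unit_price", col("Unit Price").cast("float")).drop("Unit Price")
    .withColumn("total_revenue", col("Total Revenue").cast("float")).drop("Total Revenue")
    # Drop the columns that are not carried into the new DataFrame.
    .drop("Region", "Country", "Order Priority", "Ship Date", "Total Profit", "Total Cost", "Unit Cost")
    # Remove exact duplicate rows.
    .distinct()
)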
df_final.show(5) # source_1
# df_final.count() # 2,097,152 rows
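
The renames above all follow one pattern: lowercase the header and replace spaces with underscores. As a minimal sketch, the new names could be derived programmatically instead of typed by hand. The helper `to_snake_case` and the list `kept_columns` are hypothetical names introduced here for illustration; they are not part of the original gist.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: turn a header such as "Order ID" into "order_id"."""
    # Collapse each run of non-alphanumeric characters into one underscore,
    # trim underscores left at either end, then lowercase the result.
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()


# The seven columns that the transformation above keeps (assumed from the snippet).
kept_columns = [
    "Order ID", "Order Date", "Item Type", "Sales Channel",
    "Units Sold", "Unit Price", "Total Revenue",
]

renamed = [to_snake_case(c) for c in kept_columns]
print(renamed)
```

With a PySpark DataFrame, `df.select(kept_columns).toDF(*renamed)` should apply these names in one step. The date parsing and float casts would still need to be applied separately.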