@dgadiraju
Last active September 17, 2019 13:35
// Read the orders CSV with an explicit schema, using the csv shortcut
val ordersDF = spark.
  read.
  schema("order_id INT, order_date STRING, order_customer_id INT, order_status STRING").
  csv("/Users/itversity/Research/data/retail_db/orders")
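// Equivalent read using the generic format/load API
// (when pasted into the same spark-shell session, this redefines ordersDF)
val ordersDF = spark.
  read.
  schema("order_id INT, order_date STRING, order_customer_id INT, order_status STRING").
  format("csv").
  load("/Users/itversity/Research/data/retail_db/orders")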
// Using DataFrame operations
// Count orders per date; the result column is named "count"
ordersDF.
  groupBy("order_date").
  count.
  show
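// Same aggregation with an explicit column alias.
// count(...) here is org.apache.spark.sql.functions.count; compiled code needs this import.
import org.apache.spark.sql.functions.count

ordersDF.
  groupBy("order_date").
  agg(count("order_date").alias("order_count")).
  show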
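// Using Spark SQL
// Register the DataFrame as a temporary view; createOrReplaceTempView
// (unlike createTempView) does not fail if the snippet is re-run in the same session
ordersDF.createOrReplaceTempView("orders")
spark.
  sql("select order_date, count(1) order_count from orders group by order_date").
  show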