@kashifmin
Created November 22, 2018 09:20
import org.apache.spark.sql._
// Create a Spark Session
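val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
// Brings in the $"column" syntax used below (spark-shell imports this automatically)
import spark.implicits._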
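// Read the CSV source file (its first line is the header). inferSchema makes Spark scan
// the data once up front to work out column types, so this line already runs a job.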
val srcDF = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/home/jovyan/sales_src.csv")
// Self-explanatory, I guess: multiply the Units Sold column by 2 (withColumn replaces the existing column)
val unitsBy2 = srcDF.withColumn("Units Sold", $"Units Sold" * 2) // transformation
// Keep only rows whose Order Id is greater than 100
val filterOrderId = unitsBy2.filter($"Order Id" > 100) // transformation
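// Select only the Region column
val regions = filterOrderId.select($"Region") // transformation
regions.take(10) // action: this is what actually triggers the transformations above
regions.explain(extended = true) // Spark, please tell me what you did under the hood (parsed, analyzed, optimized and physical plans)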
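The comments above label each step as a transformation or an action. You don't need Spark to see the idea. The sketch below is a plain-Scala analogy: no Spark is involved, and the in-memory rows are hypothetical stand-ins for `sales_src.csv`. Chaining `map` and `filter` on an `Iterator` only records the steps, and the work happens when `take(10)` pulls results through. It illustrates laziness in general and is not a model of how Spark schedules jobs.

```scala
// Plain-Scala analogy of lazy transformations vs. actions (no Spark involved).
object LazyPipelineDemo {
  // Hypothetical sample rows: (orderId, region, unitsSold). Not the gist's CSV data.
  val rows: Seq[(Int, String, Int)] =
    (1 to 1000).map(i => (i, if (i % 2 == 0) "Europe" else "Asia", i * 3))

  /** Returns (rows doubled before the action, rows doubled after it, first 10 regions). */
  def run(): (Int, Int, List[String]) = {
    var doubled = 0 // counts how many rows the "units * 2" step has processed

    // "Transformations": building the Iterator chain records the steps but computes nothing.
    val pipeline = rows.iterator
      .map { case (id, region, units) => doubled += 1; (id, region, units * 2) }
      .filter { case (id, _, _) => id > 100 }
      .map { case (_, region, _) => region }

    val before = doubled

    // "Action": asking for 10 results pulls rows through the chain only until 10 have passed.
    val firstTen = pipeline.take(10).toList
    (before, doubled, firstTen)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, firstTen) = run()
    println(s"rows doubled before take: $before, after take: $after")
    println(firstTen.mkString(", "))
  }
}
```

Running `main` should report 0 rows doubled before `take` and 110 after it. Ids 1 to 110 are enough to find 10 orders with an id above 100, so the other 890 rows are never touched.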
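`explain(extended = true)` prints the parsed, analyzed and optimized logical plans, followed by the physical plan. For this pipeline, the optimized plan typically drops the `Units Sold * 2` expression, because only Region is ever selected. The Order Id filter also usually sits directly above the CSV relation. The toy below mimics those two rewrites on a hand-rolled plan tree. The `Plan` types, the `ToyOptimizer` object and both rules are hypothetical simplifications. They are not Catalyst's API or its actual rule set.

```scala
// Hypothetical toy logical plan; NOT Spark's Catalyst classes.
sealed trait Plan
final case class Scan(columns: List[String]) extends Plan
final case class WithColumn(name: String, expr: String, child: Plan) extends Plan
final case class Filter(column: String, cond: String, child: Plan) extends Plan
final case class Select(columns: List[String], child: Plan) extends Plan

object ToyOptimizer {
  // Apply the rewrite rules repeatedly until the plan stops changing.
  def optimize(plan: Plan): Plan = {
    val next = rewrite(plan)
    if (next == plan) plan else optimize(next)
  }

  private def rewrite(plan: Plan): Plan = plan match {
    // Toy rule 1: a filter that doesn't read the column being rewritten can run first.
    case Filter(col, cond, WithColumn(name, expr, child)) if col != name =>
      WithColumn(name, expr, Filter(col, cond, child))
    // Toy rule 2: drop a withColumn whose result the parent Select never uses.
    case Select(cols, WithColumn(name, _, child)) if !cols.contains(name) =>
      Select(cols, child)
    // Otherwise, keep the node and try to rewrite its child.
    case Select(cols, child)           => Select(cols, rewrite(child))
    case Filter(col, cond, child)      => Filter(col, cond, rewrite(child))
    case WithColumn(name, expr, child) => WithColumn(name, expr, rewrite(child))
    case s: Scan                       => s
  }

  // Indented tree printout, loosely imitating the shape of explain() output.
  def show(plan: Plan, indent: Int = 0): String = {
    val pad = "  " * indent
    plan match {
      case Scan(cols)                => s"${pad}Scan [${cols.mkString(", ")}]"
      case WithColumn(name, expr, c) => s"${pad}WithColumn $name = $expr\n" + show(c, indent + 1)
      case Filter(col, cond, c)      => s"${pad}Filter $col $cond\n" + show(c, indent + 1)
      case Select(cols, c)           => s"${pad}Select [${cols.mkString(", ")}]\n" + show(c, indent + 1)
    }
  }

  def main(args: Array[String]): Unit = {
    // The gist's pipeline, written bottom-up as a toy plan.
    val analyzed = Select(List("Region"),
      Filter("Order Id", "> 100",
        WithColumn("Units Sold", "Units Sold * 2",
          Scan(List("Order Id", "Region", "Units Sold")))))
    println("== Analyzed (toy) ==\n" + show(analyzed))
    println("== Optimized (toy) ==\n" + show(optimize(analyzed)))
  }
}
```

If the Select kept `Units Sold`, rule 2 would not fire and the doubling would stay in the plan. Comparing that toy output with the real `explain` output is a quick way to see which of your transformations Spark will actually execute.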