@subpath
Last active May 29, 2021 11:02
package com.mypackage.mytasks
// For the sake of example, assume the project provides an IO object with common I/O functions
import com.mypackage.mytasks.IO
import org.apache.spark.sql.{SparkSession, DataFrame}
object MyObject {
  def my_transformation(dataInputPath: String, dataOutputPath: String): Unit = {
    def runSparkTransformations(): Unit = {
      val spark = SparkSession.builder
        .master("local[*]")
        .getOrCreate()
      // Load the input data, e.g. a bucket of parquet files,
      // using a loader function from your own project.
      val rawData = IO.loadDataFromBucket(dataInputPath)
      // Apply the transformations.
      val transformedData = rawData
        .select(...)
        .groupBy(...)
        .agg(...)
        .cache()
      // Finally, store the transformed results.
      IO.writeResultsToBucket(transformedData, dataOutputPath)
    }
    // The nested function is only defined above; call it so the job actually runs.
    runSparkTransformations()
  }
}
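
The gist treats `IO.loadDataFromBucket` and `IO.writeResultsToBucket` as project helpers and does not show them. As a rough sketch only, assuming the bucket holds parquet files (as the comments suggest) and that overwriting existing output is acceptable, such a helper might look like the following. The method bodies are assumptions, not the author's code, and the package declaration (`com.mypackage.mytasks`) is left out to keep the snippet standalone.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical sketch of the IO helper that MyObject imports.
// The method names match the calls in the gist; the bodies are assumptions.
object IO {
  // Reuse the session the caller has already built, or create one if none exists.
  private def spark: SparkSession = SparkSession.builder.getOrCreate()

  // Assumption: the input path points at parquet files.
  def loadDataFromBucket(path: String): DataFrame =
    spark.read.parquet(path)

  // Assumption: overwriting any existing output at the path is acceptable.
  def writeResultsToBucket(data: DataFrame, path: String): Unit =
    data.write.mode(SaveMode.Overwrite).parquet(path)
}
```

With an object like this on the classpath, `my_transformation` only needs the input and output paths; how the data is read and written stays in one place and can be swapped for another format without touching the transformation code.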