Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save asanchez75/e9fb733930edbec3cafa581c90175de7 to your computer and use it in GitHub Desktop.
Save asanchez75/e9fb733930edbec3cafa581c90175de7 to your computer and use it in GitHub Desktop.
Parallel processing with Scala-Spark
object ParallelProcessing {
val queries: List[(String, String)] = List(
("SELECT * FROM ABC", "output1"),
("SELECT * FROM XYZ", "output2")
)
// Just use parallel collection instead of futures, that's it
queries.par foreach {
case (query, path) =>
val dataPath = s"${pathPrefix}/{path}"
executeAndSave(query, dataPath)
}
def executeAndSave(query: String, dataPath: String)(implicit context: Context): Unit = {
println(s"$query starts")
context.spark.sql(query).write.mode("overwrite").parquet(dataPath)
println(s"$query completes")
}
}
@scalactic
Copy link

Did you mean SparkContext??

@p-filatov
Copy link

Oh wow ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment