@pavel-filatov
pavel-filatov / parallel_processing.py
Last active April 4, 2024 14:54
Example of how to trigger parallel processing in PySpark.
"""Example on how to trigger Spark actions in parallel from Python.
The core idea is to use ThreadPoolExecutor to trigger Spark actions from separate threads.

HOW TO RUN
Download or copy-paste this gist, then run it from your terminal:
    python parallel_processing.py
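
The preview above is cut off before the code itself, so the following is only a minimal sketch of the idea it describes, not the gist's actual implementation. It assumes each Spark action is a blocking call that can be submitted from its own thread. Spark accepts jobs submitted concurrently from different threads of one application. To keep the sketch runnable without a Spark installation, the action is replaced by a hypothetical stand-in, `run_action`, that only sleeps. The `jobs` list and `run_all` helper are likewise illustrative names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_action(query, path):
    """Hypothetical stand-in for a blocking Spark action.

    In a real PySpark app this would be something like
    spark.sql(query).write.parquet(path); here it only sleeps.
    """
    time.sleep(0.2)
    return path


# Illustrative (query, output path) pairs; not taken from the Python gist.
jobs = [
    ("SELECT * FROM ABC", "output1"),
    ("SELECT * FROM XYZ", "output2"),
]


def run_all(jobs, max_workers=2):
    """Submit every job to a thread pool and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_action, query, path) for query, path in jobs]
        # result() blocks until the job finishes and re-raises any exception.
        return [future.result() for future in futures]


if __name__ == "__main__":
    start = time.perf_counter()
    finished = run_all(jobs)
    elapsed = time.perf_counter() - start
    print(f"finished {finished} in {elapsed:.2f}s")
```

Threads fit here because the driver mostly waits on the cluster while an action runs, so the Python GIL is not the bottleneck. Whether the jobs really overlap on the cluster depends on available executors and the scheduler configuration.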
pavel-filatov / ParallelProcessing.scala
Created May 12, 2021 08:02
Parallel processing with Scala-Spark
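object ParallelProcessing {
  // (SQL query, output path) pairs to process
  val queries: List[(String, String)] = List(
    ("SELECT * FROM ABC", "output1"),
    ("SELECT * FROM XYZ", "output2")
  )
  // Just use a parallel collection instead of futures, that's it.
  // Note: on Scala 2.13+, .par requires the scala-parallel-collections module
  // and `import scala.collection.parallel.CollectionConverters._`.
  queries.par foreach {
    case (query, path) =>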
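# Split a vector into a list of consecutive pairs, e.g. c(1, 2, 3, 4) -> list(c(1, 2), c(3, 4)).
# Note: for an odd-length input the last pair is padded with NA.
couple_vector <- function(x) {
  # Recursive helper: move the first two elements of y into the accumulator.
  couple_vector_iter <- function(y, acc) {
    if (length(y) > 0) {
      couple_vector_iter(y[-c(1:2)], c(acc, list(y[1:2])))
    } else {
      acc
    }
  }
  couple_vector_iter(x, NULL)
}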
pavel-filatov / evaluation-rules.sc
Created May 12, 2018 11:37
Scala Evaluation Rules
def example = 2 // evaluated when called
val example = 2 // evaluated immediately
lazy val example = 2 // evaluated once when needed
def square(x: Double) // call by value
def square(x: => Double) // call by name
def myFct(bindings: Int*) = { ... } // bindings is a sequence of Int holding a variable number of arguments