@tushar-rishav
Created September 30, 2019 13:23
Spark Job Scheduling

Factors pertaining to Spark job scheduling

Resource allocation

Speculation

spark.speculation: false by default. When enabled, Spark checks at every spark.speculation.interval for straggler tasks, i.e. tasks running significantly slower than the rest of their stage, and relaunches speculative copies of them on other executors.
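A minimal sketch of turning speculation on when building a session; the app name is hypothetical and the threshold values shown are only illustrative tuning knobs:

```scala
import org.apache.spark.sql.SparkSession

// Enable speculative execution and tune how stragglers are detected.
val spark = SparkSession.builder()
  .appName("speculation-demo")                    // hypothetical app name
  .config("spark.speculation", "true")            // off by default
  .config("spark.speculation.interval", "100ms")  // how often the scheduler checks for stragglers
  .config("spark.speculation.multiplier", "1.5")  // how much slower than the median a task must be
  .config("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
  .getOrCreate()
```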

Data locality

Spark tries to maintain a list of preferred locations for each partition. A partition’s preferred location is a list of hostnames or executors where the partition’s data resides so that computation can be moved closer to the data. This information is obtained for RDDs that are based on HDFS data. If Spark obtains a list of preferred locations, the Spark scheduler tries to run tasks on the executors where the data is physically present so that no data transfer is required. This can have a big impact on performance. There are five levels of data locality:

  • PROCESS_LOCAL — Execute a task on the executor that cached the partition.
  • NODE_LOCAL — Execute a task on the node where the partition is available.
  • RACK_LOCAL — Execute the task on the same rack as the partition if rack information is available in the cluster (currently only on YARN).
  • NO_PREF — No preferred locations are associated with the task.
  • ANY — Default if everything else fails.

If a task slot with the best locality for the task can’t be obtained (that is, all the matching task slots are taken), the scheduler waits for the period specified by spark.locality.wait and then tries a slot with the second-best locality, and so on.

Important configs: spark.locality.wait (the default wait applied between all levels), plus the per-level overrides spark.locality.wait.process, spark.locality.wait.node, and spark.locality.wait.rack.

By setting a high wait value (say, 10 minutes), we can effectively force the scheduler to honor the desired locality level, making it hold out until a matching slot becomes available. Use a high value only when data locality is of critical importance, as shown in the sketch below.
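A sketch of raising the locality waits for such a locality-critical job; the app name is hypothetical and the uniform 10-minute value simply mirrors the example above:

```scala
import org.apache.spark.sql.SparkSession

// Hold out longer for better locality before falling back to a worse level.
val spark = SparkSession.builder()
  .appName("locality-demo")                        // hypothetical app name
  .config("spark.locality.wait", "10min")          // default wait between locality levels
  .config("spark.locality.wait.process", "10min")  // PROCESS_LOCAL -> NODE_LOCAL fallback
  .config("spark.locality.wait.node", "10min")     // NODE_LOCAL -> RACK_LOCAL fallback
  .config("spark.locality.wait.rack", "10min")     // RACK_LOCAL -> ANY fallback
  .getOrCreate()
```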

Memory constraints:

  • spark.storage.memoryFraction -- default 0.6 -- fraction of the executor heap reserved for Spark's cached (storage) data.
  • spark.shuffle.memoryFraction -- default 0.2 -- fraction reserved for temporary shuffle data.
  • spark.storage.safetyFraction (default 0.9) and spark.shuffle.safetyFraction (default 0.8) -- safety margins applied on top of the fractions above to leave headroom and avoid OOM errors.
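A sketch of setting these fractions on a SparkConf; the app name is hypothetical. Note that these are the legacy memory-manager settings, while recent Spark releases use a unified memory model (spark.memory.fraction / spark.memory.storageFraction) instead:

```scala
import org.apache.spark.SparkConf

// Legacy memory-manager fractions, as listed above.
val conf = new SparkConf()
  .setAppName("memory-fractions-demo")         // hypothetical app name
  .set("spark.storage.memoryFraction", "0.6")  // heap share for cached data
  .set("spark.shuffle.memoryFraction", "0.2")  // heap share for shuffle buffers
  .set("spark.storage.safetyFraction", "0.9")  // safety margin on the storage share
  .set("spark.shuffle.safetyFraction", "0.8")  // safety margin on the shuffle share
```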