
Spark Memory Tuning

  • Increase the driver heap to accommodate large DAGs
  • Avoid overly granular executors: use fewer executors with larger heaps and multiple cores per executor
  • Set spark.memory.fraction=0.6 to leave the rest of the heap to executor working memory (shuffle, etc.)
  • Allocate roughly 60% of the instance CPUs to executors; leave headroom for other tasks
  • Disable off-heap memory; it was not stable in our tests
# instance i3.8xlarge | 244GiB | 32CPU | 4*2TiB SSD | 10Gbps
driver-memory 32g
spark.driver.maxResultSize=10g

executor-memory 32g
executor-cores 6
num-executors INSTANCES*4

spark.memory.offHeap.enabled=false
spark.executor.memoryOverhead=12g
spark.memory.fraction=0.6

spark.dynamicAllocation.enabled=false
spark.shuffle.service.enabled=false
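
The same layout can also be expressed programmatically. A minimal Scala sketch, assuming a SparkSession entry point (on YARN, driver and executor sizing normally has to be passed at submit time, so the builder form below is illustrative only):

import org.apache.spark.sql.SparkSession

// Sizing check for i3.8xlarge with 4 executors per instance:
// 4 * (32g heap + 12g overhead) = 176 GiB of 244 GiB, and 4 * 6 = 24 of 32 vCPUs.
val spark = SparkSession.builder()
  .appName("memory-tuning-sketch")
  .config("spark.driver.memory", "32g")              // large driver heap for big DAGs
  .config("spark.driver.maxResultSize", "10g")
  .config("spark.executor.memory", "32g")
  .config("spark.executor.cores", "6")
  .config("spark.executor.memoryOverhead", "12g")
  .config("spark.memory.offHeap.enabled", "false")   // off-heap disabled (unstable in these tests)
  .config("spark.memory.fraction", "0.6")
  .config("spark.dynamicAllocation.enabled", "false")
  .config("spark.shuffle.service.enabled", "false")
  .getOrCreate()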

RDD Persistence

  • Use disk-only persistence when running on SSDs
  • Leave heap memory to the Spark executor
--conf spark.driver.extraJavaOptions="-Dspark.persistence.useDisk=true \
  -Dspark.persistence.useOnHeapMemory=false \
  -Dspark.persistence.useOffHeapMemory=false \
  -Dspark.persistence.keepDeserialized=false \
  -Dspark.persistence.replication=2"
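
The -Dspark.persistence.* flags above appear to be application-specific system properties rather than built-in Spark settings. With stock Spark, the closest equivalent of serialized, disk-only persistence with two replicas is StorageLevel.DISK_ONLY_2; a minimal Scala sketch with a hypothetical input path:

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("disk-persistence-sketch").getOrCreate()

// Hypothetical input path; disk persistence in Spark is always serialized.
val rdd = spark.sparkContext.textFile("s3://my-bucket/input/")
val persisted = rdd.persist(StorageLevel.DISK_ONLY_2)  // disk only, replication = 2
persisted.count()                                       // materialize the persisted copies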

Shuffle Tuning

  • Fine-tune the number of shuffle partitions based on your number of executors and cores.
  • Increase the split size when reading data from a blob store (e.g. S3).
spark.sql.shuffle.partitions=SPARK_NUM_EXECUTORS * SPARK_EXECUTOR_CORES * 2

spark.sql.files.maxPartitionBytes=268435456
spark.files.maxPartitionBytes=268435456
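
The partition count can also be derived at runtime from the executor layout. A minimal Scala sketch, assuming an existing SparkSession named spark and executor counts that mirror the submit-time settings:

// Assumed to match the submit-time layout (e.g. INSTANCES * 4 executors, 6 cores each).
val numExecutors = 8
val coresPerExecutor = 6
val shufflePartitions = numExecutors * coresPerExecutor * 2

spark.conf.set("spark.sql.shuffle.partitions", shufflePartitions.toString)

// 268435456 bytes = 256 MiB splits when scanning files from a blob store such as S3.
spark.conf.set("spark.sql.files.maxPartitionBytes", (256L * 1024 * 1024).toString)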

Increase task execution resilience

  • Increase network timeouts to cope with transient network issues (e.g. on EMR)
  • Enable blacklisting of executors and increase the number of task retries to cope with degraded instances
spark.sql.broadcastTimeout=36000
spark.network.timeout=120

spark.task.maxFailures=20
spark.blacklist.enabled=true
spark.blacklist.timeout=99h
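
The same settings can be applied on a SparkConf before the context is created; a minimal Scala sketch (note that on Spark 3.1+ the spark.blacklist.* properties were renamed to spark.excludeOnFailure.*; the names below follow this gist):

import org.apache.spark.SparkConf

val resilientConf = new SparkConf()
  .set("spark.sql.broadcastTimeout", "36000")  // seconds
  .set("spark.network.timeout", "120")         // seconds
  .set("spark.task.maxFailures", "20")
  .set("spark.blacklist.enabled", "true")      // spark.excludeOnFailure.enabled on Spark 3.1+
  .set("spark.blacklist.timeout", "99h")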