@tahashmi
Created August 6, 2020 14:16
Setting up a sample Spark cluster in a Singularity container on the Cartesius (SURFsara) server
#!/bin/bash
#SBATCH -N 5
#SBATCH -t 1:10:00
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 16
#SBATCH --output=sparksingu.out
#SBATCH --mem=50G
############################
# Optional: uncomment the lines below to use a custom Java path and Python 3.6
#source run.txt
#export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
#export PATH=$JAVA_HOME/bin:$PATH
#export PYSPARK_PYTHON=/usr/bin/python3.6
#export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.6
#alias python='/usr/bin/python3.6'
#####################################################################################
nodes=($(scontrol show hostname $SLURM_NODELIST))
nnodes=${#nodes[@]}
last=$(( $nnodes - 1 ))
# The batch script runs on the first allocated node, which also hosts the Spark master.
ssh ${nodes[0]} hostname
PORT=7077
# Define the master URL before it is printed or used.
export MASTER="spark://${nodes[0]}:$PORT"
echo -n "starting spark master on $MASTER... ";
singularity exec ~/tahmad/spark.simg /home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/sbin/start-master.sh
echo "done";
sleep 2;
# Record the actual master host instead of a hard-coded node name.
echo "spark cluster web interface: http://${nodes[0]}:8080" >$HOME/spark-info
echo " spark master URL: $MASTER" >>$HOME/spark-info
# No worker is started on the master node; workers run only on the remaining nodes.
echo "opening ssh connections to start the other nodes' worker processes"
for i in $( seq 1 $last )
do
ssh ${nodes[i]} hostname
done
echo starting remote workers
for i in $( seq 1 $last )
do
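# -c and -m set the cores and memory each worker offers to Spark. Note that 64G exceeds
# the 50G requested from SLURM with --mem, so executor memory should stay below that limit.
/usr/bin/ssh ${nodes[$i]} "singularity exec ~/tahmad/spark.simg /home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/sbin/start-slave.sh -c 16 -m 64G ${MASTER}; echo ${nodes[$i]} " &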
done
sleep 15;
#######################################################################################
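# Submit in client mode, so the driver runs on this (master) node.
# --num-executors applies only on YARN; in standalone mode the total is capped with
# --total-executor-cores instead (4 workers x 16 cores = 64).
singularity exec ~/tahmad/spark.simg /home/tahmad/tahmad/spark-2.3.4-bin-hadoop2.7/bin/spark-submit \
    --master "$MASTER" \
    --deploy-mode client \
    --driver-memory 12g \
    --executor-memory 16g \
    --total-executor-cores 64 \
    --executor-cores 16 \
    /home/tahmad/tahmad/script.py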
#######################################################################################
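
The script above relies on fixed `sleep` calls and assumes the master is up once the delay has passed. As a minimal sketch of an alternative, the helper below polls a TCP port until something is listening. `wait_for_port` is a hypothetical name, not part of the original script, and the sketch assumes a bash build with `/dev/tcp` redirection enabled.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): wait until HOST:PORT
# accepts TCP connections, retrying once per second for up to TIMEOUT seconds.
# Usage: wait_for_port HOST PORT [TIMEOUT]
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-30}
    local waited=0
    # The subshell tries to open a TCP connection through bash's /dev/tcp;
    # the descriptor is closed again when the subshell exits.
    until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
        if (( waited >= timeout )); then
            return 1    # nothing was listening within the time limit
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}
```

In the script it could stand in for `sleep 2`, for example as `wait_for_port "${nodes[0]}" "$PORT" 60 || exit 1`. A connection attempt to an unreachable host can block for longer than one second, so the timeout is only approximate.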