@rom1504
Last active August 7, 2023 02:04
spark on slurm

Steps:

  1. wget https://gist.github.com/rom1504/67ada3dedbecc113ae2dbdfd9c642d83/raw/865fb35e00f21330b5b82aeb7c31941b6c18f649/spark_on_slurm.sh
  2. wget https://gist.github.com/rom1504/67ada3dedbecc113ae2dbdfd9c642d83/raw/865fb35e00f21330b5b82aeb7c31941b6c18f649/worker_spark_on_slurm.sh
  3. wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz && tar xf spark-3.3.1-bin-hadoop3.tgz
  4. sbatch spark_on_slurm.sh
  5. build a venv and install pyspark (a sketch follows below), then run the Python snippet further down

(you can get https://huggingface.co/datasets/laion/laion-coco/resolve/main/part-00000-2256f782-126f-4dc6-b9c6-e6757637749d-c000.snappy.parquet as an example parquet)
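Building the venv for step 5 might look like this (a minimal sketch; the python3 binary name, the .env path, and the pyspark version pin are assumptions rather than part of the gist):

python3 -m venv .env
source .env/bin/activate
pip install pyspark==3.3.1   # match the Spark version downloaded in step 3

Then, from inside the venv, run something like: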

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.submit.deployMode", "client")
    .config("spark.executor.memory", "16GB")         # heap memory per executor
    .config("spark.executor.memoryOverhead", "8GB")  # off-heap overhead per executor
    .config("spark.task.maxFailures", "100")         # retry flaky tasks instead of failing the job
    .master("spark://master_node:7077")              # replace master_node, see below
    .appName("spark-stats")
    .getOrCreate()
)

df = spark.read.parquet("part-00000-2256f782-126f-4dc6-b9c6-e6757637749d-c000.snappy.parquet")

df.count()

Replace master_node with the hostname of the first node allocated to your Slurm job (this is the MASTER_ADDR that worker_spark_on_slurm.sh prints).
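One way to find that hostname, assuming the job was submitted with the spark_on_slurm.sh below (<JOBID> is a placeholder for your job id):

# expand the job's nodelist and take the first host
scontrol show hostnames "$(squeue -h -j <JOBID> -o %N)" | head -n 1
# or read it from the sbatch output file (named via --output=%x_%j.out)
grep "MASTER ADDR" spark_on_slurm_<JOBID>.out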

You can check the Spark UI by tunneling the ports, along these lines:

ssh -L 4040:localhost:4040 -L 8080:localhost:8080 login_node

then, from the login node:

ssh -L localhost:4040:master_node:4040 -L localhost:8080:master_node:8080 master_node

and open http://localhost:8080 (the master UI) and http://localhost:4040 (the application UI) in your browser
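If your ssh client supports jump hosts, the two tunnels can be collapsed into one command (an alternative sketch, assuming master_node accepts ssh connections from login_node):

ssh -J login_node -L 4040:localhost:4040 -L 8080:localhost:8080 master_node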

spark_on_slurm.sh

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --job-name=spark_on_slurm
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task=48
#SBATCH --mem=0 # 0 means request all of the node's memory (--mem is otherwise given in MB)
#SBATCH --output=%x_%j.out
#SBATCH --comment bild
#SBATCH --exclusive
srun --comment bild bash worker_spark_on_slurm.sh
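After sbatch, it can be useful to confirm the allocation and watch the daemons come up (standard Slurm commands; the output file name follows from --output=%x_%j.out and <JOBID> is a placeholder):

squeue -u $USER                      # wait until the job shows state R
tail -f spark_on_slurm_<JOBID>.out   # master and worker startup logs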

worker_spark_on_slurm.sh

#!/bin/bash
#
# get environment variables
GLOBAL_RANK=$SLURM_PROCID
CPUS=$(grep -c ^processor /proc/cpuinfo)
MEM=$(($(grep MemTotal /proc/meminfo | awk '{print $2}') / 1000)) # MemTotal is in kB, so this is roughly MB
MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
LOCAL_IP=$(hostname -I | awk '{print $1}')
LOCALDIR=/scratch/spark
export SPARK_WORKER_DIR=$LOCALDIR/work
export SPARK_LOCAL_DIRS=$LOCALDIR/local
# setup the master node
if [ "$GLOBAL_RANK" -eq 0 ]
then
    # print out some info
    echo -e "MASTER ADDR: $MASTER_ADDR\tGLOBAL RANK: $GLOBAL_RANK\tCPUS PER TASK: $CPUS\tMEM PER NODE: $MEM"
    # then start the spark master node in the background
    ./spark-3.3.1-bin-hadoop3/sbin/start-master.sh -p 7077 -h "$LOCAL_IP"
fi
sleep 10
# then start the spark worker node in the background
MEM_IN_GB=$(($MEM / 1000))
# concat a "G" to the end of the memory string
MEM_IN_GB="$MEM_IN_GB"G
echo "MEM IN GB: $MEM_IN_GB"
./spark-3.3.1-bin-hadoop3/sbin/start-worker.sh -c $CPUS -m $MEM_IN_GB "spark://$MASTER_ADDR:7077"
echo "Hello from worker $GLOBAL_RANK"
sleep 10
if [ "$GLOBAL_RANK" -eq 0 ]
then
    # then start some script
    echo "hi"
fi
sleep 1000000 # keep the job alive so the spark master and workers stay up
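The echo "hi" above is just a placeholder for "start some script"; instead of attaching a driver from a login node, rank 0 could launch it directly. A minimal sketch, assuming the venv from step 5 lives in ./.env, the Python snippet above is saved as spark_stats.py, and its master URL points at spark://$MASTER_ADDR:7077 instead of master_node:

if [ "$GLOBAL_RANK" -eq 0 ]
then
    source .env/bin/activate   # venv with pyspark installed (assumed path)
    python spark_stats.py      # the driver connects to the master started above
fi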

salanki commented Nov 21, 2022

Does this really start more than 1 worker, since the worker command isn’t launched by an srun?


rom1504 commented Nov 21, 2022

Oh yeah you're right. Will fix


rom1504 commented Nov 21, 2022

now fixed
