@treydock
Last active October 11, 2021 15:42
SLURM GPU metrics
#!/bin/bash
# Remove the per-job GPU metrics file that the prolog script writes.
source /etc/slurm/prolog-epilog.config
GPU_INFO_PROM="${METRICS_DIR}/slurm_job_gpu_info-${SLURM_JOB_ID}.prom"
rm -f "$GPU_INFO_PROM"
exit 0
$ cat /etc/slurm/prolog.d/metrics.sh
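#!/bin/bash
# Write one slurm_job_gpu_info sample per GPU assigned to the job.
source /etc/slurm/prolog-epilog.config
# Only emit metrics when the job was allocated GPUs.
if [ -n "${CUDA_VISIBLE_DEVICES}" ]; then
    GPU_INFO_PROM="${METRICS_DIR}/slurm_job_gpu_info-${SLURM_JOB_ID}.prom"
    # Build the file under a temporary name (PID suffix) first.
    cat > "$GPU_INFO_PROM.$$" <<EOF
# HELP slurm_job_gpu_info GPU Assigned to a SLURM job
# TYPE slurm_job_gpu_info gauge
EOF
    # CUDA_VISIBLE_DEVICES is comma-separated; split on commas.
    OIFS=$IFS
    IFS=','
    for gpu in $CUDA_VISIBLE_DEVICES ; do
        echo "slurm_job_gpu_info{jobid=\"${SLURM_JOB_ID}\",gpu=\"${gpu}\"} 1" >> "$GPU_INFO_PROM.$$"
    done
    IFS=$OIFS
    # Rename into place so a reader never sees a partially written file.
    /bin/mv -f "$GPU_INFO_PROM.$$" "$GPU_INFO_PROM"
fi
exit 0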
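To check the metric format without running a real job, the sketch below prints to stdout the same lines the prolog writes to the .prom file. The function name render_gpu_info and the sample job ID and device list are hypothetical, chosen only for illustration; they are not part of the gist. It splits the device list with `read -a`, a swapped-in alternative to saving and restoring IFS.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist): print the metrics the
# prolog would write for a given job ID and comma-separated GPU list.
render_gpu_info() {
    local jobid=$1 devices=$2 gpu
    local -a gpus
    echo '# HELP slurm_job_gpu_info GPU Assigned to a SLURM job'
    echo '# TYPE slurm_job_gpu_info gauge'
    # Split on commas; IFS is changed only for this one read command.
    IFS=',' read -ra gpus <<< "$devices"
    for gpu in "${gpus[@]}"; do
        printf 'slurm_job_gpu_info{jobid="%s",gpu="%s"} 1\n' "$jobid" "$gpu"
    done
}

# Example with assumed values: job 12345 was assigned GPUs 0 and 1.
render_gpu_info 12345 0,1
```

With those sample values the output is the two comment lines followed by one sample per GPU, for example slurm_job_gpu_info{jobid="12345",gpu="0"} 1.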