@AaronJackson
Created December 12, 2018 21:58
Slurm GPU status script: shows available GPUs by partition, plus node degradations
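#!/bin/bash

# sed expression that pulls the GPU count out of a TRES line, e.g.
# "CfgTRES=cpu=32,...,gres/gpu=4". Accepts multi-digit counts.
regex='s/.*gpu=\([0-9][0-9]*\).*/\1/p'

# Build a "partition nodes" list, one line per partition.
p=$(while IFS='=' read -r x partition ; do
        IFS='=' read -r x nodes
        # Quoted so that bracketed node lists are not glob-expanded.
        echo "$partition" "$nodes"
    done < <(scontrol show part | \
             grep -e PartitionName -e ' Nodes'))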
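# Build a "node configured_gpus allocated_gpus" list, skipping nodes that
# are down or draining. The loop expects four lines per node from grep,
# in this order: NodeName, State, CfgTRES, AllocTRES.
n=$(while read -r node ; do
        read -r state
        read -r avail
        read -r alloc
        node=$(echo "$node" | tr '=' ' ' | awk '{ print $2 }')
        state=$(echo "$state" | sed -n 's/.*State=\([A-Z+]*\).*/\1/p')
        avail=$(echo "$avail" | sed -n "$regex")
        alloc=$(echo "$alloc" | sed -n "$regex")
        if [[ $state == "DOWN" || $state == *"DRAIN"* ]]; then
            continue
        fi
        # Nodes without a gres/gpu entry count as 0.
        echo "$node" "${avail:-0}" "${alloc:-0}"
    done < <(scontrol show nodes | \
             grep -e NodeName -e TRES -e State))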
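# Print one table row per partition. Rows are built as comma-separated
# fields and aligned by column(1); sed then turns the spaces in the
# border rows into dashes.
(
    echo "+,+,,+,,+,,,,+"
    echo "|,|,# GPU,|,Avail,|,Pairs,Triplets,Quads,|"
    echo "+,+,,+,,+,,,,+"
    while read -r name nodes ; do
        echo -n "| $name,|,"
        # Collect this partition's rows from $n. The node list is expanded
        # with "scontrol show hostnames" so ranges such as node[1-3] work,
        # and names are matched exactly so node1 does not match node10.
        pn=$(while read -r node ; do
                 echo "$n" | awk -v node="$node" '$1 == node'
             done < <(scontrol show hostnames "$nodes"))
        # avail = configured GPUs; singl = free GPUs; pairs/trips/quads =
        # free groups of 2/3/4 GPUs that fit on a single node.
        avail=$(echo "$pn" | awk '{sum+=$2} END { print sum+0 }')
        singl=$(echo "$pn" | awk '{sum+=$2-$3} END { print sum+0 }')
        pairs=$(echo "$pn" | awk '{sum+=int(($2 - $3)/2)} END { print sum+0 }')
        trips=$(echo "$pn" | awk '{sum+=int(($2 - $3)/3)} END { print sum+0 }')
        quads=$(echo "$pn" | awk '{sum+=int(($2 - $3)/4)} END { print sum+0 }')
        echo "$avail,|,$singl,|,$pairs,$trips,$quads,|"
    done < <(echo "$p")
    echo "+,+,,+,,+,,,,+"
) | column -t -s',' | sed '/+/s/ /-/g'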
# List every node that has a Reason set, together with its state.
reasons=$(scontrol show node | grep Reason)
if [ -n "$reasons" ]; then
    echo
    echo "Cluster Degraded:"
    while read -r node ; do
        read -r state
        read -r reason
        node=$(echo "$node" | tr '=' ' ' | awk '{ print $2 }')
        # Strip only the leading key so reasons containing '=' stay intact.
        reason=$(echo "$reason" | sed 's/^Reason=//')
        state=$(echo "$state" | sed -n 's/.*State=\([A-Z+]*\).*/\1/p')
        echo " $node ($state) $reason"
    done < <(scontrol show node | \
             grep -e NodeName -e Reason -e State | \
             grep -B2 '^ *Reason' | \
             grep -v '^--$')   # drop grep's "--" separators between groups
    echo
fi
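
The script reads GPU counts out of the `CfgTRES` and `AllocTRES` lines with `sed`. Below is a minimal sketch of that step in isolation. The sample lines are made up to resemble `scontrol show node` output, and the helper name `gpu_count` is hypothetical; neither comes from a real cluster. The pattern used here accepts counts of more than one digit.

```shell
#!/bin/bash

# gpu_count (hypothetical helper): read one TRES line on stdin and print
# the number after "gpu=", or 0 when the line has no GPU entry.
gpu_count() {
    local n
    n=$(sed -n 's/.*gpu=\([0-9][0-9]*\).*/\1/p')
    echo "${n:-0}"
}

# Illustrative input lines, not captured from a real node.
echo 'CfgTRES=cpu=32,mem=128000M,billing=32,gres/gpu=16' | gpu_count   # 16
echo 'AllocTRES=cpu=4,mem=16G,gres/gpu=4' | gpu_count                  # 4
echo 'AllocTRES=' | gpu_count                                          # 0
```

An idle node reports an empty `AllocTRES=`, so the fallback to 0 matters for the subtraction that follows.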
+------------+---------+---------+--------------------------+
|            |  # GPU  |  Avail  |  Pairs  Triplets  Quads  |
+------------+---------+---------+--------------------------+
| general    |  27     |  7      |  2      1         0      |
| undergrad  |  2      |  2      |  1      0         0      |
+------------+---------+---------+--------------------------+
Cluster Degraded:
gambit (MIXED+DRAIN) needs reboot [root@2018-12-12T09:35:48]
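
The Pairs, Triplets and Quads columns are not the free-GPU total divided by 2, 3 or 4. For each node the script takes the free GPUs on that node, integer-divides by the group size, and then sums across nodes. This is why `general` above shows 7 free GPUs but only 2 pairs. A minimal sketch of that arithmetic, using made-up node data and a hypothetical helper named `free_groups`:

```shell
#!/bin/bash

# Hypothetical sample rows: node name, configured GPUs, allocated GPUs.
sample_nodes="nodeA 4 1
nodeB 4 3
nodeC 2 0"

# free_groups SIZE (hypothetical helper): read "name configured allocated"
# rows on stdin and print how many groups of SIZE free GPUs fit, counting
# each node separately.
free_groups() {
    awk -v size="$1" '{ sum += int(($2 - $3) / size) } END { print sum + 0 }'
}

echo "singles:  $(echo "$sample_nodes" | free_groups 1)"   # 6
echo "pairs:    $(echo "$sample_nodes" | free_groups 2)"   # 2
echo "triplets: $(echo "$sample_nodes" | free_groups 3)"   # 1
echo "quads:    $(echo "$sample_nodes" | free_groups 4)"   # 0
```

Here six GPUs are free in total, yet only two 2-GPU jobs can start, because the single free GPU on `nodeB` cannot be paired with one on another node.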