@lucidyan
Last active March 19, 2023 09:37
Prevent NVIDIA GPU throttling on a headless server

  • Unlock manual fan & overclock settings
    sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
  • Reboot system
  • Create script /usr/local/bin/gpu-fan-control.sh
#!/bin/bash

export DISPLAY=:0
export XAUTHORITY=/var/run/lightdm/root/:0

setFanSpeed() {
        # $1 = target fan speed in percent, $2 = GPU index (assumes fan N belongs to GPU N)
        nvidia-settings -a "[gpu:$2]/GPUFanControlState=1" -a "[fan:$2]/GPUTargetFanSpeed=$1" > /dev/null
        echo "Updating fan speed to $1 on GPU $2"
}

cleanup() {
        # Hand fan control back to the driver before exiting
        nvidia-settings -a "[gpu:0]/GPUFanControlState=0"
        nvidia-settings -a "[gpu:1]/GPUFanControlState=0"
        exit
}

declare -i gpuTemp

# Restore automatic fan control and exit on SIGHUP, SIGINT, SIGQUIT, SIGTERM, or SIGTSTP
trap cleanup HUP INT QUIT TERM TSTP

checkGpu(){
        #echo "Checking GPU $1"
        gpuTemp=$(nvidia-settings -q gpucoretemp | grep '^  Attribute' | grep "gpu:$1" | \
                head -n 1 | perl -pe 's/^.*?(\d+)\.\s*$/$1/;')
        echo "Current GPU $1 temperature: $gpuTemp"

        # Set GPU fan speed
        if   [ $gpuTemp -ge 80 ]; then
                setFanSpeed 100 $1
        elif [ $gpuTemp -ge 75 ]; then
                setFanSpeed 90 $1
        elif [ $gpuTemp -ge 70 ]; then
                setFanSpeed 75 $1
        elif [ $gpuTemp -ge 65 ]; then
                setFanSpeed 60 $1
        elif [ $gpuTemp -ge 60 ]; then
                setFanSpeed 50 $1
        else
                setFanSpeed 40 $1
        fi

}

while : # Loop
do
        checkGpu 0
        checkGpu 1
        #checkGpu 2
        #checkGpu 3
        # Interval
        sleep 5
done
  • Make our script executable
    chmod 744 /usr/local/bin/gpu-fan-control.sh

  • Create file /etc/systemd/system/gpu-fan-control.service

[Unit]
Description=Prevent GPU throttling under load

[Service]
ExecStart=/usr/local/bin/gpu-fan-control.sh

[Install]
WantedBy=multi-user.target
  • Set proper permissions on the service file
    chmod 664 /etc/systemd/system/gpu-fan-control.service

  • Activate our service (now it will run on startup)
    systemctl enable /etc/systemd/system/gpu-fan-control.service

  • Start the service to check it (while a GPU stays below 60 °C, its fan should then run at 40% speed)
    systemctl start gpu-fan-control.service
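
The temperature lookup in checkGpu relies on a perl regex that keeps only the trailing integer of the matching Attribute line. A minimal sketch of the same extraction without perl is shown here; the function name parse_gpu_temp and the sample line (including the host name myhost) are assumptions, modeled on the format that the script's regex expects rather than on captured output.

```shell
#!/bin/bash
# Extract the trailing integer from a line such as
#   "  Attribute 'GPUCoreTemp' (myhost:0[gpu:0]): 45."
# The sample format is an assumption inferred from the regex in the script.
parse_gpu_temp() {
        local line=$1 temp
        # Print the digits that directly precede the final "." and nothing else;
        # sed -n ... p prints only when the pattern matched.
        temp=$(printf '%s\n' "$line" |
                sed -n 's/^.*[^0-9]\([0-9]\{1,\}\)\.[[:space:]]*$/\1/p')
        # Fail if no temperature was found
        [ -n "$temp" ] || return 1
        echo "$temp"
}

sample="  Attribute 'GPUCoreTemp' (myhost:0[gpu:0]): 45."
parse_gpu_temp "$sample"
```

Returning a non-zero status on an unparsable line lets a caller skip the fan update rather than act on an empty value.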

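The fan curve inside checkGpu can be checked without a GPU by isolating it as a pure function. The sketch below copies the thresholds from the script; the function name fan_speed_for_temp is hypothetical and is not part of the original script.

```shell
#!/bin/bash
# Map a temperature in degrees Celsius to a fan speed in percent,
# using the same thresholds as checkGpu in the script above.
fan_speed_for_temp() {
        local temp=$1
        if   [ "$temp" -ge 80 ]; then echo 100
        elif [ "$temp" -ge 75 ]; then echo 90
        elif [ "$temp" -ge 70 ]; then echo 75
        elif [ "$temp" -ge 65 ]; then echo 60
        elif [ "$temp" -ge 60 ]; then echo 50
        else                          echo 40
        fi
}

# Print the fan speed chosen for a few sample temperatures
for t in 45 60 72 85; do
        echo "$t C -> $(fan_speed_for_temp "$t")%"
done
```

Keeping the mapping separate from the nvidia-settings call makes it easy to adjust the thresholds and try them out before touching the running service.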
Randl commented Dec 25, 2018

Notes:

  1. To find the value for XAUTHORITY, run ps a | grep X and look at the -auth argument of the X server process (see https://devtalk.nvidia.com/default/topic/1032741/linux/tuning-nvidia-settings-over-ssh-error/ )
  2. Current Ubuntu uses Wayland for which this doesn't work. The solution is to force gdm to use X (see https://askubuntu.com/questions/967955/ubuntu-17-10-on-wayland-how-can-i-install-the-nvidia-drivers )
  3. Note that a GPU can have more than one fan; check with nvidia-settings -q fans. With two fans per GPU:
setFanSpeed() {
        # Assumes two fans per GPU, numbered consecutively: GPU n drives fans 2n and 2n+1
        local f1=$(( $2 * 2 ))
        local f2=$(( $2 * 2 + 1 ))
        nvidia-settings -a "[gpu:$2]/GPUFanControlState=1" -a "[fan:$f1]/GPUTargetFanSpeed=$1" > /dev/null
        nvidia-settings -a "[gpu:$2]/GPUFanControlState=1" -a "[fan:$f2]/GPUTargetFanSpeed=$1" > /dev/null
        echo "Updating fans ($f1, $f2) speed to $1 on GPU $2"
}
  4. For me the service worked only when run as root, i.e., after adding User=root under [Service]
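
The two-fan mapping in note 3 generalizes to any fixed number of fans per GPU. This is only a sketch: the helper name fan_indices is hypothetical, and it assumes that every GPU has the same number of fans and that fans are numbered consecutively in GPU order, which should be confirmed with nvidia-settings -q fans.

```shell
#!/bin/bash
# Print the fan indices that belong to one GPU, one per line.
# $1 = GPU index, $2 = number of fans per GPU (assumed equal for all GPUs)
fan_indices() {
        local gpu=$1 fans_per_gpu=$2 i=0
        while [ "$i" -lt "$fans_per_gpu" ]; do
                echo $(( gpu * fans_per_gpu + i ))
                i=$(( i + 1 ))
        done
}

# Fans of GPU 1 when each GPU has two fans
fan_indices 1 2
```

Inside setFanSpeed, the output could drive a loop such as for f in $(fan_indices "$2" 2), issuing one GPUTargetFanSpeed assignment per fan.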
