Drain mode = the GPU no longer accepts new clients; used while preparing a GPU for removal or power-down
lspci -k | grep -A 2 -E '(3D|VGA)'
# example output:
# 00:08.0 VGA compatible controller: NVIDIA Corporation GR666GL [GeForce GX 666] (rev a0)
# Kernel driver in use: nvidia
# Kernel modules: nvidiafb, nouveau, nvidia
# set persistence mode off
nvidia-smi --id 0000:xx:00.0 --persistence-mode 0
# turn drain mode on (--modify 1) or off (--modify 0)
nvidia-smi drain --pciid 0000:xx:00.0 --modify 1
# nvidia-smi drain --pciid 0000:xx:00.0 --modify 0
# set persistence mode on
nvidia-smi --persistence-mode 1
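The drain sequence above could be wrapped in a small helper. This is only a sketch under assumptions: the names `run`, `drain_gpu`, `undrain_gpu` and the `DRY_RUN` variable are made up for illustration, and the PCI ID is whatever `lspci` reported for the card. The extra `--query` call reads back the drain state.

```shell
# Sketch: take one GPU out of service and bring it back.
# Assumption: DRY_RUN=1 only prints the commands instead of running them.

run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

drain_gpu() {
  # $1 = PCI ID, e.g. the 0000:xx:00.0 value used above
  run nvidia-smi --id "$1" --persistence-mode 0   # persistence mode off first
  run nvidia-smi drain --pciid "$1" --modify 1    # stop accepting new clients
  run nvidia-smi drain --pciid "$1" --query       # read back the drain state
}

undrain_gpu() {
  # $1 = PCI ID
  run nvidia-smi drain --pciid "$1" --modify 0    # accept clients again
  run nvidia-smi --id "$1" --persistence-mode 1   # persistence mode back on
}
```

`DRY_RUN=1 drain_gpu 0000:xx:00.0` prints the three commands without touching the GPU; without `DRY_RUN` the calls typically need root.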
Set power limit, target temperature and fan speed
# display detailed info
nvidia-smi -q
# enable persistence mode = keep GPU driver loaded
nvidia-smi -pm 1
# Limit power usage
sudo nvidia-smi -i 0 -pl 300 # set 300W limit on gpu 0
sudo nvidia-smi -i 1 -gtt 78 # set target temperature (78 °C) on gpu 1
# Set fan speed
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" # set control flag
nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=100" # 100% fan speed
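Before applying a power limit it may help to check the requested value against the range the card reports. A minimal sketch follows; the helper name `power_limit_ok` is hypothetical, and the min/max values are assumed to come from the `--query-gpu` call shown in the comment.

```shell
# Sketch: check a requested power limit against the range reported by the GPU.
# The range can be read with:
#   nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit --format=csv,noheader,nounits

power_limit_ok() {
  # $1 = requested watts, $2 = minimum watts, $3 = maximum watts (decimals allowed)
  # Returns 0 when the request lies inside [min, max], 1 otherwise.
  awk -v w="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(w >= lo && w <= hi) }'
}

# Usage (the 100.00 / 350.00 range is an example, not a real reading):
#   if power_limit_ok 300 100.00 350.00; then
#     sudo nvidia-smi -i 0 -pl 300
#   fi
```

The comparison is done in `awk` because the reported limits are decimal values, which plain shell arithmetic does not handle.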
- MIG (Multi-Instance GPU) = the GPU can be partitioned into smaller, isolated GPU instances. Each instance has its own dedicated compute cores, memory, and cache.
- ECC mode = Error Correcting Code mode. ECC detects and corrects errors in GPU memory while the GPU is running (typically single-bit errors are corrected and double-bit errors are detected).
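Both modes can be inspected and switched with `nvidia-smi`. The sketch below assumes a GPU that supports MIG and ECC (many consumer cards do not); the names `run`, `show_and_enable_mig_ecc` and `DRY_RUN` are made up for illustration. ECC changes usually take effect only after a reboot or GPU reset.

```shell
# Sketch: show current MIG/ECC state, then enable both on one GPU.
# Assumption: DRY_RUN=1 only prints the commands instead of running them.

run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

show_and_enable_mig_ecc() {
  # $1 = GPU index
  run nvidia-smi -i "$1" --query-gpu=mig.mode.current,ecc.mode.current --format=csv
  run sudo nvidia-smi -i "$1" -e 1        # enable ECC
  run sudo nvidia-smi -i "$1" -mig 1      # enable MIG mode
  run sudo nvidia-smi mig -i "$1" -lgip   # list available GPU instance profiles
}
```

`DRY_RUN=1 show_and_enable_mig_ecc 0` prints the four commands for review before anything is changed.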