The `balance.ps1` script is a PowerShell script that helps you find the maximum number of GPU layers you can use with KoboldCPP under a given set of conditions without causing an out-of-memory error.
Before running the script, make sure:
- You have permission to run PowerShell scripts.
- The `$command` variable (on line 20 of the script) exactly matches the way you normally launch KoboldCPP. Make sure it calls the same GPU API (CUDA or Vulkan) that you normally use, and that 4-bit cache quantization (`--quantkv 2`) is what you normally run with. Edit the command as needed.
- If you use Nvidia, make sure that "Sysmem Fallback Policy" in the Nvidia driver settings is set to "No Fallback", either globally or for KoboldCPP.exe. This is not the default, and leaving it unset will cause freezes when the script runs.
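For the first item, here is a minimal sketch of how script permissions can be checked and relaxed for the current user. The choice of `RemoteSigned` is an assumption about what suits your machine, and a policy set by an administrator may still override it.

```powershell
# Show the execution policy at every scope (MachinePolicy, UserPolicy, Process, CurrentUser, LocalMachine).
Get-ExecutionPolicy -List

# Assumed choice: let locally created scripts run for the current user only.
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned

# If balance.ps1 was downloaded, remove the "came from the internet" mark so RemoteSigned allows it.
Unblock-File -Path .\balance.ps1
```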
To use the script, follow these steps:
- Place the KoboldCPP executable in the same directory as the `balance.ps1` script.
- Keep the model file in any location you like; you will pass its path to the script.
- Open a PowerShell terminal.
- Navigate to the directory where the `balance.ps1` script is located.
- Run the script with the following command:
  `.\balance.ps1 <model_path> <contextSize>`
  Replace `<model_path>` with the path to the model file, and `<contextSize>` with the desired context size.
- The script will run and find the maximum number of GPU layers that can be used without causing an out-of-memory error.
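This README does not describe how `balance.ps1` searches, so the following is only an illustrative sketch of one common approach: a binary search over the layer count. It assumes that memory use grows with the number of layers and that 0 layers always fits. `Find-MaxGpuLayers` and `$TryLoad` are hypothetical names, and the stand-in check replaces a real KoboldCPP launch.

```powershell
# Hypothetical helper: binary search for the largest layer count that still loads.
# $TryLoad is a script block that returns $true when the given layer count fits in memory.
function Find-MaxGpuLayers {
    param(
        [int]$MaxLayers,
        [scriptblock]$TryLoad
    )
    $low = 0             # assumption: 0 GPU layers (CPU only) always fits
    $high = $MaxLayers
    while ($low -lt $high) {
        # Round up so the loop always makes progress when $high = $low + 1.
        $mid = [int][math]::Ceiling(($low + $high) / 2.0)
        if (& $TryLoad $mid) {
            $low = $mid          # fits: try more layers
        } else {
            $high = $mid - 1     # out of memory: try fewer layers
        }
    }
    return $low
}

# Stand-in for a real launch: pretend anything above 37 layers runs out of memory.
$fakeLoad = { param($layers) $layers -le 37 }
Find-MaxGpuLayers -MaxLayers 80 -TryLoad $fakeLoad
```

A binary search needs only about log2(N) launches instead of N, which matters because each real attempt means loading the model.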
If you need a license, it's MIT, aka the "I don't care" license.