@Barafu
Last active July 14, 2024 01:56
A PowerShell script to find the maximum --gpulayers value for a given model.
# Binary-search bounds for --gpulayers. $max is exclusive, so 99 is the highest value tested.
$min = 1
$max = 100
$exitCode = 0

if ($args.Length -lt 2) {
    Write-Host "Usage: .\balance.ps1 <model> <contextSize>"
    exit 1
}

$model = $args[0]
$contextSize = $args[1]

if (-not (Test-Path $model)) {
    Write-Host "Model file does not exist: $model"
    exit 1
}

while ($min -lt $max) {
    $mid = [Math]::Floor(($min + $max) / 2)
    # Edit this command to match how you normally launch KoboldCPP.
    # The single quotes keep a model path that contains spaces in one piece.
    $command = ".\koboldcpp.exe --model '$model' --usecublas --contextsize $contextSize --flashattention --quantkv 2 --benchmark --gpulayers $mid"
    Write-Host "Running command: $command"
    # Run the benchmark in a child PowerShell and wait for it to finish.
    $process = Start-Process -FilePath powershell.exe -ArgumentList "-Command", $command -PassThru -Wait
    $exitCode = $process.ExitCode
    Write-Host "Exit code: $exitCode"
    if ($exitCode -eq 0) {
        # The benchmark completed: try more layers.
        $min = $mid + 1
    } else {
        # The run failed (typically out of memory): try fewer layers.
        $max = $mid
    }
}

Write-Host "============================================"
Write-Host "Maximum possible value for --gpulayers: $($min - 1)"

Usage Manual for balance.ps1

Introduction

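balance.ps1 is a PowerShell script that finds the maximum number of GPU layers KoboldCPP can use under a given set of conditions (model, context size, and launch options) without running out of memory. It launches KoboldCPP in benchmark mode repeatedly, binary-searching the --gpulayers value between 1 and 99 and treating a non-zero exit code as a failed run.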
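The search itself can be tried out without a GPU. Below is a minimal sketch of the same binary search, in which Test-Fits and Find-MaxLayers are hypothetical names that are not part of balance.ps1, and the limit of 37 layers is an invented stand-in for a real KoboldCPP run succeeding or failing:

```powershell
# Hypothetical stand-in for a real KoboldCPP benchmark run:
# pretends that anything above $Limit layers runs out of VRAM.
function Test-Fits {
    param([int]$Layers, [int]$Limit)
    return $Layers -le $Limit
}

# Same search as balance.ps1: $Min is inclusive, $Max is exclusive.
function Find-MaxLayers {
    param([int]$Limit, [int]$Min = 1, [int]$Max = 100)
    $probes = 0
    while ($Min -lt $Max) {
        $mid = [int][Math]::Floor(($Min + $Max) / 2)
        $probes++
        if (Test-Fits -Layers $mid -Limit $Limit) {
            $Min = $mid + 1   # fits: try more layers
        } else {
            $Max = $mid       # fails: try fewer layers
        }
    }
    # $Min now points one past the last value that fit.
    [pscustomobject]@{ MaxLayers = $Min - 1; Probes = $probes }
}

Find-MaxLayers -Limit 37
```

With the simulated limit of 37 the sketch reports MaxLayers = 37. If every probe succeeds the result is capped at 99, and if every probe fails it is 0, which mirrors what the real script would print in those cases.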

Prerequisites

Before running the script, make sure:

  1. Your PowerShell execution policy allows running scripts.
  2. The $command line in the script matches exactly the way you launch KoboldCPP for normal use. Check that it calls the same GPU API (CUDA? Vulkan?) and that you normally run with the cache quantized to 4 bit (--quantkv 2). Edit the command as needed.
  3. If you use Nvidia, the driver setting "Sysmem Fallback Policy" (listed as "CUDA - Sysmem Fallback Policy" in the Nvidia Control Panel) is set to "Prefer No Sysmem Fallback", either globally or for KoboldCPP.exe. This is not the default, and leaving it unset will lead to freezes when running this script.
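For the first prerequisite, one way to check and relax the execution policy is sketched below. It assumes you are allowed to change the policy for your own session; the Process scope only affects the current PowerShell window and is discarded when it closes.

```powershell
# Show the execution policy configured for each scope.
Get-ExecutionPolicy -List

# Allow scripts in the current PowerShell session only.
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process
```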

Usage

To use the script, follow these steps:

  1. Place the KoboldCPP executable (koboldcpp.exe) in the same directory as the balance.ps1 script.
  2. Note the path to your model file. It can be stored anywhere.
  3. Open a PowerShell terminal.
  4. Navigate to the directory where the balance.ps1 script is located.
  5. Run the script with the following command:
    .\balance.ps1 <model_path> <contextSize>
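    Replace <model_path> with the path to the model file, and <contextSize> with the desired context size.
  6. The script launches KoboldCPP several times, printing each command and its exit code. When it finishes, it prints the highest value that ran without an out-of-memory error as "Maximum possible value for --gpulayers". The search only covers values up to 99.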
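As an illustration of step 5, a run might look like the following. The model path and the context size of 8192 are hypothetical placeholders, so substitute your own values:

```powershell
# Run from the directory that contains balance.ps1 and koboldcpp.exe.
# D:\models\example-model.gguf is a made-up path used for illustration only.
.\balance.ps1 D:\models\example-model.gguf 8192
```

The number printed on the last line is the value to pass to --gpulayers when you launch KoboldCPP with the same model, context size, and options.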

Barafu commented Jul 12, 2024

If you need a license, it's MIT, aka the "I don't care" license.