Koboldcpp is a hybrid LLM interface that uses llama.cpp + GGML to load models split between the CPU and GPU, or entirely on the GPU. It is typically used with OpenBLAS (CPU) or CLBlast, which relies on the OpenCL framework.

However, OpenCL can be slow, and those with GPUs would rather use their GPU's native framework. Nvidia provides its own BLAS library, CuBLAS, which runs on CUDA, and it can be used with koboldcpp! This guide focuses on Windows users with an Nvidia GPU that can run CUDA.
Please do not pester the koboldcpp developers for help! Sometimes the CMakeLists can go bad or things might break, but the devs are NOT responsible for CuBLAS issues, since they outright state that support is limited. The same rules apply to this guide. Build at your own peril.

The devs do publish CUDA builds, but they never gave a straightforward Windows guide for building from the latest release, since things can break and it's technically unsupported. I hope this guide fills that gap.
You will need:

- Microsoft Visual Studio 2022 - install the "Desktop development with C++" workload and make sure "C++ CMake tools for Windows" is checked
- Python 3.10 (Miniconda is recommended)
- The koboldcpp repo - `git clone` it and `cd` into it
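Before starting, you can sanity-check that the needed tools are on your PATH. This is an optional sketch, not part of koboldcpp (the helper name `find_missing_tools` is my own):

```python
import shutil

def find_missing_tools(tools=("git", "cmake", "python")) -> list[str]:
    """Return the names of the given tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = find_missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
```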
To build with the Visual Studio IDE:

- Open the koboldcpp repo in Visual Studio
- Wait for the bottom terminal to finish configuring the project
- Open `CMakeLists.txt` in the Solution Explorer
- Hit `Build > Build All` and wait for the build to finish
- Your DLL file is located in `out/build/x64-debug/bin/koboldcpp.dll`
Alternatively, to build from the command line:

- `cd` into koboldcpp's directory
- `cmake .`
- `cmake --build .`
- Your DLL is located in `bin/Debug/koboldcpp.dll`
IMPORTANT: Put `koboldcpp.dll` in koboldcpp's root directory.

Copy and paste these DLLs into koboldcpp's root directory (DO NOT CUT THESE!):

From CUDA (`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin`):

- `cublas64_11.dll`
- `cublasLt64_11.dll`
- `cudart64_110.dll`

From System32 (`C:\Windows\System32`):

- `vcruntime140.dll` (rename it to `VCRUNTIME140.dll`)
- `vcruntime140_1.dll`
- `msvcp140.dll`
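If you'd rather script the copying, here is a hedged sketch using Python's `shutil` (it copies, never moves; the helper name and structure are mine, and your source paths may differ from the defaults above):

```python
import shutil
from pathlib import Path

CUDA_DLLS = ("cublas64_11.dll", "cublasLt64_11.dll", "cudart64_110.dll")
# source name -> destination name (only vcruntime140.dll gets renamed)
SYS32_DLLS = {
    "vcruntime140.dll": "VCRUNTIME140.dll",
    "vcruntime140_1.dll": "vcruntime140_1.dll",
    "msvcp140.dll": "msvcp140.dll",
}

def copy_runtime_dlls(cuda_bin: Path, system32: Path, dest: Path) -> list[str]:
    """Copy (never move!) the runtime DLLs into dest; returns the names written."""
    written = []
    for name in CUDA_DLLS:
        shutil.copy2(cuda_bin / name, dest / name)
        written.append(name)
    for src_name, dest_name in SYS32_DLLS.items():
        shutil.copy2(system32 / src_name, dest / dest_name)
        written.append(dest_name)
    return written
```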
- Create a venv or conda environment and enter it
- Run `python koboldcpp.py`. If you have arguments, add them just like you would with the official binary.
- Select your normal settings and see if everything loads properly. If it does, you're using CuBLAS!
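Before launching, you can verify that everything the CUDA build needs is sitting next to `koboldcpp.py`. A minimal check (the list mirrors the DLLs above; the function name is mine):

```python
from pathlib import Path

# Everything that should be in koboldcpp's root directory for the CUDA build
REQUIRED = (
    "koboldcpp.dll",
    "cublas64_11.dll", "cublasLt64_11.dll", "cudart64_110.dll",
    "VCRUNTIME140.dll", "vcruntime140_1.dll", "msvcp140.dll",
)

def missing_files(root: Path) -> list[str]:
    """Return the required files that are absent from koboldcpp's root."""
    return [name for name in REQUIRED if not (root / name).is_file()]

print(missing_files(Path(".")) or "all DLLs present")
```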
To package everything into a standalone exe:

- Enter your existing venv or create a new one
- Install the `PyInstaller` package
- Run this command (taken from `make_old_pyinstaller_cuda.bat`):

```
PyInstaller --noconfirm --onefile --clean --console --icon "./niko.ico" --add-data "./klite.embd;." --add-data "./koboldcpp.dll;." --add-data "./cublas64_11.dll;." --add-data "./cublasLt64_11.dll;." --add-data "./cudart64_110.dll;." --add-data "./msvcp140.dll;." --add-data "./vcruntime140.dll;." --add-data "./vcruntime140_1.dll;." --add-data "./rwkv_vocab.embd;." --add-data "./rwkv_world_vocab.embd;." "./koboldcpp.py" -n "koboldcpp_CUDA_only.exe"
```

The packaged exe should be located in `dist`. Everything is now self-contained and can be run via only the exe.
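For reference, each `--add-data` argument is a `SRC;DEST` pair (PyInstaller uses `;` as the separator on Windows), which is why every entry above ends in `;.`: each file is bundled into the exe's root. A tiny illustration of the split (this parser is mine, not PyInstaller's):

```python
def split_add_data(spec: str, sep: str = ";") -> tuple[str, str]:
    """Split a PyInstaller --add-data value of the form 'SRC<sep>DEST'."""
    src, _, dest = spec.rpartition(sep)
    return src, dest

print(split_add_data("./koboldcpp.dll;."))  # ('./koboldcpp.dll', '.')
```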