@bdashore3
Last active June 23, 2023 22:19

Building KoboldCpp with CuBLAS on Windows

KoboldCpp is a hybrid LLM interface built on llama.cpp and GGML. It can split a model between the CPU and GPU, or load a model entirely onto the GPU. BLAS acceleration in this interface usually comes from OpenBLAS or from CLBlast, which runs on frameworks such as OpenCL.

However, OpenCL can be slow, and Nvidia GPU owners may prefer Nvidia's own CUDA-based BLAS library, CuBLAS, which also works with koboldcpp. This guide focuses on building koboldcpp with CuBLAS for CUDA-capable Nvidia GPUs on Windows.

Disclaimer

Please do not pester the koboldcpp developers for help! The CMake build can go bad or break from time to time, but the devs are NOT responsible for CuBLAS issues; they state outright that CuBLAS support is limited. The same rules apply to this guide. Build at your own peril.

Shouldn't koboldcpp have a guide?

Well, the devs publish CUDA builds, but they have never provided a straightforward Windows guide for building from the latest release, since things can break and it's technically unsupported. I hope this guide fills that gap.

Prerequisites

To follow the steps below, you will need:

  • Visual Studio 2022 with the Desktop development with C++ workload (this provides CMake and the x64 Native Tools command prompt)

  • The Nvidia CUDA Toolkit (v11.7 is assumed throughout this guide)

  • Python 3 (for running from source and for packaging)

Build the DLL

Using Visual Studio's GUI:

  1. Open the koboldcpp repo in Visual Studio

  2. Wait for the bottom terminal to finish configuring the project

  3. Open CMakeLists.txt in solution explorer

  4. Hit Build > Build All and wait for the build to finish

  5. Your DLL is located at out/build/x64-debug/bin/koboldcpp.dll

Using the command line (x64 Native Tools Command Prompt for VS 2022):

  1. cd into koboldcpp's directory

  2. cmake .

  3. cmake --build .

  4. Your DLL is located at bin/Debug/koboldcpp.dll

IMPORTANT: Put koboldcpp.dll in koboldcpp's root directory.
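
As a sketch, the command-line steps plus the copy into the root directory can be automated from Python. This assumes cmake is on your PATH and that you launch the script from the same x64 Native Tools prompt; the helper name is illustrative, not part of koboldcpp:

```python
import shutil
import subprocess
from pathlib import Path

def build_and_place_dll(repo_dir, dry_run=False):
    """Run the two cmake steps above, then copy the DLL to the repo root."""
    repo = Path(repo_dir)
    commands = [["cmake", "."], ["cmake", "--build", "."]]
    built_dll = repo / "bin" / "Debug" / "koboldcpp.dll"
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, cwd=repo, check=True)
        # Put koboldcpp.dll in koboldcpp's root directory, as noted above.
        shutil.copy2(built_dll, repo / "koboldcpp.dll")
    return commands, built_dll
```

Pass dry_run=True to inspect the commands and output path without building.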

DLL Dependencies

Copy and paste these DLLs into koboldcpp's root directory (DO NOT CUT THESE!)

From CUDA (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin):

  • cublas64_11.dll

  • cublasLt64_11.dll

  • cudart64_110.dll

From System32 (C:\Windows\System32):

  • vcruntime140.dll (rename to VCRUNTIME140.dll)

  • vcruntime140_1.dll

  • msvcp140.dll
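
The copy step above can be sketched in Python. The CUDA path below matches a default v11.7 install and may differ on your machine; shutil.copy2 copies rather than cuts, so the originals stay in place:

```python
import shutil
from pathlib import Path

# Default install locations; adjust CUDA_BIN for your toolkit version.
CUDA_BIN = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin")
SYSTEM32 = Path(r"C:\Windows\System32")

CUDA_DLLS = ["cublas64_11.dll", "cublasLt64_11.dll", "cudart64_110.dll"]
CRT_DLLS = ["vcruntime140.dll", "vcruntime140_1.dll", "msvcp140.dll"]

def dll_copy_plan(koboldcpp_root):
    """Return (source, destination) pairs for every DLL listed above."""
    root = Path(koboldcpp_root)
    plan = [(CUDA_BIN / name, root / name) for name in CUDA_DLLS]
    plan += [(SYSTEM32 / name, root / name) for name in CRT_DLLS]
    return plan

def copy_dlls(koboldcpp_root):
    for src, dst in dll_copy_plan(koboldcpp_root):
        shutil.copy2(src, dst)  # copies; the originals stay where they are
    # Remember the note above: rename vcruntime140.dll to VCRUNTIME140.dll.
```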

Testing/Running from source

  1. Create a venv or conda environment and enter it

  2. Run python koboldcpp.py. If you have arguments, add them just like you would with the official binary.

  3. Select your normal settings and see if everything loads properly. If it does, you're now running with CuBLAS!
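
The run step can also be wrapped in a small hypothetical helper. The function name and the example --port argument below are illustrative, not part of koboldcpp:

```python
import subprocess
import sys

def run_koboldcpp(extra_args=(), dry_run=False):
    """Run koboldcpp from source, forwarding arguments as with the binary."""
    # sys.executable is the Python interpreter of your active venv/conda env.
    argv = [sys.executable, "koboldcpp.py", *extra_args]
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

For example, run_koboldcpp(["--port", "5001"]) mirrors typing python koboldcpp.py --port 5001 inside the environment.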

Packaging (optional)

  1. Enter your existing venv or create a new one

  2. Install the PyInstaller package

  3. Run the command (from make_old_pyinstaller_cuda.bat):

```shell
PyInstaller --noconfirm --onefile --clean --console --icon "./niko.ico" --add-data "./klite.embd;." --add-data "./koboldcpp.dll;." --add-data "./cublas64_11.dll;." --add-data "./cublasLt64_11.dll;." --add-data "./cudart64_110.dll;." --add-data "./msvcp140.dll;." --add-data "./vcruntime140.dll;." --add-data "./vcruntime140_1.dll;." --add-data "./rwkv_vocab.embd;." --add-data "./rwkv_world_vocab.embd;." "./koboldcpp.py" -n "koboldcpp_CUDA_only.exe"
```

The packaged exe should be located in the dist folder. Everything is now self-contained and can be run via the exe alone.
