Koboldcpp is a hybrid LLM interface that uses llama.cpp + GGML to load models split between the CPU and GPU, or entirely on the GPU. It is typically used with OpenBLAS (CPU) or CLBlast, which relies on the OpenCL framework.

However, OpenCL can be slow, and those with GPUs would rather use their GPU's native framework. Nvidia provides its own BLAS library, CuBLAS, which runs on CUDA, and it can be used with koboldcpp! This guide focuses on Windows users with an Nvidia GPU that can run CUDA.
Please do not pester the koboldcpp developers for help! Sometimes the CMakeLists can go bad or things might break, but the devs are NOT responsible for CuBLAS issues, since they outright state that support is limited. The same rules apply to this guide. Build at your own peril.

The devs do publish CUDA builds, but they never gave a straightforward Windows guide for building from the latest release, since things can break and it's technically unsupported. I hope this guide fills that gap.
You will need:

- Microsoft Visual Studio 2022 - install the "Desktop development with C++" workload and make sure "C++ CMake tools for Windows" is checked
- Python 3.10 (Miniconda is recommended)
- The koboldcpp repo - `git clone` it and `cd` into it
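Before starting, you can sanity-check that the needed tools are on your PATH. This is an optional sketch, not part of koboldcpp (the helper name `find_missing_tools` is my own):

```python
import shutil

def find_missing_tools(tools=("git", "cmake", "python")) -> list[str]:
    """Return the names of the given tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = find_missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
```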
To build with the Visual Studio IDE:

- Open the koboldcpp repo in Visual Studio
- Wait for the bottom terminal to finish configuring the project
- Open `CMakeLists.txt` in the Solution Explorer
- Hit `Build > Build All` and wait for the build to finish
- Your DLL file is located in `out/build/x64-debug/bin/koboldcpp.dll`
Alternatively, to build from the command line:

- `cd` into koboldcpp's directory
- `cmake .`
- `cmake --build .`
- Your DLL is located in `bin/Debug/koboldcpp.dll`
IMPORTANT: Put `koboldcpp.dll` in koboldcpp's root directory.

Copy and paste these DLLs into koboldcpp's root directory (DO NOT CUT THESE!):

From CUDA (`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin`):

- `cublas64_11.dll`
- `cublasLt64_11.dll`
- `cudart64_110.dll`

From System32 (`C:\Windows\System32`):

- `vcruntime140.dll` (rename it to `VCRUNTIME140.dll`)
- `vcruntime140_1.dll`
- `msvcp140.dll`
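If you'd rather script the copying, here is a hedged sketch using Python's `shutil` (it copies, never moves; the helper name and structure are mine, and your source paths may differ from the defaults above):

```python
import shutil
from pathlib import Path

CUDA_DLLS = ("cublas64_11.dll", "cublasLt64_11.dll", "cudart64_110.dll")
# source name -> destination name (only vcruntime140.dll gets renamed)
SYS32_DLLS = {
    "vcruntime140.dll": "VCRUNTIME140.dll",
    "vcruntime140_1.dll": "vcruntime140_1.dll",
    "msvcp140.dll": "msvcp140.dll",
}

def copy_runtime_dlls(cuda_bin: Path, system32: Path, dest: Path) -> list[str]:
    """Copy (never move!) the runtime DLLs into dest; returns the names written."""
    written = []
    for name in CUDA_DLLS:
        shutil.copy2(cuda_bin / name, dest / name)
        written.append(name)
    for src_name, dest_name in SYS32_DLLS.items():
        shutil.copy2(system32 / src_name, dest / dest_name)
        written.append(dest_name)
    return written
```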
- Create a venv or conda environment and enter it
- Run `python koboldcpp.py`. If you have arguments, add them just like you would with the official binary.
- Select your normal settings and see if everything loads properly. If it does, you're using CuBLAS!
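Before launching, you can verify that everything the CUDA build needs is sitting next to `koboldcpp.py`. A minimal check (the list mirrors the DLLs above; the function name is mine):

```python
from pathlib import Path

# Everything that should be in koboldcpp's root directory for the CUDA build
REQUIRED = (
    "koboldcpp.dll",
    "cublas64_11.dll", "cublasLt64_11.dll", "cudart64_110.dll",
    "VCRUNTIME140.dll", "vcruntime140_1.dll", "msvcp140.dll",
)

def missing_files(root: Path) -> list[str]:
    """Return the required files that are absent from koboldcpp's root."""
    return [name for name in REQUIRED if not (root / name).is_file()]

print(missing_files(Path(".")) or "all DLLs present")
```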
To package everything into a standalone exe:

- Enter your existing venv or create a new one
- Install the `PyInstaller` package
- Run this command (taken from `make_old_pyinstaller_cuda.bat`):

```
PyInstaller --noconfirm --onefile --clean --console --icon "./niko.ico" --add-data "./klite.embd;." --add-data "./koboldcpp.dll;." --add-data "./cublas64_11.dll;." --add-data "./cublasLt64_11.dll;." --add-data "./cudart64_110.dll;." --add-data "./msvcp140.dll;." --add-data "./vcruntime140.dll;." --add-data "./vcruntime140_1.dll;." --add-data "./rwkv_vocab.embd;." --add-data "./rwkv_world_vocab.embd;." "./koboldcpp.py" -n "koboldcpp_CUDA_only.exe"
```

The packaged exe should be located in `dist`. Everything is now self-contained and can be run via only the exe.
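For reference, each `--add-data` argument is a `SRC;DEST` pair (PyInstaller uses `;` as the separator on Windows), which is why every entry above ends in `;.`: each file is bundled into the exe's root. A tiny illustration of the split (this parser is mine, not PyInstaller's):

```python
def split_add_data(spec: str, sep: str = ";") -> tuple[str, str]:
    """Split a PyInstaller --add-data value of the form 'SRC<sep>DEST'."""
    src, _, dest = spec.rpartition(sep)
    return src, dest

print(split_add_data("./koboldcpp.dll;."))  # ('./koboldcpp.dll', '.')
```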