Llama-cpp-python Installation procedure
# HW
0. NVIDIA RTX 4090 (driver 551.86, the latest or close to it), 13th-gen Intel CPU, 64 GB DDR5 RAM, Win11
Command prompt CLI
NVIDIA-SMI 551.86 Driver Version: 551.86 CUDA Version: 12.4
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
| 0 NVIDIA GeForce RTX 4090 WDDM | 00000000:01:00.0 On | Off |
| 0% 51C P8 5W / 450W | 1028MiB / 24564MiB | 1% Default |
| | | N/A |
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
| 0 N/A N/A 1048 C+G ...m Files\Mozilla Firefox\firefox.exe N/A |
| 0 N/A N/A 1208 C+G ...Data\Local\Programs\Opera\opera.exe N/A |
| 0 N/A N/A 1832 C ...les\LibreOffice\program\soffice.bin N/A |
| 0 N/A N/A 1852 C+G ...__8wekyb3d8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 2020 C+G ...__8wekyb3d8bbwe\Notepad\Notepad.exe N/A |
| 0 N/A N/A 3520 C+G ...siveControlPanel\SystemSettings.exe N/A |
| 0 N/A N/A 4344 C+G ...5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 N/A N/A 8048 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 10320 C+G ...nt.CBS_cw5n1h2txyewy\SearchHost.exe N/A |
| 0 N/A N/A 10348 C+G ...2txyewy\StartMenuExperienceHost.exe N/A |
| 0 N/A N/A 11556 C+G ...US\ArmouryDevice\asus_framework.exe N/A |
| 0 N/A N/A 12644 C+G ...t.LockApp_cw5n1h2txyewy\LockApp.exe N/A |
| 0 N/A N/A 15200 C+G ...ekyb3d8bbwe\PhoneExperienceHost.exe N/A |
| 0 N/A N/A 18456 C+G ...GeForce Experience\NVIDIA Share.exe N/A |
| 0 N/A N/A 18468 C+G C:\Program Files\Joplin\Joplin.exe N/A |
# SW
1. ==Visual Studio (Updated)==
A. ==Visual Studio Build Tools 2022==
* Desktop Development with C++
* Included
+ C++ Build Tools core features
+ C++ 2022 Redistributable Update
+ C++ core desktop features
* Optional
+ MSVC v143 - VS 2022 C++ x64/x86 build t..
+ Win11 SDK (10.0.22621.0)
+ C++ CMake tools for Windows
+ Testing tools core features - Build Tools
+ C++ AddressSanitizer
+ C++/CLI support for v143 build tools (Latest)
* Individual components
+ .NET Framework 4.8 SDK
+ .NET Framework 4.7.2 targeting pack
+ C++/CLI support for v143 build tools (Latest)
+ Container development tools
+ MSBuild support for LLVM (clang-cl) toolset
+ Node.js MSBuild support
+ MSVC v142 - VS 2019 C++ x64/x86 build tools
B. ==Visual Studio Community 2019==
* Visual Studio core editor
* Python development
+ Python language support
+ Python web support
+ Live share
* Node.js development
+ Node.js development tools
+ JavaScript & TypeScript language support
+ JavaScript diagnostics
+ Web deploy
+ Live share
+ IntelliCode
+ Connectivity and publishing tools
+ Developer analytics tools
+ C++ core features
+ MSVC v142 - VS 2019 C++ x64/x86 build t...
* Desktop Development with C++
+ C++ core desktop system
+ MSVC v142 - VS 2019 C++ x64/x86 build t...
+ Windows 10 SDK
+ Just-in-time debugger
+ C++ profiling tools for Windows
+ C++ CMake tools for Windows
+ C++ ATL for latest v142 build tools (x86/..)
+ Test Adapter for Boost Test
+ Test Adapter for Google Test
+ Live Share
+ IntelliCode
+ MSVC v142 - VS 2019 C++ ARM64 build t..
+ JavaScript diagnostics
+ Windows 11 SDK (10.0.22000.0)
+ MSVC v141 - VS 2017 C++ x64/x86 build tools
* Universal Windows Platform development
+ Blend for Visual Studio
+ .NET Native & .NET Standard
+ NuGet package manager
+ Universal Windows Platform tools
+ Windows 10 SDK
+ IntelliCode
+ .NET SDK (out of support)
+ C++ (v142) Universal Windows platform to..
+ C++ (v141) Universal Windows platform to..
+ Graphics debugger
+ Windows 11 SDK
* Linux development with C++
+ C++ core features
+ C++ for Linux Development
+ C++ CMake tools for Linux
+ IntelliCode
* Individual components
+ TypeScript 4.3 SDK
+ JavaScript and TypeScript language support
+ Connectivity and publishing tools
+ Web Deploy
+ Python language support
+ Python web support
2. ==Git (latest)==
3. ==Anaconda (updated)==
(I install separate Python versions in conda virtual environments, usually 3.11)
`conda update -n base -c defaults conda`
`echo %errorlevel%`
5. CMake, MSYS2 MINGW64, w64devkit-1.21.0 - all in the "C:\xAppz\" folder. I tried them, but I don't know how to use any of them.
6. Node.js - in C:\xAppz\ folder
7. Windows11 Environment variables - ==System / PATH==
The system PATH includes entries for CUDA (bin, libnvvp, include, lib\x64), cuDNN, Visual Studio 2019...
* C:\xAppz\Anaconda3\condabin;
* C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;
* C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;
* C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include;
* C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64;
* C:\xAppz\cuDNN\bin;
* C:\xAppz\cuDNN\include;
* C:\xAppz\cuDNN\lib\x64;
* C:\Program Files\dotnet;
* C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64;
* C:\xAppz\Git\cmd;
* C:\xAppz\Git\Git LFS;
* C:\xAppz\nodejs;
* C:\Users\user\miniconda3;
* C:\Users\user\miniconda3\Library\mingw-w64\bin;
* C:\Users\user\miniconda3\Library\usr\bin;
* C:\Users\user\miniconda3\Library\bin;
* C:\Users\user\miniconda3\Scripts;
* C:\xAppz\GnuWin32\bin;
* C:\xAppz\gs\gs10.02.1\bin;
* C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler;
* ....other
8. Windows11 Environment variables - ==User / PATH==
* C:\xAppz\Cmake\bin
* C:\xAppz\mysys64\mingw64\bin
* C:\xAppz\Microsoft VS Code\bin
* C:\xAppz\miniconda3
* C:\xAppz\miniconda3\Library\mingw-w64\bin
* C:\xAppz\miniconda3\Library\usr\bin
* C:\xAppz\miniconda3\Library\bin
* C:\xAppz\miniconda3\Scripts\
* %USERPROFILE%\.dotnet\tools
* ...other
Note: I definitely installed miniconda3 in C:\xAppz\miniconda3 (User PATH); I assume Pinokio installed the miniconda3 that appears on the System PATH.
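With two miniconda3 installs, Anaconda, and other tools on PATH, it can help to see every directory that provides a given executable, in search order. The following is a minimal sketch of such a lookup; the helper name `find_all_on_path` is hypothetical, and it only inspects the PATH string it is given (it does not consider `PATHEXT` or the current directory, which `where` does).

```python
import os
from pathlib import Path


def find_all_on_path(exe_name, path_value, sep=os.pathsep):
    """Return every PATH directory that contains exe_name, in search order.

    Hypothetical helper for illustration: duplicates are reported once,
    and empty entries are skipped.
    """
    hits = []
    seen = set()
    for raw in path_value.split(sep):
        entry = raw.strip().strip('"')  # tolerate stray spaces and quotes
        if not entry:
            continue
        key = os.path.normcase(os.path.normpath(entry))
        if key in seen:  # the same directory listed twice on PATH
            continue
        seen.add(key)
        if (Path(entry) / exe_name).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # The first directory printed is the one the shell would pick.
    for directory in find_all_on_path("python.exe", os.environ.get("PATH", "")):
        print(directory)
```

Running it inside the activated environment shows whether the environment's own directory comes first or whether another install shadows it.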
# Llama-cpp-python installation
Everything is installed through the Anaconda prompt; I use VS Code only after installation, to write scripts.
9. conda prompt - create vEnv
conda deactivate
# conda config --set restore_free_channel_defaults true # optional
conda update conda
conda --no-plugins env list
# conda environments:
cu118 C:\path_to_anaconda\envs\cu118
base C:\xAppz\Anaconda3
* D:\LLM\vLlamaCppPython
conda create --prefix D:\LLM\vLlamaCppPython python=3.11 -y
conda activate D:\LLM\vLlamaCppPython
echo %errorlevel% # 0
pip check # No broken requirements found.
d: && cd D:\LLM\vLlamaCppPython
10. Installed CUDA 12.1 successfully - CUDA_HOME and PATH variables are OK
pip install --upgrade setuptools pip wheel
pip install nvidia-pyindex
echo %CUDA_HOME%
conda --no-plugins env config vars set CUDA_HOME="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" -p "D:\LLM\vLlamaCppPython"
conda deactivate
conda activate D:\LLM\vLlamaCppPython
echo %CUDA_HOME% # C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
nvidia-smi && nvcc --version && where nvcc
(D:\LLM\vLlamaCppPython) D:\LLM\vLlamaCppPython>nvcc --version && where nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe
Echo %Path% # PATH is listed, but I set it again - just in case
set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;%PATH%
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip check
# torch 2.3.0 requires fsspec, which is not installed.
# torch 2.3.0 requires mkl, which is not installed.
pip install mkl==2021.4.0 fsspec
pip check # No broken requirements found.
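The `CUDA Version` field in the nvidia-smi header is the highest CUDA version the installed driver supports, not the installed toolkit, so the toolkit reported by nvcc should be at or below it. The sketch below compares the two using the strings shown above; the function names are hypothetical and the regular expressions assume the output formats seen in this post.

```python
import re


def parse_version(text, pattern):
    """Extract a dotted version matched by group 1 of pattern as an int tuple."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_cuda_version(smi_header):
    # Highest CUDA version the driver supports, from the nvidia-smi header.
    return parse_version(smi_header, r"CUDA Version:\s*(\d+\.\d+)")


def toolkit_cuda_version(nvcc_output):
    # Installed toolkit release, from `nvcc --version`.
    return parse_version(nvcc_output, r"release\s+(\d+\.\d+)")


# Strings copied from the outputs in this post.
SMI_HEADER = "NVIDIA-SMI 551.86 Driver Version: 551.86 CUDA Version: 12.4"
NVCC_OUTPUT = "Cuda compilation tools, release 12.1, V12.1.66"

driver = driver_cuda_version(SMI_HEADER)
toolkit = toolkit_cuda_version(NVCC_OUTPUT)
print(driver, toolkit, toolkit <= driver)  # (12, 4) (12, 1) True
```

Here toolkit 12.1 is within what driver 551.86 supports, so the driver/toolkit pairing itself does not look like the problem.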
## python CLI
`where python && python --version`
(D:\LLM\vLlamaCppPython) D:\LLM\vLlamaCppPython>where python && python --version
C:\Program Files\LibreOffice\program\python.exe
Python 3.11.9
import torch
torch.cuda.is_available() # True
torch.cuda.device_count() # 1
torch.cuda.device(0) # <torch.cuda.device object at 0x000002C3092A1210>
torch.cuda.get_device_name(0) # 'NVIDIA GeForce RTX 4090'
print(torch.__version__) # 2.3.0
print(torch.tensor([1.0, 2.0]).cuda()) # tensor([1., 2.], device='cuda:0')
x = torch.rand(5, 3)
print(x) # it works
...And here starts the challenge
PyPI - llama-cpp-python 0.2.69
* pip install llama-cpp-python
ggerganov llama-cpp repo
* Windows (via CMake)
* Make sure to have the CUDA toolkit installed. - CUDA Toolkit has been previously installed - v12.1
* The repo lists the latest release - " (413MB)" - I don't know what to do with this.
* echo %CUDA_HOME% - gives the same response as before
* echo %PATH% - gives the same response as before
* nvcc --version - gives the same response as before
echo %CMAKE_ARGS% # returns -DLLAMA_CUBLAS=on
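The CMake log further down warns that `LLAMA_CUBLAS` is deprecated in favour of `LLAMA_CUDA`. As a minimal sketch, the value of `CMAKE_ARGS` can be checked for that flag before starting a build. The helper names and the `DEPRECATED` table are hypothetical, and `shlex.split` is only an approximation of how a build backend might tokenize the variable.

```python
import os
import shlex

# Hypothetical lookup table: deprecated define -> replacement,
# taken from the CMake warning in the build log.
DEPRECATED = {"LLAMA_CUBLAS": "LLAMA_CUDA"}


def parse_cmake_defines(cmake_args):
    """Collect -DNAME=VALUE tokens from a CMAKE_ARGS-style string."""
    defines = {}
    for token in shlex.split(cmake_args):
        if token.startswith("-D") and "=" in token:
            name, value = token[2:].split("=", 1)
            defines[name] = value
    return defines


def deprecated_flags(defines):
    """Map each deprecated define that is set to its suggested replacement."""
    return {name: DEPRECATED[name] for name in defines if name in DEPRECATED}


if __name__ == "__main__":
    current = parse_cmake_defines(os.environ.get("CMAKE_ARGS", ""))
    for old, new in deprecated_flags(current).items():
        print(f"{old} is deprecated, consider {new}")
```

The deprecation is only a warning in the log, so on its own it does not explain the configuration failure.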
I also installed scikit-build-core, but it didn't make any difference.
pip install scikit-build-core
All versions break at CMake...
* Version A.
pip install llama-cpp-python --verbose --force-reinstall --no-cache-dir --upgrade
* Version B.
pip install llama-cpp-python --verbose --force-reinstall --upgrade
* Version C.
pip install llama-cpp-python --verbose --force-reinstall
* Version D.
pip install llama-cpp-python
* Version E.
pip install llama-cpp-python==0.2.69 --verbose --force-reinstall
set CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_CUDA=on" && set FORCE_CMAKE=1 && pip install --verbose --force-reinstall --no-cache-dir --upgrade llama-cpp-python
* Version F.
pip install llama_cpp_python_cuda_tensorcores-0.2.65+cu121-cp311-cp311-win_amd64.whl
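For the prebuilt wheel in Version F, the filename itself states what it targets: wheel names follow `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below splits the filename so the tags can be compared with the environment (Python 3.11 on 64-bit Windows, CUDA 12.1). The helper names are hypothetical, and treating the `+cu121` local version as a CUDA marker is an assumption about this wheel's naming.

```python
import sys


def split_wheel_filename(filename):
    """Split a wheel filename into its name, version, and tag components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:  # optional build tag between version and python tag
        name, version, _build, py_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    public, _, local = version.partition("+")
    return {
        "name": name,
        "version": public,
        "local": local,  # e.g. "cu121"; empty if there is no local part
        "python_tag": py_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def expected_python_tag(version_info=sys.version_info):
    """CPython tag for an interpreter version, e.g. cp311 for 3.11."""
    return f"cp{version_info[0]}{version_info[1]}"


info = split_wheel_filename(
    "llama_cpp_python_cuda_tensorcores-0.2.65+cu121-cp311-cp311-win_amd64.whl"
)
print(info["python_tag"], info["platform_tag"], info["local"])
```

If `info["python_tag"]` differs from `expected_python_tag()` in the active environment, pip rejects the wheel as unsupported on this platform.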
(D:\LLM\vLlamaCppPython) D:\LLM\vLlamaCppPython>pip install llama-cpp-python --verbose --force-reinstall --no-cache-dir --upgrade
Using pip 24.0 from D:\LLM\vLlamaCppPython\Lib\site-packages\pip (python 3.11)
Looking in indexes:,
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.69.tar.gz (42.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.5/42.5 MB 38.4 MB/s eta 0:00:00
Running command pip subprocess to install build dependencies
Looking in indexes:,,
Collecting scikit-build-core>=0.9.2 (from scikit-build-core[pyproject]>=0.9.2)
Downloading scikit_build_core-0.9.3-py3-none-any.whl.metadata (19 kB)
Collecting packaging>=21.3 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting pathspec>=0.10.1 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading pathspec-0.12.1-py3-none-any.whl.metadata (21 kB)
Downloading scikit_build_core-0.9.3-py3-none-any.whl (151 kB)
---------------------------------------- 151.6/151.6 kB 8.8 MB/s eta 0:00:00
Downloading packaging-24.0-py3-none-any.whl (53 kB)
---------------------------------------- 53.5/53.5 kB ? eta 0:00:00
Downloading pathspec-0.12.1-py3-none-any.whl (31 kB)
Installing collected packages: pathspec, packaging, scikit-build-core
Successfully installed packaging-24.0 pathspec-0.12.1 scikit-build-core-0.9.3
Installing build dependencies ... done
Running command Getting requirements to build wheel
Getting requirements to build wheel ... done
Running command pip subprocess to install backend dependencies
Looking in indexes:,,
Collecting cmake>=3.21
Downloading cmake-3.29.2-py3-none-win_amd64.whl.metadata (6.1 kB)
Downloading cmake-3.29.2-py3-none-win_amd64.whl (36.2 MB)
---------------------------------------- 36.2/36.2 MB 36.4 MB/s eta 0:00:00
Installing collected packages: cmake
Successfully installed cmake-3.29.2
Installing backend dependencies ... done
Running command Preparing metadata (pyproject.toml)
*** scikit-build-core 0.9.3 using CMake 3.29.2 (metadata_wheel)
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
Obtaining dependency information for typing-extensions>=4.5.0 from
Downloading typing_extensions-4.11.0-py3-none-any.whl.metadata (3.0 kB)
Link requires a different Python (3.11.9 not in: '>=3.7,<3.11'): (from (requires-python:>=3.7,<3.11)
Link requires a different Python (3.11.9 not in: '>=3.7,<3.11'): (from (requires-python:>=3.7,<3.11)
Link requires a different Python (3.11.9 not in: '>=3.7,<3.11'): (from (requires-python:>=3.7,<3.11)
Link requires a different Python (3.11.9 not in: '>=3.7,<3.11'): (from (requires-python:>=3.7,<3.11)
Link requires a different Python (3.11.9 not in: '>=3.7,<3.11'): (from (requires-python:>=3.7,<3.11)
Collecting numpy>=1.20.0 (from llama-cpp-python)
Obtaining dependency information for numpy>=1.20.0 from
Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 3.2 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python)
Obtaining dependency information for diskcache>=5.6.1 from
Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python)
Obtaining dependency information for jinja2>=2.11.3 from
Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama-cpp-python)
Obtaining dependency information for MarkupSafe>=2.0 from
Downloading MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl.metadata (3.1 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB ? eta 0:00:00
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB ? eta 0:00:00
Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.8/15.8 MB 38.4 MB/s eta 0:00:00
Downloading typing_extensions-4.11.0-py3-none-any.whl (34 kB)
Downloading MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl (17 kB)
Building wheels for collected packages: llama-cpp-python
Running command Building wheel for llama-cpp-python (pyproject.toml)
*** scikit-build-core 0.9.3 using CMake 3.29.2 (wheel)
*** Configuring CMake...
2024-05-06 22:22:03,200 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
loading initial cache file C:\Users\user\AppData\Local\Temp\tmp3299cb5m\build\CMakeInit.txt
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.39.33523.0
-- The CXX compiler identification is MSVC 19.39.33523.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/xAppz/Git/cmd/git.exe (found version "")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
CMake Warning at vendor/llama.cpp/CMakeLists.txt:387 (message):
LLAMA_CUBLAS is deprecated and will be removed in the future.
Use LLAMA_CUDA instead
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/include (found version "12.1.66")
-- CUDA found
CMake Error at C:/Users/user/AppData/Local/Temp/pip-build-env-k68w0ekk/normal/Lib/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCompilerId.cmake:563 (message):
No CUDA toolset found.
Call Stack (most recent call first):
C:/Users/user/AppData/Local/Temp/pip-build-env-k68w0ekk/normal/Lib/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
C:/Users/user/AppData/Local/Temp/pip-build-env-k68w0ekk/normal/Lib/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
C:/Users/user/AppData/Local/Temp/pip-build-env-k68w0ekk/normal/Lib/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCUDACompiler.cmake:131 (CMAKE_DETERMINE_COMPILER_ID)
vendor/llama.cpp/CMakeLists.txt:398 (enable_language)
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: 'D:\LLM\vLlamaCppPython\python.exe' 'D:\LLM\vLlamaCppPython\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\' build_wheel 'C:\Users\user\AppData\Local\Temp\tmpmoos4jo4'
cwd: C:\Users\user\AppData\Local\Temp\pip-install-d01agesv\llama-cpp-python_e46cc363b4ac47929d1f8cb8dbfc97e6
Building wheel for llama-cpp-python (pyproject.toml) ... error
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
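The log shows CMake finding the CUDA Toolkit but then failing with "No CUDA toolset found" under the "Visual Studio 17 2022" generator. One commonly reported cause of this message is that the CUDA MSBuild integration files are missing from the `BuildCustomizations` folder of the Visual Studio instance CMake selected (here, Build Tools 2022). The sketch below only checks for that condition. Both directory paths are assumptions based on default install locations and the paths in the log, the helper names are hypothetical, and this may not be the cause on this machine.

```python
from pathlib import Path

# Assumed default locations; adjust to the actual install.
CUDA_INTEGRATION_DIR = (
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"
    r"\extras\visual_studio_integration\MSBuildExtensions"
)
VS_BUILD_CUSTOMIZATIONS_DIR = (
    r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools"
    r"\MSBuild\Microsoft\VC\v170\BuildCustomizations"
)


def cuda_msbuild_files(directory):
    """List CUDA .props files in a directory (empty if it does not exist)."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("CUDA *.props"))


def missing_integration(source_dir, target_dir):
    """Names shipped with the toolkit that are absent from the VS folder."""
    present = set(cuda_msbuild_files(target_dir))
    return [name for name in cuda_msbuild_files(source_dir) if name not in present]


if __name__ == "__main__":
    missing = missing_integration(CUDA_INTEGRATION_DIR, VS_BUILD_CUSTOMIZATIONS_DIR)
    if missing:
        print("Not found in the Visual Studio BuildCustomizations folder:", missing)
    else:
        print("No missing CUDA .props files detected (or toolkit folder not found).")
```

If files are reported missing, the usual suggestions are to rerun the CUDA installer with its Visual Studio integration component selected, or to copy the contents of the toolkit's `MSBuildExtensions` folder into `BuildCustomizations`.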
