llama.cpp version: https://github.com/ggerganov/llama.cpp/commit/925e5584a058afb612f9c20bc472c130f5d0f891
LLM: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q4_K_M.gguf
Command: `llama-bench -m ../models/llama-2-7b-chat.Q4_K_M.gguf`
| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |