Follow this guide https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md
Download a quantized model in GGUF format. Pick a recommended quant, e.g. codellama-7b.Q5_K_M.gguf (Q5_K_M is a good quality/size trade-off)
Check how many GPU cores you have https://www.reddit.com/r/macbook/comments/o3k9a1/comment/h2c9jmu/
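On Apple Silicon you can also query the GPU core count locally instead of looking it up. A sketch (macOS-only; assumes `system_profiler SPDisplaysDataType` prints a "Total Number of Cores" line in its Graphics/Displays section, which recent macOS versions do):

```python
import re
import subprocess

def parse_gpu_cores(report: str):
    # system_profiler prints a line like "Total Number of Cores: 10"
    # for the GPU on Apple Silicon Macs.
    match = re.search(r"Total Number of Cores:\s*(\d+)", report)
    return int(match.group(1)) if match else None

def detect_gpu_cores():
    # macOS only; shells out to system_profiler (not run at import time).
    report = subprocess.run(
        ["system_profiler", "SPDisplaysDataType"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_cores(report)
```

Call `detect_gpu_cores()` on the Mac itself; it returns `None` if the label isn't found.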
Run the server ($MODEL is the path to the downloaded .gguf file; --n_gpu_layers sets how many layers are offloaded to the Metal GPU, and -1 offloads all of them)
python -m llama_cpp.server --model $MODEL --n_gpu_layers 14
View the interactive API docs at http://localhost:8000/docs
Chat (write the request body to chat.json; finish the cat input with Ctrl-D)
>>> cat > chat.json
{
  "prompt": "USER: Tell me something interesting.\nASSISTANT:",
  "stop": ["USER:"]
}
>>> curl -X POST -H "Content-Type: application/json" -d @chat.json http://localhost:8000/v1/completions
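The same request can be sent from Python with just the standard library. A minimal sketch, assuming the server above is running on localhost:8000 and returns OpenAI-style completion JSON (a "choices" list whose items carry a "text" field):

```python
import json
import urllib.request

SERVER = "http://localhost:8000/v1/completions"  # default llama_cpp.server address

def build_payload(user_message: str) -> dict:
    # Mirror chat.json: a single-turn prompt plus a stop sequence so the
    # model does not keep generating past the next "USER:" turn.
    return {
        "prompt": f"USER: {user_message}\nASSISTANT:",
        "stop": ["USER:"],
    }

def complete(user_message: str) -> str:
    req = urllib.request.Request(
        SERVER,
        data=json.dumps(build_payload(user_message)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# Usage (requires the server to be up):
# print(complete("Tell me something interesting."))
```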