
@agalea91
Last active November 15, 2023 18:23
Install LLaMA 2 with GPU support on Apple Silicon

Follow the llama-cpp-python macOS install guide: https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md
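The core of that guide is building llama-cpp-python with the Metal backend enabled; a minimal sketch (assumes a working Python environment, and that you want the bundled HTTP server):

```shell
# Build llama-cpp-python with Metal (Apple GPU) support.
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

# The [server] extra pulls in the dependencies for the HTTP server
# used in the "Run the server" step below.
pip install 'llama-cpp-python[server]'
```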

Download a GGUF-quantized model. Pick a recommended quantization, e.g. codellama-7b.Q5_K_M.gguf

Check how many GPU cores you have https://www.reddit.com/r/macbook/comments/o3k9a1/comment/h2c9jmu/?utm_source=share&utm_medium=web2x&context=3
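You can also query the GPU core count directly from the terminal; a sketch using `system_profiler` (macOS only, guarded so it is a no-op elsewhere):

```shell
# macOS only: list the GPU spec; the core count appears as a line like
# "Total Number of Cores: 10" under the Chipset Model entry.
if [ "$(uname)" = "Darwin" ]; then
  system_profiler SPDisplaysDataType | grep "Total Number of Cores"
fi
```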

Run the server

python -m llama_cpp.server --model $MODEL --n_gpu_layers 14
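`$MODEL` above is assumed to already be set to the path of the GGUF file you downloaded; for example (the path below is illustrative, adjust it to wherever you saved the model):

```shell
# Point MODEL at the GGUF file downloaded earlier (example path).
export MODEL="$HOME/models/codellama-7b.Q5_K_M.gguf"
```

`--n_gpu_layers` controls how many model layers are offloaded to the GPU; pass `-1` to offload all of them if memory allows.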

View the interactive API docs at http://localhost:8000/docs

Chat

>>> cat > chat.json
{
  "prompt": "USER: Tell me something interesting.\nASSISTANT:",
  "stop": ["USER:"]
}

>>> curl -X POST -H "Content-Type: application/json" -d @chat.json http://localhost:8000/v1/completions
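The same request can be made from Python with only the standard library; a sketch mirroring chat.json above (the actual network call is commented out since it assumes the server from the earlier step is running on localhost:8000):

```python
import json
import urllib.request

# Same body as chat.json above.
payload = {
    "prompt": "USER: Tell me something interesting.\nASSISTANT:",
    "stop": ["USER:"],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Requires the server from the "Run the server" step to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```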