Follow the llama-cpp-python macOS install guide: https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md
Download a GGUF quantized model. Pick a recommended one, e.g. codellama-7b.Q5_K_M.gguf
Check how many GPU cores your Mac has: https://www.reddit.com/r/macbook/comments/o3k9a1/comment/h2c9jmu/?utm_source=share&utm_medium=web2x&context=3
Run the server (the llama_cpp.server module), passing it the path to the downloaded model
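
The last two steps can be sketched in Python. This is a rough sketch under assumptions, not the guide's own method: it assumes the GPU core count shows up as a "Total Number of Cores" line in the output of system_profiler SPDisplaysDataType (as it does on Apple Silicon Macs), and that the server is started with the --model and --n_gpu_layers options. The helper names are made up for illustration. These notes do not say how the core count should map to --n_gpu_layers, so the value is left to the caller.

```python
import re
import subprocess
import sys


def parse_gpu_core_count(profiler_output):
    """Return the GPU core count found in system_profiler text, or None.

    Assumption: the output contains a line like "Total Number of Cores: 8".
    """
    match = re.search(r"Total Number of Cores:\s*(\d+)", profiler_output)
    return int(match.group(1)) if match else None


def read_gpu_core_count():
    """Run system_profiler (macOS only) and parse the GPU core count."""
    result = subprocess.run(
        ["system_profiler", "SPDisplaysDataType"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_core_count(result.stdout)


def build_server_command(model_path, n_gpu_layers=1):
    """Build the argument list that starts the llama_cpp.server module.

    n_gpu_layers is chosen by the caller; 1 is only a placeholder default.
    """
    return [
        sys.executable, "-m", "llama_cpp.server",
        "--model", model_path,
        "--n_gpu_layers", str(n_gpu_layers),
    ]


def run_server(model_path, n_gpu_layers=1):
    """Start the server and block until it exits."""
    subprocess.run(build_server_command(model_path, n_gpu_layers), check=True)
```

On a Mac, calling read_gpu_core_count() and then run_server("codellama-7b.Q5_K_M.gguf") from the folder holding the model would cover both steps; adjust the path to wherever the file was saved.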