Running fine-tuned Mistral 7B (Instruct v0.1, Q6_K GGUF) on a MacBook Air M2 (~7 tok/sec)
# Instead of calling llama.cpp directly, drive it through the llama-cpp-python bindings.
#
# Run:
# python3 -m venv venv
# source venv/bin/activate
# CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python
#
from llama_cpp import Llama
if __name__ == "__main__":
    model = "mistral-7b-instruct-v0.1.Q6_K.gguf"
    llm = Llama(
        model_path=model,
        n_ctx=8192,        # context window in tokens
        n_batch=512,
        n_threads=7,
        n_gpu_layers=2,    # layers offloaded to Metal
        verbose=True,
        seed=1337,
    )
    system = """
You are a Python coder. You just display the code, no need to explain it.
"""
    user = """
Create a Python script to calculate the Fibonacci sequence for any given number.
"""
    # Mistral Instruct v0.1 expects the whole instruction inside a single
    # <s>[INST] ... [/INST] block; the system text is prepended to the user text.
    message = f"<s>[INST] {system}\n{user} [/INST]"
    output = llm(message, echo=True, stream=False, max_tokens=4096)
    print(output['usage'])  # prompt/completion/total token counts
    output = output['choices'][0]['text']
    print(output)
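
The usage dict only reports token counts; to verify the ~7 tok/sec figure, you can time the call yourself. A minimal sketch (the timing code below is an addition, not part of the original gist; it reuses the llm and message defined above):

import time

start = time.time()
result = llm(message, stream=False, max_tokens=256)
elapsed = time.time() - start

# 'completion_tokens' counts only newly generated tokens, not the prompt
generated = result['usage']['completion_tokens']
print(f"{generated / elapsed:.1f} tok/sec")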
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q6_K.gguf
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DCMAKE_APPLE_SILICON_PROCESSOR=arm64
make -j
cd ..
./build/bin/main --color --model "../mistral-7b-instruct-v0.1.Q6_K.gguf" -t 7 -b 24 -n -1 --temp 0 -ngl 1 -ins
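
For a more systematic throughput measurement of the native build, llama.cpp also ships a llama-bench binary; a sketch assuming it was built alongside main in the build above:

# Benchmark prompt processing and generation speed with Metal offload
./build/bin/llama-bench -m ../mistral-7b-instruct-v0.1.Q6_K.gguf -t 7 -ngl 1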