- Maybe read this: https://www.understandingai.org/p/large-language-models-explained-with
- Browse this guy's profile and understand that this site provides a way for users to share language models: https://huggingface.co/TheBloke
- Use Linux, preferably Debian or Ubuntu
- Install build-essential, git, and make (apt needs root, hence sudo):
sudo apt install build-essential make git
- If a build later complains about something missing, you may need to install other build dependencies as well
- Create a project folder:
mkdir llama-project
- Navigate into the llama-project folder:
cd llama-project
- From inside the llama-project folder, create a model folder:
mkdir models
- From inside the llama-project folder, clone llama.cpp:
git clone https://github.com/ggerganov/llama.cpp.git
- We need two copies of llama.cpp: the current version for newer models, and an older version that runs older models more easily. So, rename the llama.cpp folder you just cloned to llama.cpp-old:
mv llama.cpp llama.cpp-old
- From inside the llama-project folder, clone llama.cpp again:
git clone https://github.com/ggerganov/llama.cpp.git
- Navigate into the llama.cpp folder:
cd llama.cpp
- From inside the llama.cpp folder, take a moment to notice what this command tells you about the software branch. No need to overthink, just keep this in mind:
git status
- From inside the llama.cpp folder, take a moment to notice the dates and comments when you run this command. Again, don't overthink, just keep it in mind:
git log
(press q to exit the log)
- Anyway, from inside the llama.cpp folder, build the llama.cpp program:
make
- You may get errors. If so, search the internet for the error message, or ask ChatGPT.
- Navigate back up to the llama-project folder and then down into the llama.cpp-old folder:
cd ../llama.cpp-old
- From inside the llama.cpp-old folder, check out an older commit of llama.cpp onto a new branch. The weird hash comes from some documentation I found online, and it references a specific moment in time when llama.cpp ran older models easily:
git checkout -b my-dadbed99e65252d79f81101a392d0d6497b86caa dadbed99e65252d79f81101a392d0d6497b86caa
- From inside the llama.cpp-old folder, take a moment to notice the output of git status. What changed?
git status
- From inside the llama.cpp-old folder, do the same with git log. Again, what changed? Notice the dates:
git log
(press q to exit the log)
- We can now build the older version of llama.cpp:
make
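If either make above stops with a "command not found" style error, the likely cause is a missing build tool from the install step. A minimal sketch of a check you could run first is below. The function name check_tools is made up for this example, and the tool list is my assumption of what the build needs (gcc and g++ come with build-essential):

```shell
# Hypothetical helper: report any named command that is not on the PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool list for this guide; adjust it to whatever make complains about.
check_tools git make gcc g++ || echo "Install the missing tools before running make."
```

This only confirms the commands exist. It will not catch missing libraries or headers, so a failed build may still need a search on the exact error message.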
Let's get one old model (GGML) and one new model (GGUF)
- Get an older-style model (GGML) from the page below. The "Provided files" section describes each file, so read it and think about the one you want. The actual download happens on the "Files and versions" tab: click on the file you want (it will end in .bin) and then click the download button: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
- Place the model in the models folder, and copy its file name to use in the next step.
- Inside the llama.cpp-old folder, run the model. Make sure to put your model name in the right spot. Also tweak the prompt as you see fit. Also, read up on the model's page about how to run it in interactive mode:
./main -t 10 -ngl 32 -m ../models/PUT_THE_MODEL_NAME_HERE --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 --in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -p "[INST] <<SYS>> You are a helpful, respectful and honest assistant. <</SYS>> Write a story about llamas. [/INST]"
- Get a newer model (GGUF) like this one, and place it in the models folder: https://huggingface.co/TheBloke/CodeLlama-13B-GGUF
- Inside the llama.cpp folder, you can run it with this command. Note that I took these instructions from the model page linked in the previous step, and that page also explains how to make it interactive:
./main -t 10 -ngl 32 -m ../models/YOUR_MODEL_HERE --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
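The two run commands above differ mainly in which build they are launched from. As a minimal sketch, assuming the file extension is a reliable hint (.bin for the older GGML files, .gguf for the newer ones, which is a naming convention and not something llama.cpp enforces), a small helper can pick the folder for you. The function name pick_llama_dir and the file names are made up for this example:

```shell
# Hypothetical helper: print the llama.cpp folder to use for a model file.
# Assumes .gguf means a newer model and .bin means an older GGML model.
pick_llama_dir() {
    case "$1" in
        *.gguf) echo "llama.cpp" ;;
        *.bin)  echo "llama.cpp-old" ;;
        *)      echo "unknown model type: $1" >&2; return 1 ;;
    esac
}

# Placeholder file names, not real downloads.
pick_llama_dir some-new-model.gguf
pick_llama_dir some-old-model.bin
```

From the llama-project folder, something like cd "$(pick_llama_dir YOUR_MODEL_HERE)" would then drop you into the matching build before you run ./main.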