This guide assumes a fairly powerful system: ideally an NVIDIA RTX 3090 or 4090 GPU and 32 GB of RAM. You have to spend money to make money, as they say.
- Install and run the Transmission Qt client
- Press [CTRL] + [U] and paste the magnet link: magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA
- Select a folder to download into
- To download only the LLaMA 7B and 13B models:
  - Select the LLaMA download and press [ALT] + [ENTER]
  - Click on the Files tab
  - Check the checkboxes for 7B and 13B
- Install Python 3 from the Microsoft Store
- Open Windows Terminal
- Install FastChat:
  pip3 install fschat
- Convert the downloaded LLaMA weights into the Hugging Face format:
  python3 -m transformers.models.llama.convert_llama_weights_to_hf --input_dir $HOME/Downloads/LLaMA --model_size 13B --output_dir $HOME/Downloads/LLaMA_13b_hf
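If you want a quick sanity check that the conversion produced something usable before moving on, a small script like the following can help. This is only an illustrative sketch: the helper name `looks_like_hf_model_dir` is hypothetical, and the file patterns are assumptions about a typical Hugging Face export (a `config.json` plus one or more weight shards).

```python
from pathlib import Path

def looks_like_hf_model_dir(model_dir):
    """Rough sanity check that a directory resembles a Hugging Face
    model export: a config.json plus at least one weight file.
    (Hypothetical helper; file patterns are assumptions, not a spec.)"""
    d = Path(model_dir)
    if not (d / "config.json").exists():
        return False
    # Converted checkpoints are typically sharded .bin or .safetensors files.
    weights = list(d.glob("*.bin")) + list(d.glob("*.safetensors"))
    return len(weights) > 0
```

For example, `looks_like_hf_model_dir("C:/Users/you/Downloads/LLaMA_13b_hf")` should return True after a successful conversion.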
- Apply the Vicuna delta weights on top of the converted base model. This step uses approximately 60 GB of memory; if you do not have that much, you can let Windows page to disk automatically, or add the --low-cpu-mem option to split the work into 16 GB pieces:
  python3 -m fastchat.model.apply_delta --base-model-path $HOME/Downloads/LLaMA_13b_hf --target-model-path $HOME/Downloads/LLaMA_hfvicuna-13b --delta-path lmsys/vicuna-13b-delta-v1.1 --low-cpu-mem
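The delta step deserves a word of explanation: Vicuna is published as the element-wise difference between its fine-tuned weights and the original LLaMA weights, so "applying the delta" just means adding the two tensor sets back together. A toy sketch of the idea, using plain Python lists in place of the PyTorch state dicts the real script operates on:

```python
def apply_delta(base_weights, delta_weights):
    """Toy illustration of delta-weight recovery: for every named tensor,
    the target model is simply base + delta, element-wise.
    (Conceptual sketch only; the real fastchat script works on
    PyTorch state dicts and handles sharding/memory limits.)"""
    assert base_weights.keys() == delta_weights.keys()
    return {
        name: [b + d for b, d in zip(base_weights[name], delta_weights[name])]
        for name in base_weights
    }
```

The --low-cpu-mem option exists because holding both full tensor sets in RAM at once is expensive; splitting the addition into pieces bounds peak memory at the cost of extra disk I/O.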
- Start the FastChat CLI (add the --load-8bit flag for GPUs with less than 24 GB of memory):
  python3 -m fastchat.serve.cli --model-path $HOME/Downloads/LLaMA_hfvicuna-13b --load-8bit
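To see why --load-8bit shrinks the memory footprint, here is a toy sketch of the underlying idea: store each weight as a signed 8-bit integer plus a shared scale factor, instead of a 16- or 32-bit float. (This is a deliberately simplified illustration; the actual 8-bit loading path uses a more sophisticated quantization scheme.)

```python
def quantize_8bit(values):
    """Map floats to int8 values plus one shared scale factor.
    (Simplified sketch of the memory-saving idea behind 8-bit loading;
    not the algorithm FastChat actually uses.)"""
    peak = max(abs(v) for v in values)
    scale = peak / 127.0 if peak else 1.0
    return [round(v / scale) for v in values], scale

def dequantize_8bit(quantized, scale):
    """Recover approximate floats from the int8 representation."""
    return [q * scale for q in quantized]
```

Each weight now costs one byte instead of two or four, which is why a 13B model can squeeze into a GPU with less than 24 GB of memory, at the price of a small loss in precision.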