Install Vicuna (Windows)

This guide assumes you have a pretty beefy system. You probably want an NVIDIA 3090 or 4090 GPU and 32 GB of RAM. You have to spend money to make money, as they say.

  1. Install and run the Transmission Qt client

  2. Press [CTRL] + [U] and paste the magnet link: magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

    1. Select a folder to download into
    2. To download only LLaMA 7B and 13B (optional):
    3. Select the LLaMA download and press [ALT] + [ENTER]
    4. Click on the Files tab
    5. Check the checkboxes for 7B and 13B
  3. Install Python 3 from the Microsoft Store

  4. Open Windows Terminal

  5. pip3 install fschat

  6. Convert the downloaded LLaMA weights into the Hugging Face format (an optional sanity-check sketch follows this list)

    1. python3 -m transformers.models.llama.convert_llama_weights_to_hf --input_dir $HOME/Downloads/LLaMA --model_size 13B --output_dir $HOME/Downloads/LLaMA_13b_hf
      
  7. Apply the Vicuna delta weights to the converted LLaMA model. This step will try to use approximately 60 GB of memory. If you do not have that much, you can rely on Windows to page automatically, or add the --low-cpu-mem option to split the work into 16 GB pieces.

    1. python3 -m fastchat.model.apply_delta --base-model-path $HOME/Downloads/LLaMA_13b_hf --target-model-path $HOME/Downloads/LLaMA_hfvicuna-13b --delta-path lmsys/vicuna-13b-delta-v1.1 --low-cpu-mem
      
  8. Start the FastChat CLI (add --load-8bit for GPUs with less than 24 GB of memory; a plain transformers alternative is sketched after this list)

    1. python3 -m fastchat.serve.cli --model-path $HOME/Downloads/LLaMA_hfvicuna-13b --load-8bit
      
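Before applying the delta, you can confirm that the conversion in step 6 produced a usable checkpoint by loading it with plain transformers. This is a minimal sketch of my own, not part of the original steps; it assumes the output directory from step 6 and loads the model on the CPU in half precision (roughly 26 GB of RAM for the 13B model).

```python
# Sanity check for step 6: load the converted 13B weights with transformers.
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = os.path.expanduser("~/Downloads/LLaMA_13b_hf")  # output_dir from step 6

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,   # half precision: ~26 GB for the 13B model
    low_cpu_mem_usage=True,      # stream weights instead of materializing them twice
)
print(model.config.num_hidden_layers, "layers loaded")
```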
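If you would rather script against the finished model than chat through the FastChat CLI, the merged weights from step 7 are a standard Hugging Face checkpoint. The sketch below is an assumption about one way to load it with transformers and generate a single reply: the USER:/ASSISTANT: prompt template follows the Vicuna v1.1 convention, and device_map="auto" requires the accelerate package plus roughly 26 GB of GPU memory in fp16 (fall back to the CLI's --load-8bit on smaller cards).

```python
# Alternative to step 8: load the merged Vicuna-13B weights and generate one reply.
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = os.path.expanduser("~/Downloads/LLaMA_hfvicuna-13b")  # target path from step 7

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,
    device_map="auto",  # place layers on the GPU; needs `pip3 install accelerate`
)

# Vicuna v1.1-style prompt (adjust if your delta version expects a different format).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "USER: What is the tallest mountain on Earth? ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Print only the newly generated tokens, not the echoed prompt.
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```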