Install Vicuna (Windows)

This guide assumes you have a pretty beefy system. You probably want an NVIDIA 3090 or 4090 GPU and 32 GB of RAM. You have to spend money to make money, as they say.

  1. Install and run the Transmission Qt client

  2. Press [CTRL] + [U] and paste the magnet link: magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

    1. Select a folder to download into
    2. To download only LLaMA 7B and 13B (optional):
    3. Select the LLaMA download and press [ALT] + [ENTER]
    4. Click on the Files tab
    5. Check the checkboxes for 7B and 13B
  3. Install Python 3 from the Microsoft Store

  4. Open Windows Terminal

  5. pip3 install fschat

  6. Convert the downloaded LLaMA weights into the Hugging Face format (an optional sanity-check sketch follows this list)

    1. python3 -m transformers.models.llama.convert_llama_weights_to_hf --input_dir $HOME/Downloads/LLaMA --model_size 13B --output_dir $HOME/Downloads/LLaMA_13b_hf
      
  7. Apply the Vicuna delta weights to the converted LLaMA model. This step will try to use approximately 60 GB of memory. If you do not have that much, you can rely on Windows to page automatically, or add the --low-cpu-mem option to split the work into 16 GB pieces.

    1. python3 -m fastchat.model.apply_delta --base-model-path $HOME/Downloads/LLaMA_13b_hf --target-model-path $HOME/Downloads/LLaMA_hfvicuna-13b --delta-path lmsys/vicuna-13b-delta-v1.1 --low-cpu-mem
      
  8. Start the FastChat CLI (add --load-8bit for GPUs with less than 24 GB of memory; a plain transformers alternative is sketched after this list)

    1. python3 -m fastchat.serve.cli --model-path $HOME/Downloads/LLaMA_hfvicuna-13b --load-8bit
      
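Before applying the delta, you can confirm that the conversion in step 6 produced a usable checkpoint by loading it with plain transformers. This is a minimal sketch of my own, not part of the original steps; it assumes the output directory from step 6 and loads the model on the CPU in half precision (roughly 26 GB of RAM for the 13B model).

```python
# Sanity check for step 6: load the converted 13B weights with transformers.
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = os.path.expanduser("~/Downloads/LLaMA_13b_hf")  # output_dir from step 6

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,   # half precision: ~26 GB for the 13B model
    low_cpu_mem_usage=True,      # stream weights instead of materializing them twice
)
print(model.config.num_hidden_layers, "layers loaded")
```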
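If you would rather script against the finished model than chat through the FastChat CLI, the merged weights from step 7 are a standard Hugging Face checkpoint. The sketch below is an assumption about one way to load it with transformers and generate a single reply: the USER:/ASSISTANT: prompt template follows the Vicuna v1.1 convention, and device_map="auto" requires the accelerate package plus roughly 26 GB of GPU memory in fp16 (fall back to the CLI's --load-8bit on smaller cards).

```python
# Alternative to step 8: load the merged Vicuna-13B weights and generate one reply.
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = os.path.expanduser("~/Downloads/LLaMA_hfvicuna-13b")  # target path from step 7

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,
    device_map="auto",  # place layers on the GPU; needs `pip3 install accelerate`
)

# Vicuna v1.1-style prompt (adjust if your delta version expects a different format).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "USER: What is the tallest mountain on Earth? ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Print only the newly generated tokens, not the echoed prompt.
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```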