Convert downloaded LLaMA 2 weights to Hugging Face format so that you can load them with huggingface/transformers from_pretrained.

Download and convert the LLaMA 2 7B model to Hugging Face format

This is a refined version of the Hugging Face LLaMA 2 usage tips.

Go to wherever you want to put the original LLaMA 2 7B model weights, for example your downloads folder, and create a folder called LLaMA:

cd ~/downloads  # or wherever you keep downloads
mkdir LLaMA
cd LLaMA

Clone the model weights; they take about 15 GB of disk space:

git lfs clone https://huggingface.co/meta-llama/Llama-2-7b
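
Note that the meta-llama repositories on the Hub are gated, so the clone only works with an account that has been granted access. If git lfs gives you trouble, one alternative (a minimal sketch, assuming huggingface_hub is installed and your account has accepted the LLaMA 2 license) is to download the files from Python instead:

# sketch: download the gated repo with huggingface_hub instead of git lfs
from huggingface_hub import login, snapshot_download

login()  # paste a Hugging Face access token from an account with LLaMA 2 access
snapshot_download(
    repo_id="meta-llama/Llama-2-7b",
    local_dir="Llama-2-7b",  # downloads into ./Llama-2-7b, matching the git clone layout
)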

Once the clone finishes, rename the folder to 7B and copy tokenizer.model out of the 7B folder into the LLaMA folder:

mv Llama-2-7b 7B
cp ./7B/tokenizer.model ./
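
Before converting, it is worth checking that the LLaMA folder now has the layout the converter expects: tokenizer.model at the top level and a 7B subfolder containing the consolidated checkpoint and params.json. A small sanity-check sketch (the base path is an assumption; adjust it to wherever you put the weights):

# sketch: verify the input layout expected by the conversion script
from pathlib import Path

base = Path.home() / "downloads" / "LLaMA"  # adjust if you put the weights elsewhere
for name in ["tokenizer.model", "7B/params.json", "7B/consolidated.00.pth"]:
    path = base / name
    print("OK     " if path.exists() else "MISSING", path)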

Go back to where you want to put the converted model weights:

cd ~/huggingface  # or wherever you like
mkdir llama2-7b-base-hf

Activate a virtual environment and install the required libraries:

conda activate your_env_name
pip install torch transformers tokenizers protobuf sentencepiece accelerate flash_attn bitsandbytes datasets
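
Before running the conversion, it can save time to confirm the environment actually works; this sketch just prints the installed versions and whether a GPU is visible:

# sketch: quick environment check
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())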

Convert the model:

python -m transformers.models.llama.convert_llama_weights_to_hf --input_dir ~/downloads/LLaMA --model_size 7B --llama_version 2 --output_dir ~/huggingface/llama2-7b-base-hf

The command above uses the directories created earlier; replace --input_dir and --output_dir with your own paths if they differ.

Once the conversion finishes, try loading the model from the output_dir folder:

import os

from transformers import AutoModelForCausalLM, AutoTokenizer

output_dir = os.path.expanduser("~/huggingface/llama2-7b-base-hf")  # the output_dir from the conversion step

model = AutoModelForCausalLM.from_pretrained(output_dir)
model.cuda()  # loading the weights on the GPU takes about 26 GB of GPU memory; remove this line to load on CPU instead
tokenizer = AutoTokenizer.from_pretrained(output_dir)
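
If 26 GB is more GPU memory than you have, you can load the weights in half precision, or quantized with bitsandbytes (both installed above), instead of the default float32. A hedged sketch using standard from_pretrained options; the path is assumed to be the output_dir from the conversion step:

import os

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

output_dir = os.path.expanduser("~/huggingface/llama2-7b-base-hf")  # same output_dir as above

# half precision roughly halves the memory needed compared to float32
model = AutoModelForCausalLM.from_pretrained(
    output_dir, torch_dtype=torch.float16, device_map="auto"
)

# or, for even less memory, 4-bit quantization via bitsandbytes:
# model = AutoModelForCausalLM.from_pretrained(
#     output_dir,
#     quantization_config=BitsAndBytesConfig(load_in_4bit=True),
#     device_map="auto",
# )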

Run an inference to see if it's working:

inputs = tokenizer("this is a story,", return_tensors='pt').to('cuda')
output = model.generate(**inputs)

print(tokenizer.batch_decode(output))
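
By default generate() may stop after only a handful of new tokens, so the continuation can look very short. A sketch with a few common generation arguments to get a longer, sampled continuation:

output = model.generate(
    **inputs,
    max_new_tokens=128,  # generate a longer continuation
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])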

Check your result and enjoy.
