pip install transformers torch
git clone https://huggingface.co/EleutherAI/gpt-j-6b # depends on git-lfs
from transformers import AutoTokenizer, pipeline
model_dir = "<path to cloned model>"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
generator_pipe = pipeline('text-generation', model=model_dir, tokenizer=tokenizer)
output = generator_pipe("I love the Avengers", max_length=30, num_return_sequences=1)
print(output)
For large models, the PyTorch checkpoint is sharded into multiple files during cloning (with an index file mapping each weight to its shard). Running a pipeline from the model directory loads the shards and assembles them into a single state dict, equivalent to one pytorch_model.bin.
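The merge step can be sketched from the index file alone. This is a minimal sketch, not the transformers internals; the `load_fn` indirection is my addition (for a real checkout it would be `lambda p: torch.load(p, map_location="cpu")`):

```python
import json
import os

def merge_sharded_state_dict(model_dir, load_fn):
    """Merge a sharded checkpoint into one state dict.

    load_fn maps a shard path to its state dict, e.g.
    lambda p: torch.load(p, map_location="cpu").
    """
    index_path = os.path.join(model_dir, "pytorch_model.bin.index.json")
    with open(index_path) as f:
        index = json.load(f)
    merged = {}
    # weight_map: parameter name -> shard file name
    for shard_file in sorted(set(index["weight_map"].values())):
        merged.update(load_fn(os.path.join(model_dir, shard_file)))
    return merged
```

If a single file is needed on disk, the merged dict can be written back with torch.save(merged, "pytorch_model.bin").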
import torch
sd = torch.load("/path/to/pytorch_model.bin", map_location="cpu")
# Number of parameters by layer
for k, v in sd.items():
    print(f'{v.numel():10} | {k}')
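The per-tensor counts above can also be summed to get the total parameter count (the helper name is mine):

```python
def count_parameters(state_dict):
    # Sum the element count of every tensor in the state dict.
    return sum(v.numel() for v in state_dict.values())

# e.g. print(f"{count_parameters(sd) / 1e9:.2f}B parameters")
```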
8-bit quantized LLM (SmoothQuant)
git clone https://huggingface.co/mit-han-lab/opt-30b-smoothquant
See the model list here and change the clone URL accordingly (official + paper).
3/4-bit quantized LLM torch checkpoint (AWQ)
git clone https://huggingface.co/datasets/mit-han-lab/awq-model-zoo
See the model list here (official + paper). The cloned files are the optimized quantization hyperparameters (AWQ search results) of the respective models, not the quantized weights themselves.
To produce the quantized weights, follow step 3 of the usage section.
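Conceptually, that step applies the searched per-channel scales and then rounds the weights group-wise to 3/4 bits. A toy sketch of group-wise symmetric 4-bit rounding (my illustration, not the llm-awq implementation):

```python
import torch

def quantize_groupwise(w, n_bit=4, group_size=128):
    """Toy group-wise symmetric quantization sketch (not the llm-awq code).

    w: 2-D weight tensor [out_features, in_features]; in_features must be
    divisible by group_size.
    """
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    # One scale per group, symmetric around zero.
    max_q = 2 ** (n_bit - 1) - 1
    scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / max_q
    q = (g / scale).round().clamp(-max_q - 1, max_q)
    # Dequantize back to float for use in a standard matmul.
    return (q * scale).reshape(out_f, in_f)
```

The group size of 128 mirrors the "-g128" suffix seen on the model-zoo file names; smaller groups trade memory for accuracy.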