zephyr-7b-beta-gptq-transformers
# Install Transformers and Optimum, plus the AutoGPTQ runtime (CUDA 11.8 wheels)
!pip install transformers optimum
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/zephyr-7B-beta-GPTQ"

# Load the GPTQ-quantized weights; device_map="auto" places the model on the GPU
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
prompt = "What the dog doin?"

# Zephyr's chat format: each turn is tagged with <|role|> and terminated by </s>;
# the trailing <|assistant|> tag cues generation (the system turn is left empty here)
prompt_template = f'''<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
'''
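The hand-written template above generalizes to multi-turn conversations. A minimal sketch of building the same format from a list of messages (the helper name `build_zephyr_prompt` is mine, not part of the gist; `tokenizer.apply_chat_template` would do the equivalent using the model's bundled template):

```python
def build_zephyr_prompt(messages):
    """Render (role, content) pairs into Zephyr's chat format.

    Each turn becomes "<|role|>\n{content}</s>\n"; a final "<|assistant|>\n"
    tag is appended so the model continues as the assistant.
    """
    parts = [f"<|{role}|>\n{content}</s>\n" for role, content in messages]
    parts.append("<|assistant|>\n")
    return "".join(parts)


# Reproduces the hand-written template: empty system turn, one user turn
prompt_template = build_zephyr_prompt([
    ("system", ""),
    ("user", "What the dog doin?"),
])
```

This keeps the prompt construction in one place if you later add a system message or prior turns.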
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

# Sample up to 512 new tokens; decode prints the full sequence, prompt included
# (slice output[0][input_ids.shape[1]:] to decode only the completion)
output = model.generate(inputs=input_ids,
                        do_sample=True,
                        max_new_tokens=512)
print(tokenizer.decode(output[0]))