@DhruvaBansal00 · Created May 9, 2024 20:29
Running inference with Llama 3 Refueled
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "refuelai/Llama-3-Refueled"

# Load the tokenizer and model; device_map="auto" places the weights on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Build a chat-formatted prompt and move it to the same device as the model.
messages = [{"role": "user", "content": "Is this comment toxic or non-toxic: RefuelLLM is the new way to label text data!"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)

# Generate a short completion and decode only the newly generated tokens,
# so the echoed prompt and special tokens are not printed.
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
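
The same pattern extends to labeling several comments. The sketch below is an illustration building on the model and tokenizer loaded above, not part of the original gist; the helper name label_comments and the loop-per-example approach are assumptions (a simple loop avoids padding concerns, at the cost of throughput).

# Minimal sketch: label a list of comments one at a time.
# The helper name `label_comments` is hypothetical, not from the gist.
def label_comments(comments, max_new_tokens=20):
    labels = []
    for comment in comments:
        messages = [{"role": "user", "content": f"Is this comment toxic or non-toxic: {comment}"}]
        input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
        output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
        # Decode only the tokens generated after the prompt.
        labels.append(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
    return labels

print(label_comments(["RefuelLLM is the new way to label text data!", "Another example comment."]))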