@bigsnarfdude
Created May 12, 2024 19:42
llama3_instruct_inference.py
import transformers
import torch
from huggingface_hub import login

# Authenticate to download the gated Llama 3 weights (token intentionally left blank).
login(token='')

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Build a text-generation pipeline in float16 on Apple Silicon (MPS).
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device_map="mps",
)

# Pass plain role-based messages. apply_chat_template inserts Llama 3's own
# special tokens, so the Llama 2-style <s>[INST] <<SYS>> markup must not be
# embedded in the message content.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful, respectful and honest assistant. Always answer as "
            "helpfully as possible, while being safe. Your answers should not include "
            "any harmful, unethical, racist, sexist, toxic, dangerous, or illegal "
            "content. Please ensure that your responses are socially unbiased and "
            "positive in nature.\n"
            "If a question does not make any sense, or is not factually coherent, "
            "explain why instead of answering something not correct. If you don't "
            "know the answer to a question, please don't share false information."
        ),
    },
    {"role": "user", "content": "How is AI similar to the Industrial Revolution?"},
]

# Render the messages with the model's chat template, appending the assistant
# header so the model begins generating its reply.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Llama 3 Instruct ends each turn with <|eot_id|> rather than the plain EOS
# token, so stop on either.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Print only the newly generated text, stripping the echoed prompt.
print(outputs[0]["generated_text"][len(prompt):])
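For reference, a minimal sketch of roughly what `apply_chat_template` renders for Llama 3 Instruct. The real template ships with the tokenizer; this dependency-free function only mirrors the documented token layout (`<|begin_of_text|>`, `<|start_header_id|>`, `<|eot_id|>`) and is an illustration, not the actual implementation.

```python
def render_llama3_chat(messages, add_generation_prompt=True):
    """Sketch of the Llama 3 Instruct chat layout (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

print(render_llama3_chat([{"role": "user", "content": "Hello"}]))
```

This is why the Llama 2 `<s>[INST] <<SYS>>` markers should not appear in the message content: the template already supplies the model's expected delimiters.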