@lmmx
Last active February 7, 2023 20:06
Running GPT-NeoX-20B (H/T small usage tip in https://github.com/huggingface/accelerate/issues/938)
# Create and activate a fresh conda environment
conda create -n gptneox python numpy -y
conda activate gptneox
# Install CUDA 11.7 builds of PyTorch, then the inference libraries
conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia -y
pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

weights_path = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(weights_path)

# device_map="auto" lets accelerate split the 20B weights across the
# available GPU(s), CPU RAM, and the on-disk offload folder as needed
model = AutoModelForCausalLM.from_pretrained(
    weights_path,
    device_map="auto",
    offload_folder="/tmp/gpt-neox-20b-offload-accelerate",
)

prompt = input("> Prompt: ")  # e.g. "Huggingface is"
input_tokenized = tokenizer(prompt, return_tensors="pt")
# Send the input ids to GPU 0, where the first model shard lives
# (this is the usage tip from accelerate issue #938)
output = model.generate(input_tokenized["input_ids"].to(0), do_sample=True)
output_text = tokenizer.decode(output[0].tolist())
print(output_text)
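Note that bitsandbytes is installed above but never used in the snippet. As a hedged variant (assuming a transformers version with bitsandbytes integration, i.e. one that accepts `load_in_8bit`), the model can instead be loaded with 8-bit quantised weights, roughly halving GPU memory at some cost in quality; this is a sketch, not a tested configuration for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

weights_path = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(weights_path)

# load_in_8bit=True quantises the linear layers via bitsandbytes at load
# time; it requires a CUDA GPU and works together with device_map="auto"
model = AutoModelForCausalLM.from_pretrained(
    weights_path,
    device_map="auto",
    load_in_8bit=True,
)
```

Generation then proceeds exactly as in the snippet above (the tokenizer and `generate` call are unchanged).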