Local GPT-J 8-Bit on WSL 2

This should work on GPUs with as little as 8GB of VRAM, but in practice I've seen usage go up to 9-10GB.

I have only personally tested this on WSL 2 under Windows 11's latest Dev preview build. Attempts to run it natively in Windows didn't work, but I won't stop you from trying.

I have personally backed up any remote files that could one day be lost to time. I can provide those if needed.

Now, why is this neat? Why is this cool?

(image: benchmark results)

I'm not sure how much LAMBADA accuracy is lost in the 8-bit optimization that lets GPT-J 6B fit in such a small memory footprint, but it should still be far better than the previous best easy-to-run-at-home model, GPT-2 1.5B.

Prereqs:

  • WSL 2
  • Ubuntu installed in WSL 2
  • CUDA Toolkit 11.6 installed in Windows
  • Latest Anaconda3 installed in your WSL 2 distro (Ubuntu in my case: Anaconda3-2022.10-Linux-x86_64.sh)

You can easily install Anaconda3 with (in your WSL 2 distro):

wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh

and then

bash ./Anaconda3-2022.10-Linux-x86_64.sh
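
If you want to confirm that WSL 2 can actually see your GPU before going any further, the Windows NVIDIA driver exposes nvidia-smi inside the distro, so this should print your card's name and memory:

nvidia-smi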

Run these commands in Windows Terminal:

  1. Make a directory called gpt-j and then cd into it. Note that the bulk of the data is not stored here; it lives in your WSL 2 distro's Anaconda3 envs folder.
  2. cd "C:\gpt-j"
  3. wsl

Once the WSL 2 terminal boots up:

  1. conda create -n gptj python=3.8
  2. conda activate gptj
  3. conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
  4. pip uninstall -y transformers && pip install --no-cache-dir https://github.com/deniskamazur/transformers/archive/gpt-j-8bit.zip
  5. pip install bitsandbytes-cuda111
  6. pip install datasets==1.16.1
  7. Make a file called prompt.py and put the code below into it
import torch
import transformers
from transformers.models.gptj import GPTJForCausalLM

# Use the GPU if one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# The tokenizer comes from the original GPT-J release; the weights are the 8-bit quantized version
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
gpt = GPTJForCausalLM.from_pretrained(
    "hivemind/gpt-j-6B-8bit", low_cpu_mem_usage=True
).to(device)

# Read the prompt text and tokenize it, moving the tensors onto the same device as the model
raw_text = open("prompts/delandzombie.txt", "r").read()
prompt = tokenizer(raw_text, return_tensors="pt")
prompt = {key: value.to(device) for key, value in prompt.items()}

# Sample 200 new tokens continuing the prompt
out = gpt.generate(
    **prompt,
    do_sample=True,
    temperature=1.03,
    top_k=500,
    top_p=0.98,
    max_new_tokens=200,
)

# Decode the generated tokens (this includes the original prompt) and print them
text = tokenizer.decode(out[0])
print(
    "\n",
    "\n",
    str(text),
    "\n",
    "\n",
    end="",
)

# Append the result to out.txt, separated by a divider
output = open("out.txt", "a")
output.write(
    str(text)
    + "\n"
    + "\n"
    + "------"
    + "\n"
    + "\n"
)
output.close()
  1. Make a directory called prompts
  2. Put the text files you want to be your prompt starters into this folder
  3. Make a file called delandzombie.txt in the prompts folder and put the text below into the file
This is a conversation between Del the Dark Human Male and a Zombie.
Del: GIMME A DAGGER.
Zombie: Awt SHUT UP, DEL.
Del:
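
Before running the full script, you can optionally sanity-check the environment with a quick Python sketch (not required; it just confirms that PyTorch sees the GPU and that the patched transformers imports):

import torch
import transformers

# Should print True followed by your GPU's name if CUDA is wired up correctly
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
print(transformers.__version__)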

Running the thing!

If everything went according to plan and the gptj Anaconda3 environment is still active, running the Python script will download the last few files it needs, read the delandzombie.txt prompt, generate 200 tokens, print the result to the terminal, and append it to out.txt.

  1. python ./prompt.py

This is what running the Python script looks like on a second run. If this is what you see, then everything is working correctly.

==============================WARNING: DEPRECATED!==============================
WARNING! This version of bitsandbytes is deprecated. Please switch to `pip install bitsandbytes` and the new repo: https://github.com/TimDettmers/bitsandbytes
==============================WARNING: DEPRECATED!==============================
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 This is a conversation between Del the Dark Human Male and a Zombie.
Del: GIMME A DAGGER.
Zombie: Awt SHUT UP, DEL.
Del: I just want a _dagger_.
Zombie: That’s not what I’m saying, DEL.
Del: Where the hell is
Blunt Dagger?
Zombie: Blunt Dagger is around here somewhere, DEL.
Del: Where is
Blunt Dagger?
Zombie: He’s out there, DEL.
Del: WHERE THE FUCK IS
HE?
Zombie: He wants his dagger BACK, DEL.
Del: Oh my god, I fuckin’ gotta find
that dumbass Blunt Dagger.
What happened to
Dead Earl? What happened to Earlz?
I woke up here (to this godforsaken place)
from a stupendous,
stellar, dream-like NAP,
was I pissed off? (:D)

Now
I sit here, like a spooked tree,
and listen to the wind whistle.
It�
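
Since out.txt just keeps appending, one natural follow-up (my own tweak, not part of the original script) is to loop and feed each generation back in as the next prompt. A rough sketch, reusing the objects already set up in prompt.py:

# continuation loop: each round's output becomes the next round's prompt
for _ in range(3):
    prompt = tokenizer(raw_text, return_tensors="pt")
    prompt = {key: value.to(device) for key, value in prompt.items()}
    out = gpt.generate(
        **prompt,
        do_sample=True,
        temperature=1.03,
        top_k=500,
        top_p=0.98,
        max_new_tokens=200,
    )
    # the decoded text includes the prompt, so raw_text grows each round
    raw_text = tokenizer.decode(out[0])
    print(raw_text)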

I have attached files that might be helpful in setting things up.

For reference, here are the arguments that transformers' generate() accepts; most of them can be passed straight to gpt.generate() in prompt.py:

max_length: int
min_length: int
do_sample: bool
early_stopping: bool
num_beams: int
temperature: float
top_k: int
top_p: float
typical_p: float
repetition_penalty: float
bad_words_ids: Iterable[int]
bos_token_id: int
pad_token_id: int
eos_token_id: int
length_penalty: float
no_repeat_ngram_size: int
encoder_no_repeat_ngram_size: int
num_return_sequences: int
max_time: float
max_new_tokens: int
decoder_start_token_id: int
use_cache: bool
num_beam_groups: int
diversity_penalty: float
prefix_allowed_tokens_fn: ((int, Tensor) -> List[int])
logits_processor: LogitsProcessorList
stopping_criteria: StoppingCriteriaList
constraints: List[Constraint]
output_attentions: bool
output_hidden_states: bool
output_scores: bool
return_dict_in_generate: bool
forced_bos_token_id: int
forced_eos_token_id: int
remove_invalid_values: bool
synced_gpus: bool
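
Most of these map directly onto keyword arguments of gpt.generate() in prompt.py. As a sketch (the values here are just a starting point, nothing I've tuned), you could discourage the model from repeating itself like this:

out = gpt.generate(
    **prompt,
    do_sample=True,
    temperature=1.03,
    top_k=500,
    top_p=0.98,
    max_new_tokens=200,
    repetition_penalty=1.1,    # penalize tokens the model has already produced
    no_repeat_ngram_size=3,    # never repeat the same 3-token sequence
)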
Here are a few more example prompt starters; each one goes in its own text file in the prompts folder:

Fridouli Cocktail Recipe:
This is a conversation between Del the Dark Human Male and a Zombie.
Del: GIMME A DAGGER.
Zombie: Awt SHUT UP, DEL.
Del:
Top 10 Discontinued Fast Food Items:
1. The Bell Beefer
2. McDLT
3.
a bunch of gameboy games:
1. Agro Soar
2. Bamse
3. Baby T-Rex
4.
Top 10 stupidest star wars names.
10. Droopy McCool
9. Elan Sleazebaggano
8. Kit Fisto
7. Glup Shitto
6.
Top 10 Ray-Traced Video Games
10. Portal RTX (PC)
9. Quake 2 (PC)
8.
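
If you save each of these as its own file in the prompts folder, a small tweak to prompt.py lets you pick the prompt from the command line instead of hard-coding delandzombie.txt (just a sketch of the change; the file names are whatever you choose):

import sys

# Use the file passed on the command line, or fall back to the Del and Zombie prompt
prompt_file = sys.argv[1] if len(sys.argv) > 1 else "prompts/delandzombie.txt"
raw_text = open(prompt_file, "r").read()

Then run it with, for example, python ./prompt.py prompts/raytracedgames.txt (that file name is just a made-up example).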