Local GPT-J 8-Bit on WSL 2

This should work on GPUs with as little as 8GB of VRAM, but in practice I've seen usage go up to 9-10GB.

I have only personally tested this on WSL 2 under Windows 11's latest Dev preview build. Attempts to run it natively in Windows didn't work, but I won't stop you from trying.

I have personally backed up any remote files that could one day be lost to time. I can provide those if needed.

Now, why is this neat? Why is this cool?

(image: benchmark results)

I'm not sure how much LAMBADA accuracy is lost in the 8-bit optimization that lets GPT-J 6B fit in such a small memory footprint, but it should still be far better than the previous best easy-to-run-at-home model, GPT-2 1.5B.

Prereqs:

  • WSL 2
  • Ubuntu installed in WSL 2
  • CUDA Toolkit 11.6 installed in Windows
  • Latest Anaconda3 installed in your WSL 2 distro (Ubuntu in my case: Anaconda3-2022.10-Linux-x86_64.sh)

You can easily install Anaconda3 with (in your WSL 2 distro):

wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh

and then

bash ./Anaconda3-2022.10-Linux-x86_64.sh
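
If you want to confirm that WSL 2 can actually see your GPU before going any further, the Windows NVIDIA driver exposes nvidia-smi inside the distro, so this should print your card's name and memory:

nvidia-smi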

Run these commands in Windows Terminal:

  1. Make a directory called gpt-j and then cd into it. Note that the bulk of the data is not stored here; it lives in your WSL 2 distro's Anaconda3 envs folder.
  2. cd "C:\gpt-j"
  3. wsl

Once the WSL 2 terminal boots up:

  1. conda create -n gptj python=3.8
  2. conda activate gptj
  3. conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
  4. pip uninstall -y transformers && pip install --no-cache-dir https://github.com/deniskamazur/transformers/archive/gpt-j-8bit.zip
  5. pip install bitsandbytes-cuda111
  6. pip install datasets==1.16.1
  7. Make a file called prompt.py and put the code below into it
import torch
import transformers
from transformers.models.gptj import GPTJForCausalLM

# Use the GPU if one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# The tokenizer comes from the original GPT-J release; the weights are the 8-bit quantized version
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
gpt = GPTJForCausalLM.from_pretrained(
    "hivemind/gpt-j-6B-8bit", low_cpu_mem_usage=True
).to(device)

# Read the prompt text and tokenize it, moving the tensors onto the same device as the model
raw_text = open("prompts/delandzombie.txt", "r").read()
prompt = tokenizer(raw_text, return_tensors="pt")
prompt = {key: value.to(device) for key, value in prompt.items()}

# Sample 200 new tokens continuing the prompt
out = gpt.generate(
    **prompt,
    do_sample=True,
    temperature=1.03,
    top_k=500,
    top_p=0.98,
    max_new_tokens=200,
)

# Decode the generated tokens (this includes the original prompt) and print them
text = tokenizer.decode(out[0])
print(
    "\n",
    "\n",
    str(text),
    "\n",
    "\n",
    end="",
)

# Append the result to out.txt, separated by a divider
output = open("out.txt", "a")
output.write(
    str(text)
    + "\n"
    + "\n"
    + "------"
    + "\n"
    + "\n"
)
output.close()
  1. Make a directory called prompts
  2. Put the text files you want to be your prompt starters into this folder
  3. Make a file called delandzombie.txt in the prompts folder and put the text below into the file
This is a conversation between Del the Dark Human Male and a Zombie.
Del: GIMME A DAGGER.
Zombie: Awt SHUT UP, DEL.
Del:
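
Before running the full script, you can optionally sanity-check the environment with a quick Python sketch (not required; it just confirms that PyTorch sees the GPU and that the patched transformers imports):

import torch
import transformers

# Should print True followed by your GPU's name if CUDA is wired up correctly
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
print(transformers.__version__)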

Running the thing!

If everything went according to plan and the gptj Anaconda3 environment is still active, running the Python script will download the last few files it needs, read the delandzombie.txt prompt, generate 200 tokens, print the result to the terminal, and append it to out.txt.

  1. python ./prompt.py

This is what running the Python script looks like on a second run. If this is what you see, then everything is working correctly.

==============================WARNING: DEPRECATED!==============================
WARNING! This version of bitsandbytes is deprecated. Please switch to `pip install bitsandbytes` and the new repo: https://github.com/TimDettmers/bitsandbytes
==============================WARNING: DEPRECATED!==============================
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 This is a conversation between Del the Dark Human Male and a Zombie.
Del: GIMME A DAGGER.
Zombie: Awt SHUT UP, DEL.
Del: I just want a _dagger_.
Zombie: That’s not what I’m saying, DEL.
Del: Where the hell is
Blunt Dagger?
Zombie: Blunt Dagger is around here somewhere, DEL.
Del: Where is
Blunt Dagger?
Zombie: He’s out there, DEL.
Del: WHERE THE FUCK IS
HE?
Zombie: He wants his dagger BACK, DEL.
Del: Oh my god, I fuckin’ gotta find
that dumbass Blunt Dagger.
What happened to
Dead Earl? What happened to Earlz?
I woke up here (to this godforsaken place)
from a stupendous,
stellar, dream-like NAP,
was I pissed off? (:D)

Now
I sit here, like a spooked tree,
and listen to the wind whistle.
It�
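
Since out.txt just keeps appending, one natural follow-up (my own tweak, not part of the original script) is to loop and feed each generation back in as the next prompt. A rough sketch, reusing the objects already set up in prompt.py:

# continuation loop: each round's output becomes the next round's prompt
for _ in range(3):
    prompt = tokenizer(raw_text, return_tensors="pt")
    prompt = {key: value.to(device) for key, value in prompt.items()}
    out = gpt.generate(
        **prompt,
        do_sample=True,
        temperature=1.03,
        top_k=500,
        top_p=0.98,
        max_new_tokens=200,
    )
    # the decoded text includes the prompt, so raw_text grows each round
    raw_text = tokenizer.decode(out[0])
    print(raw_text)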

I have attached files that might be helpful in setting things up.

For reference, here are the arguments that transformers' generate() accepts; most of them can be passed straight to gpt.generate() in prompt.py:

max_length: int
min_length: int
do_sample: bool
early_stopping: bool
num_beams: int
temperature: float
top_k: int
top_p: float
typical_p: float
repetition_penalty: float
bad_words_ids: Iterable[int]
bos_token_id: int
pad_token_id: int
eos_token_id: int
length_penalty: float
no_repeat_ngram_size: int
encoder_no_repeat_ngram_size: int
num_return_sequences: int
max_time: float
max_new_tokens: int
decoder_start_token_id: int
use_cache: bool
num_beam_groups: int
diversity_penalty: float
prefix_allowed_tokens_fn: ((int, Tensor) -> List[int])
logits_processor: LogitsProcessorList
stopping_criteria: StoppingCriteriaList
constraints: List[Constraint]
output_attentions: bool
output_hidden_states: bool
output_scores: bool
return_dict_in_generate: bool
forced_bos_token_id: int
forced_eos_token_id: int
remove_invalid_values: bool
synced_gpus: bool
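
Most of these map directly onto keyword arguments of gpt.generate() in prompt.py. As a sketch (the values here are just a starting point, nothing I've tuned), you could discourage the model from repeating itself like this:

out = gpt.generate(
    **prompt,
    do_sample=True,
    temperature=1.03,
    top_k=500,
    top_p=0.98,
    max_new_tokens=200,
    repetition_penalty=1.1,    # penalize tokens the model has already produced
    no_repeat_ngram_size=3,    # never repeat the same 3-token sequence
)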
Here are a few more example prompt starters; each one goes in its own text file in the prompts folder:

Fridouli Cocktail Recipe:
This is a conversation between Del the Dark Human Male and a Zombie.
Del: GIMME A DAGGER.
Zombie: Awt SHUT UP, DEL.
Del:
Top 10 Discontinued Fast Food Items:
1. The Bell Beefer
2. McDLT
3.
a bunch of gameboy games:
1. Agro Soar
2. Bamse
3. Baby T-Rex
4.
Top 10 stupidest star wars names.
10. Droopy McCool
9. Elan Sleazebaggano
8. Kit Fisto
7. Glup Shitto
6.
Top 10 Ray-Traced Video Games
10. Portal RTX (PC)
9. Quake 2 (PC)
8.
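
If you save each of these as its own file in the prompts folder, a small tweak to prompt.py lets you pick the prompt from the command line instead of hard-coding delandzombie.txt (just a sketch of the change; the file names are whatever you choose):

import sys

# Use the file passed on the command line, or fall back to the Del and Zombie prompt
prompt_file = sys.argv[1] if len(sys.argv) > 1 else "prompts/delandzombie.txt"
raw_text = open(prompt_file, "r").read()

Then run it with, for example, python ./prompt.py prompts/raytracedgames.txt (that file name is just a made-up example).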