Tim Dettmers (TimDettmers)

@TimDettmers
TimDettmers / gist:385014b37f998c7857b15a3ea60b4cae
Last active August 14, 2023 23:07
Chip2 data evaluated with ROUGE-L and BERTScore for 4-, 8-, and 16-bit LoRA and full finetuning
Config: max_steps: 17500, lora_r: 64, lr: 0.0002, bf16: False, lora_modules: all, bits: 4, full_finetune: False, lora_dropout: 0.0, warmup_steps: 100, compress_statistics: True, dataset: NaN, gradient_accumulation_steps: NaN, learning_rate: NaN, quant_type: fp4, adam_beta2: 0.999, update_freq: 6
eval_bert_f1 mean (SE): 64.8716 (21.8331). 95% CI (22.079, 107.664). Sample size: 4
eval_rougeL mean (SE): 33.1083 (19.1162). 95% CI (-4.359, 70.576). Sample size: 4
================================================================================
Config: max_steps: 17500, lora_r: 64, lr: 0.0002, bf16: False, lora_modules: all, bits: 4, full_finetune: False, lora_dropout: 0.0, warmup_steps: 100, compress_statistics: False, dataset: NaN, gradient_accumulation_steps: NaN, learning_rate: NaN, quant_type: fp4, adam_beta2: 0.999, update_freq: 6
eval_bert_f1 mean (SE): 67.0044 (22.3593). 95% CI (23.180, 110.829).
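The interval columns are consistent with a normal-approximation CI of mean +/- 1.96 * SE; a quick check in Python (the gist itself does not include the aggregation code):

def ci95(mean, se):
    # Normal-approximation 95% interval: mean +/- 1.96 * SE.
    return (mean - 1.96 * se, mean + 1.96 * se)

print(ci95(64.8716, 21.8331))  # (22.079, 107.664), matching the eval_bert_f1 row above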
@TimDettmers
TimDettmers / gist:a96b0d948a97583f8ab0599fa888c35c
Created August 10, 2023 04:57
Hyperparameter grid search for LLaMA models on Alpaca dataset for QLoRA finetuning
This table contains data from multiple software versions. Some hyperparameter values are "NaN", meaning that hyperparameter did not exist in that software version. The best 7B result is 40.08 MMLU.
================================================================================
Config: learning_rate: 0.005, adam_beta2: 0.999, lora_dropout: 0.0, max_grad_norm: 1.0, max_steps: 7320, lr_scheduler_type: cosine, weight_decay: 0.0, base_model: /gscratch/zlab/llama/7B, quant_type: nf4, gradient_accumulation_steps: 6, per_device_train_batch_size: 2
acc mean (SE): 0.2290 (nan). 95% CI (nan, nan). Sample size: 1
================================================================================
Config: learning_rate: 0.0002, adam_beta2: 0.999, lora_dropout: 0.0, max_grad_norm: 0.3, max_steps: 9750, lr_scheduler_type: constant, weight_decay: 0.0, base_model: NaN, quant_type: nf4, gradient_accumulation_steps: 2, per_device_train_batch_size: 8
acc mean (SE): 0.2290 (0.0
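Sweeps like the one above enumerate a Cartesian product of hyperparameter values; a minimal sketch of such a driver, where run_qlora_eval is a hypothetical stand-in for the actual training-and-evaluation call (not shown in the gist), and the grid values are taken from the config blocks above:

from itertools import product

grid = {
    'learning_rate': [0.005, 0.0002],
    'lr_scheduler_type': ['cosine', 'constant'],
    'max_grad_norm': [1.0, 0.3],
    'max_steps': [7320, 9750],
}
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # acc = run_qlora_eval(base_model='/gscratch/zlab/llama/7B', quant_type='nf4', **config)  # hypothetical
    print('Config:', config)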
@TimDettmers
TimDettmers / inference_hf_8bit.py
Created October 11, 2022 15:32
Minimal example of 8-bit inference for LLMs via Hugging Face transformers + accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_NEW_TOKENS = 128
model_name = 'facebook/opt-6.7b'
text = """Hello, I am a prompt. Who are you?"""

tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Load the weights quantized to 8-bit via bitsandbytes; device_map='auto'
# lets accelerate place the layers across the available devices.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', load_in_8bit=True)
generated_ids = model.generate(input_ids.to(model.device), max_new_tokens=MAX_NEW_TOKENS)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
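With 8-bit weights each parameter takes one byte, so the 6.7B-parameter OPT model needs roughly 7 GB for its weights, about half the fp16 footprint; that is what makes single-GPU inference of a model this size practical.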
@TimDettmers
TimDettmers / find_huffman_ratio.py
Last active September 30, 2021 15:45
Calculate Huffman compression ratio with bitsandbytes
import torch
import bitsandbytes as bnb
from heapq import heappush, heappop, heapify

a = torch.normal(0, 0.5, size=(1024, 1024), device='cuda')

def get_compression(x: torch.Tensor) -> float:
    """Yields the compression rate of Huffman Coding"""
    assert x.device.type == 'cuda'
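The preview ends inside get_compression; a minimal sketch of how the body could continue, assuming bnb.functional.quantize_blockwise as the 8-bit symbol source (the gist's actual implementation is not shown):

def get_compression(x: torch.Tensor) -> float:
    """Yields the compression rate of Huffman Coding"""
    assert x.device.type == 'cuda'
    # Assumption: quantize_blockwise returns the uint8 symbol tensor first;
    # those discrete symbols are what a Huffman code would compress.
    symbols = bnb.functional.quantize_blockwise(x)[0]
    counts = torch.bincount(symbols.flatten().long(), minlength=256).tolist()
    # Classic heap-based Huffman construction: entries are
    # [weight, [symbol, code_length], ...]; merging the two lightest
    # subtrees pushes all their symbols one level (one bit) deeper.
    heap = [[c, [sym, 0]] for sym, c in enumerate(counts) if c > 0]
    heapify(heap)
    while len(heap) > 1:
        lo, hi = heappop(heap), heappop(heap)
        for pair in lo[1:] + hi[1:]:
            pair[1] += 1
        heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    # Average code length in bits per symbol (assumes >1 distinct symbol).
    avg_bits = sum(counts[sym] * depth for sym, depth in heap[0][1:]) / x.numel()
    return 8.0 / avg_bits  # ratio vs. storing each 8-bit symbol directly

print(get_compression(a))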