Birch-san

## arb.py
import numpy as np
import math
from numpy.typing import NDArray

# we are trying to make buckets of varying aspect ratios,
# all with about the same area (equivalent to a 512x512 square)
square_side = 512
buckets = 8

widest_aspect: float = math.atan2(1, 2)  # 1/2 = 0.5 aspect ratio
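
The preview above stops after defining the widest aspect ratio as an angle. As a minimal sketch of how such buckets could be generated (not necessarily the author's method), the example below interpolates angles between `atan2(1, 2)` and `atan2(2, 1)` and scales each one to roughly the area of a 512x512 square. The function name `make_buckets`, the tallest angle, and the rounding to a multiple of 8 are assumptions made for illustration.

```python
import math

def make_buckets(square_side: int = 512, buckets: int = 8, multiple: int = 8) -> list[tuple[int, int]]:
    """Hypothetical helper: return (width, height) pairs of roughly equal area."""
    target_area = square_side ** 2
    # Aspect ratios expressed as angles, i.e. atan2(height, width).
    widest = math.atan2(1, 2)   # height/width = 0.5 (landscape)
    tallest = math.atan2(2, 1)  # assumed mirror image: height/width = 2.0 (portrait)
    sizes = []
    for i in range(buckets):
        # Evenly spaced angle between the widest and tallest aspect.
        theta = widest + (tallest - widest) * i / (buckets - 1)
        # A unit vector (cos, sin) spans area cos*sin; scale it to the target area.
        scale = math.sqrt(target_area / (math.cos(theta) * math.sin(theta)))
        # Snap each side to the nearest multiple (assumed to be 8 here).
        width = round(scale * math.cos(theta) / multiple) * multiple
        height = round(scale * math.sin(theta) / multiple) * multiple
        sizes.append((width, height))
    return sizes

for width, height in make_buckets():
    print(width, height, width * height)
```

Interpolating in angle space rather than in ratio space keeps the landscape and portrait buckets symmetric around the square.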

## flash_attn_processor.py
import torch
from typing import Optional
from flash_attn import flash_attn_func, flash_attn_qkvpacked_func
from diffusers.models.attention import Attention

class FlashAttnProcessor:
    r"""
    Processor for implementing memory efficient attention using flash_attn.
    """

## flash_attn_processor.py
import torch
from typing import Optional
from flash_attn import flash_attn_func
from diffusers.models.attention import Attention

class FlashAttnProcessor:
    r"""
    Processor for implementing memory efficient attention using flash_attn.
    """

## bnb-correctness-test.md

      
Last active July 10, 2023 17:43

Correctness-testing bitsandbytes `0.40.0`

Here we've raised `bnb_4bit_compute_dtype` to float32 in the hope of keeping the model on-topic, since we were concerned by the responses measured with `bnb_4bit_compute_dtype=bfloat16`.

llama 7b

`I was under the effect of a counterspell, so none of the superpower-wielding monsters could see me anyway. My eyes had begun to change as a result of my battle with Melvin. The transformation was complete. I was in the true look of my chosen form. As you can see, a true-blue beauty. There was only one of me, though, so I would have to make sure that this was the end.
I went to catch the culprit. He was in the same clothes he was wearing when he committed the first murder. I did not recognize the man from that time, nor did he from me, but his face was twisted with an evil grin. He had the same shaved head. However, his hair seemed to change color. It was dark brown when I met him, but it turned to

  
## bnb-perf-test.md

      
Last active July 10, 2023 17:05

Perf-testing bitsandbytes `0.39.1` vs `0.40.0`

4090 on CUDA 12.1
seed=64
Evaluated using `evaluate.py`:
`python -m evaluate --model_name_or_path huggyllama/llama-7b --tokenizer_model_name_or_path huggyllama/llama-7b --bf16 --overrun_countermeasures False --prompt_style bare`

  
## gist:d539b53ff30381228180bef9a8653956
[14.614647   14.526281   14.438574   14.351521   14.265114   14.179349
 14.094221   14.009725   13.925854   13.842604   13.759968   13.677942
 13.596522   13.515701   13.435474   13.355838   13.276786   13.198313
 13.120416   13.043088   12.966325   12.890123   12.814478   12.739382
 12.664833   12.590827   12.517358   12.444422   12.372013   12.300129
 12.2287655  12.157917   12.087579   12.017748   11.94842    11.879589
 11.8112545  11.743409   11.67605    11.609174   11.542775   11.4768505
 11.411397   11.346409   11.281884   11.217818   11.154207   11.091047
 11.028336   10.966067   10.90424    10.842849   10.781891   10.721362
 10.661261   10.601581   10.542321   10.483477   10.425045   10.367022

## falcon-40b-spqr.md

      
Created June 9, 2023 22:43

Run Falcon-40B with 3.35-bit quantization via SpQR

Instructions are a work-in-progress (I haven't managed it yet, just writing what I do as I go along).

  
## code-assist.md

      
Last active March 4, 2024 19:32

Local VSCode AI code assistance via starcoder + 4-bit quantization in ~11GB VRAM

Install HF Code Autocomplete VSCode plugin.
We are not going to set an API token. We are going to specify an API endpoint.

We will try to deploy that API ourselves, to use our own GPU to provide the code assistance.
We will use bigcode/starcoder, a 15.5B param model.

We will use NF4 4-bit quantization to fit this into 10787MiB VRAM.

Unquantized, it would require 23767MiB VRAM (which still fits on a 4090, which has 24564MiB)!
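
To make the "fits" claim concrete, here is a small arithmetic sketch using only the three VRAM figures quoted above. The variable names are chosen for illustration; nothing is measured here.

```python
# VRAM figures quoted in the text, in MiB.
quantized_mib = 10787    # starcoder with NF4 4-bit quantization
unquantized_mib = 23767  # starcoder unquantized
rtx_4090_mib = 24564     # total VRAM on a 4090

# Headroom left on the card in each case.
headroom_unquantized = rtx_4090_mib - unquantized_mib
headroom_quantized = rtx_4090_mib - quantized_mib

# Fraction of the unquantized footprint that the NF4 model occupies.
ratio = quantized_mib / unquantized_mib

print(f"unquantized headroom: {headroom_unquantized} MiB")
print(f"NF4 headroom:         {headroom_quantized} MiB")
print(f"NF4 / unquantized:    {ratio:.1%}")
```

By these figures the unquantized model leaves under 1GiB free on the card, while the NF4 model leaves more than half of it available.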
Setup API


## data_collator.py
from dataclasses import dataclass
from typing import TypedDict

import transformers

class ExtractedCriticSample(TypedDict):
    prompt: str
    continuation: str
    rating: int

@dataclass
class DataCollatorForCriticLM(object):
    tokenizer: transformers.PreTrainedTokenizer
    prompt_max_len: int
    continuation_max_len: int

## pahse1_train_sample_1.json
{
  "labeler": "e90a38f3-3135-4465-87af-3e6322e3d772",
  "timestamp": "2022-07-17T16:56:51.323252",
  "generation": null,
  "is_quality_control_question": false,
  "is_initial_screening_question": false,
  "question":
  {
    "problem": "How many positive two-digit integers leave a remainder of 2 when divided by 8?",
    "ground_truth_answer": "12"