@Birch-san
Birch-san / .gitconfig
Created April 2, 2024 15:10
Using fine-grained access token to access your organisation's private GitHub repositories
[url "https://oauth2:github_pat_REDACTED@github.com/"]
    insteadOf = https://github.com/
[url "https://oauth2:github_pat_REDACTED@github.com/MYCOOLORG/"]
    insteadOf = git@github.com:MYCOOLORG/
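Equivalently, the same rewrite rules can be written from the command line with `git config` (a sketch; the token and org name are the same redacted placeholders as above, not real values):

```shell
# Write the URL-rewrite rules into ~/.gitconfig via the CLI.
# github_pat_REDACTED and MYCOOLORG are placeholders.
git config --global url."https://oauth2:github_pat_REDACTED@github.com/".insteadOf "https://github.com/"
git config --global url."https://oauth2:github_pat_REDACTED@github.com/MYCOOLORG/".insteadOf "git@github.com:MYCOOLORG/"
```

After this, any `git clone https://github.com/...` (or `git@github.com:MYCOOLORG/...`) will transparently authenticate with the fine-grained token.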
Birch-san / img-folder-chunking.md
Last active February 3, 2024 18:30
Chunking a folder of pngs into .tar files

Uploading a folder of many files to HF, by chunking it into .tars

So you generated 50,000 images for computing FID or whatever, and now you want to upload those samples to HF.
You try, but one of the file transfers fails, and you lose all your progress.
I mean, it'd be nice if HF could just… fix this… like, put retries into huggingface-cli upload instead of just discarding tens of gigabytes of progress… but we live in the world in which we live.

So let's make it easier: instead of 50k small files, let's upload 50 big files. Collate 'em into .tars.

I'm not sure this makes a valid WebDataset (WDS), but it's close; I think you'd need to rename the files to 000000.img.png if you wanted that.
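The collation step above can be sketched in Python with the stdlib `tarfile` module (names like `chunk_to_tars`, `src`, `dst`, and `chunk_size` are illustrative, not from the original gist):

```python
# Sketch: pack a flat folder of .png samples into fixed-size .tar shards,
# e.g. 50k pngs at chunk_size=1000 -> 50 tars named 00000.tar, 00001.tar, ...
import tarfile
from pathlib import Path

def chunk_to_tars(src: Path, dst: Path, chunk_size: int = 1000) -> int:
    """Collate every .png under src into dst/00000.tar, 00001.tar, ...
    Returns the number of shards written."""
    dst.mkdir(parents=True, exist_ok=True)
    pngs = sorted(src.glob('*.png'))
    for shard, start in enumerate(range(0, len(pngs), chunk_size)):
        with tarfile.open(dst / f'{shard:05d}.tar', 'w') as tar:
            for png in pngs[start:start + chunk_size]:
                # arcname strips the source directory from the stored path
                tar.add(png, arcname=png.name)
    return (len(pngs) + chunk_size - 1) // chunk_size
```

Upload failures then cost you at most one shard of progress rather than the whole transfer.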

Birch-san / installing-python-proxy.md
Last active January 29, 2024 10:36
Installing Python when behind a corporate proxy

Behind a corporate proxy? Can't add PPAs to your apt listings?

A typical HTTP proxy URL may look like:
http://proxy.mycoolproxy.com:8080

Let's configure all our tools to use this proxy.

apt

sudo nano /etc/apt/apt.conf.d/00proxy.conf
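A minimal 00proxy.conf might look like this (a sketch using the placeholder proxy URL above; substitute your real proxy):

```
Acquire::http::Proxy "http://proxy.mycoolproxy.com:8080";
Acquire::https::Proxy "http://proxy.mycoolproxy.com:8080";
```

apt will then route both http and https fetches through the proxy.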
Birch-san / install-bpftrace-on-wsl2.md
Last active January 29, 2024 10:02
Installing bpftrace on WSL2
wsl --update --web-download
wsl --install -d Ubuntu-22.04 --web-download
wsl --setdefault Ubuntu-22.04
sudo apt-get install -y bpftrace bpftrace-dbgsym linux-headers-generic libc6-dev
Birch-san / 8bit_adam_memory_usage.md
Last active October 3, 2023 18:20
Unexplained memory usage of 8-bit AdamW (paged vs unpaged)

Some weird memory usage (VRAM) is reported (by torch and by NVML) when using 8-bit AdamW, paged or unpaged.

Here we train llama 2 on 4096-token sequences, using either --optim adamw_8bit or --optim paged_adamw_8bit.
We do a full finetune using qlora.py --full-finetune, with our qlora.py fork, stepwise branch, commit 9a1045d.
We print the memory usage using HF transformers trainer's on_step_end callback. This is after optimizer.step(); model.zero_grad().

One would expect memory usage at the end of step 1 to be the same as at the end of step 2.
Yet for the unpaged optimizer, memory usage leaps by 11.2GiB: end of step 1 = 70.4GiB, end of step 2 = 81.6GiB.
This appears to be a leap in PyTorch reserved memory only (32.6GiB -> 43.9GiB).

Birch-san / t5-small-weight-inits.py
Created October 1, 2023 15:04
google/t5-v1_1-small t5-small weight initializations
import torch
from transformers import T5ForConditionalGeneration

model: T5ForConditionalGeneration = T5ForConditionalGeneration.from_pretrained('google/t5-v1_1-small')
with torch.inference_mode():
    print(model.shared.weight.std())  # tensor(11.6375)
Birch-san / local-copilot.md
Last active March 12, 2024 15:14
Running GitHub Copilot against local Code Llama model

Running GitHub Copilot VSCode extension against local Code Llama model

Tested on NVIDIA RTX 4090, but these instructions also cover AMD and Mac in case you wanna try those.
This guide assumes you are running Linux (I ran this on Ubuntu).

Before you get excited:

Birch-san / mask-test.ipynb
Created September 2, 2023 15:32
Tester for neighbourhood_mask, perimeter_mask
Birch-san / mask_test.py
Created September 2, 2023 15:31
Tester for neighbourhood_mask, perimeter_mask
from typing import Optional, NamedTuple

import torch
from torch import BoolTensor, arange, meshgrid, clamp


class Dimensions(NamedTuple):
    height: int
    width: int


def make_neighbourhood_mask(size: Dimensions, size_orig: Dimensions, device='cpu') -> BoolTensor:
    h, w = size
Birch-san / llama_flash.py
Last active January 22, 2024 06:05
Loading llama with Flash Attention
from transformers import (
    AutoConfig,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    AutoModelForCausalLM,
    LlamaTokenizerFast,
    PreTrainedModel,
    TextIteratorStreamer,
    StoppingCriteria,