@Artefact2
Artefact2 / README.md
Last active July 13, 2024 04:58
GGUF quantizations overview

Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggerganov/llama.cpp#5962

In the meantime, use the largest quantization that fully fits in your GPU's VRAM. If you can comfortably fit Q4_K_S, try a model with more parameters instead.
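
As a rough rule of thumb in code form: a quant fits when the GGUF file plus some headroom for the KV cache and compute buffers stays under free VRAM. A minimal sketch, assuming a flat 20% headroom (that figure and the file name are illustrative assumptions, not llama.cpp behavior):

import os

def fits_in_vram(gguf_path, free_vram_bytes, headroom=0.20):
    # Model weights plus a flat headroom fraction for the KV cache and
    # compute buffers. The 20% figure is an assumption; real usage
    # depends on context length, batch size, and offload settings.
    model_bytes = os.path.getsize(gguf_path)
    return model_bytes * (1 + headroom) <= free_vram_bytes

# Hypothetical example on a 24 GB card:
# fits_in_vram("model.Q4_K_S.gguf", 24 * 1024**3)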

llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

@mattiasarro
mattiasarro / rwkv.py
Last active May 27, 2024 09:17
RWKV MVP
# Taken from https://johanwind.github.io/2023/03/23/rwkv_details.html.
# I've added additional comments and restructured it a tiny bit, which makes it clearer for me.
import numpy as np
from torch import load as torch_load # Only for loading the model weights
from tokenizers import Tokenizer
exp = np.exp
layer_norm = lambda x, w, b : (x - np.mean(x)) / np.std(x) * w + b
sigmoid = lambda x : 1/(1 + exp(-x))
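
For context, here is how these helpers get used one step downstream: a minimal sketch of the channel-mixing block from the same blog post. The parameter names (mix_k, mix_r, Wk, Wr, Wv) follow that post, not necessarily this gist, and the code relies on the np and sigmoid definitions above.

def channel_mixing(x, last_x, mix_k, mix_r, Wk, Wr, Wv):
    # "Token shift": interpolate between the current and previous token.
    k = Wk @ (x * mix_k + last_x * (1 - mix_k))
    r = Wr @ (x * mix_r + last_x * (1 - mix_r))
    # Squared-ReLU feed-forward, gated by a sigmoid "receptance".
    vk = Wv @ np.maximum(k, 0) ** 2
    return sigmoid(r) * vk, x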
@zcourts
zcourts / tmux cheat sheet
Last active April 30, 2024 10:53
tmux cheat sheet image from DuckDuckGo UI
https://duckduckgo.com/?q=tmux+cheat+sheet&atb=v47-1_x&ia=cheatsheet&iax=1
@kevin-smets
kevin-smets / iterm2-solarized.md
Last active July 15, 2024 21:14
iTerm2 + Oh My Zsh + Solarized color scheme + Source Code Pro Powerline + Font Awesome + [Powerlevel10k] - (macOS)

[Screenshots: the Default prompt and the Powerlevel10k prompt]

@jboner
jboner / latency.txt
Last active July 17, 2024 03:12
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
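
One of these rows can be sanity-checked with back-of-the-envelope arithmetic: 1K bytes is 8,000 bits, and at 1 Gbps that takes 8 us of serialization time, which the table rounds up to ~10 us.

bits = 1_000 * 8        # 1K bytes as bits
link_bps = 1e9          # 1 Gbps link
print(bits / link_bps * 1e6, "us")  # 8.0 us, close to the table's ~10 us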