@Artefact2
Artefact2 / README.md
Last active July 13, 2024 04:58
GGUF quantizations overview

Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggerganov/llama.cpp#5962

In the meantime, use the largest quantization that fully fits in your GPU's VRAM. If you can comfortably fit Q4_K_S, try a model with more parameters instead.
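
As a rough rule of thumb in code form: a quant fits when the GGUF file plus some headroom for the KV cache and compute buffers stays under free VRAM. A minimal sketch, assuming a flat 20% headroom (that figure and the file name are illustrative assumptions, not llama.cpp behavior):

import os

def fits_in_vram(gguf_path, free_vram_bytes, headroom=0.20):
    # Model weights plus a flat headroom fraction for the KV cache and
    # compute buffers. The 20% figure is an assumption; real usage
    # depends on context length, batch size, and offload settings.
    model_bytes = os.path.getsize(gguf_path)
    return model_bytes * (1 + headroom) <= free_vram_bytes

# Hypothetical example on a 24 GB card:
# fits_in_vram("model.Q4_K_S.gguf", 24 * 1024**3)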

llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

@mattiasarro
mattiasarro / rwkv.py
Last active May 27, 2024 09:17
RWKV MVP
# Taken from https://johanwind.github.io/2023/03/23/rwkv_details.html.
# I've added additional comments and restructured it a tiny bit, which makes it clearer for me.
import numpy as np
from torch import load as torch_load # Only for loading the model weights
from tokenizers import Tokenizer
exp = np.exp
layer_norm = lambda x, w, b : (x - np.mean(x)) / np.std(x) * w + b
sigmoid = lambda x : 1/(1 + exp(-x))
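
For context, here is how these helpers get used one step downstream: a minimal sketch of the channel-mixing block from the same blog post. The parameter names (mix_k, mix_r, Wk, Wr, Wv) follow that post, not necessarily this gist, and the code relies on the np and sigmoid definitions above.

def channel_mixing(x, last_x, mix_k, mix_r, Wk, Wr, Wv):
    # "Token shift": interpolate between the current and previous token.
    k = Wk @ (x * mix_k + last_x * (1 - mix_k))
    r = Wr @ (x * mix_r + last_x * (1 - mix_r))
    # Squared-ReLU feed-forward, gated by a sigmoid "receptance".
    vk = Wv @ np.maximum(k, 0) ** 2
    return sigmoid(r) * vk, x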
@zcourts
zcourts / tmux cheat sheet
Last active April 30, 2024 10:53
tmux cheat sheet image from DuckDuckGo UI
https://duckduckgo.com/?q=tmux+cheat+sheet&atb=v47-1_x&ia=cheatsheet&iax=1
@kevin-smets
kevin-smets / iterm2-solarized.md
Last active July 15, 2024 21:14
iTerm2 + Oh My Zsh + Solarized color scheme + Source Code Pro Powerline + Font Awesome + [Powerlevel10k] - (macOS)

[Screenshots: the Default prompt and the Powerlevel10k prompt]

@jboner
jboner / latency.txt
Last active July 17, 2024 03:12
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
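
One of these rows can be sanity-checked with back-of-the-envelope arithmetic: 1K bytes is 8,000 bits, and at 1 Gbps that takes 8 us of serialization time, which the table rounds up to ~10 us.

bits = 1_000 * 8        # 1K bytes as bits
link_bps = 1e9          # 1 Gbps link
print(bits / link_bps * 1e6, "us")  # 8.0 us, close to the table's ~10 us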