Artefact2 / README.md
Last active April 30, 2024 17:18
GGUF quantizations overview

Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggerganov/llama.cpp#5962

In the meantime, use the largest quantization that fits entirely in your GPU's memory. If you can comfortably fit Q4_K_S, try using a model with more parameters.
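As a rough illustration of that rule of thumb, here is a minimal Python sketch that picks the largest file fitting in a given memory budget. The function name `pick_quant`, the `headroom_gib` parameter, and the file sizes in the example are all hypothetical, made up for illustration (they are not measurements of any real model). The headroom stands in for context and other runtime overhead, which this sketch does not attempt to estimate.

```python
def pick_quant(file_sizes_gib, vram_gib, headroom_gib=1.0):
    """Return the name of the largest quantization that fits, or None.

    file_sizes_gib: dict mapping a quantization name to its file size in GiB.
    vram_gib: total GPU memory in GiB.
    headroom_gib: memory kept free for context and overhead (an assumed value).
    """
    budget = vram_gib - headroom_gib
    # Keep only the files that fit entirely within the budget.
    fitting = {name: size for name, size in file_sizes_gib.items() if size <= budget}
    if not fitting:
        return None
    # Among the files that fit, prefer the largest one.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    # Illustrative sizes only, not taken from any real model.
    sizes = {"Q2_K": 2.5, "Q4_K_S": 3.9, "Q5_K_M": 4.8, "Q8_0": 7.2}
    print(pick_quant(sizes, vram_gib=6.0))
```

If the result is well above Q4_K_S with memory to spare, the advice above suggests trying a model with more parameters rather than a larger quantization of the same one.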

llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix