
## README.md

GGUF quantizations overview

Artefact2 / README.md, last active June 25, 2024 19:00

### Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggerganov/llama.cpp#5962
In the meantime, use the largest quant that fully fits in your GPU's memory. If you can comfortably fit Q4_K_S, try a model with more parameters.
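The rule of thumb above can be sketched as a small selection helper. This is a minimal illustration under stated assumptions, not part of llama.cpp: it works from the GGUF file sizes listed on a model's download page, treats "fully/comfortably fits" as leaving a hypothetical 1 GiB of VRAM headroom for the KV cache and compute buffers, and reads the second sentence as "prefer the model with the most parameters whose Q4_K_S still fits, then take its largest quant that fits". `Model`, `fits`, `largest_fitting_quant` and `pick` are hypothetical names, and the file sizes in the usage lines are made up.

```python
from __future__ import annotations

from dataclasses import dataclass

MB = 10**6
GIB = 1024**3

# Assumed headroom for the KV cache, compute buffers and driver overhead.
# The real amount depends on context length and backend; tune it yourself.
DEFAULT_HEADROOM = 1 * GIB


@dataclass
class Model:
    name: str
    params_billions: float
    # GGUF file size in bytes for each available quant, e.g. "Q4_K_S".
    quant_sizes: dict[str, int]


def fits(size: int, vram: int, headroom: int) -> bool:
    """A file 'fully fits' if it still leaves `headroom` bytes of VRAM free."""
    return size + headroom <= vram


def largest_fitting_quant(model: Model, vram: int, headroom: int) -> str | None:
    """Largest quant (by file size) of `model` that fully fits, or None."""
    fitting = [(size, q) for q, size in model.quant_sizes.items()
               if fits(size, vram, headroom)]
    return max(fitting)[1] if fitting else None


def pick(models: list[Model], vram: int, headroom: int = DEFAULT_HEADROOM,
         floor: str = "Q4_K_S") -> tuple[str, str] | None:
    """Prefer the model with the most parameters whose `floor` quant fits,
    then use the largest quant of that model that fits."""
    for model in sorted(models, key=lambda m: m.params_billions, reverse=True):
        floor_size = model.quant_sizes.get(floor)
        if floor_size is not None and fits(floor_size, vram, headroom):
            # The floor quant fits, so a fitting quant always exists here.
            return model.name, largest_fitting_quant(model, vram, headroom)
    # No model fits its floor quant: fall back to the smallest model.
    smallest = min(models, key=lambda m: m.params_billions)
    quant = largest_fitting_quant(smallest, vram, headroom)
    return (smallest.name, quant) if quant else None


# Made-up file sizes, for illustration only.
models = [
    Model("7B", 7, {"Q8_0": 7_700 * MB, "Q6_K": 5_900 * MB,
                    "Q4_K_S": 4_100 * MB}),
    Model("13B", 13, {"Q8_0": 13_800 * MB, "Q6_K": 10_700 * MB,
                      "Q5_K_M": 9_200 * MB, "Q4_K_S": 7_400 * MB}),
]
print(pick(models, vram=12 * 10**9))  # ('13B', 'Q6_K')
print(pick(models, vram=8 * 10**9))   # ('7B', 'Q6_K')
print(pick(models, vram=4 * 10**9))   # None
```

Working from actual file sizes avoids hard-coding per-quant bits-per-weight figures. The headroom you really need grows with context length, since the KV cache does.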
### llama.cpp feature matrix

See the upstream wiki: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix