@thesven
Created May 16, 2024 21:31
  1. llama.cpp
    • git clone git@github.com:ggerganov/llama.cpp.git
    • pip install -r llama.cpp/requirements.txt
    • Compile llama.cpp:
      a) cd llama.cpp
      b) make
  2. wikiextractor (used to extract plain text from Wikipedia dumps, which serves as calibration data for the imatrix step)
  3. Download the HF model (see download script)
  4. Convert the HF model to GGUF format
    • python llama.cpp/convert.py {hf_model} --outfile {gguf_model} --outtype fp16
  5. [Only needed for IQ-style quantization] Extract the raw text for building the imatrix file
    • python ./wikiextractor/WikiExtractor.py -o extracted enwiki-latest-pages-articles.xml.bz2
  6. [Only needed for IQ-style quantization] Generate the imatrix file
    • ./llama.cpp/imatrix -m {gguf_model} -f wiki.train.raw -o imatrix_{gguf_model}.dat --chunks 100
  7. Quantize
    • [IQ] ./llama.cpp/quantize --imatrix imatrix_{gguf_model}.dat {gguf_model} quantized_model.gguf iq2_xxs
    • [legacy] ./llama.cpp/quantize {gguf_model} quantized_model_Q4_K_M.gguf Q4_K_M
  8. Upload to HF (see upload script)
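Steps 4 through 7 can be sketched as a single script. All model names and paths below are illustrative placeholders (not from the gist), and the script dry-runs by default so the commands can be inspected before anything heavy executes:

```shell
#!/usr/bin/env bash
# Sketch of steps 4-7; HF_MODEL is a placeholder, not a real path from the gist.
# Set RUN=1 to actually execute the conversion/quantization commands.

HF_MODEL="models/Mistral-7B"                 # downloaded HF model directory (assumption)
GGUF_MODEL="${HF_MODEL##*/}.fp16.gguf"       # derived GGUF filename
IMATRIX="imatrix_${GGUF_MODEL}.dat"          # derived imatrix filename

if [ "${RUN:-0}" = "1" ]; then
  # 4. Convert the HF model to an fp16 GGUF file
  python llama.cpp/convert.py "$HF_MODEL" --outfile "$GGUF_MODEL" --outtype fp16

  # 6. Build the importance matrix from the calibration text
  ./llama.cpp/imatrix -m "$GGUF_MODEL" -f wiki.train.raw -o "$IMATRIX" --chunks 100

  # 7a. IQ-style quantization (requires the imatrix)
  ./llama.cpp/quantize --imatrix "$IMATRIX" "$GGUF_MODEL" quantized_model.gguf iq2_xxs

  # 7b. Legacy k-quant quantization (no imatrix needed)
  ./llama.cpp/quantize "$GGUF_MODEL" quantized_model_Q4_K_M.gguf Q4_K_M
else
  echo "dry run: would convert $HF_MODEL -> $GGUF_MODEL, then quantize using $IMATRIX"
fi
```

Note that the `imatrix` and `quantize` binaries only exist after the `make` in step 1.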
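The download and upload scripts referenced in steps 3 and 8 are not included in this gist. A minimal stand-in using `huggingface-cli` might look like the following; both repo ids are placeholders, and the script dry-runs by default (upload additionally requires a logged-in HF token):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the gist's download/upload scripts; repo ids are placeholders.
# Set RUN=1 to actually execute (requires huggingface-cli, i.e. pip install -U huggingface_hub).

SRC_REPO="mistralai/Mistral-7B-v0.1"        # model to download (assumption)
DEST_REPO="your-user/your-quantized-model"  # your target repo (assumption)
LOCAL_DIR="models/${SRC_REPO##*/}"          # local directory derived from the repo name

if [ "${RUN:-0}" = "1" ]; then
  # 3. Download the full-precision model into a local directory
  huggingface-cli download "$SRC_REPO" --local-dir "$LOCAL_DIR"

  # 8. Upload the quantized GGUF file to your own repo
  huggingface-cli upload "$DEST_REPO" quantized_model.gguf
else
  echo "dry run: would download $SRC_REPO to $LOCAL_DIR and upload quantized_model.gguf to $DEST_REPO"
fi
```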