@thesven
Created May 16, 2024 21:31
  1. llama.cpp
    • git clone git@github.com:ggerganov/llama.cpp.git
    • pip install -r llama.cpp/requirements.txt
    • Compile llama.cpp:
      a) cd llama.cpp
      b) make
  2. wikiextractor (used to extract plain text from Wikipedia dumps, which serves as calibration data for the imatrix step)
  3. Download the HF model (see download script)
  4. Convert the HF model to GGUF format
    • python llama.cpp/convert.py {hf_model} --outfile {gguf_model} --outtype fp16
  5. [Only needed for IQ-style quantization] Extract the raw text for building the imatrix file
    • python ./wikiextractor/WikiExtractor.py -o extracted enwiki-latest-pages-articles.xml.bz2
  6. [Only needed for IQ-style quantization] Generate the imatrix file
    • ./llama.cpp/imatrix -m {gguf_model} -f wiki.train.raw -o imatrix_{gguf_model}.dat --chunks 100
  7. Quantize
    • [IQ] ./llama.cpp/quantize --imatrix imatrix_{gguf_model}.dat {gguf_model} quantized_model.gguf iq2_xxs
    • [legacy] ./llama.cpp/quantize {gguf_model} quantized_model_Q4_K_M.gguf Q4_K_M
  8. Upload to HF (see upload script)
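Steps 4 through 7 can be sketched as a single script. All model names and paths below are illustrative placeholders (not from the gist), and the script dry-runs by default so the commands can be inspected before anything heavy executes:

```shell
#!/usr/bin/env bash
# Sketch of steps 4-7; HF_MODEL is a placeholder, not a real path from the gist.
# Set RUN=1 to actually execute the conversion/quantization commands.

HF_MODEL="models/Mistral-7B"                 # downloaded HF model directory (assumption)
GGUF_MODEL="${HF_MODEL##*/}.fp16.gguf"       # derived GGUF filename
IMATRIX="imatrix_${GGUF_MODEL}.dat"          # derived imatrix filename

if [ "${RUN:-0}" = "1" ]; then
  # 4. Convert the HF model to an fp16 GGUF file
  python llama.cpp/convert.py "$HF_MODEL" --outfile "$GGUF_MODEL" --outtype fp16

  # 6. Build the importance matrix from the calibration text
  ./llama.cpp/imatrix -m "$GGUF_MODEL" -f wiki.train.raw -o "$IMATRIX" --chunks 100

  # 7a. IQ-style quantization (requires the imatrix)
  ./llama.cpp/quantize --imatrix "$IMATRIX" "$GGUF_MODEL" quantized_model.gguf iq2_xxs

  # 7b. Legacy k-quant quantization (no imatrix needed)
  ./llama.cpp/quantize "$GGUF_MODEL" quantized_model_Q4_K_M.gguf Q4_K_M
else
  echo "dry run: would convert $HF_MODEL -> $GGUF_MODEL, then quantize using $IMATRIX"
fi
```

Note that the `imatrix` and `quantize` binaries only exist after the `make` in step 1.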
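The download and upload scripts referenced in steps 3 and 8 are not included in this gist. A minimal stand-in using `huggingface-cli` might look like the following; both repo ids are placeholders, and the script dry-runs by default (upload additionally requires a logged-in HF token):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the gist's download/upload scripts; repo ids are placeholders.
# Set RUN=1 to actually execute (requires huggingface-cli, i.e. pip install -U huggingface_hub).

SRC_REPO="mistralai/Mistral-7B-v0.1"        # model to download (assumption)
DEST_REPO="your-user/your-quantized-model"  # your target repo (assumption)
LOCAL_DIR="models/${SRC_REPO##*/}"          # local directory derived from the repo name

if [ "${RUN:-0}" = "1" ]; then
  # 3. Download the full-precision model into a local directory
  huggingface-cli download "$SRC_REPO" --local-dir "$LOCAL_DIR"

  # 8. Upload the quantized GGUF file to your own repo
  huggingface-cli upload "$DEST_REPO" quantized_model.gguf
else
  echo "dry run: would download $SRC_REPO to $LOCAL_DIR and upload quantized_model.gguf to $DEST_REPO"
fi
```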