I've tried to make the script as robust as possible, but use at your own risk, no warranties given, etc.
- Set the environment variable LLAMA_CPP_DIR to wherever you've checked out https://github.com/ggerganov/llama.cpp
- Make sure it contains the convert-llama-ggml-to-gguf.py script, and that you've installed all necessary requirements to run it.
- Locate the original model repo on huggingface.co and copy its URL. For example, if you downloaded a model quantized by TheBloke, check the model card for the link to the original model (unless it's one of TheBloke's own models). The important thing is that the repo URL you use contains a config.json with more than one line of text.
- Run the script with the GGML file you want to convert as the first argument and the URL as the second. If everything is set up right, it will create a GGUF file next to the input file.
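To check up front that a repo URL qualifies (i.e. its config.json has more than one line of text), you can fetch the file directly. This is just a hedged sanity check, not part of the script; `resolve/main` is Hugging Face's raw-file download path:

```shell
# Count the lines in the repo's config.json; a real config has many lines,
# while a placeholder or LFS pointer file typically has only one.
curl -sL https://huggingface.co/Gryphe/MythoMax-L2-13b/resolve/main/config.json | wc -l
```

If this prints 1 (or the download fails), pick a different repo URL for the original model.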
export LLAMA_CPP_DIR=../../llama.cpp
./conv-ggmlv3.sh mythomax-l2-13b.ggmlv3.q5_K_M.bin https://huggingface.co/Gryphe/MythoMax-L2-13b