Here are benchmarks for combinations of the following changes (& master @ this commit):
- resampling fix: ggerganov/llama.cpp#7424
- grammar-fast (codepoints caching): ggerganov/llama.cpp#6811
- early exit: ggerganov/llama.cpp#7370
Results: the resampling fix is by far the most useful change (it may be undoing the regression of ggerganov/llama.cpp#4306 that was likely introduced in ggerganov/llama.cpp#6240; up to ~2x faster overall inference w/ Phi-2), and codepoint caching adds a further small gain on top of it. The early exit might need extra confirmation, but it seems to add nothing on top of the other two.
git clone https://github.com/ochafik/llama.cpp llama.cpp-ochafik
cd llama.cpp-ochafik
Test results below are from an M3 Pro (36 GB, 18 GPU cores):
hyperfine --warmup 1 --runs 10 \
-L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
--setup 'git checkout {branch} && \
make clean && \
make -j LLAMA_CURL=1 main' \
'BRANCH={branch} \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '"'"'{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}'"'"' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt'
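A note on the `'"'"'` sequences in the command above: the whole benchmarked command is passed to hyperfine as one single-quoted string, and a single quote cannot appear inside single quotes. The sequence closes the string, adds a double-quoted `'`, and reopens the string. A minimal sketch, using a shortened, hypothetical stand-in for the real command:

```sh
# Hypothetical, shortened stand-in for the benchmarked command string.
# '"'"' = close the single quotes, emit a double-quoted ', reopen the single quotes.
cmd='./main -j '"'"'{"type": "number"}'"'"' -p "hi"'

printf '%s\n' "$cmd"
# prints: ./main -j '{"type": "number"}' -p "hi"
```

This is why the schema shows up with plain single quotes in the hyperfine output.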
Show output
Benchmark 1: BRANCH=grammar-fast-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.889 s ± 0.058 s [User: 0.297 s, System: 0.100 s]
Range (min … max): 1.809 s … 1.969 s 10 runs
Benchmark 2: BRANCH=grammar-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.932 s ± 0.067 s [User: 0.303 s, System: 0.101 s]
Range (min … max): 1.821 s … 2.055 s 10 runs
Benchmark 3: BRANCH=grammar-fast \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 2.112 s ± 0.127 s [User: 0.378 s, System: 0.121 s]
Range (min … max): 2.005 s … 2.436 s 10 runs
Benchmark 4: BRANCH=grammar-fast-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.895 s ± 0.118 s [User: 0.256 s, System: 0.111 s]
Range (min … max): 1.766 s … 2.109 s 10 runs
Benchmark 5: BRANCH=grammar-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.913 s ± 0.062 s [User: 0.303 s, System: 0.120 s]
Range (min … max): 1.854 s … 2.074 s 10 runs
Benchmark 6: BRANCH=grammars-early-exit \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 3.751 s ± 0.200 s [User: 1.893 s, System: 0.159 s]
Range (min … max): 3.558 s … 4.221 s 10 runs
Benchmark 7: BRANCH=master \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 3.619 s ± 0.075 s [User: 1.882 s, System: 0.148 s]
Range (min … max): 3.561 s … 3.737 s 10 runs
Summary
'BRANCH=grammar-fast-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt' ran
1.00 ± 0.07 times faster than 'BRANCH=grammar-fast-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt'
1.01 ± 0.05 times faster than 'BRANCH=grammar-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt'
1.02 ± 0.05 times faster than 'BRANCH=grammar-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt'
1.12 ± 0.08 times faster than 'BRANCH=grammar-fast \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt'
1.92 ± 0.07 times faster than 'BRANCH=master \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt'
1.99 ± 0.12 times faster than 'BRANCH=grammars-early-exit \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
-p "JSON list of 50 integers starting from 100000" \
--seed 12345 --no-display-prompt'
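The "N ± M times faster" figures in the summary can be re-derived from the per-branch means and standard deviations. The sketch below assumes the uncertainty comes from first-order error propagation of a ratio, which matches the numbers printed here; `relative_speed` is a hypothetical helper name, not part of hyperfine.

```sh
# relative_speed FAST_MEAN FAST_SD SLOW_MEAN SLOW_SD
# Prints how many times faster the first run is than the second, with an
# uncertainty assuming first-order error propagation for a ratio.
relative_speed() {
  awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
    ratio = m2 / m1
    err = ratio * sqrt((s1 / m1) ^ 2 + (s2 / m2) ^ 2)
    printf "%.2f +/- %.2f\n", ratio, err
  }'
}

# grammar-fast-resampling-early (1.889 s ± 0.058 s) vs master (3.619 s ± 0.075 s)
relative_speed 1.889 0.058 3.619 0.075
# prints: 1.92 +/- 0.07
```

Because the uncertainty scales with both relative deviations, ratios such as 1.00 ± 0.07 versus 1.02 ± 0.05 are not distinguishable at this number of runs.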
( export COMMON_ARGS=(
-mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf
-m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf
--prompt-cache issue4218.bin
--grammar-file issue4218.gbnf
-f issue4218.txt
-c 3400
) && \
hyperfine --warmup 1 --runs 10 \
-L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
--setup "\
git checkout {branch} && \
make clean && make -j LLAMA_CURL=1 main && \
rm -f issue4218.bin && \
./main ${COMMON_ARGS[*]} -n 1" \
"BRANCH={branch} \
./main ${COMMON_ARGS[*]} -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt" )
Show output
Benchmark 1: BRANCH=grammar-fast-resampling-early ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
Time (mean ± σ): 4.852 s ± 0.090 s [User: 0.450 s, System: 0.209 s]
Range (min … max): 4.737 s … 4.987 s 10 runs
Benchmark 2: BRANCH=grammar-resampling ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
Time (mean ± σ): 4.877 s ± 0.067 s [User: 0.549 s, System: 0.245 s]
Range (min … max): 4.777 s … 4.995 s 10 runs
Benchmark 3: BRANCH=grammar-fast ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
Time (mean ± σ): 7.498 s ± 0.053 s [User: 3.444 s, System: 0.287 s]
Range (min … max): 7.461 s … 7.642 s 10 runs
Benchmark 4: BRANCH=grammar-fast-resampling ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
Time (mean ± σ): 4.685 s ± 0.048 s [User: 0.485 s, System: 0.205 s]
Range (min … max): 4.633 s … 4.778 s 10 runs
Benchmark 5: BRANCH=grammar-resampling-early ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
Time (mean ± σ): 4.605 s ± 0.064 s [User: 0.442 s, System: 0.231 s]
Range (min … max): 4.486 s … 4.729 s 10 runs
Benchmark 6: BRANCH=grammars-early-exit ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
Time (mean ± σ): 5.725 s ± 0.082 s [User: 1.718 s, System: 0.310 s]
Range (min … max): 5.643 s … 5.901 s 10 runs
Benchmark 7: BRANCH=master ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
Time (mean ± σ): 7.995 s ± 0.098 s [User: 3.847 s, System: 0.305 s]
Range (min … max): 7.885 s … 8.169 s 10 runs
Summary
'BRANCH=grammar-resampling-early ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt' ran
1.02 ± 0.02 times faster than 'BRANCH=grammar-fast-resampling ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
1.05 ± 0.02 times faster than 'BRANCH=grammar-fast-resampling-early ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
1.06 ± 0.02 times faster than 'BRANCH=grammar-resampling ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
1.24 ± 0.02 times faster than 'BRANCH=grammars-early-exit ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
1.63 ± 0.03 times faster than 'BRANCH=grammar-fast ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
1.74 ± 0.03 times faster than 'BRANCH=master ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
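Since each result repeats the full command line, a small filter can condense saved hyperfine output to one line per branch. This is only a sketch: `summarize_hyperfine` is a hypothetical helper, and it assumes the text layout shown in this comment (a `Benchmark N: BRANCH=...` line followed by a `Time (mean ± σ):` line).

```sh
# Reads hyperfine's text output on stdin and prints "<branch> <mean seconds>".
summarize_hyperfine() {
  awk '
    /^Benchmark [0-9]+: BRANCH=/ {
      branch = $3                   # third field is BRANCH=<name>
      sub(/^BRANCH=/, "", branch)
    }
    /Time \(mean/ {
      # The mean is the first field after the one ending in "):"
      for (i = 1; i <= NF; i++)
        if ($i ~ /\):$/) { print branch, $(i + 1); break }
    }
  '
}

# Usage with a hypothetical saved log: summarize_hyperfine < hyperfine-output.txt
```

For the Hermes-2-Pro run above this would list, for example, `grammar-resampling-early 4.605` and `master 7.995`.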
hyperfine --warmup 1 --runs 10 \
-L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
--setup 'git checkout {branch} && \
make clean && \
make -j LLAMA_CURL=1 main' \
'BRANCH={branch} \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt'
Show output
Benchmark 1: BRANCH=grammar-fast-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 3.342 s ± 0.055 s [User: 1.073 s, System: 0.153 s]
Range (min … max): 3.280 s … 3.479 s 10 runs
Benchmark 2: BRANCH=grammar-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 3.379 s ± 0.083 s [User: 1.074 s, System: 0.152 s]
Range (min … max): 3.312 s … 3.549 s 10 runs
Benchmark 3: BRANCH=grammar-fast \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 3.584 s ± 0.196 s [User: 0.602 s, System: 0.220 s]
Range (min … max): 3.435 s … 4.082 s 10 runs
Benchmark 4: BRANCH=grammar-fast-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 2.579 s ± 0.029 s [User: 0.345 s, System: 0.142 s]
Range (min … max): 2.549 s … 2.637 s 10 runs
Benchmark 5: BRANCH=grammar-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 3.373 s ± 0.061 s [User: 1.074 s, System: 0.148 s]
Range (min … max): 3.316 s … 3.482 s 10 runs
Benchmark 6: BRANCH=grammars-early-exit \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 6.659 s ± 0.081 s [User: 3.621 s, System: 0.276 s]
Range (min … max): 6.575 s … 6.804 s 10 runs
Benchmark 7: BRANCH=master \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt
Time (mean ± σ): 6.655 s ± 0.054 s [User: 3.618 s, System: 0.289 s]
Range (min … max): 6.589 s … 6.786 s 10 runs
Summary
'BRANCH=grammar-fast-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt' ran
1.30 ± 0.03 times faster than 'BRANCH=grammar-fast-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt'
1.31 ± 0.03 times faster than 'BRANCH=grammar-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt'
1.31 ± 0.04 times faster than 'BRANCH=grammar-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt'
1.39 ± 0.08 times faster than 'BRANCH=grammar-fast \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt'
2.58 ± 0.04 times faster than 'BRANCH=master \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt'
2.58 ± 0.04 times faster than 'BRANCH=grammars-early-exit \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file tsconfig.gbnf \
-p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
--seed 12345 --no-display-prompt'
hyperfine \
--warmup 1 --runs 10 \
-L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
--setup 'git checkout {branch} && make clean && make -j LLAMA_CURL=1 main' \
'BRANCH={branch} \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344'
Show output
Benchmark 1: BRANCH=grammar-fast-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344
Time (mean ± σ): 1.055 s ± 0.041 s [User: 0.251 s, System: 0.077 s]
Range (min … max): 1.017 s … 1.153 s 10 runs
Benchmark 2: BRANCH=grammar-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344
Time (mean ± σ): 1.069 s ± 0.071 s [User: 0.245 s, System: 0.071 s]
Range (min … max): 1.007 s … 1.213 s 10 runs
Benchmark 3: BRANCH=grammar-fast \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344
Time (mean ± σ): 1.055 s ± 0.066 s [User: 0.215 s, System: 0.085 s]
Range (min … max): 0.996 s … 1.222 s 10 runs
Benchmark 4: BRANCH=grammar-fast-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344
Time (mean ± σ): 1.003 s ± 0.042 s [User: 0.184 s, System: 0.090 s]
Range (min … max): 0.949 s … 1.106 s 10 runs
Benchmark 5: BRANCH=grammar-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344
Time (mean ± σ): 1.052 s ± 0.062 s [User: 0.239 s, System: 0.091 s]
Range (min … max): 1.008 s … 1.219 s 10 runs
Benchmark 6: BRANCH=grammars-early-exit \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344
Time (mean ± σ): 1.605 s ± 0.067 s [User: 0.718 s, System: 0.102 s]
Range (min … max): 1.528 s … 1.706 s 10 runs
Benchmark 7: BRANCH=master \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344
Time (mean ± σ): 1.578 s ± 0.049 s [User: 0.718 s, System: 0.099 s]
Range (min … max): 1.533 s … 1.709 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
'BRANCH=grammar-fast-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344' ran
1.05 ± 0.08 times faster than 'BRANCH=grammar-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344'
1.05 ± 0.08 times faster than 'BRANCH=grammar-fast \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344'
1.05 ± 0.06 times faster than 'BRANCH=grammar-fast-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344'
1.07 ± 0.08 times faster than 'BRANCH=grammar-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344'
1.57 ± 0.08 times faster than 'BRANCH=master \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344'
1.60 ± 0.09 times faster than 'BRANCH=grammars-early-exit \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
--grammar-file json_numbers.grammar \
-p "List of 20 integers starting from 0" \
--seed 12344'
hyperfine --warmup 1 --runs 10 \
-L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
--setup 'git checkout {branch} && \
make clean && \
make -j LLAMA_CURL=1 main' \
'BRANCH={branch} \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '"'"'{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}'"'"' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt'
Show output
Benchmark 1: BRANCH=grammar-fast-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.395 s ± 0.079 s [User: 0.337 s, System: 0.088 s]
Range (min … max): 1.330 s … 1.608 s 10 runs
Benchmark 2: BRANCH=grammar-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.373 s ± 0.039 s [User: 0.339 s, System: 0.092 s]
Range (min … max): 1.332 s … 1.457 s 10 runs
Benchmark 3: BRANCH=grammar-fast \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.291 s ± 0.120 s [User: 0.209 s, System: 0.100 s]
Range (min … max): 1.203 s … 1.571 s 10 runs
Benchmark 4: BRANCH=grammar-fast-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.196 s ± 0.022 s [User: 0.188 s, System: 0.087 s]
Range (min … max): 1.167 s … 1.246 s 10 runs
Benchmark 5: BRANCH=grammar-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.417 s ± 0.092 s [User: 0.349 s, System: 0.087 s]
Range (min … max): 1.339 s … 1.556 s 10 runs
Benchmark 6: BRANCH=grammars-early-exit \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.804 s ± 0.131 s [User: 0.639 s, System: 0.109 s]
Range (min … max): 1.646 s … 2.016 s 10 runs
Benchmark 7: BRANCH=master \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
Time (mean ± σ): 1.682 s ± 0.060 s [User: 0.629 s, System: 0.099 s]
Range (min … max): 1.618 s … 1.792 s 10 runs
Summary
BRANCH=grammar-fast-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt ran
1.08 ± 0.10 times faster than BRANCH=grammar-fast \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
1.15 ± 0.04 times faster than BRANCH=grammar-resampling \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
1.17 ± 0.07 times faster than BRANCH=grammar-fast-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
1.18 ± 0.08 times faster than BRANCH=grammar-resampling-early \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
1.41 ± 0.06 times faster than BRANCH=master \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt
1.51 ± 0.11 times faster than BRANCH=grammars-early-exit \
./main \
-mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
-j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
-p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
--seed 12345 --no-display-prompt