@ochafik
Last active May 21, 2024 16:29
llama.cpp resampling fix test

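Here are benchmarks for combinations of three changes (the resampling fix, codepoint caching, and the early exit), with master as the baseline. Each combination lives on one of the branches listed in the commands below.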

Results: the resampling fix is by far the most useful change. It might be undoing a regression of ggerganov/llama.cpp#4306 that was likely introduced in ggerganov/llama.cpp#6240, and it gives roughly 2x faster overall inference with Phi-2. Codepoint caching is also a bit useful on top of it. The early exit might need extra confirmation, but it seems to add nothing on top of the others.
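As a sanity check on the ~2x figure, the ratio and its uncertainty can be recomputed from the means and standard deviations in the output further down. This is only a sketch: it assumes first-order error propagation for a ratio of two independent measurements, and `speedup` is a hypothetical helper name, not part of hyperfine or llama.cpp. The sample inputs are the master and grammar-fast-resampling-early timings from the first Phi-2 benchmark.

```shell
# speedup SLOW_MEAN SLOW_SD FAST_MEAN FAST_SD
# Prints "ratio +/- error". Assumed formula (not taken from the gist):
#   error = ratio * sqrt((sd_slow / slow)^2 + (sd_fast / fast)^2)
speedup() {
  awk -v slow_m="$1" -v slow_sd="$2" -v fast_m="$3" -v fast_sd="$4" 'BEGIN {
    ratio    = slow_m / fast_m
    rel_slow = slow_sd / slow_m
    rel_fast = fast_sd / fast_m
    printf "%.2f +/- %.2f\n", ratio, ratio * sqrt(rel_slow * rel_slow + rel_fast * rel_fast)
  }'
}

# master (3.619 s +/- 0.075 s) vs grammar-fast-resampling-early (1.889 s +/- 0.058 s)
speedup 3.619 0.075 1.889 0.058
```

With those inputs the result agrees with the 1.92 ± 0.07 that the hyperfine summary reports for that run.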

git clone https://github.com/ochafik/llama.cpp llama.cpp-ochafik
cd llama.cpp-ochafik

The test results below are from an M3 Pro (36 GB, 18 GPU cores).

hyperfine --warmup 1 --runs 10 \
    -L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
    --setup 'git checkout {branch} && \
             make clean && \
             make -j LLAMA_CURL=1 main' \
    'BRANCH={branch} \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '"'"'{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}'"'"' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt'
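The `'"'"'` sequences in the command above embed a literal single quote inside a single-quoted shell string: close the quote, add a double-quoted `'`, then reopen the quote. The sketch below shows what the inner shell ends up seeing; the variable names and the shortened schema are illustrative only.

```shell
# The outer string is single-quoted, as when it is handed to hyperfine;
# each '"'"' contributes exactly one literal single quote.
cmd='printf "%s\n" '"'"'{"items": {"type": "number"}}'"'"

# Show the command string as the inner shell will receive it
printf '%s\n' "$cmd"

# Evaluating it shows the JSON schema arriving intact as a single argument
schema=$(eval "$cmd")
printf '%s\n' "$schema"
```

The first line printed is the command with the JSON wrapped in plain single quotes, and the second is the JSON itself. That matches the way the `-j` argument appears in the benchmark headers of the output.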
Output:
Benchmark 1: BRANCH=grammar-fast-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.889 s ±  0.058 s    [User: 0.297 s, System: 0.100 s]
  Range (min … max):    1.809 s …  1.969 s    10 runs
 
Benchmark 2: BRANCH=grammar-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.932 s ±  0.067 s    [User: 0.303 s, System: 0.101 s]
  Range (min … max):    1.821 s …  2.055 s    10 runs
 
Benchmark 3: BRANCH=grammar-fast \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      2.112 s ±  0.127 s    [User: 0.378 s, System: 0.121 s]
  Range (min … max):    2.005 s …  2.436 s    10 runs
 
Benchmark 4: BRANCH=grammar-fast-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.895 s ±  0.118 s    [User: 0.256 s, System: 0.111 s]
  Range (min … max):    1.766 s …  2.109 s    10 runs
 
Benchmark 5: BRANCH=grammar-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.913 s ±  0.062 s    [User: 0.303 s, System: 0.120 s]
  Range (min … max):    1.854 s …  2.074 s    10 runs
 
Benchmark 6: BRANCH=grammars-early-exit \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      3.751 s ±  0.200 s    [User: 1.893 s, System: 0.159 s]
  Range (min … max):    3.558 s …  4.221 s    10 runs
 
Benchmark 7: BRANCH=master \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      3.619 s ±  0.075 s    [User: 1.882 s, System: 0.148 s]
  Range (min … max):    3.561 s …  3.737 s    10 runs
 
Summary
  'BRANCH=grammar-fast-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt' ran
    1.00 ± 0.07 times faster than 'BRANCH=grammar-fast-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt'
    1.01 ± 0.05 times faster than 'BRANCH=grammar-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt'
    1.02 ± 0.05 times faster than 'BRANCH=grammar-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt'
    1.12 ± 0.08 times faster than 'BRANCH=grammar-fast \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt'
    1.92 ± 0.07 times faster than 'BRANCH=master \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt'
    1.99 ± 0.12 times faster than 'BRANCH=grammars-early-exit \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"items": {"type": "number"}, "minItems": 10, "maxItems": 100}' \
            -p "JSON list of 50 integers starting from 100000" \
            --seed 12345 --no-display-prompt'
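The verbose output above can be condensed to one line per branch. The following is a rough sketch that assumes the text layout shown here (a `Benchmark N: BRANCH=<name>` line followed later by a `Time (mean ± σ):` line). `summarize` is a hypothetical helper, and the sample input is abbreviated from the run above. When re-running the benchmarks, hyperfine's own `--export-markdown` or `--export-json` options are probably the more robust route.

```shell
# summarize: read hyperfine text output on stdin, print "branch mean_seconds" per benchmark
summarize() {
  awk '
    /^Benchmark [0-9]+: BRANCH=/ {
      # third field looks like "BRANCH=name"; keep only the name
      split($3, kv, "=")
      branch = kv[2]
    }
    $1 == "Time" && branch != "" {
      # fields: Time (mean ± σ): <mean> s ± <sd> s ...
      print branch, $5
      branch = ""
    }
  '
}

summarize <<'EOF'
Benchmark 6: BRANCH=grammars-early-exit \
  Time (mean ± σ):      3.751 s ±  0.200 s    [User: 1.893 s, System: 0.159 s]
  Range (min … max):    3.558 s …  4.221 s    10 runs

Benchmark 7: BRANCH=master \
  Time (mean ± σ):      3.619 s ±  0.075 s    [User: 1.882 s, System: 0.148 s]
  Range (min … max):    3.561 s …  3.737 s    10 runs
EOF
```

For the two sample benchmarks this prints `grammars-early-exit 3.751` and `master 3.619`, which makes it easier to see at a glance that the early exit alone does not beat master here.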
( export COMMON_ARGS=(
    -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf
    -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf
    --prompt-cache issue4218.bin
    --grammar-file issue4218.gbnf
    -f issue4218.txt
    -c 3400
  ) && \
  hyperfine --warmup 1 --runs 10 \
    -L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
    --setup "\
      git checkout {branch} && \
      make clean && make -j LLAMA_CURL=1 main && \
      rm -f issue4218.bin && \
      ./main ${COMMON_ARGS[*]} -n 1" \
    "BRANCH={branch} \
      ./main ${COMMON_ARGS[*]} -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt" )
Output:
Benchmark 1: BRANCH=grammar-fast-resampling-early       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
  Time (mean ± σ):      4.852 s ±  0.090 s    [User: 0.450 s, System: 0.209 s]
  Range (min … max):    4.737 s …  4.987 s    10 runs
 
Benchmark 2: BRANCH=grammar-resampling       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
  Time (mean ± σ):      4.877 s ±  0.067 s    [User: 0.549 s, System: 0.245 s]
  Range (min … max):    4.777 s …  4.995 s    10 runs
 
Benchmark 3: BRANCH=grammar-fast       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
  Time (mean ± σ):      7.498 s ±  0.053 s    [User: 3.444 s, System: 0.287 s]
  Range (min … max):    7.461 s …  7.642 s    10 runs
 
Benchmark 4: BRANCH=grammar-fast-resampling       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
  Time (mean ± σ):      4.685 s ±  0.048 s    [User: 0.485 s, System: 0.205 s]
  Range (min … max):    4.633 s …  4.778 s    10 runs
 
Benchmark 5: BRANCH=grammar-resampling-early       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
  Time (mean ± σ):      4.605 s ±  0.064 s    [User: 0.442 s, System: 0.231 s]
  Range (min … max):    4.486 s …  4.729 s    10 runs
 
Benchmark 6: BRANCH=grammars-early-exit       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
  Time (mean ± σ):      5.725 s ±  0.082 s    [User: 1.718 s, System: 0.310 s]
  Range (min … max):    5.643 s …  5.901 s    10 runs
 
Benchmark 7: BRANCH=master       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt
  Time (mean ± σ):      7.995 s ±  0.098 s    [User: 3.847 s, System: 0.305 s]
  Range (min … max):    7.885 s …  8.169 s    10 runs
 
Summary
  'BRANCH=grammar-resampling-early       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt' ran
    1.02 ± 0.02 times faster than 'BRANCH=grammar-fast-resampling       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
    1.05 ± 0.02 times faster than 'BRANCH=grammar-fast-resampling-early       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
    1.06 ± 0.02 times faster than 'BRANCH=grammar-resampling       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
    1.24 ± 0.02 times faster than 'BRANCH=grammars-early-exit       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
    1.63 ± 0.03 times faster than 'BRANCH=grammar-fast       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
    1.74 ± 0.03 times faster than 'BRANCH=master       ./main -mu https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf -m models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf --prompt-cache issue4218.bin --grammar-file issue4218.gbnf -f issue4218.txt -c 3400 -n 128 --prompt-cache-ro --seed 12345 --no-display-prompt'
hyperfine --warmup 1 --runs 10 \
    -L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
    --setup 'git checkout {branch} && \
             make clean && \
             make -j LLAMA_CURL=1 main' \
    'BRANCH={branch} \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt'
Output:
Benchmark 1: BRANCH=grammar-fast-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      3.342 s ±  0.055 s    [User: 1.073 s, System: 0.153 s]
  Range (min … max):    3.280 s …  3.479 s    10 runs
 
Benchmark 2: BRANCH=grammar-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      3.379 s ±  0.083 s    [User: 1.074 s, System: 0.152 s]
  Range (min … max):    3.312 s …  3.549 s    10 runs
 
Benchmark 3: BRANCH=grammar-fast \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      3.584 s ±  0.196 s    [User: 0.602 s, System: 0.220 s]
  Range (min … max):    3.435 s …  4.082 s    10 runs
 
Benchmark 4: BRANCH=grammar-fast-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      2.579 s ±  0.029 s    [User: 0.345 s, System: 0.142 s]
  Range (min … max):    2.549 s …  2.637 s    10 runs
 
Benchmark 5: BRANCH=grammar-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      3.373 s ±  0.061 s    [User: 1.074 s, System: 0.148 s]
  Range (min … max):    3.316 s …  3.482 s    10 runs
 
Benchmark 6: BRANCH=grammars-early-exit \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      6.659 s ±  0.081 s    [User: 3.621 s, System: 0.276 s]
  Range (min … max):    6.575 s …  6.804 s    10 runs
 
Benchmark 7: BRANCH=master \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      6.655 s ±  0.054 s    [User: 3.618 s, System: 0.289 s]
  Range (min … max):    6.589 s …  6.786 s    10 runs
 
Summary
  'BRANCH=grammar-fast-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt' ran
    1.30 ± 0.03 times faster than 'BRANCH=grammar-fast-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt'
    1.31 ± 0.03 times faster than 'BRANCH=grammar-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt'
    1.31 ± 0.04 times faster than 'BRANCH=grammar-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt'
    1.39 ± 0.08 times faster than 'BRANCH=grammar-fast \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt'
    2.58 ± 0.04 times faster than 'BRANCH=master \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt'
    2.58 ± 0.04 times faster than 'BRANCH=grammars-early-exit \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            --grammar-file tsconfig.gbnf \
            -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:" \
            --seed 12345 --no-display-prompt'
hyperfine \
    --warmup 1 --runs 10 \
    -L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
    --setup 'git checkout {branch} && make clean && make -j LLAMA_CURL=1 main' \
    'BRANCH={branch} \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344'
Output:
Benchmark 1: BRANCH=grammar-fast-resampling-early \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344
  Time (mean ± σ):      1.055 s ±  0.041 s    [User: 0.251 s, System: 0.077 s]
  Range (min … max):    1.017 s …  1.153 s    10 runs
 
Benchmark 2: BRANCH=grammar-resampling \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344
  Time (mean ± σ):      1.069 s ±  0.071 s    [User: 0.245 s, System: 0.071 s]
  Range (min … max):    1.007 s …  1.213 s    10 runs
 
Benchmark 3: BRANCH=grammar-fast \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344
  Time (mean ± σ):      1.055 s ±  0.066 s    [User: 0.215 s, System: 0.085 s]
  Range (min … max):    0.996 s …  1.222 s    10 runs
 
Benchmark 4: BRANCH=grammar-fast-resampling \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344
  Time (mean ± σ):      1.003 s ±  0.042 s    [User: 0.184 s, System: 0.090 s]
  Range (min … max):    0.949 s …  1.106 s    10 runs
 
Benchmark 5: BRANCH=grammar-resampling-early \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344
  Time (mean ± σ):      1.052 s ±  0.062 s    [User: 0.239 s, System: 0.091 s]
  Range (min … max):    1.008 s …  1.219 s    10 runs
 
Benchmark 6: BRANCH=grammars-early-exit \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344
  Time (mean ± σ):      1.605 s ±  0.067 s    [User: 0.718 s, System: 0.102 s]
  Range (min … max):    1.528 s …  1.706 s    10 runs
 
Benchmark 7: BRANCH=master \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344
  Time (mean ± σ):      1.578 s ±  0.049 s    [User: 0.718 s, System: 0.099 s]
  Range (min … max):    1.533 s …  1.709 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'BRANCH=grammar-fast-resampling \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344' ran
    1.05 ± 0.08 times faster than 'BRANCH=grammar-resampling-early \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344'
    1.05 ± 0.08 times faster than 'BRANCH=grammar-fast \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344'
    1.05 ± 0.06 times faster than 'BRANCH=grammar-fast-resampling-early \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344'
    1.07 ± 0.08 times faster than 'BRANCH=grammar-resampling \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344'
    1.57 ± 0.08 times faster than 'BRANCH=master \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344'
    1.60 ± 0.09 times faster than 'BRANCH=grammars-early-exit \
      ./main \
        -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
        --grammar-file json_numbers.grammar \
        -p "List of 20 integers starting from 0" \
        --seed 12344'
hyperfine --warmup 1 --runs 10 \
    -L branch grammar-fast-resampling-early,grammar-resampling,grammar-fast,grammar-fast-resampling,grammar-resampling-early,grammars-early-exit,master \
    --setup 'git checkout {branch} && \
             make clean && \
             make -j LLAMA_CURL=1 main' \
    'BRANCH={branch} \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '"'"'{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}'"'"' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt'
Output:
Benchmark 1: BRANCH=grammar-fast-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.395 s ±  0.079 s    [User: 0.337 s, System: 0.088 s]
  Range (min … max):    1.330 s …  1.608 s    10 runs
 
Benchmark 2: BRANCH=grammar-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.373 s ±  0.039 s    [User: 0.339 s, System: 0.092 s]
  Range (min … max):    1.332 s …  1.457 s    10 runs
 
Benchmark 3: BRANCH=grammar-fast \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.291 s ±  0.120 s    [User: 0.209 s, System: 0.100 s]
  Range (min … max):    1.203 s …  1.571 s    10 runs
 
Benchmark 4: BRANCH=grammar-fast-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.196 s ±  0.022 s    [User: 0.188 s, System: 0.087 s]
  Range (min … max):    1.167 s …  1.246 s    10 runs
 
Benchmark 5: BRANCH=grammar-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.417 s ±  0.092 s    [User: 0.349 s, System: 0.087 s]
  Range (min … max):    1.339 s …  1.556 s    10 runs
 
Benchmark 6: BRANCH=grammars-early-exit \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.804 s ±  0.131 s    [User: 0.639 s, System: 0.109 s]
  Range (min … max):    1.646 s …  2.016 s    10 runs
 
Benchmark 7: BRANCH=master \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
  Time (mean ± σ):      1.682 s ±  0.060 s    [User: 0.629 s, System: 0.099 s]
  Range (min … max):    1.618 s …  1.792 s    10 runs
 
Summary
  BRANCH=grammar-fast-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt ran
    1.08 ± 0.10 times faster than BRANCH=grammar-fast \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
    1.15 ± 0.04 times faster than BRANCH=grammar-resampling \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
    1.17 ± 0.07 times faster than BRANCH=grammar-fast-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
    1.18 ± 0.08 times faster than BRANCH=grammar-resampling-early \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
    1.41 ± 0.06 times faster than BRANCH=master \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
    1.51 ± 0.11 times faster than BRANCH=grammars-early-exit \
        ./main \
            -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
            -j '{"title": "AnswerFormat", "type": "object", "properties": {"last_user_message_intent": {"type": "string" }, "function_name": {"type": "string" }, "invocation": {"type": "string" }}, "required": [ "last_user_message_intent", "function_name", "invocation"]}' \
            -p "Describe a function call of a tool in JSON format after a reminder of the last user message intent." \
            --seed 12345 --no-display-prompt
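
The "times faster" figures in the summary can be re-derived from the per-benchmark means and standard deviations. The sketch below does that for three of the branches, assuming the 1.196 s ± 0.022 s result (Benchmark 4) belongs to `grammar-fast-resampling`, the branch the summary names as fastest, and assuming the uncertainty is a first-order propagation of the two relative standard deviations. The helper name `relative_speed` is made up for this example; the numbers are copied from the output above.

```python
import math

# (mean, stddev) in seconds, copied from the hyperfine output above.
# Assumption: the 1.196 s run is the grammar-fast-resampling branch.
FASTEST = (1.196, 0.022)
OTHERS = {
    "grammar-resampling-early": (1.417, 0.092),
    "master": (1.682, 0.060),
    "grammars-early-exit": (1.804, 0.131),
}


def relative_speed(fastest, other):
    """Return (ratio, uncertainty) of `other` relative to `fastest`.

    The uncertainty combines the two relative standard deviations in
    quadrature (first-order error propagation for a quotient).
    """
    mean_f, sd_f = fastest
    mean_o, sd_o = other
    ratio = mean_o / mean_f
    err = ratio * math.sqrt((sd_f / mean_f) ** 2 + (sd_o / mean_o) ** 2)
    return ratio, err


for branch, stats in OTHERS.items():
    ratio, err = relative_speed(FASTEST, stats)
    print(f"{ratio:.2f} ± {err:.2f} times faster than BRANCH={branch}")
# 1.18 ± 0.08 times faster than BRANCH=grammar-resampling-early
# 1.41 ± 0.06 times faster than BRANCH=master
# 1.51 ± 0.11 times faster than BRANCH=grammars-early-exit
```

These match the summary lines above. With this schema, `grammars-early-exit` (1.804 s ± 0.131 s) and `master` (1.682 s ± 0.060 s) differ by about one standard deviation, so the early exit alone shows no measurable gain over `master`.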