@selfup
Last active May 23, 2026 14:11
Ministral 3 (3B, 8B) Instruct - Ubuntu 3800x 3060ti llama.cpp benchmark
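set -euo pipefail

# MACHINE must be set in the environment; with `set -u` the script aborts if it is not.
# Benchmark both models: 512-token prompt (-p), 128 generated tokens (-n),
# flash attention on (-fa 1), up to 99 layers offloaded to the GPU (-ngl 99),
# at context depths 0, 4096, 8192, and 16384 (-d).
llama-bench \
  -m ~/.models/Ministral-3-3B-Instruct-2512-Q4_K_M.gguf \
  -m ~/.models/Ministral-3-8B-Instruct-2512-Q4_K_M.gguf \
  -p 512 -n 128 -fa 1 -ngl 99 \
  -d 0,4096,8192,16384 \
  -o md > "local-ministral-q4km-${MACHINE}.md" 2>/dev/null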
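
Because the script runs with `set -u`, an unset `MACHINE` stops it before any benchmark starts. A minimal sketch of one way to guard against that follows. The `output_name` helper and the `uname -n` fallback are assumptions for illustration, not part of the original script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# output_name: print the results filename for the current machine.
# Hypothetical helper: falls back to the node name from `uname -n`
# when MACHINE is unset or empty.
output_name() {
  local machine="${MACHINE:-$(uname -n)}"
  printf 'local-ministral-q4km-%s.md\n' "$machine"
}
```

With this in place, the redirect in the benchmark script could be written as `> "$(output_name)"`, so the run still produces a file when `MACHINE` is not exported.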

selfup commented May 10, 2026

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | --: | -: | --: | --: |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 | 5390.67 ± 69.51 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 | 139.09 ± 2.12 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d4096 | 4318.27 ± 36.41 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d4096 | 119.71 ± 1.62 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d8192 | 3475.10 ± 307.83 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d8192 | 106.04 ± 2.56 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d16384 | 2612.65 ± 144.04 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d16384 | 87.11 ± 1.15 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 | 2439.48 ± 192.02 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 | 68.64 ± 1.31 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d4096 | 2112.40 ± 180.81 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d4096 | 61.83 ± 0.70 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d8192 | 1849.18 ± 124.90 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d8192 | 56.62 ± 0.76 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d16384 | 1494.13 ± 107.49 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d16384 | 49.11 ± 0.63 |

build: 1e5ad35d5 (9093)

selfup commented May 23, 2026

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | --: | -: | --: | --: |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 | 5380.04 ± 111.46 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 | 146.67 ± 0.21 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d4096 | 4342.83 ± 49.64 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d4096 | 126.16 ± 0.27 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d8192 | 3555.07 ± 150.68 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d8192 | 112.13 ± 0.13 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | pp512 @ d16384 | 2720.28 ± 18.16 |
| mistral3 3B Q4_K - Medium | 1.99 GiB | 3.43 B | CUDA | 99 | 1 | tg128 @ d16384 | 91.19 ± 0.10 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 | 2516.59 ± 27.66 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 | 72.50 ± 0.05 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d4096 | 2203.07 ± 11.29 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d4096 | 65.56 ± 0.06 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d8192 | 1943.73 ± 12.84 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d8192 | 60.38 ± 0.02 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | pp512 @ d16384 | 1573.30 ± 5.67 |
| mistral3 8B Q4_K - Medium | 4.83 GiB | 8.49 B | CUDA | 99 | 1 | tg128 @ d16384 | 51.96 ± 0.03 |

build: b0df4c0cf (9297)
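
To put a number on the difference between the two builds, the mean t/s of each row can be compared across two saved result files. The sketch below assumes both files hold `llama-bench -o md` output with the same column order as in this gist (model first, test seventh, t/s eighth). The `compare_runs` name and the file names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# compare_runs OLD.md NEW.md
# For every (model, test) row present in both files, print the old and new
# mean t/s and the percent change. The "± stddev" part is ignored.
compare_runs() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Skip non-table lines (e.g. "build: ..."), the header row, and the separator row.
    NF < 9 || $2 ~ /^[ \t]*(model|-+:?)[ \t]*$/ { next }
    {
      key = trim($2) " | " trim($8)      # model + test name
      split(trim($9), parts, " ")        # "68.64 ± 1.31": the mean is the first field
      mean = parts[1] + 0
      if (FNR == NR) { old[key] = mean; next }   # first file: remember the mean
      if (key in old)
        printf "%s | %.2f -> %.2f | %+.1f%%\n", key, old[key], mean, (mean - old[key]) / old[key] * 100
    }
  ' "$1" "$2"
}
```

A call might look like `compare_runs results-9093.md results-9297.md`, printing one line per test. Since only the means are compared, small changes on rows with a large standard deviation should be read with caution.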
