@lmmx
Last active January 22, 2025 22:15
Aider bench 1.5B (AWQ) Python subset, diff format
- dirname: 2025-01-21-18-10-00--bench-1.5b-awq-py-diff
  test_cases: 34
  model: hosted_vllm/casperhansen/deepseek-r1-distill-qwen-1.5b-awq
  edit_format: diff
  commit_hash: 843720a
  pass_rate_1: 0.0
  pass_rate_2: 0.0
  pass_num_1: 0
  pass_num_2: 0
  percent_cases_well_formed: 35.3
  error_outputs: 101
  num_malformed_responses: 91
  num_with_malformed_responses: 22
  user_asks: 8
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 4
  total_tests: 225
  command: aider --model hosted_vllm/casperhansen/deepseek-r1-distill-qwen-1.5b-awq
  date: 2025-01-21
  versions: 0.72.2.dev
  seconds_per_case: 2090.8
  total_cost: 0.0000
- dirname: 2025-01-21-20-24-38--bench-7b-awq-py-diff
  test_cases: 34
  model: hosted_vllm/casperhansen/deepseek-r1-distill-qwen-7b-awq
  edit_format: diff
  commit_hash: 843720a
  pass_rate_1: 0.0
  pass_rate_2: 0.0
  pass_num_1: 0
  pass_num_2: 0
  percent_cases_well_formed: 41.2
  error_outputs: 67
  num_malformed_responses: 66
  num_with_malformed_responses: 20
  user_asks: 20
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 0
  total_tests: 225
  command: aider --model hosted_vllm/casperhansen/deepseek-r1-distill-qwen-7b-awq
  date: 2025-01-21
  versions: 0.72.2.dev
  seconds_per_case: 603.0
  total_cost: 0.0000
- dirname: 2025-01-22-16-42-17--bench-14b-awq-py-diff
  test_cases: 34
  model: hosted_vllm/casperhansen/deepseek-r1-distill-qwen-14b-awq
  edit_format: diff
  commit_hash: 843720a
  pass_rate_1: 0.0
  pass_rate_2: 0.0
  pass_num_1: 0
  pass_num_2: 0
  percent_cases_well_formed: 55.9
  error_outputs: 51
  num_malformed_responses: 46
  num_with_malformed_responses: 15
  user_asks: 8
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 2
  test_timeouts: 0
  total_tests: 225
  command: aider --model hosted_vllm/casperhansen/deepseek-r1-distill-qwen-14b-awq
  date: 2025-01-22
  versions: 0.72.2.dev
  seconds_per_case: 1375.5
  total_cost: 0.0000
- dirname: 2025-01-22-18-16-09--bench-32b-awq-py-diff
  test_cases: 34
  model: hosted_vllm/casperhansen/deepseek-r1-distill-qwen-32b-awq
  edit_format: diff
  commit_hash: 843720a
  pass_rate_1: 0.0
  pass_rate_2: 0.0
  pass_num_1: 0
  pass_num_2: 0
  percent_cases_well_formed: 100.0
  error_outputs: 65
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 0
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 0
  total_tests: 225
  command: aider --model hosted_vllm/casperhansen/deepseek-r1-distill-qwen-32b-awq
  date: 2025-01-22
  versions: 0.72.2.dev
  seconds_per_case: 3025.7
  total_cost: 0.0000
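For a quick side-by-side view, the headline numbers from the four runs above can be tabulated. This is a minimal sketch with the values transcribed by hand from the YAML; note that pass_rate_2 is 0.0 across the board, so the only metric that improves with model size is edit-format compliance (percent_cases_well_formed).

```python
# Headline figures transcribed from the four benchmark result blocks above.
runs = [
    {"model": "1.5b-awq", "pass_rate_2": 0.0, "well_formed": 35.3, "secs_per_case": 2090.8},
    {"model": "7b-awq", "pass_rate_2": 0.0, "well_formed": 41.2, "secs_per_case": 603.0},
    {"model": "14b-awq", "pass_rate_2": 0.0, "well_formed": 55.9, "secs_per_case": 1375.5},
    {"model": "32b-awq", "pass_rate_2": 0.0, "well_formed": 100.0, "secs_per_case": 3025.7},
]

for r in runs:
    # Print one line per run: model, well-formed %, wall-clock per case, pass rate.
    print(f"{r['model']:>9}: {r['well_formed']:5.1f}% well-formed, "
          f"{r['secs_per_case']:6.1f} s/case, pass_rate_2={r['pass_rate_2']}")
```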
lmmx commented Jan 21, 2025

Host:

vllm serve "casperhansen/deepseek-r1-distill-qwen-7b-awq" --quantization awq --dtype half --gpu-memory-utilization 0.95

Aider bench Docker (./benchmark/docker.sh):

export AIDER_MODEL="hosted_vllm/casperhansen/deepseek-r1-distill-qwen-1.5b-awq"
./benchmark/benchmark.py bench-1.5b-awq-py-diff --model $AIDER_MODEL --edit-format diff --threads 2000 --exercises-dir polyglot-benchmark --new --num-tests -1 --languages python

lmmx commented Jan 21, 2025

7B

export AIDER_MODEL="hosted_vllm/casperhansen/deepseek-r1-distill-qwen-7b-awq"
./benchmark/benchmark.py bench-7b-awq-py-diff --model $AIDER_MODEL --edit-format diff --threads 2000 --exercises-dir polyglot-benchmark --new --num-tests -1 --languages python

lmmx commented Jan 21, 2025

For 14B I had to switch to TGI (text-generation-inference) to serve the model:

model=casperhansen/deepseek-r1-distill-qwen-14b-awq
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8000:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id $model

lmmx commented Jan 22, 2025

32B: same command as for 14B, but with:

model=casperhansen/deepseek-r1-distill-qwen-32b-awq
