#!/usr/bin/env python3
# Cold-cache grounding benchmark for Qwen3-VL on AIR (Verkos live-frame use case).
# Runs the (frames × templates) cross product so that each request is a fresh
# image+prompt pair (no llama.cpp prompt-cache reuse). Parses the native
# <|box_start|> grounding output and reports, per template, the latency,
# the bbox count, and the rate of malformed outputs.
#
# Usage:
#   python bench.py --base-url http://localhost:8001 \
#       --model qwen3-vl-8b \
#       --frames-dir frames/ --max-tokens 256