Harsh Duche harshduche

harshduche / bench.py
Last active May 15, 2026 11:37
qwen3-vl single-stream VQA benchmark
#!/usr/bin/env python3
# Cold-cache grounding benchmark for Qwen3-VL on AIR (Verkos live-frame use case).
# Runs the full (frames × templates) cross product so that every request carries a
# fresh image+prompt pair and llama.cpp cannot reuse its prompt cache. Parses the
# native <|box_start|> grounding output and reports, per template, the latency,
# the bbox count, and the rate of malformed outputs.
#
# Usage:
#   python bench.py --base-url http://localhost:8001 \
#       --model qwen3-vl-8b \
#       --frames-dir frames/ --max-tokens 256