Harsh Duche harshduche

## bench.py
#!/usr/bin/env python3
# Cold-cache grounding benchmark for Qwen3-VL on AIR — Verkos live-frame use case.
# Runs (frames × templates) cross product so each request is a fresh image+prompt
# (no llama.cpp prompt-cache reuse). Parses native <|box_start|> grounding output
# and reports per-template latency, bbox count, and malformed rate.
#
# Usage:
#   python bench.py --base-url http://localhost:8001 \
#                   --model qwen3-vl-8b \
#                   --frames-dir frames/ --max-tokens 256