Matthew Torrey mythikal03
#!/usr/bin/env bash
# benchmark_vllm.sh - streaming-aware vLLM throughput suite
# Usage: ./benchmark_vllm.sh <service|base_url> [short|long|multi|spec|all]
#   short   up to 1k single-shot decode
#   long    up to 16k single-shot decode
#   multi   4-turn growing context
#   spec    best-effort speculative decoding acceptance from /metrics
#   all     everything (default)
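#
# Example invocations, sketched from the usage line above. The service name
# "vllm" and the URL "http://localhost:8000" are hypothetical placeholders,
# not values taken from this script; substitute your own.
#   ./benchmark_vllm.sh vllm short                    # service mode, short suite only
#   ./benchmark_vllm.sh http://localhost:8000 multi   # base-URL mode, multi-turn suite
#   ./benchmark_vllm.sh vllm                          # mode omitted, so "all" runs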
#
# Service mode reads /etc/systemd/system/<service>.service by default.