- Model: Qwen/Qwen2.5-7B-Instruct-AWQ (4-bit AWQ quantisation)
- Hardware: Kaggle T4 GPU (15GB VRAM)
- vLLM: 0.7.1 | LMDeploy: TurbomindEngine
- Measurement: `time.perf_counter()` around full generation; vLLM run in offline `LLMEngine` mode (scheduling latency captured, not true streaming TTFT)
- Batch size: 1 | Runs per prompt: 1 (first-run warmup included)
- SGLang: attempted, but the `sgl_kernel` wheel on Kaggle is compiled for SM100 (H100); T4 is SM75, so it is incompatible at import time
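The measurement approach above can be sketched as a small timing harness: `time.perf_counter()` wraps the entire generation call, so engine scheduling and queueing time are included in the reported latency (this is not a streaming TTFT). The `fake_generate` stub below is a hypothetical stand-in for an engine call such as vLLM's `llm.generate(...)`; the harness itself is engine-agnostic.

```python
import time

def time_generation(generate_fn, prompt):
    """Time one full generation with time.perf_counter().

    The timer wraps the whole call, so scheduling/queueing latency
    inside the engine is captured too, matching the methodology
    described above (not a true streaming time-to-first-token).
    """
    start = time.perf_counter()
    output = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return output, elapsed

# Hypothetical stand-in for an engine call like vLLM's llm.generate(...).
def fake_generate(prompt):
    time.sleep(0.01)  # simulate decode time
    return prompt.upper()

out, secs = time_generation(fake_generate, "hello")
print(f"{out!r} took {secs:.3f}s")
```

Because batch size is 1 and each prompt is run once, the first measured run also pays any warmup cost (CUDA graph capture, kernel compilation), as noted above.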