Skip to content

Instantly share code, notes, and snippets.

@jmamou
jmamou / demo_inference_damage.py
Last active April 29, 2026 12:08
Demo scripts for STANDALONE speculative decoding vocab mismatch bug (sgl-project/sglang PR #23838)
"""
Demonstrates the actual output damage caused by the STANDALONE vocab-mismatch
bug during inference — without requiring model weights or a GPU.
The STANDALONE speculative decoding loop assembles output token-by-token:
- Accepted draft tokens → decoded by TARGET tokenizer (CORRUPTED if vocab differs)
- Rejected positions → resampled by target (correct)
This script simulates that assembly:
1. "Target output" is produced by encoding the ground-truth continuation