Model: gemini-3.1-flash-lite-preview
Date: 2026-05-13
Context: Final tuning before beta distribution of a Japanese DTP proofreading tool.
Setting thinking_budget=2048 did not cap thinking tokens. With max_output_tokens=8192, Gemini consumed 7,862 thinking tokens for a 279-character input, nearly four times the configured budget. Lowering max_output_tokens to 2048 cut thinking to 560 tokens, a 14× reduction, with identical detection results. In these runs the model treated max_output_tokens as the actual ceiling and decided how long to think based on the available headroom.
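The arithmetic behind those figures can be checked with a short sketch. It uses only the numbers reported above; the `Run` class and its method names are hypothetical helpers for illustration, and it assumes thinking_budget stayed at 2048 in the second run, which the note implies but does not state outright. It does not call the Gemini API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One measurement from the note (hypothetical helper, not an SDK type)."""
    max_output_tokens: int
    thinking_budget: int
    thinking_tokens: int

    def budget_overrun(self) -> int:
        # Positive when thinking exceeded the configured thinking_budget.
        return self.thinking_tokens - self.thinking_budget

    def share_of_ceiling(self) -> float:
        # Fraction of max_output_tokens consumed by thinking alone.
        return self.thinking_tokens / self.max_output_tokens


# Figures reported in the note; thinking_budget for the second run is assumed.
before = Run(max_output_tokens=8192, thinking_budget=2048, thinking_tokens=7862)
after = Run(max_output_tokens=2048, thinking_budget=2048, thinking_tokens=560)

reduction = before.thinking_tokens / after.thinking_tokens

print(f"reduction in thinking tokens: {reduction:.1f}x")
print(f"overrun vs. thinking_budget (before): {before.budget_overrun()} tokens")
print(f"share of ceiling (before): {before.share_of_ceiling():.1%}")
print(f"share of ceiling (after):  {after.share_of_ceiling():.1%}")
```

In the first run thinking alone used almost the entire output ceiling, which is consistent with the observation that max_output_tokens, not thinking_budget, was the limit that applied.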