unyaya unhaya

## README.md

      
        
          
            
              
              1 file
            
          
          
            
              
              0 forks
            
          
            
              
                
                0 comments
              
            
          
            
              
              0 stars
            
          
        
        
          
              
          
          
            
                unhaya
                / README.md
            
            
              Created
              May 13, 2026 12:01
            
              
                gemini-3.1-flash-lite-preview: thinking_budget is a soft hint, max_output_tokens is the real ceiling (14x reduction observed)
              
          
        
      
        

      
      
    thinking_budget is a lie — max_output_tokens is the real ceiling

Model: gemini-3.1-flash-lite-preview
Date: 2026-05-13
Context: Final tuning before beta distribution of a Japanese DTP proofreading tool.
TL;DR

Setting thinking_budget=2048 does not cap thinking tokens. With max_output_tokens=8192, Gemini consumed 7,862 thinking tokens for a 279-character input. Lowering max_output_tokens to 2,048 collapsed thinking to 560 tokens — a 14× reduction with identical detection results. The model uses max_output_tokens as the actual ceiling and decides "how long to think" based on the available headroom.