Yep — you can keep a plain CausalLM behind vLLM and still get a tunable yes/no cutoff with zero architecture changes. You’ve got two clean patterns that work with the OpenAI-compatible server:
- Ask vLLM for the next-token logprobs only, with no real decoding (`max_tokens=1`, `temperature=0`).
- Read the logprob of the "yes" token and the "no" token at that step.
- Threshold on the logprob difference $\Delta = \log p(\text{yes}) - \log p(\text{no})$ (see the sketch after this list).