@sdrakulich
sdrakulich / pref_model.md
Created March 18, 2025 22:08 — forked from kalomaze/pref_model.md
pref modeling overview

the generic basics of preference reward modeling

The Bradley-Terry model works like this:

  • It's based on a chosen/rejected split
  • The model is trained on binary judgements of specific content/samples as being either 'preferred' or 'dispreferred'
  • The log ratio between preferred and dispreferred can be used as the natural reward signal (a loss sketch follows this list)
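
A minimal sketch of how that chosen/rejected split can be turned into a training signal, assuming PyTorch and a reward model that emits one scalar score per sequence; the function and variable names below are illustrative, not from the original note:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise Bradley-Terry loss: maximize the log-probability that the chosen
    sample outranks the rejected one, P(chosen > rejected) = sigmoid(r_c - r_r)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative usage, assuming reward_model maps token ids to one scalar per sequence:
# r_chosen = reward_model(chosen_input_ids).squeeze(-1)
# r_rejected = reward_model(rejected_input_ids).squeeze(-1)
# loss = bradley_terry_loss(r_chosen, r_rejected)
```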

from transformers import Trainer  # assumed base class: the Hugging Face Trainer

class RescaleDescentTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Initialize all running buffers
        self.tokens_buffer = []           # raw token loss
        self.weighted_tokens_buffer = []  # entropy-weighted token loss
        self.unigram_rate_buffer = []
        self.bigram_rate_buffer = []
        self.trigram_rate_buffer = []
        self.weighted_unigram_buffer = []
@sdrakulich
sdrakulich / dps_sup_nodes.md
Created August 31, 2024 08:24 — forked from VictorTaelin/dps_sup_nodes.md
Accelerating Discrete Program Search with SUP Nodes

Accelerating Discrete Program Search

I am investigating how to use Bend (a parallel language) to accelerate Symbolic AI; in particular, Discrete Program Search. Basically, think of it as an alternative to LLMs, GPTs, and NNs that is also capable of generating code, but by entirely different means. This kind of approach was never scaled with mass compute before - it wasn't possible! - but Bend changes this. So, my idea was to do it, and see where it goes.

Now, while I was implementing some candidate algorithms on Bend, I realized that, rather than mass parallelism, I could use an entirely different mechanism to speed things up: SUP Nodes. Basically, it is a feature that Bend inherited from its underlying model ("Interaction Combinators") that, in simple terms, allows us to combine multiple functions into a single superposed one, and apply them all to an argument "at the same time". In short, it allows us to call N functions at a fraction of the expected cost. Or, in simple terms: why parallelize when we can share?
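
For context on the baseline being improved, here is a rough Python sketch of naive discrete program search, with a made-up toy DSL (none of this is from Taelin's post): every candidate program is run against the spec separately, which is exactly the per-candidate cost that superposed (SUP) application is meant to share.

```python
from itertools import product

# Tiny illustrative DSL: candidate programs are compositions of these primitives.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}

def enumerate_programs(max_len):
    """Yield (names, function) for every left-to-right composition of up to max_len primitives."""
    for length in range(1, max_len + 1):
        for combo in product(PRIMITIVES, repeat=length):
            def prog(x, combo=combo):
                for name in combo:
                    x = PRIMITIVES[name](x)
                return x
            yield combo, prog

def search(spec, max_len=3):
    """Return the first candidate consistent with every (input, output) pair in spec."""
    for names, prog in enumerate_programs(max_len):
        if all(prog(i) == o for i, o in spec):
            return names
    return None

# search([(1, 4), (3, 8)]) -> ('inc', 'dbl'); each candidate is evaluated on its own,
# the redundant work a superposed call would collapse.
```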

├── atom-dark-syntax@0.29.0
├── atom-dark-ui@0.53.2
├── atom-light-syntax@0.29.0
├── atom-light-ui@0.46.2
├── base16-tomorrow-dark-theme@1.5.0
├── base16-tomorrow-light-theme@1.5.0
├── one-dark-ui@1.12.1
├── one-light-ui@1.12.1
├── one-dark-syntax@1.8.2
├── one-light-syntax@1.8.2