@sdrakulich
sdrakulich / pref_model.md
Created March 18, 2025 22:08 — forked from kalomaze/pref_model.md
pref modeling overview

the generic basics of preference reward modeling

The Bradley-Terry model works like this:

  • It's based on a chosen/rejected split
  • The model is trained on binary judgements of specific content/samples as being either 'preferred' or 'dispreferred'
  • The log ratio between preferred and dispreferred can be used as the natural reward signal (a loss sketch follows this list)
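
A minimal sketch of how that chosen/rejected split can be turned into a training signal, assuming PyTorch and a reward model that emits one scalar score per sequence; the function and variable names below are illustrative, not from the original note:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise Bradley-Terry loss: maximize the log-probability that the chosen
    sample outranks the rejected one, P(chosen > rejected) = sigmoid(r_c - r_r)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative usage, assuming reward_model maps token ids to one scalar per sequence:
# r_chosen = reward_model(chosen_input_ids).squeeze(-1)
# r_rejected = reward_model(rejected_input_ids).squeeze(-1)
# loss = bradley_terry_loss(r_chosen, r_rejected)
```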

from transformers import Trainer  # assumed base class: the Hugging Face Trainer

class RescaleDescentTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Initialize all running buffers
        self.tokens_buffer = []           # raw token loss
        self.weighted_tokens_buffer = []  # entropy-weighted token loss
        self.unigram_rate_buffer = []
        self.bigram_rate_buffer = []
        self.trigram_rate_buffer = []
        self.weighted_unigram_buffer = []
@sdrakulich
sdrakulich / dps_sup_nodes.md
Created August 31, 2024 08:24 — forked from VictorTaelin/dps_sup_nodes.md
Accelerating Discrete Program Search with SUP Nodes

Accelerating Discrete Program Search

I am investigating how to use Bend (a parallel language) to accelerate Symbolic AI; in particular, Discrete Program Search. Basically, think of it as an alternative to LLMs, GPTs, and NNs that is also capable of generating code, but by entirely different means. This kind of approach was never scaled with mass compute before - it wasn't possible! - but Bend changes this. So, my idea was to do it, and see where it goes.

Now, while I was implementing some candidate algorithms on Bend, I realized that, rather than mass parallelism, I could use an entirely different mechanism to speed things up: SUP Nodes. Basically, it is a feature that Bend inherited from its underlying model ("Interaction Combinators") that, in simple terms, allows us to combine multiple functions into a single superposed one, and apply them all to an argument "at the same time". In short, it allows us to call N functions at a fraction of the expected cost. Or, in simple terms: why parallelize when we can share?
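
For context on the baseline being improved, here is a rough Python sketch of naive discrete program search, with a made-up toy DSL (none of this is from Taelin's post): every candidate program is run against the spec separately, which is exactly the per-candidate cost that superposed (SUP) application is meant to share.

```python
from itertools import product

# Tiny illustrative DSL: candidate programs are compositions of these primitives.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}

def enumerate_programs(max_len):
    """Yield (names, function) for every left-to-right composition of up to max_len primitives."""
    for length in range(1, max_len + 1):
        for combo in product(PRIMITIVES, repeat=length):
            def prog(x, combo=combo):
                for name in combo:
                    x = PRIMITIVES[name](x)
                return x
            yield combo, prog

def search(spec, max_len=3):
    """Return the first candidate consistent with every (input, output) pair in spec."""
    for names, prog in enumerate_programs(max_len):
        if all(prog(i) == o for i, o in spec):
            return names
    return None

# search([(1, 4), (3, 8)]) -> ('inc', 'dbl'); each candidate is evaluated on its own,
# the redundant work a superposed call would collapse.
```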

├── atom-dark-syntax@0.29.0
├── atom-dark-ui@0.53.2
├── atom-light-syntax@0.29.0
├── atom-light-ui@0.46.2
├── base16-tomorrow-dark-theme@1.5.0
├── base16-tomorrow-light-theme@1.5.0
├── one-dark-ui@1.12.1
├── one-light-ui@1.12.1
├── one-dark-syntax@1.8.2
├── one-light-syntax@1.8.2