@gamingflexer
gamingflexer / main.py
Created July 19, 2023 09:02
Anthropic's tokenizer for Claude
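from transformers import PreTrainedTokenizerFast

# Load the tokenizer from a local tokenizer JSON file (this path is specific to the author's machine).
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="/home/ubuntu/LLM/module/claude-v1-tokenization.json")

text = "Hello, this is a test input."
# Split the text into its token strings.
tokens = fast_tokenizer.tokenize(text)
print(tokens)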
@JoaoLages
JoaoLages / RLHF.md
Last active July 3, 2024 20:41
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how would you compute a metric that measures whether the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
@riceissa
riceissa / anki_algorithm.py
Last active December 15, 2023 09:36
my current understanding of Anki's spacing algorithm
"""
This is my understanding of the Anki scheduling algorithm, which I mostly
got from watching https://www.youtube.com/watch?v=lz60qTP2Gx0
and https://www.youtube.com/watch?v=1XaJjbCSXT0
and from reading
https://faqs.ankiweb.net/what-spaced-repetition-algorithm.html
There is also https://github.com/dae/anki/blob/master/anki/sched.py but I find
it really hard to understand.
Things I don't bother to implement here: the random fudge factor (that Anki
@zacs
zacs / tars-lines.txt
Last active October 30, 2023 14:04
Every line spoken by TARS in Interstellar. Not many, but in case you want to train a NN or Markov chain.
How did you find this place?
You had the coordinates for this facility marked on your map. Where did you get them?
How did you find us?
All here, Mr Cooper. Plenty of slaves for my robot colony.
I have a cue light I can turn on when I’m joking, if you like.
You can use it to find your way back to the ship after I blow you out the airlock.
One hundred percent.
Ninety percent.
Absolute honesty isn’t always the most diplomatic, or safe form of communication with emotional beings.
Eight months to Mars, then counter-orbital slingshot around.
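
As a rough illustration of the Markov chain idea mentioned in the description, here is a minimal word-level sketch. The function names `build_chain` and `generate` are hypothetical, and `sample_lines` holds only a few of the lines above; in practice you would load the whole text file instead.

```python
import random
from collections import defaultdict


def build_chain(lines):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        # Pair each word with its successor; None marks the end of a line.
        for current, following in zip(words, words[1:] + [None]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=20, seed=None):
    """Walk the chain from `start` until an end-of-line marker or max_words."""
    rng = random.Random(seed)
    word, out = start, []
    while word is not None and len(out) < max_words:
        out.append(word)
        followers = chain.get(word)
        if not followers:  # start word never seen in the training lines
            break
        word = rng.choice(followers)
    return " ".join(out)


# A handful of lines copied from the list above (assumption: one line per entry).
sample_lines = [
    "How did you find this place?",
    "How did you find us?",
    "One hundred percent.",
    "Ninety percent.",
]

chain = build_chain(sample_lines)
print(generate(chain, "How", seed=0))
```

With so few lines, the chain can only recombine fragments that share a word, so most generated output reproduces an input line. A first-order, word-level model is the simplest choice here; a higher-order chain would need far more text than this file provides.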