JoaoLages / RLHF.md
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function (see the sketch after this list)
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
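
To make the first bullet concrete, here is a minimal sketch (not from the gist) of the usual workaround: a reward model trained on human preference labels produces a scalar score that stands in for the loss we cannot write by hand. The checkpoint name below is a placeholder, and the Hugging Face `transformers` API is assumed.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical reward model fine-tuned on human preference labels
# ("which of these two answers is funnier?"); the checkpoint name is a placeholder.
reward_name = "my-org/humor-reward-model"
tokenizer = AutoTokenizer.from_pretrained(reward_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_name)

prompt = "Tell me a joke about databases."
response = "A SQL query walks into a bar, goes up to two tables and asks: 'Can I join you?'"

# The reward model maps (prompt, response) to a single scalar score.
# That score plays the role of the loss/reward signal during fine-tuning.
inputs = tokenizer(prompt, response, return_tensors="pt")
with torch.no_grad():
    reward = reward_model(**inputs).logits[0, 0].item()

print(f"reward: {reward:.3f}")
```

During the PPO phase, the language model (the policy) is then updated to produce outputs that this reward model scores highly.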
GLMeece / latency_numbers.md
Latency Numbers Every Programmer Should Know - MarkDown Fork

Latency Comparison Numbers

Note: "Forked" from Latency Numbers Every Programmer Should Know

| Event              | Nanoseconds | Microseconds | Milliseconds | Comparison   |
|--------------------|------------:|-------------:|-------------:|--------------|
| L1 cache reference | 0.5         | -            | -            | -            |
| Branch mispredict  | 5.0         | -            | -            | -            |
| L2 cache reference | 7.0         | -            | -            | 14x L1 cache |
| Mutex lock/unlock  | 25.0        | -            | -            | -            |
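
The Comparison column is each timing expressed as a multiple of an L1 cache reference (7.0 ns / 0.5 ns = 14x). A small sketch of that arithmetic and the unit conversions, not part of the original table:

```python
# Raw nanosecond timings from the table above.
events_ns = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5.0,
    "L2 cache reference": 7.0,
    "Mutex lock/unlock": 25.0,
}

l1_ns = events_ns["L1 cache reference"]
for name, ns in events_ns.items():
    us = ns / 1_000        # nanoseconds -> microseconds
    ms = ns / 1_000_000    # nanoseconds -> milliseconds
    ratio = ns / l1_ns     # multiple of an L1 cache reference
    print(f"{name}: {ns} ns = {us:g} µs = {ms:g} ms ({ratio:g}x L1 cache)")
```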