Toan Nguyen tnq177

## RLHF.md

      
            
            
Created May 20, 2023 16:29, forked from JoaoLages/RLHF.md

Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation
              
          
Maybe you've heard of this technique but haven't fully understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Encoder-only models like BERT are not covered.

Reinforcement Learning from Human Feedback (RLHF) was successfully applied in ChatGPT, which explains its surge in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

- You can't create a good loss function
  - Example: how do you calculate a metric to measure if the model's output was funny?
- You want to train with production data, but you can't easily label your production data


## multiple_ssh_setting.md

      
            
            
Created January 16, 2021 17:06, forked from jexchan/multiple_ssh_setting.md

Multiple SSH keys for different github accounts
              
          
Multiple SSH key settings for different GitHub accounts

Create different public keys

Create a different SSH key for each account, following the article Mac Set-Up Git:
$ ssh-keygen -t rsa -C "your_email@youremail.com"
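
The command above writes to the default key file when run as shown, so a second run would prompt to overwrite the first key. A minimal sketch of keeping one key per account is to give each key its own file with `-f`. The file names `id_rsa_personal` and `id_rsa_work`, the e-mail addresses, and the `KEY_DIR` variable are assumptions made for illustration and do not come from the original gist. The sketch uses a scratch directory and an empty passphrase (`-N ""`) only so that it runs non-interactively.

```shell
# Sketch only: KEY_DIR, the key file names and the e-mail addresses are hypothetical.
# A scratch directory keeps the demo away from real keys; use ~/.ssh in practice.
KEY_DIR="$(mktemp -d)"

# -f sets the output file, so the second key does not overwrite the first.
# -C sets the comment stored at the end of the public key line.
# -N "" sets an empty passphrase to avoid the interactive prompt;
# choose a real passphrase for keys you intend to use.
ssh-keygen -q -t rsa -C "personal@example.com" -f "$KEY_DIR/id_rsa_personal" -N ""
ssh-keygen -q -t rsa -C "work@example.com" -f "$KEY_DIR/id_rsa_work" -N ""

# Each key pair is a private key file plus a matching .pub file.
ls "$KEY_DIR"
```

Each `.pub` file can then be added to the matching GitHub account, while the private key files stay on the local machine.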