Miguel Won miguelwon

@bsenftner
bsenftner / gist:1936e1ef8ae9b4f5d02a42d9a23d41a1
Last active March 23, 2024 11:52
working example of streaming responses from OpenAI and simultaneously saving them to the DB as they stream in
from openai import AsyncOpenAI

# get_settings() and AiChatDB are app-specific helpers defined elsewhere in this project
client = AsyncOpenAI(
    api_key=get_settings().OPENAI_API_KEY,
)

# ----------------------------------------------------------------------------------------------
async def get_response_openai(aichat: AiChatDB):
    if aichat.model == "gpt-3.5-turbo" or aichat.model == "gpt-4":
        ...  # truncated in the gist preview
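The preview above cuts off before the streaming loop. A minimal sketch of the pattern the description names, assuming aichat carries a messages list and save_partial is a hypothetical async helper that upserts the text accumulated so far:

async def stream_and_save(aichat: AiChatDB, save_partial):
    # request a streamed completion; the AsyncOpenAI client yields chunks as they arrive
    stream = await client.chat.completions.create(
        model=aichat.model,
        messages=aichat.messages,  # assumed: the chat history stored on aichat
        stream=True,
    )
    text = ""
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        text += delta
        await save_partial(text)  # keep the DB row current as tokens arrive
        yield delta  # forward each token to the client immediately

Awaiting the DB write inside the same loop that yields tokens is what makes the save happen "as they stream in"; in practice the write might be batched every N tokens to reduce DB load.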
@JoaoLages
JoaoLages / RLHF.md
Last active March 26, 2024 18:51
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny? (the usual workaround is sketched after this list)
  • You want to train on production data, but you can’t easily label it
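
For the first scenario, RLHF's standard workaround is to have humans compare pairs of model outputs and train a reward model on those comparisons, rather than hand-crafting a metric like "funniness". A minimal sketch of the pairwise (Bradley-Terry) objective, assuming a hypothetical reward_model that maps a batch of token ids to one scalar score per sequence:

import torch.nn.functional as F

def reward_loss(reward_model, chosen_ids, rejected_ids):
    r_chosen = reward_model(chosen_ids)      # scores for human-preferred outputs, shape (batch,)
    r_rejected = reward_model(rejected_ids)  # scores for rejected outputs, shape (batch,)
    # maximize the log-probability that the preferred output outscores the rejected one
    return -F.logsigmoid(r_chosen - r_rejected).mean()

The learned reward then stands in for the missing loss function when the language model is fine-tuned with PPO.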