Miguel Won miguelwon

@bsenftner
bsenftner / gist:1936e1ef8ae9b4f5d02a42d9a23d41a1
Last active March 23, 2024 11:52
working example of streaming responses from OpenAI and simultaneously saving them to the DB as they stream in
from openai import AsyncOpenAI

# get_settings() and AiChatDB are app-specific helpers defined elsewhere in this project
client = AsyncOpenAI(
    api_key=get_settings().OPENAI_API_KEY,
)

# ----------------------------------------------------------------------------------------------
async def get_response_openai(aichat: AiChatDB):
    if aichat.model == "gpt-3.5-turbo" or aichat.model == "gpt-4":
        ...  # truncated in the gist preview
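The preview above cuts off before the streaming loop. A minimal sketch of the pattern the description names, assuming aichat carries a messages list and save_partial is a hypothetical async helper that upserts the text accumulated so far:

async def stream_and_save(aichat: AiChatDB, save_partial):
    # request a streamed completion; the AsyncOpenAI client yields chunks as they arrive
    stream = await client.chat.completions.create(
        model=aichat.model,
        messages=aichat.messages,  # assumed: the chat history stored on aichat
        stream=True,
    )
    text = ""
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        text += delta
        await save_partial(text)  # keep the DB row current as tokens arrive
        yield delta  # forward each token to the client immediately

Awaiting the DB write inside the same loop that yields tokens is what makes the save happen "as they stream in"; in practice the write might be batched every N tokens to reduce DB load.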
@JoaoLages
JoaoLages / RLHF.md
Last active March 26, 2024 18:51
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny? (the usual workaround is sketched after this list)
  • You want to train on production data, but you can’t easily label it
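
For the first scenario, RLHF's standard workaround is to have humans compare pairs of model outputs and train a reward model on those comparisons, rather than hand-crafting a metric like "funniness". A minimal sketch of the pairwise (Bradley-Terry) objective, assuming a hypothetical reward_model that maps a batch of token ids to one scalar score per sequence:

import torch.nn.functional as F

def reward_loss(reward_model, chosen_ids, rejected_ids):
    r_chosen = reward_model(chosen_ids)      # scores for human-preferred outputs, shape (batch,)
    r_rejected = reward_model(rejected_ids)  # scores for rejected outputs, shape (batch,)
    # maximize the log-probability that the preferred output outscores the rejected one
    return -F.logsigmoid(r_chosen - r_rejected).mean()

The learned reward then stands in for the missing loss function when the language model is fine-tuned with PPO.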