Skip to content

Instantly share code, notes, and snippets.

View eedeebee's full-sized avatar

Eric Bloch eedeebee

  • Hillsborough, CA
View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@chrisjacob
chrisjacob / README.md
Created February 14, 2011 14:31
Setup GitHub Pages "gh-pages" branch as a subfolder within the "master" project on your local checkout - a step-by-step guide.

Intro

Setup GitHub Pages "gh-pages" branch as a subfolder within the "master" project on your local checkout.

IMPORTANT

If you plan on switching between different branches (e.g. git checkout master-experiment then revert back with git checkout master) you will loose your child folder from this tutorial (because it's in your .gitignore and is not part of your master branch).

@technoweenie
technoweenie / github_oauth_busy_developer_guide.md
Created May 30, 2010 18:34
GitHub OAuth Busy Developer's Guide

GitHub OAuth Busy Developer's Guide

This is a quick guide to OAuth2 support in GitHub for developers. This is still experimental and could change at any moment. This Gist will serve as a living document until it becomes finalized at Develop.GitHub.com.

OAuth2 is a protocol that lets external apps request authorization to private details in your GitHub account without getting your password. All developers need to register their application before getting started.

Web Application Flow

  • Redirect to this link to request GitHub access: