Skip to content

Instantly share code, notes, and snippets.

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@vinhkhuc
vinhkhuc / min-char-rnn-tensorflow.py
Last active May 17, 2019 02:48
Vanilla Char-RNN using TensorFlow
"""
Vanilla Char-RNN using TensorFlow by Vinh Khuc (@knvinh).
Adapted from Karpathy's min-char-rnn.py
https://gist.github.com/karpathy/d4dee566867f8291f086
Requires tensorflow>=1.0
BSD License
"""
import random
import numpy as np
import tensorflow as tf
@karpathy
karpathy / min-char-rnn.py
Last active July 29, 2024 00:13
Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np
# data I/O
data = open('input.txt', 'r').read() # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
@victorquinn
victorquinn / promise_while_loop.js
Last active March 30, 2023 04:29
Promise "loop" using the Bluebird library
var Promise = require('bluebird');
var promiseWhile = function(condition, action) {
var resolver = Promise.defer();
var loop = function() {
if (!condition()) return resolver.resolve();
return Promise.cast(action())
.then(loop)
.catch(resolver.reject);
@MohamedAlaa
MohamedAlaa / tmux-cheatsheet.markdown
Last active July 30, 2024 04:25
tmux shortcuts & cheatsheet

tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname
@mislav
mislav / pagination.md
Created October 12, 2010 17:20
"Pagination 101" by Faruk Ateş

Pagination 101

Article by Faruk Ateş, [originally on KuraFire.net][original] which is currently down

One of the most commonly overlooked and under-refined elements of a website is its pagination controls. In many cases, these are treated as an afterthought. I rarely come across a website that has decent pagination, and it always makes me wonder why so few manage to get it right. After all, I'd say that pagination is pretty easy to get right. Alas, that doesn't seem the case, so after encouragement from Chris Messina on Flickr I decided to write my Pagination 101, hopefully it'll give you some clues as to what makes good pagination.

Before going into analyzing good and bad pagination, I want to explain just what I consider to be pagination: Pagination is any kind of control system that lets the user browse through pages of search results, archives, or any other kind of continued content. Search results are the o