Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@HarryR
HarryR / KZG10.py
Last active June 3, 2024 02:17
Implementation of PolyCommit_{DL} from "Constant-Size Commitments to Polynomials and Their Applications" https://www.cypherpunks.ca/~iang/pubs/PolyCommit-AsiaCrypt.pdf
from typing import List, NamedTuple, Tuple, Union
from math import ceil, log2
from random import randint
from functools import reduce
import operator
from py_ecc import bn128 as curve
"""
Implementation of PolyCommit_{DL} from:
@m4nh
m4nh / nbstripout_as_precommit_hook.md
Last active June 3, 2024 02:17
#nbstripout #pre-commit #hook

Install nbstripout

pip install --upgrade nbstripout

In the target repository:

nbstripout --install
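
As a rough illustration of what stripping a notebook means, here is a minimal sketch that clears outputs and execution counts from code cells in a notebook loaded as a dictionary. This is not nbstripout's actual implementation; the function name `strip_outputs` and the sample notebook are assumptions made for the example, and the real tool handles more cases (metadata, configuration options, the Git filter itself).

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs cleared.

    Hypothetical helper for illustration only; not part of nbstripout.
    """
    # Deep copy via a JSON round trip so the caller's dict is left untouched.
    stripped = json.loads(json.dumps(notebook))
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A tiny made-up notebook in the nbformat 4 layout.
example = {
    "cells": [
        {
            "cell_type": "code",
            "source": ["print(1)"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
            "execution_count": 3,
            "metadata": {},
        },
        {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_outputs(example)
```

Committing the cleaned form keeps diffs limited to the cell sources, which is the point of running the strip step as a Git filter or pre-commit hook.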
@lassade
lassade / decode_hex.zig
Last active June 3, 2024 02:17
zig decodeHex SIMD
// compile with: -mcpu=x86_64+sse2 to ensure sse2 support
pub fn decodeHex(input: []const u8, output: []u8) !void {
const block_size = 16; // change block size to use avx2
const ByteBlock = @Vector(block_size, u8);
const IntBlock = @Vector(block_size / 4, u32);
if (input.len & 1 != 0) {
return error.OddLenghtInput;
}
@lassade
lassade / job_system.zig
Last active June 3, 2024 02:17
a non-blocking job system
const std = @import("std");
const Allocator = std.mem.Allocator;
const ArenaAllocator = std.heap.ArenaAllocator;
const Atomic = std.atomic.Value;
fn Queue(comptime T: type, comptime size: usize) type {
return struct {
front: Atomic(usize) = .{ .raw = 0 },
back: Atomic(usize) = .{ .raw = 0 },
items: [size]?*T = [1]?*T{null} ** size,
@kmhofmann
kmhofmann / installing_nvidia_driver_cuda_cudnn_linux.md
Last active June 3, 2024 02:12
Installing the NVIDIA driver, CUDA and cuDNN on Linux

Installing the NVIDIA driver, CUDA and cuDNN on Linux (Ubuntu 20.04)

This is a companion piece to my instructions on building TensorFlow from source. In particular, the aim is to install the following pieces of software

on an Ubuntu Linux system, in particular Ubuntu 20.04.

@in5ikt
in5ikt / versioning-your-saves.md
Created February 8, 2011 22:44
Minecraft - Versioning your saves

This tutorial shows how to version your saves using Git.

Q: Why would anyone go through this process?
A: Because if you do, you can undo an otherwise un-undoable action, such as swimming in a pit of lava and accidentally dropping your Notch armpit forged special edition diamond pickaxe and your 35 stacks of cobblestone.

Step by step tutorial

Download and install the appropriate version of Git

@magnetikonline
magnetikonline / README.md
Last active June 3, 2024 02:09
Enable GitHub Dependabot for Golang based repositories.
@luizomf
luizomf / ambiente-dev-ubuntu.sh
Last active June 3, 2024 02:05
Python development environment on Ubuntu - with VS Code, Google Chrome, ZSH, Oh-my-zsh, zsh-syntax-highlighting, zsh-autosuggestions and spaceship prompt.
#!/bin/bash
# Run the following commands to update the packages
sudo apt update -y
sudo apt upgrade -y
# Python only
sudo apt install python3.10-full python3.10-dev -y
# Install the following packages
@mzdraper
mzdraper / index.html
Created April 12, 2021 19:35
feet to meters
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Change a map's style</title>
<meta name="viewport" content="initial-scale=1,maximum-scale=1,user-scalable=no">
<link href="https://api.mapbox.com/mapbox-gl-js/v2.2.0/mapbox-gl.css" rel="stylesheet">
<script src="https://api.mapbox.com/mapbox-gl-js/v2.2.0/mapbox-gl.js"></script>
<style>
body { margin: 0; padding: 0; }