Skip to content

Instantly share code, notes, and snippets.

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@kevinbird15
kevinbird15 / DiffEdit Implementation.ipynb
Last active December 3, 2022 00:14
diffedit implementation
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@nadavrot
nadavrot / Matrix.md
Last active April 2, 2024 06:45
Efficient matrix multiplication

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).

Intro

Matrix multiplication is a mathematical operation that defines the product of

@peti
peti / README.md
Last active January 29, 2024 00:21
Make NixOS provide version-specific LOCALE_ARCHIVE environment variables

This NixOS code ensures that the system provide version-specific $LOCALE_ARCHIVE environment variables to mitigate the effects of NixOS/nixpkgs#38991.

To deploy it, copy the file into your /etc/nixos folder using a file name like multi-glibc-locale-paths.nix. Then edit your configuration.nix file to contain the attribute:

imports = [ ./multi-glibc-locale-paths.nix ];
@acolyer
acolyer / jessfraz.md
Created November 19, 2017 13:39
Containers, operating systems and other fun things from The Morning Paper
@ahmetb
ahmetb / gcrgc.sh
Last active February 26, 2024 09:14
Script to clean up Google Container Registry images pushed before a particular date
#!/bin/bash
# Copyright © 2017 Google Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
@aparrish
aparrish / spacy_intro.ipynb
Last active August 9, 2023 01:41
NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note, before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall the exact origin of the original source, nor was I able to find the author's name, so I am can't provide the appropriate credits.


Effective Engineer - Notes

What's an Effective Engineer?

@dferg
dferg / howto-tomato-install-entware.markdown
Last active January 22, 2024 04:40
HOWTO: Install entware on Shibby TomatoUSB

Introduction

This howto describes installing entware for the Tomato open-source router firmware.

Requirements

  • USB stick - 1G or more in size
  • USB-capable router running TomatoUSB.

This Howto Was Tested With