Skip to content

Instantly share code, notes, and snippets.

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@yoavg
yoavg / GM-level-chess-without-search.md
Last active April 16, 2024 00:56
Grand-master Level Chess without Search

Grand-master Level Chess without Search: Modeling Choices and their Implications

Yoav Golderg, February 2024.


Researchers at Google DeepMind released a paper about a learned systems that is able to play blitz-chess at a grandmaster level, without using search. This is interesting and imagination-capturing, because up to now computer-chess systems that play at this level, either based on machine-learning or not, did use a search component.[^1]

Indeed, my first reaction when reading the paper was to tweet wow, crazy and interesting. I still find it crazy and interesting, but upon a closer read, it may not be as crazy and as interesting as I initially thought. Many reactions on twitter, reddit, etc, were super-impressed, going into implications about projected learning abilities of AI systems, the ability of neural networks to learn semantics from observations, etc, which are really over-the-top. The paper does not claim any of them, but they are still perceiv

@yoavg
yoavg / LLMs.md
Last active February 17, 2024 18:39

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you heard of chatGPT, maybe played with it a little, and was imressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective of my thoughts of this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

@yoavg
yoavg / searle.md
Last active February 11, 2024 15:09
On Searle's Chinese Room Argument

On Searle's "Chinese Room" argument

When I first heard of Searle's "Chinese Room" argument, some twenty+ years ago, I had roughly the following dialog:


"Imagine there is a room with instructions, and someone slips a note written in chinese into this room, and you don't know chinese, but you follow the instructios in the room and based on the instructions you produce a different note in chinese and send it back out, and whoever sends you the original note thinks your note is a perfect response."

Oh, so the person outside doesn't know chinese either?

"No no, they do know chinese, you produced a perfect answer"

@yoavg
yoavg / lm_example
Created May 22, 2015 23:43
Unreasonable Effectiveness of LMs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The unreasonable effectiveness of Character-level Language Models\n",
"## (and why RNNs are still cool)\n",
"\n",
"###[Yoav Goldberg](http://www.cs.biu.ac.il/~yogo)\n",
@yoavg
yoavg / stochastic-critique.md
Last active November 9, 2023 04:32
A criticism of Stochastic Parrots

A criticism of "On the Dangers of Stochastic Parrots: Can Languae Models be Too Big"

Yoav Goldberg, Jan 23, 2021.

The FAccT paper "On the Dangers of Stochastic Parrots: Can Languae Models be Too Big" by Bender, Gebru, McMillan-Major and Shmitchell has been the center of a controversary recently. The final version is now out, and, owing a lot to this controversary, would undoubtly become very widely read. I read an earlier draft of the paper, and I think that the new and updated final version is much improved in many ways: kudos for the authors for this upgrade. I also agree with and endorse most of the content. This is important stuff, you should read it.

However, I do find some aspects of the paper (and the resulting discourse around it and around technology) to be problematic. These weren't clear to me when initially reading the first draft several months ago, but they became very clear to me now. These points are for the most part

Putting papers on arxiv early vs the protections of blind review

The tension between putting papers on arxiv as soon as possible and the double-blind peer review process is ever present. Some people favor the fast-pace of progress facilitated by making papers available before or during the peer review process, while others favor the protection of double-blind reviewing (actually, of author-blind reviewing. reviewer-anonymity is not part of the debate).

As I now serve on an ACL committee which is tasked at assessing this tension, I've spend a longer-then-usual time thinking about it, and came up with an analysis which I find informative, and which others may also find useful. These are my personal opinions, and are not representative of the committee. Though naturally, I will share them there as well.

The analysis examines the dynamics of review bias due to author identities being made exposed through a pre-print, and its effect on other authors at the same conference. The conclusion, as usual with me,

Thoughts and some criticism on "Re-imagining Algorithmic Fairness in India and Beyond".

Yoav Goldberg, Jan 30, 2021

This new paper from Google Research Ethics Team (by Sambasivan, Arnesen, Hutchinson, Doshi, and Prabhakaran) touches on a very imortant topic: research (and supposedly also applied) work on algorithmic fairness---and more broadly AI-ethics---is US-centric[*], reflecting US subgroups, values, and methods. But AI is also applied elsewhere (for example, India). Do the methods and result developed for/in the US transfer? The answer is, of course, no, and the paper is doing a good job of showing it. If you are the kind of person who is impressed by the number of citations, this one has 220, a much higher number than another paper (not) from Google Research that became popular recently and which boasts many citations. I think this current paper (let's call it "the India Paper") is substantially more important, given that it raises a very serious issue that

@yoavg
yoavg / ngram-lm-leak.ipynb
Created February 27, 2018 22:51
Simple 4gram-lm also "leak secrets"
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.