Skip to content

Instantly share code, notes, and snippets.

View malcolmgreaves's full-sized avatar

Malcolm Greaves malcolmgreaves

View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

export CONTAINER_URI="gcr.io/deeplearning-platform-release/experimental.theia.1-7"
export INSTANCE_NAME=...
export PROJECT_NAME=...
export IMAGE_PROJECT="deeplearning-platform-release"
export IMAGE_FAMILY="theia-container-experimental"
export MACHINE_TYPE=... #"n1-standard-4"
export ZONE=.... #"us-central1-a"
gcloud notebooks instances create "${INSTANCE_NAME}" \
--project="${PROJECT_NAME}" \
--location="${ZONE}" \
export CONTAINER_URI="gcr.io/deeplearning-platform-release/experimental.theia.1-7"
export INSTANCE_NAME=...
export PROJECT_NAME=...
export IMAGE_PROJECT="deeplearning-platform-release"
export IMAGE_FAMILY="theia-container-experimental"
export MACHINE_TYPE=... #"n1-standard-4"
export ZONE=... #"us-central1-a"
gcloud compute instances create "${INSTANCE_NAME}" \
--project="${PROJECT_NAME}" \
--zone="${ZONE}" \
@stefan-it
stefan-it / run_ner.py
Last active April 2, 2022 13:28
NER fine-tuning with PyTorch-Transformers (heavily based on https://github.com/kamalkraj/BERT-NER)
from __future__ import absolute_import, division, print_function
import argparse
import glob
import logging
import os
import random
import numpy as np
import torch
import torch
from torch_geometric.data import InMemoryDataset
class MyOwnDataset(InMemoryDataset):
def __init__(self, root, transform=None, pre_transform=None):
super(MyOwnDataset, self).__init__(root, transform, pre_transform)
self.data, self.slices = torch.load(self.processed_paths[0])
@property
@mollymerp
mollymerp / sshfs-gcp-instance-osx.md
Last active January 10, 2024 14:52
How to mount a GCP compute instance filesystem locally using `sshfs` on MacOS

How to mount a GCP compute instance filesystem locally using sshfs

This guide assumes that:

  • you already have an instance set up on GCP that you want to mount locally
  • the GCP CLI (gcloud) is installed on your local machine
  • you have authenticated locally to your google account gcloud auth login
  1. make sure your gcloud config is correct for the instance you're trying to access:
@stevenhao
stevenhao / mode-presto-linter.js
Last active March 28, 2020 04:38
Lint prestodb sql in mode analytic's web editor
// ==UserScript==
// @name PrestoDB Linter v0.1.3
// @namespace http://tampermonkey.net/
// @version 0.1
// @description try to take over the world!
// @author Steven Hao
// @match https://modeanalytics.com/editor/*
// @grant none
// ==/UserScript==
@HarshTrivedi
HarshTrivedi / pad_packed_demo.py
Last active March 2, 2024 16:49 — forked from Tushar-N/pad_packed_demo.py
Minimal tutorial on packing (pack_padded_sequence) and unpacking (pad_packed_sequence) sequences in pytorch.
import torch
from torch import LongTensor
from torch.nn import Embedding, LSTM
from torch.autograd import Variable
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
## We want to run LSTM on a batch of 3 character sequences ['long_str', 'tiny', 'medium']
#
# Step 1: Construct Vocabulary
# Step 2: Load indexed data (list of instances, where each instance is list of character indices)
@malcolmgreaves
malcolmgreaves / download.sh
Created April 24, 2018 20:41
Bash function for a better CLI remote file download experience.
#!/bin/bash
# Bash function to download a file with wget, showing a progress bar and enables
# re-downloading if interrupted. Also can automatically determine filename from
# supplied URL or override from command line.
# First argument is URL.
# Second optional argument is filename.
download () {
local URL="$1"
local FI="$2"
@cbaziotis
cbaziotis / SelfAttention.py
Created April 21, 2018 17:31
SelfAttention implementation in PyTorch
class SelfAttention(nn.Module):
def __init__(self, attention_size, batch_first=False, non_linearity="tanh"):
super(SelfAttention, self).__init__()
self.batch_first = batch_first
self.attention_weights = Parameter(torch.FloatTensor(attention_size))
self.softmax = nn.Softmax(dim=-1)
if non_linearity == "relu":
self.non_linearity = nn.ReLU()