Skip to content

Instantly share code, notes, and snippets.

View sdimi's full-sized avatar

Dimitris Spathis sdimi

View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

#!/usr/bin/env python
import math
import sys
from moviepy.editor import AudioClip, VideoFileClip, concatenate_videoclips
# Get average RGB of part of a frame. Frame is H * W * 3 (rgb)
# Assumes x1 < x2, y1 < y2
@SamratSahoo
SamratSahoo / convolution.py
Last active November 10, 2021 23:55
2D Convolution Implementation with NumPy
def convolve2D(image, kernel, padding=0, strides=1):
# Cross Correlation
kernel = np.flipud(np.fliplr(kernel))
# Gather Shapes of Kernel + Image + Padding
xKernShape = kernel.shape[0]
yKernShape = kernel.shape[1]
xImgShape = image.shape[0]
yImgShape = image.shape[1]
Rank Type Prefix/Suffix Length
1 Prefix my+ 2
2 Suffix +online 6
3 Prefix the+ 3
4 Suffix +web 3
5 Suffix +media 5
6 Prefix web+ 3
7 Suffix +world 5
8 Suffix +net 3
9 Prefix go+ 2
@andy-thomason
andy-thomason / Genomics_A_Programmers_Guide.md
Created May 14, 2019 13:32
Genomics a programmers introduction

Genomics - A programmer's guide.

Andy Thomason is a Senior Programmer at Genomics PLC. He has been witing graphics systems, games and compilers since the '70s and specialises in code performance.

https://www.genomicsplc.com

@Tushar-N
Tushar-N / hook_activations.py
Created August 3, 2018 00:06
Pytorch code to save activations for specific layers over an entire dataset
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as tmodels
from functools import partial
import collections
# dummy data: 10 batches of images with batch size 16
dataset = [torch.rand(16,3,224,224).cuda() for _ in range(10)]
@benwattsjones
benwattsjones / gmail_mbox_parser.py
Last active May 28, 2024 15:16
Quick python code to parse mbox files, specifically those used by GMail. Extracts sender, date, plain text contents etc., ignores base64 attachments.
#! /usr/bin/env python3
# ~*~ utf-8 ~*~
import mailbox
import bs4
def get_html_text(html):
try:
return bs4.BeautifulSoup(html, 'lxml').body.get_text(' ', strip=True)
except AttributeError: # message contents empty
@L0SG
L0SG / freeze_example.py
Last active October 12, 2023 05:02
PyTorch example: freezing a part of the net (including fine-tuning)
import torch
from torch import nn
from torch.autograd import Variable
import torch.nn.functional as F
import torch.optim as optim
# toy feed-forward net
class Net(nn.Module):
def __init__(self):
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.