Skip to content

Instantly share code, notes, and snippets.

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

Notes on using Geodesic with repos created by cloudposse/reference-architectures

How to Use this Repo

AWS Setup

  1. Log in to console for root account as your IAM user name@example.com
  2. Go to https://console.aws.amazon.com/iam/home
  3. Add an MFA device to your user
  4. Generate an access key and store it securely to be added to aws-vault
@bjorkqvist
bjorkqvist / photos_takout
Created May 29, 2019 22:41
Export images and video from Google Photos and adjust metadata
Export images and video from Google Photos and adjust metadata
-- download images from
takeout.google.com/settings/takeout
-- unzip
- flatten to new dir takeout2
exiftool -q -q -r -d takeout2 "-directory<filemodifydate" "-directory<createdate" "-directory<datetimeoriginal" Takeout
@arikfr
arikfr / README.md
Last active April 26, 2024 10:07
Setting up HTTPS with LetsEncrypt for Redash Docker Deployment
  1. Make sure the domain you picked points at the IP of your Redash server.
  2. Switch to the root user (sudo su).
  3. Create a folder named nginx in /opt/redash.
  4. Create in the nginx folder two additional folders: certs and certs-data.
  5. Create the file /opt/redash/nginx/nginx.conf and place the following in it: (replace example.redashapp.com with your domain name)
    upstream redash {
        server redash:5000;
    }
    
@krishpop
krishpop / export-toby.js
Last active March 21, 2024 22:12
Export Toby
// code courtesy of Toby team
chrome.storage.local.get("state", o => (
((f, t) => {
let e = document.createElement("a");
e.setAttribute("href", `data:text/plain;charset=utf-8,${encodeURIComponent(t)}`);
e.setAttribute("download", f);
e.click();
})(`TobyBackup${Date.now()}.json`, o.state)
));
@kellyrmilligan
kellyrmilligan / s3Sync.sh
Created June 8, 2017 13:38
Sync files to s3 and set cache control headers
#!/bin/bash
if [[ "$1" != "" ]]; then
S3BUCKETNAME="$1"
else
echo ERROR: Failed to supply S3 bucket name
exit 1
fi
aws s3 sync build s3://$S3BUCKETNAME --delete --cache-control max-age=31536000,public
@wearhere
wearhere / deploy_aws.sh
Created October 16, 2016 21:40
CI script to convert a Meteor application into a Node.js application then deploy it to AWS Elastic Beanstalk.
#!/bin/bash
#
# This script will convert a Meteor application into a Node.js application
# then deploy it to AWS Elastic Beanstalk.
#
# Run like `deploy_aws.sh my_eb_app my_eb_app-production`.
#
# That will deploy the Meteor application containing this script
# to the `my_eb_app-production` environment of the `my_eb_app` EB application.
@nbremer
nbremer / .block
Last active May 23, 2024 09:34
Radar Chart Redesign
height: 600
license: mit
acknowledgement: Please add "Nadieh Bremer | Visual Cinnamon" to your credit when re-using this code, thank you!
@jperl
jperl / Sizes.md
Created October 30, 2014 17:40
Meteor App Icon and Launch Screen Size Guide

###Icons

Name Size
iphone_2x 120x120
iphone_3x 180x180
ipad 76x76
ipad_2x 152x152
android_ldpi 36x36
android_mdpi 48x48

Last updated: 2017-03-18

Searching for Files

Find images in a directory that don't have a DateTimeOriginal

exiftool -filename -filemodifydate -createdate -r -if '(not $datetimeoriginal) and $filetype eq "JPEG"' .

###Output photos that don't have datetimeoriginal to a CSV### Note this can take a long time if you have a lot of jpgs