Skip to content

Instantly share code, notes, and snippets.

View veekaybee's full-sized avatar
💫
in the latent space

Vicki Boykis veekaybee

💫
in the latent space
View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@mshafrir
mshafrir / states_hash.json
Created May 9, 2012 17:05
US states in JSON form
{
"AL": "Alabama",
"AK": "Alaska",
"AS": "American Samoa",
"AZ": "Arizona",
"AR": "Arkansas",
"CA": "California",
"CO": "Colorado",
"CT": "Connecticut",
"DE": "Delaware",
@bsweger
bsweger / useful_pandas_snippets.md
Last active April 19, 2024 18:04
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)

@robulouski
robulouski / gmail_imap_example.py
Last active April 19, 2024 02:27
Very basic example of using Python and IMAP to iterate over emails in a gmail folder/label. http://www.voidynullness.net/blog/2013/07/25/gmail-email-with-python-via-imap/
#!/usr/bin/env python
#
# Very basic example of using Python and IMAP to iterate over emails in a
# gmail folder/label. This code is released into the public domain.
#
# RKI July 2013
# http://www.voidynullness.net/blog/2013/07/25/gmail-email-with-python-via-imap/
#
import sys
import imaplib
@0XDE57
0XDE57 / config.md
Last active April 18, 2024 04:36
Firefox about:config privacy settings

ABOUT

about:config settings to harden the Firefox browser. Privacy and performance enhancements.
To change these settings type 'about:config' in the url bar. Then search the setting you would like to change and modify the value. Some settings may break certain websites from functioning and rendering normally. Some settings may also make firefox unstable. I am not liable for any damages/loss of data.

Not all these changes are necessary and will be dependent upon your usage and hardware. Do some research on settings if you don't understand what they do. These settings are best combined with your standard privacy extensions (HTTPS Everywhere No longer required: Enable HTTPS-Only Mode, NoScript/Request Policy, uBlock origin, agent spoofing, Privacy Badger etc), and all plugins set to "Ask To Activate".

@malarkey
malarkey / Contract Killer 3.md
Last active April 16, 2024 21:44
The latest version of my ‘killer contract’ for web designers and developers

When times get tough and people get nasty, you’ll need more than a killer smile. You’ll need a killer contract.

Used by 1000s of designers and developers Clarify what’s expected on both sides Helps build great relationships between you and your clients Plain and simple, no legal jargon Customisable to suit your business Used on countless web projects since 2008

…………………………

@dideler
dideler / unique.py
Last active April 8, 2024 04:17
Removing duplicate lines from a file in Python
#!/usr/bin/python
"""
Playing around with slightly various ways to simulate uniq in Python.
The different strategies are timed.
Only m1() and m2() do not change the order of the data.
`in` is the input file, `out*` are output files.
"""
infile = 'in' # Change filename to suit your needs.
@kapkaev
kapkaev / gist:4619127
Created January 24, 2013 09:30
MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error. Resque
$ redis-cli
> config set stop-writes-on-bgsave-error no
@tijptjik
tijptjik / wine.csv
Created March 7, 2014 09:47
Wine Dataset
Wine Alcohol Malic.acid Ash Acl Mg Phenols Flavanoids Nonflavanoid.phenols Proanth Color.int Hue OD Proline
1 14.23 1.71 2.43 15.6 127 2.8 3.06 .28 2.29 5.64 1.04 3.92 1065
1 13.2 1.78 2.14 11.2 100 2.65 2.76 .26 1.28 4.38 1.05 3.4 1050
1 13.16 2.36 2.67 18.6 101 2.8 3.24 .3 2.81 5.68 1.03 3.17 1185
1 14.37 1.95 2.5 16.8 113 3.85 3.49 .24 2.18 7.8 .86 3.45 1480
1 13.24 2.59 2.87 21 118 2.8 2.69 .39 1.82 4.32 1.04 2.93 735
1 14.2 1.76 2.45 15.2 112 3.27 3.39 .34 1.97 6.75 1.05 2.85 1450
1 14.39 1.87 2.45 14.6 96 2.5 2.52 .3 1.98 5.25 1.02 3.58 1290
1 14.06 2.15 2.61 17.6 121 2.6 2.51 .31 1.25 5.05 1.06 3.58 1295
1 14.83 1.64 2.17 14 97 2.8 2.98 .29 1.98 5.2 1.08 2.85 1045
@BretFisher
BretFisher / .travis.yml
Created February 15, 2016 21:26
Travis-CI Docker Image Build and Push to AWS ECR
sudo: required #is required to use docker service in travis
language: php #can be any language, just php for example
services:
- docker # required, but travis uses older version of docker :(
install:
- echo "install nothing!" # put your normal pre-testing installs here