JoaoLages /
Last active March 26, 2024 18:51
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) was used to train ChatGPT, which explains its recent surge in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
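To make the first scenario concrete: when no formula can score "funny", human comparisons between two outputs can still be turned into a training signal. The sketch below is not from the original text. It assumes a pairwise preference loss of the kind commonly used to train reward models, and the function name `preference_loss` and the example scores are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the output the human preferred gets the higher score.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(diff)) is algebraically equal to log(1 + exp(-diff))
    return math.log1p(math.exp(-diff))


# Hypothetical scores a reward model might assign to two jokes.
# Assumption: a human labeler said the first joke was funnier.
loss_agree = preference_loss(2.0, -1.0)     # model ranks the jokes like the human
loss_disagree = preference_loss(-1.0, 2.0)  # model ranks them the other way round

print(loss_agree < loss_disagree)
```

The human never has to define what "funny" means; they only pick the better of two outputs, and the loss pushes the scores toward that ranking.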
MarkEdmondson1234 / online_google_auth.r
Last active October 5, 2018 13:42
Google OAuth2 Authentication functions for an R Shiny app
## GUIDE TO OAuth2 Authentication in R Shiny (or other online apps)
## Mark Edmondson 2015-02-16 - @HoloMarkeD |
## v 0.1
## Go to the Google API console and activate the APIs you need.
## Get your client ID and client secret for use below, and add your app's URL to the redirect URIs
## e.g. I put in for the GA Effect app,
staltz /
Last active May 29, 2024 05:51
The introduction to Reactive Programming you've been missing
cboettig / knitr_defaults.R
Last active November 30, 2022 09:16
My common knitr defaults
# My preferred defaults (may be changed in individual chunks)
library(knitr)
opts_chunk$set(tidy = FALSE, warning = FALSE, message = FALSE, cache = TRUE,
               comment = NA, verbose = TRUE, fig.width = 6, fig.height = 4)
# Name the cache path and fig.path based on filename...
# knit_concord is a knitr internal (hence `:::`) and may change between versions.
infile <- knitr:::knit_concord$get('infile')
# fixed = TRUE matches ".Rmd" literally instead of treating "." as a regex wildcard
stem <- gsub(".Rmd", "", infile, fixed = TRUE)
opts_chunk$set(fig.path = paste("figure/", stem, "-", sep = ""),
               cache.path = paste(stem, "/", sep = ""))