@JoaoLages
JoaoLages / RLHF.md
Last active March 26, 2024 18:51
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

You may have heard of this technique without fully understanding it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
[core]
excludesfile = ~/.gitignore_global
pager = diff-so-fancy | less --tabs=4 -RFX
[difftool "sourcetree"]
cmd = opendiff \"$LOCAL\" \"$REMOTE\"
path =
[alias]
aa = add --all
@MarkEdmondson1234
MarkEdmondson1234 / online_google_auth.r
Last active October 5, 2018 13:42
Google OAuth2 Authentication functions for an R Shiny app
## GUIDE TO OAuth2 Authentication in R Shiny (or other online apps)
##
## Mark Edmondson 2015-02-16 - @HoloMarkeD | http://markedmondson.me
##
## v 0.1
##
##
## Go to the Google API console and activate the APIs you need. https://code.google.com/apis/console/?pli=1
## Get your client ID, and client secret for use below, and put in the URL of your app in the redirect URIs
## e.g. I put in https://mark.shinyapps.io/ga-effect/ for the GA Effect app,
@staltz
staltz / introrx.md
Last active May 29, 2024 05:51
The introduction to Reactive Programming you've been missing
@cboettig
cboettig / knitr_defaults.R
Last active November 30, 2022 09:16
My common knitr defaults
# My preferred defaults (may be changed in individual chunks)
library(knitr)  # provides opts_chunk
opts_chunk$set(tidy = FALSE, warning = FALSE, message = FALSE, cache = TRUE,
               comment = NA, verbose = TRUE, fig.width = 6, fig.height = 4)
# Name the cache path and fig.path based on filename...
# (knit_concord is an internal knitr object, hence the ':::' access)
opts_chunk$set(fig.path = paste("figure/",
                                gsub("\\.Rmd$", "", knitr:::knit_concord$get('infile')),
                                "-", sep = ""),
               cache.path = paste(gsub("\\.Rmd$", "", knitr:::knit_concord$get('infile')),
                                  "/", sep = ""))