Guilherme Pires colobas

## RLHF.md

      
JoaoLages / RLHF.md · Last active March 26, 2024 18:51 · 1 file · 7 forks · 35 comments · 109 stars

Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈
RLHF is especially useful in two scenarios 🌟:

- You can’t create a good loss function
  - Example: how do you calculate a metric to measure if the model’s output was funny?
- You want to train with production data, but you can’t easily label your production data


## zarr-links.ipynb

      
alimanfoo / zarr-links.ipynb · Last active February 28, 2024 19:01 · 1 file · 0 forks · 0 comments · 1 star

How to create links with zarr

## st-gumbel.py
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

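def sample_gumbel(shape, eps=1e-20):
    # Draw Gumbel(0, 1) noise by inverse transform: g = -log(-log(U)), U ~ Uniform(0, 1).
    # eps keeps both logarithms away from log(0). Note that .cuda() requires a GPU.
    U = torch.rand(shape).cuda()
    return -Variable(torch.log(-torch.log(U + eps) + eps))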
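
The formula in `sample_gumbel` above can be sanity-checked without a GPU. The following is a minimal sketch, not part of the original gist: it assumes only the Python standard library in place of torch, and the name `sample_gumbel_py` and its `n` and `seed` parameters are hypothetical. It applies the same expression, `-log(-log(U + eps) + eps)`, and compares the sample mean with the known mean of a Gumbel(0, 1) distribution, the Euler-Mascheroni constant (about 0.5772).

```python
import math
import random


def sample_gumbel_py(n, eps=1e-20, seed=0):
    """Draw n Gumbel(0, 1) samples using the same formula as sample_gumbel.

    Hypothetical helper for illustration; uses the standard library, not torch.
    """
    rng = random.Random(seed)
    # g = -log(-log(U + eps) + eps), with U ~ Uniform[0, 1)
    return [-math.log(-math.log(rng.random() + eps) + eps) for _ in range(n)]


samples = sample_gumbel_py(200_000)
mean = sum(samples) / len(samples)

# The mean of Gumbel(0, 1) is the Euler-Mascheroni constant, about 0.5772.
print(f"sample mean: {mean:.3f}")
```

If the sample mean lands close to 0.5772, the inverse-transform expression is behaving as expected.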

## main.go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {

## hn_search.js
/* Hacker News Search Script
 *
 * Original Script by Kristopolous:
 * https://gist.github.com/kristopolous/19260ae54967c2219da8
 *
 * Usage:
 * First, copy the script into your browser's console whilst on the Hacker News
 * jobs page. Then, you can use the query function to filter the results.
 *
 * For example,

## latency.txt
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
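
The ratio and unit columns of the table above can be reproduced with a few lines of arithmetic. This is a small sketch assuming the nanosecond values are copied from the rows shown here; the dictionary `LATENCY_NS` and the helpers `ratio` and `to_us` are hypothetical names introduced for illustration.

```python
# Values copied from the table above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def ratio(slow, fast):
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def to_us(name):
    """Convert a table entry from nanoseconds to microseconds."""
    return LATENCY_NS[name] / 1_000


print(ratio("L2 cache reference", "L1 cache reference"))     # L2 vs L1
print(ratio("Main memory reference", "L1 cache reference"))  # RAM vs L1
print(to_us("Read 4K randomly from SSD"))                    # SSD read in us
```

Dividing 100 ns by 7 ns gives roughly 14.3, so the table's "20x L2 cache" annotation appears to be a rough figure rather than an exact quotient of the two rows.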