Zhikang Li kevinlee9

## rl-for-llms.md

      
yoavg / rl-for-llms.md, last active September 27, 2025 08:52

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback".
I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

  
## 1_pytorch_distributed_ops_demo.py
#!/usr/bin/env python
import os
import torch
import torch.distributed as dist
from torch.multiprocessing import Process

def run(rank, size):
    """ Distributed function to be implemented later. """
    # collective ops are performed against groups
    group = dist.new_group([0, 1, 2])

## gpu_supervision.py
# -*- coding: utf-8 -*-

import sys
import subprocess

DB_LIST = ['db14', 'db15', 'db16', 'db17', 'db18', 'db19']
USER = 'YOUR_USERNAME'

version = sys.version[0]
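
The preview stops before showing what the script does with `DB_LIST` and `USER`. The sketch below is one plausible continuation, not the author's code: it assumes each entry in `DB_LIST` is a host reachable over `ssh`, that `nvidia-smi` is installed there, and that key-based login is set up. The names `QUERY`, `build_command`, and `query_host` are hypothetical.

```python
import subprocess

DB_LIST = ['db14', 'db15', 'db16', 'db17', 'db18', 'db19']
USER = 'YOUR_USERNAME'

# Assumption: the remote hosts have nvidia-smi on PATH.
QUERY = 'nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader'


def build_command(host, user=USER):
    """Return the argv list that runs QUERY on `host` over ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ['ssh', '-o', 'BatchMode=yes', '{}@{}'.format(user, host), QUERY]


def query_host(host, user=USER, timeout=10):
    """Run the query on one host; return its stdout, or None on any failure."""
    try:
        result = subprocess.run(
            build_command(host, user),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None
```

Looping over `DB_LIST` and printing `query_host(host)` for each entry would give a per-host GPU memory summary. Keeping `build_command` separate from the call that runs it makes the command easy to inspect without touching the network.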

## DRILL-92.01.patch
commit e1f17a09b6d402e78268753897cbdbd4f8bed169
Author: Yash Sharma <yash360@gmail.com>
Date:   Mon Jul 6 09:51:22 2015 +0530

    DRILL-92 : Cassandra storage plugin - rebased on Drill-1.2.0

diff --git a/contrib/pom.xml b/contrib/pom.xml
index 8c00e76..0269efb 100644
--- a/contrib/pom.xml
+++ b/contrib/pom.xml

## .tmux.conf
# vim style tmux config

# use C-a, since it's on the home row and easier to hit than C-b
set-option -g prefix C-a
unbind-key C-a
bind-key C-a send-prefix
set -g base-index 1

# Easy config reload
bind-key R source-file ~/.tmux.conf \; display-message "tmux.conf reloaded."

## gist:7360908

      
rxaviers / gist:7360908, last active November 19, 2025 08:08

Complete list of github markdown emoji markup

People

| `:bowtie:` | 😄 `:smile:` | 😆 `:laughing:` |
|---|---|---|
| 😊 `:blush:` | 😃 `:smiley:` | ☺️ `:relaxed:` |
| 😏 `:smirk:` | 😍 `:heart_eyes:` | 😘 `:kissing_heart:` |
| 😚 `:kissing_closed_eyes:` | 😳 `:flushed:` | 😌 `:relieved:` |
| 😆 `:satisfied:` | 😁 `:grin:` | 😉 `:wink:` |
| 😜 `:stuck_out_tongue_winking_eye:` | 😝 `:stuck_out_tongue_closed_eyes:` | 😀 `:grinning:` |
| 😗 `:kissing:` | 😙 `:kissing_smiling_eyes:` | 😛 `:stuck_out_tongue:` |
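
GitHub expands these shortcodes itself when it renders markdown. To illustrate the mapping the list describes, here is a minimal local sketch that replaces a handful of shortcodes in a string. The `EMOJI` dictionary, the `SHORTCODE` pattern, and the `render` function are hypothetical names for this example, and only a few entries from the list are included.

```python
import re

# A few shortcode -> emoji pairs taken from the list above.
EMOJI = {
    'smile': '😄',
    'laughing': '😆',
    'blush': '😊',
    'wink': '😉',
}

# Matches tokens such as :smile: or :stuck_out_tongue:
SHORTCODE = re.compile(r':([a-z0-9_+-]+):')


def render(text, table=EMOJI):
    """Replace known :shortcode: tokens; leave unknown ones untouched."""
    return SHORTCODE.sub(lambda m: table.get(m.group(1), m.group(0)), text)
```

Unknown shortcodes are passed through unchanged. That suits `:bowtie:`, which the list shows without a Unicode character.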


## redmine gitlab sync
#!/usr/bin/env ruby

require 'faraday'
require 'json'
require 'gitlab'

module Redmine
  Host = nil
  APIKey = nil

## git-export
git archive --format zip --output /full/path/to/zipfile.zip master
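
If the export needs to run from a script, the same command can be wrapped with `subprocess`. This is a minimal sketch that assumes `git` is on `PATH`. The function names `archive_command` and `export` are hypothetical, and the defaults mirror the one-liner above.

```python
import subprocess


def archive_command(output_path, tree_ish='master', fmt='zip'):
    """Build the argv for `git archive`, mirroring the one-liner above."""
    return ['git', 'archive', '--format', fmt, '--output', output_path, tree_ish]


def export(repo_dir, output_path, tree_ish='master', fmt='zip'):
    """Run `git archive` inside repo_dir; raises CalledProcessError on failure."""
    subprocess.check_call(archive_command(output_path, tree_ish, fmt), cwd=repo_dir)
```

For example, `export('.', '/full/path/to/zipfile.zip')` would archive the `master` branch of the repository in the current directory.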