Gabriel Simmons g-simmons

## rl-for-llms.md

      
yoavg / rl-for-llms.md, last active July 4, 2024 12:32

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.
Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion about the importance of "RLHF training", that is, "reinforcement learning from human feedback".
I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language-model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

  
## code-block-example.typ
#set par(justify: true)

*Goal*: being able to add line numbers, which are correct even in case of long lines that need wrapping.

*Strategy*: duplicate the code block, using one copy to get the line numbers right and the other for syntax highlighting. The idea is to split the text into lines and prefix each line with its line number, so that line wrapping is respected.

#show raw.where(block: true): it => { set par(justify: false); grid(
  columns: (100%, 100%),
  column-gutter: -100%,
  block(width: 100%, inset: 1em, for i, line in it.text.split("\n") {
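
The numbering half of this strategy can be illustrated outside Typst. The Python below is only a sketch, not part of the Typst file: it assumes a fixed character width (the hypothetical `width` parameter) as a stand-in for Typst's real layout, splits the code on newlines, wraps each logical line, and numbers only the first visual row of each one. Wrapped continuation rows get blank padding, so the numbers stay aligned with logical lines.

```python
import textwrap


def number_lines(code: str, width: int = 40) -> str:
    """Prefix each logical line with its number; wrapped rows get blank padding.

    `width` is a hypothetical fixed character width standing in for the
    layout engine's real line width.
    """
    lines = code.split("\n")
    gutter = len(str(len(lines)))  # width of the widest line number
    out = []
    for i, line in enumerate(lines, start=1):
        # textwrap.wrap returns [] for an empty line, so keep one empty row
        rows = textwrap.wrap(line, width=width) or [""]
        for j, row in enumerate(rows):
            # only the first visual row of a logical line carries its number
            prefix = str(i).rjust(gutter) if j == 0 else " " * gutter
            out.append(f"{prefix} | {row}")
    return "\n".join(out)


print(number_lines("a\n" + "x" * 50 + "\nb"))
```

A 50-character line at width 40 becomes two visual rows, but only the first is numbered, which is the property the duplicated-block trick is after.
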

## baka_trace.py
import traceback
import openai
import sys

# list models
models = openai.Model.list()

def baka(error, character="tsundere",):
    exc_type, exc_value, exc_traceback = sys.exc_info()
    traceback_list = traceback.extract_tb(exc_traceback)

## podcast-to-transcript-to-sqlite.py
import feedparser
import whisper
import sqlite3
import requests

podcast_feed_url = "https://feeds.libsyn.com/92106/rss"
db_name = "podcast.db"

# Create the database and its tables.
con = sqlite3.connect(db_name)

## consult-ripgrep-all.el
;; Note: put `rga' in your PATH.  -*- lexical-binding: t; -*-
(require 'consult)

(defcustom consult-ripgrep-all-args
  "rga --null --line-buffered --color=never --max-columns=1000 --path-separator /\  --smart-case --no-heading --with-filename --line-number"
  "Command line arguments for ripgrep, see `consult-ripgrep-all'.
The dynamically computed arguments are appended.
Can be either a string, or a list of strings or expressions."
  :type '(choice string (repeat (choice string expression))))

## single_machine_slurm_on_ubuntu.md

      
ckandoth / single_machine_slurm_on_ubuntu.md, last active May 19, 2024 07:49

Install Slurm 19.05 on a standalone machine running Ubuntu 20.04

Use apt to install the necessary packages:
sudo apt install -y slurm-wlm slurm-wlm-doc

Load file:///usr/share/doc/slurm-wlm/html/configurator.html in a browser (or file://wsl%24/Ubuntu/usr/share/doc/slurm-wlm/html/configurator.html on WSL2), and:

* Set your machine's hostname in SlurmctldHost and NodeName.
* Set CPUs as appropriate, and optionally Sockets, CoresPerSocket, and ThreadsPerCore. Use the lscpu command to find what you have.
* Set RealMemory to the number of megabytes you want to allocate to Slurm jobs.
* Set StateSaveLocation to /var/spool/slurm-llnl.
* Set ProctrackType to linuxproc, because processes are less likely to escape Slurm control on a single-machine config.
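
For reference, here is a rough Python sketch of the slurm.conf lines those settings turn into. `slurm_conf_lines` is a hypothetical helper, not something the configurator provides; it assumes the hostname from `socket.gethostname()` and the CPU count from `os.cpu_count()`, and leaves RealMemory as an explicit argument since it is your choice. The real configurator output contains many more lines (a partition definition, for example), so treat this as a checklist, not a complete config.

```python
import os
import socket


def slurm_conf_lines(real_memory_mb, hostname=None, cpus=None):
    """Hypothetical helper: render the settings from the steps above.

    Assumes the local hostname and CPU count when not given explicitly.
    """
    hostname = hostname or socket.gethostname()
    cpus = cpus or os.cpu_count()
    return [
        f"SlurmctldHost={hostname}",
        # linuxproc process tracking, as suggested for a single machine
        "ProctrackType=proctrack/linuxproc",
        "StateSaveLocation=/var/spool/slurm-llnl",
        # the same hostname goes in NodeName; RealMemory is in megabytes
        f"NodeName={hostname} CPUs={cpus} RealMemory={real_memory_mb} State=UNKNOWN",
    ]


print("\n".join(slurm_conf_lines(real_memory_mb=16000)))
```

Compare these lines against what the configurator page generates before copying the file into place.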


## invert-citeproc.py
#!/usr/bin/env python

'''

Due to the changes in pandoc 2, at the moment the script mostly only inverts (some) citations and re-wraps lines into roughly semantic units.
I previously had some pandoc filters that also converted track changes to CriticMarkup, could accept or reject them, and merged comments into footnotes or HTML comments;
because pandoc filters changed, those don't work at the moment, so that functionality is unused for now (although it is still implemented in the script).

Usage:

## gist:3d4ed9c7f7b559289a102207facd61a7
I've seen links around the web that put a list of items behind a single 'add to Amazon cart' button.
I don't know of an easier way to do this through an Amazon-provided UI, but it seems you can hack it together with URL params easily enough, like so:

https://www.amazon.com/gp/aws/cart/add.html?AssociateTag=your_tag&tag=your_tagQ&ASIN.1=B012NH05UW&Quantity.1=1&ASIN.2=B012M8LXQW&Quantity.2=1

* ASINs are the string after /dp/ in Amazon URLs (amazon.com/dp/string_is_here).
* Add them as URL params with incrementing identifiers, in ASIN/quantity pairs (ASIN.1, Quantity.1, ASIN.2, Quantity.2…)
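
If you build these links often, a small Python sketch can do the numbering for you. `asin_from_url` and `cart_url` are hypothetical helper names; the sketch assumes the ASIN is the alphanumeric path segment right after /dp/, and that you want the same associate tag in both `AssociateTag` and `tag`, as the example URL appears to intend.

```python
import re
from urllib.parse import urlencode

CART_ADD_URL = "https://www.amazon.com/gp/aws/cart/add.html"


def asin_from_url(product_url):
    """Return the path segment after /dp/ in an Amazon product URL, or None."""
    match = re.search(r"/dp/([A-Za-z0-9]+)", product_url)
    return match.group(1) if match else None


def cart_url(items, tag=None):
    """Build an add-to-cart URL from (asin, quantity) pairs.

    Pairs are numbered ASIN.1/Quantity.1, ASIN.2/Quantity.2, ...
    """
    params = []
    if tag:
        # assumption: the same associate tag goes in both parameters
        params += [("AssociateTag", tag), ("tag", tag)]
    for n, (asin, quantity) in enumerate(items, start=1):
        params += [(f"ASIN.{n}", asin), (f"Quantity.{n}", quantity)]
    return f"{CART_ADD_URL}?{urlencode(params)}"


print(cart_url([("B012NH05UW", 1), ("B012M8LXQW", 1)], tag="your_tag"))
```

`urlencode` handles escaping, so ASINs and tags with unusual characters still produce a valid query string.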

## gist:122907e95956602e5c09
<?
/////////////////////
// slack2html
// by @levelsio
/////////////////////
//
/////////////////////
// WHAT DOES THIS DO?
/////////////////////
//

## index.html
<!doctype html>
<html lang="en">
  <!-- ... -->
  <body>
    <div class="reveal">
      <div class="slides">
        <section data-markdown="slides.md"
          data-separator="^\n---\n$"
          data-vertical="^\n------\n$"
          data-notes="^Notes:"