Gabriel Simmons g-simmons

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@mpizenberg
mpizenberg / code-block-example.typ
Created April 3, 2023 07:28
Example for code blocks in Typst
#set par(justify: true)
*Goal*: being able to add line numbers, which are correct even in case of long lines that need wrapping.
*Strategy*: duplicate the code block, once for getting the line numbers correct, and the other for syntax highlighting. The idea is to split lines and prefix each line with its line number such that line wrapping should be respected.
#show raw.where(block: true): it => { set par(justify: false); grid(
columns: (100%, 100%),
column-gutter: -100%,
block(width: 100%, inset: 1em, for i, line in it.text.split("\n") {
@nullcline
nullcline / baka_trace.py
Created March 9, 2023 08:20
tsundere error traces
import traceback
import openai
import sys
# list models
models = openai.Model.list()
def baka(error, character="tsundere"):
    exc_type, exc_value, exc_traceback = sys.exc_info()
    traceback_list = traceback.extract_tb(exc_traceback)
@dbreunig
dbreunig / podcast-to-transcript-to-sqlite.py
Created February 15, 2023 19:05
Download podcasts from an XML feed, transcribe them with whisper, and insert the data into a sqlite db.
import feedparser
import whisper
import sqlite3
import requests
podcast_feed_url = "https://feeds.libsyn.com/92106/rss"
db_name = "podcast.db"
# Create the database and its tables.
con = sqlite3.connect(db_name)
@jthaman
jthaman / consult-ripgrep-all.el
Last active April 10, 2024 01:09
Call ripgrep-all in emacs with consult
;; Note: put `rga' in your PATH. -*- lexical-binding: t; -*-
(require 'consult)
(defcustom consult-ripgrep-all-args
"rga --null --line-buffered --color=never --max-columns=1000 --path-separator /\ --smart-case --no-heading --with-filename --line-number"
"Command line arguments for ripgrep, see `consult-ripgrep-all'.
The dynamically computed arguments are appended.
Can be either a string, or a list of strings or expressions."
:type '(choice string (repeat (choice string expression))))
@ckandoth
ckandoth / single_machine_slurm_on_ubuntu.md
Last active May 19, 2024 07:49
Install Slurm 19.05 on a standalone machine running Ubuntu 20.04

Use apt to install the necessary packages:

sudo apt install -y slurm-wlm slurm-wlm-doc

Load file:///usr/share/doc/slurm-wlm/html/configurator.html in a browser (or file://wsl%24/Ubuntu/usr/share/doc/slurm-wlm/html/configurator.html on WSL2), and:

  1. Set your machine's hostname in SlurmctldHost and NodeName.
  2. Set CPUs as appropriate, and optionally Sockets, CoresPerSocket, and ThreadsPerCore. Use the lscpu command to find what you have.
  3. Set RealMemory to the number of megabytes you want to allocate to Slurm jobs.
  4. Set StateSaveLocation to /var/spool/slurm-llnl.
  5. Set ProctrackType to linuxproc because processes are less likely to escape Slurm control on a single machine config.
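
As a rough illustration of what the steps above configure, the following Python sketch renders only the corresponding lines of a slurm.conf. It is not a complete configuration (the configurator page generates the full file), and the function name, the example hostname, and the CPU and memory figures are assumptions made up for this example.

```python
def slurm_conf_fragment(hostname: str, cpus: int, real_memory_mb: int) -> str:
    """Render the slurm.conf lines touched by the steps above.

    This is a partial fragment for illustration only; a working
    slurm.conf needs the other settings the configurator emits.
    """
    lines = [
        # Step 1: the same hostname goes in SlurmctldHost and NodeName.
        f"SlurmctldHost={hostname}",
        # Step 4: where slurmctld saves its state.
        "StateSaveLocation=/var/spool/slurm-llnl",
        # Step 5: the linuxproc process-tracking plugin.
        "ProctrackType=proctrack/linuxproc",
        # Steps 2 and 3: CPU count and memory (in megabytes) for the node.
        f"NodeName={hostname} CPUs={cpus} RealMemory={real_memory_mb}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values; use your own hostname and the figures from lscpu.
    print(slurm_conf_fragment("myhost", cpus=8, real_memory_mb=16000))
```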
@chasset
chasset / invert-citeproc.py
Created July 9, 2019 14:17 — forked from mbroedl/invert-citeproc.py
Inverse pandoc citeproc from docx to markdown
#!/usr/bin/env python
'''
Due to changes made for pandoc 2, this script currently does mostly just the inversion of (some) citations and the re-wrapping of lines into somewhat semantic units.
I previously had some pandoc filters that also converted track changes to CriticMarkup, could accept or reject them, and merged comments into footnotes or HTML comments;
because of the change in pandoc filters they don't work at the moment, so that functionality is not used for now (but it is implemented in the script).
Usage:
@ericdfields
ericdfields / gist:3d4ed9c7f7b559289a102207facd61a7
Created February 3, 2017 15:30
Add multiple items to an Amazon Cart with a single button
I've seen links around the web for a list of items and a single 'add to amazon cart' button.
I don't know how to do this more easily through an amazon-provided UI, but it seems that you can hack it together through URL params easy enough, like so:
https://www.amazon.com/gp/aws/cart/add.html?AssociateTag=your_tag&tag=your_tagQ&ASIN.1=B012NH05UW&Quantity.1=1&ASIN.2=B012M8LXQW&Quantity.2=1
* ASINs are the string after /dp/ in amazon URLs. (amazon.com/dp/string_is_here)
* Add as URL params w/ incrementing identifiers and quantity couplets (ASIN.1, Quantity.1, ASIN.2, Quantity.2…)
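
A minimal sketch of assembling such a link in Python, assuming the URL pattern shown above still behaves as described. The function name `build_cart_url` is made up for this example, and it sets only the `AssociateTag` parameter, not the separate `tag` parameter that also appears in the sample URL.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
CART_ADD_URL = "https://www.amazon.com/gp/aws/cart/add.html"


def build_cart_url(items, associate_tag=None):
    """Build an add-to-cart URL from (asin, quantity) pairs.

    Hypothetical helper: each item gets an incrementing index,
    producing ASIN.1/Quantity.1, ASIN.2/Quantity.2, and so on.
    """
    params = []
    if associate_tag:
        params.append(("AssociateTag", associate_tag))
    for index, (asin, quantity) in enumerate(items, start=1):
        params.append((f"ASIN.{index}", asin))
        params.append((f"Quantity.{index}", str(quantity)))
    return f"{CART_ADD_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The two ASINs from the example URL above.
    print(build_cart_url([("B012NH05UW", 1), ("B012M8LXQW", 1)], "your_tag"))
```

Passing a list of pairs rather than a dict to `urlencode` keeps the parameters in the order they were added, so each ASIN stays next to its quantity.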
<?
/////////////////////
// slack2html
// by @levelsio
/////////////////////
//
/////////////////////
// WHAT DOES THIS DO?
/////////////////////
//
@elijahmanor
elijahmanor / index.html
Last active October 21, 2022 18:25
Reveal.js External Markdown
<!doctype html>
<html lang="en">
<!-- ... -->
<body>
<div class="reveal">
<div class="slides">
<section data-markdown="slides.md"
data-separator="^\n---\n$"
data-vertical="^\n------\n$"
data-notes="^Notes:"