Skip to content

Instantly share code, notes, and snippets.

View tljstewart's full-sized avatar
🟢
Human

Timothy Stewart tljstewart

🟢
Human
View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@danielgross
danielgross / ai-plugin.json
Created March 23, 2023 22:59
ChatGPT Plugin for Twilio
{
"schema_version": "v1",
"name_for_model": "twilio",
"name_for_human": "Twilio Plugin",
"description_for_model": "Plugin for integrating the Twilio API to send SMS messages and make phone calls. Use it whenever a user wants to send a text message or make a call using their Twilio account.",
"description_for_human": "Send text messages and make phone calls with Twilio.",
"auth": {
"type": "user_http",
"authorization_type": "basic"
},
# STEP 1: Load
# Load documents using LangChain's DocumentLoaders
# This is from https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/csv.html
from langchain.document_loaders.csv_loader import CSVLoader
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv')
data = loader.load()
@rvanbruggen
rvanbruggen / 1-contacttracing-import.cql
Last active August 18, 2022 16:11
Contact tracing example #cypher #neo4j
//environment: Neo4j Desktop 1.2.7, Neo4j Enteprise 3.5.17, apoc 3.5.0.9, gds 1.1.0
//or: Neo4j Enterprise 4.0.3, apoc 4.0.0.6 (NOT later! a bug in apoc.coll.max/apoc.coll.min needs to be resolved)
//contact tracing data import
//full spreadsheet with synthetic data
//https://docs.google.com/spreadsheets/d/1R-XVuynPsOWcXSderLpq3DacZdk10PZ8v6FiYGTncIE/edit#gid=0
// person sheet˝
// https://docs.google.com/spreadsheets/u/0/d/1R-XVuynPsOWcXSderLpq3DacZdk10PZ8v6FiYGTncIE/export?format=csv&id=1R-XVuynPsOWcXSderLpq3DacZdk10PZ8v6FiYGTncIE&gid=0
@qpwo
qpwo / monte_carlo_tree_search.py
Last active July 22, 2024 09:10
Monte Carlo tree search (MCTS) minimal implementation in Python 3, with a tic-tac-toe example gameplay
"""
A minimal implementation of Monte Carlo tree search (MCTS) in Python 3
Luke Harold Miles, July 2019, Public Domain Dedication
See also https://en.wikipedia.org/wiki/Monte_Carlo_tree_search
https://gist.github.com/qpwo/c538c6f73727e254fdc7fab81024f6e1
"""
from abc import ABC, abstractmethod
from collections import defaultdict
import math
@talmo
talmo / vid2hdf5.m
Created June 14, 2018 13:32
An example of converting video (MP4/AVI/MOV/etc) to HDF5
% Clean start!
clear all, clc
%% Parameters
% Path to input file
videoPath = '..\..\leap\data\examples\072212_163153.clip.mp4';
% Path to output file
% savePath = '..\..\leap\data\examples\072212_163153.clip.h5';
savePath = 'C:\tmp\072212_163153.clip.h5';
@mbinna
mbinna / effective_modern_cmake.md
Last active July 20, 2024 22:17
Effective Modern CMake

Effective Modern CMake

Getting Started

For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.

After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft

@nicolasdao
nicolasdao / open_source_licenses.md
Last active July 21, 2024 20:51
What you need to know to choose an open source license.
@sid24rane
sid24rane / udp.js
Created July 25, 2016 08:39
Simple UDP Client and Server in Node.js ==> ( Echo Server )
var udp = require('dgram');
// --------------------creating a udp server --------------------
// creating a udp server
var server = udp.createSocket('udp4');
// emits when any error occurs
server.on('error',function(error){
console.log('Error: ' + error);
@digitaljhelms
digitaljhelms / gist:4287848
Last active July 19, 2024 03:32
Git/GitHub branching standards & conventions

Branching

Quick Legend

Description, Instructions, Notes
Instance Branch