Skip to content

Instantly share code, notes, and snippets.

View d-wasserman's full-sized avatar

David Wasserman d-wasserman

View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

import tkinter
import customtkinter
from bs4 import BeautifulSoup
# Langchain loads:
from langchain.document_loaders import DirectoryLoader,PagedPDFSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS, Qdrant
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
@johnwbryant
johnwbryant / extract_aus.sh
Last active September 17, 2023 03:35
Extract Australia data in a usable format from Microsoft road detections data
#!/bin/bash
# see https://github.com/microsoft/RoadDetections
# 1) get the data
wget https://usaminedroads.blob.core.windows.net/road-detections/Oceania-Full.zip
# 2) extract the records for Australia, second column of tsv only
zgrep AUS Oceania-Full.zip | cut -f2 > AUS.geojson
# 3) convert to GPKG, works better in QGIS
@zanetaylor
zanetaylor / map.js
Created July 30, 2021 18:22
MBGL events & feature props
mbMap.on('click', 'prioritization-projects-update', function (e) {
var tmpl = $.templates("#project-popup");
var projectDetails = tmpl.render({
projID: e.features[0].properties.Join_ID,
name: e.features[0].properties.Street,
location: e.features[0].properties.Location,
pdesc1: e.features[0].properties.Primary_De,
sdesc1: e.features[0].properties.Secondary_,
@bhavikngala
bhavikngala / fast_ai_mooc_important_points.md
Last active January 6, 2023 23:04
This gist contains a list of important points from fast.ai "practical deep learning for coders" and "cutting edge deep learning for coders" MOOC

This gist contains a list of points I found very useful while watching the fast.ai "Practical deep learning for coders" and "Cutting edge deep learning for coders" MOOC by Jeremy Howard and team. This list may not be complete as I watched the video at 1.5x speed on marathon but I did write down as many things I found to be very useful to get a model working. A fair warning the points are in no particular order, you may find the topics are all jumbled up.

Before beginning, I want to thank Jeremy Howard, Rachel Thomas, and the entire fast.ai team in making this awesome practically oriented MOOC.

  1. Progressive image resolution training: Train the network on lower res first and then increase the resolution to get better performance. This can be thought of as transfer learning from the same dataset but at a different resolution. There is one paper by NVIDIA as well that used such an approach to train GANs.

  2. Cyclical learning rates: Gradually increasing the learning rate initially helps to avoid getting stuc

@kauffmanes
kauffmanes / install_anaconda.md
Last active July 18, 2024 21:15
Install Anaconda on Windows Subsystem for Linux (WSL)

Thanks everyone for commenting/contributing! I made this in college for a class and I no longer really use the technology. I encourage you all to help each other, but I probably won't be answering questions anymore.

This article is also on my blog: https://emilykauffman.com/blog/install-anaconda-on-wsl

Note: $ denotes the start of a command. Don't actually type this.

Steps to Install Anaconda on Windows Ubuntu Terminal

  1. Install WSL (Ubuntu for Windows - can be found in Windows Store). I recommend the latest version (I'm using 18.04) because there are some bugs they worked out during 14/16 (microsoft/WSL#785)
  2. Go to https://repo.continuum.io/archive to find the list of Anaconda releases
  3. Select the release you want. I have a 64-bit computer, so I chose the latest release ending in x86_64.sh. If I had a 32-bit computer, I'd select the x86.sh version. If you accidentally try to install the wrong one, you'll get a warning in the terminal. I chose `Anaconda3-5.2.0-Li
@chriswhong
chriswhong / RadiusMode.js
Created March 1, 2018 12:04
RadiusMode, a custom mode for mapbox-gl-draw for drawing a radius
// custom mapbopx-gl-draw mode that modifies draw_line_string
// shows a center point, radius line, and circle polygon while drawing
// forces draw.create on creation of second vertex
import MapboxDraw from 'mapbox-gl-draw';
import numeral from 'numeral';
import lineDistance from 'npm:@turf/line-distance';
const RadiusMode = MapboxDraw.modes.draw_line_string;
@jiewpeng
jiewpeng / add_anaconda_right_click_menu.md
Last active July 9, 2024 12:34
Add Anaconda Prompt to Windows right click context menu

This gist describes how to add Anaconda Prompt to the Windows right click context menu. This is because at some point, Anaconda no longer adds itself to the PATH after installation.

  1. Run regedit.exe
  2. Navigate to HKEY_CLASSES_ROOT > Directory > Background > shell
  3. Add a key named AnacondaPrompt and set its value to "Anaconda Prompt Here" (or anything you'd like it to appear as in the right click context menu)
  4. Add a key under this key, called command, and set its value to cmd.exe /K C:\Anaconda3\Scripts\activate.bat (may have to change the activate.bat file to whereever Anaconda is installed)
#!/usr/bin/python3.5
# -*-coding:Utf-8 -*
import random
import operator
import time
import matplotlib.pyplot as plt
temps1 = time.time()