Skip to content

Instantly share code, notes, and snippets.

View tarwn's full-sized avatar
🌮
Mm, taco

Eli Weinstock-Herman tarwn

🌮
Mm, taco
View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@sleepyfox
sleepyfox / foxs-laws-of-software-development.md
Last active August 2, 2023 16:50
Fox's Laws of Software Development
author: @sleepyfox
title: Fox's laws of software development
date: 27 October 2021
preamble: A not entirely serious treatise on the immutable fundamental laws of software development activities

Fox's laws of software development

A not entirely serious treatise

@henhan
henhan / nginx_documentation.md
Last active June 27, 2024 16:47
Extend and override nginx config for Elastic Beanstalk Amazon Linux 2

Extend and override nginx config for Elastic Beanstalk Amazon Linux 2

Documentation on how to override or extend the default nginx config on Elastic Beanstalk running om Amazon Linux 2. Correct as of 2021-08-01. Also see the general documentation on how to extend linux servers.

All references to placing files refer to putting a file at this location in your application bundle. You should never modify a file directly at your Elastic Beanstalk server by ssh:ing to the server, since such a change will be wiped whenever your servers autoscale or if you rebuild your application completely.

Adding a new location block

The easiest extension is to add a new location block to the main server block. This can be done by placing a .conf file with a server block in the folder .platform/nginx/conf.d/elasticbeanstalk. Any such file is automatically included in the main server block.

Overriding the default location

@peltho
peltho / svelte.md
Last active May 16, 2024 03:44
Svelte cheatsheet
@davidwhitney
davidwhitney / setup.ps1
Last active April 10, 2024 10:54
Machine setup chocolatey script
Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
choco feature enable -n allowGlobalConfirmation
choco install notepad2
choco install notepadplusplus
choco install 7zip
choco install paint.net
choco install autoruns
choco install vcredist140
choco install dotnetfx
@SQLvariant
SQLvariant / Mmmm_Chocolatey.ps1
Last active September 9, 2023 12:15
Install SQL / Data Developer Desktop Tools from Chocolatey
Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
choco install chocolatey -y
choco install sql-server-management-studio -y
choco install azure-data-studio -y
choco install azuredatastudio-powershell -y
choco install git.install -y
choco install vscode -y
choco install vscode-powershell -y
choco install powerbi -y
@nissan
nissan / new-dotnetcore-react-storybook-jest-app.ps1
Created June 1, 2018 23:46
Setup a new .NET Core 2.1 app for Windows with React, Storybook, Jest
#Assumes you have the .NET SDK 2.1 for Windows already installed
#Use Chocolatey to install packages, run this script from Administrator Powershell
Set-ExecutionPolicy Bypass -Scope Process -Force
iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
#Use nvm to manage node versions
choco install -y nvm
nvm use default
#install yarn independent of version of node being used
choco install -y yarn
refreshenv
@aortbals
aortbals / squash-and-merge-cli.md
Last active January 16, 2024 10:46
Squash and Merge on the Command line

With the introduction of GitHub's Squash and Merge feature, this has become less prevelant, however it's still useful in scenarios where GitHub's interface is unavailable.

Let's talk through two ways to do a squash and merge on the command line.

Take 1: Interactive Rebase

When to use it

  • When you have not merged main into your feature branch
  • There are no merge conflicts
@mackwage
mackwage / windows_hardening.cmd
Last active July 21, 2024 11:26
Script to perform some hardening of Windows OS
:: Windows 10 Hardening Script
:: This is based mostly on my own personal research and testing. My objective is to secure/harden Windows 10 as much as possible while not impacting usability at all. (Think being able to run on this computer's of family members so secure them but not increase the chances of them having to call you to troubleshoot something related to it later on). References for virtually all settings can be found at the bottom. Just before the references section, you will always find several security settings commented out as they could lead to compatibility issues in common consumer setups but they're worth considering.
:: Obligatory 'views are my own'. :)
:: Thank you @jaredhaight for the Win Firewall config recommendations!
:: Thank you @ricardojba for the DLL Safe Order Search reg key!
:: Thank you @jessicaknotts for the help on testing Exploit Guard configs and checking privacy settings!
:: Best script I've found for Debloating Windows 10: https://github.com/Sycnex/Windows10Debloater
:

FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note, before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall the exact origin of the original source, nor was I able to find the author's name, so I am can't provide the appropriate credits.


Effective Engineer - Notes

What's an Effective Engineer?