Skip to content

Instantly share code, notes, and snippets.

@kevinbird15
kevinbird15 / DiffEdit Implementation.ipynb
Last active December 3, 2022 00:14
diffedit implementation
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@aparrish
aparrish / spacy_intro.ipynb
Last active August 9, 2023 01:41
NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@acolyer
acolyer / jessfraz.md
Created November 19, 2017 13:39
Containers, operating systems and other fun things from The Morning Paper
@ndarville
ndarville / business-models.md
Last active January 13, 2024 17:27
Business models based on the compiled list at http://news.ycombinator.com/item?id=4924647. I find the link very hard to browse, so I made a simple version in Markdown instead.

Business Models

Advertising

Models Examples
Display ads Yahoo!
Search ads Google
@dferg
dferg / howto-tomato-install-entware.markdown
Last active January 22, 2024 04:40
HOWTO: Install entware on Shibby TomatoUSB

Introduction

This howto describes installing entware for the Tomato open-source router firmware.

Requirements

  • USB stick - 1G or more in size
  • USB-capable router running TomatoUSB.

This Howto Was Tested With

@peti
peti / README.md
Last active January 29, 2024 00:21
Make NixOS provide version-specific LOCALE_ARCHIVE environment variables

This NixOS code ensures that the system provide version-specific $LOCALE_ARCHIVE environment variables to mitigate the effects of NixOS/nixpkgs#38991.

To deploy it, copy the file into your /etc/nixos folder using a file name like multi-glibc-locale-paths.nix. Then edit your configuration.nix file to contain the attribute:

imports = [ ./multi-glibc-locale-paths.nix ];
@ahmetb
ahmetb / gcrgc.sh
Last active February 26, 2024 09:14
Script to clean up Google Container Registry images pushed before a particular date
#!/bin/bash
# Copyright © 2017 Google Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
@jaseemabid
jaseemabid / git tutorials.md
Last active March 24, 2024 00:07 — forked from netroy/git tutorials.md
Awesome git tutorials I am finding here and there

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much