Skip to content

Instantly share code, notes, and snippets.

View veekaybee's full-sized avatar
💫
in the latent space

Vicki Boykis veekaybee

💫
in the latent space
View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@0xDE57
0xDE57 / config.md
Last active March 9, 2026 02:42
Firefox about:config privacy settings

ABOUT

about:config settings to harden the Firefox browser. Privacy and performance enhancements.
To change these settings type 'about:config' in the url bar. Then search the setting you would like to change and modify the value. Some settings may break certain websites from functioning and rendering normally. Some settings may also make firefox unstable. I am not liable for any damages/loss of data.

Not all these changes are necessary and will be dependent upon your usage and hardware. Do some research on settings if you don't understand what they do. These settings are best combined with your standard privacy extensions (HTTPS Everywhere No longer required: Enable HTTPS-Only Mode, NoScript/Request Policy, uBlock origin, agent spoofing, Privacy Badger etc), and all plugins set to "Ask To Activate".

@mshafrir
mshafrir / states_hash.json
Created May 9, 2012 17:05
US states in JSON form
{
"AL": "Alabama",
"AK": "Alaska",
"AS": "American Samoa",
"AZ": "Arizona",
"AR": "Arkansas",
"CA": "California",
"CO": "Colorado",
"CT": "Connecticut",
"DE": "Delaware",
@bsweger
bsweger / useful_pandas_snippets.md
Last active March 4, 2026 19:59
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)

Getting Started in Scala

This is my attempt to give Scala newcomers a quick-and-easy rundown to the prerequisite steps they need to a) try Scala, and b) get a standard project up and running on their machine. I'm not going to talk about the language at all; there are plenty of better resources a google search away. This is just focused on the prerequisite tooling and machine setup. I will not be assuming you have any background in JVM languages. So if you're coming from Python, Ruby, JavaScript, Haskell, or anywhere…  I hope to present the information you need without assuming anything.

Disclaimer It has been over a decade since I was new to Scala, and when I was new to Scala, I was coming from a Java and Ruby background. This has probably caused me to unknowingly make some assumptions. Please feel free to call me out in comments/tweets!

One assumption I'm knowingly making is that you're on a Unix-like platform. Sorry, Windows users.

Getting the JVM

@kalomaze
kalomaze / llm_samplers_explained.md
Last active February 11, 2026 10:21
LLM Samplers Explained

LLM Samplers Explained

Everytime a large language model makes predictions, all of the thousands of tokens in the vocabulary are assigned some degree of probability, from almost 0%, to almost 100%. There are different ways you can decide to choose from those predictions. This process is known as "sampling", and there are various strategies you can use which I will cover here.

OpenAI Samplers

Temperature

  • Temperature is a way to control the overall confidence of the model's scores (the logits). What this means is that, if you use a lower value than 1.0, the relative distance between the tokens will become larger (more deterministic), and if you use a larger value than 1.0, the relative distance between the tokens becomes smaller (less deterministic).
  • 1.0 Temperature is the original distribution that the model was trained to optimize for, since the scores remain the same.
  • Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4

Moved

Now located at https://github.com/JeffPaine/beautiful_idiomatic_python.

Why it was moved

Github gists don't support Pull Requests or any notifications, which made it impossible for me to maintain this (surprisingly popular) gist with fixes, respond to comments and so on. In the interest of maintaining the quality of this resource for others, I've moved it to a proper repo. Cheers!

@BretFisher
BretFisher / .travis.yml
Created February 15, 2016 21:26
Travis-CI Docker Image Build and Push to AWS ECR
sudo: required #is required to use docker service in travis
language: php #can be any language, just php for example
services:
- docker # required, but travis uses older version of docker :(
install:
- echo "install nothing!" # put your normal pre-testing installs here
@tijptjik
tijptjik / wine.csv
Created March 7, 2014 09:47
Wine Dataset
Wine Alcohol Malic.acid Ash Acl Mg Phenols Flavanoids Nonflavanoid.phenols Proanth Color.int Hue OD Proline
1 14.23 1.71 2.43 15.6 127 2.8 3.06 .28 2.29 5.64 1.04 3.92 1065
1 13.2 1.78 2.14 11.2 100 2.65 2.76 .26 1.28 4.38 1.05 3.4 1050
1 13.16 2.36 2.67 18.6 101 2.8 3.24 .3 2.81 5.68 1.03 3.17 1185
1 14.37 1.95 2.5 16.8 113 3.85 3.49 .24 2.18 7.8 .86 3.45 1480
1 13.24 2.59 2.87 21 118 2.8 2.69 .39 1.82 4.32 1.04 2.93 735
1 14.2 1.76 2.45 15.2 112 3.27 3.39 .34 1.97 6.75 1.05 2.85 1450
1 14.39 1.87 2.45 14.6 96 2.5 2.52 .3 1.98 5.25 1.02 3.58 1290
1 14.06 2.15 2.61 17.6 121 2.6 2.51 .31 1.25 5.05 1.06 3.58 1295
1 14.83 1.64 2.17 14 97 2.8 2.98 .29 1.98 5.2 1.08 2.85 1045
@fritzprix
fritzprix / llm-commit.sh
Last active August 2, 2025 08:03
🤖 Generate git commit messages automatically using local LLM (Ollama). Simple bash script that analyzes your git diff and creates meaningful commit messages. No API keys, no cloud - runs locally with Ollama.
#!/bin/bash
# Get the git diff and save it to a temporary file
git diff --cached > /tmp/git_diff.txt
# If there's no diff, exit
if [ ! -s /tmp/git_diff.txt ]; then
echo "No staged changes to commit"
exit 1
fi