🐝
Go Jackets!

Kamolphan Lewprasert fonylew

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@yermulnik
yermulnik / config.yml
Last active February 10, 2024 21:42
GH CLI multi-account switch
git_protocol: ssh
aliases:
    personal: '!cp ~/.config/gh/hosts.yml.personal ~/.config/gh/hosts.yml && gh auth status'
    work: '!cp ~/.config/gh/hosts.yml.work ~/.config/gh/hosts.yml && gh auth status'
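The two aliases switch accounts by copying a saved per-account file (hosts.yml.personal or hosts.yml.work) over gh's hosts.yml and then running gh auth status. The following is a minimal Python sketch of the same copy step. It assumes the per-account files already exist next to hosts.yml. The helper name switch_gh_profile is hypothetical, and the sketch does not invoke gh itself.

```python
import shutil
from pathlib import Path

# Same config directory the aliases above copy files within.
DEFAULT_CONFIG_DIR = Path.home() / ".config" / "gh"


def switch_gh_profile(profile: str, config_dir: Path = DEFAULT_CONFIG_DIR) -> Path:
    """Copy hosts.yml.<profile> over hosts.yml, mirroring the `personal`/`work` aliases.

    Hypothetical helper: it only performs the file copy. Run `gh auth status`
    afterwards to confirm which account is now active.
    """
    source = config_dir / f"hosts.yml.{profile}"
    target = config_dir / "hosts.yml"
    if not source.is_file():
        # Refuse to clobber hosts.yml when the saved profile is missing.
        raise FileNotFoundError(f"no saved gh hosts file for profile {profile!r}: {source}")
    shutil.copyfile(source, target)
    return target
```

For example, switch_gh_profile("work") followed by gh auth status has the same effect as gh work. Checking for the source file first avoids the shell alias's behavior of printing a cp error and leaving hosts.yml unchanged without a clear failure from the caller's point of view.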
@joost-de-vries
joost-de-vries / akka-and-kotlin-coroutines.md
Last active February 27, 2024 07:00
Akka and kotlin coroutines

Akka and Kotlin coroutines: ♡

I've experimented with Kotlin and coroutines in programming Akka. And I must say, I really like the combination so far.
But before I go into it, some brief preliminaries for those who don't know Akka and actors.

Actors and Akka

Actors are a programming model that fits cloud-native architectures particularly well: they are highly available and scale horizontally, all while embracing the realities of multiple servers collaborating, server instances coming and going, and the network having hiccups.
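In the actor model, an actor owns private state and processes messages from its mailbox one at a time. Because only the actor itself touches that state, it needs no locks. The following minimal sketch shows this idea using Python's asyncio. It is not Akka's API, and the counter actor and its message types are hypothetical.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Increment:
    amount: int


@dataclass
class GetCount:
    reply_to: asyncio.Future  # the actor answers by completing this future


class CounterActor:
    """Hypothetical actor: private state plus a mailbox, one message at a time."""

    def __init__(self) -> None:
        self._count = 0  # only ever read or written by run()
        self._mailbox: asyncio.Queue = asyncio.Queue()

    def tell(self, message) -> None:
        # Fire-and-forget send: callers never touch _count directly.
        self._mailbox.put_nowait(message)

    async def run(self) -> None:
        # Messages are handled strictly sequentially, in arrival order.
        while True:
            message = await self._mailbox.get()
            if isinstance(message, Increment):
                self._count += message.amount
            elif isinstance(message, GetCount):
                message.reply_to.set_result(self._count)


async def demo() -> int:
    actor = CounterActor()
    worker = asyncio.create_task(actor.run())
    for _ in range(3):
        actor.tell(Increment(2))
    reply = asyncio.get_running_loop().create_future()
    actor.tell(GetCount(reply))
    count = await reply  # resolved only after the three increments are processed
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return count
```

asyncio.run(demo()) returns 6. The request/reply exchange via a future plays roughly the role that the "ask" pattern plays in actor frameworks.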

On the JVM, Akka is the prominent actor framework. It's been around for a while now, and as a result it's highly reliable, well thought out, and offers a wide programming ecosystem. My own interest in Akka comes from its suitability for software systems that can only be built with business events as a key construct and thinking model. And then, of course, materialized views, CQRS, and near real-time data streams play a big role in constructing those systems.

@akihikodaki
akihikodaki / README.en.md
Last active April 20, 2024 02:43
Linux Desktop on Apple Silicon in Practice

Linux Desktop on Apple Silicon in Practice

I bought an M1 MacBook Air. It is the fastest computer I have, and I have been a GNOME/GNU/Linux user for a long time. The obvious conclusion is that I need a practical Linux desktop environment on Apple Silicon.

Fortunately, Linux already works on Apple Silicon/M1. But how practical is it?

  • Two native ports exist.
@martinsbruveris
martinsbruveris / tensorflow_model_surgery.ipynb
Created November 15, 2020 22:03
How to split tensorflow models into two at a given layer.
@antoniocachuan
antoniocachuan / message_generator_pubsub.py
Last active August 11, 2022 17:06
Simple Message generator for Google Cloud Pub/Sub
#!/usr/bin/env python
# Code modified from https://cloud.google.com/dataflow/docs/samples/join-streaming-data-with-sql#expandable-3
import datetime, json, os, random, time
# Set the `project` variable to a Google Cloud project ID.
project = 'GCP-PROJECT-ID'
BRANCH = ['LIM', 'BOG', 'SFO', 'LAX', 'PEK', 'ATL', 'CDG', 'AMS',
          'HKG', 'ICN', 'FRA', 'MAD', 'SEA', 'LAS', 'SIN', 'BKK', 'DFW',
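The generator preview is cut off before it builds or publishes any message. The following minimal sketch shows how such a generator might work. It assumes one JSON message per simulated transaction. The field names (branch, amount, event_time), the shortened BRANCHES list, and the function names are hypothetical and not taken from the gist. Publishing uses the google-cloud-pubsub client's PublisherClient.topic_path and publish calls. That import is done lazily, so message generation runs with the standard library alone.

```python
import datetime
import json
import random

# Hypothetical subset of the branch codes; the gist's full list is truncated above.
BRANCHES = ["LIM", "BOG", "SFO", "LAX", "PEK", "ATL"]


def make_message(rng: random.Random) -> bytes:
    """Build one hypothetical transaction event, encoded as UTF-8 JSON."""
    event = {
        "branch": rng.choice(BRANCHES),
        "amount": round(rng.uniform(1, 1000), 2),
        "event_time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    return json.dumps(event).encode("utf-8")


def publish_messages(project: str, topic: str, count: int, delay_s: float = 1.0) -> None:
    """Publish `count` messages to an existing topic (requires google-cloud-pubsub)."""
    import time
    from google.cloud import pubsub_v1  # imported here so make_message works without it

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project, topic)
    rng = random.Random()
    for _ in range(count):
        future = publisher.publish(topic_path, data=make_message(rng))
        future.result()  # block until Pub/Sub returns the message ID
        time.sleep(delay_s)
```

With a real project ID and an existing topic, publish_messages(project, "transactions", 10) would send ten messages one second apart. The topic name is only a placeholder. Waiting on each future keeps the sketch simple at the cost of throughput.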
# Use Google Cloud Platform stackdriver with python structlog
from google.cloud.logging import Client
from google.cloud.logging import _helpers
from google.cloud.logging.handlers import CloudLoggingHandler
from google.cloud.logging.handlers.transports.background_thread import _Worker
# pip install python-json-logger
from pythonjsonlogger import jsonlogger
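This snippet wires Python logging to Cloud Logging (formerly Stackdriver) through CloudLoggingHandler and python-json-logger. The following is a stdlib-only alternative, which is a swapped-in technique and not the gist's code. On environments that collect stdout, such as Cloud Run or GKE, a process can print one JSON object per line and let Cloud Logging parse it as a structured entry. In such entries, `severity` and `message` are treated as special fields. The formatter class and the `fields` convention for extra data are hypothetical.

```python
import json
import logging
import sys


class CloudJsonFormatter(logging.Formatter):
    """Render each record as one line of JSON with a top-level `severity` field."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Python's DEBUG/INFO/WARNING/ERROR/CRITICAL match Cloud Logging severity names.
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Hypothetical convention: structured data passed as extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def get_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Attach a JSON-emitting handler; call once per logger name to avoid duplicate lines."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(CloudJsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also emit through the root logger
    return logger
```

For example, get_logger("app").warning("disk %s", "full", extra={"fields": {"host": "vm-1"}}) prints a single JSON line whose severity is WARNING. The trade-off compared with CloudLoggingHandler is that there are no API calls from the process, but the approach relies on the platform's log collector.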
@tuffacton
tuffacton / streamlit_colab.ipynb
Last active March 7, 2024 05:47
Colaboratory Notebook that hosts a streamlit app and creates an ngrok https tunnel for access.
@tzvsi
tzvsi / gist:222b3b22a847004a729744f89fe31255
Last active September 21, 2023 06:37
Installing CUDA 10.2, CuDNN 7.6.5, TensorRT 7.0, Ubuntu 18.04

Step 1: Installing CUDA (~5.5 minutes)

You can also install CUDA directly from the offline installer, but installing from NVIDIA's apt repository, as below, is a little easier.

sudo apt update
sudo apt upgrade -y

mkdir install ; cd install
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
@sdondley
sdondley / tmux split-window subcommand.md
Last active April 23, 2024 11:49
Super Guide to the split-window tmux Subcommand (and Beyond)

Super Guide to the split-window tmux Subcommand (and Beyond)

Guide overview

tmux, like other great software, is deceptive. On the one hand, it's fairly easy to get set up and start using right away. On the other hand, it's difficult to take advantage of tmux's advanced power features without spending some quality alone time with the manual. But the problem with manuals is that they aren't geared toward beginners. They are geared toward helping seasoned developers and computer enthusiasts quickly obtain the