Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@cedrickchee
cedrickchee / llama-7b-m1.md
Last active July 13, 2024 04:59
4 Steps in Running LLaMA-7B on an M1 MacBook with `llama.cpp`

4 Steps in Running LLaMA-7B on an M1 MacBook

The usability of large language models

The problem with large language models is that you can’t run these locally on your laptop. Thanks to Georgi Gerganov and his llama.cpp project, it is now possible to run Meta’s LLaMA on a single computer without a dedicated GPU.

Running LLaMA

There are multiple steps involved in running LLaMA locally on an M1 Mac after downloading the model weights.

@karpathy
karpathy / stablediffusionwalk.py
Last active July 17, 2024 06:48
hacky stablediffusion code for generating videos
"""
stable diffusion dreaming
creates hypnotic moving videos by smoothly walking randomly through the sample space
example way to run this script:
$ python stablediffusionwalk.py --prompt "blueberry spaghetti" --name blueberry
to stitch together the images, e.g.:
$ ffmpeg -r 10 -f image2 -s 512x512 -i blueberry/frame%06d.jpg -vcodec libx264 -crf 10 -pix_fmt yuv420p blueberry.mp4
@unrealwill
unrealwill / collisionLSH.py
Created August 8, 2021 10:20
Proof of Concept : generating collisions on a neural perceptual hash
import tensorflow as tf  # We need tensorflow 2.x
import numpy as np
# The hash length in bits
hashLength = 256
def buildModel():
    # We can set the seed to simulate the fact that this network is known and doesn't change between runs
    # tf.random.set_seed(42)
    model = tf.keras.Sequential()
@0xabad1dea
0xabad1dea / copilot-risk-assessment.md
Last active September 11, 2023 10:21
Risk Assessment of GitHub Copilot

Risk Assessment of GitHub Copilot

0xabad1dea, July 2021

this is a rough draft and may be updated with more examples

GitHub was kind enough to grant me swift access to the Copilot test phase despite me @'ing them several hundred times about ICE. I would like to examine it not in terms of productivity, but security. How risky is it to allow an AI to write some or all of your code?

Ultimately, a human being must take responsibility for every line of code that is committed. AI should not be used for "responsibility washing." However, Copilot is a tool, and workers need their tools to be reliable. A carpenter doesn't have to

@yoavg
yoavg / stochastic-critique.md
Last active November 9, 2023 04:32
A criticism of Stochastic Parrots

A criticism of "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?"

Yoav Goldberg, Jan 23, 2021.

The FAccT paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" by Bender, Gebru, McMillan-Major and Shmitchell has been the center of a controversy recently. The final version is now out and, owing a lot to this controversy, will undoubtedly become very widely read. I read an earlier draft of the paper, and I think that the new and updated final version is much improved in many ways: kudos to the authors for this upgrade. I also agree with and endorse most of the content. This is important stuff, you should read it.

However, I do find some aspects of the paper (and the resulting discourse around it and around technology) to be problematic. These weren't clear to me when I initially read the first draft several months ago, but they have become very clear to me now. These points are for the most part

@onlurking
onlurking / programming-as-theory-building.md
Last active July 22, 2024 13:14
Programming as Theory Building - Peter Naur

Programming as Theory Building

Peter Naur

Peter Naur's classic 1985 essay "Programming as Theory Building" argues that a program is not its source code. A program is a shared mental construct (he uses the word theory) that lives in the minds of the people who work on it. If you lose the people, you lose the program. The code is merely a written representation of the program, and it's lossy, so you can't reconstruct

I was drawn to programming, science, technology and science fiction
ever since I was a little kid. I can't say it's because I wanted to
make the world a better place. Not really. I was simply drawn to it
because I was drawn to it. Writing programs was fun. Figuring out how
nature works was fascinating. Science fiction felt like a grand
adventure.
Then I started a software company and poured every ounce of energy
into it. It failed. That hurt, but that part is ok. I made a lot of
mistakes and learned from them. This experience made me much, much

last edited on 2020-05-04 (please see edit history for changes)

See the results here: https://guzey.com/science/sleep/14-day-sleep-deprivation-self-experiment/


This is the protocol for my sleep experiment that I will start on 2020-04-03. I will sleep 4 hours per night for 2 weeks and evaluate the effects of doing so on my cognition using a psychomotor vigilance task (essentially, a reaction test), the SAT (a 3 hour test that involves reading and math), and an Aimgod custom scenario I called guzey_arena_0 (video). Aimgod is a first-person shooter trainer game that allows the creation of custom scenarios; the scenario I created requires constant attention, eye-hand coordination, and tactical thinking.

Methods