A::B is a system with 4 tokens: `A#`, `#A`, `B#` and `#B`.
An A::B program is a sequence of tokens. Example:
B# A# #B #A B#
To *compute* a program, we must rewrite neighbor tokens, using the rules:
A# #A ... becomes ... nothing
A# #B ... becomes ... #B A#
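A minimal sketch of computing a program by repeatedly rewriting neighboring tokens, assuming a simple leftmost-first rewriting strategy. Only the two rules quoted above are encoded; the full system presumably defines the symmetric rules for `B#` as well, so the result below is not fully reduced:

```python
# Rewrite rules shown above; keys are adjacent token pairs, values are their replacement.
RULES = {
    ("A#", "#A"): [],            # A# #A ... becomes ... nothing
    ("A#", "#B"): ["#B", "A#"],  # A# #B ... becomes ... #B A#
}

def compute(program: str) -> str:
    tokens = program.split()
    changed = True
    while changed:
        changed = False
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in RULES:
                tokens[i:i + 2] = RULES[pair]  # rewrite the neighboring pair in place
                changed = True
                break
    return " ".join(tokens)

print(compute("B# A# #B #A B#"))  # -> "B# #B B#" with only the two rules above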
@veekaybee
veekaybee / normcore-llm.md
Last active July 30, 2024 00:54
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts


Pre-Transformer Models

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@JoaoLages
JoaoLages / RLHF.md
Last active July 26, 2024 01:10
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
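Both scenarios are handled by learning a reward model from human preference comparisons before the PPO stage. As a rough illustration (not from the original post), here is a minimal sketch of that first stage with a pairwise logistic (Bradley-Terry style) loss; the `RewardModel` class, feature shapes, and hidden size are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scalar reward head; in practice it sits on top of a pretrained LM."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)

def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    # Labellers preferred `chosen` over `rejected`; push r(chosen) above
    # r(rejected) so the learned reward agrees with human judgments.
    return -F.logsigmoid(model(chosen) - model(rejected)).mean()
```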
from __future__ import annotations
from contextlib import contextmanager
from typing import NamedTuple, Callable, Optional, Any, List

import numpy as np

Array = Any

class Node(NamedTuple):
    vjp: Optional[Callable]  # vector-Jacobian product for this node, if any
    parents: List[Node]      # upstream nodes in the autodiff graph
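A tiny usage sketch (an assumption based on the definitions above, not part of the original excerpt): a root node typically carries no vjp and no parents, and downstream nodes link back to it.

```python
# Hypothetical usage of the Node type defined above.
root = Node(vjp=None, parents=[])
child = Node(vjp=lambda g: (g * 2.0,), parents=[root])
```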
# IDA (disassembler) and Hex-Rays (decompiler) plugin for Apple AMX
#
# WIP research. (This was edited to add more info after someone posted it to
# Hacker News. Click "Revisions" to see full changes.)
#
# Copyright (c) 2020 dougallj
# Based on Python port of VMX intrinsics plugin:
# Copyright (c) 2019 w4kfu - Synacktiv
@kumavis
kumavis / gist:ab0e6ab555362c5e479d6311c4540bbd
Created November 30, 2020 09:36
go-ethereum mainnet fast sync performance on digital ocean
syncing geth on digital ocean
- name: eth2-mainnet-00
- sync time: (failed to sync, bound by disk perf)
- region: fra1
- type: s-8vcpu-16gb
- primaryDb: attached volume
- ancientDb: attached volume
- price vps: $0.119/hr
- price volume: $0.052/hr 350gb
@MattPD
MattPD / analysis.draft.md
Last active July 26, 2024 00:29
Program Analysis Resources (WIP draft)
@sebbbi
sebbbi / FastUniformLoadWithWaveOps.txt
Last active May 11, 2024 07:37
Fast uniform load with wave ops (up to 64x speedup)
In shader programming, you often run into a problem where you want to iterate over an array in memory for all pixels in a compute shader
group (tile). Tiled deferred lighting is the most common case: an 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;          // packed per-light parameters
Texture2D<uint2> lightStartCounts;  // per-tile (start, count) into the culled light list
RWTexture2D<float4> output;         // lit result per pixel

[numthreads(8, 8, 1)]
@nadavrot
nadavrot / Matrix.md
Last active July 29, 2024 00:58
Efficient matrix multiplication

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).

Intro

Matrix multiplication is a mathematical operation that defines the product of two matrices.
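The key idea the post builds toward is blocking (tiling) the loops so the working set stays in cache. A rough NumPy sketch of that blocking structure (not the author's kernel; the block size of 64 is an arbitrary placeholder, and a real high-performance kernel works on registers and SIMD lanes rather than array slices):

```python
import numpy as np

def blocked_matmul(A: np.ndarray, B: np.ndarray, block: int = 64) -> np.ndarray:
    # Triple loop over cache-sized blocks; each inner product of blocks
    # reuses data that is already resident in cache.
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                C[i0:i0+block, j0:j0+block] += (
                    A[i0:i0+block, p0:p0+block] @ B[p0:p0+block, j0:j0+block]
                )
    return C
```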