A::B is a system with 4 tokens: `A#`, `#A`, `B#` and `#B`.
An A::B program is a sequence of tokens. Example:
B# A# #B #A B#
To *compute* a program, we must rewrite neighbor tokens, using the rules:
A# #A ... becomes ... nothing
A# #B ... becomes ... #B A#
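A minimal sketch of computing a program by repeatedly rewriting neighboring tokens, assuming a simple leftmost-first rewriting strategy. Only the two rules quoted above are encoded; the full system presumably defines the symmetric rules for `B#` as well, so the result below is not fully reduced:

```python
# Rewrite rules shown above; keys are adjacent token pairs, values are their replacement.
RULES = {
    ("A#", "#A"): [],            # A# #A ... becomes ... nothing
    ("A#", "#B"): ["#B", "A#"],  # A# #B ... becomes ... #B A#
}

def compute(program: str) -> str:
    tokens = program.split()
    changed = True
    while changed:
        changed = False
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in RULES:
                tokens[i:i + 2] = RULES[pair]  # rewrite the neighboring pair in place
                changed = True
                break
    return " ".join(tokens)

print(compute("B# A# #B #A B#"))  # -> "B# #B B#" with only the two rules above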
@veekaybee
veekaybee / normcore-llm.md
Last active July 30, 2024 00:54
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts


Pre-Transformer Models

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@JoaoLages
JoaoLages / RLHF.md
Last active July 26, 2024 01:10
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
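Both scenarios are handled by learning a reward model from human preference comparisons before the PPO stage. As a rough illustration (not from the original post), here is a minimal sketch of that first stage with a pairwise logistic (Bradley-Terry style) loss; the `RewardModel` class, feature shapes, and hidden size are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scalar reward head; in practice it sits on top of a pretrained LM."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)

def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    # Labellers preferred `chosen` over `rejected`; push r(chosen) above
    # r(rejected) so the learned reward agrees with human judgments.
    return -F.logsigmoid(model(chosen) - model(rejected)).mean()
```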
from __future__ import annotations
from contextlib import contextmanager
from typing import NamedTuple, Callable, Optional, Any, List

import numpy as np

Array = Any

class Node(NamedTuple):
    vjp: Optional[Callable]  # vector-Jacobian product for this node, if any
    parents: List[Node]      # upstream nodes in the autodiff graph
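A tiny usage sketch (an assumption based on the definitions above, not part of the original excerpt): a root node typically carries no vjp and no parents, and downstream nodes link back to it.

```python
# Hypothetical usage of the Node type defined above.
root = Node(vjp=None, parents=[])
child = Node(vjp=lambda g: (g * 2.0,), parents=[root])
```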
# IDA (disassembler) and Hex-Rays (decompiler) plugin for Apple AMX
#
# WIP research. (This was edited to add more info after someone posted it to
# Hacker News. Click "Revisions" to see full changes.)
#
# Copyright (c) 2020 dougallj
# Based on Python port of VMX intrinsics plugin:
# Copyright (c) 2019 w4kfu - Synacktiv
@kumavis
kumavis / gist:ab0e6ab555362c5e479d6311c4540bbd
Created November 30, 2020 09:36
go-ethereum mainnet fast sync performance on digital ocean
syncing geth on digital ocean
- name: eth2-mainnet-00
- sync time: (failed to sync, bound by disk perf)
- region: fra1
- type: s-8vcpu-16gb
- primaryDb: attached volume
- ancientDb: attached volume
- price vps: $0.119/hr
- price volume: $0.052/hr 350gb
@MattPD
MattPD / analysis.draft.md
Last active July 26, 2024 00:29
Program Analysis Resources (WIP draft)
@sebbbi
sebbbi / FastUniformLoadWithWaveOps.txt
Last active May 11, 2024 07:37
Fast uniform load with wave ops (up to 64x speedup)
In shader programming, you often run into a problem where you want to iterate over an array in memory for all pixels in a compute shader
group (tile). Tiled deferred lighting is the most common case: an 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;          // packed per-light parameters
Texture2D<uint2> lightStartCounts;  // per-tile (start, count) into the culled light list
RWTexture2D<float4> output;         // lit result per pixel

[numthreads(8, 8, 1)]
@nadavrot
nadavrot / Matrix.md
Last active July 29, 2024 00:58
Efficient matrix multiplication

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).

Intro

Matrix multiplication is a mathematical operation that defines the product of two matrices.
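The key idea the post builds toward is blocking (tiling) the loops so the working set stays in cache. A rough NumPy sketch of that blocking structure (not the author's kernel; the block size of 64 is an arbitrary placeholder, and a real high-performance kernel works on registers and SIMD lanes rather than array slices):

```python
import numpy as np

def blocked_matmul(A: np.ndarray, B: np.ndarray, block: int = 64) -> np.ndarray:
    # Triple loop over cache-sized blocks; each inner product of blocks
    # reuses data that is already resident in cache.
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                C[i0:i0+block, j0:j0+block] += (
                    A[i0:i0+block, p0:p0+block] @ B[p0:p0+block, j0:j0+block]
                )
    return C
```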