Goals: Add links that give reasonable, clear explanations of how things work. No hype and, where possible, no vendor content. Practical first-hand accounts of models in production are eagerly sought.
- The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)
- Transformers as Support Vector Machines
- Survey of LLMs
- Deep Learning Systems
- Fundamental ML Reading List
- What are embeddings
- Concepts from Operating Systems that Found their way into LLMs
- Talking about Large Language Models
- Language Modeling is Compression
- Vector Search - Long-Term Memory in AI
- Eight things to know about large language models
- The Bitter Lesson
- The Hardware Lottery
- The Scaling Hypothesis
- Tokenization
- LLM Course
- Seq2Seq
- Attention Is All You Need
- BERT
- GPT-1
- Scaling Laws for Neural Language Models
- T5
- GPT-2: Language Models are Unsupervised Multitask Learners
- InstructGPT: Training Language Models to Follow Instructions
- GPT-3: Language Models are Few-Shot Learners
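Several of the links above (The Illustrated Word2vec, What are embeddings, Vector Search) revolve around one core operation: comparing embedding vectors by cosine similarity. A minimal sketch with toy hand-made vectors (the numbers are illustrative, not from any real model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (made-up values for illustration only).
king = [0.9, 0.8, 0.1, 0.3]
queen = [0.88, 0.82, 0.12, 0.28]
banana = [0.1, 0.2, 0.9, 0.7]

print(cosine_similarity(king, queen))   # close to 1.0
print(cosine_similarity(king, banana))  # noticeably smaller
```

Real systems use model-produced vectors with hundreds or thousands of dimensions, but the comparison is the same.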
- Transformers from Scratch
- Transformer Math
- Five Years of GPT Progress
- Lost in the Middle: How Language Models Use Long Contexts
- Self-attention and transformer networks
- Attention
- Understanding and Coding the Attention Mechanism
- Attention Mechanisms
- Keys, Queries, and Values
- What is ChatGPT doing and why does it work
- My own notes from a few months back.
- Karpathy's The State of GPT (YouTube)
- OpenAI Cookbook
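The attention links above all describe the same computation: softmax(QKᵀ/√d)V. A minimal NumPy sketch of single-head scaled dot-product attention (toy shapes, no masking or multi-head plumbing):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q @ K.T / sqrt(d_k)) @ V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_q, seq_k): query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                            # weighted average of value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))  # 3 query positions, dimension 8
K = rng.standard_normal((5, 8))  # 5 key positions
V = rng.standard_normal((5, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Each output row is a convex combination of the rows of V, which is why attention is often described as a soft lookup over keys and values.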
- Catching up on the weird world of LLMs
- How open are open architectures?
- Building an LLM from Scratch
- Large Language Models in 2023 and Slides
- Timeline of Transformer Models
- Large Language Model Evolutionary Tree
- Why host your own LLM?
- How to train your own LLMs
- Hugging Face Resources on Training Your Own
- Training Compute-Optimal Large Language Models
- Opt-175B Logbook
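The Chinchilla paper above (Training Compute-Optimal Large Language Models) is often reduced to two rules of thumb: training FLOPs ≈ 6 × parameters × tokens, and the compute-optimal data budget is roughly 20 tokens per parameter. A back-of-the-envelope sketch (the constants are approximations, not exact fits from the paper):

```python
import math

def training_flops(n_params, n_tokens):
    """Standard approximation: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params):
    """Rough compute-optimal rule of thumb: ~20 tokens per parameter."""
    return 20 * n_params

params = 7e9                                  # a 7B-parameter model
tokens = chinchilla_optimal_tokens(params)    # ~140B tokens
flops = training_flops(params, tokens)
print(f"{tokens:.2e} tokens, {flops:.2e} FLOPs")  # 1.40e+11 tokens, 5.88e+21 FLOPs
```

Useful mostly for sanity-checking claims: if a training run's reported model size, token count, and GPU-hours are wildly inconsistent with this estimate, something is off.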
- RLHF
- Instruction-tuning for LLMs: Survey
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- RLHF and DPO Compared
- The Complete Guide to LLM Fine-tuning
- LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language - Really great overview of SOTA fine-tuning techniques
- On the Structural Pruning of Large Language Models
- Quantization
- PEFT
- How is LlamaCPP Possible?
- How to Beat GPT-4 with a 13B Model
- Efficient LLM Inference on CPUs
- Tiny Language Models Come of Age
- Efficiency LLM Spectrum
- TinyML at MIT
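Several links above (Quantization, How is LlamaCPP Possible?, Efficient LLM Inference on CPUs) come down to the same trick: storing weights in fewer bits. A minimal sketch of symmetric absmax int8 quantization of one weight tensor (real schemes add per-block scales, outlier handling, and lower bit widths):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric absmax quantization: float32 -> int8 codes plus one float scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)                         # int8 codes
print(np.abs(w - w_hat).max())   # small reconstruction error, bounded by scale/2
```

The payoff is memory: one byte per weight instead of four (or two), which is much of what makes CPU and laptop inference on 7B-class models feasible.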
- Building LLM Applications for Production
- Challenges and Applications of Large Language Models
- All the Hard Stuff Nobody talks about when building products with LLMs
- Scaling Kubernetes to run ChatGPT
- Numbers every LLM Developer should know
- Against LLM Maximalism
- A Guide to Inference and Performance
- (InThe)WildChat: 570K ChatGPT Interaction Logs In The Wild
- The State of Production LLMs in 2023
- Machine Learning Engineering for successful training of large language models and multi-modal models
- Fine-tuning RedPajama on Slack Data
- LLM Inference Performance Engineering: Best Practices
- How to Make LLMs go Fast
- Transformer Inference Arithmetic
- Which serving technology to use for LLMs?
- Speeding up the KV cache
- Large Transformer Model Inference Optimization
- On Prompt Engineering
- Prompt Engineering Versus Blind Prompting
- Building RAG-Based Applications for Production
- Full Fine-Tuning, PEFT, or RAG?
- Prompt Engineering Guide
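Several of the inference links above (Transformer Inference Arithmetic, Speeding up the KV cache, Numbers every LLM Developer should know) hinge on one estimate: how big the KV cache gets. A back-of-the-envelope sketch, using Llama-2-7B-like dimensions as an assumed example (32 layers, 32 KV heads, head dim 128, fp16):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Two cached tensors (K and V) per layer, each of shape
    (batch, heads, seq_len, head_dim), at bytes_per_value per element."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value

# Llama-2-7B-like shapes (assumed for illustration).
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(f"{size / 2**30:.2f} GiB")  # 2.00 GiB per sequence at 4k context in fp16
```

The cache grows linearly with both context length and batch size, which is why long contexts and high-throughput serving eat GPU memory so quickly, and why techniques like grouped-query attention shrink n_kv_heads.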
- The Best GPUs for Deep Learning 2023
- Making Deep Learning Go Brr from First Principles
- Everything about Distributed Training and Efficient Finetuning
- Training LLMs at Scale with AMD MI250 GPUs
- GPU Programming
- Evaluating ChatGPT
- ChatGPT: Jack of All Trades, Master of None
- What's Going on with the Open LLM Leaderboard
- Challenges in Evaluating AI Systems
- LLM Evaluation Papers
- Evaluating LLMs is a Minefield
- Generative Interfaces Beyond Chat (YouTube)
- Why Chatbots are not the Future
- The Future of Search is Boutique
- As a Large Language Model, I
- Natural Language is an Unnatural Interface
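On the evaluation links above (Evaluating LLMs is a Minefield, What's Going on with the Open LLM Leaderboard): much of the mess comes down to scoring choices as simple as strict exact match versus normalized match. A toy sketch with hypothetical model outputs showing how normalization alone swings a score:

```python
import string

def normalize(text):
    """Lowercase, strip surrounding whitespace, drop punctuation."""
    return text.lower().strip().translate(str.maketrans("", "", string.punctuation))

def accuracy(predictions, answers, normalizer=lambda s: s):
    """Fraction of predictions matching answers under a given normalizer."""
    matches = sum(normalizer(p) == normalizer(a) for p, a in zip(predictions, answers))
    return matches / len(answers)

# Hypothetical outputs: all semantically correct, formatted differently.
answers = ["Paris", "42", "blue whale"]
predictions = ["Paris", "42.", " Blue whale "]

print(accuracy(predictions, answers))             # 0.33... under strict exact match
print(accuracy(predictions, answers, normalize))  # 1.0 after normalization
```

Two leaderboards using these two scorers would report wildly different numbers for the same model, which is one concrete reason published benchmark figures are so hard to compare.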
Thanks to everyone who added suggestions on Twitter, Mastodon, and Bluesky.
- Patterns for Building LLM-based Systems & Products - In my opinion, an exceptionally in-depth article that covers many of these categories and deserves a place in the reading list.