- Mastering LLM Techniques: Inference Optimization
- LLM Inference Series: 1. Introduction
- How Transformers Work: A Detailed Exploration of Transformer Architecture
- DeepSpeed Deep Dive
- Transformers Explained Visually
- The Illustrated Transformer
- All you need to know about ‘Attention’ and ‘Transformers’ — In-depth Understanding — Part 1
- All you need to know about ‘Attention’ and ‘Transformers’ — In-depth Understanding — Part 2
- Build your own Transformer from scratch using PyTorch
- A collection of resources to study Transformers in depth
- Numbers every LLM Developer should know
- LLM Inference Series: 3. KV caching explained
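As a companion to the KV-caching article above, here is a minimal sketch of the idea (an illustrative toy, not code from any of the linked posts): during autoregressive decoding, each step's key/value vectors are appended to a cache so attention over past tokens is not recomputed. The `attend` helper and `KVCache` class are hypothetical names for illustration.

```python
import math

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query
    over all cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of cached value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

class KVCache:
    """Toy per-sequence cache: keys/values grow by one entry per
    decoding step; earlier entries are reused, never recomputed."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)      # cache this step's key
        self.values.append(v)    # cache this step's value
        return attend(q, self.keys, self.values)

cache = KVCache()
out1 = cache.step([1.0, 0.0], [1.0, 0.0], [2.0, 0.0])  # attends over 1 entry
out2 = cache.step([1.0, 0.0], [0.0, 1.0], [0.0, 3.0])  # attends over 2 entries
```

With one cached entry the softmax weight is 1, so `out1` is exactly the first value vector; with two entries the output blends both cached values. In a real transformer the cache trades memory for compute, which is why the linked article pairs it with the memory-footprint discussion.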
LLM resources

Last active: July 24, 2024 18:15