@veekaybee
Last active May 27, 2024 10:55

Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable, good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod are eagerly sought.

Foundational Concepts

Pre-Transformer Models

Building Blocks

Foundational Deep Learning Papers (in semi-chronological order)

The Transformer Architecture

Attention

GPT

Significant OSS Models

LLMs in 2023

Training Data

Pre-Training

RLHF and DPO

Fine-Tuning and Compression

Small and Local LLMs

Deployment and Production

LLM Inference and K-V Cache

Prompt Engineering and RAG

GPUs

Evaluation

Eval Frameworks

UX

What's Next?

Thanks to everyone who added suggestions on Twitter, Mastodon, and Bluesky.

@tekumara

Transformer Math 101

We present basic math related to computation and memory usage for transformers
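As a taste of the kind of back-of-the-envelope arithmetic covered there, here is a small sketch using two widely cited approximations (training cost of roughly 6 FLOPs per parameter per token, and an fp16 KV cache of one key and one value vector of size d_model per layer per token). The function names and example numbers are mine, for illustration only, not taken from the article:

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    # Widely used approximation: ~6 FLOPs per parameter per training token
    # (~2 for the forward pass, ~4 for the backward pass)
    return 6.0 * n_params * n_tokens

def kv_cache_bytes(n_layers: int, d_model: int, seq_len: int,
                   bytes_per_value: int = 2) -> int:
    # One key vector and one value vector of size d_model,
    # per layer, per token, at fp16 (2 bytes per value)
    return 2 * n_layers * d_model * seq_len * bytes_per_value

# Illustrative: a 7B-parameter model trained on 1T tokens
print(f"{train_flops(7e9, 1e12):.1e} FLOPs")  # ~4.2e+22
# KV cache for one 2048-token sequence of a 32-layer, d_model=4096 model
print(f"{kv_cache_bytes(32, 4096, 2048) / 2**30:.1f} GiB")
```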

@san7988

san7988 commented Aug 20, 2023

Posts from Eugene Yan are also a pretty good read.

@butsugiri

Hi, thank you for your great work!

Training Your Own

I wonder if we can add huggingface/llm_training_handbook (an open collection of methodologies to help with successful training of large language models) to this section?

@lcrmorin

It seems that the gzip approach, although really cool, was 'optimistic' and thus overhyped; see https://kenschutte.com/gzip-knn-paper/ (basically, they confused the k in k-NN with top-k accuracy, reporting top-2 accuracy). More recent studies found that it performs, as expected, at 'bag of words' level: Gzip versus bag-of-words for text classification.
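
For context, the "gzip approach" is the compressor-based k-NN text classifier (popularly summarized as "gzip beats BERT"): label a document by majority vote among the training examples with the smallest normalized compression distance (NCD) to it. A minimal sketch of the idea; the function names and any usage are mine, not from the paper:

```python
import gzip

def ncd(x: str, y: str) -> float:
    # Normalized Compression Distance between two strings, via gzip:
    # how much extra it costs to compress them together vs. alone
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_predict(text: str, train: list[tuple[str, str]], k: int = 3) -> str:
    # Majority vote among the k training examples nearest to `text` by NCD
    nearest = sorted(train, key=lambda pair: ncd(text, pair[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)
```

The k-NN/top-k confusion the linked post describes lives in the voting step: reporting a hit if the true label appears anywhere in `votes` (top-2) is easier than winning the majority vote.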

I don't know if you intend to (or are even interested), but I am on the lookout for "use cases for normies".

@wrhall

wrhall commented Aug 22, 2023

Do you think it's worth annotating with dates of the articles / papers / videos?

@veekaybee
Author

It seems that the gzip approach, although really cool, was 'optimistic' and thus overhyped; see https://kenschutte.com/gzip-knn-paper/ (basically, they confused the k in k-NN with top-k accuracy, reporting top-2 accuracy). More recent studies found that it performs, as expected, at 'bag of words' level: Gzip versus bag-of-words for text classification.

I don't know if you intend to (or are even interested), but I am on the lookout for "use cases for normies".

Yeah that was my read on it as well but I'm also very interested in it as a general theoretical approach and baseline, even if this particular implementation doesn't work.

@veekaybee
Author

Do you think it's worth annotating with dates of the articles / papers / videos?

Maybe it would be helpful, but I explicitly picked stuff that I thought wouldn't age and/or where recency didn't matter because the fundamentals are timeless.

@janhesse53

Patterns for Building LLM-based Systems & Products

In my opinion, this is a super in-depth article that covers many of the categories and deserves a place in the reading list.

@rmitsch

rmitsch commented Aug 25, 2023

Against LLM maximalism (disclaimer: I work at Explosion)

@Lykos2

Lykos2 commented Aug 27, 2023

Patterns for Building LLM-based Systems & Products
In my opinion, this is a super in-depth article that covers many of the categories and deserves a place in the reading list.

detailed blog

@spmurrayzzz

The Illustrated Transformer - maybe redundant with some of the other transformer content here, but it's very well-written and strikes a good balance between prose and visual aids.

@satisfice

ChatGPT Sucks at Being a Testing Expert
https://www.satisfice.com/download/appendix-chatgpt-sucks-at-being-a-testing-expert?wpdmdl=487569

This is a careful analysis of an attempt to demonstrate ChatGPT’s usefulness to help testers.

@Tulip4attoo

My post extends the "Why you should host your LLM?" article with some additions from an operations perspective: https://tulip4attoo.substack.com/p/why-you-should-host-your-llm-from

@ghosthamlet

Maybe Foundational Papers should include the first instruction-tuned model FLAN (no RLHF):
Finetuned Language Models Are Zero-Shot Learners: https://arxiv.org/abs/2109.01652

@timbornholdt

I gave a talk about prompt engineering for normal people and turned it into a pretty decent article, might be useful for the list too? https://timbornholdt.com/blog/prompt-engineering-how-to-think-like-an-ai

@emilymbender

The Stochastic Parrots paper presents many things that anyone should be cognisant of when deciding whether or not to use an LLM:

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. In Proceedings of FAccT 2021, pp.610-623.

@livc

livc commented Aug 29, 2023

We are exploring deployment and commercialization scenarios for AI agents at https://askgen.ie, and currently we think customer support is a good one.

@umair-nasir14

@livc Are you guys hiring?

@Sharrp

Sharrp commented Aug 29, 2023

I found "Five years of GPT progress" to be a useful overview of the influential papers on GPT.
https://finbarr.ca/five-years-of-gpt-progress/
May work as a high-level summary for "Foundational Papers" section.

p.s. Thank you for compiling the list!

@hkniberg

hkniberg commented Aug 30, 2023

Hi! This is great! It would be even more useful to include the year/month of publication next to each item, to get a sense of which links are more up-to-date and which are more historical.

@will-thompson-k

will-thompson-k commented Aug 30, 2023

I really like this list, sad I just discovered this 😎 .

I am not sure if this would complement your Background section, but I wrote this as a primer on LLMs last month: https://willthompson.name/what-we-know-about-llms-primer.

But I don't know, might not be very orthogonal to your other sources here 🤷 .

@AnnthomyGILLES

An overview of vector databases. The author highlights the differences between the various vector databases out there as visually as possible.

https://thedataquarry.com/posts/vector-db-1/

@davidzshi

An overview of vector databases. The author highlights the differences between the various vector databases out there as visually as possible.

https://thedataquarry.com/posts/vector-db-1/

This is really helpful, thank you!

@lcrmorin

lcrmorin commented Dec 29, 2023

I keep coming back to this list. However, I feel like it misses a good discussion of current stuff that doesn't work. I keep failing to implement working things, despite lengthy theoretical write-ups, and when I scratch the veneer I keep getting the same answer: "the technology is not ready yet."