TL;DR: We're building an LLM that can generate efficient CUDA kernels, and we're doing it in public. Today, models like ChatGPT are terrible at systems programming: they don't seem to understand how GPUs work and they frequently hallucinate. However, projects like llm.c, where a smart human works in a loop with an LLM, have shown us that this should be possible. There's a lot we need to innovate on: how we create more kernel tokens, what the right abstractions for LLMs are, and how to scale test-time compute. And given how hard this problem is, we want to do everything in public on Discord. We will share infra, loss curves, and chat messages there, and try to include as many people as possible so we can actually crack this problem.
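To make "scale test-time compute" concrete, here's a minimal sketch (not our actual infra) of the simplest version of the idea: sample several candidate kernels from a model, throw away the ones that don't compile, and keep the fastest. `sample_kernel` and `benchmark` are hypothetical stand-ins for an LLM call and a timing harness, and the sketch assumes nvcc is on your PATH.

```python
# Best-of-n selection over LLM-generated CUDA kernels: more samples
# means more test-time compute spent on the same problem.
import pathlib
import subprocess
import tempfile


def compiles(cuda_src: str) -> bool:
    """Return True if nvcc accepts the candidate kernel (assumes nvcc is installed)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "kernel.cu"
        src.write_text(cuda_src)
        result = subprocess.run(
            ["nvcc", "-c", str(src), "-o", str(pathlib.Path(tmp) / "kernel.o")],
            capture_output=True,
        )
        return result.returncode == 0


def best_of_n(sample_kernel, benchmark, n: int = 16) -> str | None:
    """Sample n candidates, filter to ones that compile, return the fastest.

    sample_kernel() -> str is a hypothetical LLM call returning CUDA source;
    benchmark(src) -> float is a hypothetical harness returning runtime in ms.
    """
    candidates = [sample_kernel() for _ in range(n)]
    valid = [src for src in candidates if compiles(src)]
    if not valid:
        return None
    return min(valid, key=benchmark)
```

Real setups get fancier (verifying correctness against a reference, iterating on compiler errors), but the loop above is the baseline everything else builds on.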
We're a distributed research effort, so we mostly chat async on discord.gg/gpumode in the #popcorn channel.
If you prefer longer-form content, you can check out https://drive.google.com/drive/folders/1nt2KcRRKb8YdySxkRxUu5PR4c7UPM_rK