Mann Patel manncodes

@Chillee
Chillee / softmax_quack.py
Created July 10, 2025 21:07
Random Kernel Microbenchmarks
import argparse
import time
from typing import Type
import torch
import torch.nn.functional as F
import torch._inductor.config
torch._inductor.config.triton.multi_kernel = True
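The preview above only shows the setup; as a rough illustration of the warmup-then-measure pattern such kernel microbenchmarks rely on, here is a minimal CPU-only sketch using only the stdlib (the `softmax` here is a plain-Python stand-in for illustration, not the gist's Triton/inductor kernel):

```python
import math
import time

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def bench(fn, arg, warmup=10, iters=100):
    # Warm up first so one-time costs (caches, compilation, allocation)
    # don't skew the measurement, then average over many iterations.
    for _ in range(warmup):
        fn(arg)
    start = time.perf_counter()
    for _ in range(iters):
        fn(arg)
    return (time.perf_counter() - start) / iters

xs = [float(i % 7) for i in range(1024)]
avg = bench(softmax, xs)
print(f"softmax on {len(xs)} floats: {avg * 1e6:.1f} us/iter")
```

For GPU kernels the same pattern needs device synchronization around the timer (e.g. `torch.cuda.synchronize()`), since launches are asynchronous.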
@tokenbender
tokenbender / train_modal_standalone.py
Last active October 12, 2025 06:57
standalone serverless simple character level transformer
import os
import sys
import time
import math
import pickle
from contextlib import nullcontext
from pathlib import Path
import subprocess
from dataclasses import dataclass
import inspect
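The script above is a character-level transformer trainer; the tokenization side of such a setup (not visible in the truncated preview) typically amounts to a char-to-int vocabulary, roughly like this hypothetical sketch:

```python
# Build a character-level vocabulary from a corpus, as char-level
# trainers in the nanoGPT style typically do (names here are illustrative).
text = "hello modal"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> int
itos = {i: ch for ch, i in stoi.items()}      # int -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode("hello")
assert decode(ids) == "hello"
print(len(chars), "vocab symbols;", ids)
```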

Learning LLMs in 2025

So you know how the transformer works, you know basic ML/DL, and you want to learn more about LLMs. One way to go is to look into the various "algorithmic" topics (optimization algorithms, RL, DPO, etc.). There are lots of materials on that. But the interesting stuff is (in my opinion, at least) not there.

This is an attempt to collect a list of academic (or academic-like) materials that explore LLMs from other directions, and focus on the non-ML-algorithmic aspects.

Courses

  • David Chiang's Theory of Neural Networks course.
  • This is not primarily about LLMs, but it does have a substantial section on Transformers. Formal/theory; more of a book than a course.
@qunash
qunash / grpo_qwen-0-5b_single_t4.ipynb
Last active October 15, 2025 03:21
grpo_qwen-0-5b_single_t4.ipynb
@willccbb
willccbb / grpo_demo.py
Last active October 25, 2025 16:39
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
"""
citation:
@misc{brown2025grpodemo,
title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
author={Brown, William},
Begin by enclosing all thoughts within <thinking> tags, exploring multiple angles and approaches.
Break down the solution into clear steps within <step> tags. Start with a 20-step budget, requesting more for complex problems if needed.
Use <count> tags after each step to show the remaining budget. Stop when reaching 0.
Continuously adjust your reasoning based on intermediate results and reflections, adapting your strategy as you progress.
Regularly evaluate progress using <reflection> tags. Be critical and honest about your reasoning process.
Assign a quality score between 0.0 and 1.0 using <reward> tags after each reflection. Use this to guide your approach:
0.8+: Continue current approach
0.5-0.7: Consider minor adjustments
Below 0.5: Seriously consider backtracking and trying a different approach
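The reward thresholds above map directly to a control decision; as a literal sketch of that logic (the function name is illustrative, not from the gist):

```python
def next_action(reward: float) -> str:
    # Map a self-assigned quality score to the prompt's guidance.
    if reward >= 0.8:
        return "continue"    # 0.8+: keep the current approach
    if reward >= 0.5:
        return "adjust"      # 0.5-0.7: consider minor adjustments
    return "backtrack"       # below 0.5: try a different approach

for r in (0.9, 0.6, 0.3):
    print(r, "->", next_action(r))
```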
@nreHieW
nreHieW / transformer.py
Created July 9, 2024 13:36
2024 Noam Transformer
"""
The 2024 Transformer (the Noam Transformer):
- RMSNorm
- GQA or some combination
- Sliding window attention
- Swiglu
- RoPE (Rotary Positional Embedding)
LLM Arches:
hidden | MLP mult. | n_layers | rope_theta | GQA Group Size | GLU Act. | ops
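Of the components listed, RMSNorm is the simplest to show in isolation; a pure-Python sketch of the math (torch-free for brevity, and omitting the learnable per-channel gain the real layer multiplies by):

```python
import math

def rms_norm(xs, eps=1e-6):
    # RMSNorm: divide each element by the root-mean-square of the vector.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

out = rms_norm([1.0, 2.0, 3.0, 4.0])
# The normalized vector has RMS ~= 1 by construction.
print(out)
```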
@manncodes
manncodes / script.js
Last active April 27, 2024 09:26
data scraper js script for extracting ISIN numbers from the site `https://www.isin.com/isin-database/`
// 1. Log in to https://www.isin.com/, then head over to the https://www.isin.com/isin-database/ page.
// 2. Open the Chrome console by pressing F12 and paste the script into the console.
// Output: a JSON file named `all-data.json` containing a list of dictionaries with info about the companies.
const org = ["nobiskrug", "swm", "vattenfall",
// and all the other companies you need to extract information about
];
function downloadJSON(data) {
const dataStr = JSON.stringify(data, null, 2);
const blob = new Blob([dataStr], { type: 'application/json' });
@manncodes
manncodes / fast-tensor-dataloader.py
Created September 12, 2022 17:52
20x speed up for tabular tensor data
import torch
class FastTensorDataLoader:
"""
A DataLoader-like object for a set of tensors that can be much faster than
TensorDataset + DataLoader, because the default DataLoader grabs individual
indices of the dataset and calls cat on them (slow).
Source: https://discuss.pytorch.org/t/dataloader-much-slower-than-manual-batching/27014/6
"""
def __init__(self, *tensors, batch_size=32, shuffle=False):
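The preview cuts off at `__init__`; the core idea, iterating by slicing whole batches at once instead of fetching one index at a time, can be sketched torch-free with plain lists (the original gist does the same with tensor slicing):

```python
class FastListDataLoader:
    """Batch by slicing the underlying sequences directly, instead of
    fetching one element per index and concatenating (the slow path
    the gist above avoids)."""

    def __init__(self, *seqs, batch_size=32):
        assert all(len(s) == len(seqs[0]) for s in seqs)
        self.seqs = seqs
        self.batch_size = batch_size

    def __iter__(self):
        for start in range(0, len(self.seqs[0]), self.batch_size):
            # One slice per sequence yields a whole batch in a single step.
            yield tuple(s[start:start + self.batch_size] for s in self.seqs)

xs = list(range(10))
ys = [x * x for x in xs]
batches = list(FastListDataLoader(xs, ys, batch_size=4))
print([len(bx) for bx, _ in batches])  # batch sizes: 4, 4, 2
```

With tensors, each slice is a single view/copy, which is where the reported ~20x speedup over per-index fetching comes from.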
@parmentf
parmentf / GitCommitEmoji.md
Last active October 24, 2025 22:54
Git Commit message Emoji