Skip to content

Instantly share code, notes, and snippets.

@nadavrot
nadavrot / Matrix.md
Last active May 22, 2024 13:38
Efficient matrix multiplication

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).

Intro

Matrix multiplication is a mathematical operation that defines the product of

import timeit
import torch
from torch import nn
from torch.nn import functional as F
class Flags:
pass
# referenced @chhillee's https://github.com/pytorch/functorch/blob/main/functorch/_src/nnc_compile.py
from typing import Callable, Dict, Any, List
from functools import partial
import numpy as np
import torch
import torch._C._te as te
from jax import core
@veekaybee
veekaybee / normcore-llm.md
Last active May 27, 2024 10:55
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Screenshot 2023-12-18 at 10 40 27 PM

Pre-Transformer Models