
MattPD / analysis.draft.md
Last active May 3, 2021
Program Analysis Resources (WIP draft)
nadavrot / Matrix.md
Last active May 5, 2021
Efficient matrix multiplication

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).
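As a portable illustration of the loop transformations such a post typically builds on, here is a sketch in plain C++. It is not the author's AVX2 kernel. It pairs a naive triple loop (a correctness reference) with a variant that uses cache tiling and i-k-j order inside each tile, so the innermost loop walks `B` and `C` with unit stride. The function names `matmul_naive` and `matmul_tiled` and the default tile size of 64 are hypothetical choices, not values tuned for Skylake.

```cpp
#include <algorithm>  // std::min
#include <cassert>
#include <cstddef>
#include <vector>

// All matrices are n x n, row-major, stored in flat std::vector<float>.

// Reference version: C += A * B with the textbook i-j-k loop order.
// The inner loop reads B column-wise (stride n), which is cache-unfriendly.
void matmul_naive(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] += sum;
        }
}

// Tiled version: C += A * B, processing tile x tile blocks so each block's
// working set can stay in cache, with i-k-j order inside a block.
void matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t n, std::size_t tile = 64) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    assert(tile > 0);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t i_end = std::min(ii + tile, n);
                const std::size_t k_end = std::min(kk + tile, n);
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float a = A[i * n + k];  // reused for the whole j loop
                        for (std::size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];  // unit stride in B and C
                    }
            }
}
```

To use it, allocate row-major `n * n` vectors, zero `C`, and call `matmul_tiled(A, B, C, n)`. Because the innermost loop is unit-stride with a loop-invariant `a`, compilers can often auto-vectorize it. Hand-written AVX2 kernels with register blocking typically go well beyond this sketch.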

Intro

Matrix multiplication is a mathematical operation that defines the product of

kevin-smets / iterm2-solarized.md
Last active May 6, 2021
iTerm2 + Oh My Zsh + Solarized color scheme + Source Code Pro Powerline + Font Awesome + [Powerlevel10k] - (macOS)

[Screenshots: Default, Powerlevel10k]

rygorous / magic_ring.cpp
Created Jul 22, 2012
The magic ring buffer.
#define _CRT_SECURE_NO_DEPRECATE
#include <stdio.h>
#include <string.h>
#include <Windows.h>
// This allocates a "magic ring buffer" that is mapped twice, with the two
// copies being contiguous in (virtual) memory. The advantage of this is
// that this allows any function that expects data to be contiguous in
// memory to read from (or write to) such a buffer. It also means that
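The gist's code targets Windows (`Windows.h`). As a hedged sketch of the same double-mapping idea on Linux/POSIX, the following maps one temporary file twice into adjacent virtual addresses using `mmap` with `MAP_FIXED`, so `ring[i]` and `ring[i + size]` alias the same byte. The names `magic_ring_alloc` and `magic_ring_free` and the `/tmp` path template are hypothetical, and `size` is assumed to be a multiple of the page size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>     // mkstemp (POSIX)
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // ftruncate, unlink, close, sysconf

// Hypothetical helper: returns a pointer to 2*size bytes of address space
// whose second half mirrors the first, or nullptr on failure.
char* magic_ring_alloc(std::size_t size) {
    assert(size % static_cast<std::size_t>(sysconf(_SC_PAGESIZE)) == 0);
    char path[] = "/tmp/magic_ring_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return nullptr;
    unlink(path);  // the open descriptor and mappings keep the file alive
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) { close(fd); return nullptr; }

    // Reserve 2*size bytes of contiguous, inaccessible address space.
    void* base = mmap(nullptr, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { close(fd); return nullptr; }
    char* ring = static_cast<char*>(base);

    // Replace both halves of the reservation with shared views of the same file.
    void* lo = mmap(ring, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    void* hi = mmap(ring + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);  // mappings remain valid after the descriptor is closed
    if (lo == MAP_FAILED || hi == MAP_FAILED) { munmap(base, 2 * size); return nullptr; }
    return ring;
}

// Unmaps both views created by magic_ring_alloc.
void magic_ring_free(char* ring, std::size_t size) { munmap(ring, 2 * size); }
```

For example, writing four bytes starting at `ring[size - 2]` puts the last two at `ring[size]` and `ring[size + 1]`, which are the same bytes as `ring[0]` and `ring[1]`. A reader can therefore consume a record that wraps past the end with one contiguous access.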
jboner / latency.txt
Last active May 5, 2021
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                        0.5 ns
Branch mispredict                           5 ns
L2 cache reference                          7 ns            14x L1 cache
Mutex lock/unlock                          25 ns
Main memory reference                     100 ns            ~14x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy            3,000 ns      3 us
Send 1K bytes over 1 Gbps network      10,000 ns     10 us
Read 4K randomly from SSD*            150,000 ns    150 us  ~1GB/sec SSD
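The ratio column is plain division of the latencies, and the network row is mostly serialization time. A small C++ check of that arithmetic (the constant and function names are illustrative) confirms 7 / 0.5 = 14 and 100 / 0.5 = 200, while 100 / 7 is about 14.3. Treating 1K as 1,024 bytes (an assumption), 8,192 bits at 10^9 bits/s take about 8.2 us, which the table rounds to 10 us.

```cpp
#include <cassert>  // assert, for runtime checks of these helpers
#include <cmath>    // std::fabs, for tolerance comparisons

// Latencies taken from the table above, in nanoseconds.
constexpr double kL1CacheNs = 0.5;
constexpr double kL2CacheNs = 7.0;
constexpr double kMainMemoryNs = 100.0;

// How many times slower `slow_ns` is than `fast_ns`.
constexpr double slowdown(double slow_ns, double fast_ns) {
    return slow_ns / fast_ns;
}

// Serialization time in ns for `bytes` on a link of `bits_per_second`,
// ignoring propagation delay, framing, and protocol overhead.
constexpr double wire_time_ns(double bytes, double bits_per_second) {
    return bytes * 8.0 / bits_per_second * 1e9;
}

// The L2 row's "14x L1 cache" follows directly: 7 / 0.5 == 14.
static_assert(slowdown(kL2CacheNs, kL1CacheNs) == 14.0, "L2 is 14x L1");
```

The same helpers give `slowdown(kMainMemoryNs, kL2CacheNs)` of roughly 14.29, which is why the main-memory-to-L2 ratio is closer to 14x than 20x.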