Wenzel Jakob wjakob

wjakob / avx512_transpose.cpp
Created May 2, 2024 07:24 — forked from nihui/avx512_transpose.cpp
avx512 16x24 16x16 16x12 16x8 16x4 16x2 8x24 8x16 8x12 8x8 8x4 8x2 matrix transpose
// g++ -mfma -mf16c -mavx512f -mavx512vnni -mavx512vl
#include <immintrin.h>
#include <stdio.h>
static void print(const __m512& _x)
{
    __attribute__((aligned(64)))
    float a[16];
wjakob / latency.markdown
Created September 5, 2017 19:15 — forked from hellerbarde/latency.markdown
Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns  = 250 µs
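
The figures above are easier to compare as ratios and implied throughputs. The following is a minimal C++ sketch of that arithmetic, not part of the original gist. The constant and function names (`slowdown`, `implied_gb_per_s`, `print_summary`) are hypothetical. It also assumes decimal units (1 MB = 1,000,000 bytes, 2K = 2,000 bytes), which the table does not specify.

```cpp
#include <cstdio>

// Latencies in nanoseconds, copied from the table above.
constexpr double kL1CacheRefNs        = 0.5;
constexpr double kMainMemoryRefNs     = 100.0;
constexpr double kSend2KOverNetworkNs = 20000.0;
constexpr double kRead1MBFromMemoryNs = 250000.0;

// Assumed sizes (decimal units; the table does not say which convention it uses).
constexpr double kBytes2K  = 2000.0;
constexpr double kBytes1MB = 1000000.0;

// How many times slower an operation is than a baseline operation.
constexpr double slowdown(double ns, double baseline_ns) {
    return ns / baseline_ns;
}

// Throughput implied by moving `bytes` in `ns` nanoseconds.
// One byte per nanosecond equals 10^9 bytes per second, i.e. 1 GB/s.
constexpr double implied_gb_per_s(double bytes, double ns) {
    return bytes / ns;
}

// Prints the derived figures for a few rows of the table.
inline void print_summary() {
    std::printf("Main memory vs. L1 cache: %.0fx slower\n",
                slowdown(kMainMemoryRefNs, kL1CacheRefNs));
    std::printf("Sequential memory read:   %.1f GB/s\n",
                implied_gb_per_s(kBytes1MB, kRead1MBFromMemoryNs));
    std::printf("2K over the network:      %.1f GB/s\n",
                implied_gb_per_s(kBytes2K, kSend2KOverNetworkNs));
}
```

Calling `print_summary()` from `main` reports that, under these assumptions, a main memory reference costs 200 L1 cache references, and that the sequential read row corresponds to roughly 4 GB/s.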