Skip to content

Instantly share code, notes, and snippets.

@rogarg
rogarg / vectorized-atan2f.cpp
Created August 18, 2021 02:38 — forked from bitonic/vectorized-atan2f.cpp
Vectorized & branchless atan2f
// Branchless, vectorized `atan2f`. Various functions of increasing
// performance are presented. The fastest version is 50~ faster than libc
// on batch workloads, outputing a result every ~2 clock cycles, compared to
// ~110 for libc. The functions all use the same `atan` approximation, and their
// max error is around ~1/10000 of a degree.
//
// They also do not handle inf / -inf
// and the origin as an input as they should -- in our case these are a sign
// that something is wrong anyway. Moreover, manual_2 does not handle NaN
// correctly (it drops them silently), and all the auto_ functions do not
@hellerbarde
hellerbarde / latency.markdown
Created May 31, 2012 13:16 — forked from jboner/latency.txt
Latency numbers every programmer should know

Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns             
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs

Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs