artkpv/latency.txt

## latency.txt
Latency Comparison Numbers Simplified (~2012)
----------------------------------   log2 log10
L1 cache reference                     0    0           0.5   ns
Branch mispredict                      3    1             5   ns
L2 cache reference                     3    1             7   ns
Mutex lock/unlock                      5    2            25   ns
Main memory reference                  8    2           100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy          14    4        10,000   ns       10 us
Send 1K bytes over 1 Gbps network     14    4        10,000   ns       10 us
Read 4K randomly from SSD*            18    5       150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory    18    6       250,000   ns      250 us
Round trip within same datacenter     19    6       500,000   ns      500 us
Read 1 MB sequentially from SSD*      20    6     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                             24    7    10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps    24    7    10,000,000   ns   10,000 us   10 ms  40x memory, 10X SSD
Read 1 MB sequentially from disk      25    8    30,000,000   ns   30,000 us   30 ms 120x memory, 30X SSD
Send packet CA->Netherlands->CA       28    9   150,000,000   ns  150,000 us  150 ms

Notes
-----
The log2 and log10 columns give the base-2 and base-10 logarithms of each latency
relative to the first item on the list (L1 cache reference), rounded to an integer
power. I find that the exact numbers are not terribly important. It is much simpler
to reason about relative magnitudes in this format: take two entries in a column
and subtract them. For example, using the log10 column, "Read 1 MB sequentially
from memory" (6) is about 10^(6-0) = 1,000,000 times slower than "L1 cache
reference" (0) and about 10^(6-2) = 10,000 times slower than "Main memory
reference" (2). For very rough back-of-the-envelope computations, I often use
2^10 ~ 10^3 and 2^3 ~ 10^1, e.g., "Send 1K bytes over 1 Gbps network" (log2 = 14)
is about 2^14 = 2^10 * 2^4 ~ 16,000 times slower than "L1 cache reference". The
logarithms were computed from the numbers found in
https://gist.github.com/jboner/2841832.
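
The subtraction trick from the notes can be sketched in a few lines of Python.
This is only an illustration: the names LATENCY_NS, log_columns and
estimated_ratio are made up for this sketch, only four rows of the table are
copied in, and round() is assumed as the rounding rule, so a few results may
differ by one from the printed columns.

```python
import math

# Latencies in nanoseconds, copied from four rows of the table above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 1 MB sequentially from memory": 250_000,
}

BASELINE = "L1 cache reference"


def log_columns(name):
    """Return (log2, log10) of a latency relative to the baseline.

    Assumption: values are rounded to the nearest integer with round().
    """
    ratio = LATENCY_NS[name] / LATENCY_NS[BASELINE]
    return round(math.log2(ratio)), round(math.log10(ratio))


def estimated_ratio(slow, fast, base=10):
    """Estimate how many times slower `slow` is than `fast`.

    Subtracts the two column entries and raises `base` (2 or 10) to the
    difference, as described in the notes.
    """
    column = {2: 0, 10: 1}[base]
    difference = log_columns(slow)[column] - log_columns(fast)[column]
    return base ** difference


if __name__ == "__main__":
    for name in LATENCY_NS:
        print(name, log_columns(name))
    print(estimated_ratio("Read 1 MB sequentially from memory",
                          "Main memory reference"))
```

The estimate is deliberately coarse: for memory read versus main memory reference
it gives 10,000, while dividing the raw numbers (250,000 ns / 100 ns) gives 2,500.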