-
-
Save jboner/2841832 to your computer and use it in GitHub Desktop.
Latency Comparison Numbers (~2012) | |
---------------------------------- | |
L1 cache reference 0.5 ns | |
Branch mispredict 5 ns | |
L2 cache reference 7 ns 14x L1 cache | |
Mutex lock/unlock 25 ns | |
Main memory reference 100 ns 20x L2 cache, 200x L1 cache | |
Compress 1K bytes with Zippy 3,000 ns 3 us | |
Send 1K bytes over 1 Gbps network 10,000 ns 10 us | |
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD | |
Read 1 MB sequentially from memory 250,000 ns 250 us | |
Round trip within same datacenter 500,000 ns 500 us | |
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory | |
Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip | |
Read 1 MB sequentially from disk 20,000,000 ns 20,000 us 20 ms 80x memory, 20X SSD | |
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms | |
Notes | |
----- | |
1 ns = 10^-9 seconds | |
1 us = 10^-6 seconds = 1,000 ns | |
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns | |
Credit | |
------ | |
By Jeff Dean: http://research.google.com/people/jeff/ | |
Originally by Peter Norvig: http://norvig.com/21-days.html#answers | |
Contributions | |
------------- | |
'Humanized' comparison: https://gist.github.com/hellerbarde/2843375 | |
Visual comparison chart: http://i.imgur.com/k0t1e.png |
What about register access timings?
Markdown version :p
Operation | ns | µs | ms | note |
---|---|---|---|---|
L1 cache reference | 0.5 ns | |||
Branch mispredict | 5 ns | |||
L2 cache reference | 7 ns | 14x L1 cache | ||
Mutex lock/unlock | 25 ns | |||
Main memory reference | 100 ns | 20x L2 cache, 200x L1 cache | ||
Compress 1K bytes with Zippy | 3,000 ns | 3 µs | ||
Send 1K bytes over 1 Gbps network | 10,000 ns | 10 µs | ||
Read 4K randomly from SSD* | 150,000 ns | 150 µs | ~1GB/sec SSD | |
Read 1 MB sequentially from memory | 250,000 ns | 250 µs | ||
Round trip within same datacenter | 500,000 ns | 500 µs | ||
Read 1 MB sequentially from SSD* | 1,000,000 ns | 1,000 µs | 1 ms | ~1GB/sec SSD, 4X memory |
Disk seek | 10,000,000 ns | 10,000 µs | 10 ms | 20x datacenter roundtrip |
Read 1 MB sequentially from disk | 20,000,000 ns | 20,000 µs | 20 ms | 80x memory, 20X SSD |
Send packet CA -> Netherlands -> CA | 150,000,000 ns | 150,000 µs | 150 ms |
@jboner What do you think about adding cryptography numbers to the list? I feel like that would be a really valuable addition to the list for comparison. Especially as cryptography usage increases and becomes more common.
We could for instance add Ed25519 latency for cryptographic signing and verification. In a very rudimentary testing I did locally I got:
- Ed25519 Signing - 254.20µs
- Ed25519 Verification - 368.20µs
You can replicate the results with the following rust program:
fn main() {
println!("Hello, world!");
let msg = b"lfasjhfoihjsofh438948hhfklshfosiuf894y98s";
let sk = ed25519_zebra::SigningKey::new(rand::thread_rng());
let now = std::time::Instant::now();
let sig = sk.sign(msg);
println!("{:?}", sig);
let elapsed = now.elapsed();
println!("Elapsed: {:.2?}", elapsed);
let vk = ed25519_zebra::VerificationKey::from(&sk);
let now = std::time::Instant::now();
vk.verify(&sig, msg).unwrap();
let elapsed = now.elapsed();
println!("Elapsed: {:.2?}", elapsed);
}
What is "Zippy"? Is it a google internal compression software?
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
this seems misleading, since in common networking terminology 1 Gbps refers to throughput ("size of the pipe"), but this list is about "latency," which is generally independent of throughput - it takes the same amount of time to send 1K bytes over a 1 Mbps network and a 1 Gbps network
A better description of this measure sounds like "bit rate," or more specifically the "data signaling rate" (DSR) over some communications medium (like fiber). This also avoids the ambiguity of "over" the network (how much distance?) because DSR measures "aggregate rate at which data passes a point" instead of a segment.
Using this definition (which I just learned a minute ago), perhaps a better label would be:
- Send 1K bytes over 1 Gbps network 10,000 ns 10 us
+ Transfer 1K bytes over a point on a 1 Gbps fiber channel 10,000 ns 10 us
🤷 (also, I didn't check if the math is consistent with this labeling, but I did pull "fiber channel" from the table on the DSR wiki page)
Thanks for sharing your updates.
You could consider adding a context switch for threads right under disk seek:
computer context switches: 1e7 ns
I see "Read 1 MB sequentially from disk", but how about disk write?
the numbers are from Dr. Dean from Google reveals the length of typical computer operations in 2010. I hope someone could update them as it's 2023
The numbers should be still quite similar.
These numbers based on Physical limitation only significant technological leap can make a difference.
In any case, these are for estimates, not exact calculation. For example, 1MB read from SSD is different for each SSD, but it should be somewhere around the Millisecond range.
it could be useful to add a column with the sizes in the hierarchy. Also, a column of the minimal memory units sizes, the cache line sizes etc. Then you can also divide the sizes by the latencies, which would be some kind of limit for a simple algorithm throughput. Not really sure if this is useful though.
useful information & thanks