@jboner
Last active April 28, 2024 13:12
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
Read 1 MB sequentially from memory 250,000 ns 250 us
Round trip within same datacenter 500,000 ns 500 us
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from disk 20,000,000 ns 20,000 us 20 ms 80x memory, 20X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
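The "Nx" comparison notes in the right-hand column follow directly from the raw nanosecond values; a quick Python check of the ratios the table lists:

```python
# Values copied from the table above; the comparison notes are just ratios.
latency_ns = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
}

l1 = latency_ns["L1 cache reference"]
print(latency_ns["L2 cache reference"] / l1)              # 14.0  -> "14x L1 cache"
print(latency_ns["Main memory reference"] / l1)           # 200.0 -> "200x L1 cache"
print(latency_ns["Disk seek"]
      / latency_ns["Round trip within same datacenter"])  # 20.0  -> "20x datacenter roundtrip"
```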
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
Credit
------
By Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers
Contributions
-------------
'Humanized' comparison: https://gist.github.com/hellerbarde/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
@colin-scott

Here's a tool to visualize these numbers over time: http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

@JensRantil

I just created flash cards for this: https://ankiweb.net/shared/info/3116110484 They can be downloaded using the Anki application: http://ankisrs.net

@JensRantil

I'm also missing something like "Send 1 MB over 1 Gbps network (within datacenter, over TCP)". Or does that vary so much that it would be impossible to specify?

@kofemann

kofemann commented Feb 9, 2013

If L1 access is a second, then:

L1 cache reference : 0:00:01
Branch mispredict : 0:00:10
L2 cache reference : 0:00:14
Mutex lock/unlock : 0:00:50
Main memory reference : 0:03:20
Compress 1K bytes with Zippy : 1:40:00
Send 1K bytes over 1 Gbps network : 5:33:20
Read 4K randomly from SSD : 3 days, 11:20:00
Read 1 MB sequentially from memory : 5 days, 18:53:20
Round trip within same datacenter : 11 days, 13:46:40
Read 1 MB sequentially from SSD : 23 days, 3:33:20
Disk seek : 231 days, 11:33:20
Read 1 MB sequentially from disk : 462 days, 23:06:40
Send packet CA->Netherlands->CA : 3472 days, 5:20:00
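This scaling can be reproduced in a few lines; a sketch in Python (subset of the rows):

```python
from datetime import timedelta

# If an L1 cache reference (0.5 ns) took one second, every entry
# grows by the same factor of 2e9.
SCALE = 1 / 0.5  # "human" seconds per real nanosecond

rows = [
    ("L1 cache reference", 0.5),
    ("Main memory reference", 100),
    ("Round trip within same datacenter", 500_000),
    ("Send packet CA->Netherlands->CA", 150_000_000),
]

for name, ns in rows:
    print(f"{name:35s}: {timedelta(seconds=ns * SCALE)}")
```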

@kofemann

kofemann commented Feb 9, 2013

You can add LTO4 tape seek/access time: ~55 sec, or 55,000,000,000 ns

@metakeule

I'm missing things like sending 1K via Unix pipe / socket / TCP to another process.
Does anybody have numbers on that?

@shiplunc

@metakeule it's easily measurable.
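It is indeed measurable; a rough single-process sketch in Python of the pipe case (this misses true cross-process scheduling cost, so treat the result as a lower bound, and expect it to vary by machine):

```python
import os
import time

# Time 1 KB written into a Unix pipe and read back out, averaged
# over many iterations.
r, w = os.pipe()
payload = b"x" * 1024
n = 100_000

start = time.perf_counter()
for _ in range(n):
    os.write(w, payload)  # 1 KB fits well within the pipe buffer
    os.read(r, 1024)
elapsed = time.perf_counter() - start

print(f"~{elapsed / n * 1e9:.0f} ns per 1 KB pipe write+read")
os.close(r)
os.close(w)
```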

@mnem

mnem commented Jan 9, 2014

Related page from "Systems Performance" with similar second scaling mentioned by @kofemann: https://twitter.com/rzezeski/status/398306728263315456/photo/1

@izard

izard commented May 29, 2014

An L1D hit on a modern Intel CPU (Nehalem+) is at least 4 cycles. For a typical server/desktop at 2.5 GHz that is at least 1.6 ns.
The fastest L2 hit latency is 11 cycles (Sandy Bridge+), which is 2.75x L1, not 14x.
Maybe the numbers from Norvig were true at some point, but the cache latency numbers at least have been pretty constant since Nehalem, which was 6 years ago.

@richa03

richa03 commented Aug 21, 2014

Please note that Peter Norvig first published this expanded version (at that location, http://norvig.com/21-days.html#answers) around July 2010 (see the Wayback Machine). Also note that it was labeled "Approximate timing for various operations on a typical PC".

@pdjonov

pdjonov commented Oct 3, 2014

One light-nanosecond is roughly a foot, which is considerably less than the distance to my monitor right now. It's kind of surprising to realize just how much a CPU can get done in the time it takes light to traverse the average viewing distance...
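For reference, the arithmetic behind "one light-nanosecond is roughly a foot":

```python
# Distance light covers in one nanosecond, in metres and feet.
c = 299_792_458       # speed of light in m/s
metres = c * 1e-9     # distance covered in 1 ns
feet = metres / 0.3048  # 1 foot = 0.3048 m exactly

print(f"1 light-nanosecond = {metres:.3f} m = {feet:.3f} ft")
```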

@junhe

junhe commented Jan 16, 2015

@jboner, I would like to cite some numbers in a formal publication. Who is the author? Jeff Dean? Which url should I cite? Thanks.

@weidagang

I'd like to see the number for "Append 1 MB to file on disk".

@dhartford

The "Send 1K bytes over 1 Gbps network" figure doesn't feel right. If you compare it against the 1 MB sequential reads from memory, SSD, and disk, then sending 1 MB over the 1 Gbps network (1024x the 1K figure) would come out faster than reading it from disk, which seems off.

@leotm

leotm commented May 2, 2015

A great solar system type visualisation: http://joshworth.com/dev/pixelspace/pixelspace_solarsystem.html

@misgeatgit

Can you update the Notes section with the following:
1 ns = 10^-9 seconds
1 ms = 10^-3 seconds

Thanks.

@jboner
Author

jboner commented Dec 13, 2015

@misgeatgit Updated

@juhovuori

Zippy is nowadays called Snappy. Might be worth updating. Thanks for the gist.

@georgevreilly

Several of the recent comments are spam. The links lead to sites in India which have absolutely nothing to do with latency.

@wenjianhn

Are there any numbers about latency between NUMA nodes?

@vitaut

vitaut commented Jan 31, 2016

Sequential SSD speed is actually more like 500 MB/s, not 1000 MB/s for SATA drives (http://www.tomshardware.com/reviews/ssd-recommendation-benchmark,3269.html).

@BruceGooch

You really should cite the folks at Berkeley. Their site is interactive, has been up for 20 years, and it is where you "sourced" your visualization. http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

@julianeden

Question~ do these numbers not vary from one set of hardware to the next? How can these be accurate for all different types of RAM, CPU, motherboard, hard drive, etc?

(I am primarily a front-end JS dev, I know little-to-nothing about this side of programming, where one must consider numbers involving RAM and CPU. Forgive me if I'm missing something obvious.)

@jlleblanc

The link to the animated presentation is broken, here's the correct one: http://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development

@will-hu-0

Love this one.

@profuel

profuel commented Oct 5, 2016

The mentioned gist, https://gist.github.com/2843375, is private or was removed.
Can someone restore it?
Thanks!

@trans

trans commented Oct 9, 2016

It would be nice to be able to compare this to computation times -- How long to do an add, xor, multiply, or branch operation?

@mpron

mpron commented Oct 12, 2016

Last year, I came up with this concept for an infographic illustrating these latency numbers with time analogies (if 1 CPU cycle = 1 second). Here was the result: http://imgur.com/8LIwV4C

@pawel-dubiel

pawel-dubiel commented Jan 29, 2017

Most of these numbers were valid in 2000-2001; right now some of them are off by an order of magnitude (especially reading from main memory, as DRAM bandwidth doubles every 3 years).

@maranomynet

µs, not us

@GLMeece

GLMeece commented Jan 31, 2017

I realize this was published some time ago, but the following URLs are no longer reachable/valid:

However, the second URL should now be: https://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/

Oh, and @mpron - nice!

@JustinNazari

Thank you @jboner

@GLMeece

GLMeece commented Jan 31, 2017

Note: I created my own "fork" of this.

@ValerieAnne563

Thank you @GLMeece

@orestotel

Google it

@knbknb

knbknb commented Jun 24, 2017

Median human reaction time (to some stimulus showing up on a screen): 270 ms
(value probably increases with age)
https://www.humanbenchmark.com/tests/reactiontime/statistics

@SonalJha

Awesome info. Thanks!

@keynan

keynan commented Sep 22, 2017

Could you please add printf & fprintf to this list

@awilkins

Heh, imagine this transposed into human distances.

1ns = 1 step, or about 1 foot.

L1 cache reference = reaching 1 foot across your desk to pick something up
Datacentre roundtrip = 94 mile hike.
Internet roundtrip (California to Netherlands) = Walk around the entire earth. Wait! You're not done. Then walk from London, to Havana. Oh, and then to Jacksonville, Florida. Then you're done.

@benirule

The last link is giving a 404

@ahartmetz

The "Read 1 MB sequentially from memory" number implies a memory bandwidth of 4 GB/s. That is a very old number. Can you update it? The time should be roughly 1/5th: one core can do about 20 GB/s today, and all cores of a 4- or 8-core chip about 40 GB/s together. I remember seeing 18-19 GB/s single-core in memtest86 on my Ryzen 1800X, and there are several benchmarks floating around where all cores do about 40 GB/s. It is very hard to find anything on the web about single-core memory bandwidth...
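The relationship between bandwidth and the per-MB time is just division; a quick check of the figures discussed above:

```python
# Sequential-read time for 1 MB at a few bandwidth figures: the
# table's 250 us corresponds to ~4 GB/s; a ~20 GB/s core would make
# it roughly 1/5th of that.
MB = 1_000_000  # bytes

for gb_per_s in (4, 20, 40):
    us = MB / (gb_per_s * 1e9) * 1e6
    print(f"{gb_per_s:2d} GB/s -> {us:6.1f} us per MB")
```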

@jamalahmedmaaz

Good information, thanks.

@ldavide

ldavide commented Feb 14, 2018

Is there an updated version of the latency table?

@rcosnita

Nice gist. Thanks @jboner.

@calimeroteknik

calimeroteknik commented Apr 9, 2018

https://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/

This prezi presentation is reversed: the larger numbers are inside the smaller ones, instead of the logical opposite.

@achiang

achiang commented Apr 17, 2018

Humanized version can be found: https://gist.github.com/hellerbarde/2843375

@jboner
Author

jboner commented Apr 22, 2018

Thanks. Updated.

@amirouche

Where is the xkcd version?

@eleztian

Thanks

@AnatoliiStepaniuk

Are there any resources where one can test oneself with tasks involving these numbers?
E.g. calculate how much time it would take to read 5 MB from a DB in another datacenter and get it back.
That would be a great test of applying these numbers to real use cases.

@bhaavanmerchant

Given the increased use of GPUs/TPUs, it might be interesting to add some numbers here now, like: send 1 MB over PCI Express to GPU memory, compute 100 prime numbers per GPU core compared to a CPU core, read 1 MB from GPU memory, etc.

@binbinlau

useful information & thanks

@bpmf

bpmf commented Feb 22, 2019

Some of the data in the Berkeley interactive version (https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html) is estimated, e.g. 4 µs in 2019 to read 1 MB sequentially from memory; that seems too fast.

@speculatrix

This is a great idea.
How about the time to complete a DNS request: a UDP packet request and response, with a DNS server that has, say, 1 ms response time and is 5 ms packet time-of-flight away?

@joelkraehemann

What effect on latency does using multiple native threads have, assuming proper mutex locking? Say you have:

  • an operation taking 1024 ns in L1 cache
  • 2 x mutex lock/unlock (50 ns)
  • a move from/to main memory (200 ns)

I also wonder about malloc latency; can you say something about it? It is definitely missing, since data you own can be computed on without any lock.

@haai

haai commented Sep 4, 2019

Interesting to see at a glance. But wouldn't it be good to use one unit in the comparison, e.g. a 4K memory page?

@acuariano

Nanoseconds

It's an excellent explanation. I had to search for the video because the account was closed. Here's the result I got: https://www.youtube.com/watch?v=9eyFDBPk4Yw

@KevinZhou92

> Send 1K bytes over 1 Gbps network 10,000 ns 10 us

This doesn't look right to me. 1 Gbps = 125,000 KB/s, so the time should be 1/125,000 = 8 * 10^-6 seconds, which is 8,000 ns.

@andaru

andaru commented Apr 4, 2020

> Send 1K bytes over 1 Gbps network 10,000 ns 10 us
>
> This doesn't look right to me. 1 Gbps = 125,000 KB/s, so the time should be 1/125,000 = 8 * 10^-6 seconds, which is 8,000 ns.

For a direct host-to-host connection with 1000BaseT interfaces, a wire latency of 8µs is correct.

However, if the hosts are connected using SGMII, the Serial Gigabit Media Independent Interface, data is 8b10b encoded, meaning 10 bits are sent for every 8 bits of data, leading to a latency of 10µs.

Jeff may also have been referring to the fact that in a large cluster you'll have a few switches between the hosts, so even where 1000BaseT is in use, the added switching latency (even for switches operating in cut-through mode) for, say, 2 switches can approach 2µs.

In any event, the main thing to take away from these numbers is the orders-of-magnitude differences between latency for various methods of I/O.
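The arithmetic behind both figures, for reference:

```python
# Serialization time for 1 KB (decimal) on a 1 Gbps link, with and
# without the 8b10b line encoding described above.
bits = 1000 * 8      # payload bits
line_rate = 1e9      # bits per second on the wire

plain_ns = bits / line_rate * 1e9             # raw 1000BaseT wire time
encoded_ns = bits * 10 / 8 / line_rate * 1e9  # SGMII: 10 line bits per 8 data bits

print(f"plain: {plain_ns:.0f} ns, 8b10b: {encoded_ns:.0f} ns")
```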

@arunkumaras10

Are these numbers still relevant in 2020? Or do they need an update?

@maning711

> Are these numbers still relevant in 2020? Or do they need an update?

I think the hardware is so expensive that they can't update it~

@vladimirvs

One thing that is misleading is that different units are used for the 1 Gbps network send (1 KB) and the RAM read (1 MB). RAM is at least 20x faster, but it ranks below the network send, which is misleading. The same 1 MB should have been used for both the network and RAM.

@amresht

amresht commented Aug 6, 2020

> need a solar system type visualization for this, so we can really appreciate the change of scale.

Hi,
I liked your request and made a comparison. One unit is the mass of the Earth, not the radius.

| Operation | Time in ns | Astronomical unit of mass |
| --- | --- | --- |
| L1 cache reference | 0.5 ns | 1/2 Earth, or five times Mars |
| Branch mispredict | 5 ns | 5 Earths |
| L2 cache reference | 7 ns | 7 Earths |
| Mutex lock/unlock | 25 ns | Roughly Uranus + Neptune |
| Main memory reference | 100 ns | Roughly Saturn + 5 Earths |
| Compress 1K bytes with Zippy | 3,000 ns | 10 Jupiters |
| Send 1K bytes over 1 Gbps network | 10,000 ns | 20 times all the planets of the Solar System |
| Read 4K randomly from SSD* | 150,000 ns | 1.6 times the red dwarf Wolf 359 |
| Read 1 MB sequentially from memory | 250,000 ns | A quarter of the Sun |
| Round trip within same datacenter | 500,000 ns | Half the mass of the Sun |
| Read 1 MB sequentially from SSD* | 1,000,000 ns | The Sun |
| Disk seek | 10,000,000 ns | 10 Suns |
| Read 1 MB sequentially from disk | 20,000,000 ns | The red giant R136a2 |
| Send packet CA->Netherlands->CA | 150,000,000 ns | An intermediate-sized black hole |

https://docs.google.com/spreadsheets/d/13R6JWSUry3-TcCyWPbBhD2PhCeAD4ZSFqDJYS1SxDyc/edit?usp=sharing

@asimilon

asimilon commented Oct 4, 2020

> need a solar system type visualization for this, so we can really appreciate the change of scale.

> Hi
> I liked your request and made a comparison. One unit is the mass of the Earth, not the radius.

For me the best way of making this "more human relatable" would be to treat nanoseconds as seconds and then convert the large values.

eg. 150,000,000s = ~4.75 years
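That conversion is easy to script; a small sketch for the largest entries:

```python
from datetime import timedelta

# Treat nanoseconds as seconds ("1 ns of machine time -> 1 s of human
# time") and convert the big values into human units.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

for name, ns in [("Disk seek", 10_000_000),
                 ("Send packet CA->Netherlands->CA", 150_000_000)]:
    secs = ns  # the whole point of the scaling: reuse the number as seconds
    print(f"{name}: {timedelta(seconds=secs)} (~{secs / SECONDS_PER_YEAR:.2f} years)")
```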

@sirupsen

sirupsen commented Jan 8, 2021

I've been doing some more work inspired by this, surfacing more numbers, and adding throughput:

https://github.com/sirupsen/napkin-math

@sachin-j-joshi

Is there a 2021 updated edition?

@ellingtonjp

ellingtonjp commented Apr 15, 2021

@sirupsen I love your project and I'm signed up for the newsletter. Currently making Anki flashcards :)

There are some large discrepancies between your numbers and the ones found here (not sure where these numbers came from):
https://colin-scott.github.io/personal_website/research/interactive_latency.html

I'm curious what's causing them. Specifically, 1MB sequential memory read: 100us vs 3us.

@sirupsen

@ellingtonjp My program is getting ~100 us, and this one says 250 us (from 2012). Lines up to me with some increases in performance since :) Not sure how you got 3 us

@ellingtonjp

ellingtonjp commented Apr 15, 2021

@sirupsen I was referring to the numbers here https://colin-scott.github.io/personal_website/research/interactive_latency.html

The 2020 version of "Read 1,000,000 bytes sequentially from memory" shows 3us. Not sure where that comes from though. Yours seems more realistic to me

@sirupsen

sirupsen commented Apr 17, 2021

Ahh, sorry, I read your message too quickly. Yeah, it's unclear to me how someone would get 3 us. The code I use for this is very simple; it took reading the x86 assembly a few times to ensure that the compiler didn't optimize it out. I do summing, which is one of the lightest workloads you could do in a loop like that, so I think it's quite realistic. Maybe it was optimized out in that person's script? 🤷

@ellingtonjp

To everyone interested in numbers like this:

@sirupsen 's project is really good. He gave an excellent talk on the "napkin math" skill and has a newsletter with monthly challenges for practicing putting these numbers to use.

Newsletter: https://sirupsen.com/napkin/
Github: https://github.com/sirupsen/napkin-math
Talk: https://www.youtube.com/watch?v=IxkSlnrRFqc

@awsles

awsles commented Jun 9, 2021

:)
Light round trip to the moon 2,510,000,000 ns 2,510,000 us 2,510 ms 2.51 s


@apimaker001

useful information & thanks

@eduard93

eduard93 commented Jan 3, 2022

What about register access timings?

@crazydogen

crazydogen commented Apr 6, 2022

Markdown version :p

| Operation | ns | µs | ms | note |
| --- | --- | --- | --- | --- |
| L1 cache reference | 0.5 ns | | | |
| Branch mispredict | 5 ns | | | |
| L2 cache reference | 7 ns | | | 14x L1 cache |
| Mutex lock/unlock | 25 ns | | | |
| Main memory reference | 100 ns | | | 20x L2 cache, 200x L1 cache |
| Compress 1K bytes with Zippy | 3,000 ns | 3 µs | | |
| Send 1K bytes over 1 Gbps network | 10,000 ns | 10 µs | | |
| Read 4K randomly from SSD* | 150,000 ns | 150 µs | | ~1GB/sec SSD |
| Read 1 MB sequentially from memory | 250,000 ns | 250 µs | | |
| Round trip within same datacenter | 500,000 ns | 500 µs | | |
| Read 1 MB sequentially from SSD* | 1,000,000 ns | 1,000 µs | 1 ms | ~1GB/sec SSD, 4X memory |
| Disk seek | 10,000,000 ns | 10,000 µs | 10 ms | 20x datacenter roundtrip |
| Read 1 MB sequentially from disk | 20,000,000 ns | 20,000 µs | 20 ms | 80x memory, 20X SSD |
| Send packet CA -> Netherlands -> CA | 150,000,000 ns | 150,000 µs | 150 ms | |

@LuisOsta

@jboner What do you think about adding cryptography numbers to the list? I feel like that would be a really valuable addition to the list for comparison. Especially as cryptography usage increases and becomes more common.

We could, for instance, add Ed25519 latency for cryptographic signing and verification. In some very rudimentary local testing I got:

  1. Ed25519 Signing - 254.20µs
  2. Ed25519 Verification - 368.20µs

You can replicate the results with the following Rust program (note that elapsed time is captured before printing, so the `println!` calls are not included in the timed sections):

```rust
fn main() {
    let msg = b"lfasjhfoihjsofh438948hhfklshfosiuf894y98s";
    let sk = ed25519_zebra::SigningKey::new(rand::thread_rng());

    // Time signing only; capture elapsed before any printing.
    let now = std::time::Instant::now();
    let sig = sk.sign(msg);
    let elapsed = now.elapsed();
    println!("Signing took: {:.2?}", elapsed);

    // Time verification the same way.
    let vk = ed25519_zebra::VerificationKey::from(&sk);
    let now = std::time::Instant::now();
    vk.verify(&sig, msg).unwrap();
    let elapsed = now.elapsed();
    println!("Verification took: {:.2?}", elapsed);
}
```

@bob333

bob333 commented Sep 15, 2022

What is "Zippy"? Is it Google-internal compression software?

@milesrichardson

> Send 1K bytes over 1 Gbps network 10,000 ns 10 us

This seems misleading, since in common networking terminology 1 Gbps refers to throughput ("size of the pipe"), but this list is about "latency", which is generally independent of throughput: it takes the same amount of time to send 1K bytes over a 1 Mbps network and a 1 Gbps network.

A better description of this measure sounds like "bit rate," or more specifically the "data signaling rate" (DSR) over some communications medium (like fiber). This also avoids the ambiguity of "over" the network (how much distance?) because DSR measures "aggregate rate at which data passes a point" instead of a segment.

Using this definition (which I just learned a minute ago), perhaps a better label would be:

```diff
- Send 1K bytes over 1 Gbps network       10,000   ns       10 us
+ Transfer 1K bytes over a point on a 1 Gbps fiber channel       10,000   ns       10 us
```

🤷 (also, I didn't check if the math is consistent with this labeling, but I did pull "fiber channel" from the table on the DSR wiki page)

@nking

nking commented Jun 8, 2023

Thanks for sharing your updates.

You could consider adding a context switch for threads right under disk seek:
computer context switches: 1e7 ns

@VTrngNghia

I see "Read 1 MB sequentially from disk", but how about disk write?

@SergeSEA

SergeSEA commented Dec 20, 2023

The numbers are from Dr. Dean of Google and reflect the cost of typical computer operations in 2010. I hope someone can update them, as it's 2023.

@VTrngNghia

The numbers should still be quite similar.

These numbers are based on physical limitations; only a significant technological leap can make a difference.

In any case, these are for estimates, not exact calculations. For example, a 1 MB sequential read differs between SSDs, but it should be somewhere around the millisecond range.

@xealits
Copy link

xealits commented Jan 31, 2024

It could be useful to add a column with the sizes in the hierarchy, and also a column with the minimal memory-unit sizes, cache-line sizes, etc. Then you could divide the sizes by the latencies, which would give some kind of upper limit on a simple algorithm's throughput. Not really sure if this is useful, though.

@robertknight

As an updated point of reference for the first few numbers, Apple gives a table in their Apple Silicon CPU Optimization Guide. You can see the figures are extremely similar to the original ones:

[Image: Apple Silicon CPU latency table]
