Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
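As a convenience, here is a minimal Python sketch of the table as data, handy for recomputing the relative multipliers above (the `LATENCY_NS` dictionary and `times_slower` helper are illustrative names, not part of the original list; nothing here is measured):

```python
# Latency numbers (~2012) as data; values in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}

def times_slower(a, b):
    """How many times slower operation a is than operation b."""
    return LATENCY_NS[a] / LATENCY_NS[b]

print(times_slower("L2 cache reference", "L1 cache reference"))  # 14.0
print(times_slower("Read 1 MB sequentially from disk",
                   "Read 1 MB sequentially from memory"))        # 80.0
```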
Credit
------
By Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers
Contributions
-------------
'Humanized' comparison: https://gist.github.com/hellerbarde/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
I agree, would be fun to see. :-)
useful information & thanks
Looks nice, kudos! For SSDs it would be something like:
Latency numbers between large cities: https://wondernetwork.com/pings/
@preinheimer Asia & Australasia have it bad.
From the same author: http://videolectures.net/wsdm09_dean_cblirs/
"Latency numbers every programmer should know" - yet naturally, it has no information about humans!
maybe you want to incorporate some of this: https://gist.github.com/2843375
Curious to see numbers for SSD read time
I think the reference you want to cite is here: http://norvig.com/21-days.html#answers
This reminds me of Grace Hopper's video about nanoseconds. Really worth watching.
I find comparisons much more useful than raw numbers: https://gist.github.com/2844130
I'm surprised that mechanical disk reads are only 80x slower than main memory reads.
My version: https://gist.github.com/2842457 includes SSD numbers; would love some more.
Do L1 and L2 cache latencies depend on processor type? And what about L3 cache?
Of course it does ... those are averages, I think.
Would be nice to right-align the numbers so people can more easily compare orders of magnitude.
Good idea. Fixed.
And expanded even a bit more: https://gist.github.com/2845836 (SSD numbers, relative comparisons, more links)
TLB misses would be nice to list too, so people see the value of large pages... Context switches (for various OSes), ... Also, regarding packet sends, that must be latency from send initiation to send completion, I assume. If you're going to list mutex lock/unlock, how about memory barriers? Thanks! This is quite useful, particularly for flogging at others.
Quick pie chart of data with scales in time (1 sec -> 9.5 years) for fun.
"Read 1 MB sequentially from disk - 20,000,000 ns". Is this with or without disk seek time?
I made a fusion table for this at: May be helpful for graphing, etc. Thanks for putting this together.
Cool. Thanks.
Here is a chart version. It's a bit hard to read, but I hope it conveys the perspective.
It would also be very interesting to add memory allocation timings to that : )
How long does it take before this shows up in XKCD?
What you guys are talking about is the Powers of Ten: http://vimeo.com/819138
If it does show up on xkcd it will be next to a gigantic "How much time it takes for a human to react to any results", hopefully with the intent to show people that any use of this knowledge should be tempered with an understanding of what it will be used for. Possibly showing how getting a bit from the cache is pretty much identical to getting a bit from China when it comes to a single fetch of information to show a human being.
@BillKress Yes, this is specifically for programmers, to make sure they have an understanding of the bottlenecks involved in programming. If you know these numbers, you know that you need to cut down on disk access before cutting down on in-memory shuffling.
@BillKress If we were only concerned with showing information to a single human being at a time we could just as well shut down our development machines and go out into the sun and play. This is about scalability.
this is getting out of hand, how do i unsubscribe from this gist?
Saw this via @smashingmag. While you guys debate the fit for purpose, here is another visualization of your quick-reference latency data with Prezi: ow.ly/bnB7q
Does anybody know how to stop receiving notifications from a gist's activity?
Here's a tool to visualize these numbers over time: http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
I just created flash cards for this: https://ankiweb.net/shared/info/3116110484. They can be downloaded using the Anki application: http://ankisrs.net
I'm also missing something like "Send 1 MB over 1 Gbps network (within datacenter, over TCP)". Or does that vary so much that it would be impossible to specify?
If L1 access is a second, then: L1 cache reference: 0:00:01
You can add LTO4 tape seek/access time, ~55 sec, or 55,000,000,000 ns
I'm missing things like sending 1K via Unix pipe / socket / TCP to another process.
@metakeule It's easily measurable.
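For instance, a rough Python sketch of one way to measure the pipe case (my own illustration, not from this thread): a single-process write+read loop, so the result is a lower bound and includes interpreter overhead.

```python
import os, time

r, w = os.pipe()          # Unix pipe; 1K payload fits in the pipe buffer
payload = b"x" * 1024     # 1K bytes
N = 100_000

start = time.perf_counter_ns()
for _ in range(N):
    os.write(w, payload)  # atomic for writes <= PIPE_BUF
    os.read(r, 1024)
elapsed = time.perf_counter_ns() - start

print(f"~{elapsed / N:.0f} ns per 1K write+read through a pipe")
```

A true cross-process figure would need a second process (or thread) on the other end of the pipe, which adds scheduling latency on top of this.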
Related page from "Systems Performance" with similar second scaling mentioned by @kofemann: https://twitter.com/rzezeski/status/398306728263315456/photo/1
An L1D hit on a modern Intel CPU (Nehalem+) is at least 4 cycles. For a typical server/desktop at 2.5 GHz that is at least 1.6 ns.
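The conversion from cycles to nanoseconds, as a quick sketch:

```python
cycles, clock_ghz = 4, 2.5        # L1D hit latency, typical clock speed
print(cycles / clock_ghz, "ns")   # 1.6 ns
```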
Please note that Peter Norvig first published this expanded version (at that location - http://norvig.com/21-days.html#answers) ~JUL2010 (see wayback machine). Also, note that it was "Approximate timing for various operations on a typical PC".
One light-nanosecond is roughly a foot, which is considerably less than the distance to my monitor right now. It's kind of surprising to realize just how much a CPU can get done in the time it takes light to traverse the average viewing distance...
@jboner, I would like to cite some numbers in a formal publication. Who is the author? Jeff Dean? Which URL should I cite? Thanks.
I'd like to see the number for "Append 1 MB to file on disk".
The "Send 1K bytes over 1 Gbps network" number doesn't feel right. If you compare it with the 1 MB sequential reads from memory, SSD, and disk, then sending 1 MB over a 1 Gbps network (1024x the listed figure) would come out faster than reading it from disk, and that doesn't feel right.
A great solar system type visualisation: http://joshworth.com/dev/pixelspace/pixelspace_solarsystem.html
I turned this into a set of flashcards on Quizlet: https://quizlet.com/_1iqyko
Can you update the Notes section with the following? Thanks.
@misgeatgit Updated
Zippy is nowadays called Snappy. Might be worth updating. Tx for the gist.
Several of the recent comments are spam. The links lead to sites in India which have absolutely nothing to do with latency.
Are there any numbers about latency between NUMA nodes?
Sequential SSD speed is actually more like 500 MB/s, not 1000 MB/s, for SATA drives (http://www.tomshardware.com/reviews/ssd-recommendation-benchmark,3269.html).
You really should cite the folks at Berkeley. Their site is interactive, has been up for 20 years, and it is where you "sourced" your visualization. http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
Question: do these numbers not vary from one set of hardware to the next? How can they be accurate for all different types of RAM, CPU, motherboard, hard drive, etc.? (I am primarily a front-end JS dev; I know little to nothing about this side of programming, where one must consider numbers involving RAM and CPU. Forgive me if I'm missing something obvious.)
The link to the animated presentation is broken; here's the correct one: http://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development
Love this one.
Mentioned
It would be nice to be able to compare this to computation times: how long to do an add, xor, multiply, or branch operation?
Last year, I came up with this concept for an infographic illustrating these latency numbers with time analogies (if 1 CPU cycle = 1 second). Here was the result: http://imgur.com/8LIwV4C
Most of these numbers were valid in 2000-2001; right now some of them are wrong by an order of magnitude (especially reading from main memory, as DRAM bandwidth doubles every 3 years).
I realize this was published some time ago, but the following URLs are no longer reachable/valid:
However, the second URL should now be: https://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/ Oh, and @mpron - nice!
Thank you @jboner
Note: I created my own "fork" of this.
Thank you @GLMeece
Google it
Median human reaction time (to some stimulus showing up on a screen): 270 ms
Awesome info. Thanks!
Could you please add printf & fprintf to this list?
Heh, imagine this transposed into human distances. 1 ns = 1 step, or 2 feet. L1 cache reference = reaching 1 foot across your desk to pick something up.
The last link is giving a 404
The "Read 1 MB sequentially from memory" number implies a memory bandwidth of 4 GB/s. That is a very old number. Can you update it? The time should be roughly 1/5th: one core can do about 20 GB/s today, and all cores of a 4- or 8-core part about 40 GB/s together. I remember seeing 18-19 GB/s in memtest86 for a single core on my Ryzen 1800X, and there are several benchmarks floating around where all cores do about 40 GB/s. It is very hard to find anything on the web about single-core memory bandwidth...
Good information, thanks.
http://ram.userbenchmark.com/ Edit: I was wrong. https://developers.redhat.com/blog/2016/03/01/reducing-memory-access-times-with-caches/
Is there an updated version of the latency table?
Nice gist. Thanks @jboner.
Links are dead (https://gist.github.com/2843375). @jboner, let's remove them.
This prezi presentation is reversed: the larger numbers are inside the smaller ones, instead of the logical opposite.
The humanized version can be found at: https://gist.github.com/hellerbarde/2843375
Thanks. Updated.
Where is the xkcd version?
This one is nice: https://gist.github.com/hellerbarde/2843375#gistcomment-1896153
Just use logarithms directly: https://gist.github.com/negrinho/8a8b45a8958a8653054aa2b349b4cb05
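For example, a tiny sketch of that idea, remembering only the order of magnitude (log10 of nanoseconds; the entries below are a subset of the table):

```python
import math

latencies_ns = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Disk seek": 10_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}
for op, ns in latencies_ns.items():
    print(f"{math.log10(ns):5.1f}  {op}")  # -0.3, 2.0, 7.0, 8.2
```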
Thanks
Are there any resources where one can test oneself with tasks involving these numbers?
I think, given the increased use of GPUs/TPUs, it might be interesting to add some numbers here now. Like: sending 1 MB over PCI Express to GPU memory, computing 100 prime numbers per GPU core compared to a CPU core, reading 1 MB from GPU memory on the GPU, etc.
Some data in the Berkeley interactive version (https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html) is estimated, e.g. 4 µs in 2019 to read 1 MB sequentially from memory; that seems too fast.
this is a great idea.
What effect does the use of multiple native threads, made possible by proper mutex locking, have on latency? Assume you have:
Now, I wonder about malloc latency; can you say anything about it? It is definitely missing, since I can compute on data without any lock when I own the data.
Interesting when you see it at a glance, but wouldn't it be good to use one unit in the comparison, e.g. a 4K memory page?
It's an excellent explanation. I had to search for the video because the account was closed. Here's the result I got: https://www.youtube.com/watch?v=9eyFDBPk4Yw
For a direct host-to-host connection with 1000BaseT interfaces, a wire latency of 8µs is correct. However, if the hosts are connected using SGMII, the Serial Gigabit Media Independent Interface, data is 8b10b encoded, meaning 10 bits are sent for every 8 bits of data, leading to a latency of 10µs. Jeff may also have been referring to the fact that in a large cluster you'll have a few switches between the hosts, so even where 1000BaseT is in use, the added switching latency (even for switches operating in cut-through mode) for, say, 2 switches can approach 2µs. In any event, the main thing to take away from these numbers are the orders of magnitude differences between latency for various methods of I/O.
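The wire-time arithmetic behind those figures, as a quick sketch:

```python
bits = 1024 * 8                          # 1K bytes
rate = 1e9                               # 1 Gbps line rate
print(bits / rate * 1e6, "us")           # 8.192 us raw
print(bits * 10 / 8 / rate * 1e6, "us")  # 10.24 us with 8b10b encoding
```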
Fancy unicode version:
Are these numbers still relevant in 2020? Or does this need an update?
I think hardware is so expensive that they can't update it~
One thing that is misleading is that different units are used for "send over 1 Gbps network" versus "read 1 MB from RAM". RAM is at least 20x faster, but it ranks below the network send, which is misleading. They should have used the same 1 MB for both network and RAM.
Hi
https://docs.google.com/spreadsheets/d/13R6JWSUry3-TcCyWPbBhD2PhCeAD4ZSFqDJYS1SxDyc/edit?usp=sharing
For me the best way of making this "more human relatable" would be to treat nanoseconds as seconds and then convert the large values, e.g. 150,000,000 s = ~4.75 years.
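A small sketch of that rescaling (pretending each nanosecond took a whole second; the helper name is mine):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def humanize(ns):
    s = ns  # treat each nanosecond as one second
    if s < 60:
        return f"{s:g} s"
    if s < 3600:
        return f"{s / 60:.1f} min"
    if s < 86400:
        return f"{s / 3600:.1f} h"
    if s < SECONDS_PER_YEAR:
        return f"{s / 86400:.1f} days"
    return f"{s / SECONDS_PER_YEAR:.2f} years"

print(humanize(0.5))           # L1 cache reference: 0.5 s
print(humanize(150_000_000))   # CA->Netherlands->CA: 4.75 years
```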
I've been doing some more work inspired by this, surfacing more numbers, and adding throughput:
Need a solar system type visualization for this, so we can really appreciate the change of scale.