- 2011 - A trip through the Graphics Pipeline 2011
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
- 2020 - GPU ARCHITECTURE RESOURCES
Latency Comparison Numbers | |
-------------------------- | |
L1 cache reference/hit 1.5 ns 4 cycles | |
Floating-point add/mult/FMA operation 1.5 ns 4 cycles | |
L2 cache reference/hit 5 ns 12 ~ 17 cycles | |
Branch mispredict 6 ns 15 ~ 20 cycles | |
L3 cache hit (unshared cache line) 16 ns 42 cycles | |
L3 cache hit (shared line in another core) 25 ns 65 cycles | |
Mutex lock/unlock 25 ns | |
L3 cache hit (modified in another core) 29 ns 75 cycles |
(by @andrestaltz)
If you prefer to watch video tutorials with live-coding, then check out this series I recorded with the same contents as in this article: Egghead.io - Introduction to Reactive Programming.
For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.
After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft
FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note, before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall the exact origin of the original source, nor was I able to find the author's name, so I am can't provide the appropriate credits.
- By Edmond Lau
- Highly Recommended 👍
- http://www.theeffectiveengineer.com/
This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).
Matrix multiplication is a mathematical operation that defines the product of
What is strict aliasing? First we will describe what is aliasing and then we can learn what being strict about it means.
In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is classified as undefined behavior(UB). Once we have undefined behavior all bets are off, the results of our program are no longer reliable.
Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the a future version of a compiler with a new optimization will break code we th
Moved to git-repository: https://github.com/denji/awesome-http-benchmark
Located in alphabetical order (not prefer)
- ab – slow and single threaded, written in
C
- apib – most of the features of ApacheBench (
ab
), also designed as a more modern replacement, written inC
- autocannon – fast HTTP/1.1 benchmarking tool written in Node.js
- baloo – Expressive end-to-end HTTP API testing made easy, written in Go (
golang
)