- AMD enhanced their Performance Monitoring Unit (Zen 4 IBS). https://www.phoronix.com/scan.php?page=news_item&px=AMD-Zen-4-IBS-Linux - better logging of L3 cache misses to help the kernel with page caching.
- rdpmc - a small C++ header-only library to count CPU-level events. https://github.com/rodgarrison/rdpmc - x86 only. More lightweight than PAPI.
- Tachyum launches their Cloud/AI/HPC chip called Prodigy. https://www.hpcwire.com/off-the-wire/tachyum-launches-prodigy-universal-processor/ - how to run the QEMU simulator.
- Mysterious memset, a blog post about pointer aliasing preventing compiler optimizations. https://vector-of-bool.github.io/2022/05/11/char8-memset.html
- Mark's blog post
- Performance myth 5 - tool expertise (perf, vtune, jmeter...) is not highly correlated with performance engineering skill.
- Use NIC timestamps - bpftrace - use link shims to intercept system calls without needing administrator privileges.
- Myth 4 on memory usage - memory controller overload matters, not just total RAM. Tools: pcm-memory, intel_rdt.
- Experiment with pinning cores and keeping IO caches hot to lower latency of L3 misses.
- Myth 3 - Sampling Profilers Work Great for Multithreaded Apps - need a tool like Coz.
- Myth 2 - CPU Clock Speed Is Paramount - hasn't been informative for years.
- Myth 1 - Big O Complexity == Performance - only a factor for large problem sizes - cache-oblivious scaling is more important.
- Denis' big 5 - not knowing the application stack - blindly relying on big O - blindly optimizing - creating bad benchmarks
- Algorithms 4th Edition
- x/y/z (t) plots: what is the chance of X at timescale T experiencing a fault rate Y for Z time?
- Look at CPU pipeline - much wider/deeper.
- Hacking QEMU
- Abusing Souper to slow down code like Coz?
- egraphs for superoptimization and modeling.
- https://github.com/opcm/pcm - https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_rdt/README.md
- Tracing with QEMU
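
For the note above about experimenting with pinning cores, here is a minimal sketch of what pinning might look like from Python. It assumes Linux, where `os.sched_setaffinity` is available; the helper name `pin_to_cpu` is made up for illustration and is not taken from any of the linked posts. It only restricts where the process may run and does nothing about keeping caches hot.

```python
import os


def pin_to_cpu(cpu=None):
    """Pin the calling process to a single CPU and return the new affinity set.

    Hypothetical helper for illustration. Returns None on platforms that do
    not expose os.sched_setaffinity (for example macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None

    # CPUs the process is currently allowed to run on (pid 0 = caller).
    allowed = os.sched_getaffinity(0)

    # Default to the lowest-numbered allowed CPU; this is an arbitrary choice.
    target = min(allowed) if cpu is None else cpu
    if target not in allowed:
        raise ValueError(f"CPU {target} is not in the allowed set {sorted(allowed)}")

    # Restrict the process to exactly one CPU.
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    pinned = pin_to_cpu()
    if pinned is None:
        print("CPU affinity is not supported on this platform")
    else:
        print(f"now pinned to CPU(s): {sorted(pinned)}")
```

Run the workload once before and once after pinning, and compare latency with a counter-based tool such as pcm-memory, to see whether pinning actually helps on a given machine.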