- 2011 - A trip through the Graphics Pipeline 2011
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
- 2020 - GPU ARCHITECTURE RESOURCES
- 2020 - All the pipelines - journey through the GPU
- Emil Persson @Humus
- Matt Pettineo @mynameismjp
- Louis Bavoil @louisbavoil
- D3D11 Vendor Hacks
- 2018 - The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload
- 2018 - Fixing the Hyperdrive: Maximizing Rendering Performance on NVIDIA GPUs (Presented by NVIDIA)
- 2019 - Optimizing DX12/DXR GPU Workloads using Nsight Graphics: GPU Trace and the Peak-Performance-Percentage (P3) Method (Presented by NVIDIA)
- 2020 - Optimizing Compute Shaders for L2 Locality using Thread-Group ID Swizzling
- 2021 - Dana Elifaz - The Next Level of Optimization Advice with Nsight Graphics: GPU Trace
- 2022 - (GDC Paywall) Optimizing Ray Tracing GPU Workloads using Nsight Graphics: GPU Trace and Nsight Systems
- Rys Sommefeldt @ryszu
- Michal Drobot @michaldrobot
- Kostas Anagnostou @KostasAAA
- Blog
- 2018 - DD2018: Kostas Anagnostou - Experiments in GPU occlusion culling
- 2020 - GPU ARCHITECTURE RESOURCES
- 2020 - GPU ARCHITECTURE RESOURCES (twitter thread)
- 2020 - WHAT IS SHADER OCCUPANCY AND WHY DO WE CARE ABOUT IT?
- 2020 - TO Z-PREPASS OR NOT TO Z-PREPASS
- 2022 - SHADER TIPS AND TRICKS
- 2023 - LOW-LEVEL THINKING IN HIGH-LEVEL SHADING LANGUAGES 2023
- Matthäus G. Chajdas @NIV_Anteru
- Blog
- 2018 - Introduction to compute shaders
- 2018 - More compute shaders
- 2018 - Even more compute shaders
- Matthijs De Smedt @anji_nl
- 2016 - PC GPU Performance Hot Spots
- Maurizio Cerrato @speedwago
- 2019 - GPU Architectures
- Sebastian Aaltonen @SebAaltonen
- Layla Mah @MissQuickstep
- Sven Andersson @andsve
- Blog
- 2014 - Real-time Rendering Blogs
- Fabian Giesen @rygorous
- Timothy Lottes @NOTimothyLottes
- Robert Menzel @renderpipeline
- Blog
- 2012 - Low-Level GPU Documentation
- RasterGrid @rastergrid
- Blog
- 2021 - Understanding GPU caches
- Adam Sawicki @Reg__
- Matías N. Goldberg @matiasgoldberg
- Francesco Cifariello Ciardi @FCifaCiar
- Blog
- 2018 - INTRO TO GPU SCALARIZATION
- Sébastien Lagarde @SebLagarde
- Bart Wronski @BartWronsk
- Elizabeth Baumel @Icetigris
- Anton Schreiner @antonschrein
- Jendrik Illner @jendrikillner
- Blog
- Graphics Programming Weekly Article Database (not specifically on optimization; has a search bar)
- Hans-Kristian @Themaister
- Graham Wihlidal @gwihlidal
- AMD
- GPU Open, ROCm™ Blogs
- Events Presentations
- AMD GPU architecture programming documentation (Instruction Set Architecture)
- 2014 - Vertex Shader Tricks
- 2016 - Leveraging asynchronous queues for concurrent execution
- 2016 - AMD GCN Assembly: Cross-Lane Operations
- 2017 - Wave Programming in D3D12 and Vulkan
- 2017 - D3D12 and Vulkan Done Right
- 2017 - Deep Dive: Asynchronous Compute
- 2018 - Optimize your engine using compute @ 4C Prague 2018 | (Youtube)
- 2018 - Optimization with Radeon GPU Profiler - A Vulkan Case Study
- 2019 - DirectX 12 Optimization Techniques in Capcom’s RE ENGINE
- 2019 - A BLEND OF GCN OPTIMIZATION AND COLOR PROCESSING
- 2019 - AMD GPU Performance Revealed
- 2019 - Triangles Are Precious
- 2020 - Let’s build
- AMD Ryzen™ Processor Software Optimization
- Optimizing for the Radeon™ RDNA Architecture
- From Source to ISA: A Trip Down the Shader Compiler Pipeline
- A Review of GPUOpen Effects
- Curing Amnesia and Other GPU Maladies With AMD Developer Tools
- Radeon™ ProRender Full Spectrum Rendering 2.0: The Universal Rendering API
- 2020 - CONCURRENCY MODEL IN EXPLICIT GRAPHICS APIS
- 2021 - Understanding Graphs in Radeon GPU Profiler and GPUView
- 2022 - Let's talk about (GPU) crashes
- 2022 - Compute Shaders @ GIC
- 2023 - Occupancy explained
- 2024 - Mesh shaders: optimization and best practices
- 2024 - Occupancy explained through the AMD RDNA architecture
- GCN
- 2013 - GCN3 Instruction Set Architecture
- 2019 - AMD GCN ISA: a first dive
- 2020 - Understanding AMD GPU ISA (Video)
- RDNA
- OpenCL
- Radeon GPU Analyzer / Radeon Raytracing Analyzer
- 2017 - Live VGPR Analysis with Radeon™ GPU Analyzer
- 2019 - USING RADEON™ GPU ANALYZER WITH DIRECTX®12 GRAPHICS
- 2019 - USING RADEON™ GPU ANALYZER WITH DIRECT3D®12 COMPUTE
- 2022 - Visualizing VGPR Pressure with Radeon™ GPU Analyzer 2.6
- 2022 - Improving raytracing performance with the Radeon™ Raytracing Analyzer (RRA)
- Driver Stack
- GPU Open, ROCm™ Blogs
- Nvidia
- Developer Blog and Talks
- Advanced API Performance on various topics
- 2012 - GPU Performance Analysis and Optimization
- 2015 - Constant Buffers without Constant Pain
- 2016 - Practical DirectX 12
- 2016 - Reading Between The Threads: Shader Intrinsics
- 2016 - DX12 Do's And Don'ts
- 2016 - High-Performance, Low-Overhead Rendering with OpenGL and Vulkan
- 2019 - Tips and Tricks: Ray Tracing Best Practices
- 2020 - Optimizing Graphics Applications using Nsight Systems and Nsight Graphics
- 2020 - RTX Ray Tracing Best Practices
- 2021 - Advanced API Performance
- 2022 - Best Practices for Using NVIDIA RTX Ray Tracing (Updated)
- 2023 - Practical Tips for Optimizing Ray Tracing
- 2023 - Avoiding Stalls and Hitches in DirectX 12
- 2023 - How to Improve Shader Performance by Resolving LDC Divergence
- 2024 - Shader Debugging Made Easy with NVIDIA Nsight Graphics
- Pascal
- Turing
- Ampere
- Ada
- CUDA
- 2014 - CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics
- 2017 - CUDA kernel-level experiments in NVIDIA Nsight on Issue Efficiency, Memory Statistics, Pipe Utilization, etc.
- Driver Stack
- Misc
- Developer Blog and Talks
- Apple
- Intel
- Arm
- Microsoft
- Khronos Group
- GDC
- Advanced Graphics Summit (not specifically on optimization)
- Digital Dragon
- Video (not specifically on optimization)
- (JP) CEDEC
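- 2016 - GPU最適化入門 (Introduction to GPU Optimization)
- (Book) マンガとイラストでわかる! GPU最適化入門 (Introduction to GPU Optimization, Explained with Manga and Illustrations)
- 2016 - GPU最適化入門 (Introduction to GPU Optimization)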
- SIGGRAPH
- Advances in Real-Time Rendering in Games (not specifically on optimization)
- 2009 - From Shader Code to a Teraflop: How Shader Cores Work
- 2020 - LOW-LEVEL OPTIMIZATIONS IN THE LAST OF US PART II
- CMU
- 2018 - Aftermath: Advances in GPU Crash Debugging
- 2020 - (JP) Device Removal の処方箋 (Prescriptions for Device Removal), 補足資料 (supplementary material)
- 2023 - GPU Crash Debugging in Unreal Engine: Tools, Techniques, and Best Practices | Unreal Fest 2023
- GPU Specs Database by techpowerup
- GPU database by Matthäus G. Chajdas
- GPUInfo by Sascha Willems (for Vulkan, OpenGL, OpenGL ES)
- D3d12infoDB by Dmytro Bulatov (database built on D3d12info; see the Tools section below)
- (JP) GPU Spec Database by HYPERでんち
- Online Shader Compiler
- Compiler Explorer (godbolt): supports DXC, AMD RGA
- Shader Playground: supports DXC, FXC, glslang, hlsl2glsl, hlslparser, IntelShaderAnalyzer, AMD RGA, slang, XShaderCompiler
- Microsoft
- Nvidia
- AMD
- Radeon Developer Tool Suite
- Radeon GPU Profiler (RGP): low-level optimization tool
- Radeon Memory Visualizer (RMV)
- Radeon Developer Panel (RDP)
- Driver Experiments: low-level control of the AMD Adrenalin driver
- Radeon GPU Analyzer (RGA): offline compiler and performance analysis tool
- Radeon Raytracing Analyzer (RRA)
- Radeon GPU Detective (RGD): post-mortem analysis of GPU crashes
- 2024 - Game Optimization with The Radeon Developer Tool Suite
- GPU Reshape: on-the-fly instrumentation of GPU operations with instruction-level validation of potentially undefined behavior
- 2024 - Introducing GPU Reshape (Video)
- Radeon Developer Tool Suite
- Intel
- Other related tools
- RenderDoc: graphics debugger that allows quick and easy single-frame capture and detailed introspection
- APITrace: traces OpenGL, Direct3D, and DirectDraw API calls to a file and replays them
- PerfTest: a simple performance test tool for GPU shader memory operations; results for a wide range of GPUs are already available
- D3d12info by Adam Sawicki: gets GPU information through DXGI and Direct3D 12 (D3D12), plus AMD AGS, NVAPI, WinAPI, and some other sources