Hardware Speed-of-Light Analysis
The following numbers are based on NVIDIA's Volta microarchitecture. To perform a similar analysis for a newer architecture, I recommend replacing the numbers below with the values reported by the deviceQuery CUDA sample or listed on the architecture's Wikipedia page.
CUDA Cores = SMs * Cores per SM (SMs = 80, Cores per SM = 64, so 80 * 64 = 5120 CUDA cores)
Maximum Clock Rate (GHz) = Clock Rate (kHz, as reported in the device properties) * 1e-6
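Maximum Throughput (GFLOP/s), for type = float, double or half =
CUDA Cores * Maximum Clock Rate (GHz) * Type Ratio
where Type Ratio is the throughput of the type relative to float, taken from the device properties (on Volta: 1 for float, 1/2 for double, 2 for half, not counting Tensor Cores). This counts one operation per core per clock cycle; multiply by 2 to count each fused multiply-add as two FLOPs, as vendor peak figures usually do.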
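The throughput formulas above can be turned into a small calculator. This is a minimal sketch: the GpuSpec class and the function names are hypothetical, the 1,530,000 kHz clock is an assumed V100 SXM2 boost clock, and the type ratios (1 for float, 1/2 for double, 2 for half, ignoring Tensor Cores) are assumed Volta values. Replace them with your own deviceQuery output for other GPUs.

```python
from dataclasses import dataclass


@dataclass
class GpuSpec:
    """Hypothetical container for the device numbers used in the analysis.

    On a real GPU the SM count and clock rate come from deviceQuery
    (or the device properties); cores per SM is not reported directly
    and has to be looked up for the architecture.
    """
    sm_count: int          # Volta: 80 SMs
    cores_per_sm: int      # Volta: 64 FP32 cores per SM
    clock_rate_khz: float  # assumed 1_530_000 kHz (V100 SXM2 boost clock)


def cuda_cores(spec: GpuSpec) -> int:
    # CUDA Cores = SMs * Cores per SM
    return spec.sm_count * spec.cores_per_sm


def max_clock_ghz(spec: GpuSpec) -> float:
    # Maximum Clock Rate (GHz) = Clock Rate (kHz) * 1e-6
    return spec.clock_rate_khz * 1e-6


def max_gflops(spec: GpuSpec, type_ratio: float, flops_per_op: int = 2) -> float:
    # Maximum Throughput (GFLOP/s) = CUDA Cores * Max Clock (GHz) * Type Ratio,
    # scaled by flops_per_op: 2 counts each fused multiply-add as two FLOPs,
    # 1 counts a single operation per core per clock cycle.
    return cuda_cores(spec) * max_clock_ghz(spec) * type_ratio * flops_per_op


# Throughput relative to float on Volta (assumed values, Tensor Cores ignored).
TYPE_RATIO = {"float": 1.0, "double": 0.5, "half": 2.0}

v100 = GpuSpec(sm_count=80, cores_per_sm=64, clock_rate_khz=1_530_000)
for name, ratio in TYPE_RATIO.items():
    print(f"{name:>6}: {max_gflops(v100, ratio):,.1f} GFLOP/s")
```

With these assumed inputs, the FMA-counted peaks come out to roughly 15.7 TFLOP/s for float, 7.8 TFLOP/s for double and 31.3 TFLOP/s for half. Keeping flops_per_op as an explicit parameter makes it obvious which FLOP-counting convention a given "speed of light" number uses.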
Maximum Memory Bandwidth =