The following numbers are based on NVIDIA's Volta microarchitecture. To perform a similar analysis for a newer architecture, I recommend updating the numbers below using the output of the deviceQuery CUDA sample or the architecture's Wikipedia page.
CUDA Cores = SM * Cores per SM (SM = 80, Cores/SM = 64)
Maximum Clock Rate (GHz) = Clock Rate (kHz) * 1e-6
Maximum Throughput (type == floats, doubles or half) =
CUDA Cores * Maximum Clock Rate * Type Ratio (device properties) (GFLOP/s)
Maximum Memory Bandwidth =
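The compute-side formulas above (core count, clock rate, throughput) can be sketched as a few small functions. This is a minimal illustration, not a profiling tool: the function names are hypothetical, the inputs are meant to be read off deviceQuery by hand, and the type ratio is passed in as a plain number. Memory bandwidth is not covered here.

```cpp
// Hypothetical helper names; inputs are expected to come from deviceQuery.

// CUDA Cores = SM * Cores per SM
constexpr int cuda_cores(int sm_count, int cores_per_sm) {
  return sm_count * cores_per_sm;
}

// Maximum Clock Rate (GHz) = Clock Rate (kHz) * 1e-6, written as a division
constexpr double max_clock_ghz(int clock_rate_khz) {
  return clock_rate_khz / 1.0e6;
}

// Maximum Throughput (GFLOP/s) = CUDA Cores * Maximum Clock Rate * Type Ratio
constexpr double max_throughput_gflops(int cores, double clock_ghz,
                                       double type_ratio) {
  return cores * clock_ghz * type_ratio;
}
```

With the Volta values above, `cuda_cores(80, 64)` is 5120. Using a placeholder clock of 1500000 kHz (an assumption for illustration, not a measured value) and a type ratio of 1.0, `max_throughput_gflops(5120, max_clock_ghz(1500000), 1.0)` evaluates to 7680 GFLOP/s. The formula as written counts one operation per core per cycle; vendor peak figures usually count a fused multiply-add as two operations.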
How crazy would it be to have an NVCC-supported keyword, something like __ignore__, that, when placed in front of a declaration (function, variable, object, etc.), causes it to be ignored on the device side (in __device__ and __global__ code)? This would solve the issue with complicated containers that support both host and device code: their constructors/destructors, which run on the host, would simply be ignored on the device when the container is passed as a member of a larger class or struct. For example:
__global__ void kernel(foo_t foo) {
  auto idx = threadIdx.x;
  auto ptr = foo.get_ptr();
  ptr[idx] = idx;
}
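Until something like `__ignore__` exists, one common way around this is to split the container into an owning host-side type and a trivially copyable view that the kernel receives by value. The sketch below only illustrates that workaround, not the proposed keyword. `foo_view_t`, `foo_owner_t`, and `kernel_body` are hypothetical names, `std::vector` stands in for a real device allocation, and the `__host__`/`__device__` qualifiers are defined away so the snippet also builds with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: allow a host-only build for illustration when nvcc is absent.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical non-owning view: no constructor or destructor logic,
// so it can be copied into a kernel's argument list as-is.
struct foo_view_t {
  int* ptr;
  std::size_t size;
  __host__ __device__ int* get_ptr() const { return ptr; }
};

// Hypothetical owning type: allocation and cleanup happen on the host only.
class foo_owner_t {
 public:
  explicit foo_owner_t(std::size_t n) : storage_(n, 0) {}
  foo_view_t view() { return foo_view_t{storage_.data(), storage_.size()}; }

 private:
  std::vector<int> storage_;  // stand-in for a device allocation
};

// Host stand-in for the kernel body above: one call per "thread" index.
inline void kernel_body(foo_view_t foo, int idx) {
  assert(idx >= 0 && static_cast<std::size_t>(idx) < foo.size);
  int* ptr = foo.get_ptr();
  ptr[idx] = idx;
}
```

The cost of this design is that every container needs a second, view-shaped type and callers must remember to pass the view rather than the owner, which is exactly the boilerplate a keyword like `__ignore__` would aim to remove.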
We have a top-level object that the user wants to interact with, such as a pixel on the screen. Depending on what that pixel represents, we may want to color/shade it differently. If the pixel represents cloth, it should have the texture and color of cloth; if it represents metal, it should look shiny and metal-like... you get the point.
To represent this object in C++, we have a number of options. The most obvious one is to have a single function that colors (or applies some sort of texture to) the pixel and handles the different materials/colors inside that function.
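A minimal sketch of that "one function with a branch per material" option might look like the following. All names (`material_t`, `color_t`, `pixel_t`, `shade`) and the brightness factors are assumptions made up for illustration; real shading would be far more involved.

```cpp
// Hypothetical material tags for the two examples mentioned above.
enum class material_t { cloth, metal };

struct color_t {
  float r, g, b;
};

struct pixel_t {
  material_t material;
  color_t base;  // unshaded base color
};

// Scale every channel by the same factor.
constexpr color_t scale(color_t c, float k) {
  return color_t{c.r * k, c.g * k, c.b * k};
}

// Single shading function; each material gets its own branch.
constexpr color_t shade(pixel_t p) {
  switch (p.material) {
    case material_t::cloth:
      return scale(p.base, 0.5f);  // matte: dampen the base color
    case material_t::metal:
      return scale(p.base, 1.5f);  // shiny: brighten the base color
  }
  return p.base;  // unknown material: leave the color unchanged
}
```

The drawback is that every new material means editing `shade` itself, which is what motivates looking at the other options.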
For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.