http://stackoverflow.com/questions/427477/fastest-way-to-clamp-a-real-fixed-floating-point-value
https://devtalk.nvidia.com/default/topic/514408/min-max-and-sign-functions-in-cuda-do-they-exist-if-so-where-/
https://en.wikipedia.org/wiki/Algorithm_%28C%2B%2B%29
https://en.wikipedia.org/wiki/C_mathematical_functions#Overview_of_functions
http://en.cppreference.com/w/c/numeric/math/fmax
$ find /usr/ | grep 'algorithm\.h$'
/usr/include/CGAL/algorithm.h
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#mathematical-functions-appendix
Amazing! Using fmin and fmax cut my computation time from 4.1-4.2 ms
to 3.1-3.2 ms, and their use isn't even the major part of the computations!
http://stackoverflow.com/questions/16584558/the-difference-between-max-and-fmax-cross-platform-compiling
The actual difference is that fmin and fmax are mathematical functions
working on floating point numbers and originating from C99 (and might be
implemented intrinsically by actual specialized CPU instructions where possible),
while min and max are general algorithms usable on any type
supporting < (and are probably just a simple (b<a) ? b : a instead of a
floating point instruction, though an implementation could even do that
with a specialization of min and max, but I doubt this).
http://gpuray.blogspot.com/2009/07/cuda-warps-and-branching.html
http://www.informit.com/articles/article.aspx?p=2103809&seqNum=4
Some conditional operations are so common that they are supported natively
by the hardware. Minimum and maximum operations are supported for both
integer and floating-point operands and are translated to a single
instruction. Additionally, floating-point instructions include modifiers
that can negate or take the absolute value of a source operand.
The compiler does a good job of detecting when min/max operations
are being expressed, but if you want to take no chances, call the
min()/max() intrinsics for integers or fmin()/fmax()
for floating-point values.
======================================================
https://devtalk.nvidia.com/default/topic/496548/are-max-a-b-and-min-a-b-divergent-/
The standard CPU implementation seems to be:
(b<a) ? a : b;
which is clearly divergent, but I'd like to know if CUDA does anything
clever to get around it.
======================================================
http://stackoverflow.com/a/16659263/1068170
maxsd %xmm0, %xmm1 # d, min
movapd %xmm2, %xmm0 # max, max
minsd %xmm1, %xmm0 # min, max
ret

maxsd %xmm0, %xmm1
minsd %xmm1, %xmm2
movaps %xmm2, %xmm0
ret
GENERATED ASSEMBLY (sm_1x, sm_2x)
======================================================
https://gist.github.com/dhermes/c79846c6074b938b2e10